Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Amit Kapila
On Sat, Jan 18, 2014 at 2:48 AM, Andrew Dunstan  wrote:
>
> On 01/17/2014 03:15 PM, Tom Lane wrote:
>
>> The other possibility I was contemplating is that "export a const
>> variable" doesn't actually work for some reason.  We're not in the habit
>> of doing that elsewhere, so I don't find that theory outlandish.  Perhaps
>> it could be fixed by adding PGDLLIMPORT to the extern, but on the whole
>> I'd rather avoid the technique altogether.

Is there any specific reason why adding PGDLLIMPORT for exporting
const variables is not good, when we are already doing so for other
variables like InterruptHoldoffCount and CritSectionCount?

Searching the code, I found that we already use extern PGDLLIMPORT for a
few const variables. For example:
extern PGDLLIMPORT const int NumScanKeywords;
extern PGDLLIMPORT const char *debug_query_string;

>> The least-unlike-other-Postgres-code approach would be to go ahead and
>> export the struct so that the size computation could be provided as a
>> #define in the same header.  Robert stated a couple days ago that he
>> didn't foresee much churn in this struct, so that doesn't seem
>> unacceptable.
>>
>> Another possibility is to refactor so that testing an allocation request
>> against shm_mq_minimum_size is the responsibility of storage/ipc/shm_mq.c,
>> not some random code in a contrib module.  It's not immediately apparent
>> to me why it's good code modularization to have a contrib module
>> responsible for checking sizes based on the sizeof a struct it's not
>> supposed to have any access to.
>
> Or maybe we could expose the value via a function rather than a const
> variable.

All of the above suggested alternatives will fix this problem. However,
after fixing it, when I ran the test further the bgworker crashed, and I
found that the reason was that some other variables (ImmediateInterruptOK,
MyBgworkerEntry) are used in the test module without PGDLLIMPORT.

After adding PGDLLIMPORT to these variables (ImmediateInterruptOK,
MyBgworkerEntry, shm_mq_minimum_size), both of the tests defined in the
contrib module passed.
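
For reference, the kind of change involved is just adding PGDLLIMPORT to the
existing extern declarations, roughly as below (the exact headers and declared
types should be taken from the attached patch rather than from this sketch):

    /* storage/shm_mq.h */
    extern PGDLLIMPORT const Size shm_mq_minimum_size;

    /* miscadmin.h */
    extern PGDLLIMPORT volatile bool ImmediateInterruptOK;

    /* postmaster/bgworker.h */
    extern PGDLLIMPORT BackgroundWorker *MyBgworkerEntry;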


Attached patch fixes the problems related to test_shm_mq for me.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_test_shm_mq.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread David Rowley
On Sat, Jan 18, 2014 at 6:15 PM, David Rowley  wrote:

> On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug  wrote:
>
>> * Don't we need to check for volatile function in the filter expression
>> too?
>>
>
>>
> I did manual testing on this before, and the volatility test for the
> aggregate arguments seems to cover this. I didn't look into why, but it just
> did. I've not tested this again since your refactoring. I could test this
> easily before, when my numeric case was changing the results because of the
> dscale problem: I noticed that if I did FILTER(WHERE random() > 0) the
> extra trailing zeros would disappear.
> The problem now is that it's pretty hard to determine if an inverse
> transition took place; the only way we can really tell is performance. I'll
> see if I can invent a new test case for this by creating a user-defined
> aggregate as you described. I'm thinking of just appending '+' to a string for
> transitions and '-' to a string for inverse transitions, then making
> sure we only have a string of '+'s when doing something like filter(where
> random() >= 0).
>
>
>
I've added a test case for this and it seems to work as expected:
https://github.com/david-rowley/postgres/commit/43a5021e8f8ae1af272e7e21a842d1b0d5cbe577

Regards

David Rowley


Re: [HACKERS] [PATCH] Make various variables read-only (const)

2014-01-17 Thread Tom Lane
I wrote:
> However, I believe this code was Alvaro's to begin with, so I'd be
> interested in his opinion on how important it is for transformRelOptions
> to allow more than one namespace per call.

> Actually, I'm kind of wondering why the code has a concept of namespaces
> at all, given the fact that it fails to store them in the resulting array.
> It seems impossible to verify after the fact that an option was given with
> the right namespace, so isn't the feature pretty much pointless?

After thinking about it some more, I realize that the intended usage
pattern is to call transformRelOptions once for each allowed namespace.
Since each call would ignore options outside its namespace, that would
result in any options with invalid namespace names being silently ignored;
so the fix was to add a parameter saying which namespaces are valid,
and then each call checks that, independently of extracting the options
it cares about.

This design seems a bit odd to me; it's certainly wasting effort, since
each namespace check after the first one is redundant.  I'm inclined to
propose that we factor out the namespace validity checking into a separate
function, such that you do something like

void checkOptionNamespaces(List *defList, const char * const validnsps[])

just once, and then call transformRelOptions for each of the desired
namespaces; transformRelOptions's validnsps argument goes away.  In at
least some places this looks like it would net out cleaner; for instance,
there is no good reason why callers that are trying to extract "toast"
options should have to know which other namespaces are allowed.
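
As a rough sketch (names and error wording illustrative only, and assuming the
usual reloptions.c includes), the checking function could look about like
this:

void
checkOptionNamespaces(List *defList, const char *const validnsps[])
{
    ListCell   *cell;

    foreach(cell, defList)
    {
        DefElem    *def = (DefElem *) lfirst(cell);
        bool        valid = (def->defnamespace == NULL);
        int         i;

        for (i = 0; !valid && validnsps[i] != NULL; i++)
            valid = (strcmp(def->defnamespace, validnsps[i]) == 0);

        if (!valid)
            ereport(ERROR,
                    (errcode(ERRCODE_SYNTAX_ERROR),
                     errmsg("unrecognized parameter namespace \"%s\"",
                            def->defnamespace)));
    }
}

and callers would invoke it once, up front, before the per-namespace
transformRelOptions calls.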

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread David Rowley
On Sat, Jan 18, 2014 at 2:20 PM, Florian Pflug  wrote:

> First, I've got the feeling that I should somehow update the commitfest app,
> but I don't really know in which way. Should I put myself in as a reviewer,
> or as a second author? Or neither? Suggestions welcome...
>
>
Well, I guess you're both now, but it's a bit weird to be both author and
reviewer, so I've put your name against author too; hopefully Dean can review
our combined results and we can review each other's work at the same time.


> On Jan17, 2014, at 23:34 , David Rowley  wrote:
> > The test turned out to become:
> >   if (state->expectedScale == -1)
> >   state->expectedScale = X.dscale;
> >   else if (state->expectedScale != X.dscale)
> >   state->expectedScale = -2;
> >
> > In do_numeric_accum, then when we do the inverse I just check if
> > expectedScale < 0 then return NULL.
>
> Ok, so this will avoid a rescan if and only if the dscales of all inputs match.
> I still think that's overly pessimistic - we've only got a problem when we
> removed the input with the largest dscale, no? So my suggestion would be
>
>   state->maxDScale = MAX(X.dscale, state->maxDScale);
>
> in do_numeric_accum, and in the inverse
>
>   if (state->maxDScale == X.dscale)
>     PG_RETURN_NULL();
>
> I'd think that this avoids more restarts with about the same effort,
> but I haven't tried this, though, so maybe I'm missing something.
>
>
This is not quite right, as it means that if all the values have the same
dscale then we reject inverse transitions, since state->maxScale will always
be equal to X.dscale.
But you are right about the overly strict code I had put in; we should allow
values with less than the maximum dscale to be unaggregated without
complaint. To implement this I needed a maxScaleCounter variable too, so we
only reject when the maxScaleCounter gets back to 0 again.
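
For clarity, the bookkeeping now looks roughly like the following (a
simplified sketch rather than the exact patch code; in the patch the state
lives in NumericAggState):

/* forward transition: note the dscale of the incoming value */
if (X.dscale > state->maxScale)
{
    state->maxScale = X.dscale;
    state->maxScaleCount = 1;
}
else if (X.dscale == state->maxScale)
    state->maxScaleCount++;

/* inverse transition: removing a value at the maximum dscale */
if (X.dscale == state->maxScale)
{
    if (--state->maxScaleCount == 0)
        PG_RETURN_NULL();   /* max dscale of remaining rows unknown: restart */
}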

Note that after this fix the results for my quick benchmark now look like:

create table num (num numeric(10,2) not null);
insert into num (num) select * from generate_series(1,2);
select sum(num) over(order by num rows between current row and unbounded
following) from num; -- 113 ms
select sum(num / 10) over(order by num rows between current row and
unbounded following) from num; -- 156ms
select sum(num / 1) over(order by num rows between current row and
unbounded following) from num; -- 144 ms

So it seems to be much less prone to falling back to brute-force forward
transitions.
It also seems the / 10 version must previously have had to do one brute-force
rescan, but now it looks like it can do it in a single scan.

> I'm not set on it, and I'm willing to try the lazy zeroing of the scale
> > tracker array, but I'm just not quite sure what extra real use cases it
> > covers that the one above does not. Perhaps my way won't perform inverse
> > transitions if the query did sum(numericCol / 10) or so.
>
> Dunno how many people SUM over numerics with different dscales. Easily
> possible that it's few enough to not be worth fussing over.
>
>
Going by Tom's comments in the post above, this is possible just by having
an unconstrained numeric column, but I guess there's still a good chance
that even those unconstrained numbers have the same scale, or at least that
the scale will not vary wildly enough to make us perform brute-force
forward transitions for each row.


> > create table num (num numeric(10,2) not null);
> > insert into num (num) select * from generate_series(1,2);
> > select sum(num) over(order by num rows between current row and unbounded
> following) from num; -- 124ms
> > select sum(num / 10) over(order by num rows between current row and
> unbounded following) from num; -- 254ms
> > select sum(num / 1) over(order by num rows between current row and
> unbounded following) from num; -- 108156.917 ms
> >
> > The divide by 1 case is slow because of that weird 20 trailing zero
> > instead of 16 when dividing a numeric by 1 and that causes the inverse
> > transition function to return NULL because the scale changes.
> >
> > I've not tested an unpatched version yet to see how that divide by 1
> query
> > performs on that but I'll get to that soon.
>
> So without the patch, all three queries should perform similarly, i.e. take
> about 10 seconds, right? If so, the improvement is fantastic!
>
>
Well, it's actually 100 seconds, not 10. I tested the worst-case
performance against an unpatched head and got 107 seconds instead of the
108. So I'm guessing that's pretty good, as the worst case is not really any
worse, and that case is pretty hard to get to. I guess the results
would all have to have a different scale, with the biggest scale on the
first aggregated values... Reaching that worst case just seems impossible
in a real-world workload.


> > I'm thinking that the idea about detecting the numeric range with
> floating
> > point types and performing an inverse transition providing the range has
> > not gone beyond some defined danger zone is not material for this
> patch

Re: [HACKERS] 9.3.2 Client-only installation per docs fails creating directory.

2014-01-17 Thread Peter Eisentraut
On Thu, 2014-01-16 at 15:59 -0600, Mike Blackwell wrote:
> Per the docs (15.4 Installation Procedure, Client-only installation),
> after running make, the following should work:
> 
> 
> gmake -C src/bin install
> gmake -C src/include install
> gmake -C src/interfaces install
> gmake -C doc install

Fixed, thanks.

(It works if you have NLS enabled, which is why this was probably not
reported before.)





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Make various variables read-only (const)

2014-01-17 Thread Tom Lane
Oskari Saarenmaa  writes:
> On Sun, Dec 22, 2013 at 09:43:57PM -0500, Robert Haas wrote:
>> - Why change the API of transformRelOptions()?

> The comment was changed to reflect the new API, I modified
> transformRelOptions to only accept a single valid namespace to make things
> simpler in the calling code.  Nothing used more than one valid namespace
> anyway, and it allows us to just use a constant "toast" without having to
> create a 2 char* array with a NULL.

That may be true today, but the code was obviously designed to allow for
more than one namespace in the future.  I'm inclined to reject this part
of the patch, or at least rework it to const-ify the existing data
structure rather than editorialize on what's needed.

However, I believe this code was Alvaro's to begin with, so I'd be
interested in his opinion on how important it is for transformRelOptions
to allow more than one namespace per call.

Actually, I'm kind of wondering why the code has a concept of namespaces
at all, given the fact that it fails to store them in the resulting array.
It seems impossible to verify after the fact that an option was given with
the right namespace, so isn't the feature pretty much pointless?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] array_length(anyarray)

2014-01-17 Thread Marko Tiikkaja

On 1/12/14, 5:53 AM, I wrote:

On 1/9/14, 2:57 PM, Dean Rasheed wrote:

How it should behave for multi-dimensional arrays is less clear, but
I'd argue that it should return the total number of elements, i.e.
cardinality('{{1,2},{3,4}}'::int[][]) = 4. That would make it
consistent with the choices we've already made for unnest() and
ordinality:
   - cardinality(foo) = (select count(*) from unnest(foo)).
   - unnest with ordinality would always result in ordinals in the range
[1, cardinality].


Ignoring my proposal, this seems like the most reasonable option.  I'll
send an updated patch along these lines.


Here's the patch as promised.  Thoughts?


Regards,
Marko Tiikkaja
*** a/doc/src/sgml/array.sgml
--- b/doc/src/sgml/array.sgml
***
*** 338,343  SELECT array_length(schedule, 1) FROM sal_emp WHERE name = 
'Carol';
--- 338,356 
  2
  (1 row)
  
+ 
+  cardinality returns the total number of elements in an
+  array across all dimensions.  It is effectively the number of rows a call to
+  unnest would yield:
+ 
+ 
+ SELECT cardinality(schedule) FROM sal_emp WHERE name = 'Carol';
+ 
+  cardinality 
+ -
+4
+ (1 row)
+ 
   
   
  
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 11009,11014  SELECT NULLIF(value, '(none)') ...
--- 11009,11017 
  array_upper


+ cardinality
+   
+   
  string_to_array


***
*** 11167,11172  SELECT NULLIF(value, '(none)') ...
--- 11170,11186 
 
  
   
+   cardinality(anyarray)
+  
+ 
+ int
+ returns the total number of elements in the array, or 0 if the 
array is empty
+ cardinality(ARRAY[[1,2],[3,4]])
+ 4
+
+
+ 
+  
string_to_array(text, 
text , text)
   
  
*** a/src/backend/utils/adt/arrayfuncs.c
--- b/src/backend/utils/adt/arrayfuncs.c
***
*** 1740,1745  array_length(PG_FUNCTION_ARGS)
--- 1740,1766 
  }
  
  /*
+  * array_cardinality:
+  *returns the total number of elements in an array
+  */
+ Datum
+ array_cardinality(PG_FUNCTION_ARGS)
+ {
+   ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);
+   int*dimv;
+   int i;
+   int cardinality;
+ 
+   dimv = ARR_DIMS(v);
+   cardinality = 1;
+   for (i = 0; i < ARR_NDIM(v); i++)
+   cardinality *= dimv[i];
+ 
+   PG_RETURN_INT32(cardinality);
+ }
+ 
+ 
+ /*
   * array_ref :
   *  This routine takes an array pointer and a subscript array and returns
   *  the referenced item as a Datum.  Note that for a pass-by-reference
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
***
*** 840,845  DATA(insert OID = 2092 (  array_upper PGNSP PGUID 12 1 0 0 
0 f f f f t f i 2
--- 840,847 
  DESCR("array upper dimension");
  DATA(insert OID = 2176 (  array_length   PGNSP PGUID 12 1 0 0 0 f f f 
f t f i 2 0 23 "2277 23" _null_ _null_ _null_ _null_ array_length _null_ _null_ 
_null_ ));
  DESCR("array length");
+ DATA(insert OID = 3179 (  cardinalityPGNSP PGUID 12 1 0 0 0 f f f f t f i 
1 0 23 "2277" _null_ _null_ _null_ _null_ array_cardinality _null_ _null_ 
_null_ ));
+ DESCR("array cardinality");
  DATA(insert OID = 378 (  array_appendPGNSP PGUID 12 1 0 0 0 f f f f f f i 
2 0 2277 "2277 2283" _null_ _null_ _null_ _null_ array_push _null_ _null_ 
_null_ ));
  DESCR("append element onto end of array");
  DATA(insert OID = 379 (  array_prepend   PGNSP PGUID 12 1 0 0 0 f f f 
f f f i 2 0 2277 "2283 2277" _null_ _null_ _null_ _null_ array_push _null_ 
_null_ _null_ ));
*** a/src/include/utils/array.h
--- b/src/include/utils/array.h
***
*** 204,209  extern Datum array_dims(PG_FUNCTION_ARGS);
--- 204,210 
  extern Datum array_lower(PG_FUNCTION_ARGS);
  extern Datum array_upper(PG_FUNCTION_ARGS);
  extern Datum array_length(PG_FUNCTION_ARGS);
+ extern Datum array_cardinality(PG_FUNCTION_ARGS);
  extern Datum array_larger(PG_FUNCTION_ARGS);
  extern Datum array_smaller(PG_FUNCTION_ARGS);
  extern Datum generate_subscripts(PG_FUNCTION_ARGS);
*** a/src/test/regress/expected/arrays.out
--- b/src/test/regress/expected/arrays.out
***
*** 1455,1460  select array_length(array[[1,2,3], [4,5,6]], 3);
--- 1455,1502 
   
  (1 row)
  
+ select cardinality(NULL::int[]);
+  cardinality 
+ -
+ 
+ (1 row)
+ 
+ select cardinality('{}'::int[]);
+  cardinality 
+ -
+1
+ (1 row)
+ 
+ select cardinality(array[1,2,3]);
+  cardinality 
+ -
+3
+ (1 row)
+ 
+ select cardinality('[2:4]={5,6,7}'::int[]);
+  cardinality 
+ -
+3
+ (1 row)
+ 
+ select cardinality('{{1,2}}'::int[]);
+  cardinality 
+ -
+2
+ (1 row)
+ 
+ select cardinality('{

Re: [HACKERS] Minor improvements to sslinfo contrib module

2014-01-17 Thread Tom Lane
Gurjeet Singh  writes:
> Please find attached the patch that fixes a couple of comments, and adds
> 'static' qualifier to functions that are not used anywhere else in the code
> base.

Committed, with the trivial fix that the function definitions have to be
marked static too.  (gcc doesn't complain about this, which I think may be
correct per C standard; but some other compilers do whine about that sort
of inconsistency.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> On 2014-01-17 13:50:08 -0500, Tom Lane wrote:
>>> I think this approach is fundamentally broken, because it can't reasonably
>>> cope with any case more complicated than "%zu" or "%zd".

>> Am I just too tired, or am I not getting how INT64_FORMAT currently
>> allows the arguments to be used positionally?

> It doesn't, which is one of the reasons for not allowing it in
> translatable strings (the other being lack of standardization of the
> strings that would be subject to translation).

On second thought, that answer was too glib.  There's no need for %n$
in the format strings *in the source code*, so INT64_FORMAT isn't getting
in the way from that perspective.  However, expand_fmt_string() is
necessarily applied to formats *after* they've been through gettext(),
so it has to expect that it might see %n$ in the now-translated strings.

It's still the case that we need a policy against INT64_FORMAT in
translatable strings, though, because what's passed to the gettext
mechanism has to be a fixed string not something with platform
dependencies in it.  Or so I would assume, anyway.
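
To illustrate why (a standalone sketch, not code from our tree): the format
string that becomes the gettext() msgid is assembled by the preprocessor, so
with INT64_FORMAT the msgid itself varies across platforms:

#include <stdio.h>

/* pretend configure chose this; on another platform it might be "%I64d" */
#define INT64_FORMAT "%lld"

int
main(void)
{
    long long   nbytes = 42;

    /*
     * After literal concatenation the msgid is "wrote %lld bytes" here but
     * "wrote %I64d bytes" elsewhere, so no single translation catalog entry
     * can match both -- hence the policy against INT64_FORMAT in
     * translatable strings.
     */
    printf("wrote " INT64_FORMAT " bytes\n", nbytes);
    return 0;
}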

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Tom Lane
Andres Freund  writes:
> On 2014-01-17 13:50:08 -0500, Tom Lane wrote:
>> I think this approach is fundamentally broken, because it can't reasonably
>> cope with any case more complicated than "%zu" or "%zd".

> Am I just too tired, or am I not getting how INT64_FORMAT currently
> allows the arguments to be used positionally?

It doesn't, which is one of the reasons for not allowing it in
translatable strings (the other being lack of standardization of the
strings that would be subject to translation).  Adding 'z' will only
fix this for cases where what we want to print is really a size_t.
That's a usefully large set, but not all the cases by any means.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread Florian Pflug
First, I've got the feeling that I should somehow update the commitfest app,
but I don't really know in which way. Should I put myself in as a reviewer,
or as a second author? Or neither? Suggestions welcome...

On Jan17, 2014, at 23:34 , David Rowley  wrote:
> The test turned out to become:
>   if (state->expectedScale == -1)
>   state->expectedScale = X.dscale;
>   else if (state->expectedScale != X.dscale)
>   state->expectedScale = -2;
> 
> In do_numeric_accum, then when we do the inverse I just check if
> expectedScale < 0 then return NULL.

Ok, so this will avoid a rescan if and only if the dscales of all inputs match.
I still think that's overly pessimistic - we've only got a problem when we
removed the input with the largest dscale, no? So my suggestion would be

  state->maxDScale = MAX(X.dscale, state->maxDScale);

in do_numeric_accum, and in the inverse

  if (state->maxDScale == X.dscale)
    PG_RETURN_NULL();

I'd think that this avoids more restarts with about the same effort,
but I haven't tried this, though, so maybe I'm missing something.

> I'm not set on it, and I'm willing to try the lazy zeroing of the scale
> tracker array, but I'm just not quite sure what extra real use cases it
> covers that the one above does not. Perhaps my way won't perform inverse
> transitions if the query did sum(numericCol / 10) or so.

Dunno how many people SUM over numerics with different dscales. Easily
possible that it's few enough to not be worth fussing over.

> create table num (num numeric(10,2) not null);
> insert into num (num) select * from generate_series(1,2);
> select sum(num) over(order by num rows between current row and unbounded 
> following) from num; -- 124ms
> select sum(num / 10) over(order by num rows between current row and unbounded 
> following) from num; -- 254ms
> select sum(num / 1) over(order by num rows between current row and unbounded 
> following) from num; -- 108156.917 ms
> 
> The divide by 1 case is slow because of that weird 20 trailing zero
> instead of 16 when dividing a numeric by 1 and that causes the inverse
> transition function to return NULL because the scale changes.
> 
> I've not tested an unpatched version yet to see how that divide by 1 query
> performs on that but I'll get to that soon.

So without the patch, all three queries should perform similarly, i.e. take
about 10 seconds, right? If so, the improvement is fantastic!

> I'm thinking that the idea about detecting the numeric range with floating
> point types and performing an inverse transition providing the range has
> not gone beyond some defined danger zone is not material for this patch...
> I think it would be not a great deal of work to code, but the testing involved
> would probably make this patch not possible for 9.4

Yeah, I never imagined that this would happen for 9.4.

> The latest version of the patch is attached.

OK, there are a few more comments

* build_aggregate_fnexprs() should allow NULL to be passed for invtransfn_oid,
  I think. I don't quite like that we construct that even for plain aggregates,
  and avoiding it requires just an additional if.

* Don't we need to check for volatile function in the filter expression too?

* As it stands, I don't think intXand_inv and intXor_inv are worth it, since
  the case where they return non-NULL is awfully slim (only for inputs
  containing only ones, respectively only zeros). We should either track the
  number of zeros and ones per bit, which would allow us to always perform
  inverse transitions, or just drop them.

* Quite a few of the inverse transition functions are marked strict, yet
  contain code to handle NULL inputs. You can just remove that code - the system
  makes sure that strict functions never receive NULL arguments. Affected are,
  AFAICS, numeric_accum_inv, numeric_avg_accum_inv, int2_accum_inv,
  int4_accum_inv, int8_accum_inv, int8_avg_accum_inv, int2_sum_inv,
  int4_sum_inv, int8_sum_inv. Not sure that list is exhaustive...

* For any of the new inverse transition functions, I'd be inclined to just
  elog() if they're called directly and not as an aggregate. In particular
  those which check for that anyway, plus the *smaller_inv and *larger_inv
  ones. I don't see why anyone would ever want to call these directly - and
  if we elog() now, we can easily change these functions later, because no
  external code can depend on them. E.g., maybe someone wants to keep the
  top N elements in the MIN/MAX aggregates one day...

* The number of new regression tests seems a bit excessive. I don't think there's
  really a policy on what to test and what not, but in this case I think it
  suffices if we test the basic machinery, plus the more complex functions. But
  e.g. most of the SUM and AVG aggregates use numeric_accum or numeric_avg_accum
  internally, and the wrapper code basically just does datatype conversion, so
  testing a few cases seems enough there. What I think we *should* test, but
  don't do curre

Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Tom Lane
Andres Freund  writes:
> On 2014-01-17 13:50:08 -0500, Tom Lane wrote:
>> I think a better solution approach is to teach our src/port/snprintf.c
>> about the z flag, then extend configure's checking to force use of our
>> snprintf if the platform's version doesn't handle z.

> Hm. I had thought about that, but dismissed it because I thought people
> would argue about it being too invasive...

How so?  It'd be a lot less invasive than what we'd have to do to use
'z' flags the other way.

> If we're going there, we should just eliminate expand_fmt_string() from
> elog.c and test for it in configure too, right?

If you mean "let's rely on glibc for %m", the answer is "not bloody
likely".  See useful_strerror(), which is functionality we'd lose if
we short-circuit that.

>> You suggest below that we could invent some additional
>> macros to support that; but since the "z" flag is in C99, there really
>> ought to be only a small minority of platforms where it doesn't work.

> Well, maybe just a minority numberwise, but one of them being windows
> surely makes it count in number of installations...

Agreed, but I believe we're already using src/port/snprintf.c on Windows
because of lack of %n$ support (which is also required by C99).
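
For anyone unfamiliar with the flag, here's the whole point in a trivial
standalone example (not our code): C99's 'z' length modifier matches size_t
on every platform, so no cast or INT64_FORMAT gymnastics are needed:

#include <stdio.h>
#include <stddef.h>

int
main(void)
{
    size_t  request_size = 123456;

    /* C99: 'z' matches size_t everywhere */
    printf("request of %zu bytes\n", request_size);

    /* pre-C99 workaround: cast to a type with a known conversion spec */
    printf("request of %lu bytes\n", (unsigned long) request_size);
    return 0;
}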

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Andres Freund
On 2014-01-17 16:18:49 -0800, Greg Stark wrote:
> On Fri, Jan 17, 2014 at 9:14 AM, Andres Freund  wrote:
> > The scheme that'd allow us is the following:
> > When postgres reads a data page, it will continue to first look up the
> > page in its shared buffers, if it's not there, it will perform a page
> > cache backed read, but instruct that read to immediately remove from the
> > page cache afterwards (new API or, posix_fadvise() or whatever). As long
> > as it's in shared_buffers, postgres will not need to issue new reads, so
> there's no benefit keeping it in the page cache.
> If the page is dirtied, it will be written out normally, telling the
> kernel to forget about caching the page (using 3) or possibly direct
> > io).
> > When a page in postgres's buffers (which wouldn't be set to very large
> > values) isn't needed anymore and *not* dirty, it will seed the kernel
> > page cache with the current data.
> 
> This seems like mostly an exact reimplementation of DIRECT_IO
> semantics using posix_fadvise.

Not at all. The significant bits are a) the kernel uses the pagecache
for writeback of dirty data and more importantly b) uses the pagecache
to cache data not in postgres's s_b.
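
To make the read side of that scheme concrete, here's a rough standalone
sketch (not proposed code -- whether posix_fadvise(POSIX_FADV_DONTNEED) is an
adequate interface, or a new kernel API is needed, is exactly the open
question):

#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLCKSZ 8192

/*
 * Read one block through the kernel page cache, then immediately drop the
 * kernel's copy: the block now lives in shared_buffers (here just 'buf'),
 * so keeping a second copy in the page cache is pure waste.
 */
static ssize_t
read_block_uncached(int fd, off_t blockno, char *buf)
{
    ssize_t nread = pread(fd, buf, BLCKSZ, blockno * BLCKSZ);

    if (nread == BLCKSZ)
        (void) posix_fadvise(fd, blockno * BLCKSZ, BLCKSZ,
                             POSIX_FADV_DONTNEED);
    return nread;
}

int
main(int argc, char **argv)
{
    char    buf[BLCKSZ];
    int     fd;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
        return 1;
    if (read_block_uncached(fd, 0, buf) == BLCKSZ)
        printf("read block 0 and dropped it from the page cache\n");
    close(fd);
    return 0;
}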

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Andres Freund
On 2014-01-17 13:50:08 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > [ 0001-Add-support-for-printing-Size-arguments-to-elog-erep.patch ]
> 
> I think this approach is fundamentally broken, because it can't reasonably
> cope with any case more complicated than "%zu" or "%zd".  While it's
> arguable that we can get along without the ability to specify field width,
> since the [U]INT64_FORMAT macros never allowed that, it is surely going
> to be the case that translators will need to insert "n$" flags in the
> translated versions of messages.

Am I just too tired, or am I not getting how INT64_FORMAT currently
allows the arguments to be used positionally?

Admittedly, most places currently just cast down the value, avoiding the
need for INT64_FORMAT.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Andres Freund
On 2014-01-17 14:18:55 -0500, Tom Lane wrote:
> I wrote:
> > Meh.  This isn't needed if we do what I suggest above, but in any case
> > I don't approve of removing the existing [U]INT64_FORMAT macros.
> > That breaks code that doesn't need to get broken, probably including
> > third-party modules.
> 
> After looking more closely I see you didn't actually *remove* those
> macros, just define them in a different place/way.  So the above objection
> is just noise, sorry.  (Though I think it'd be notationally cleaner to let
> configure continue to define the macros; it doesn't need to do anything as
> ugly as CppAsString2() to concatenate...)

I prefer having configure just define the length modifier, since that
allows defining further macros containing formats. But I think defining
them as strings instead of raw literals, as I had, might make it a bit less ugly...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Andres Freund
Hi,

On 2014-01-17 13:50:08 -0500, Tom Lane wrote:
> I think a better solution approach is to teach our src/port/snprintf.c
> about the z flag, then extend configure's checking to force use of our
> snprintf if the platform's version doesn't handle z.  While it might be
> objected that this creates a need for a run-time check in configure,
> we already have one to check if snprintf handles "n$", so this approach
> doesn't really make life any worse for cross-compiles.

Hm. I had thought about that, but dismissed it because I thought people
would argue about it being too invasive...
If we're going there, we should just eliminate expand_fmt_string() from
elog.c and test for it in configure too, right?

> You suggest below that we could invent some additional
> macros to support that; but since the "z" flag is in C99, there really
> ought to be only a small minority of platforms where it doesn't work.

Well, maybe just a minority numberwise, but one of them being windows
surely makes it count in number of installations...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Greg Stark
On Fri, Jan 17, 2014 at 9:14 AM, Andres Freund  wrote:
> The scheme that'd allow us is the following:
> When postgres reads a data page, it will continue to first look up the
> page in its shared buffers, if it's not there, it will perform a page
> cache backed read, but instruct that read to immediately remove from the
> page cache afterwards (new API or, posix_fadvise() or whatever). As long
> as it's in shared_buffers, postgres will not need to issue new reads, so
> there's no benefit keeping it in the page cache.
> If the page is dirtied, it will be written out normally, telling the
> kernel to forget about caching the page (using 3) or possibly direct
> io).
> When a page in postgres's buffers (which wouldn't be set to very large
> values) isn't needed anymore and *not* dirty, it will seed the kernel
> page cache with the current data.

This seems like mostly an exact reimplementation of DIRECT_IO
semantics using posix_fadvise.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread Tom Lane
David Rowley  writes:
> I just thought that my idea was good enough and very cheap too, won't all
> numerics that are actually stored in a column have the same scale anyway?

No; unconstrained numeric columns (ie, if you just say "numeric") don't
force their contents to any particular scale.  It might be that we don't
have to optimize for that case, since it's not in the SQL spec, but it
is definitely supported by PG.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread David Rowley
On Sat, Jan 18, 2014 at 10:42 AM, David Rowley  wrote:

>
>> You could do better than that - the numeric problem amounts to tracking
>> the maximum
>> scale AFAICS, so it'd be sufficient to restart if we remove a value whose
>> scale equals
>> the current maximum. But if we do SUM(numeric) at all, I think we should
>> do so
>> without requiring restarts - I still think the overhead of tracking the
>> maximum scale
>> within the aggregated values isn't that bad. If we zero the dscale
>> counters lazily,
>> the numbers of entries we have to zero is bound by the maximum dscale we
>> encounter.
>> Since actually summing N digits is probably more expensive than zeroing
>> them, and since
>> we pay the zeroing overhead only once per aggregation but save
>> potentially many
>> summations, I'm pretty sure we come out ahead by quite some margin.
>>
>>
> We'll work that out, I don't think it will take very long to code up your
> idea either.
> I just thought that my idea was good enough and very cheap too, won't all
> numerics that are actually stored in a column have the same scale anyway?
> Has it not only been a problem because we've been testing with random
> numeric literals the whole time?
>
> The test turned out to become:
> if (state->expectedScale == -1)
> state->expectedScale = X.dscale;
>  else if (state->expectedScale != X.dscale)
> state->expectedScale = -2;
>
> In do_numeric_accum, then when we do the inverse I just check if
> expectedScale < 0 then return NULL.
>
> I'm not set on it, and I'm willing to try the lazy zeroing of the scale
> tracker array, but I'm just not quite sure what extra real use cases it
> covers that the one above does not. Perhaps my way won't perform inverse
> transitions if the query did sum(numericCol / 10) or so.
>
> I'll be committing this to the github repo very soon, so we can hack away
> at the scale tracking code once it's back in.
>
>
Ok, we're back up to 86 aggregate function / type combinations with inverse
transition functions.
I've committed my latest work up to github, and here's a fresh patch which I
will need to do more tests on.

The test (below) that used to fail a few versions ago is back in there and
it's now passing.

SELECT SUM(n::numeric) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND
UNBOUNDED FOLLOWING)
  FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n);

In this case it won't use inverse transitions because the forward
transition function detects that the scale is not fixed.

I tested throwing some numerics into a table and I'm pretty happy with the
results.

create table num (num numeric(10,2) not null);
insert into num (num) select * from generate_series(1,2);
select sum(num) over(order by num rows between current row and unbounded
following) from num; -- 124ms
select sum(num / 10) over(order by num rows between current row and
unbounded following) from num; -- 254ms
select sum(num / 1) over(order by num rows between current row and
unbounded following) from num; -- 108156.917 ms

The divide by 1 case is slow because of that weird 20 trailing zero instead
of 16 when dividing a numeric by 1 and that causes the inverse transition
function to return NULL because the scale changes.

I've not tested an unpatched version yet to see how that divide by 1 query
performs on that but I'll get to that soon.

I'm thinking that the idea about detecting the numeric range with floating
point types and performing an inverse transition providing the range has
not gone beyond some defined danger zone is not material for this patch...
I think it would be not a great deal of work to code, but the testing
involved would probably make this patch not possible for 9.4

The latest version of the patch is attached.

Regards

David Rowley


inverse_transition_functions_v2.7.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Triggers on foreign tables

2014-01-17 Thread Noah Misch
On Tue, Jan 07, 2014 at 12:11:28PM +0100, Ronan Dunklau wrote:
> Since the last discussion about it 
> (http://www.postgresql.org/message-id/cadyhksugp6ojb1pybtimop3s5fg_yokguto-7rbcltnvaj5...@mail.gmail.com),
>  I
> finally managed to implement row triggers for foreign tables.

> For statement-level triggers, nothing has changed since the last patch.
> Statement-level triggers are just allowed in the command checking.
> 
> For row-level triggers, it works as follows:
> 
> - during rewrite phase, every attribute of the foreign table is duplicated as
> a resjunk entry if a trigger is defined on the relation. These entries are
> then used to store the old values for a tuple. There is room for improvement
> here: we could check if any trigger is in fact a row-level trigger

The need here is awfully similar to a need of INSTEAD OF triggers on views.
For them, we add a single "wholerow" resjunk TLE.  Is there a good reason to
do it differently for triggers on foreign tables?

> - for after triggers, the whole queuing mechanism is bypassed for foreign
> tables. This is IMO acceptable, since foreign tables cannot have constraints
> or constraints triggers, and thus have not need for deferrable execution. This
> design avoids the need for storing and retrieving/identifying remote tuples
> until the query or transaction end.

Whether an AFTER ROW trigger is deferred determines whether it runs at the end
of the firing query or at the end of the firing query's transaction.  In all
cases, every BEFORE ROW trigger of a given query fires before any AFTER ROW
trigger of the same query.  SQL requires that.  This proposal would give
foreign table AFTER ROW triggers a novel firing time; let's not do that.

I think the options going forward are either (a) design a way to queue foreign
table AFTER ROW triggers such that we can get the old and/or new rows at the
end of the query or (b) not support AFTER ROW triggers on foreign tables for
the time being.

It's not clear to me whether SQL/MED contemplates triggers on foreign tables.
Its  General Rules do mention the possibility of
a reference from a trigger column list.  On the other hand, I see nothing
overriding the fact that CREATE TRIGGER only targets base tables.  Is this
clearer to anyone else?  (This is a minor point, mainly bearing on the
Compatibility section of the CREATE TRIGGER documentation.)

> - the duplicated resjunk attributes are identified by being:
>   - marked as resjunk in the targetlist
>   - not being system or whole-row attributes (varno > 0)
> 
> There is still one small issue with the attached patch: modifications to the
> tuple performed by the foreign data wrapper (via the returned TupleTableSlot
> in ExecForeignUpdate and ExecForeignInsert hooks) are not visible to the AFTER
> trigger. This could be fixed by merging the planslot containing the resjunk
> columns with the returned slot before calling the trigger, but I'm not really
> sure how to safely perform that. Any advice ?

Currently, FDWs are permitted to skip returning columns not actually
referenced by any RETURNING clause.  I would change that part of the API
contract to require returning all columns when an AFTER ROW trigger is
involved.  You can't get around doing that by merging old column values,
because, among other reasons, an INSERT does not have those values at all.

On Tue, Jan 07, 2014 at 05:31:10PM +0100, Ronan Dunklau wrote:
> --- a/contrib/postgres_fdw/expected/postgres_fdw.out
> +++ b/contrib/postgres_fdw/expected/postgres_fdw.out

> +NOTICE:  TG_NAME: trig_row_after
> +NOTICE:  TG_WHEN: AFTER
> +NOTICE:  TG_LEVEL: ROW
> +NOTICE:  TG_OP: DELETE
> +NOTICE:  TG_RELID::regclass: rem1
> +NOTICE:  TG_RELNAME: rem1
> +NOTICE:  TG_TABLE_NAME: rem1
> +NOTICE:  TG_TABLE_SCHEMA: public
> +NOTICE:  TG_NARGS: 2
> +NOTICE:  TG_ARGV: [23, skidoo]
> +NOTICE:  OLD: (2,bye)
> +NOTICE:  TG_NAME: trig_row_before
> +NOTICE:  TG_WHEN: BEFORE
> +NOTICE:  TG_LEVEL: ROW
> +NOTICE:  TG_OP: DELETE
> +NOTICE:  TG_RELID::regclass: rem1
> +NOTICE:  TG_RELNAME: rem1
> +NOTICE:  TG_TABLE_NAME: rem1
> +NOTICE:  TG_TABLE_SCHEMA: public
> +NOTICE:  TG_NARGS: 2
> +NOTICE:  TG_ARGV: [23, skidoo]
> +NOTICE:  OLD: (11,"bye remote")
> +NOTICE:  TG_NAME: trig_row_after
> +NOTICE:  TG_WHEN: AFTER
> +NOTICE:  TG_LEVEL: ROW
> +NOTICE:  TG_OP: DELETE
> +NOTICE:  TG_RELID::regclass: rem1
> +NOTICE:  TG_RELNAME: rem1
> +NOTICE:  TG_TABLE_NAME: rem1
> +NOTICE:  TG_TABLE_SCHEMA: public
> +NOTICE:  TG_NARGS: 2
> +NOTICE:  TG_ARGV: [23, skidoo]
> +NOTICE:  OLD: (11,"bye remote")
> +insert into rem1 values(1,'insert');

Would you trim the verbosity a bit?  Maybe merge several of the TG_ fields
onto one line, and remove the low-importance ones.  Perhaps issue one line
like this in place of all the current TG_ lines:

  NOTICE:  trig_row_after(23, skidoo) AFTER ROW DELETE ON rem1

> --- a/contrib/postgres_fdw/sql/postgres_fdw.sql
> +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
> @@ -384,3 +384,188 @@ insert into loc1(f2) values

Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread David Rowley
On Sat, Jan 18, 2014 at 10:17 AM, Florian Pflug  wrote:

> On Jan17, 2014, at 20:34 , David Rowley  wrote:
> > On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug  wrote:
> >
> >> I've now shuffled things around so that we can use inverse transition
> functions
> >> even if only some aggregates provide them, and to allow inverse
> transition
> >> functions to force a restart by returning NULL. The rules regarding
> NULL handling
> >> are now the following
> >
> > Maybe this is me thinking out loud, but I'm just thinking about the
> numeric case again.
> > Since the patch can now handle inverse transition functions returning
> NULL when they
> > fail to perform inverse transitions, I'm wondering if we could add an
> "expectedscale"
> > to NumericAggState, set it to -1 initially, when we get the first value
> set the
> > expectedscale to the dscale of that numeric, then if we get anything but
> that value
> > we'll set the expectedscale back to -1 again, if we are asked to perform
> an inverse
> > transition with a expectedscale as -1 we'll return null, otherwise we
> can perform
> > the inverse transition...
>
> You could do better than that - the numeric problem amounts to tracking
> the maximum
> scale AFAICS, so it'd be sufficient to restart if we remove a value whose
> scale equals
> the current maximum. But if we do SUM(numeric) at all, I think we should
> do so
> without requiring restarts - I still think the overhead of tracking the
> maximum scale
> within the aggregated values isn't that bad. If we zero the dscale
> counters lazily,
> the numbers of entries we have to zero is bound by the maximum dscale we
> encounter.
> Since actually summing N digits is probably more expensive than zeroing
> them, and since
> we pay the zeroing overhead only once per aggregation but save potentially
> many
> summations, I'm pretty sure we come out ahead by quite some margin.
>
>
We'll work that out, I don't think it will take very long to code up your
idea either.
I just thought that my idea was good enough and very cheap too, won't all
numerics that are actually stored in a column have the same scale anyway?
Has it not only been a problem because we've been testing with random
numeric literals the whole time?

The test turned out to become:
if (state->expectedScale == -1)
state->expectedScale = X.dscale;
else if (state->expectedScale != X.dscale)
state->expectedScale = -2;

In do_numeric_accum, then when we do the inverse I just check if
expectedScale < 0 then return NULL.

I'm not set on it, and I'm willing to try the lazy zeroing of the scale
tracker array, but I'm just not quite sure what extra real use cases it
covers that the one above does not. Perhaps my way won't perform inverse
transitions if the query did sum(numericCol / 10) or so.

I'll be committing this to the github repo very soon, so we can hack away
at the scale tracking code once it's back in.

David Rowley


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Andrew Dunstan


On 01/17/2014 03:15 PM, Tom Lane wrote:


The other possibility I was contemplating is that "export a const
variable" doesn't actually work for some reason.  We're not in the habit
of doing that elsewhere, so I don't find that theory outlandish.  Perhaps
it could be fixed by adding PGDLLIMPORT to the extern, but on the whole
I'd rather avoid the technique altogether.

The least-unlike-other-Postgres-code approach would be to go ahead and
export the struct so that the size computation could be provided as a
#define in the same header.  Robert stated a couple days ago that he
didn't foresee much churn in this struct, so that doesn't seem
unacceptable.

Another possibility is to refactor so that testing an allocation request
against shm_mq_minimum_size is the responsibility of storage/ipc/shm_mq.c,
not some random code in a contrib module.  It's not immediately apparent
to me why it's good code modularization to have a contrib module
responsible for checking sizes based on the sizeof a struct it's not
supposed to have any access to.





Or maybe we could expose the value via a function rather than a const 
variable.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread Florian Pflug
On Jan17, 2014, at 20:34 , David Rowley  wrote:
> On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug  wrote:
> 
>> I've now shuffled things around so that we can use inverse transition 
>> functions
>> even if only some aggregates provide them, and to allow inverse transition
>> functions to force a restart by returning NULL. The rules regarding NULL 
>> handling
>> are now the following
> 
> Maybe this is me thinking out loud, but I'm just thinking about the numeric 
> case again.
> Since the patch can now handle inverse transition functions returning NULL 
> when they
> fail to perform inverse transitions, I'm wondering if we could add an 
> "expectedscale"
> to NumericAggState, set it to -1 initially, when we get the first value set 
> the
> expectedscale to the dscale of that numeric, then if we get anything but that 
> value
> we'll set the expectedscale back to -1 again, if we are asked to perform an 
> inverse
> transition with a expectedscale as -1 we'll return null, otherwise we can 
> perform
> the inverse transition...

You could do better than that - the numeric problem amounts to tracking the 
maximum
scale AFAICS, so it'd be sufficient to restart if we remove a value whose scale 
equals
the current maximum. But if we do SUM(numeric) at all, I think we should do so
without requiring restarts - I still think the overhead of tracking the maximum 
scale
within the aggregated values isn't that bad. If we zero the dscale counters 
lazily,
the numbers of entries we have to zero is bound by the maximum dscale we 
encounter.
Since actually summing N digits is probably more expensive than zeroing them, 
and since
we pay the zeroing overhead only once per aggregation but save potentially many
summations, I'm pretty sure we come out ahead by quite some margin.

It'd be interesting to do float() similar to the way you describe, though. We 
might
not be able to guarantee that we yield exactly the same result as without 
inverse
aggregation, but we might be able to bound the error. That might make it 
acceptable
to do this - as Kevin pointed out, float is always an approximation anyway. I 
haven't
really thought that through, though...

Anyway, with time running out fast if we want to get this into 9.4, I think we 
should
focus on getting this into a committable state right now.

I've started to look over what you've pushed to github, and it looks mostly 
fine.
I have a few comments - mostly cosmetic stuff - that I'll post once I finished 
reading
through it. I also plan to do some basic performance testing to verify that my
reshuffling of eval_windowaggregates() doesn't hurt aggregates without an 
inverse
transition function.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pgcrypto: implement gen_random_uuid

2014-01-17 Thread Tom Lane
Emre Hasegeli  writes:
> 2014/1/9 Oskari Saarenmaa :
>> The only useful feature of the uuid-ossp module in my opinion is the
>> uuid_generate_v4 function and as uuid-ossp is more or less abandonware
>> people have had trouble building and installing it.  This patch implements
>> an alternative uuid v4 generation function in pgcrypto which could be moved
>> to core once there's a core PRNG with large enough internal state.

> It is a small but very useful patch. Installing uuid-ossp can be very hard
> on some systems. There is not much to review. The patch applies cleanly to
> HEAD. The function is generating valid UUID version 4. The code and
> the documentation style seem to fit in the pgcrypto extension. I am marking
> it as "Ready for Committer".

> The problem is users probably would not look to the pgcrypto extension for
> a UUID generator, especially when there is another extension with uuid in
> its name. Also, a UUID generator does not sound like a cryptographic function.
> It would be much better if this were in core with the UUID type. There
> is a reference on the UUID Type documentation page to the uuid-ossp
> extension. We can add a reference to the pgcrypto extension on that page and
> consider adding a deprecation note to the uuid-ossp extension, if it is not
> possible to add the function to the core, for now.

Well, we're not pulling pgcrypto into core in the foreseeable future;
there are legal (export control) issues that make that too risky.
Even aside from that, there was general consensus when type uuid went
in that the various generation algorithms were, how shall I say it, too
intellectually unsatisfying to be part of the core code.  So I think from
a code standpoint this solution is just fine.  I agree that we need some
extra work on the documentation to point people towards this approach
instead of uuid-ossp, though.  I'll take care of that and commit.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Gregory Smith

On 1/17/14 10:37 AM, Mel Gorman wrote:
There is not an easy way to tell. To be 100%, it would require an 
instrumentation patch or a systemtap script to detect when a 
particular page is being written back and track the context. There are 
approximations though. Monitor nr_dirty pages over time.


I have a benchmarking wrapper for the pgbench testing program called 
pgbench-tools:  https://github.com/gregs1104/pgbench-tools  As of 
October, on Linux it now plots the "Dirty" value from /proc/meminfo over 
time.  You get that on the same time axis as the transaction latency 
data.  The report at the end includes things like the maximum amount of 
dirty memory observed during the test sampling. That doesn't tell you 
exactly what's happening to the level someone reworking the kernel logic 
might want, but you can easily see things like the database's checkpoint 
cycle reflected by watching the dirty memory total.  This works really 
well for monitoring production servers too.  I have a lot of data from a 
plugin for the Munin monitoring system that plots the same way.  Once 
you have some history about what's normal, it's easy to see when systems 
fall behind in a way that's ruining writes, and the high water mark 
often correlates with bad responsiveness periods.


Another recent change is that pgbench for the upcoming PostgreSQL 9.4 
now allows you to specify a target transaction rate.  Seeing the write 
latency behavior with that in place is far more interesting than 
anything we were able to watch with pgbench before.  The pgbench write 
tests we've been doing for years mainly told you the throughput rate 
when all of the caches were always as full as the database could make 
them, and tuning for that is not very useful. Turns out it's far more 
interesting to run at 50% of what the storage is capable of, then watch 
what happens to latency when you adjust things like the dirty_* parameters.


I've been working on the problem of how we can make a benchmark test 
case that acts enough like real busy PostgreSQL servers that we can 
share it with kernel developers, and then everyone has an objective way 
to measure changes.  These rate limited tests are working much better 
for that than anything I came up with before.


I am skeptical that the database will take over very much of this work 
and perform better than the Linux kernel does.  My take is that our most 
useful role would be providing test cases kernel developers can add to a 
performance regression suite.  Ugly "we never thought that would happen" 
situations seem to be at the root of many of the kernel performance 
regressions people here get nailed by.


Effective I/O scheduling is very hard, and we are unlikely to ever out 
innovate the kernel hacking community by pulling more of that into the 
database.  It's already possible to experiment with moving in that 
direction with tuning changes.  Use a larger database shared_buffers 
value, tweak checkpoints to spread I/O out, and reduce things like 
dirty_ratio.  I do some of that, but I've learned it's dangerous to 
wander too far that way.


If instead you let Linux do even more work--give it a lot of memory to 
manage and room to re-order I/O--that can work out quite well. For 
example, I've seen a lot of people try to keep latency down by using the 
deadline scheduler and very low settings for the expire times.  Theory 
is great, but it never works out in the real world for me though.  
Here's the sort of deadline I deploy instead now:


echo 500  > ${DEV}/queue/iosched/read_expire
echo 30   > ${DEV}/queue/iosched/write_expire
echo 1048576  > ${DEV}/queue/iosched/writes_starved

These numbers look insane compared to the defaults, but I assure you 
they're from a server that's happily chugging through 5 to 10K 
transactions/second around the clock.  PostgreSQL forces writes out with 
fsync when they must go out, but this sort of tuning is basically giving 
up on it managing writes beyond that.  We really have no idea what order 
they should go out in.  I just let the kernel have a large pile of work 
queued up, and trust things like the kernel's block elevator and 
congestion code are smarter than the database can possibly be.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Alvaro Herrera
Andrew Dunstan escribió:
> 
> On 01/17/2014 02:54 PM, Alvaro Herrera wrote:

> >Hmm, this seems to contradict what's documented at the definition of
> >FLEXIBLE_ARRAY_MEMBER:

> Well, there's a bunch of these in the source:

AFAICT these are all defined with non-zero, non-empty array sizes, not
FLEXIBLE_ARRAY_MEMBER (which mq_ring is).  I don't know why that makes a
difference but apparently it does.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Tom Lane
Andrew Dunstan  writes:
> On 01/17/2014 02:54 PM, Alvaro Herrera wrote:
>> Hmm, this seems to contradict what's documented at the definition of
>> FLEXIBLE_ARRAY_MEMBER:

> Well, there's a bunch of these in the source:

Yeah, I did the same grep and noted the same thing --- but on second look,
I think those are all structs that are defined without use of
FLEXIBLE_ARRAY_MEMBER, which OTOH *is* used in struct shm_mq.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane escribió:
>> I seem to recall that we've previously found that you have to write
>> MAXALIGN(offsetof(shm_mq, mq_ring[0])) + MAXIMUM_ALIGNOF;
>> to keep MSVC happy with a reference to an array member in offsetof.

> Hmm, this seems to contradict what's documented at the definition of
> FLEXIBLE_ARRAY_MEMBER:

Ah, I thought we had that issue documented somewhere, but failed to find
this comment, or I'd have known that was backwards.

The other possibility I was contemplating is that "export a const
variable" doesn't actually work for some reason.  We're not in the habit
of doing that elsewhere, so I don't find that theory outlandish.  Perhaps
it could be fixed by adding PGDLLIMPORT to the extern, but on the whole
I'd rather avoid the technique altogether.

The least-unlike-other-Postgres-code approach would be to go ahead and
export the struct so that the size computation could be provided as a
#define in the same header.  Robert stated a couple days ago that he
didn't foresee much churn in this struct, so that doesn't seem
unacceptable.

Another possibility is to refactor so that testing an allocation request
against shm_mq_minimum_size is the responsibility of storage/ipc/shm_mq.c,
not some random code in a contrib module.  It's not immediately apparent
to me why it's good code modularization to have a contrib module
responsible for checking sizes based on the sizeof a struct it's not
supposed to have any access to.
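
For illustration, a sketch of that "#define in the same header" approach (the
SHM_MQ_MINIMUM_SIZE macro name and moving the struct into shm_mq.h are
assumptions for the sketch, not something that exists in the tree):

/* Hypothetical shm_mq.h excerpt -- a sketch of the idea, not a patch */
#include "storage/spin.h"

typedef struct shm_mq
{
	slock_t		mq_mutex;
	/* ... remaining fields elided ... */
	char		mq_ring[FLEXIBLE_ARRAY_MEMBER];
} shm_mq;

/*
 * A compile-time constant needs no exported symbol, so a contrib module can
 * validate its requested queue size without any PGDLLIMPORT at all.
 */
#define SHM_MQ_MINIMUM_SIZE \
	(MAXALIGN(offsetof(shm_mq, mq_ring)) + MAXIMUM_ALIGNOF)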

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Andrew Dunstan


On 01/17/2014 02:54 PM, Alvaro Herrera wrote:

Tom Lane escribió:

Andrew Dunstan  writes:

Not quite out of the woods yet. We're getting this regression failure on
Windows/MSVC (see
):
SELECT test_shm_mq(32768, (select string_agg(chr(32+(random()*96)::int), 
'') from generate_series(1,400)), 1, 1);
! ERROR:  queue size must be at least 472262143 bytes

It looks like this is computing a bogus value:

const Size shm_mq_minimum_size =
MAXALIGN(offsetof(shm_mq, mq_ring)) + MAXIMUM_ALIGNOF;

I seem to recall that we've previously found that you have to write

MAXALIGN(offsetof(shm_mq, mq_ring[0])) + MAXIMUM_ALIGNOF;

to keep MSVC happy with a reference to an array member in offsetof.

Hmm, this seems to contradict what's documented at the definition of
FLEXIBLE_ARRAY_MEMBER:

/* Define to nothing if C supports flexible array members, and to 1 if it does
not. That way, with a declaration like `struct s { int n; double
d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with pre-C99
compilers. When computing the size of such an object, don't use 'sizeof
(struct s)' as it overestimates the size. Use 'offsetof (struct s, d)'
instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with
MSVC and with C++ compilers. */
#define FLEXIBLE_ARRAY_MEMBER /**/





Well, there's a bunch of these in the source:

   [andrew@emma pg_head]$ grep -r offsetof.*\\[0 src/backend
   src/backend/access/nbtree/nbtutils.c:size = offsetof(BTVacInfo, vacuums[0]);
   src/backend/utils/adt/geo_ops.c:size = offsetof(PATH, p[0]) +sizeof(path->p[0]) * npts;
   src/backend/utils/adt/geo_ops.c:if (npts <= 0 || npts >= (int32) ((INT_MAX - offsetof(PATH, p[0])) / sizeof(Point)))
   src/backend/utils/adt/geo_ops.c:size = offsetof(PATH, p[0]) +sizeof(path->p[0]) * npts;
   src/backend/utils/adt/geo_ops.c:size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0]) * npts;
   src/backend/utils/adt/geo_ops.c:if (npts <= 0 || npts >= (int32) ((INT_MAX - offsetof(POLYGON, p[0])) / sizeof(Point)))
   src/backend/utils/adt/geo_ops.c:size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0]) * npts;
   src/backend/utils/adt/geo_ops.c:size = offsetof(PATH, p[0]) +base_size;
   src/backend/utils/adt/geo_ops.c:size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0]) * path->npts;
   src/backend/utils/adt/geo_ops.c:size = offsetof(POLYGON, p[0]) +sizeof(poly->p[0]) * 4;
   src/backend/utils/adt/geo_ops.c:size = offsetof(PATH, p[0]) +sizeof(path->p[0]) * poly->npts;
   src/backend/utils/adt/geo_ops.c:size = offsetof(POLYGON, p[0]) +base_size;
   src/backend/postmaster/pgstat.c:len = offsetof(PgStat_MsgTabstat, m_entry[0]) +
   src/backend/postmaster/pgstat.c:pgstat_send(&msg, offsetof(PgStat_MsgFuncstat, m_entry[0]) +
   src/backend/postmaster/pgstat.c:pgstat_send(&msg, offsetof(PgStat_MsgFuncstat, m_entry[0]) +
   src/backend/postmaster/pgstat.c:len = offsetof(PgStat_MsgTabpurge, m_tableid[0])
   src/backend/postmaster/pgstat.c:len = offsetof(PgStat_MsgTabpurge, m_tableid[0])
   src/backend/postmaster/pgstat.c:len = offsetof(PgStat_MsgFuncpurge, m_functionid[0])
   src/backend/postmaster/pgstat.c:len = offsetof(PgStat_MsgFuncpurge, m_functionid[0])
   src/backend/postmaster/pgstat.c:len = offsetof(PgStat_MsgTabpurge, m_tableid[0]) +sizeof(Oid);

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Alvaro Herrera
Tom Lane escribió:
> Andrew Dunstan  writes:
> > Not quite out of the woods yet. We're getting this regression failure on 
> > Windows/MSVC (see 
> > ):
> 
> >SELECT test_shm_mq(32768, (select string_agg(chr(32+(random()*96)::int), 
> > '') from generate_series(1,400)), 1, 1);
> > ! ERROR:  queue size must be at least 472262143 bytes
> 
> It looks like this is computing a bogus value:
> 
> const Size shm_mq_minimum_size =
>   MAXALIGN(offsetof(shm_mq, mq_ring)) + MAXIMUM_ALIGNOF;
> 
> I seem to recall that we've previously found that you have to write
> 
>   MAXALIGN(offsetof(shm_mq, mq_ring[0])) + MAXIMUM_ALIGNOF;
> 
> to keep MSVC happy with a reference to an array member in offsetof.

Hmm, this seems to contradict what's documented at the definition of
FLEXIBLE_ARRAY_MEMBER:

/* Define to nothing if C supports flexible array members, and to 1 if it does
   not. That way, with a declaration like `struct s { int n; double
   d[FLEXIBLE_ARRAY_MEMBER]; };', the struct hack can be used with pre-C99
   compilers. When computing the size of such an object, don't use 'sizeof
   (struct s)' as it overestimates the size. Use 'offsetof (struct s, d)'
   instead. Don't use 'offsetof (struct s, d[0])', as this doesn't work with
   MSVC and with C++ compilers. */
#define FLEXIBLE_ARRAY_MEMBER /**/


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Turning off HOT/Cleanup sometimes

2014-01-17 Thread Robert Haas
On Wed, Jan 15, 2014 at 5:14 PM, Simon Riggs  wrote:
> We already know that HOT is ineffective in areas of high contention
> (previous thread by me). Prior experience was that smaller tables
> didn't show much apparent benefit from using HOT either; its
> effectiveness was limited to medium and large tables being updated.
> The two already stated use cases that would apply are these ones

Do you have a link to that previous thread?  I don't happen to recall
that conversation.

I've found that HOT can be very important on smaller tables, so I'm
skeptical of that as a general conclusion.  What I think might be true
is that if VACUUM is going to hit the table often enough to make you
happy, then you don't really need HOT.  In other words, if the update
rate is non-zero but low, not too much cruft will accumulate before
the table gets vacuumed, and you may be OK.  If the update rate is
high, though, I think disabling HOT will be painful on a table of any
size.  There might be exceptions, but I can't think of what they are
off-hand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] surprising to_timestamp behavior

2014-01-17 Thread Tom Lane
Jeevan Chalke  writes:
> Attached patch which fixes this issue.

> I have just tweaked the code around ignoring spaces in DCH_from_char()
> function.

> Adding to CommitFest 2014-01 (Open).

I went to review this, and found that there's not actually a patch
attached ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread David Rowley
On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug  wrote:

>
> I've now shuffled things around so that we can use inverse transition
> functions
> even if only some aggregates provide them, and to allow inverse transition
> functions to force a restart by returning NULL. The rules regarding NULL
> handling
> are now the following
>
>
Maybe this is me thinking out loud, but I'm just thinking about the numeric
case again.
Since the patch can now handle inverse transition functions returning NULL
when they fail to perform inverse transitions, I'm wondering if we could
add an "expectedscale" to NumericAggState, set it to -1 initially, when we
get the first value set the expectedscale to the dscale of that numeric,
then if we get anything but that value we'll set the expectedscale back to
-1 again, if we are asked to perform an inverse transition with a
expectedscale as -1 we'll return null, otherwise we can perform the inverse
transition...
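
As a rough, self-contained sketch of that bookkeeping (all names here are
made up for illustration; the real NumericAggState has more fields and
different helpers):

#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for NumericAggState, just for the sketch */
typedef struct SketchAggState
{
	int64_t		N;				/* number of values accumulated */
	int			expectedscale;	/* dscale of inputs, or -1 if mixed/unset */
} SketchAggState;

/* forward transition: remember the dscale while all inputs agree on it */
static void
track_dscale(SketchAggState *state, int dscale)
{
	if (state->N == 0)
		state->expectedscale = dscale;	/* first input fixes the scale */
	else if (state->expectedscale != dscale)
		state->expectedscale = -1;		/* mixed scales: inverse unsafe */
	state->N++;
}

/* inverse transition helper: return false ("NULL") to force a restart */
static bool
inverse_transition_possible(const SketchAggState *state)
{
	return state->expectedscale != -1;
}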

Thoughts?

Regards

David Rowley


Re: [HACKERS] [v9.4] row level security

2014-01-17 Thread Gregory Smith

On 12/13/13 11:40 PM, Craig Ringer wrote:

You may want to check out the updated writable security-barrier views patch.

http://www.postgresql.org/message-id/52ab112b.6020...@2ndquadrant.com

It may offer a path forward for the CF submission for RLS, letting us
get rid of the var/attr fiddling that many here objected to.


With my advocacy hat on, I'd like to revisit this idea now that there's 
a viable updatable security barrier view submission.  I thought the most 
serious showstopper feedback from the last CF's RLS submission was that 
this needed to be sorted out first.  Reworking KaiGai's submission to 
merge against Dean's new one makes it viable again in my mind, and I'd 
like to continue toward re-reviewing it as part of this CF in that 
light.  Admittedly it's not ideal to try and do that at the same time 
the barrier view patch is being modified, but I see that as a normal CF 
merge of things based on other people's submissions.


I mentioned advocacy because the budding new PostgreSQL test instances 
I'm seeing now will lose a lot of momentum if we end up with no user 
visible RLS features in 9.4.  The pieces we have now can assemble into 
something that's useful, and I don't think that goal is unreasonably far 
away.


--
Greg Smith greg.sm...@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal, patch: allow multiple plpgsql plugins

2014-01-17 Thread Pavel Stehule
2014/1/16 Marko Tiikkaja 

> Hi Pavel,
>
> First of all, thanks for working on this!
>
>
> On 1/12/14, 8:58 PM, Pavel Stehule wrote:
>
>> I still not happy with plugin_info - it is only per plugin now and should
>> be per plugin and per function.
>>
>
> I'm not sure I understand the point of plugin_info in the first place, but
> what would having a separate info per (plugin, function) achieve that can't
> be done otherwise?
>

First use case: I would like to avoid repeated calls of plpgsql_check_function
in passive mode. Where do I store the information that the first run already
succeeded? It is related to the function instance, so the function OID can be
ambiguous (for functions with polymorphic parameters). When the function
instance is destroyed, this information should be destroyed with it. It is
impossible to do this check from the plugin alone. Second use case: attaching
session-lifetime plugin data to a particular function - for example for
coverage calculation. Without function-specific data, the plugin has to hold
a hash of all used functions and search it again and again. When plpgsql
holds this info in its internal function structures, you don't need to search
for anything.




Regards

Pavel



>
>
> As for the current patch, I'd like to see improvements on a few things:
>
>   1) It doesn't currently compile because of extra semicolons in the
>  PLpgSQL_plugin struct.
>
>   2) The previous comment above the same struct still talk about the
>  rendezvous variable which is now gone.  The comment should be
>  updated to reflect the new API.
>
>   3) The same comment talks about how important it is to unregister a
>  plugin if its _PG_fini() is ever called, but the current API does
>  not support unregistering.  That should probably be added?  I'm not
>  sure when _PG_fini() would be called.
>
>   4) The comment  /* reserved for use by optional plugin */  seems a bit
>  weird in its new context.
>
>
> Regards,
> Marko Tiikkaja
>


Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Robert Haas
On Fri, Jan 17, 2014 at 8:20 AM, Magnus Hagander  wrote:
>> Robert Haas reported that setting it to 32MB can yield a considerable
>> performance benefit:
>>
>> http://www.postgresql.org/message-id/CA+TgmobgMv_aaakLoasBt=5iYfi=kdcOUz0X9TdYe0c2SZ=2...@mail.gmail.com
>
> In that case, sholdn't the autotuning be changed to not limit it at 16MB? :)

I'm in favor of keeping the setting; I think that the auto-tuning has
largely eliminated the pain in this area for the majority of users,
but that doesn't mean we should deny someone who really wants to
squeeze the last drop of performance out of their system the
opportunity to poke at it manually.  I doubt it's the least useful
setting we have.  The test above shows 32MB beating 16MB, but I think
I did other tests where 16MB and 64MB came out the same.

Back when I was working heavily on performance, I did a number of
tests to try to understand what events cause latency spikes.  Many of
those events are related to write-ahead logging.  It turns out that
writing a page from PostgreSQL's WAL buffers to the OS cache is
usually pretty fast, unless the OS thinks we're dirtying data too
quickly and decides to slam on the brakes.  Calling fsync() to get the
data out to disk, though, is very slow.  However, both of those
operations are protected by the same lock (WALWriteLock), so it's
frequently the case that no more WAL buffer space can be freed up by
calling write() because the guy who has the lock is busy waiting for
an fsync().  That sucks, because there's no intrinsic reason why we
can't have one backend doing a write() while another backend is doing
an fsync().  On a related note, there's no real reason why the poor
bastard who writes the WAL record that fills a segment should be
forced to synchronously flush the segment instead of letting it be
done in the background, but right now he is.

I think if we fix these problems, the optimal value for wal_buffers is
likely to change; however, I'm not certain we'll be able to to
auto-tune it perfectly on day one.  Having a setting makes it easier
for people to experiment with different values, and I think that's
good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Tom Lane
I wrote:
> Meh.  This isn't needed if we do what I suggest above, but in any case
> I don't approve of removing the existing [U]INT64_FORMAT macros.
> That breaks code that doesn't need to get broken, probably including
> third-party modules.

After looking more closely I see you didn't actually *remove* those
macros, just define them in a different place/way.  So the above objection
is just noise, sorry.  (Though I think it'd be notationally cleaner to let
configure continue to define the macros; it doesn't need to do anything as
ugly as CppAsString2() to concatenate...)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Tom Lane
Andrew Dunstan  writes:
> Not quite out of the woods yet. We're getting this regression failure on 
> Windows/MSVC (see 
> ):

>SELECT test_shm_mq(32768, (select string_agg(chr(32+(random()*96)::int), 
> '') from generate_series(1,400)), 1, 1);
> ! ERROR:  queue size must be at least 472262143 bytes

It looks like this is computing a bogus value:

const Size shm_mq_minimum_size =
MAXALIGN(offsetof(shm_mq, mq_ring)) + MAXIMUM_ALIGNOF;

I seem to recall that we've previously found that you have to write

MAXALIGN(offsetof(shm_mq, mq_ring[0])) + MAXIMUM_ALIGNOF;

to keep MSVC happy with a reference to an array member in offsetof.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Re: Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Josh Berkus
Mel,

So we have a few interested parties.  What do we need to do to set up
the Collab session?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Andrew Dunstan


On 01/17/2014 12:55 PM, David Rowley wrote:
On Sat, Jan 18, 2014 at 6:50 AM, Andrew Dunstan  wrote:



On 01/17/2014 12:43 PM, Tom Lane wrote:

Andrew Dunstan  writes:

It looks like what we need to fix the immediate problem is
to mark
set_latch_on_sigusr1 with PGDLLIMPORT. Is that the only
one of these
symbols we might need to make visible?

We've generally relied on the buildfarm to cue us to add
PGDLLIMPORT.
Add that one and see what happens ...




OK, done.


Great, that fixes it on my end.

Thanks for fixing those.



Not quite out of the woods yet. We're getting this regression failure on 
Windows/MSVC (see 
):



*** 
c:/prog/bf/root/HEAD/pgsql.2612/contrib/test_shm_mq/expected/test_shm_mq.out
Fri Jan 17 13:20:53 2014
--- c:/prog/bf/root/HEAD/pgsql.2612/contrib/test_shm_mq/results/test_shm_mq.out 
Fri Jan 17 13:44:33 2014
***
*** 5,18 
  -- internal sanity tests fail.
  --
  SELECT test_shm_mq(32768, (select string_agg(chr(32+(random()*96)::int), '') 
from generate_series(1,400)), 1, 1);
!  test_shm_mq
! -
!
! (1 row)
!
  SELECT test_shm_mq_pipelined(16384, (select 
string_agg(chr(32+(random()*96)::int), '') from generate_series(1,27)), 
200, 3);
!  test_shm_mq_pipelined
! ---
!
! (1 row)
!
--- 5,10 
  -- internal sanity tests fail.
  --
  SELECT test_shm_mq(32768, (select string_agg(chr(32+(random()*96)::int), '') 
from generate_series(1,400)), 1, 1);
! ERROR:  queue size must be at least 472262143 bytes
  SELECT test_shm_mq_pipelined(16384, (select 
string_agg(chr(32+(random()*96)::int), '') from generate_series(1,27)), 
200, 3);
! ERROR:  queue size must be at least 472262143 bytes




cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add %z support to elog/ereport?

2014-01-17 Thread Tom Lane
Andres Freund  writes:
> [ 0001-Add-support-for-printing-Size-arguments-to-elog-erep.patch ]

I think this approach is fundamentally broken, because it can't reasonably
cope with any case more complicated than "%zu" or "%zd".  While it's
arguable that we can get along without the ability to specify field width,
since the [U]INT64_FORMAT macros never allowed that, it is surely going
to be the case that translators will need to insert "n$" flags in the
translated versions of messages.

Another objection is that this only fixes the problem in elog/ereport
format strings, not anyplace else that it might be useful to print a
size_t value.  You suggest below that we could invent some additional
macros to support that; but since the "z" flag is in C99, there really
ought to be only a small minority of platforms where it doesn't work.
So I don't think we should be contorting our code with some nonstandard
notation for the case, if there's any way to avoid that.

I think a better solution approach is to teach our src/port/snprintf.c
about the z flag, then extend configure's checking to force use of our
snprintf if the platform's version doesn't handle z.  While it might be
objected that this creates a need for a run-time check in configure,
we already have one to check if snprintf handles "n$", so this approach
doesn't really make life any worse for cross-compiles.
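
For illustration, the run-time probe could be a tiny test program along
these lines (a sketch of the idea, not the actual configure macro):

/* Exit 0 if the platform snprintf understands the C99 'z' modifier. */
#include <stdio.h>
#include <string.h>

int
main(void)
{
	char		buf[64];

	snprintf(buf, sizeof(buf), "%zu", (size_t) 123456);
	return (strcmp(buf, "123456") == 0) ? 0 : 1;
}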

> In patch 01, I've modified configure to not define [U]INT64_FORMAT
> directly, but rather just define INT64_LENGTH_MODIFIER as
> appropriate. The formats are now defined in c.h.
> INT64_LENGTH_MODIFIER is defined without quotes - I am not sure whether
> that was the right choice, it requires using CppAsString2() in some
> places. Maybe I should just have defined it with quotes instead. Happy
> to rejigger it if somebody thinks the other way would be better.

Meh.  This isn't needed if we do what I suggest above, but in any case
I don't approve of removing the existing [U]INT64_FORMAT macros.
That breaks code that doesn't need to get broken, probably including
third-party modules.  It'd be more sensible to just add an additional
macro for the flag character(s).  (And yeah, I'd put quotes in it.)
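
For example, something along these lines (sketched names only):

/* Sketch: keep the existing format macros, add one for the bare modifier */
#ifdef HAVE_LONG_INT_64
#define INT64_MODIFIER "l"
#else
#define INT64_MODIFIER "ll"
#endif

#define INT64_FORMAT   "%" INT64_MODIFIER "d"
#define UINT64_FORMAT  "%" INT64_MODIFIER "u"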

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-17 Thread Alexander Korotkov
On Fri, Jan 17, 2014 at 10:38 PM, Heikki Linnakangas <
hlinnakan...@vmware.com> wrote:

> On 01/17/2014 01:05 PM, Alexander Korotkov wrote:
>
>> Seems to be fixed in the attached version of patch.
>> Another improvement in this version: only left-most segments and further
>> are re-encoded. Left part of page are left untouched.
>>
>
> I'm looking into this now. A few quick notes:
>
> * Even when you don't re-encode the whole page, you still WAL-log the
> whole page. While correct, it'd be a pretty obvious optimization to only
> WAL-log the modified part.
>

Oh, I overlooked it. I wrote code in ginRedoInsertData which finds the
appropriate place for the data, but didn't notice that I still wrote the
whole page to the xlog record.


> * When new items are appended to the end of the page so that only the last
> existing compressed segment is re-encoded, and the page has to be split,
> the items aren't divided 50/50 on the pages. The loop that moves segments
> destined for the left page to the right won't move any existing, untouched,
> segments.
>

I think this loop will move one segment. However, it's too small.


>  !   /*
>> !* Didn't fit uncompressed. We'll have to encode them. Check if
>> both
>> !* new items and uncompressed items can be placed starting from
>> last
>> !* segment of page. Then re-encode only last segment of page.
>> !*/
>> !   minNewItem = newItems[0];
>> !   if (nolduncompressed == 0 &&
>> !   ginCompareItemPointers(&olduncompressed[0],
>> &minNewItem) < 0)
>> !   minNewItem = olduncompressed[0];
>>
>
> That looks wrong. If I'm understanding it right, it's trying to do
> minNewItem = Min(newItems[0], olduncompressed[0]). The test should be
> "nolduncompressed > 0 && ..."
>

Yes, definitely a bug.


> No need to send a new patch, I'll just fix those while reviewing...


Ok, thanks!

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-17 Thread Heikki Linnakangas

On 01/17/2014 01:05 PM, Alexander Korotkov wrote:

Seems to be fixed in the attached version of patch.
Another improvement in this version: only left-most segments and further
are re-encoded. Left part of page are left untouched.


I'm looking into this now. A few quick notes:

* Even when you don't re-encode the whole page, you still WAL-log the 
whole page. While correct, it'd be a pretty obvious optimization to only 
WAL-log the modified part.


* When new items are appended to the end of the page so that only the 
last existing compressed segment is re-encoded, and the page has to be 
split, the items aren't divided 50/50 on the pages. The loop that moves 
segments destined for the left page to the right won't move any 
existing, untouched, segments.



!   /*
!* Didn't fit uncompressed. We'll have to encode them. Check if both
!* new items and uncompressed items can be placed starting from last
!* segment of page. Then re-encode only last segment of page.
!*/
!   minNewItem = newItems[0];
!   if (nolduncompressed == 0 &&
!   ginCompareItemPointers(&olduncompressed[0], &minNewItem) 
< 0)
!   minNewItem = olduncompressed[0];


That looks wrong. If I'm understanding it right, it's trying to do 
minNewItem = Min(newItems[0], olduncompressed[0]). The test should be 
"nolduncompressed > 0 && ..."


No need to send a new patch, I'll just fix those while reviewing...

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Re: Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Mel Gorman
On Fri, Jan 17, 2014 at 06:14:37PM +0100, Andres Freund wrote:
> Hi Mel,
> 
> On 2014-01-17 16:31:48 +, Mel Gorman wrote:
> > Direct IO, buffered IO, double buffering and wishlists
> > --
> >3. Hint that a page should be dropped immediately when IO completes.
> >   There is already something like this buried in the kernel internals
> >   and sometimes called "immediate reclaim" which comes into play when
> >   pages are bgin invalidated. It should just be a case of investigating
> >   if that is visible to userspace, if not why not and do it in a
> >   semi-sensible fashion.
> 
> "bgin invalidated"?
> 

s/bgin/being/

I admit that "invalidated" in this context is very vague and I did
not explain myself. This paragraph should remind anyone familiar with
VM internals about what happens when invalidate_mapping_pages calls
deactivate_page and how PageReclaim pages are treated by both page reclaim
and end_page_writeback handler. It's similar but not identical to what
Postgres wants and is a reasonable starting position for an implementation.

> Generally, +1 on the capability to achieve such a behaviour from
> userspace.
> 
> >7. Allow userspace process to insert data into the kernel page cache
> >   without marking the page dirty. This would allow the application
> >   to request that the OS use the application copy of data as page
> >   cache if it does not have a copy already. The difficulty here
> >   is that the application has no way of knowing if something else
> >   has altered the underlying file in the meantime via something like
> >   direct IO. Granted, such activity has probably corrupted the database
> >   already but initial reactions are that this is not a safe interface
> >   and there are coherency concerns.
> 
> I was one of the people suggesting that capability in this thread (after
> pondering about it in the back of my mind for quite some time), and I
> first thought it would never be acceptable for pretty much those
> reasons.
> But on second thought I don't think that line of argument makes too much
> sense. If such an API would require write permissions on the file -
> which it surely would - it wouldn't allow an application to do anything
> it previously wasn't able to.
> And I don't see the dangers of concurrent direct IO as anything
> new. Right now the page's contents reside in userspace memory and aren't
> synced in any way with either the page cache or the actual on disk
> state. And afaik there are already several data races if a file is
> modified and read both via the page cache and direct io.
> 

All of this is true.  The objections may not hold up over time and it may
be seem much more reasonable when/if the easier stuff is addressed.

> The scheme that'd allow us is the following:
> When postgres reads a data page, it will continue to first look up the
> page in its shared buffers, if it's not there, it will perform a page
> cache backed read, but instruct that read to immediately remove from the
> page cache afterwards (new API or, posix_fadvise() or whatever).
> As long
> as it's in shared_buffers, postgres will not need to issue new reads, so
> there's no benefit keeping it in the page cache.
> If the page is dirtied, it will be written out normally telling the
> kernel to forget about the caching the page (using 3) or possibly direct
> io).
> When a page in postgres's buffers (which wouldn't be set to very large
> values) isn't needed anymore and *not* dirty, it will seed the kernel
> page cache with the current data.
> 

Ordinarily the initial read page could be discarded with fadvise but
the later write would cause the data to be read back in again which is a
waste. The details of avoiding that re-read are tricky from a core kernel
perspective because ordinarily the kernel at that point does not know if
the write is a full complete aligned write of an underlying filesystem
structure or not.  It may need a different write path which potentially
leads into needing changes to the address_space operations on a filesystem
basis -- that would get messy and be a Linux-specific extension. I have
not researched this properly at all, I could be way off but I have a
feeling the details get messy.

> Now, such a scheme wouldn't likely be zero-copy, but it would avoid
> double buffering.

It wouldn't be zero copy because minimally the data needs to be handed
over the filesystem for writing to the disk and the interface for that is
offset,length based, not page based. Maybe sometimes it will be zero copy
but it would be a filesystem-specific thing.

> I think the cost of buffer copying has been overstated
> in this thread... The major advantage is that all that could easily be
> implemented in a very localized manner, without hurting other OSs and it
> could easily degrade on kernels not providing that capability, which
> would surely be the majority of installations for

Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread David Rowley
On Sat, Jan 18, 2014 at 6:50 AM, Andrew Dunstan  wrote:

>
> On 01/17/2014 12:43 PM, Tom Lane wrote:
>
>> Andrew Dunstan  writes:
>>
>>> It looks like what we need to fix the immediate problem is to mark
>>> set_latch_on_sigusr1 with PGDLLIMPORT. Is that the only one of these
>>> symbols we might need to make visible?
>>>
>> We've generally relied on the buildfarm to cue us to add PGDLLIMPORT.
>> Add that one and see what happens ...
>>
>>
>>
>
>
> OK, done.
>
>
Great, that fixes it on my end.

Thanks for fixing those.

Regards
David Rowley



> cheers
>
> andrew
>
>


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Andrew Dunstan


On 01/17/2014 12:43 PM, Tom Lane wrote:

Andrew Dunstan  writes:

It looks like what we need to fix the immediate problem is to mark
set_latch_on_sigusr1 with PGDLLIMPORT. Is that the only one of these
symbols we might need to make visible?

We've generally relied on the buildfarm to cue us to add PGDLLIMPORT.
Add that one and see what happens ...





OK, done.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Minor improvements to sslinfo contrib module

2014-01-17 Thread Gurjeet Singh
Please find attached the patch that fixes a couple of comments, and adds
'static' qualifier to functions that are not used anywhere else in the code
base.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com 
diff --git a/contrib/sslinfo/sslinfo.c b/contrib/sslinfo/sslinfo.c
index 7a58470..18405f6 100644
--- a/contrib/sslinfo/sslinfo.c
+++ b/contrib/sslinfo/sslinfo.c
@@ -31,9 +31,10 @@ Datum		ssl_client_dn_field(PG_FUNCTION_ARGS);
 Datum		ssl_issuer_field(PG_FUNCTION_ARGS);
 Datum		ssl_client_dn(PG_FUNCTION_ARGS);
 Datum		ssl_issuer_dn(PG_FUNCTION_ARGS);
-Datum		X509_NAME_field_to_text(X509_NAME *name, text *fieldName);
-Datum		X509_NAME_to_text(X509_NAME *name);
-Datum		ASN1_STRING_to_text(ASN1_STRING *str);
+
+static Datum	X509_NAME_field_to_text(X509_NAME *name, text *fieldName);
+static Datum	X509_NAME_to_text(X509_NAME *name);
+static Datum	ASN1_STRING_to_text(ASN1_STRING *str);
 
 
 /*
@@ -51,7 +52,7 @@ ssl_is_used(PG_FUNCTION_ARGS)
 
 
 /*
- * Returns SSL cipher currently in use.
+ * Returns SSL version currently in use.
  */
 PG_FUNCTION_INFO_V1(ssl_version);
 Datum
@@ -77,7 +78,7 @@ ssl_cipher(PG_FUNCTION_ARGS)
 
 
 /*
- * Indicates whether current client have provided a certificate
+ * Indicates whether current client provided a certificate
  *
  * Function has no arguments.  Returns bool.  True if current session
  * is SSL session and client certificate is verified, otherwise false.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Tom Lane
Andrew Dunstan  writes:
> It looks like what we need to fix the immediate problem is to mark 
> set_latch_on_sigusr1 with PGDLLIMPORT. Is that the only one of these 
> symbols we might need to make visible?

We've generally relied on the buildfarm to cue us to add PGDLLIMPORT.
Add that one and see what happens ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Andrew Dunstan


On 01/17/2014 11:51 AM, Andrew Dunstan wrote:


On 01/17/2014 11:26 AM, David Rowley wrote:
I've not been able to build master for a few days now on Visual Studio 
due to some problems in the new test_shm_mq contrib module.
I'm not too familiar with how the Visual Studio project files are 
generated, so I've not yet managed to come up with a fix.


However I have attached a fix for the many compiler warnings that 
currawong has been seeing since 9c14dd2



That seems perfectly sane, so I have committed it. I'll look into the 
other issue.





It looks like what we need to fix the immediate problem is to mark 
set_latch_on_sigusr1 with PGDLLIMPORT. Is that the only one of these 
symbols we might need to make visible?


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Magnus Hagander
On Fri, Jan 17, 2014 at 5:51 PM, Andrew Dunstan  wrote:

>
> On 01/17/2014 11:26 AM, David Rowley wrote:
>
>> I've not been able to build master for a few days now on Visual Studio
>> due to some problems in the new test_shm_mq contrib module.
>> I'm not too familiar with how the Visual Studio project files are
>> generated, so I've not yet managed to come up with a fix.
>>
>> However I have attached a fix for the many compiler warnings that
>> currawong has been seeing since 9c14dd2
>>
>
>
> That seems perfectly sane, so I have committed it. I'll look into the
> other issue.
>

Thanks.

I could've sworn I committed it that way, but I see now that it's sitting
unstaged in my working directory...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Andres Freund
Hi Mel,

On 2014-01-17 16:31:48 +, Mel Gorman wrote:
> Direct IO, buffered IO, double buffering and wishlists
> --
>3. Hint that a page should be dropped immediately when IO completes.
>   There is already something like this buried in the kernel internals
>   and sometimes called "immediate reclaim" which comes into play when
>   pages are bgin invalidated. It should just be a case of investigating
>   if that is visible to userspace, if not why not and do it in a
>   semi-sensible fashion.

"bgin invalidated"?

Generally, +1 on the capability to achieve such a behaviour from
userspace.

>7. Allow userspace process to insert data into the kernel page cache
>   without marking the page dirty. This would allow the application
>   to request that the OS use the application copy of data as page
>   cache if it does not have a copy already. The difficulty here
>   is that the application has no way of knowing if something else
>   has altered the underlying file in the meantime via something like
>   direct IO. Granted, such activity has probably corrupted the database
>   already but initial reactions are that this is not a safe interface
>   and there are coherency concerns.

I was one of the people suggesting that capability in this thread (after
pondering about it in the back of my mind for quite some time), and I
first thought it would never be acceptable for pretty much those
reasons.
But on second thought I don't think that line of argument makes too much
sense. If such an API would require write permissions on the file -
which it surely would - it wouldn't allow an application to do anything
it previously wasn't able to.
And I don't see the dangers of concurrent direct IO as anything
new. Right now the page's contents reside in userspace memory and aren't
synced in any way with either the page cache or the actual on disk
state. And afaik there are already several data races if a file is
modified and read both via the page cache and direct io.

The scheme that'd allow us is the following:
When postgres reads a data page, it will continue to first look up the
page in its shared buffers, if it's not there, it will perform a page
cache backed read, but instruct that read to immediately remove from the
page cache afterwards (new API or, posix_fadvise() or whatever). As long
as it's in shared_buffers, postgres will not need to issue new reads, so
there's no benefit keeping it in the page cache.
If the page is dirtied, it will be written out normally telling the
kernel to forget about the caching the page (using 3) or possibly direct
io).
When a page in postgres's buffers (which wouldn't be set to very large
values) isn't needed anymore and *not* dirty, it will seed the kernel
page cache with the current data.
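
A minimal sketch of that read-then-drop step with interfaces that exist
today (illustrative only; BLCKSZ is hard-coded and error handling elided):

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

#define BLCKSZ 8192				/* assumed block size for the sketch */

/*
 * Read one block through the page cache, then hint the kernel that the
 * cached copy can go, since the block now lives in shared_buffers.
 */
static ssize_t
read_block_and_drop(int fd, off_t offset, char *buffer)
{
	ssize_t		nread = pread(fd, buffer, BLCKSZ, offset);

	if (nread == BLCKSZ)
		(void) posix_fadvise(fd, offset, BLCKSZ, POSIX_FADV_DONTNEED);

	return nread;
}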

Now, such a scheme wouldn't likely be zero-copy, but it would avoid
double buffering. I think the cost of buffer copying has been overstated
in this thread... The major advantage is that all that could easily be
implemented in a very localized manner, without hurting other OSs and it
could easily degrade on kernels not providing that capability, which
would surely be the majority of installations for the next couple of
cases.

So, I think such an interface would be hugely beneficial - and I'd be
surprised if other applications couldn't reuse it. And I don't think
it'd be all that hard to implement on the kernel side?

>   Dave Chinner asked "why, exactly, do you even need the kernel page
>   cache here?"  when Postgres already knows how and when data should
>   be written back to disk. The answer boiled down to "To let kernel do
>   the job that it is good at, namely managing the write-back of dirty
>   buffers to disk and to manage (possible) read-ahead pages". Postgres
>   has some ordering requirements but it does not want to be responsible
>   for all cache replacement and IO scheduling. Hannu Krosing summarised
>   it best as

The other part is that using the page cache for the majority of warm,
but not burning hot pages, allows the kernel to much more sensibly adapt
to concurrent workloads requiring memory in some form or other (possibly
giving it to other VMs when mostly idle and such).

>8. Allow copy-on-write of page-cache pages to anonymous. This would limit
>   the double ram usage to some extent. It's not as simple as having a
>   MAP_PRIVATE mapping of a file-backed page because presumably they want
>   this data in a shared buffer shared between Postgres processes. The
>   implementation details of something like this are hairy because it's
>   mmap()-like but not mmap() as it does not have the same writeback
>   semantics due to the write ordering requirements Postgres has for
>   database integrity.

>9. Hint that a page in an anonymous buffer is a copy of a page cache
>page and invalidate the page c

Re: [HACKERS] [PATCH] Filter error log statements by sqlstate

2014-01-17 Thread Tom Lane
Oskari Saarenmaa  writes:
> 17.01.2014 00:13, Tom Lane kirjoitti:
>> I find it hard to follow exactly what the use-case for this is; could
>> you make a defense of why we should carry a feature like this?

> I usually run PG with log_min_messages = warning and 
> log_min_error_statement = error to log any unexpected errors.  But as I 
> have a lot of check constraints in my database as well as a lot of 
> plpgsql and plpython code which raises exceptions on invalid user input 
> I also get tons of logs for statements causing "expected" errors which I 
> have to try to filter out with other tools.

But if the goal is to reduce clutter in your log, wouldn't you wish
to suppress the *entire* log entry for "expected" errors, not just the
SQL-statement field?  Certainly all the previous discussion about this
type of feature (and there has been plenty) has presumed that you'd want
to suppress whole entries.

>> In any case, I'm finding it hard to believe that filtering by individual
>> SQLSTATEs is a practical API.

> I don't know about others, but filtering by individual SQLSTATE is 
> exactly what I need - I don't want to suppress all plpgsql errors or 
> constraint violations, but I may want to suppress plpgsql RAISE 
> EXCEPTION and CHECK violations.

Meh.  Again, there's been lots of prior discussion, and I think there's
consensus that a filtering API based solely on a list of SQLSTATEs
wouldn't be widely adequate.  The most recent discussion I can find
about this is in this thread:

http://www.postgresql.org/message-id/flat/20131205204512.gd6...@eldon.alvh.no-ip.org#20131205204512.gd6...@eldon.alvh.no-ip.org

That thread references an older one

http://www.postgresql.org/message-id/flat/19791.1335902...@sss.pgh.pa.us#19791.1335902...@sss.pgh.pa.us

and I'm pretty sure that there are several others you could find with
a bit of digging.  The more ambitious proposals required decorating
ereport calls with a new kind of severity labeling, reflecting how
likely it'd be that DBAs would want to read about the particular
error in their logs.  That's be a lot of work though, and would require
us to make a lot of value judgements that might be wrong.  The main
alternative that was discussed was to filter on the basis of SQLSTATE
classes, and maybe relocate a few specific ERRCODEs to different classes
if it seemed that they were a lot unlike other things in their class.

It hasn't really been proven that SQLSTATE-class filtering would work
conveniently, but AFAICS the only way to prove or disprove that is for
somebody to code it up and use it in production.

What I'd suggest is that you revise this patch so that it can handle
filtering on the basis of either individual SQLSTATEs or whole classes,
the former being able to override the latter.  With a bit of wholesale
notation invention, an example could be

log_sqlstates = 'P0,!P0001,!42,42P05'

which would mean "log all messages in class P0, except don't log P0001;
don't log anything in class 42, except always log 42P05; for everything
else, log according to log_min_messages".

If you don't like that notation, feel free to propose another.  I did
not like the one you had to start with, though, because it looked like
it was actually changing the error severity level, not just the log or
don't log decision.

BTW, I'd suggest that this filtering only happen for messages < PANIC
level; it's pretty hard to believe that anybody would ever want to
suppress a PANIC report.

Another thought here is that if you're willing to have the filter
only able to *prevent* logging, and not to let it *cause* logging
of messages that would otherwise be suppressed by log_min_messages,
it could be implemented as a loadable module using the emit_log_hook.
An experimental feature, which is what this really is, is always a lot
easier sell in that format since anybody who finds it useless needn't
pay the overhead (which I'm still concerned about ...).  But I'm not
sure how important it might be to be able to upgrade a message's log
priority, so maybe that approach would be significantly less usable.
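
For what it's worth, a minimal sketch of such an emit_log_hook module might
look like this (hypothetical module; it hard-codes one suppressed SQLSTATE
and omits all the GUC/list-parsing machinery):

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

static emit_log_hook_type prev_emit_log_hook = NULL;

/* Suppress server-log output for one hard-coded SQLSTATE (P0001). */
static void
sqlstate_filter_hook(ErrorData *edata)
{
	if (edata->elevel < PANIC &&
		edata->sqlerrcode == ERRCODE_RAISE_EXCEPTION)
		edata->output_to_server = false;

	if (prev_emit_log_hook)
		prev_emit_log_hook(edata);
}

void
_PG_init(void)
{
	prev_emit_log_hook = emit_log_hook;
	emit_log_hook = sqlstate_filter_hook;
}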

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] currawong is not a happy animal

2014-01-17 Thread Andrew Dunstan


On 01/17/2014 11:26 AM, David Rowley wrote:
I've not been able to build master for a few days now on Visual Studio 
due to some problems in the new test_shm_mq contrib module.
I'm not too familiar with how the Visual Studio project files are 
generated, so I've not yet managed to come up with a fix.


However I have attached a fix for the many compiler warnings 
that currawong has been seeing since 9c14dd2



That seems perfectly sane, so I have committed it. I'll look into the 
other issue.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (summary v2 2014-1-17)

2014-01-17 Thread Mel Gorman
On Wed, Jan 15, 2014 at 02:14:08PM +, Mel Gorman wrote:
> > One assumption would be that Postgres is perfectly happy with the current
> > kernel behaviour in which case our discussion here is done.
> 
> It has been demonstrated that this statement was farcical.  The thread is
> massive just from interaction with the LSF/MM program committee. I'm hoping
> that there will be Postgres representation at LSF/MM this year to bring
> the issues to a wider audience. I expect that LSF/MM can only commit to
> one person attending the whole summit due to limited seats but we could
> be more flexible for the Postgres track itself so informal meetings
> can be arranged for the evenings and at collab summit.
> 

We still have not decided on a person that can definitely attend but we'll
get back to that shortly. I wanted to revise the summary mail so that
there is a record that can be easily digested without trawling through
archives. As before if I missed something important, prioritised poorly
or emphasised incorrectly then shout at me.

On testing of modern kernels


Josh Berkus claims that most people are using Postgres with 2.6.19 and
consequently there may be poor awareness of recent kernel developments.
This is a disturbingly large window of opportunity for problems to have
been introduced.

Minimally, Postgres has concerns about IO-related stalls which may or may
not exist in current kernels. There were indications that large writes
starve reads. There have been variants of this style of bug in the past but
it's unclear what the exact shape of this problem is and if IO-less dirty
throttling affected it. It is possible that Postgres was burned in the past
by data being written back from reclaim context in low memory situations.
That would have looked like massive stalls with drops in IO throughput
but it was fixed in relatively recent kernels. Any data on historical
tests would be helpful. Alternatively, a pgbench-based reproduction test
could potentially be used by people in the kernel community that track
performance over time and have access to a suitable testing rig.

Postgres bug reports and LKML
-

It is claimed that LKML does not welcome bug reports but it's less clear
what the basis of this claim is.  Is it because the reports are ignored? A
possible explanation is that they are simply getting lost in the LKML noise
and there would be better luck if the bug report was cc'd to a specific
subsystem list. A second possibility is the bug report is against an old
kernel and unless it is reproduced on a recent kernel the bug report will
be ignored. Finally it is possible that there is not enough data available
to debug the problem. The worst explanation is that to date the problem
has not been fixable but the details of this have been lost and are now
unknown. Is it possible that some of these bug reports can be refreshed
so at least there is a chance they get addressed?

Apparently there were changes to the reclaim algorithms that crippled
performance without any sysctls. The problem may be compounded by the
introduction of adaptive replacement cache in the shape of the thrash
detection patches currently being reviewed.  Postgres investigated the
use of ARC in the past and ultimately abandoned it. Details are in the
archives (http://www.Postgres.org/search/?m=1&q=arc&l=1&d=-1&s=r). I
have not read them, just noting they exist for future reference.

Sysctls to control VM behaviour are not popular as such tuning parameters
are often used as an excuse to not properly fix the problem. Would it be
possible to describe a test case that shows 2.6.19 performing well and a
modern kernel failing? That would give the VM people a concrete basis to
work from to either fix the problem or identify exactly what sysctls are
required to make this work.

I am confident that any bug related to VM reclaim in this area has been lost.
At least, I recall no instances of it being discussed on linux-mm and it
has not featured on LSF/MM during the last years.

IO Scheduling
-

Kevin Grittner has stated that it is known that the DEADLINE and NOOP
schedulers perform better than any alternatives for most database loads.
It would be desirable to quantify this for some test case and see whether the
default scheduler can cope in some way.

The deadline scheduler makes sense to a large extent though. Postgres
is sensitive to large latencies due to IO write spikes. It is at least
plausible that deadline would give more deterministic behaviour for
parallel reads in the presence of large writes assuming there were not
ordering problems between the reads/writes and the underlying filesystem.

For reference, these IO spikes can be massive. If the shared buffer is
completely dirtied in a short space of time then it could be 20-25% of
RAM being dirtied and writeback required in typical configurations. There
have been cases where it was worked around by limiting the size of the
shared

[HACKERS] currawong is not a happy animal

2014-01-17 Thread David Rowley
I've not been able to build master for a few days now on Visual Studio due
to some problems in the new test_shm_mq contrib module.
I'm not too familiar with how the Visual Studio project files are generated,
so I've not yet managed to come up with a fix.

However I have attached a fix for the many compiler warnings that currawong
has been seeing since 9c14dd2


define_WIN32.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Hannu Krosing
On 01/17/2014 06:40 AM, Dave Chinner wrote:
> On Thu, Jan 16, 2014 at 08:48:24PM -0500, Robert Haas wrote:
>> On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner  wrote:
>>> But there's something here that I'm not getting - you're talking
>>> about a data set that you want ot keep cache resident that is at
>>> least an order of magnitude larger than the cyclic 5-15 minute WAL
>>> dataset that ongoing operations need to manage to avoid IO storms.
>>> Where do these temporary files fit into this picture, how fast do
>>> they grow and why are do they need to be so large in comparison to
>>> the ongoing modifications being made to the database?
> [ snip ]
>
>> Temp files are something else again.  If PostgreSQL needs to sort a
>> small amount of data, like a kilobyte, it'll use quicksort.  But if it
>> needs to sort a large amount of data, like a terabyte, it'll use a
>> merge sort.[1] 
> IOWs the temp files contain data that requires transformation as
> part of a query operation. So, temp file size is bound by the
> dataset, 
Basically yes, though the size of the "dataset" can be orders of
magnitude bigger than the database in case of some queries.
> growth determined by data retreival and transformation
> rate.
>
> IOWs, there are two very different IO and caching requirements in
> play here and tuning the kernel for one actively degrades the
> performance of the other. Right, got it now.
Yes. A step toward the right solution would be some way to tune this
on a per-device basis, but as a large part of this in Linux seems
to be driven from the keeping-the-VM-clean side, I guess it will
be far from simple.
>
> Cheers,
>
> Dave.


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Feature request: Logging SSL connections

2014-01-17 Thread Tom Lane
Magnus Hagander  writes:
> Applied, thanks!

Minor bikeshedding: the messages would read better, to my eye, as

"user=%s database=%s SSL enabled (protocol=%s, cipher=%s)"

Putting "enabled" where it is requires extra mental gymnastics on
the part of the reader.  And why the random change between "="
in one part of the string and ": " in the new part?  You could take
that last point a bit further and make it

"user=%s database=%s SSL=enabled (protocol=%s, cipher=%s)"

but I'm not sure if that's an improvement.
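
For concreteness, assuming the message in question is the log_connections
one, the first variant would come out roughly like this (a sketch from
memory, not the committed code):

    ereport(LOG,
            (errmsg("connection authorized: user=%s database=%s SSL enabled (protocol=%s, cipher=%s)",
                    port->user_name, port->database_name,
                    SSL_get_version(port->ssl), SSL_get_cipher(port->ssl))));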

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Funny representation in pg_stat_statements.query.

2014-01-17 Thread Tom Lane
Kyotaro HORIGUCHI  writes:
> Hello, I noticed that pg_stat_statements.query can have funny values.

I don't think that's an acceptable reason for lobotomizing the parser's
ability to print error cursors, which is what your first patch does
(and without even any documentation that would keep someone from
changing it back).

The second patch might be all right, though I'm a bit disturbed by its
dependency on gram.h constants.  The numeric values of those change on a
regular basis, and who's to say that these are exactly the set of tokens
that we care about?

In the end, really the cleanest fix for this would be to get rid of the
grammar's translation of these special functions into hacky expressions.
They ought to get translated into some new node type(s) that could be
reverse-listed in standard form by ruleutils.c.  I've wanted to do that
for years, but never got around to it ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Mel Gorman
On Thu, Jan 16, 2014 at 04:30:59PM -0800, Jeff Janes wrote:
> On Wed, Jan 15, 2014 at 2:08 AM, Mel Gorman  wrote:
> 
> > On Tue, Jan 14, 2014 at 09:30:19AM -0800, Jeff Janes wrote:
> > > >
> > > > That could be something we look at. There are cases buried deep in the
> > > > VM where pages get shuffled to the end of the LRU and get tagged for
> > > > reclaim as soon as possible. Maybe you need access to something like
> > > > that via posix_fadvise to say "reclaim this page if you need memory but
> > > > leave it resident if there is no memory pressure" or something similar.
> > > > Not exactly sure what that interface would look like or offhand how it
> > > > could be reliably implemented.
> > > >
> > >
> > > I think the "reclaim this page if you need memory but leave it resident
> > if
> > > there is no memory pressure" hint would be more useful for temporary
> > > working files than for what was being discussed above (shared buffers).
> > >  When I do work that needs large temporary files, I often see physical
> > > write IO spike but physical read IO does not.  I interpret that to mean
> > > that the temporary data is being written to disk to satisfy either
> > > dirty_expire_centisecs or dirty_*bytes, but the data remains in the FS
> > > cache and so disk reads are not needed to satisfy it.  So a hint that
> > says
> > > "this file will never be fsynced so please ignore dirty_*bytes and
> > > dirty_expire_centisecs.
> >
> > It would be good to know if dirty_expire_centisecs or dirty ratio|bytes
> > were the problem here.
> 
> 
> Is there an easy way to tell?  I would guess it has to be at least
> dirty_expire_centisecs, if not both, as a very large sort operation takes a
> lot more than 30 seconds to complete.
> 

There is not an easy way to tell. To be 100% sure, it would require an
instrumentation patch or a systemtap script to detect when a particular page
is being written back and track the context. There are approximations though.
Monitor nr_dirty pages over time. If at the time of the stall there are fewer
dirty pages than allowed by dirty_ratio then dirty_expire_centisecs
kicked in. That, or monitor the process for stalls; when it stalls, check
/proc/PID/stack and see if it's stuck in balance_dirty_pages or something
similar, which would indicate the process hit dirty_ratio.
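
If it helps, something as dumb as this is enough to watch those counters
over time (standalone C, Linux-only; the "nr_dirty" prefix also matches
nr_dirty_threshold and nr_dirty_background_threshold on recent kernels):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        for (;;)
        {
            FILE   *f = fopen("/proc/vmstat", "r");
            char    line[128];

            if (!f)
                return 1;
            while (fgets(line, sizeof(line), f))
            {
                if (strncmp(line, "nr_dirty", 8) == 0)
                    fputs(line, stdout);
            }
            fclose(f);
            putchar('\n');
            sleep(1);
        }
    }

Correlate the stall with how close nr_dirty is to the thresholds and you
have a reasonable guess at which limit was hit.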

> > An interface that forces a dirty page to stay dirty
> > regardless of the global system would be a major hazard. It potentially
> > allows the creator of the temporary file to stall all other processes
> > dirtying pages for an unbounded period of time.
> 
> Are the dirty ratio/bytes limits the mechanisms by which adequate clean
> memory is maintained? 

Yes, for file-backed pages.

> I thought those were there just to put a limit on how
> long it would take to execute a sync call should one be issued, and there
> were other settings which said how much clean memory to maintain.  It should
> definitely write out the pages if it needs the memory for other things,
> just not write them out due to fear of how long it would take to sync it if
> a sync was called.  (And if it needs the memory, it should be able to write
> it out quickly as the writes would be mostly sequential, not
> random--although how the kernel can believe me that that will always be the
> case could be a problem)
> 

It has been suggested on more than one occasion that a more sensible
interface would be to "do not allow more dirty data than it takes N seconds
to writeback". The details of how to implement this are tricky and no one
has taken up the challenge yet.

> > I proposed in another part
> > of the thread a hint for open inodes to have the background writer thread
> > ignore dirty pages belonging to that inode. Dirty limits and fsync would
> > still be obeyed. It might also be workable for temporary files but the
> > proposal could be full of holes.
> >
> 
> If calling fsync would fail with an error, would that lower the risk of DoS?
> 

I do not understand the proposal. If there are pages that must remain
dirty and the kernel cannot touch then there will be the risk that
dirty_ratio number of pages are all untouchable and the system livelocks
until userspace takes an action.

That still leaves the possibility of flagging temp pages that should
only be written to disk if the kernel really needs to.

-- 
Mel Gorman
SUSE Labs


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Rate Limiting

2014-01-17 Thread Heikki Linnakangas
It occurs to me that this is very similar to the method I proposed in 
June to enforce a hard limit on WAL usage, to avoid PANIC caused by 
running out of disk space when writing WAL:


http://www.postgresql.org/message-id/51b095fe.6050...@vmware.com

Enforcing a global limit needs more book-keeping than rate limiting 
individual sessions. But I'm hoping that the CHECK_WAL_BUDGET() calls 
could be used for both purposes.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Rate Limiting

2014-01-17 Thread Heikki Linnakangas

On 01/17/2014 05:20 PM, Simon Riggs wrote:

+   if (RelationNeedsWAL(indexrel))
+   CHECK_FOR_WAL_BUDGET();


I don't think we need the RelationNeedsWAL tests. If the relation is not 
WAL-logged, you won't write much WAL, so you should easily stay under 
the limit. And CHECK_FOR_WAL_BUDGET() better be cheap when you're below 
the limit.
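
In other words the fast path should boil down to a single test, something
like this (sketch only; wal_rate_limit is the GUC from the patch, the other
names are invented):

    #define CHECK_FOR_WAL_BUDGET() \
        do { \
            if (wal_rate_limit > 0 && \
                BackendWalBytesWritten > BackendWalBytesBudget) \
                WalBudgetDelay(); \
        } while (0)

with all the bookkeeping and the sleep itself hidden in the slow-path
function.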


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Rate Limiting

2014-01-17 Thread Simon Riggs
On 16 January 2014 17:20, Simon Riggs  wrote:

> Thank you everyone for responding so positively to this proposal. It
> is something many users have complained about.
>
> In time, I think we might want both WAL Rate Limiting and I/O Rate
> Limiting. IMHO I/O rate limiting is harder and so this proposal is
> restricted solely to WAL rate limiting.
>
> I'm comfortable with a session level parameter. I was mulling over a
> wal_rate_limit_scope parameter but I think that is too much.
> I will follow Craig's proposal to define this in terms of MB/s, adding
> that I would calculate it from the beginning of the statement to now, so it
> is an averaged rate. Setting it to zero means unlimited. The rate would be
> per-session (in this release) rather than a globally calculated
> setting.

Parameter renamed to wal_rate_limit.

Not yet modified to produce this in MB/s

> I would like to call it CHECK_FOR_WAL_RATE_LIMIT()

Used CHECK_WAL_BUDGET() as proposed

> That seems like a cheap enough call to allow it to be added in more
> places, so my expanded list of commands touched are
>  CLUSTER/VACUUM FULL
>  ALTER TABLE (only in phase 3 table rewrite)
>  ALTER TABLE SET TABLESPACE
>  CREATE INDEX
> which are already there, plus new ones suggested/implied
>  COPY
>  CREATE TABLE AS SELECT
>  INSERT/UPDATE/DELETE

Added, no problems or technical difficulties encountered.

> Please let me know if I missed something (rather than debating what is
> or is not a "maintenance" command).
>
> Any other thoughts before I cut v2 ?

v2 attached

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


wal_rate_limiting.v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Rate Limiting

2014-01-17 Thread Tom Lane
Andres Freund  writes:
> On 2014-01-17 09:04:54 -0500, Robert Haas wrote:
>> That having been said, I bet it could be done at the tail of
>> XLogInsert().

> I don't think there are many locations where this would be ok. Sleeping
> while holding exclusive buffer locks? Quite possibly inside a criticial
> section?

More or less by definition, you're always doing both when you call
XLogInsert.

> Surely not.

I agree.  It's got to be somewhere further up the call stack.

I'm inclined to think that what we ought to do is reconceptualize
vacuum_delay_point() as something a bit more generic, and sprinkle
calls to it in a few more places than now.

It's also interesting to wonder about the relationship to
CHECK_FOR_INTERRUPTS --- although I think that currently, we assume
that that's *cheap* (1 test and branch) as long as nothing is pending.
I don't want to see a bunch of arithmetic added to it.
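
For reference, the non-WIN32 definition is just this (quoting miscadmin.h
from memory):

    #define CHECK_FOR_INTERRUPTS() \
    do { \
        if (InterruptPending) \
            ProcessInterrupts(); \
    } while(0)

so anything beyond that single test and branch on hot code paths would be
a visible cost.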

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread Florian Pflug
On Jan17, 2014, at 15:14 , David Rowley  wrote:
> If you make any more changes would it be possible for you to base them on the 
> attached patch instead of the last one you sent?

Sure! The only reason I didn't rebase my last patch onto yours was that having 
the numeric stuff in there meant potentially better test coverage. Thanks for 
doing the merge!

> Perhaps if there's much more hacking to do we could start pushing to a github 
> repo to make it easier to work together on this.

Yeah, that seems like a good idea. Easier to keep track of than a bunch of 
patches floating around. Could you push your latest version?

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Negative Transition Aggregate Functions (WIP)

2014-01-17 Thread David Rowley
On Fri, Jan 17, 2014 at 3:05 PM, Florian Pflug  wrote:

> I had some more fun with this, the result is v2.5 of the patch (attached).
> Changes are explained below.
>
>
Looks like you've been busy on this! Thank you for implementing all the
changes you talked about.
I've now started working on the 2.5 patch you sent and I've ripped out all the
numeric inverse transition functions in that patch and implemented inverse
transitions for max and min using the new NULL-returning method that you've
implemented. I've also made a very fast pass and fixed up a few comments
and some random whitespace. I've not gotten to the docs yet.

I did add a whole bunch of regression tests for all of the inverse
transition functions that are used for max and min. I'm just calling these
like:
SELECT int4larger_inv(3,2),int4larger_inv(3,3),int4larger_inv(3,4);
Rather than using the aggregate as a window function each time. I thought
it was better to validate each of those functions individually and then put in
various tests that target a smaller number of aggregates with various
inputs.

A few things still to do that I'll try and get to soon.
1. More testing
2. Docs updated.
3. Perhaps I'll look at adding bitand and bitor inverse transition functions
4. ?

Thanks again for putting so much effort into this.
If you make any more changes would it be possible for you to base them on
the attached patch instead of the last one you sent?

Perhaps if there's much more hacking to do we could start pushing to a
github repo to make it easier to work together on this.

Regards

David Rowley


inverse_transition_functions_v2.6.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Rate Limiting

2014-01-17 Thread Andres Freund
On 2014-01-17 09:04:54 -0500, Robert Haas wrote:
> That having been said, I bet it could be done at the tail of
> XLogInsert().  If there are cases where that's not desirable, then add
> macros HOLDOFF_WAL_THROTTLING() and RESUME_WAL_THROTTLING() that bump
> a counter up and down.  When the counter is >0, XLogInsert() doesn't
> sleep; when RESUME_WAL_THROTTLING() drops the counter to 0, it also
> considers sleeping.  I suspect only a few places would need to do
> this, like where we're holding one of the SLRU locks.

I don't think there are many locations where this would be ok. Sleeping
while holding exclusive buffer locks? Quite possibly inside a critical
section?
Surely not.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Rate Limiting

2014-01-17 Thread Robert Haas
On Thu, Jan 16, 2014 at 10:56 AM, Andres Freund  wrote:
> That'd be most places doing XLogInsert() if you want to be
> consistent. Each needing to be analyzed whether we're blocking important
> resources, determining where in the callstack we can do the
> CHECK_FOR_WAL_BUDGET().

And doing that probably wouldn't be a problem except for the fact that
this patch is showing up at the absolute very last minute with
multiple unresolved design considerations.  I'm with Tom: this should
be postponed to 9.5.

That having been said, I bet it could be done at the tail of
XLogInsert().  If there are cases where that's not desirable, then add
macros HOLDOFF_WAL_THROTTLING() and RESUME_WAL_THROTTLING() that bump
a counter up and down.  When the counter is >0, XLogInsert() doesn't
sleep; when RESUME_WAL_THROTTLING() drops the counter to 0, it also
considers sleeping.  I suspect only a few places would need to do
this, like where we're holding one of the SLRU locks.  Some testing
would be needed to figure out exactly which places cause problems, but
that's why it's a good idea to start discussion of your proposed
feature before January 14th.
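
To be concrete, I'm imagining something along these lines (just a sketch;
ConsiderWalThrottleSleep is a stand-in for whatever the real sleep logic
ends up being):

    static int  WalThrottlingHoldoffCount = 0;

    #define HOLDOFF_WAL_THROTTLING() \
        (WalThrottlingHoldoffCount++)

    #define RESUME_WAL_THROTTLING() \
        do { \
            if (--WalThrottlingHoldoffCount == 0) \
                ConsiderWalThrottleSleep(); \
        } while (0)

The tail of XLogInsert() would make the same ConsiderWalThrottleSleep()
call, but only when the counter is zero.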

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Robert Haas
On Fri, Jan 17, 2014 at 7:34 AM, Jeff Layton  wrote:
> So this says to me that the WAL is a place where DIO should really be
> reconsidered. It's mostly sequential writes that need to hit the disk
> ASAP, and you need to know that they have hit the disk before you can
> proceed with other operations.

Ironically enough, we actually *have* an option to use O_DIRECT here.
But it doesn't work well.  See below.

> Also, is the WAL actually ever read under normal (non-recovery)
> conditions or is it write-only under normal operation? If it's seldom
> read, then using DIO for them also avoids some double buffering since
> they wouldn't go through pagecache.

This is the first problem: if replication is in use, then the WAL gets
read shortly after it gets written.  Using O_DIRECT bypasses the
kernel cache for the writes, but then the reads stink.  However, if
you configure wal_sync_method=open_sync and disable replication, then
you will in fact get O_DIRECT|O_SYNC behavior.

But that still doesn't work out very well, because now the guy who
does the write() has to wait for it to finish before he can do
anything else.  That's not always what we want, because WAL gets
written out from our internal buffers for multiple different reasons.
If we're forcing the WAL out to disk because of transaction commit or
because we need to write the buffer protected by a certain WAL record
only after the WAL hits the platter, then it's fine.  But sometimes
we're writing WAL just because we've run out of internal buffer space,
and we don't want to block waiting for the write to complete.  Opening
the file with O_SYNC deprives us of the ability to control the timing
of the sync relative to the timing of the write.
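
Put another way, what we want on those paths is the freedom to do the plain
POSIX thing (sketch; error handling omitted):

    #include <unistd.h>

    static void
    write_then_sync_later(int fd, const char *buf, size_t len)
    {
        /* the kernel takes the data immediately ... */
        (void) write(fd, buf, len);

        /* ... we go do other useful work ... */

        /* ... and pay the sync cost only when we actually need it */
        (void) fdatasync(fd);
    }

whereas O_SYNC makes every single write() block until the data is on stable
storage, whether that particular write needed the guarantee or not.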

> Again, I think this discussion would really benefit from an outline of
> the different files used by pgsql, and what sort of data access
> patterns you expect with them.

I think I more or less did that in my previous email, but here it is
again in briefer form:

- WAL files are written (and sometimes read) sequentially and fsync'd
very frequently and it's always good to write the data out to disk as
soon as possible
- Temp files are written and read sequentially and never fsync'd.
They should only be written to disk when memory pressure demands it
(but are a good candidate when that situation comes up)
- Data files are read and written randomly.  They are fsync'd at
checkpoint time; between checkpoints, it's best not to write them
sooner than necessary, but when the checkpoint arrives, they all need
to get out to the disk without bringing the system to a standstill

We have other kinds of files, but off-hand I'm not thinking of any
that are really very interesting, apart from those.

Maybe it'll be useful to have hints that say "always write this file
to disk as quick as you can" and "always postpone writing this file to
disk for as long as you can" for WAL and temp files respectively.  But
the rule for the data files, which are the really important case, is
not so simple.  fsync() is actually a fine API except that it tends to
destroy system throughput.  Maybe what we need is just for fsync() to
be less aggressive, or a less aggressive version of it.  We wouldn't
mind waiting an almost arbitrarily long time for fsync to complete if
other processes could still get their I/O requests serviced in a
reasonable amount of time in the meanwhile.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql.consistent_into

2014-01-17 Thread Marti Raudsepp
On Wed, Jan 15, 2014 at 8:23 AM, Jim Nasby  wrote:
> Do we actually support = right now? We already support
>
> v_field := field FROM table ... ;
>
> and I think it's a bad idea to have different meaning for = and :=.

That was already discussed before. Yes, we support both = and := and
they have exactly the same behavior; I agree, we should keep them
equivalent.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Thom Brown
On 17 January 2014 13:20, Magnus Hagander  wrote:
> On Fri, Jan 17, 2014 at 2:07 PM, Thom Brown  wrote:
>>
>> On 17 January 2014 13:01, Magnus Hagander  wrote:
>> > Is there any real use-case for not setting wal_buffers to -1 these days?
>> >
>> > Or should we just remove it and use the -1 behaviour at all times?
>> >
>> > IIRC we discussed not keeping it at all when the autotune behavior was
>> > introduced, but said we wanted to keep it "just in case". If we're not
>> > ready
>> > to remove it, then does that just mean that we need to fix it so we can?
>>
>> Robert Haas reported that setting it to 32MB can yield a considerable
>> performance benefit:
>>
>>
>> http://www.postgresql.org/message-id/CA+TgmobgMv_aaakLoasBt=5iYfi=kdcOUz0X9TdYe0c2SZ=2...@mail.gmail.com
>
>
> In that case, shouldn't the autotuning be changed to not limit it at 16MB? :)

Well, that's the question.  Do we have a heuristic sweet-spot that
folk would agree on?

-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Jeff Layton
On Thu, 16 Jan 2014 20:48:24 -0500
Robert Haas  wrote:

> On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner  wrote:
> > But there's something here that I'm not getting - you're talking
> > about a data set that you want to keep cache resident that is at
> > least an order of magnitude larger than the cyclic 5-15 minute WAL
> > dataset that ongoing operations need to manage to avoid IO storms.
> > Where do these temporary files fit into this picture, how fast do
> > they grow and why do they need to be so large in comparison to
> > the ongoing modifications being made to the database?
> 
> I'm not sure you've got that quite right.  WAL is fsync'd very
> frequently - on every commit, at the very least, and multiple times
> per second even when there are no commits going on, just to make sure we get
> it all down to the platter as fast as possible.  The thing that causes
> the I/O storm is the data file writes, which are performed either when
> we need to free up space in PostgreSQL's internal buffer pool (aka
> shared_buffers) or once per checkpoint interval (5-60 minutes) in any
> event.  The point of this system is that if we crash, we're going to
> need to replay all of the WAL to recover the data files to the proper
> state; but we don't want to keep WAL around forever, so we checkpoint
> periodically.  By writing all the data back to the underlying data
> files, checkpoints render older WAL segments irrelevant, at which
> point we can recycle those files before the disk fills up.
> 

So this says to me that the WAL is a place where DIO should really be
reconsidered. It's mostly sequential writes that need to hit the disk
ASAP, and you need to know that they have hit the disk before you can
proceed with other operations.

Also, is the WAL actually ever read under normal (non-recovery)
conditions or is it write-only under normal operation? If it's seldom
read, then using DIO for them also avoids some double buffering since
they wouldn't go through pagecache.

Again, I think this discussion would really benefit from an outline of
the different files used by pgsql, and what sort of data access
patterns you expect with them.

> Temp files are something else again.  If PostgreSQL needs to sort a
> small amount of data, like a kilobyte, it'll use quicksort.  But if it
> needs to sort a large amount of data, like a terabyte, it'll use a
> merge sort.[1]  The reason is of course that quicksort requires random
> access to work well; if parts of quicksort's working memory get paged
> out during the sort, your life sucks.  Merge sort (or at least our
> implementation of it) is slower overall, but it only accesses the data
> sequentially.  When we do a merge sort, we use files to simulate the
> tapes that Knuth had in mind when he wrote down the algorithm.  If the
> OS runs short of memory - because the sort is really big or just
> because of other memory pressure - it can page out the parts of the
> file we're not actively using without totally destroying performance.
> It'll be slow, of course, because disks always are, but not like
> quicksort would be if it started swapping.
> 
> I haven't actually experienced (or heard mentioned) the problem Jeff
> Janes is mentioning where temp files get written out to disk too
> aggressively; as mentioned before, the problems I've seen are usually
> the other way - stuff not getting written out aggressively enough.
> But it sounds plausible.  The OS only lets you set one policy, and if
> you make that file right for permanent data files that get
> checkpointed it could well be wrong for temp files that get thrown
> out.  Just stuffing the data on RAMFS will work for some
> installations, but might not be good if you actually do want to
> perform sorts whose size exceeds RAM.
> 
> BTW, I haven't heard anyone on pgsql-hackers say they'd be interesting
> in attending Collab on behalf of the PostgreSQL community.  Although
> the prospect of a cross-country flight is a somewhat depressing
> thought, it does sound pretty cool, so I'm potentially interested.  I
> have no idea what the procedure is here for moving forward though,
> especially since it sounds like there might be only one seat available
> and I don't know who else may wish to sit in it.
> 


-- 
Jeff Layton 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Dave Chinner
On Thu, Jan 16, 2014 at 08:48:24PM -0500, Robert Haas wrote:
> On Thu, Jan 16, 2014 at 7:31 PM, Dave Chinner  wrote:
> > But there's something here that I'm not getting - you're talking
> > about a data set that you want to keep cache resident that is at
> > least an order of magnitude larger than the cyclic 5-15 minute WAL
> > dataset that ongoing operations need to manage to avoid IO storms.
> > Where do these temporary files fit into this picture, how fast do
> > they grow and why do they need to be so large in comparison to
> > the ongoing modifications being made to the database?

[ snip ]

> Temp files are something else again.  If PostgreSQL needs to sort a
> small amount of data, like a kilobyte, it'll use quicksort.  But if it
> needs to sort a large amount of data, like a terabyte, it'll use a
> merge sort.[1] 

IOWs the temp files contain data that requires transformation as
part of a query operation. So, temp file size is bound by the
dataset, growth determined by data retrieval and transformation
rate.

IOWs, there are two very different IO and caching requirements in
play here and tuning the kernel for one actively degrades the
performance of the other. Right, got it now.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Dave Chinner
On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote:
> On 1/15/14, 12:00 AM, Claudio Freire wrote:
> >My completely unproven theory is that swapping is overwhelmed by
> >near-misses. Ie: a process touches a page, and before it's
> >actually swapped in, another process touches it too, blocking on
> >the other process' read. But the second process doesn't account
> >for that page when evaluating predictive models (ie: read-ahead),
> >so the next I/O by process 2 is unexpected to the kernel. Then
> >the same with 1. Etc... In essence, swap, by a fluke of its
> >implementation, fails utterly to predict the I/O pattern, and
> >results in far sub-optimal reads.
> >
> >Explicit I/O is free from that effect, all read calls are
> >accountable, and that makes a difference.
> >
> >Maybe, if the kernel could be fixed in that respect, you could
> >consider mmap'd files as a suitable form of temporary storage.
> >But that would depend on the success and availability of such a
> >fix/patch.
> 
> Another option is to consider some of the more "radical" ideas in
> this thread, but only for temporary data. Our write sequencing and
> other needs are far less stringent for this stuff.  -- Jim C.

I suspect that a lot of the temporary data issues can be solved by
using tmpfs for temporary files

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-17 Thread Dave Chinner
On Thu, Jan 16, 2014 at 03:58:56PM -0800, Jeff Janes wrote:
> On Thu, Jan 16, 2014 at 3:23 PM, Dave Chinner  wrote:
> 
> > On Wed, Jan 15, 2014 at 06:14:18PM -0600, Jim Nasby wrote:
> > > On 1/15/14, 12:00 AM, Claudio Freire wrote:
> > > >My completely unproven theory is that swapping is overwhelmed by
> > > >near-misses. Ie: a process touches a page, and before it's
> > > >actually swapped in, another process touches it too, blocking on
> > > >the other process' read. But the second process doesn't account
> > > >for that page when evaluating predictive models (ie: read-ahead),
> > > >so the next I/O by process 2 is unexpected to the kernel. Then
> > > >the same with 1. Etc... In essence, swap, by a fluke of its
> > > >implementation, fails utterly to predict the I/O pattern, and
> > > >results in far sub-optimal reads.
> > > >
> > > >Explicit I/O is free from that effect, all read calls are
> > > >accountable, and that makes a difference.
> > > >
> > > >Maybe, if the kernel could be fixed in that respect, you could
> > > >consider mmap'd files as a suitable form of temporary storage.
> > > >But that would depend on the success and availability of such a
> > > >fix/patch.
> > >
> > > Another option is to consider some of the more "radical" ideas in
> > > this thread, but only for temporary data. Our write sequencing and
> > > other needs are far less stringent for this stuff.  -- Jim C.
> >
> > I suspect that a lot of the temporary data issues can be solved by
> > using tmpfs for temporary files
> >
> 
> Temp files can collectively reach hundreds of gigs.

So unless you have terabytes of RAM you're going to have to write
them back to disk.

But there's something here that I'm not getting - you're talking
about a data set that you want to keep cache resident that is at
least an order of magnitude larger than the cyclic 5-15 minute WAL
dataset that ongoing operations need to manage to avoid IO storms.
Where do these temporary files fit into this picture, how fast do
they grow and why do they need to be so large in comparison to
the ongoing modifications being made to the database?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Magnus Hagander
On Fri, Jan 17, 2014 at 2:07 PM, Thom Brown  wrote:

> On 17 January 2014 13:01, Magnus Hagander  wrote:
> > Is there any real use-case for not setting wal_buffers to -1 these days?
> >
> > Or should we just remove it and use the -1 behaviour at all times?
> >
> > IIRC we discussed not keeping it at all when the autotune behavior was
> > introduced, but said we wanted to keep it "just in case". If we're not
> ready
> > to remove it, then does that just mean that we need to fix it so we can?
>
> Robert Haas reported that setting it to 32MB can yield a considerable
> performance benefit:
>
>
> http://www.postgresql.org/message-id/CA+TgmobgMv_aaakLoasBt=5iYfi=kdcOUz0X9TdYe0c2SZ=2...@mail.gmail.com


In that case, shouldn't the autotuning be changed to not limit it at 16MB?
:)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Andres Freund
Hi,

On 2014-01-17 14:01:56 +0100, Magnus Hagander wrote:
> Is there any real use-case for not setting wal_buffers to -1 these days?
> 
> Or should we just remove it and use the -1 behaviour at all times?

I have seen improvements from setting it larger than the maximum value
the -1 autotuning will pick. Also, for some workloads (low latency) it can
be beneficial to have a small s_b but still have a larger wal_buffers setting.

So -1 for removing it from me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wal_buffers = -1

2014-01-17 Thread Thom Brown
On 17 January 2014 13:01, Magnus Hagander  wrote:
> Is there any real use-case for not setting wal_buffers to -1 these days?
>
> Or should we just remove it and use the -1 behaviour at all times?
>
> IIRC we discussed not keeping it at all when the autotune behavior was
> introduced, but said we wanted to keep it "just in case". If we're not ready
> to remove it, then does that just mean that we need to fix it so we can?

Robert Haas reported that setting it to 32MB can yield a considerable
performance benefit:

http://www.postgresql.org/message-id/CA+TgmobgMv_aaakLoasBt=5iYfi=kdcOUz0X9TdYe0c2SZ=2...@mail.gmail.com

-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] wal_buffers = -1

2014-01-17 Thread Magnus Hagander
Is there any real use-case for not setting wal_buffers to -1 these days?

Or should we just remove it and use the -1 behaviour at all times?

IIRC we discussed not keeping it at all when the autotune behavior was
introduced, but said we wanted to keep it "just in case". If we're not
ready to remove it, then does that just mean that we need to fix it so we
can?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] patch: option --if-exists for pg_dump

2014-01-17 Thread Pavel Stehule
Hello


2014/1/16 Jeevan Chalke 

> Hi Pavel,
>
> I have reviewed the patch and here are my concerns and notes:
>
> POSITIVES:
> ---
> 1. Patch applies with some white-space errors.
> 2. make / make install / make check is smooth. No issues as such.
> 3. Feature looks good as well.
> 4. NO concern on overall design.
> 5. Good work.
>
>
> NEGATIVES:
> ---
>
> Here are the points which I see in the review and would like you to have
> your attention.
>
> 1.
> +It use conditional commands (with IF EXISTS
>
> Grammar mistakes. use => uses
>
> 2.
> @@ -55,7 +55,8 @@ static ArchiveHandle *_allocAH(const char *FileSpec,
> const ArchiveFormat fmt,
>  const int compression, ArchiveMode mode, SetupWorkerPtr
> setupWorkerPtr);
>  static void _getObjectDescription(PQExpBuffer buf, TocEntry *te,
>   ArchiveHandle *AH);
> -static void _printTocEntry(ArchiveHandle *AH, TocEntry *te,
> RestoreOptions *ropt, bool isData, bool acl_pass);
> +static void _printTocEntry(ArchiveHandle *AH, TocEntry *te,
> RestoreOptions *ropt,
> +bool isData, bool acl_pass);
>  static char *replace_line_endings(const char *str);
>  static void _doSetFixedOutputState(ArchiveHandle *AH);
>  static void _doSetSessionAuth(ArchiveHandle *AH, const char *user);
> @@ -234,6 +235,7 @@ RestoreArchive(Archive *AHX)
> boolparallel_mode;
> TocEntry   *te;
> OutputContext sav;
> +
>
> AH->stage = STAGE_INITIALIZING;
>
> @@ -2961,7 +3005,8 @@ _getObjectDescription(PQExpBuffer buf, TocEntry *te,
> ArchiveHandle *AH)
>  }
>
>  static void
> -_printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt,
> bool isData, bool acl_pass)
> +_printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt,
> bool isData,
> + bool acl_pass)
>  {
> /* ACLs are dumped only during acl pass */
> if (acl_pass)
>
> Above changes are NOT at all related to the patch. Please remove them even
> though they are clean-up like changes. Don't mix them with actual changes.
>
> 3.
> +   if (strncmp(te->dropStmt + desc_len + 5, " IF
> EXISTS", 9) != 0)
>
> " IF EXISTS" has 10 characters NOT 9.
>
> 4.
> +   if (strncmp(te->dropStmt + desc_len + 5, " IF
> EXISTS", 9) != 0)
> +   ahprintf(AH, "DROP %s IF EXISTS %s",
> + te->desc,
> + te->dropStmt + 6 + desc_len);
>
> Here you have used strncmp, starting at te->dropStmt + X,
> where X = desc_len + 5. While adding back you used X = 6 + desc_len.
> First time you used 5 as you added space in comparison but for adding back
> we
> want past space location and thus you have used 6. That's correct, but
> little
> bit confusing. Why not you simply used
> +   if (strstr(te->dropStmt, "IF EXISTS") != NULL)
> to check whether drop statement has "IF EXISTS" or not like you did at some
> other place. This will remove my concern 3 and above confusion as well.
> What you think ?
>
> 5.
> +   }
> +
> +   else
>
> Extra line before else part. Please remove it for consistency.
>
> 6.
> +   printf(_("  --if-exists  use IF EXISTS when dropping
> objects\n"));  (pg_dump)
> +   printf(_("  --if-exists  don't report error if cleaned
> object doesn't exist\n"));  (pg_dumpall)
> +   printf(_("  --if-exists  use IF EXISTS when dropping
> objects\n"));  (pg_restore)
>
> Please have same message for all three.
>
> 7.
> printf(_("  --binary-upgrade for use by upgrade utilities
> only\n"));
> printf(_("  --column-inserts dump data as INSERT commands
> with column names\n"));
> +   printf(_("  --if-exists  don't report error if cleaned
> object doesn't exist\n"));
> printf(_("  --disable-dollar-quoting disable dollar quoting, use
> SQL standard quoting\n"));
> printf(_("  --disable-triggers   disable triggers during
> data-only restore\n"));
>
> Please maintain order like pg_dump and pg_restore. Also at variable
> declaration
> and at options parsing mechanism.
>
>
I fixed the previous issue; please see the attachment


> 8.
> +   if (if_exists && !outputClean)
> +   exit_horribly(NULL, "option --if-exists requires -c/--clean
> option\n");
>
> Do we really want to exit when -c is not provided? Can't we simply ignore
> --if-exists in that case (or with emitting a warning) ?
>
>
This behaviour is based on a discussion related to the proposal of this
feature - and I think this behaviour is a little bit safer - ignoring
requested functionality is not something I like. And the error message is
simple and clear in this case - it is not difficult to use, and it is not
difficult for the user to fix the missing option.

Regards

Pavel




> Marking "Waiting on author".
>
> Thanks
>
>
> --
> Jeevan B Chalke
> Principal Software Engineer, Product Development
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company

Re: [HACKERS] Feature request: Logging SSL connections

2014-01-17 Thread Magnus Hagander
On Sun, Dec 8, 2013 at 10:27 AM, Marko Kreen  wrote:

> On Fri, Dec 06, 2013 at 02:53:27PM +0100, Dr. Andreas Kunert wrote:
> > >>Anything else missing?
> > >
> > >Functionally it's fine now, but I see few style problems:
> > >
> > >- "if (port->ssl > 0)" is wrong, ->ssl is pointer.  So use just
> > >   "if (port->ssl)".
> > >- Deeper indentation would look nicer with braces.
> > >- There are some duplicated message, could you restructure it so that
> > >   each message exists only once.
> >
> > New version is attached. I could add braces before and after the
> > ereport()-calls too, but then I need two more #ifdefs to catch the
> > closing braces.
>
> Thank you.  Looks good now.  I added it to next commitfest:
>
>   https://commitfest.postgresql.org/action/patch_view?id=1324
>
>

Applied, thanks!

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [PATCH] Fix double-inclusion of pg_config_os.h when building extensions with Visual Studio

2014-01-17 Thread Magnus Hagander
On Sat, Jan 11, 2014 at 8:57 AM, Craig Ringer  wrote:

> Hi all
>
> Related to the earlier comments about building extensions on Windows, I
> just noticed that we don't treat "WINDLL" as equivalent to "WIN32", and
> "WIN32" isn't set in a Visual Studio DLL project.
>
> We should actually be testing _WIN32, which is the compiler's
> pre-defined macro. The attached patch to c.h takes care of that - it
> tests _WIN32 in c.h and sets WIN32 early if it's found.
>
> _WIN32 is set for both win32 and win64, like we expect from WIN32.
>

Regardless of where the other thread goes, this seems like something we
should fix. Thus - applied, with minor changes to the comment, thanks.

My understanding is that this change alone doesn't actually help us very
much, so I haven't backpatched it anywhere. Let me know if that
understanding was incorrect, and it would actually help as a backpatch.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] [PATCH] pgcrypto: implement gen_random_uuid

2014-01-17 Thread Emre Hasegeli
2014/1/9 Oskari Saarenmaa :
> The only useful feature of the uuid-ossp module in my opinion is the
> uuid_generate_v4 function and as uuid-ossp is more or less abandonware
> people have had trouble building and installing it.  This patch implements
> an alternative uuid v4 generation function in pgcrypto which could be moved
> to core once there's a core PRNG with large enough internal state.

It is a small but very useful patch. Installing uuid-ossp can be very hard
on some systems. There is not much to review. The patch applies cleanly to
HEAD. The function generates valid version 4 UUIDs. The code and
the documentation style seem to fit in the pgcrypto extension. I am marking
it as "Ready for Committer".

The problem is that users probably would not look to the pgcrypto extension
for a UUID generator, especially when there is another extension with uuid in
its name. Also, a UUID generator does not sound like a cryptographic function.
It would be much better if this were in core with the UUID type. There
is a reference on the UUID type documentation page to the uuid-ossp
extension. We can add a reference to the pgcrypto extension on that page and
consider adding a deprecation note to the uuid-ossp extension, if it is not
possible to add the function to core for now.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-17 Thread Alexander Korotkov
On Wed, Jan 15, 2014 at 10:47 AM, Alexander Korotkov
wrote:

> On Wed, Jan 15, 2014 at 5:17 AM, Tomas Vondra  wrote:
>
>> On 14.1.2014 00:38, Tomas Vondra wrote:
>> > On 13.1.2014 18:07, Alexander Korotkov wrote:
>> >> On Sat, Jan 11, 2014 at 6:15 AM, Tomas Vondra > >> > wrote:
>> >>
>> >> On 8.1.2014 22:58, Alexander Korotkov wrote:
>> >> > Thanks for reporting. Fixed version is attached.
>> >>
>> >> I've tried to rerun the 'archie' benchmark with the current patch,
>> and
>> >> once again I got
>> >>
>> >>PANIC:  could not split GIN page, didn't fit
>> >>
>> >> I reran it with '--enable-cassert' and with that I got
>> >>
>> >> TRAP: FailedAssertion("!(ginCompareItemPointers(&items[i - 1],
>> >>&items[i]) < 0)", File: "gindatapage.c", Line:
>> 149)
>> >> LOG:  server process (PID 5364) was terminated by signal 6: Aborted
>> >> DETAIL:  Failed process was running: INSERT INTO messages ...
>> >>
>> >> so the assert in GinDataLeafPageGetUncompressed fails for some
>> reason.
>> >>
>> >> I can easily reproduce it, but my knowledge in this area is rather
>> >> limited so I'm not entirely sure what to look for.
>> >>
>> >>
>> >> I've fixed this bug and many other bugs. Now the patch passes the test suite
>> that
>> >> I've used earlier. The results are so:
>> >
>> > OK, it seems the bug is gone. However now there's a memory leak
>> > somewhere. I'm loading pgsql mailing list archives (~600k messages)
>> > using this script
>> >
>> >https://bitbucket.org/tvondra/archie/src/1bbeb920/bin/load.py
>> >
>> > And after loading about 1/5 of the data, all the memory gets filled by
>> > the pgsql backends (loading the data in parallel) and the DB gets killed
>> > by the OOM killer.
>>
>> I've spent a fair amount of time trying to locate the memory leak, but
>> so far no luck. I'm not sufficiently familiar with the GIN code.
>>
>> I can however demonstrate that it's there, and I have rather simple test
>> case to reproduce it - basically just a CREATE INDEX on a table with ~1M
>> email message bodies (in a tsvector column). The data is available here
>> (360MB compressed, 1GB raw):
>>
>>http://www.fuzzy.cz/tmp/message-b.data.gz
>>
>> Simply create a single-column table, load data and create the index
>>
>>CREATE TABLE test ( body_tsvector TSVECTOR );
>>COPY test FROM '/tmp/message-b.data';
>>CREATE INDEX test_idx ON test USING gin ( body_tsvector );
>>
>> I'm running this on a machine with 8GB of RAM, with these settings
>>
>>shared_buffers=1GB
>>maintenance_work_mem=1GB
>>
>> According to top, CREATE INDEX from the current HEAD never consumes more
>> than ~25% of RAM:
>>
>> PID USER  PR  NIVIRTRESSHR  %CPU %MEM  COMMAND
>>   32091 tomas 20   0 2026032 1,817g 1,040g  56,2 23,8  postgres
>>
>> which is about right, as (shared_buffers + maintenance_work_mem) is
>> about 1/4 of RAM.
>>
>> With the v5 patch version applied, the CREATE INDEX process eventually
>> goes crazy and allocates almost all the available memory (but sometimes
>> finishes, mostly by pure luck). This is what I was able to get from top
>>
>> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM  COMMAND
>>   14090 tomas 20   0 7913820 6,962g 955036 D   4,3 91,1  postgres
>>
>> while the system was still reasonably responsive.
>>
>
> Thanks a lot for your help! I believe the problem is that each decompressed
> item pointers array is palloc'd but not freed. I hope to fix it today.
>

Seems to be fixed in the attached version of patch.
Another improvement in this version: only the leftmost modified segment and
the segments after it are re-encoded. The left part of the page is left untouched.
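
For the record, the leak had this general shape (schematic only, not the
actual gindatapage.c code):

    static void
    leafSegmentExample(int nitems)
    {
        ItemPointer items = (ItemPointer) palloc(nitems * sizeof(ItemPointerData));

        /* ... decode one compressed segment into items ... */
        /* ... merge in the new TIDs and re-encode onto the page ... */

        pfree(items);       /* this free was effectively missing before */
    }

Each update decoded into a freshly palloc'd array, so the arrays piled up
in the surrounding memory context until the OOM killer stepped in.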

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-varbyte6.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql.warn_shadow

2014-01-17 Thread Marko Tiikkaja

On 1/17/14 11:26 AM, Pavel Stehule wrote:

After some thinking, I don't think this design is good. It changes
how exception (error) levels work - and it is not consistent with
other parts of PostgreSQL.

The benefit is smaller than the cost of an unclean configuration. It is better
to solve similar issues via specialized plpgsql extensions, or to help me push
plpgsql_check_function into core. That would be the best home for this and
similar checks.


I see these as being two different things.  There *is* a need for a 
full-blown static analyzer for PL/PgSQL, but I don't think it needs to 
be in core.  However, there seem to be a number of pitfalls we could 
warn the user about with little effort in core, and I think those are 
worth doing.



Regards,
Marko Tiikkaja


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql.warn_shadow

2014-01-17 Thread Pavel Stehule
Hello

After some thinking, I don't think this design is good. It changes
how exception (error) levels work - and it is not consistent with
other parts of PostgreSQL.

The benefit is smaller than the cost of an unclean configuration. It is better
to solve similar issues via specialized plpgsql extensions, or to help me push
plpgsql_check_function into core. That would be the best home for this and
similar checks.

Regards

Pavel




2014/1/15 Marko Tiikkaja 

> On 1/15/14 3:09 PM, Pavel Stehule wrote:
>
>> You should first say what is a warning, and why it is only a warning and
>> not an error.
>>
>
> Personally, I'm a huge fan of the computer telling me that I might have
> made a mistake.  But on the other hand, I also really hate it when the
> computer gets in my way when I'm trying to test something quickly and
> making these mistakes on purpose.  Warnings are really good for that: hard
> to ignore (YMMV) accidentally, but easy to spot when developing.
>
> As to how we would categorize these checks between warnings and errors..
>  I can't really answer that.  I'm tempted to say "anything that is an error
> now is an error, any additional checks we might add are warnings", but
> that's effectively just freezing the definition at an arbitrary point in
> time.
>
>
>  And why plpgsql warning processing should be different than general
>> postgresql processing?
>>
>
> What do you mean?  We're adding extra checks on *top* of the normal "this
> is clearly an error" conditions.  PostgreSQL in general doesn't really do
> that.  Consider:
>
>   SELECT * FROM foo WHERE fooid IN (SELECT fooid FROM bar);
>
> where the table "bar" doesn't have a column "fooid".  That's a perfectly
> valid query, but it almost certainly doesn't do what you would want.
> Personally I'd like to see a WARNING here normally, but I've eventually
> learned to live with this caveat.  I'm hoping that in PL/PgSQL we could at
> least solve some of the most annoying pitfalls.
>
>
>> My objection is against a too-general option. Every option should do one
>> clean thing.
>>
>
> It looks to me like the GUC *is* doing only one thing.  "list of warnings
> I want to see", or the shorthand "all" for convenience.
>
>
> Regards,
> Marko Tiikkaja
>


[HACKERS] improve the help message about psql -F

2014-01-17 Thread Jov
In the official doc, -F says:

> Use separator as the field separator for unaligned output.


but in psql --help, -F says:

> set field separator (default: "|")


If the user doesn't read the official doc carefully, he can use:

psql -F , -c 'select ...'

but won't get what he wants, because -F only takes effect in unaligned
output mode (i.e. with -A). It is a bad user experience.

I made a patch changing the help message for -F to:

> set field separator for unaligned output (default: "|")


Patch for HEAD attached.

Jov
blog: http:amutu.com/blog 


filed_separator_set_help_imporove.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Funny representation in pg_stat_statements.query.

2014-01-17 Thread Kyotaro HORIGUCHI
Hello, I noticed that pg_stat_statements.query can have funny values.

| =# select pg_stat_statements_reset();
| =# select current_timestamp(0);
| 
| =# select query from pg_stat_statements;
|query
| 
|  select ?(0);
|  select pg_stat_statements_reset();

The same thing happens for CURRENT_DATE, CURRENT_TIME, LOCALTIME
and LOCALTIMESTAMP which are specially treated, that is,
immediately replaced with a combination of a source function and
a typecast in gram.y.

The story is as follows,

At first, for instance, CURRENT_TIMESTAMP(0) is replaced with
'now()::timestamptz(0)'; the location of "CURRENT_TIMESTAMP" is
used as the location of 'now', and the location of the precision constant
is used as that of the corresponding constant passed to 'timestamptz'.

Let's assume the original query was:

| pos: Query
|  0: SELECT
|  7: CURRENT_TIMESTAMP(
| 25: 0
|   : );

This is parsed in gram.y as follows; the location of
'CURRENT_TIMESTAMP' above (7) is used as the location of the
Const "now".

| TypeCast {
|   TypeCast {
| A_Const("now")
| TypeName {names = ["text"], loc = -1 (undef) }
| loc = 7
|   }
|   Typename {names = ["timestamptz"], loc = -1 }
|   typemods = [Const {0, loc = 25}]
| }

Then this is transformed into,

| FuncExpr {
|   funcid = 'timestamptz'
|   args = [
| CoerceViaIO {
|   arg = Const {type = "text", value = "now", loc = 7 }
|   loc = -1
| }
| Const { type = "int4", value = 0, loc = -1 }
|   ]
| }

Finally, pg_stat_statements picks the location '7' as the location
of some constant and replaces the token there with '?', even though
it is originally the location of 'CURRENT_TIMESTAMP', which is not
a constant from the user's point of view.

I found two ways to fix this issue. Both of them prevent the wrong
masking, but the original constant parameter ('0') will no longer
be masked. This behavior seems sane enough for the purpose.

A. Making pg_stat_statements conscious of the token types to
   prevent wrong masking.

  20140117_remove_needless_location_setting.patch

B. Letting gram.y not set the location for the problematic nodes.

  20140117_skip_nonconstants_on_nomalization.patch
  

I don't have a firm idea which is preferable, or whether another
solution is possible.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index f0b9507..30da7aa 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -11355,7 +11355,7 @@ func_expr_common_subexpr:
 	 * to rely on it.)
 	 */
 	Node *n;
-	n = makeStringConstCast("now", @1, SystemTypeName("text"));
+	n = makeStringConstCast("now", -1, SystemTypeName("text"));
 	$$ = makeTypeCast(n, SystemTypeName("date"), -1);
 }
 			| CURRENT_TIME
@@ -11365,7 +11365,7 @@ func_expr_common_subexpr:
 	 * See comments for CURRENT_DATE.
 	 */
 	Node *n;
-	n = makeStringConstCast("now", @1, SystemTypeName("text"));
+	n = makeStringConstCast("now", -1, SystemTypeName("text"));
 	$$ = makeTypeCast(n, SystemTypeName("timetz"), -1);
 }
 			| CURRENT_TIME '(' Iconst ')'
@@ -11376,7 +11376,7 @@ func_expr_common_subexpr:
 	 */
 	Node *n;
 	TypeName *d;
-	n = makeStringConstCast("now", @1, SystemTypeName("text"));
+	n = makeStringConstCast("now", -1, SystemTypeName("text"));
 	d = SystemTypeName("timetz");
 	d->typmods = list_make1(makeIntConst($3, @3));
 	$$ = makeTypeCast(n, d, -1);
@@ -11397,7 +11397,7 @@ func_expr_common_subexpr:
 	 */
 	Node *n;
 	TypeName *d;
-	n = makeStringConstCast("now", @1, SystemTypeName("text"));
+	n = makeStringConstCast("now", -1, SystemTypeName("text"));
 	d = SystemTypeName("timestamptz");
 	d->typmods = list_make1(makeIntConst($3, @3));
 	$$ = makeTypeCast(n, d, -1);
@@ -11409,7 +11409,7 @@ func_expr_common_subexpr:
 	 * See comments for CURRENT_DATE.
 	 */
 	Node *n;
-	n = makeStringConstCast("now", @1, SystemTypeName("text"));
+	n = makeStringConstCast("now", -1, SystemTypeName("text"));
 	$$ = makeTypeCast((Node *)n, SystemTypeName("time"), -1);
 }
 			| LOCALTIME '(' Iconst ')'
@@ -11420,7 +11420,7 @@ func_expr_common_subexpr:
 	 */
 	Node *n;
 	TypeName *d;
-	n = makeStringConstCast("now", @1, SystemTypeName("text"));
+	n = makeStringConstCast("now", -1, SystemTypeName("text"));
 	d = SystemTypeName("time");
 	d->typmods = list_make1(makeIntConst($3, @3));
 	$$ = makeTypeCast((Node *)n, d, -1);
@@ -11432,7 +11432,7 @@ func_expr_common_subexpr:
 	 * See comments for CURRENT_DATE.
 	 */
 	Node *n;
-	n = makeStringConstCast("now", @1, SystemTypeName("text"));
+	n = makeStringConstCast("now", -1, SystemTypeName("text"));
 	$$ = makeTypeCast(n, SystemTypeName("timestamp"), -1);
 }
 			| LOCALTIMESTAMP '(' Iconst ')'
@@ -11443,7 +11443,7 @@ func_expr_common_subexpr:
 	 */
 	Node *n;
 	TypeName *d;

Re: [HACKERS] ECPG FETCH readahead, was: Re: ECPG fixes

2014-01-17 Thread Boszormenyi Zoltan

2014-01-16 23:42 keltezéssel, Alvaro Herrera írta:

Boszormenyi Zoltan escribió:

Rebased patches after the regression test and other details were fixed
in the infrastructure part.

This thread started in 2010, and various pieces have been applied
already and some others have changed in nature.  Would you please post a
new patchset, containing rebased patches that still need application, in
a new email thread to be linked in the commitfest entry?


I will do that.

Best regards,
Zoltán Böszörményi



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Review: [HACKERS] ECPG infrastructure changes part 1, was: Re: ECPG fixes

2014-01-17 Thread Boszormenyi Zoltan

2014-01-16 22:13 keltezéssel, Alvaro Herrera írta:

Alvaro Herrera escribió:

Boszormenyi Zoltan escribió:


All patches are attached again for completeness.

Thanks.  I pushed a commit comprising patches 09 through 14.

Now also pushed 15 to 17.


Thanks very much.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers