different results from plpgsql functions related to last changes in master

2018-02-17 Thread Pavel Stehule
Hi

I did an update of plpgsql_check and I see that some functions return
different results than on older PostgreSQL.  Probably this is intended
behaviour, but it should be mentioned as a partial compatibility break,
because some regression tests can be broken too.

create table t(i int);
create function test_t(OUT t) returns t AS $$
begin
$1 := null;
end;
$$ language plpgsql;

select test_t();

The result on PostgreSQL 11 is null; on older versions it is the empty record "()".

Regards

Pavel


Re: ALTER TABLE ADD COLUMN fast default

2018-02-17 Thread David Rowley
On 17 February 2018 at 10:46, Andrew Dunstan
 wrote:
> The attached version largely fixes with the performance degradation
> that Tomas Vondra reported, replacing an O(N^2) piece of code in
> slot_getmissingattrs() by one that is O(N).

I've looked over this patch and had a play around with it.

Just a couple of things that I noticed while reading:

1. I believe "its" should be "it's" here:

+ /*
+ * Yes, and its of the by-value kind Copy in the Datum
+ */

"Copy" does not need an upper-case 'C', and there should also be a comma after "kind".

There's a similar problem with the following:

+ /*
+ * Yes, and its fixed length Copy it out and have teh Datum
+ * point to it.
+ */

There should be a comma after "length" too.

Also, "teh" ...

2. "dfferent" -> "different"

+-- Test a large sample of dfferent datatypes

I'll try to spend some time reviewing the code in detail soon.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] path toward faster partition pruning

2018-02-17 Thread David Rowley
 On 17 February 2018 at 22:39, David Rowley
 wrote:
> 10. I'd rather see bms_next_member() used here:
>
> /* Add selected partitions' RT indexes to result. */
> while ((i = bms_first_member(partindexes)) >= 0)
> result = bms_add_member(result, rel->part_rels[i]->relid);
>
> There's not really much use for bms_first_member these days. It can be
> slower due to having to traverse the unset lower significant bits each
> loop. bms_next_member starts where the previous loop left off.
>
> Will try to review more tomorrow.

As I mentioned yesterday, here's the remainder of the review:

11. The following comment is misleading. It says: "We currently handle
two such cases:", then it goes on to say the 2nd case is not handled.

/*
 * Handle cases where the clause's operator does not belong to
 * the partitioning operator family.  We currently handle two
 * such cases: 1. Operators named '<>' are not listed in any
 * operator family whatsoever, 2.  Ordering operators like '<'
 * are not listed in the hash operator families.  For 1, check
 * if list partitioning is in use and if so, proceed to pass
 * the clause to the caller without doing any more processing
 * ourselves.  2 cannot be handled at all, so the clause is
 * simply skipped.
 */

12. The following code should test for LIST partitioning before doing
anything else:

if (!op_in_opfamily(opclause->opno, partopfamily))
{
Oid negator;

/*
 * To confirm if the operator is really '<>', check if its
 * negator is a equality operator.  If it's a btree
 * equality operator *and* this is a list partitioned
 * table, we can use it prune partitions.
 */
negator = get_negator(opclause->opno);
if (OidIsValid(negator) &&
op_in_opfamily(negator, partopfamily))
{
Oid lefttype;
Oid righttype;
int strategy;

get_op_opfamily_properties(negator, partopfamily,
                           false,
                           &strategy,
                           &lefttype, &righttype);

if (strategy == BTEqualStrategyNumber &&
context->strategy == PARTITION_STRATEGY_LIST)
is_ne_listp = true;
}

/* Cannot handle this clause. */
if (!is_ne_listp)
continue;
}

The code should probably be in the form of:

if (!op_in_opfamily(opclause->opno, partopfamily))
{
if (context->strategy != PARTITION_STRATEGY_LIST)
continue;

...

if (strategy == BTEqualStrategyNumber)
is_ne_listp = true;
}

that way we'll save 3 syscache lookups when a <> clause appears in a
RANGE or HASH partitioned table.

13. The following code makes assumptions that the partitioning op
family is btree:

/*
 * In case of NOT IN (..), we get a '<>', which while not
 * listed as part of any operator family, we are able to
 * handle it if its negator is an equality operator that
 * is in turn part of the operator family.
 */
if (!op_in_opfamily(saop_op, partopfamily))
{
Oid negator = get_negator(saop_op);
int strategy;
Oid lefttype,
righttype;

if (!OidIsValid(negator))
continue;
get_op_opfamily_properties(negator, partopfamily, false,
                           &strategy,
                           &lefttype, &righttype);
if (strategy != BTEqualStrategyNumber)
continue;
}

This might not be breakable today, but it could well break in the
future: for example, if the hash op family managed to grow two more
strategies, then we could get a false match on the strategy
numbers (both being 3).
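One way to make that assumption explicit (a sketch only, not taken from the
patch, reusing the variables from the code quoted above) would be to skip
hash-partitioned tables before interpreting the strategy number as a btree one:

/* Strategy numbers below are btree's; hash opfamilies use their own numbering */
if (context->strategy == PARTITION_STRATEGY_HASH)
    continue;

get_op_opfamily_properties(negator, partopfamily, false,
                           &strategy, &lefttype, &righttype);
if (strategy != BTEqualStrategyNumber)
    continue;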

14. The following code assumes there will be a right op:

if (IsA(clause, OpExpr))
{
OpExpr*opclause = (OpExpr *) clause;
Expr*leftop,
  *rightop,
  *valueexpr;
bool is_ne_listp = false;

leftop = (Expr *) get_leftop(clause);
if (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
rightop = (Expr *) get_rightop(clause);

This'll crash with the following:

create function nonzero(p int) returns bool as $$ begin return (p <>
0); end;$$ language plpgsql;
create operator # (procedure = nonzero, leftarg = int);
create table listp (a int) partition by list (a);
create table listp1 partition of listp for values in(1);
select * from listp where a#;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

You need to ensure that there are 2 args.
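For example, something along these lines before looking at the right-hand
side (a sketch of the kind of check meant here, not necessarily the patch's
eventual fix):

/* Only binary operators can be matched against a "partkey <op> expr" form */
if (list_length(opclause->args) != 2)
    continue;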

15. Various if tests in extract_partition_clauses result in a
`continue` when they should perhaps be a `break` instead.

For example:

if (!PartCollMatchesExprColl(partcoll, opclause->inputcollid))
continue;

This clause does not depend on the partition key, so there's no point
in trying to match this again for the next partition key.

This item is perhaps not so important, as it's only a small
inefficiency, but I just wanted to point it out. Also, note that plain
Var partition keys cannot be duplicated, but expressions can, so there
may be cases where you don't want to change `continue` to `break`.


Re: missing toast table for pg_policy

2018-02-17 Thread Michael Paquier
On Sat, Feb 17, 2018 at 03:52:33PM -0800, Andres Freund wrote:
> I've a hard hard hard time believing this is something useful to do. I
> mean by that argument you can just cause trouble everywhere by just
> storing arbitrarily large stuff via sql.

Did you read my last email until the end?  Particularly this quote:

> Longer salts make it harder to reproduce connection proofs, so some
> users may want to privilege that over the number of iterations, and
> those are perfectly valid per the SCRAM exchange protocol.

The argument here is not about storing large blobs, it is about the
flexibility that the SCRAM protocol allows but PostgreSQL does not,
because of this restriction on row size.  Postgres should in the
future have a set of GUC parameters to let users control the iteration
count and the salt length when generating the SCRAM verifier, depending
on their security requirements.  And I see no point in restricting
things on the backend as we do now.
--
Michael


signature.asc
Description: PGP signature


Re: missing toast table for pg_policy

2018-02-17 Thread Andres Freund
Hi,

On 2018-02-18 08:48:37 +0900, Michael Paquier wrote:
> On Sat, Feb 17, 2018 at 08:52:11AM -0800, Andres Freund wrote:
> > On 2018-02-17 11:39:57 -0500, Tom Lane wrote:
> > >  pg_authid   | rolpassword | text
> > 
> > that seems not to require one.
> 
> You can craft SCRAM verifiers that make it fail, which can be easily
> done using this module:
> https://github.com/michaelpq/pg_plugins/tree/master/scram_utils
> 
> =# create extension scram_utils ;
> CREATE EXTENSION
> =# select scram_utils_verifier('your_role_name', 'foo', 100, 9000);
> ERROR:  54000: row is too big: size 12224, maximum size 8160
> 
> The third argument counts for the number of iterations to generate the
> proof and the fourth controls the salt length.

I've a hard hard hard time believing this is something useful to do. I
mean by that argument you can just cause trouble everywhere by just
storing arbitrarily large stuff via sql.

Greetings,

Andres Freund



Re: missing toast table for pg_policy

2018-02-17 Thread Michael Paquier
On Sat, Feb 17, 2018 at 08:52:11AM -0800, Andres Freund wrote:
> On 2018-02-17 11:39:57 -0500, Tom Lane wrote:
> >  pg_authid   | rolpassword | text
> 
> that seems not to require one.

You can craft SCRAM verifiers that make it fail, which can be easily
done using this module:
https://github.com/michaelpq/pg_plugins/tree/master/scram_utils

=# create extension scram_utils ;
CREATE EXTENSION
=# select scram_utils_verifier('your_role_name', 'foo', 100, 9000);
ERROR:  54000: row is too big: size 12224, maximum size 8160

The third argument counts for the number of iterations to generate the
proof and the fourth controls the salt length.

Longer salts make it harder to reproduce connection proofs, so some
users may want to privilege that over the number of iterations, and
those are perfectly valid per the SCRAM exchange protocol.
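(For reference, the stored verifier is roughly of the form
SCRAM-SHA-256$<iteration count>:<base64(salt)>$<base64(StoredKey)>:<base64(ServerKey)>,
so a 9000-byte salt alone base64-encodes to about 12000 characters in
rolpassword, which is consistent with the 12224-byte row in the error above.)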

There is another restriction which limits the size of authentication
messages to 2000 bytes in libpq, which we may actually want to relax in the
future if we allow configurable in-core salt lengths to be created with
a GUC.  And other clients like JDBC don't have this restriction, if I
recall correctly.

In short, removing this restriction at least on HEAD for the backend
gives more flexibility.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] path toward faster partition pruning

2018-02-17 Thread David Rowley
On 17 February 2018 at 22:24, David Rowley  wrote:
> On 2 February 2018 at 23:03, Amit Langote  
> wrote:
>> I started wondering if it's not such a good idea to make
>> PartitionClauseInfo a Node at all?
> That sounds like a good idea.
>
> A patch which puts this back is attached.

Please find attached an updated patch. The previous one must've got a
bit mangled in a bad merge.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


make_PartitionClauseInfo_a_nonnode_type_v2.patch
Description: Binary data


Re: [HACKERS] Cache lookup errors with functions manipulation object addresses

2018-02-17 Thread Alvaro Herrera
Michael Paquier wrote:

> > As far as format_type_extended() is concerned, IMO we've gone far enough
> > with the number of variants of format_type().  Making the function
> > public makes sense to me, but let's add a bits32 flags argument instead
> > of exposing the messy set of booleans.  We can add compatibility
> > wrappers for the flag combinations most used in core code, and maybe
> > take the opportunity phase out the uncommon ones.
> 
> OK, I was a bit hesitant to propose that without more input, so I
> definitely agree with this API interface.  I have tackled that in 0003,
> with the following changes:
> - let's get rid of format_type_with_typemod_qualified.  This is only
> used by postgres_fdw in one place.
> - format_type_be_qualified is also rather localized, but I have kept
> it.  Perhaps this could be nuked as well.  Input is welcome.
> - let's keep format_type_be and format_type_with_typemod.  Those are
> largely more spread in the core code, so I don't think that we need to
> invade things more than necessary.

Pushed 0003.  Maybe we can get rid of format_type_be_qualified too, but
I didn't care too much about it either; it's not a big deal I think.

What interested me more was whether we could get rid of the
FORMAT_TYPE_TYPEMOD_GIVEN flag, but ended up deciding not to pursue that
as a phenomenal waste of time.  Here are some references in case you
care.

https://www.postgresql.org/message-id/flat/20001659.fAAGxKX06044%40postgresql.org
https://git.postgresql.org/pg/commitdiff/a585c20d12d0e22befc8308e9f8ccb6f54a5df69
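For readers without the commit at hand, the interface that went in looks
roughly like this (a sketch from memory; check format_type.c and the header
it lives in for the authoritative definitions):

/* flag bits for format_type_extended() */
#define FORMAT_TYPE_TYPEMOD_GIVEN	0x01	/* typemod was supplied by caller */
#define FORMAT_TYPE_ALLOW_INVALID	0x02	/* don't fail on an invalid type oid */
#define FORMAT_TYPE_FORCE_QUALIFY	0x04	/* always schema-qualify the type name */

extern char *format_type_extended(Oid type_oid, int32 typemod, bits16 flags);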

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Proposal for changes in official Docker image

2018-02-17 Thread Максим Кольцов
Hi!

First of all, thanks for the great app :)

I started using pgAdmin with the Docker image (dpage/pgadmin4) a few weeks
ago; however, I thought that it had some issues, so I decided to make
my own image. Some of the advantages:

- Use alpine linux instead of centos to greatly reduce image size
(170MB vs 560MB)
- Use lightweight pure-python HTTP server waitress instead of heavy
apache/mod_wsgi
- Use python 3.6

You can test the image at https://hub.docker.com/r/maksbotan/pgadmin4/
Readme contains more detailed explanation and usage instructions.

The Dockerfile is hosted at github: https://github.com/maksbotan/pgadmin4_docker

If you find my work useful, I'd love to make a contribution with these
scripts, after some discussion with pgadmin developers and further
improvements.

Looking forward to an answer,
Maxim.



Re: pgbench - allow to specify scale as a size

2018-02-17 Thread Alvaro Hernandez



On 17/02/18 12:37, Fabien COELHO wrote:


    Why not then insert a "few" rows, measure size, truncate the 
table, compute the formula and then insert to the desired user 
requested size? (or insert what should be the minimum, scale 1, 
measure, and extrapolate what's missing). It doesn't sound too 
complicated to me, and targeting a size is

something that I believe it's quite good for user.


The formula I used approximates the whole database, not just one 
table. There was one for the table, but this is only part of the 
issue. In particular, ISTM that index sizes should be included when 
caching is considered.


Also, index sizes are probably in n ln(n), so some level of 
approximation is inevitable.


Moreover, the intrinsic granularity of TPC-B as multiple of 100,000 
rows makes it not very precise wrt size anyway.




    Sure, makes sense, so my second suggestion seems more reasonable: 
insert with scale 1, measure there (ok, you might need to create indexes 
only to later drop them), and if computed scale > 1 then insert whatever 
is left to insert. Shouldn't be a big deal to me.


    I like the feature :)

    Álvaro

--

Alvaro Hernandez


---
OnGres




Re: pgbench - allow to specify scale as a size

2018-02-17 Thread Fabien COELHO


    Why not then insert a "few" rows, measure size, truncate the table, 
compute the formula and then insert to the desired user requested size? (or 
insert what should be the minimum, scale 1, measure, and extrapolate what's 
missing). It doesn't sound too complicated to me, and targeting a size is

something that I believe it's quite good for user.


The formula I used approximates the whole database, not just one table. 
There was one for the table, but this is only part of the issue. In 
particular, ISTM that index sizes should be included when caching is 
considered.


Also, index sizes are probably in n ln(n), so some level of approximation 
is inevitable.


Moreover, the intrinsic granularity of TPC-B as a multiple of 100,000 rows 
makes it not very precise wrt size anyway.


--
Fabien.

Re: pgbench - allow to specify scale as a size

2018-02-17 Thread Fabien COELHO


Hello Tom,


Here is a attempt at extending --scale so that it can be given a size.


I do not actually find this to be a good idea.  It's going to be
platform-dependent, or not very accurate, or both, and thereby
contribute to confusion by making results less reproducible.


I have often wanted to have such an option for testing, with criteria 
like "within shared_buffers", "within memory", "twice the available 
memory", to look for behavioral changes in some performance tests.


If you want reproducible (for some definition of reproducible) and 
accurate, you can always use scale with a number. The report provides the 
actual scale used anyway, so providing the size is just a convenience for 
the initialization phase. I agree that it cannot be really exact.


Would it be more acceptable with some clear(er)/explicit caveat?


Plus, what do we do if the backend changes table representation in
some way that invalidates Kaarel's formula altogether?


Then the formula (a simple linear regression, really) would have to be 
updated?



More confusion would be inevitable.


There is not much confusion when the "scale" is reported. As for confusion, 
a performance test is influenced by dozens of parameters anyway.


Now if you do not want such a feature, you can mark it as rejected, and we 
will keep on trying to guess or look for the formula till the end of 
time:-)


--
Fabien.



Re: pgbench - allow to specify scale as a size

2018-02-17 Thread Alvaro Hernandez



On 17/02/18 12:17, Tom Lane wrote:

Alvaro Hernandez  writes:

On 17/02/18 11:26, Tom Lane wrote:

Fabien COELHO  writes:

Here is a attempt at extending --scale so that it can be given a size.

I do not actually find this to be a good idea.  It's going to be
platform-dependent, or not very accurate, or both, and thereby
contribute to confusion by making results less reproducible.

Plus, what do we do if the backend changes table representation in
some way that invalidates Kaarel's formula altogether?  More confusion
would be inevitable.

      Why not then insert a "few" rows, measure size, truncate the table,
compute the formula and then insert to the desired user requested size?
(or insert what should be the minimum, scale 1, measure, and extrapolate
what's missing). It doesn't sound too complicated to me, and targeting a
size is something that I believe it's quite good for user.

Then you'd *really* have irreproducible results.

regards, tom lane


    You also have irreproducible results today, according to your 
criteria. Either you agree on the number of rows but may not agree on 
the size (today), or you agree on the size but may not agree on the 
number of rows. Right now you can only pick the former, while I think 
people would significantly appreciate the latter. If neither is correct, 
let's at least provide the choice.


    Regards,

    Álvaro


--

Alvaro Hernandez


---
OnGres




Re: pgbench - allow to specify scale as a size

2018-02-17 Thread Tom Lane
Alvaro Hernandez  writes:
> On 17/02/18 11:26, Tom Lane wrote:
>> Fabien COELHO  writes:
>>> Here is a attempt at extending --scale so that it can be given a size.

>> I do not actually find this to be a good idea.  It's going to be
>> platform-dependent, or not very accurate, or both, and thereby
>> contribute to confusion by making results less reproducible.
>> 
>> Plus, what do we do if the backend changes table representation in
>> some way that invalidates Kaarel's formula altogether?  More confusion
>> would be inevitable.

>      Why not then insert a "few" rows, measure size, truncate the table, 
> compute the formula and then insert to the desired user requested size? 
> (or insert what should be the minimum, scale 1, measure, and extrapolate 
> what's missing). It doesn't sound too complicated to me, and targeting a 
> size is something that I believe it's quite good for user.

Then you'd *really* have irreproducible results.

regards, tom lane



Re: missing toast table for pg_policy

2018-02-17 Thread Tom Lane
Andres Freund  writes:
> On 2018-02-17 11:39:57 -0500, Tom Lane wrote:
>> pg_aggregate| agginitval  | text
>> pg_aggregate| aggminitval | text

> Seems like it should have a toast table, it's not too hard to imagine
> some form of aggregate to have a large initial condition.

Really?  I should think the initial condition would almost always be some
spelling of "zero".  Certainly nobody's ever complained of this in the
past.

>> pg_attribute| attacl  | aclitem[]
>> pg_attribute| attfdwoptions   | text[]
>> pg_attribute| attoptions  | text[]

> Seems like it should have a toast table, but I think other people
> differed.

I think there was fear of circularity if we tried to toast pg_class
or pg_attribute.  (In particular, VACUUM FULL already has its hands
full handling pg_class correctly --- dealing with a toast table too
would probably be really, uh, interesting.)  Also, given the fact
that tupdescs can only store the fixed-width part of a pg_attribute
entry, having var-width fields in there at all is a pretty dubious
decision; it's way too easy to forget about that and try to fetch
them out of a tupdesc entry.  I think the right approach for
potentially-long per-attribute properties is exemplified by pg_attrdef.

In any case, I don't see any of the "options" columns as things that
are likely to get long enough to be a problem.  ACLs maybe could get
long, but I can only recall perhaps one thread complaining about that,
so I don't feel that there's field demand to justify toasting all the
catalogs with ACLs in them.

>> pg_largeobject  | data| bytea

> We deal with this in other ways.

Right, this is one that definitely should not have a toast table.

>> pg_partitioned_table| partexprs   | pg_node_tree

> Probably makes sense.

Dunno, what is a sane partitioning expression?

I don't feel that we need to insist on having a toast table for every
theoretically toastable column.  The point here is to make a conscious
decision for each such column that we don't expect it to get long.
I think most of these columns are probably fine.  Unsure about the
partitioning-related ones.

regards, tom lane



Re: pgbench - allow to specify scale as a size

2018-02-17 Thread Alvaro Hernandez



On 17/02/18 11:26, Tom Lane wrote:

Fabien COELHO  writes:

Here is a attempt at extending --scale so that it can be given a size.

I do not actually find this to be a good idea.  It's going to be
platform-dependent, or not very accurate, or both, and thereby
contribute to confusion by making results less reproducible.

Plus, what do we do if the backend changes table representation in
some way that invalidates Kaarel's formula altogether?  More confusion
would be inevitable.


    Why not then insert a "few" rows, measure size, truncate the table, 
compute the formula and then insert to the desired user requested size? 
(or insert what should be the minimum, scale 1, measure, and extrapolate 
what's missing). It doesn't sound too complicated to me, and targeting a 
size is something that I believe is quite good for the user.



    Álvaro


--

Alvaro Hernandez


---
OnGres




Re: missing toast table for pg_policy

2018-02-17 Thread Andres Freund
On 2018-02-17 11:39:57 -0500, Tom Lane wrote:
> BTW, I was wondering if it'd be a good idea to try to forestall future
> oversights of this sort by adding a test query in, say, misc_sanity.sql.
> Something like

+many


>  relname | attname |   atttypid   
> -+-+--
>  pg_aggregate| agginitval  | text
>  pg_aggregate| aggminitval | text

Seems like it should have a toast table, it's not too hard to imagine
some form of aggregate to have a large initial condition.


>  pg_attribute| attacl  | aclitem[]
>  pg_attribute| attfdwoptions   | text[]
>  pg_attribute| attoptions  | text[]

Seems like it should have a toast table, but I think other people
differed.


>  pg_authid   | rolpassword | text

that seems not to require one.


>  pg_class| relacl  | aclitem[]
>  pg_class| reloptions  | text[]
>  pg_class| relpartbound| pg_node_tree

I still think this should have one, but we disagreed. Only argument
against that I can see is complexity around rewrites.


>  pg_collation| collversion | text

unnecessary.


>  pg_database | datacl  | aclitem[]
>  pg_default_acl  | defaclacl   | aclitem[]

hm.

>  pg_event_trigger| evttags | text[]

unnecessary?


>  pg_extension| extcondition| text[]
>  pg_extension| extconfig   | oid[]
>  pg_extension| extversion  | text

Possibly add?


>  pg_foreign_data_wrapper | fdwacl  | aclitem[]
>  pg_foreign_data_wrapper | fdwoptions  | text[]

Possibly add?


>  pg_foreign_server   | srvacl  | aclitem[]
>  pg_foreign_server   | srvoptions  | text[]
>  pg_foreign_server   | srvtype | text
>  pg_foreign_server   | srvversion  | text
>  pg_foreign_table| ftoptions   | text[]

Add? That's a fair number of potentially longer fields.


>  pg_index| indexprs| pg_node_tree
>  pg_index| indpred | pg_node_tree

Imo we should add one here, but honestly I can recall only one or two
complaints.


>  pg_init_privs   | initprivs   | aclitem[]

Only if we decide to make other aclitem containing fields toastable.


>  pg_largeobject  | data| bytea

We deal with this in other ways.


>  pg_partitioned_table| partexprs   | pg_node_tree

Probably makes sense.


>  pg_pltemplate   | tmplacl | aclitem[]
>  pg_pltemplate   | tmplhandler | text
>  pg_pltemplate   | tmplinline  | text
>  pg_pltemplate   | tmpllibrary | text
>  pg_pltemplate   | tmplvalidator   | text

Hard to imagine this being required, unless we just want to make
aclitem[] toastable as a rule.


>  pg_policy   | polqual | pg_node_tree
>  pg_policy   | polroles| oid[]
>  pg_policy   | polwithcheck| pg_node_tree

Yes.


>  pg_replication_origin   | roname  | text

unnecessary.


>  pg_subscription | subconninfo | text
>  pg_subscription | subpublications | text[]
>  pg_subscription | subsynccommit   | text

I'd say yes; with a few alternative hosts the connection info can become
quite long.


>  pg_tablespace   | spcacl  | aclitem[]
>  pg_tablespace   | spcoptions  | text[]

Hm?


>  pg_ts_dict  | dictinitoption  | text

probably not.


> I think in most of these cases we've consciously decided not to toast-ify,
> but maybe some of them need a second look.

Greetings,

Andres Freund



Re: missing toast table for pg_policy

2018-02-17 Thread Tom Lane
Joe Conway  writes:
> Yes, exactly. I'm fine with not backpatching, just wanted to raise the
> possibility. I will push later today to HEAD (with a catalog version bump).

BTW, I was wondering if it'd be a good idea to try to forestall future
oversights of this sort by adding a test query in, say, misc_sanity.sql.
Something like

select relname, attname, atttypid::regtype
from pg_class c join pg_attribute a on c.oid = attrelid
where c.oid < 16384 and reltoastrelid=0 and relkind = 'r' and attstorage != 'p'
order by 1,2;

If you try that you'll see the list is quite long:

 relname | attname |   atttypid   
-+-+--
 pg_aggregate| agginitval  | text
 pg_aggregate| aggminitval | text
 pg_attribute| attacl  | aclitem[]
 pg_attribute| attfdwoptions   | text[]
 pg_attribute| attoptions  | text[]
 pg_authid   | rolpassword | text
 pg_class| relacl  | aclitem[]
 pg_class| reloptions  | text[]
 pg_class| relpartbound| pg_node_tree
 pg_collation| collversion | text
 pg_database | datacl  | aclitem[]
 pg_default_acl  | defaclacl   | aclitem[]
 pg_event_trigger| evttags | text[]
 pg_extension| extcondition| text[]
 pg_extension| extconfig   | oid[]
 pg_extension| extversion  | text
 pg_foreign_data_wrapper | fdwacl  | aclitem[]
 pg_foreign_data_wrapper | fdwoptions  | text[]
 pg_foreign_server   | srvacl  | aclitem[]
 pg_foreign_server   | srvoptions  | text[]
 pg_foreign_server   | srvtype | text
 pg_foreign_server   | srvversion  | text
 pg_foreign_table| ftoptions   | text[]
 pg_index| indexprs| pg_node_tree
 pg_index| indpred | pg_node_tree
 pg_init_privs   | initprivs   | aclitem[]
 pg_language | lanacl  | aclitem[]
 pg_largeobject  | data| bytea
 pg_largeobject_metadata | lomacl  | aclitem[]
 pg_namespace| nspacl  | aclitem[]
 pg_partitioned_table| partexprs   | pg_node_tree
 pg_pltemplate   | tmplacl | aclitem[]
 pg_pltemplate   | tmplhandler | text
 pg_pltemplate   | tmplinline  | text
 pg_pltemplate   | tmpllibrary | text
 pg_pltemplate   | tmplvalidator   | text
 pg_policy   | polqual | pg_node_tree
 pg_policy   | polroles| oid[]
 pg_policy   | polwithcheck| pg_node_tree
 pg_replication_origin   | roname  | text
 pg_subscription | subconninfo | text
 pg_subscription | subpublications | text[]
 pg_subscription | subsynccommit   | text
 pg_tablespace   | spcacl  | aclitem[]
 pg_tablespace   | spcoptions  | text[]
 pg_ts_dict  | dictinitoption  | text
 pg_type | typacl  | aclitem[]
 pg_type | typdefault  | text
 pg_type | typdefaultbin   | pg_node_tree
 pg_user_mapping | umoptions   | text[]
(50 rows)

I think in most of these cases we've consciously decided not to toast-ify,
but maybe some of them need a second look.

regards, tom lane



Re: missing toast table for pg_policy

2018-02-17 Thread Joe Conway
On 02/16/2018 05:24 PM, Tom Lane wrote:
> Joe Conway  writes:
>> On 02/16/2018 05:07 PM, Andres Freund wrote:
>>> If problematic for < master users I think you'll have to restart cluster
>>> with allow_system_table_mods, manually create/drop toasted column. IIRC
>>> that should add a toast table even after dropping.
> 
>> I wasn't sure if we would want to backpatch and put the manual procedure
>> steps into the release notes.
> 
> The example you give seems like pretty bad practice to me.  I don't think
> we should back-patch unless it's possible to trigger the problem with a
> more realistic policy expression.

Fair enough, but the origin of this was a real life-based complaint.

> (Also, one can always work around it by putting the complicated condition
> into a function, which would likely be a better idea anyway from a
> maintenance standpoint.)

Yes, exactly. I'm fine with not backpatching, just wanted to raise the
possibility. I will push later today to HEAD (with a catalog version bump).

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Documenting PROVE_TESTS in section of TAP tests

2018-02-17 Thread Michael Paquier
Hi all,

The section of the documentation dedicated to TAP tests mentions
PROVE_FLAGS:
https://www.postgresql.org/docs/devel/static/regress-tap.html

I think that it would be a good idea to mention PROVE_TESTS as well.  I
personally use and abuse it, and documenting it instead of keeping it
hidden in Makefile.global.in is good for new developers and testers.

Proposal of patch attached.

Thanks,
--
Michael
diff --git a/doc/src/sgml/regress.sgml b/doc/src/sgml/regress.sgml
index 53716a029f..d14318053b 100644
--- a/doc/src/sgml/regress.sgml
+++ b/doc/src/sgml/regress.sgml
@@ -675,6 +675,17 @@ make -C src/bin check PROVE_FLAGS='--timer'
 See the manual page of prove for more information.

 
+   
+The make variable PROVE_TESTS
+can be used to define a whitespace-separated list of paths relative
+to the Makefile invoking prove
+to run a subset of tests instead of the default
+t/*.pl.  For example:
+
+make check PROVE_TESTS='t/001_test1.pl t/003_test3.pl'
+
+   
+

 The TAP tests require the Perl module IPC::Run.
 This module is available from CPAN or an operating system package.


signature.asc
Description: PGP signature


Re: Disabling src/test/[ssl|ldap] when not building with SSL/LDAP support

2018-02-17 Thread Michael Paquier
On Sat, Feb 17, 2018 at 08:34:41AM +0900, Michael Paquier wrote:
> For this an environment variable seems suited to me.  Say if
> PG_TAP_ALLOW_INSECURE is set, then the tests said "insecure" are allowed
> to run.  If the tests are tried to be run, then they are just skipped
> with a nice log message to tell you about it.  The cool part about this
> design is that all the tests that are not part of check-world could be
> integrated in it.  Most developers don't run regression tests on a
> shared resource.  Let me think about a full-fledged patch first.
> Documentation also matters a lot.

Attached is what I have finished with.  I have gathered the feedback
from everybody on this thread and I think that the result can satisfy
all the requirements mentioned:
- 0001 is a small patch which makes the SSL and LDAP test suites fail
immediately if the build's ./configure is not set up with the necessary
build options.  This uses TestLib::check_pg_config to do the checks.
- 0002 introduces a new environment variable which can be used to decide
whether an extra test suite can be run or not.  I have named it
PROVE_EXTRA_ALLOWED, and it can be used as follows:
make -C src/test check PROVE_EXTRA_ALLOWED='ssl ldap'
This also includes some documentation.  Note that with the default settings
the tests are skipped to keep things secure.

0002 is close to the logic mentioned by Peter E just upthread.  To be
more consistent with PROVE_TESTS and PROVE_FLAGS I also chose a prove
variable as that's related to TAP at the end.  I am of course open to
better names.
--
Michael
From c702f3366a4b144bea76e738f744b10ce9c9c57d Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 17 Feb 2018 21:16:29 +0900
Subject: [PATCH 1/2] Prevent SSL and LDAP tests from running without support
 in build

An extra set of checks in each one of the test files is added to make
the tests fail should they be invoked without the proper build options.
This makes use of TestLib::check_pg_config to check for the build
configuration.
---
 src/test/ldap/t/001_auth.pl| 4 
 src/test/ssl/t/001_ssltests.pl | 4 
 src/test/ssl/t/002_scram.pl| 4 
 3 files changed, 12 insertions(+)

diff --git a/src/test/ldap/t/001_auth.pl b/src/test/ldap/t/001_auth.pl
index a83d96ae91..9d5065c494 100644
--- a/src/test/ldap/t/001_auth.pl
+++ b/src/test/ldap/t/001_auth.pl
@@ -4,6 +4,10 @@ use TestLib;
 use PostgresNode;
 use Test::More tests => 19;
 
+# LDAP tests are not supported without proper build options
+die "LDAP tests not supported without support in build" unless
+	check_pg_config("#define USE_LDAP 1");
+
 my ($slapd, $ldap_bin_dir, $ldap_schema_dir);
 
 $ldap_bin_dir = undef;			# usually in PATH
diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
index e53bd12ae9..bf68a727eb 100644
--- a/src/test/ssl/t/001_ssltests.pl
+++ b/src/test/ssl/t/001_ssltests.pl
@@ -6,6 +6,10 @@ use Test::More tests => 40;
 use ServerSetup;
 use File::Copy;
 
+# SSL tests are not supported without proper build options
+die "SSL tests not supported without support in build" unless
+	check_pg_config("#define USE_OPENSSL 1");
+
  Some configuration
 
 # This is the hostname used to connect to the server. This cannot be a
diff --git a/src/test/ssl/t/002_scram.pl b/src/test/ssl/t/002_scram.pl
index 67c1409a6e..8e79b6a99f 100644
--- a/src/test/ssl/t/002_scram.pl
+++ b/src/test/ssl/t/002_scram.pl
@@ -8,6 +8,10 @@ use Test::More tests => 5;
 use ServerSetup;
 use File::Copy;
 
+# SSL tests are not supported without proper build
+die "SSL tests not supported without support in build" unless
+	check_pg_config("#define USE_OPENSSL 1");
+
 # This is the hostname used to connect to the server.
 my $SERVERHOSTADDR = '127.0.0.1';
 
-- 
2.16.1

From e8d8071cbb972324eb75de3bb1d700e93e4ee928 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Sat, 17 Feb 2018 22:39:39 +0900
Subject: [PATCH 2/2] Add PROVE_EXTRA_ALLOWED to control optional test suites

By default, SSL and LDAP test suites are not allowed to run as they are
not secure for multi-user environments, which is why they are not part
of check-world.  This commit adds an extra make variable which can be
used to optionally enable them if wanted.  The user can make use of the
variable like that for example:
make -C src/test check PROVE_EXTRA_ALLOWED='ssl ldap'

PROVE_EXTRA_ALLOWED needs to be a list of items separated by
whitespaces, and supports two values for now: 'ssl' and 'ldap' to be
able to run respectively tests in src/test/ssl and src/test/ldap.

In consequence, the SSL and LDAP test suites are added to check-world
but they are skipped except if the user has asked for them to be
enabled.
---
 doc/src/sgml/regress.sgml  | 15 +++
 src/test/Makefile  |  9 -
 src/test/ldap/t/001_auth.pl| 13 -
 src/test/perl/TestLib.pm   | 21 +
 src/test/ssl/t/001_ssltests.pl | 13 -
 src/test/ssl/t/002_scram.pl 

Re: pgbench - allow to specify scale as a size

2018-02-17 Thread Erik Rijkers

On 2018-02-17 10:20, Fabien COELHO wrote:

After Karel Moppel piece on pgbench scale/size conversion, it occured
to me that having this as an option would be nice.

https://www.cybertec-postgresql.com/en/a-formula-to-calculate-pgbench-scaling-factor-for-target-db-size/

Here is a attempt at extending --scale so that it can be given a size.

  pgbench -i --scale=124G ...

The approximated database size is also shown in the end-of-run summary.



[pgbench-scale-size-1.patch]


Seems a nice addition, but something isn't quite right; with '-s 50' (no
unit) I get 'scale 50 too small':


$ pgbench -is 50
scale 50 too small, rounded to 1
dropping old tables...
creating tables...
generating data...
10 of 10 tuples (100%) done (elapsed 0.13 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.

echo '\dti+ pgbench_accounts*' | psql -qX

                                     List of relations
 Schema |         Name          | Type  |  Owner   |      Table       |  Size   | Description
--------+-----------------------+-------+----------+------------------+---------+-------------
 public | pgbench_accounts      | table | aardvark |                  | 13 MB   |
 public | pgbench_accounts_pkey | index | aardvark | pgbench_accounts | 2208 kB |
(2 rows)
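(Presumably a bare "50" now falls through to parse_size()'s assume-bytes
branch, so it is read as 50 bytes, and ceil(0.06616 * 50 / 1048576 - 0.512)
comes out at zero, hence the "rounded to 1".  A plain number should probably
still mean a scaling factor, as before.)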


thanks,

Erik Rijkers



Re: [HACKERS] path toward faster partition pruning

2018-02-17 Thread David Rowley
On 15 February 2018 at 18:57, Amit Langote
 wrote:
> Here is an updated version.

Thanks for sending v27. I've had a quick look over it while I was
working on the run-time prune patch. However, I've not quite managed a
complete pass of this version yet.

A couple of things so far:

1. Following loop;

for (i = 0; i < partnatts; i++)
{
if (bms_is_member(i, keys->keyisnull))
{
/* Only the default partition accepts nulls. */
if (partition_bound_has_default(boundinfo))
return bms_make_singleton(boundinfo->default_index);
else
return NULL;
}
}

could become:

if (partition_bound_has_default(boundinfo) &&
   !bms_is_empty(keys->keyisnull))
return bms_make_singleton(boundinfo->default_index);
else
return NULL;

2. Is the following form of loop necessary?

for (i = 0; i < partnatts; i++)
{
if (bms_is_member(i, keys->keyisnull))
{
keys->n_eqkeys++;
keyisnull[i] = true;
}
}

Can't this just be:

i = -1;
while ((i = bms_next_member(keys->keyisnull, i)) >= 0)
{
keys->n_eqkeys++;
keyisnull[i] = true;
}

Perhaps you can just Assert(i < partnatts), if you're worried about that.

Similar code exists in get_partitions_for_keys_range

3. Several comments mention partition_bound_bsearch, but there is now
no such function.

4. "us" should be "is"

* not be any unassigned range to speak of, because the range us unbounded

5. The following code is more complex than it needs to be:

/*
* Since partition keys with nulls are mapped to the default range
* partition, we must include the default partition if some keys
* could be null.
*/
if (keys->n_minkeys < partnatts || keys->n_maxkeys < partnatts)
{
for (i = 0; i < partnatts; i++)
{
if (!bms_is_member(i, keys->keyisnotnull))
{
include_def = true;
break;
}
}
}


Instead of the for loop, couldn't you just write:

include_def = (bms_num_members(keys->keyisnotnull) < partnatts);

6. The following comment is not well written:

 * get_partitions_excluded_by_ne_datums
 *
 * Returns a Bitmapset of indexes of partitions that can safely be removed
 * due to each such partition's every allowable non-null datum appearing in
 * a <> opeartor clause.

Maybe it would be better to write:

 * get_partitions_excluded_by_ne_datums
 *
 * Returns a Bitmapset of partition indexes that can safely be removed due to
 * the discovery of <> clauses for each datum value allowed in the partition.

if not, then "opeartor" needs the spelling fixed.

7. "The following"

 * Followig entry points exist to this module.

Are there any other .c files where we comment on all the extern
functions in this way? I don't recall seeing it before.

8. The following may as well just be: context.partnatts = partnatts;

context.partnatts = rel->part_scheme->partnatts;


9. Why palloc0? Wouldn't palloc be ok?

context.partkeys = (Expr **) palloc0(sizeof(Expr *) *
context.partnatts);

Also, no need for context.partnatts, just partnatts should be fine.

10. I'd rather see bms_next_member() used here:

/* Add selected partitions' RT indexes to result. */
while ((i = bms_first_member(partindexes)) >= 0)
result = bms_add_member(result, rel->part_rels[i]->relid);

There's not really much use for bms_first_member these days. It can be
slower due to having to traverse the unset lower significant bits each
loop. bms_next_member starts where the previous loop left off.
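In other words, the loop above could be written as (same variables as the
quoted code):

i = -1;
while ((i = bms_next_member(partindexes, i)) >= 0)
    result = bms_add_member(result, rel->part_rels[i]->relid);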

Will try to review more tomorrow.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Proposal: generic WAL compression

2018-02-17 Thread Oleg Ivanov

Hello everyone!

Unfortunately, I ran into a use case with the RUM index in which my patch
produced an enormously large time overhead (query execution time is about 2
or 3 times slower). Only now have I finally managed to limit this overhead
by 20%, or even make it statistically insignificant, on my HDD. Nevertheless,
an SSD evaluation is still required.

So I attach the new test script totat_test.sh, which includes the bad RUM use
case, and the new version of the patch.

Thanks everybody for the review, it improved the patch a lot. The majority of
the listed drawbacks were fixed; the others are discussed below.

On 11/16/2017 08:31 PM, Antonin Houska wrote:

* writeToPtr() and readFromPtr() are applied to the existing code. I think
   this is a reason for a separate diff, to be applied before the actual patch.
Removed the existing code refactoring; that should be a separate patch indeed.



* What's the reason for changing FRAGMENT_HEADER_SIZE ? I see no change in
   the data layout that writeFragment() produces.
Diff delta has its own header which consists of one char and one int. To
make this point more clear, I defined DIFF_DELTA_HEADER_SIZE and added
2 * DIFF_DELTA_HEADER_SIZE to MAX_DELTA_SIZE.
2 * DIFF_DELTA_HEADER_SIZE to MAX_DELTA_SIZE.


** "An O(ND) Difference Algorithm and Its Variations" - I think the reference
should contain more information, e.g. name of the author(s). I think I even
saw URLs to scientific papers in the PG source but I'm not 100% sure right
now.

The reference to the paper contains much more information now.


** As for the algorithm itself: Although I didn't have enough patience (and
time) to follow it in every detail for this first review, I think I can
imagine what it is about. Yet I think reading would be easier if the
concepts of alignment and "diff delta" were depicted in the comments,
perhaps using some sort of "ASCII graphics".

I briefly described the main idea of the dynamic programming algorithm
in the comments of alignRegions. It can be refined further if you think it
is still unclear, but I'm afraid a detailed description would result in just
copying the referenced paper into the comments.


** DiffDeltaOperations enumeration: the individual values should be described.


* computeRegionDiffDelta()

** Does "previous delta" in one of the comments refer to "base delta",
i.e. the delta computation w/o your patch? If so, the comment should
mention it explicitly.

** If you use Min() to initialize the for-loop, it'd be more consistent if you
also used Max() when checking the limits:

for (i = Min(validStart, targetStart); i < Max(validEnd, targetEnd); ++i)

And similarly you can simplify the body of the loop.

I found that this part of the code is unnecessary and deleted it.

On 11/17/2017 03:02 PM, Arthur Silva wrote:
I wonder if the algorithm could be changed to operate with units of 2 
or 4 bytes (instead of 1).
Since the algorithm speed is essentially doubled/quadrupled it could 
yield a better runtime/compression trade-off.


Does that make sense?
This approach implies that we cannot detect a data shift of one byte, for
example. It is suitable for 2- or 4-byte-aligned data, but it does not look
general enough.

On 01/23/2018 12:37 AM, Stephen Frost wrote:

* writeToPtr() and readFromPtr() are applied to the existing code. I think
   this is a reason for a separate diff, to be applied before the actual patch.

I'm not really a fan of using these, for my 2c.  I'm not sure how others
feel, but having these macros which change one of the arguments to the
macro (by advancing the pointer) doesn't result in a net improvement to
readability for me.
Sounds reasonable, but the patch uses constructions such as writeToPtr rather
often. How do you feel about using writeToPtr(&ptr, data), which more
explicitly indicates that the first argument changes?

On 02/07/2018 09:02 PM, Markus Nullmeier wrote:

One general comment I can already make is that enabling compression
should be made optional, which appears to be a small and easy addition
to the generic WAL API.

Now I'm working on this. I am considering creating the function
GenericXLogRegisterBufferEx in addition to GenericXLogRegisterBuffer.
GenericXLogRegisterBufferEx will take a structure with parameters for the
diff delta, such as the maximal alignment length, which parts of the page it
must be applied to, etc., and GenericXLogRegisterBuffer will call
GenericXLogRegisterBufferEx with default parameters. This allows using
different compression settings for different indexes. What do you think about
such a solution?
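To make this concrete, the new entry point could look roughly like the
following (purely a hypothetical sketch: the struct name, fields and defaults
are placeholders of mine, not taken from the patch):

/* hypothetical per-buffer compression parameters for the diff delta */
typedef struct GenericXLogCompressionParams
{
	bool	useDiffDelta;		/* fall back to plain fragments when false */
	int		maxAlignmentLength;	/* cap on the alignment window */
} GenericXLogCompressionParams;

extern Page GenericXLogRegisterBufferEx(GenericXLogState *state, Buffer buffer,
										int flags,
										const GenericXLogCompressionParams *params);

/* GenericXLogRegisterBuffer() would then simply pass default parameters. */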
diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
index 3adbf7b..2ed4146 100644
--- a/src/backend/access/transam/generic_xlog.c
+++ b/src/backend/access/transam/generic_xlog.c
@@ -44,8 +44,16 @@
  *-
  */
 #define FRAGMENT_HEADER_SIZE	(2 * sizeof(OffsetNumber))
+#define DIFF_DELTA_HEADER_SIZE	(sizeof(char) + sizeof(int))
 #define 

Re: [HACKERS] path toward faster partition pruning

2018-02-17 Thread David Rowley
On 2 February 2018 at 23:03, Amit Langote  wrote:
>> 2. PartitionClauseInfo->keyclauses is a list of PartClause which is
>> not a node type. This will cause _copyPartitionClauseInfo() to fail.
>>
>> I'm still not quite sure the best way to fix #2 since PartClause
>> contains a FmgrInfo. I do have a local fix which moves PartClause to
>> primnodes.h and makes it a proper node type. I also added a copy
>> function which does not copy any of the cache fields in PartClause. It
>> just sets valid_cache to false. I didn't particularly think this was
>> the correct fix. I just couldn't think of how exactly this should be
>> done at the time.
>>
>> The attached patch also adds the missing nodetag from
>> PartitionClauseInfo and also fixes up other code so as we don't memset
>> the node memory to zero, as that would destroy the tag. I ended up
>> just having extract_partition_key_clauses do the makeNode call. This
>> also resulted in populate_partition_clauses being renamed to
>> generate_partition_clauses
>
> I started wondering if it's not such a good idea to make
> PartitionClauseInfo a Node at all?  I went back to your earlier message
> [1] where you said that it's put into the Append node for run-time pruning
> to use, but it doesn't sound nice that we'd be putting into the plan
> something that's looks more like scratchpad for the partition.c code.  I
> think we should try to keep PartitionClauseInfo in partition.h and put
> only the list of matched bare clauses into Append.

That sounds like a good idea.

A patch which puts this back is attached.

I've changed the run-time prune patch to process the clause lists
during execution instead.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


make_PartitionClauseInfo_a_nonnode_type.patch
Description: Binary data


pgbench - allow to specify scale as a size

2018-02-17 Thread Fabien COELHO


After Kaarel Moppel's piece on pgbench scale/size conversion, it occurred to
me that having this as an option would be nice.


https://www.cybertec-postgresql.com/en/a-formula-to-calculate-pgbench-scaling-factor-for-target-db-size/

Here is an attempt at extending --scale so that it can be given a size.

  pgbench -i --scale=124G ...

The approximated database size is also shown in the end-of-run summary.
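For instance, with the roughly 15 MiB-per-scale-unit regression used in the
attached patch, --scale=124G (124 GiB = 126976 MiB) maps to about
ceil(0.06616 * 126976 - 0.512) = 8401, i.e. around 123 GiB of actual data
(rough arithmetic only; the exact value depends on the formula's fit).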

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 3dd492c..093e1d4 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -49,7 +49,7 @@
 
 
 transaction type: builtin: TPC-B (sort of)
-scaling factor: 10
+scaling factor: 10 (about 149 MiB)
 query mode: simple
 number of clients: 10
 number of threads: 1
@@ -282,6 +282,16 @@ pgbench  options  d
 in order to be big enough to hold the range of account
 identifiers.

+
+   
+The scale can also be specified as an expected database size by
+specifying a unit, assuming around 15 MiB size increments per scale unit.
+For instance, -s 5G will approximate the scale required
+for a 5 GiB database.
+Allowed units are IEC 1024 powers (KiB MiB GiB TiB PiB),
+SI 1000 powers (kB MB GB TB PB) and for convenience
+simple size prefixes K M G T P are aliases for the IEC binary sizes.
+   
   
  
 
@@ -1600,7 +1610,7 @@ END;
 
 starting vacuum...end.
 transaction type: builtin: TPC-B (sort of)
-scaling factor: 1
+scaling factor: 1 (about 15 MiB)
 query mode: simple
 number of clients: 10
 number of threads: 1
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index d420942..38eb13d 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -524,7 +524,7 @@ usage(void)
 		   "  -F, --fillfactor=NUM set fill factor\n"
 		   "  -n, --no-vacuum  do not run VACUUM during initialization\n"
 		   "  -q, --quiet  quiet logging (one message each 5 seconds)\n"
-		   "  -s, --scale=NUM  scaling factor\n"
+		   "  -s, --scale=NUM|SIZE scaling factor or expected database size\n"
 		   "  --foreign-keys   create foreign key constraints between tables\n"
 		   "  --index-tablespace=TABLESPACE\n"
 		   "   create indexes in the specified tablespace\n"
@@ -552,7 +552,7 @@ usage(void)
 		   "  -P, --progress=NUM   show thread progress report every NUM seconds\n"
 		   "  -r, --report-latencies   report average latency per command\n"
 		   "  -R, --rate=NUM   target rate in transactions per second\n"
-		   "  -s, --scale=NUM  report this scale factor in output\n"
+		   "  -s, --scale=NUM|SIZE report this scale factor in output\n"
 		   "  -t, --transactions=NUM   number of transactions each client runs (default: 10)\n"
 		   "  -T, --time=NUM   duration of benchmark test in seconds\n"
 		   "  -v, --vacuum-all vacuum all four standard tables before tests\n"
@@ -668,6 +668,87 @@ gotdigits:
 	return ((sign < 0) ? -result : result);
 }
 
+/* return a size in bytes, or exit with an error message
+ */
+static int64
+parse_size(char * s, const char * error_message)
+{
+	static struct { char *name; int64 multiplier; }
+		UNITS[17] = {
+			/* IEC units */
+			{ "KiB", 1024 },
+			{ "MiB", 1024 * 1024 },
+			{ "GiB", 1024 * 1024 * 1024 },
+			{ "TiB", (int64) 1024 * 1024 * 1024 * 1024 },
+			{ "PiB", (int64) 1024 * 1024 * 1024 * 1024 * 1024 },
+			/* SI units */
+			{ "kB", 1000 },
+			{ "MB", 1000 * 1000 },
+			{ "GB", 1000 * 1000 * 1000 },
+			{ "TB", (int64) 1000 * 1000 * 1000 * 1000 },
+			{ "PB", (int64) 1000 * 1000 * 1000 * 1000 * 1000 },
+			/* common/convenient JEDEC usage */
+			{ "KB", 1024 },
+			{ "K", 1024 },
+			{ "M", 1024 * 1024 },
+			{ "G", 1024 * 1024 * 1024 },
+			{ "T", (int64) 1024 * 1024 * 1024 * 1024 },
+			{ "P", (int64) 1024 * 1024 * 1024 * 1024 * 1024 },
+			/* unit */
+			{ "B", 1 },
+	};
+
+	int		len = strlen(s), last = -1, i;
+	int64	size;
+	char	clast;
+
+	/* look for the unit */
+	for (i = 0; i < lengthof(UNITS); i++)
+		if (strcmp(s + len - strlen(UNITS[i].name), UNITS[i].name) == 0)
+			break;
+
+	/* found, or not */
+	if (i < lengthof(UNITS))
+	{
+		last = len - strlen(UNITS[i].name);
+		clast = s[last];
+		s[last] = '\0';
+	}
+	else /* assume bytes */
+		i = lengthof(UNITS) - 1;
+
+	if (!is_an_int(s))
+	{
+		fprintf(stderr, "invalid %s: \"%s\"\n", error_message, s);
+		exit(1);
+	}
+
+	size = strtoint64(s) * UNITS[i].multiplier;
+
+	if (last != -1)
+		s[last] = clast;
+
+	return size;
+}
+
+/* parse scale, returning at least 1 */
+static int
+parse_scale(char * s)
+{
+	int64 size = parse_size(s, "scaling factor");
+	/*
+	 * formula from Kaarel Moppel linear regression on pg 10.1,
+	 * which gives about 15 MiB per pgbench scale unit
+	 */
+	int scale = (int) ceil(0.06616 * size / (1024 * 1024) - 0.511799076);
+	if (scale <= 0)
+	{
+		fprintf(stderr, "scale %s too 

Re: [HACKERS] Bug in to_timestamp().

2018-02-17 Thread Arthur Zakirov
On Wed, Feb 14, 2018 at 05:23:53PM +0100, Dmitry Dolgov wrote:
> * On line 52 there is a removed empty line, without any other changes around

Agreed, it is an unnecessary change. There was a macro there before.

> * On line 177 there is a new commented out line of code. I assume it's not
> an
>   explanation or something and we don't need it, am I right?

It explains a node type we deal with. But maybe it is not very useful,
so I removed it.

> And Oracle complains about this:
> 
> SELECT to_timestamp('2000 + JUN', ' /') FROM dual
> ORA-01830: date format picture ends before converting entire input string
> 
> SELECT to_timestamp('2000 + JUN', ' ') FROM dual
> ORA-01830: date format picture ends before converting entire input string
> 
> It's sort of corner case, but anyway maybe you would be interested to handle
> it.

I think it could be fixed by another patch. But I'm not sure that it
will be accepted as well as this patch :).

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1e535cf215..1cc8b078f7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6174,7 +6174,8 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
to_timestamp and to_date
skip multiple blank spaces in the input string unless the
FX option is used. For example,
-   to_timestamp('2000JUN', ' 
MON') works, but
+   to_timestamp('2000JUN', ' 
MON') and
+   to_timestamp('2000JUN', 
'MON') work, but
to_timestamp('2000JUN', 'FX 
MON') returns an error
because to_timestamp expects one space only.
FX must be specified as the first item in
@@ -6182,6 +6183,19 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
   
  
 
+ 
+  
+   to_timestamp and to_date don't
+   skip multiple printable non letter and non digit characters in the input
+   string, but skip them in the formatting string. For example,
+   to_timestamp('2000-JUN', '/MON') and
+   to_timestamp('2000/JUN', '//MON') work, but
+   to_timestamp('2000//JUN', '/MON')
+   returns an error because count of the "/" character in the input string
+   doesn't match count of it in the formatting string.
+  
+ 
+
  
   
Ordinary text is allowed in to_char
diff --git a/src/backend/utils/adt/formatting.c 
b/src/backend/utils/adt/formatting.c
index b8bd4caa3e..a23122d3de 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -171,6 +171,8 @@ typedef struct
 #define NODE_TYPE_END  1
 #define NODE_TYPE_ACTION   2
 #define NODE_TYPE_CHAR 3
+#define NODE_TYPE_SEPARATOR4
+#define NODE_TYPE_SPACE5
 
 #define SUFFTYPE_PREFIX1
 #define SUFFTYPE_POSTFIX   2
@@ -961,6 +963,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int 
type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 const KeySuffix *suf, const int *index, int ver, 
NUMDesc *Num);
@@ -1050,6 +1053,16 @@ suff_search(const char *str, const KeySuffix *suf, int 
type)
return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+   /* ASCII printable character, but not letter or digit */
+   return (*str > 0x20 && *str < 0x7F &&
+   !(*str >= 'A' && *str <= 'Z') &&
+   !(*str >= 'a' && *str <= 'z') &&
+   !(*str >= '0' && *str <= '9'));
+}
+
 /* --
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * --
@@ -1325,7 +1338,14 @@ parse_format(FormatNode *node, const char *str, const 
KeyWord *kw,
if (*str == '\\' && *(str + 1) == '"')
str++;
chlen = pg_mblen(str);
-   n->type = NODE_TYPE_CHAR;
+
+   if (ver == DCH_TYPE && is_separator_char(str))
+   n->type = NODE_TYPE_SEPARATOR;
+   else if (isspace((unsigned char) *str))
+   n->type = NODE_TYPE_SPACE;
+   else
+   n->type = NODE_TYPE_CHAR;
+
memcpy(n->character, str, chlen);
n->character[chlen] = '\0';
n->key = NULL;
@@ -2996,12 +3016,73 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar 
*out)
 
for (n = 

Re: [HACKERS] Runtime Partition Pruning

2018-02-17 Thread David Rowley
Hi,

I've attached an updated patch, now at v10. v9 was short-lived due to
the evolution of Amit's patch, which this is based on.

This version is based on Amit's v27 of faster partition pruning [1]
which can be applied atop of ad7dbee36.

I've done a self review of this, but I've not yet done any final
polishing work as the faster partition pruning patch is still
evolving. I will, for example, likely need to do some more work in
nodeAppend.c to add a few more comments and probably think of better
names for a few new things that have made a first appearance in this
version of the patch.

[1] 
https://www.postgresql.org/message-id/520f8a71-286d-e36d-34cf-363fd7436...@lab.ntt.co.jp

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


runtime_prune_drowley_v10.patch
Description: Binary data