Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Michael Paquier
On Mon, Jul 13, 2015 at 7:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 The two contrib modules this patch added are nowhere near fit for public
 consumption.  They cannot clean up after themselves when dropped:
 ...
 Raw inserts into system catalogs just
 aren't a sane thing to do in extensions.

 I had some thoughts about how we might fix that, without going to the
 rather tedious lengths of creating a complete set of DDL infrastructure
 for CREATE/DROP TABLESAMPLE METHOD.

 First off, the extension API designed for the tablesample patch is
 evidently modeled on the index AM API, which was not a particularly good
 precedent --- it's not a coincidence that index AMs can't be added or
 dropped on-the-fly.  Modeling a server internal API as a set of
 SQL-visible functions is problematic when the call signatures of those
 functions can't be accurately described by SQL datatypes, and it's rather
 pointless and inefficient when none of the functions in question is meant
 to be SQL-callable.  It's even less attractive if you don't think you've
 got a completely stable API spec, because adding, dropping, or changing
 signature of any one of the API functions then involves a pile of
 easy-to-mess-up catalog changes on top of the actually useful work.
 Not to mention then having to think about backwards compatibility of
 your CREATE command's arguments.

 We have a far better model to follow already, namely the foreign data
 wrapper API.  In that, there's a single SQL-visible function that returns
 a dummy datatype indicating that it's an FDW handler, and when called,
 it hands back a struct containing pointers to all the other functions
 that the particular wrapper needs to supply (and, if necessary, the struct
 could have non-function-pointer fields containing other info the core
 system might need to know about the handler).  We could similarly invent a
 pseudotype tablesample_handler and represent each tablesample method by
 a single SQL function returning tablesample_handler.  Any future changes
 in the API for tablesample handlers would then appear as changes in the C
 definition of the struct returned by the handler, which requires no
 SQL-visible thrashing, hence creates no headaches for pg_upgrade, and it
 is pretty easy to design it so that failure to update an external module
 that implements the API results in C compiler errors and/or sane fallback
 behavior.

That's elegant.

 Once we've done that, I think we don't even need a special catalog
 representing tablesample methods.  Given FROM foo TABLESAMPLE
 bernoulli(...), the parser could just look for a function bernoulli()
 returning tablesample_handler, and it's done.  The querytree would have an
 ordinary function dependency on that function, which would be sufficient
 to handle DROP dependency behaviors properly.  (On reflection, maybe
 better if it's bernoulli(internal) returns tablesample_handler,
 so as to guarantee that such a function doesn't create a conflict with
 any user-defined function of the same name.)
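For illustration, a handler under that scheme might look roughly like the sketch below. The TsmRoutine struct, its fields, and the stub functions are invented for this example rather than taken from any actual API; only the overall pattern (one C function that palloc's and returns a struct of function pointers, declared at the SQL level as bernoulli(internal) returning the tablesample_handler pseudotype) follows the proposal above.

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

typedef struct TsmRoutine           /* assumed struct, analogous to FdwRoutine */
{
    void        (*BeginSampleScan) (void *scanstate, uint32 seed);
    bool        (*NextSampleTuple) (void *scanstate);
    void        (*EndSampleScan) (void *scanstate);
} TsmRoutine;

static void
bernoulli_begin(void *scanstate, uint32 seed)
{
    /* set up per-scan sampling state here */
}

static bool
bernoulli_next(void *scanstate)
{
    /* decide whether the next tuple belongs to the sample */
    return false;
}

static void
bernoulli_end(void *scanstate)
{
    /* nothing to release in this sketch */
}

/* SQL level, per the proposal: bernoulli(internal) RETURNS tablesample_handler */
PG_FUNCTION_INFO_V1(bernoulli);

Datum
bernoulli(PG_FUNCTION_ARGS)
{
    TsmRoutine *tsm = (TsmRoutine *) palloc0(sizeof(TsmRoutine));

    tsm->BeginSampleScan = bernoulli_begin;
    tsm->NextSampleTuple = bernoulli_next;
    tsm->EndSampleScan = bernoulli_end;

    PG_RETURN_POINTER(tsm);
}

Any future change to the handler API would then show up only as a change to this C struct, as described above.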

 Thoughts?

Given that those two contrib modules can be part of a
-contrib package and could be installed, nuking those two extensions
from the tree and preventing the creation of custom tablesample
methods looks like the correct course of action for 9.5.

When looking at this patch I was also surprised that the BERNOULLI
method can only be applied to a physical relation and cannot fire on
a materialized set of tuples, say the result of a WITH clause for
example.

 PS: now that I've written this rant, I wonder why we don't redesign the
 index AM API along the same lines.  It probably doesn't matter much at
 the moment, but if we ever get serious about supporting index AM
 extensions, I think we ought to consider doing that.

Any API redesign looks to be clearly 9.6 work IMO at this point.
-- 
Michael




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Amit Kapila
On Mon, Jul 13, 2015 at 3:39 PM, Sawada Masahiko sawada.m...@gmail.com
wrote:

 On Tue, Jul 7, 2015 at 9:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On 6 July 2015 at 17:28, Simon Riggs si...@2ndquadrant.com wrote:
 
  I think we need something for pg_upgrade to rewrite existing VMs.
  Otherwise a large read only database would suddenly require a massive
  revacuum after upgrade, which seems bad. That can wait for now until
we all
  agree this patch is sound.
 
 
  Since we need to rewrite the vm map, I think we should call the new
map
  vfm
 
  That way we will be able to easily check whether the rewrite has been
  conducted on all relations.
 
  Since the maps are just bits there is no other way to tell that a map
has
  been rewritten

 To avoid revacuum after upgrade, you meant that we need to rewrite
 each bit of vm to corresponding bits of vfm, if it's from
 not-supporting vfm version(i.g., 9.5 or earlier ). right?
 If so, we will need to do whole scanning table, which is expensive as
well.
 Clearing vm and do revacuum would be nice, rather than doing in
 upgrading, I think.


How will you ensure that a revacuum happens for all the tables after
upgrading?  Until vacuum is done on the tables that
had a vm before the upgrade, any queries on those tables can
become slower.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] intarray planning/exeuction problem with multiple operators

2015-07-13 Thread Uriy Zhuravlev
On Sunday 12 July 2015 23:12:41 Jeff Janes wrote:
 I've found an interesting performance problem in the intarray extension
 module.  It doesn't seem to be version dependent, I've verified it in 9.4.4
 and 9.6devel.

Hello.
Look at my patch. Maybe it solves this problem.
https://commitfest.postgresql.org/5/253/

-- 
Uriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-13 Thread Michael Paquier
On Mon, Jul 13, 2015 at 9:22 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 I might missing something but, these functions will generate WAL?
 If they does, we will face the situation where we need to wait
 forever, Fujii-san pointed out.

No, those functions are here to manipulate the metadata defining the
quorum/priority set. We definitely do not want something that
generates WAL.
-- 
Michael




Re: [HACKERS] WIP: Enhanced ALTER OPERATOR

2015-07-13 Thread Uriy Zhuravlev
 Hello hackers.
 
Attached is a new version of the patch:
* ported the syntax from NULL to a true NONE
* fix documentation (thanks Heikki)
* rebase to master
 
Thanks.

-- 
Uriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/alter_operator.sgml b/doc/src/sgml/ref/alter_operator.sgml
index bdb2d02..237e4f1 100644
--- a/doc/src/sgml/ref/alter_operator.sgml
+++ b/doc/src/sgml/ref/alter_operator.sgml
@@ -26,6 +26,11 @@ ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</repla
 
 ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</replaceable> | NONE } , { <replaceable>right_type</replaceable> | NONE } )
    SET SCHEMA <replaceable>new_schema</replaceable>
+
+ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</replaceable> | NONE } , { <replaceable>right_type</replaceable> | NONE } )
+SET ( {  RESTRICT = { <replaceable class="parameter">res_proc</replaceable> | NONE }
+   | JOIN = { <replaceable class="parameter">join_proc</replaceable> | NONE }
+ } [, ... ] )
 </synopsis>
  </refsynopsisdiv>
 
@@ -34,8 +39,7 @@ ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</repla
 
   <para>
 <command>ALTER OPERATOR</command> changes the definition of
-   an operator.  The only currently available functionality is to change the
-   owner of the operator.
+   an operator.
   </para>
 
   <para>
@@ -98,6 +102,25 @@ ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</repla
  </para>
 </listitem>
   </varlistentry>
+
+   <varlistentry>
+ <term><replaceable class="parameter">res_proc</replaceable></term>
+ <listitem>
+   <para>
+ The restriction selectivity estimator function for this operator; write NONE to remove existing selectivity estimator.
+   </para>
+  </listitem>
+   </varlistentry>
+
+   <varlistentry>
+ <term><replaceable class="parameter">join_proc</replaceable></term>
+ <listitem>
+   <para>
+ The join selectivity estimator function for this operator; write NONE to remove existing selectivity estimator.
+   </para>
+ </listitem>
+   </varlistentry>
+
   </variablelist>
 </refsect1>
 
@@ -109,6 +132,13 @@ ALTER OPERATOR <replaceable>name</replaceable> ( { <replaceable>left_type</repla
 <programlisting>
 ALTER OPERATOR @@ (text, text) OWNER TO joe;
 </programlisting></para>
+
+  <para>
+    Change the restriction and join selectivity estimator function of a custom operator <literal>a && b</literal> for type <type>int[]</type>:
+<programlisting>
+ALTER OPERATOR && (_int4, _int4) SET (RESTRICT = _int_contsel, JOIN = _int_contjoinsel);
+</programlisting></para>
+
 </refsect1>
 
 <refsect1>
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 072f530..4c5c9c6 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -28,8 +28,10 @@
 #include "catalog/pg_operator.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "commands/defrem.h"
 #include "miscadmin.h"
 #include "parser/parse_oper.h"
+#include "parser/parse_func.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -53,7 +55,7 @@ static Oid OperatorShellMake(const char *operatorName,
   Oid leftTypeId,
   Oid rightTypeId);
 
-static void OperatorUpd(Oid baseId, Oid commId, Oid negId);
+static void ShellOperatorUpd(Oid baseId, Oid commId, Oid negId);
 
 static Oid get_other_operator(List *otherOp,
    Oid otherLeftTypeId, Oid otherRightTypeId,
@@ -563,7 +565,7 @@ OperatorCreate(const char *operatorName,
 		commutatorId = operatorObjectId;
 
 	if (OidIsValid(commutatorId) || OidIsValid(negatorId))
-		OperatorUpd(operatorObjectId, commutatorId, negatorId);
+		ShellOperatorUpd(operatorObjectId, commutatorId, negatorId);
 
 	return address;
 }
@@ -633,7 +635,7 @@ get_other_operator(List *otherOp, Oid otherLeftTypeId, Oid otherRightTypeId,
 }
 
 /*
- * OperatorUpd
+ * ShellOperatorUpd
  *
  *	For a given operator, look up its negator and commutator operators.
  *	If they are defined, but their negator and commutator fields
@@ -642,7 +644,7 @@ get_other_operator(List *otherOp, Oid otherLeftTypeId, Oid otherRightTypeId,
  *	which are the negator or commutator of each other.
  */
 static void
-OperatorUpd(Oid baseId, Oid commId, Oid negId)
+ShellOperatorUpd(Oid baseId, Oid commId, Oid negId)
 {
 	int			i;
 	Relation	pg_operator_desc;
@@ -864,3 +866,154 @@ makeOperatorDependencies(HeapTuple tuple)
 
 	return myself;
 }
+
+/*
+ * Operator update aka ALTER OPERATOR for RESTRICT, JOIN
+ */
+void OperatorUpd(Oid classId,
+ Oid baseId,
+ List *operator_params)
+{
+	int			i;
+	ListCell	*pl;
+	Relation	catalog;
+	HeapTuple	tup;
+	Oid 		operator_param_id = 0;
+	bool		nulls[Natts_pg_operator];
+	bool		replaces[Natts_pg_operator];
+	Datum		values[Natts_pg_operator];
+
+	for (i = 0; i < Natts_pg_operator; ++i)
+	{
+		values[i] = (Datum) 0;
+		replaces[i] = false;
+		nulls[i] = false;
+	}
+
+	catalog = heap_open(classId, RowExclusiveLock);
+	tup = 

Re: [HACKERS] multivariate statistics / patch v7

2015-07-13 Thread Kyotaro HORIGUCHI
Hi, thanks for the detailed explanation. I misunderstood the
code (more honestly speaking, I didn't look very closely at it). Now I
have looked at it more closely.


At Wed, 08 Jul 2015 03:03:16 +0200, Tomas Vondra tomas.von...@2ndquadrant.com 
wrote in 559c76d4.2030...@2ndquadrant.com
 FWIW this was a stupid bug in update_match_bitmap_histogram(), which
 initially handled only AND clauses, and thus assumed the match of a
 bucket can only decrease. But for OR clauses this is exactly the
 opposite (we assume no buckets match and add buckets matching at least
 one of the clauses).
 
 With this fixed, the estimates look like this:
 

 IMHO pretty accurate estimates - no issue with OR clauses.

OK, I understood the difference between what I thought and what
you say. The code is actually conscious of OR clauses, but it looks
somewhat confused.

Currently, choosing mv stats in clauselist_selectivity can be
outlined as follows:

1. find_stats finds candidate mv stats containing *all*
   attributes that appear in the whole clause list, regardless of
   and/or exprs, by walking the whole clause tree.

   Perhaps this is a measure for early bailout.

2.1. Within every disjunction element, collect mv-related
   attributes while checking whether all the leaf nodes (binop or
   ifnull) are compatible, by (eventually) walking the whole
   clause tree.

2.2. Check whether all the collected attributes are contained in
   the mv-stats columns.

3. Finally, clauseset_mv_selectivity_histogram() (and others).

   This function applies every ExprOp to every attribute in
   every histogram bucket and (tries to) apply the boolean
   operation to the resulting bitmaps.

I have some comments on the implementation, and I have also tried to
find solutions for them.


1. The flow above looks like it does very similar things repeatedly.

2. I believe what the current code does can be simplified.

3. As you mentioned in the comments, some additional infrastructure
   is needed.

After all, I think what we should do next is as follows, as a
first step.

- Add a means of judging the selectivity operator(?) by something
  other than the oprrest of the op of the ExprOp. (You missed neqsel
  already.)

  I suppose one solution for this is adding oprmvstats, taking
  'm', 'h' and 'f' and their combinations. Or, for convenience, it
  could be a fixed-length string like this.

  oprname | oprmvstats
  =       | 'mhf'
  <>      | 'mhf'
  <       | 'mh-'
  >       | 'mh-'
  <=      | 'mh-'
  >=      | 'mh-'

  This would make the code in clause_is_mv_compatible like this.

   oprmvstats = get_mvstatsset(expr->opno); /* bitwise representation */
   if (oprmvstats & types)
   {
      *attnums = bms_add_member(*attnums, var->varattno);
      return true;
   }
   return false;

- The current design just manages to work, but it is too complicated
  and has little affinity with the existing estimation
  framework. I proposed separating the finding-stats phase and the
  calculation phase, but after looking closely at the patch I would
  now like to propose a transform-RestrictInfo (and find-mvstats)
  phase followed by a phase that runs the transformed RestrictInfo.

  I think transforming RestrictInfo makes the situation
  better. Since it needs different information, maybe it is
  better to have a new struct, say, RestrictInfoForEstimate
  (boo!). Then provide mvstatssel() for use with the new struct.
  A rough sketch of the code would be like below.

  clauselist_selectivity()
  {
    ...
    RestrictInfoForEstimate *esclause =
      transformClauseListForEstimation(root, clauses, varRelid);
    ...

    return clause_selectivity(esclause);
  }

  clause_selectivity(RestrictInfoForEstimate *esclause)
  {
    if (IsA(clause, RestrictInfo))...
    if (IsA(clause, RestrictInfoForEstimate))
    {
       RestrictInfoForEstimate *ecl = (RestrictInfoForEstimate *) clause;
       if (ecl->selfunc)
       {
          sx = ecl->selfunc(root, ecl);
       }
    }
    if (IsA(clause, Var))...
  }


  transformClauseListForEstimation(...)
  {
    ...

    relid = collect_mvstats_info(root, clause, attlist);
    if (!relid) return;
    if (get_mvstats_hook)
      mvstats = (*get_mvstats_hook) (root, relid, attset);
    else
      mvstats = find_mv_stats(root, relid, attset);
  }
  ...
  ...

 I've pushed this to github [1] but I need to do some additional
 fixes. I also had to remove some optimizations while fixing this, and
 will have to reimplement those.
 
 That's not to say that the handling of OR-clauses is perfectly
 correct. After looking at clauselist_selectivity_or(), I believe it's
 a bit broken and will need a bunch of fixes, as explained in the
 FIXMEs I pushed to github.
 
 [1] https://github.com/tvondra/postgres/tree/mvstats

I don't know whether it is doable or not, and I suppose you're
unwilling to change the big picture, so I will consider the idea
and show you the result if it turns out to be possible and
promising.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-13 Thread Amit Kapila
On Mon, Jul 13, 2015 at 3:26 PM, Ildus Kurbangaliev 
i.kurbangal...@postgrespro.ru wrote:


 On 07/12/2015 06:53 AM, Amit Kapila wrote:

  For having duration, I think you need to use gettimeofday or some
 similar call to calculate the wait time, now it will be okay for the
 cases where wait time is longer, however it could be problematic for
 the cases if the waits are very small (which could probably be the
 case for LWLocks)

 gettimeofday is already used in our patch and it gives enough accuracy (in
 microseconds), especially when an lwlock becomes a problem. Also, we tested
 our implementation and it gives an overhead of less than 1%. (
 http://www.postgresql.org/message-id/559d4729.9080...@postgrespro.ru,
 testing part).


I think that test is quite generic; we should test more combinations
(like using the -M prepared option, as that can stress the LWLock machinery
somewhat more) and other types of tests which can stress the parts
of the code where gettimeofday() is used in the patch.


 We need help here with testing on other platforms. I used gettimeofday
 because of the built-in header instr_time.h, which already gives
 cross-platform tested functions for measuring, but I'm planning to make a
 similar implementation based on the monotonic clock_gettime for more
 accuracy.
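For reference, a minimal sketch of the kind of measurement instr_time.h enables; the helper and the wait callback here are invented for illustration, and only the instr_time type and the INSTR_TIME_* macros come from PostgreSQL's portability/instr_time.h:

#include "postgres.h"
#include "portability/instr_time.h"

/* hypothetical helper: time an arbitrary wait and return microseconds */
static double
time_wait_microsec(void (*wait_fn) (void))
{
    instr_time  start;
    instr_time  duration;

    INSTR_TIME_SET_CURRENT(start);      /* gettimeofday()/clock_gettime() under the hood */
    wait_fn();                          /* the wait being measured */
    INSTR_TIME_SET_CURRENT(duration);
    INSTR_TIME_SUBTRACT(duration, start);

    return (double) INSTR_TIME_GET_MICROSEC(duration);
}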


  2) Accumulate per backend statistics about each wait event type: number
 of occurrences and total duration. With this statistics user can identify
 system bottlenecks again without sampling.
 
  Number #2 will be provided as a separate patch.
  Number #1 require different concurrency model. ldus will extract it from
 waits monitoring patch shortly.
 

 Sure, I think those should be evaluated as separate patches,
 and I can look into those patches and see if something more
 can be exposed as part of this patch which can be reused in
 those patches.

 If you agree, I'll do some modifications to your patch, so we can later
 extend it with our other modifications. The main issue is that one variable
 for all types is not enough. For flexibility in the future we need at least
 two - class and event; for example class=LWLock, event=ProcArrayLock, or
 class=Storage, event=READ.



I have already proposed something very similar in this thread [1]
(where instead of class, I have used wait_event_type), with which
Robert doesn't agree, so here I think that before writing code it seems
prudent to get agreement about what kind of user interface
would satisfy the requirement and be extensible for the future as
well.  I think it will be better if you can highlight some points about
what kind of user interface is better (more extensible) and the reasons
for the same.

[1] (Refer option-3) -
http://www.postgresql.org/message-id/caa4ek1j6cg_jym00nrwt4n8r78zn4ljoqy_zu1xrzxfq+me...@mail.gmail.com


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Michael Paquier
On Mon, Jul 13, 2015 at 9:03 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Mon, Jul 13, 2015 at 7:46 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Jul 13, 2015 at 3:39 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

 On Tue, Jul 7, 2015 at 9:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On 6 July 2015 at 17:28, Simon Riggs si...@2ndquadrant.com wrote:
 
  I think we need something for pg_upgrade to rewrite existing VMs.
  Otherwise a large read only database would suddenly require a massive
  revacuum after upgrade, which seems bad. That can wait for now until we
  all
  agree this patch is sound.
 
 
  Since we need to rewrite the vm map, I think we should call the new
  map
  vfm
 
  That way we will be able to easily check whether the rewrite has been
  conducted on all relations.
 
  Since the maps are just bits there is no other way to tell that a map
  has
  been rewritten

 To avoid revacuum after upgrade, you meant that we need to rewrite
 each bit of vm to corresponding bits of vfm, if it's from
 not-supporting vfm version(i.g., 9.5 or earlier ). right?
 If so, we will need to do whole scanning table, which is expensive as
 well.
 Clearing vm and do revacuum would be nice, rather than doing in
 upgrading, I think.


 How will you ensure to have revacuum for all the tables after
 upgrading?

 We use script file which are generated by pg_upgrade.

I haven't followed this thread closely, but I am sure you recall that
vacuumdb has a parallel mode.
-- 
Michael




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Sawada Masahiko
On Tue, Jul 7, 2015 at 9:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 6 July 2015 at 17:28, Simon Riggs si...@2ndquadrant.com wrote:

 I think we need something for pg_upgrade to rewrite existing VMs.
 Otherwise a large read only database would suddenly require a massive
 revacuum after upgrade, which seems bad. That can wait for now until we all
 agree this patch is sound.


 Since we need to rewrite the vm map, I think we should call the new map
 vfm

 That way we will be able to easily check whether the rewrite has been
 conducted on all relations.

 Since the maps are just bits there is no other way to tell that a map has
 been rewritten

To avoid revacuum after upgrade, you mean that we need to rewrite
each bit of the vm to the corresponding bits of the vfm, if it comes from
a version that does not support the vfm (i.e., 9.5 or earlier), right?
If so, we will need to scan the whole table, which is expensive as well.
Clearing the vm and doing a revacuum would be nicer than doing it during
the upgrade, I think.

Regards,

--
Masahiko Sawada




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-13 Thread Sawada Masahiko
On Fri, Jul 10, 2015 at 10:06 PM, Beena Emerson memissemer...@gmail.com wrote:
 Hello,

 Tue, Jul 7, 2015 at 02:56 AM, Josh Berkus wrote:
 pro-JSON:

 * standard syntax which is recognizable to sysadmins and devops.
 * can use JSON/JSONB functions with ALTER SYSTEM SET to easily make
 additions/deletions from the synch rep config.
 * can add group labels (see below)

 Adding group labels do have a lot of values but as Amit has pointed out,
 with little modification, they can be included in GUC as well. It will not
 make it any more complex.

 On Tue, Jul 7, 2015 at 2:19 PM, Michael Paquier wrote:

 Something like pg_syncinfo/ coupled with a LW lock, we already do
 something similar for replication slots with pg_replslot/.

 I was trying to figure out how the JSON metadata can be used.
 It would have to be set using a given set of functions. Right?
 I am sorry this question is very basic.

 The functions could be something like:
 1. pg_add_synch_set(set_name NAME, quorum INT, is_priority bool, set_members
 VARIADIC)

 This will be used to add a sync set. The set_members can be individual
 elements of another set name. The parameter is_priority is used to decide
 whether the set is priority (true) set or quorum (false). This function call
 will  create a folder pg_syncinfo/groups/$NAME and store the json blob?

 The root group would be automatically set by finding the group which is not
 included in other groups? Or could it be set by another function?

 2. pg_modify_sync_set(set_name NAME, quorum INT, is_priority bool,
 set_members VARIADIC)

 This will update the pg_syncinfo/groups/$NAME to store the new values.

 3. pg_drop_synch_set(set_name NAME)

 This will update the pg_syncinfo/groups/$NAME folder. Also all the groups
 which included this would be updated?

 4. pg_show_synch_set()

 this will display the current sync setting in json format.

 Am I missing something?

 Is JSON being preferred because it would be ALTER SYSTEM friendly and in a
 format already known to users?

 In a real-life scenario, at most how many groups and nesting would be
 expected?


I might be missing something, but will these functions generate WAL?
If they do, we will face the situation Fujii-san pointed out where we
need to wait forever.


Regards,

--
Masahiko Sawada




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Andres Freund
On 2015-07-13 21:03:07 +0900, Sawada Masahiko wrote:
 Even If we implement rewriting tool for vm into pg_upgrade, it will
 take time as much as revacuum because it need whole scanning table.

Why would it? Sure, you can only set allvisible and not the frozen bit,
but that's fine. That way the cost for freezing can be paid over time.

If we require terabytes of data to be scanned, including possibly
rewriting large portions due to freezing, before index only scans work
and most vacuums act in a partial manner, the migration to 9.6 will be a
major pain for our users.




Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Sawada Masahiko
On Mon, Jul 13, 2015 at 7:46 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Jul 13, 2015 at 3:39 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

 On Tue, Jul 7, 2015 at 9:07 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On 6 July 2015 at 17:28, Simon Riggs si...@2ndquadrant.com wrote:
 
  I think we need something for pg_upgrade to rewrite existing VMs.
  Otherwise a large read only database would suddenly require a massive
  revacuum after upgrade, which seems bad. That can wait for now until we
  all
  agree this patch is sound.
 
 
  Since we need to rewrite the vm map, I think we should call the new
  map
  vfm
 
  That way we will be able to easily check whether the rewrite has been
  conducted on all relations.
 
  Since the maps are just bits there is no other way to tell that a map
  has
  been rewritten

 To avoid revacuum after upgrade, you meant that we need to rewrite
 each bit of vm to corresponding bits of vfm, if it's from
 not-supporting vfm version(i.g., 9.5 or earlier ). right?
 If so, we will need to do whole scanning table, which is expensive as
 well.
 Clearing vm and do revacuum would be nice, rather than doing in
 upgrading, I think.


 How will you ensure to have revacuum for all the tables after
 upgrading?

We use the script file which is generated by pg_upgrade.

  Till the time Vacuum is done on the tables that
 have vm before upgrade, any queries on those tables can
 become slower.

Even if we implement a rewriting tool for the vm in pg_upgrade, it will
take as much time as a revacuum because it needs to scan the whole table.
I meant that we rewrite the vm using the existing facility (i.e., vacuum
(freeze)), instead of implementing a new rewriting tool for the vm.

Regards,

--
Masahiko Sawada




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-13 Thread Ildus Kurbangaliev

On 07/13/2015 01:36 PM, Amit Kapila wrote:

I have already proposed something very similar in this thread [1]
(where instead of class, I have used wait_event_type) to which
Robert doesn't agree, so here I think before writing code, it seems
prudent to get an agreement about what kind of User-Interface
would satisfy the requirement and will be extendible for future as
well.  I think it will be better if you can highlight some points about
what kind of user-interface is better (extendible) and the reasons for
same.

[1] (Refer option-3) - 
http://www.postgresql.org/message-id/caa4ek1j6cg_jym00nrwt4n8r78zn4ljoqy_zu1xrzxfq+me...@mail.gmail.com


The idea of splitting into classes and events does not conflict with your
current implementation. It is not a problem to show only one value
in pg_stat_activity and the more detailed two parameters in other places.
The basic reason is that a DBA will want to see grouped information about
a class (for example the wait time of the whole `Storage` class).
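For illustration only, one conceivable way to carry such a two-level identifier in a single value; this is a hypothetical encoding sketched for the discussion, not the patch's actual scheme:

#include <stdint.h>

/* hypothetical wait classes, with one example event each */
typedef enum WaitClass
{
    WAIT_CLASS_LWLOCK  = 1,     /* e.g. event = ProcArrayLock */
    WAIT_CLASS_STORAGE = 2      /* e.g. event = READ */
} WaitClass;

/* pack the class into the high 16 bits and the event code into the low 16 bits */
static inline uint32_t
make_wait_id(WaitClass wclass, uint16_t event)
{
    return ((uint32_t) wclass << 16) | event;
}

/* recover the class part, e.g. to aggregate wait time per class */
static inline WaitClass
wait_id_class(uint32_t wait_id)
{
    return (WaitClass) (wait_id >> 16);
}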


Regarding the user interface, it depends on what we want to be monitored. In
our patch we have profiling and history. In profiling we show class,
event, wait_time and count. In history we save all parameters of a wait.


Another problem with pg_stat_activity is that we cannot see all processes
there (the checkpointer, for example). So we need a separate view for
monitoring purposes anyway.


--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Noah Misch
On Mon, Jul 13, 2015 at 06:19:49PM -0400, Andrew Dunstan wrote:
 On 7/13/2015 5:36 PM, Andrew Dunstan wrote:
 hstore_plpython.o: In function `hstore_to_plpython':
 /home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:
 undefined reference to `PLyUnicode_FromStringAndSize'

 The functions are in fact apparently built - the names are present in the
 object file and the DLL.

Per objdump -x src/pl/plpython/plpython3.dll | less -pOrdinal/Name, those
symbols aren't exported.  The Cygwin toolchain appears to export every
function until you mark one with __declspec (dllexport), after which it
exports just the ones so marked.  Since plpython3.dll has an explicit
__declspec on PyInit_plpy(), that's the sole function exported.  Adding
-Wl,--export-all-symbols to the PL/Python link lets the build complete; I have
not researched whether it is a proper fix.
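A minimal illustration of the toolchain behavior described above (a made-up two-function DLL, not PostgreSQL code; assumes a Cygwin-style gcc):

/* demo.c -- build with:  gcc -shared -o demo.dll demo.c
 * With no __declspec markings at all, both functions are exported.
 * Once one function is marked, only the marked one is exported, which
 * matches the plpython3.dll symptom above (PyInit_plpy marked,
 * everything else dropped).
 */
__declspec(dllexport) int
marked_function(void)
{
    return 1;                   /* exported */
}

int
unmarked_function(void)
{
    return 2;                   /* silently no longer exported */
}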




Re: [HACKERS] Forensic recovery deleted pgdump custom format file

2015-07-13 Thread Michael Paquier
On Tue, Jul 14, 2015 at 11:20 AM, David Guimaraes skys...@gmail.com wrote:
 Yeah bingo

Hm. While there is a magic-code header for the custom format, by
looking at the code I am not seeing any traces of a similar thing at
the end of the dump file (_CloseArchive in pg_backup_custom.c), and I
don't recall either that there is an estimation of the size of the
dump in the header. If those files were stored close to each
other, one idea may be to look for the next header present, or to
attempt to roughly estimate the size that they would have, I am afraid.
In any case, applying reverse engineering methods seems like the most
reliable way to reconstitute an archive handler that could be used
by pg_restore or pg_dump, but perhaps others have other ideas.
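As a rough illustration of the look-for-the-next-header idea, a small scanner could search a raw image for the 5-byte PGDMP magic to locate candidate dump headers. This is only a sketch; the file name and chunk size are arbitrary, and finding the exact end of each dump still needs the reverse engineering mentioned above:

/* pgdmp_scan.c -- report offsets of possible pg_dump custom-format headers
 * in a raw image.  Usage (hypothetical):  ./pgdmp_scan /path/to/disk.img
 */
#include <stdio.h>
#include <string.h>

int
main(int argc, char **argv)
{
    static const char magic[] = "PGDMP";    /* custom-format magic code */
    static unsigned char buf[1 << 20];
    long long   offset = 0;
    size_t      nread;
    FILE       *f;

    if (argc != 2)
    {
        fprintf(stderr, "usage: %s image-file\n", argv[0]);
        return 1;
    }
    if ((f = fopen(argv[1], "rb")) == NULL)
    {
        perror(argv[1]);
        return 1;
    }

    while ((nread = fread(buf, 1, sizeof(buf), f)) > 0)
    {
        for (size_t i = 0; i + sizeof(magic) - 1 <= nread; i++)
        {
            if (memcmp(buf + i, magic, sizeof(magic) - 1) == 0)
                printf("possible dump header at offset %lld\n",
                       offset + (long long) i);
        }
        offset += (long long) nread;
        if (nread == sizeof(buf))
        {
            /* step back a few bytes so a magic string split across two
             * chunks is still seen in the next read */
            fseek(f, -(long) (sizeof(magic) - 1), SEEK_CUR);
            offset -= (long long) (sizeof(magic) - 1);
        }
    }
    fclose(f);
    return 0;
}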
-- 
Michael




Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 05:41 AM, Shulgin, Oleksandr wrote:
On Mon, Jul 13, 2015 at 9:44 AM, Pavel Stehule pavel.steh...@gmail.com wrote:



To reiterate: for my problem, that is escaping numerics that
can potentially overflow[1] under ECMAScript standard, I want
to be able to override the code that outputs the numeric
converted to string.  There is no way in current
implementation to do that *at all*, short of copying all the
code involved in producing JSON output and changing it at
certain points.  One could try re-parsing JSON instead, but
that doesn't actually solve the issue, because type
information is lost forever at that point.

The whitespace unification was a mere side-effect of the
original effort on this patch.


The dynamic type change is something I would not want to do in the
database, really :)

If you are afraid of overflow, then convert the numeric to a string
immediately - in this case, the client has to support both
variants - so an immediate cast should not be a problem.


Yeah, but how would you do that in context of a logical replication 
decoding plugin?  I've tried a number of tricks for that, including, 
but not limited to registering phony types to wrap numeric type and 
replacing the OID of numeric with this custom type OID in TupleDesc, 
but then again one has to register that as known record type, etc.


Anyway this check on max number should be implemented in our
JSON(b) out functions (as warning?).


Not really, since this is a problem of ECMAScript standard, not JSON 
spec.  For example, Python module for handling JSON doesn't suffer 
from this overflow problem,


The thing is, we cannot know which clients are going to consume the 
stream of decoded events, and if it's some implementation of 
JavaScript, it can suffer silent data corruption if we don't guard 
against that in the logical decoding plugin.


Hope that makes it clear. :-)




Yes, but I think the plugin is the right place to do it. What is more, 
this won't actually prevent you completely from producing non-ECMAScript 
compliant JSON, since json or jsonb values containing offending numerics 
won't be caught, AIUI. But a fairly simple to write function that 
reparsed and fixed the JSON inside the decoder would work.


cheers

andrew








[HACKERS] dead assignment src/bin/scripts/print.c line 421

2015-07-13 Thread Laurent Laborde
Friendly greetings !

In file src/bin/scripts/print.c, line 421:
need_recordsep = false;
which is then set to true on line 424.

Now I'm pretty sure it's a meaningless bug without any consequence (the
commit that introduced it is 15 years old).

There are a lot of (apparently) dead assignments here and there, and some
assignments could be used for debugging purposes, so ... why not. But this one?

-- 
Laurent ker2x Laborde
DBA gandi.net \o/


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Sawada Masahiko
On Mon, Jul 13, 2015 at 9:22 PM, Andres Freund and...@anarazel.de wrote:
 On 2015-07-13 21:03:07 +0900, Sawada Masahiko wrote:
 Even If we implement rewriting tool for vm into pg_upgrade, it will
 take time as much as revacuum because it need whole scanning table.

 Why would it? Sure, you can only set allvisible and not the frozen bit,
 but that's fine. That way the cost for freezing can be paid over time.

 If we require terrabytes of data to be scanned, including possibly
 rewriting large portions due to freezing, before index only scans work
 and most vacuums act in a partial manner the migration to 9.6 will be a
 major pain for our users.

Ah, if we set all bits as not all-frozen, we don't need to scan the whole
table, only the vm.
And I agree with this.

But please imagine the case where the old cluster has a table which is very
large, read-only, and for which vacuum freeze has already been done.
In this case, the all-frozen bits of such a table in the new cluster will not
be set unless we do vacuum freeze again.
The all-frozen information for such a table is lost.

Regards,

--
Masahiko Sawada




Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 10:46 AM, Ryan Pedela wrote:
On Mon, Jul 13, 2015 at 1:30 AM, Shulgin, Oleksandr
oleksandr.shul...@zalando.de wrote:



To reiterate: for my problem, that is escaping numerics that can
potentially overflow[1] under ECMAScript standard, I want to be
able to override the code that outputs the numeric converted to
string.  There is no way in current implementation to do that *at
all*, short of copying all the code involved in producing JSON
output and changing it at certain points.  One could try
re-parsing JSON instead, but that doesn't actually solve the
issue, because type information is lost forever at that point.


I had the exact same problem with Node.js and client-side Javascript. 
That is why I wrote json-bignum [1] for Node.js. There is a bower 
version [2] as well. The only caveat is that it is slower than the 
native JSON functions, but I am happy to receive PRs to improve 
performance.


1. https://github.com/datalanche/json-bignum
2. https://libraries.io/bower/json-bignum

As far as large numbers in JSON, I think Postgres is doing the right 
thing and should not be changed. It is Javascript that is stupid here, 
and I don't think it is wise to add something to core just because one 
client does stupid things with large numbers. In addition, ES7 is 
introducing value types which will hopefully solve the large number 
problem in Javascript.


The random whitespace issue is valid in my opinion and should be fixed.






OK, I think we're getting a consensus here. It's good to know the JS 
world is acquiring some sanity in this area.


Let's just fix the whitespace and be done, without all the callback 
stuff. That should be a much smaller patch.


cheers

andrew





Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-13 Thread Fujii Masao
On Fri, Jul 10, 2015 at 10:06 PM, Beena Emerson memissemer...@gmail.com wrote:
 Hello,

 Tue, Jul 7, 2015 at 02:56 AM, Josh Berkus wrote:
 pro-JSON:

 * standard syntax which is recognizable to sysadmins and devops.
 * can use JSON/JSONB functions with ALTER SYSTEM SET to easily make
 additions/deletions from the synch rep config.
 * can add group labels (see below)

 Adding group labels do have a lot of values but as Amit has pointed out,
 with little modification, they can be included in GUC as well.

Or you can extend the custom GUC mechanism so that we can
specify the groups by using them, for example,

quorum_commit.mygroup1 = 'london, nyc'
quorum_commit.mygroup2 = 'tokyo, pune'
synchronous_standby_names = '1(mygroup1), 1(mygroup2)'

 On Tue, Jul 7, 2015 at 2:19 PM, Michael Paquier wrote:

 Something like pg_syncinfo/ coupled with a LW lock, we already do
 something similar for replication slots with pg_replslot/.

 I was trying to figure out how the JSON metadata can be used.
 It would have to be set using a given set of functions.

So we could use only such a set of functions to configure sync rep?
I don't like that idea, because it prevents us from configuring that
while the server is not running.

 Is JSON being preferred because it would be ALTER SYSTEM friendly and in a
 format already known to users?

At least currently, ALTER SYSTEM cannot accept JSON data
(e.g., the return value of a JSON function like json_build_object())
as the setting value. So I'm not sure how friendly ALTER SYSTEM
and the JSON format really are. If you want to argue that, you probably
need to improve ALTER SYSTEM so that JSON can be specified.

 In a real-life scenario, at most how many groups and nesting would be
 expected?

I don't think that many groups and nestings are common.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-13 Thread Ryan Pedela
On Mon, Jul 13, 2015 at 1:30 AM, Shulgin, Oleksandr 
oleksandr.shul...@zalando.de wrote:


 To reiterate: for my problem, that is escaping numerics that can
 potentially overflow[1] under ECMAScript standard, I want to be able to
 override the code that outputs the numeric converted to string.  There is
 no way in current implementation to do that *at all*, short of copying all
 the code involved in producing JSON output and changing it at certain
 points.  One could try re-parsing JSON instead, but that doesn't actually
 solve the issue, because type information is lost forever at that point.


I had the exact same problem with Node.js and client-side Javascript. That
is why I wrote json-bignum [1] for Node.js. There is a bower version [2] as
well. The only caveat is that it is slower than the native JSON functions,
but I am happy to receive PRs to improve performance.

1. https://github.com/datalanche/json-bignum
2. https://libraries.io/bower/json-bignum

As far as large numbers in JSON, I think Postgres is doing the right thing
and should not be changed. It is Javascript that is stupid here, and I
don't think it is wise to add something to core just because one client
does stupid things with large numbers. In addition, ES7 is introducing
value types which will hopefully solve the large number problem in
Javascript.

The random whitespace issue is valid in my opinion and should be fixed.

Thanks,
Ryan Pedela


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-13 Thread Fujii Masao
On Mon, Jul 13, 2015 at 9:19 PM, Ildus Kurbangaliev
i.kurbangal...@postgrespro.ru wrote:
 On 07/13/2015 01:36 PM, Amit Kapila wrote:

 Other problem of pg_stat_activity that we can not see all processes there
 (checkpointer for example). So we anyway need separate view for monitoring
 purposes.

+1

When there are many walsender processes running,
maybe I'd like to see their wait events.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Default Roles (was: Additional role attributes)

2015-07-13 Thread Fujii Masao
On Wed, May 13, 2015 at 12:07 PM, Stephen Frost sfr...@snowman.net wrote:
 All,

 This patch gets smaller and smaller.

 Upon reflection I realized that, with default roles, it's entirely
 unnecessary to change how the permission checks happen today - we can
 simply add checks to them to be looking at role membership also.  That's
 removed the last of my concerns regarding any API breakage for existing
 use-cases and has greatly simplified things overall.

 This does change the XLOG functions to require pg_monitor, as discussed
 on the other thread where it was pointed out by Heikki that the XLOG
 location information could be used to extract sensitive information
 based on what happens during compression.  Adding docs explaining that
 is on my to-do list for tomorrow.

 * Stephen Frost (sfr...@snowman.net) wrote:
 Andres suggested that we drop the REPLICATION role attribute and just
 use membership in pg_replication instead.  That's turned out quite
 fantastically as we can now handle upgrades without breaking anything-
 CREATE ROLE and ALTER ROLE still accept the attribute but simply grant
 pg_replication to the role instead, and postinit.c has been changed to
 check role membership similar to other pg_hba role membership checks
 when a replication connection comes in.  Hat's off to Andres for his
 suggestion.

 It's also unnecessary to change how the REPLICATION role attribute
 functions today.  This patch does add the pg_replication role, but it's
 only allowed to execute the various pg_logical and friends functions and
 not to actually connect as a REPLICATION user.  Connecting as a
 REPLICATION user allows you to stream the entire contents of the
 backend, after all, so it makes sense to have that be independent.

 I added another default role which allows the user to view
 pg_show_file_settings, as that seemed useful to me.  The diffstat for
 that being something like 4 additions without docs and maybe 10 with.
 More documentation would probably be good though and I'll look at adding
 to it.

 Most of the rest of what I've done has simply been reverting back to
 what we had.  The patch is certainly far easier to verify by reading
 through it now, as the changes are right next to each other, and the
 regression output changes are much smaller.

 Thoughts?  Comments?  Suggestions?

The documentation of the functions to which corresponding default roles
are added by this patch needs to be updated. For example, the description
of pg_xlog_replay_pause() says "Pauses recovery immediately (restricted
to superusers).". I think that the description needs to mention
the corresponding default role pg_replay. Otherwise, it's difficult for
users to see which default role is related to the function they want to use.
Or probably we can add a table explaining all the relationships between
default roles and the corresponding operations. That would be useful.
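For illustration, such a table might look roughly like the sketch below; the pairings are only the ones mentioned in this thread, not a complete or final list:

  default role    | example operations (per this thread)
  ----------------+----------------------------------------------------
  pg_monitor      | monitoring queries, XLOG location functions
  pg_replay       | pg_xlog_replay_pause() and similar recovery control
  pg_replication  | the pg_logical and friends functions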

Why do we allow users to change the attributes of default roles?
For example, ALTER ROLE default_role or GRANT ... TO default_role.
Those changes are not dumped by pg_dumpall. So if users change
the attributes for some reason but the changes disappear via pg_dumpall,
the system may go into an unexpected state.

I think that it's better to allow the roles with pg_monitor to
execute pgstattuple functions. They are usually used for monitoring.
Thoughts?

Regards,

-- 
Fujii Masao




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 Regarding the fact that those two contrib modules can be part of a
 -contrib package and could be installed, nuking those two extensions
 from the tree and preventing the creating of custom tablesample
 methods looks like a correct course of things to do for 9.5.

TBH, I think the right thing to do at this point is to revert the entire
patch and send it back for ground-up rework.  I think the high-level
design is wrong in many ways and I have about zero confidence in most
of the code details as well.

I'll send a separate message about high-level issues, but as far as code
details go, I started to do some detailed code review last night and only
got through contrib/tsm_system_rows/tsm_system_rows.c before deciding it
was hopeless.  Let's have a look at my notes:

 * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group

Obsolete copyright date.

 * IDENTIFICATION
 *contrib/tsm_system_rows_rowlimit/tsm_system_rows.c

Wrong filename.  (For the moment, I'll refrain from any value judgements
about the overall adequacy or quality of the comments in this patch, and
just point out obvious errors that should have been caught in review.)

typedef struct
{
SamplerRandomState randstate;
uint32  seed;   /* random seed */
BlockNumber nblocks;/* number of block in relation */
int32   ntuples;/* number of tuples to return */
int32   donetuples; /* tuples already returned */
OffsetNumber lt;/* last tuple returned from current block */
BlockNumber step;   /* step size */
BlockNumber lb; /* last block visited */
BlockNumber doneblocks; /* number of already returned blocks */
} SystemSamplerData;

This same typedef name is defined in three different places in the patch
(tablesample/system.c, tsm_system_rows.c, tsm_system_time.c).  While that
might not amount to a bug, it's sure a recipe for confusion, especially
since the struct definitions are all different.

Datum
tsm_system_rows_init(PG_FUNCTION_ARGS)
{
TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);
uint32  seed = PG_GETARG_UINT32(1);
int32   ntuples = PG_ARGISNULL(2) ? -1 : PG_GETARG_INT32(2);

This is rather curious coding.  Why should this function check only
one of its arguments for nullness?  If it needs to defend against
any of them being null, it really needs to check all.  But in point of
fact, this function is declared STRICT, which means it's a violation of
the function call protocol if the core code ever passes a null to it,
and so this test ought to be dead code.

A similar pattern of ARGISNULL checks in declared-strict functions exists
in all the tablesample modules, not just this one, showing that this is an
overall design error not just a thinko here.  My inclination would be to
make the core code enforce non-nullness of all tablesample arguments so as
to make it unnecessary to check strictness of the tsm*init functions, but
in any case it is Not Okay to just pass nulls willy-nilly to strict C
functions.
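For context, a generic illustration of how null handling normally differs between a non-strict and a strict V1 function; this is an invented example, not the tablesample code under review:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(add_one_nonstrict);     /* CREATE FUNCTION ... (no STRICT) */
PG_FUNCTION_INFO_V1(add_one_strict);        /* CREATE FUNCTION ... STRICT */

Datum
add_one_nonstrict(PG_FUNCTION_ARGS)
{
    if (PG_ARGISNULL(0))        /* must check: the core code may pass a null */
        PG_RETURN_NULL();
    PG_RETURN_INT32(PG_GETARG_INT32(0) + 1);
}

Datum
add_one_strict(PG_FUNCTION_ARGS)
{
    /* declared STRICT: the core code never calls this with a null
     * argument, so a PG_ARGISNULL() test here would be dead code */
    PG_RETURN_INT32(PG_GETARG_INT32(0) + 1);
}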

Also, I find this coding pretty sloppy even without the strictness angle,
because the net effect of this way of dealing with nulls is that an
argument-must-not-be-null complaint is reported as "out of range",
which is both confusing and the wrong ERRCODE.

    if (ntuples < 1)
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("invalid sample size"),
                 errhint("Sample size must be positive integer value.")));

I don't find this to be good error message style.  The secondary comment
is not a hint, it's an ironclad statement of what you did wrong, so if
we wanted to phrase it like this it should be an errdetail not errhint.
But the whole thing is overly cute anyway because there is no reason at
all not to just say what we mean in the primary error message, eg
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("sample size must be greater than zero")));

While we're on the subject, what's the reason for disallowing a sample
size of zero?  Seems like a reasonable edge case.

/* All blocks have been read, we're done */
    if (sampler->doneblocks > sampler->nblocks ||
        sampler->donetuples >= sampler->ntuples)
PG_RETURN_UINT32(InvalidBlockNumber);

Okay, I lied, I *am* going to complain about this comment.  Comments that
do not accurately describe the code they're attached to are worse than
useless.

/*
 * Cleanup method.
 */
Datum
tsm_system_rows_end(PG_FUNCTION_ARGS)
{
TableSampleDesc *tsdesc = (TableSampleDesc *) PG_GETARG_POINTER(0);

    pfree(tsdesc->tsmdata);

PG_RETURN_VOID();
}

This cleanup method is a waste of code space.  There is no need to
pfree individual allocations at query execution end.

limitnode = linitial(args);
limitnode = estimate_expression_value(root, limitnode);


Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Simon Riggs
On 13 July 2015 at 15:48, Sawada Masahiko sawada.m...@gmail.com wrote:

 On Mon, Jul 13, 2015 at 9:22 PM, Andres Freund and...@anarazel.de wrote:
  On 2015-07-13 21:03:07 +0900, Sawada Masahiko wrote:
  Even If we implement rewriting tool for vm into pg_upgrade, it will
  take time as much as revacuum because it need whole scanning table.
 
  Why would it? Sure, you can only set allvisible and not the frozen bit,
  but that's fine. That way the cost for freezing can be paid over time.
 
  If we require terrabytes of data to be scanned, including possibly
  rewriting large portions due to freezing, before index only scans work
  and most vacuums act in a partial manner the migration to 9.6 will be a
  major pain for our users.

 Ah, If we set all bit as not all-frozen,  we don't need to whole table
 scanning, only scan vm.
 And I agree with this.

 But please image the case where old cluster has table which is very
 large, read-only and vacuum freeze is done.
 In this case, the all-frozen bit of such table in new cluster will not
 set, unless we do vacuum freeze again.
 The information of all-frozen of such table is lacked.


The contents of the VM fork are essential to retain after an upgrade because
they are used for index-only scans. If we destroy that information it could
send SQL response times to unacceptable levels after upgrade.

It takes time to scan the VM and create the new VFM, but the time taken is
proportional to the size of VM, which seems like it will be acceptable.

Example calcs:
An 8TB PostgreSQL installation would need us to scan 128MB of VM into about
256MB of VFM. Probably the fsyncs will occupy the most time.
In comparison, we would need to scan all 8TB to rebuild the VMs, which will
take much longer (and fsyncs will still be needed).
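(For reference, the arithmetic behind those numbers: with the default 8kB block size, 8TB is roughly a billion heap pages; at one bit per page that gives the ~128MB VM, and at two bits per page the ~256MB VFM.)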

Since we don't record freeze map information now it is acceptable to begin
after upgrade with all freeze info set to zero.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


Re: [HACKERS] Forensic recovery deleted pgdump custom format file

2015-07-13 Thread David Guimaraes
The backups were deleted. I need them to use pg_restore.
On 13/07/2015 21:18, Michael Paquier michael.paqu...@gmail.com wrote:

 On Tue, Jul 14, 2015 at 9:28 AM, David Guimaraes skys...@gmail.com
 wrote:
  So I decided to try to understand the file format generated by
  pgdump. Analyzing the source code of pgdump/recovery, i concluded a few
  things:
 
  The header of the file always starts with PGDMP followed by pgdump
 version
  number used, followed by int size, offset, etc. followed by TOCs.
 
  My question is how to know the end of the file? Is there any signature
  that I can use? Or would I have to analyze the whole file?

 Why are you trying to reinvent the wheel? pg_restore is not available?
 --
 Michael



Re: [HACKERS] Forensic recovery deleted pgdump custom format file

2015-07-13 Thread Michael Paquier
On Tue, Jul 14, 2015 at 10:58 AM, David Guimaraes skys...@gmail.com wrote:
 The backups were deleted. I need them to use pg_restore.

So what you mean is that you are looking at your disk at FS level to
find traces of those deleted backups by analyzing their binary
format... Am I missing something?
-- 
Michael




Re: [HACKERS] Forensic recovery deleted pgdump custom format file

2015-07-13 Thread David Guimaraes
Yeah bingo
On 13/07/2015 22:11, Michael Paquier michael.paqu...@gmail.com wrote:

 On Tue, Jul 14, 2015 at 10:58 AM, David Guimaraes skys...@gmail.com
 wrote:
  The backups were deleted. I need them to use pg_restore.

 So what you mean is that you are looking at your disk at FS level to
 find traces of those deleted backups by analyzing their binary
 format... Am I missing something?
 --
 Michael



Re: [HACKERS] Default Roles (was: Additional role attributes)

2015-07-13 Thread Stephen Frost
Fujii,

* Fujii Masao (masao.fu...@gmail.com) wrote:
 he documents of the functions which the corresponding default roles
 are added by this patch need to be updated. For example, the description
 of pg_xlog_replay_pause() says Pauses recovery immediately (restricted
 to superusers).. I think that the description needs to mention
 the corresponding default role pg_replay. Otherwise, it's difficult for
 users to see which default role is related to the function they want to use.
 Or probably we can add the table explaining all the relationships between
 default roles and corresponding operations. And it's useful.

Certainly, totally agree that we need to make it clear in the function
descriptions also.

 Why do we allow users to change the attributes of default roles?
 For example, ALTER ROLE default_role or GRANT ... TO default_role.
 Those changes are not dumped by pg_dumpall. So if users change
 the attributes for some reasons but they disappear via pg_dumpall,
 maybe the system goes into unexpected state.

Good point.  I'm fine with simply disallowing that completely; does
anyone want to argue that we should allow superusers to ALTER or GRANT
to these roles?  I have a hard time seeing the need for that and it
could make things quite ugly.

 I think that it's better to allow the roles with pg_monitor to
 execute pgstattuple functions. They are usually used for monitoring.
 Thought?

Possibly, but I'd need to look at them more closely than I have time to
right now.  Can you provide a use-case?  That would certainly help.
Also, we are mostly focused on things which are currently superuser-only
capabilities, if you don't need to be superuser today then the
monitoring system could be granted access using the normal mechanisms.
Actually, logging systems won't log in directly as pg_monitor anyway;
they'll log in as "nagios" or similar, which has been GRANT'd
pg_monitor and could certainly be GRANT'd other rights also.

Thanks!

Stephen




Re: [HACKERS] A little RLS oversight?

2015-07-13 Thread Stephen Frost
Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
 On Sun, Jul 12, 2015 at 5:59 PM, Yaroslav wrote:
  I can still see all statistics for 'test' in pg_stats under unprivileged
  user.
 
 Indeed, this looks like an oversight of RLS. Even if a policy is
 defined to prevent a user from seeing the rows of other users, it is
 still possible to get some information through this view.
 I am adding an open item regarding that for 9.5.

We need to be careful to avoid the slippery slope of trying to prevent
all covert channels, which has been extensively discussed previously.  I
tend to agree with this specific case of, if you have RLS configured on
the table then we probably shouldn't allow normal users to see the stats
on the table, but I don't have a problem with the usage of those stats
for generating plans, which users could see the results of via EXPLAIN.

  I'd prefer statistics on RLS-enabled tables to be simply hidden completely
  for unprivileged users.
 
 This looks like something simple enough to do.
 @Stephen: perhaps you have some thoughts on the matter? Currently
 pg_stats breaks its promise to only show information about the rows
 current user can read.

I agree that it should be reasonably simple to do and, provided that's
the case, I'm fine with doing it once I get back (currently out until
the 27th).

Thanks!

Stephen




Re: [HACKERS] pg_upgrade + Extensions

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 01:12 PM, Smitha Pamujula wrote:
Yes. I have checked that the extension didn't exist in any of the
databases. I used \dx to see if json_build was listed and I didn't
see it. Is that sufficient to check its existence? I am about
to do another test in a few minutes on a different machine. I will
capture before/after shots.





Please don't top-post on the PostgreSQL lists - see 
http://idallen.com/topposting.html


In theory it should be enough if it was installed in the standard way. 
But a more thorough procedure would be to run this command:


   select count(*)
   from pg_proc
   where prosrc ~ 'json_build';

Here's a one-liner that will check every database for you:

   for db in `psql -t -c 'select datname from pg_database where
   datallowconn'` ; do C=`psql -t -c "select count(*) from pg_proc
   where prosrc ~ 'json_build'" $db`; echo $db $C; done

cheers

andrew




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Simon Riggs
On 11 July 2015 at 21:28, Tom Lane t...@sss.pgh.pa.us wrote:


 What are we going to do about this?


I will address the points you raise, one by one.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] pg_upgrade + Extensions

2015-07-13 Thread Smitha Pamujula
Yes. I have checked that the extension didn't exist in any of the
databases. I used \dx to see if there was json_build was listed and i didnt
see any. Is that sufficient to check its existence. I am about to do
another testing in a few minutes on a different machine. I will capture
before/after shots

Thanks

On Fri, Jul 10, 2015 at 4:26 PM, Andrew Dunstan and...@dunslane.net wrote:



 On Fri, Jul 10, 2015 at 5:05 PM, David E. Wheeler da...@justatheory.com
 wrote:

 On Jul 10, 2015, at 11:32 AM, Smitha Pamujula 
 smitha.pamuj...@iovation.com wrote:


  Your installation references loadable libraries that are missing from
 the
  new installation.  You can add these libraries to the new installation,
  or remove the functions using them from the old installation.  A list of
  problem libraries is in the file:
  loadable_libraries.txt
 
  Failure, exiting
  [postgres@pdxdvrptsrd04 ~]$ cat loadable_libraries.txt
  Could not load library json_build
  ERROR:  could not access file json_build: No such file or directory

 So you drop the json_build extension before upgrading, but pg_upgrade
 still complains that it’s missing? That seems odd.




 Are you sure the extension was uninstalled from every database in the
 cluster? This seems likely to occur when you forgot to uninstall it from
 some database (e.g. template1)

 cheers

 andrew




-- 
Smitha Pamujula
Database Administrator // The Watch Woman

Direct: 503.943.6764
Mobile: 503.290.6214 // Twitter: iovation
www.iovation.com


Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread dinesh kumar
On Mon, Jul 13, 2015 at 1:11 AM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Mon, Jul 13, 2015 at 4:54 PM, dinesh kumar dineshkuma...@gmail.com
 wrote:
  Would like to discuss on below feature here.
 
  Feature:
  Having an SQL function, to write messages to log destination.
 
  Justification:
  As of now, we don't have an SQL function to write custom/application
  messages to the log destination. We have the RAISE clause, which is
  controlled by log_ parameters. If we had an SQL function which works
  irrespective of the log settings, that would be good for many log parsers.
  What I mean is, from a DBA point of view, if we route all our native OS
  stats to log files in a proper format, then we can have our log reporting
  tools give the most effective reports. Also, applications can log their
  own messages to the postgres log files, which can be monitored by DBAs too.

 What's the actual use case for this feature other than it would be
 good to have it?


That's a good question Michael.

When I was working as a DBA for a different RDBMS, developers used to write
serious APP errors, followed by instructions, into some sort of log and
trace files, since DBAs didn't have permission to check the app logs, which
were owned by the Ops team.

In my old case, the application was having serious OOM issues and was
crashing frequently after a deployment. It wasn't consistent behavior
on the app side, so they used to send a copy of all APP metrics to trace
files, and we were monitoring the DB to see what was happening during the
spikes on the app servers.

I didn't mean that we need to have this feature just because other RDBMSs
have it; rather, I don't see a reason why we shouldn't have it in PG too.

I see the similar item in our development list
http://www.postgresql.org/message-id/53a8e96e.9060...@2ndquadrant.com.

Let me know if i miss anything here.

Best Regards,
Dinesh
manojadinesh.blogspot.com

A log message is here to give information about the
 state of something that happens in backend, but in the case of this
 function the event that happens is the content of the function itself.
 It also adds a new log level for something that has a unique usage,
 which looks like an overkill to me. Btw, you could do something more
 advanced with simply an extension if you really want to play with this
 area... But I am dubious about what kind of applications would use
 that.
 --
 Michael



Re: [HACKERS] dead assignment src/bin/scripts/print.c line 421

2015-07-13 Thread Heikki Linnakangas

On 07/13/2015 04:56 PM, Laurent Laborde wrote:

Friendly greetings !

in file src/bin/scripts/print.c line 421 :
need_recordsep = false;
then set to true line 424.

Now I'm pretty sure it's a meaningless bug without any consequence (the
commit that introduced it is 15 years old).

There are a lot of (apparently) dead assignments here and there, but some
assignments could be used for debugging purposes, so ... why not. But this one?


The code in question looks like this:


for (f = footers; f; f = f->next)
{
    if (need_recordsep)
    {
        print_separator(cont->opt->recordSep, fout);
        need_recordsep = false;
    }
    fputs(f->data, fout);
    need_recordsep = true;
}


Hmm. It does kind of make sense. Right after printing the separator, you 
don't need to print a separator because you just printed one.  But as 
soon as you print the field, you need a separator again. It would be 
quite understandable without that dead assignment too, and that's 
probably how I would've written it in the first place. But since that's 
how it is and has been for 15 years, I'm inclined to just leave it so.


- Heikki




Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Marco Atzeri

On 7/13/2015 5:36 PM, Andrew Dunstan wrote:


On 07/12/2015 05:06 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 07/04/2015 11:02 AM, Tom Lane wrote:

It's not apparent to me how that works at all.

BTW, the .a files being linked to above are not like Unix .a static
archives - they are import library files, which I think they are only
used at link time, not run time. The path to the DLLs isn't being
hardcoded.

Ah, I see.  So then what Marco is suggesting is copying a coding pattern
that does work.


Unless there is further argument I propose to commit something very like
Marco's suggestion for hstore_plperl, hstore_plpython and ltree_plpython

No objection here.




OK, I tried the attached patch.

but when trying to build with python 3 I get this (no such problems with
python2, which builds and tests fine):

make -C hstore_plpython all
make[1]: Entering directory
'/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython'
ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -O2 -I../../src/pl/plpython
-I/usr/include/python3.4m -I../../contrib/hstore -I. -I.
-I../../src/include -I/usr/include/libxml2   -c -o hstore_plpython.o
hstore_plpython.c
ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -O2   -shared -o
hstore_plpython3.dll  hstore_plpython.o  -L../../src/port
-L../../src/common -Wl,--allow-multiple-definition
-Wl,--enable-auto-import -L/usr/lib  -L/usr/local/lib
-Wl,--as-needed   -L../../src/backend -lpostgres -L../hstore
-lhstore -L../../src/pl/plpython -lplpython3
-L/usr/lib/python3.4/config-3.4m -lpython3.4m -lpgcommon -lpgport
-lxslt -lxml2 -lssl -lcrypto -lz -lreadline -lcrypt
hstore_plpython.o: In function `hstore_to_plpython':

/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:

undefined reference to `PLyUnicode_FromStringAndSize'


[cut]

I'd like to get that fixed before applying anything. Marco, any ideas?

To build with python 3, set the environment like this:
PYTHON=/usr/bin/python3 - this can be done in the config file.


I am only building with python2, but on cygwin
there is an additional intl library for python3 binding

$ python2-config --libs
-lpython2.7 -ldl

$ python3-config --libs
-lpython3.4m -lintl -ldl


cheers

andrew


Regards
Marco





Re: [HACKERS] Freeze avoidance of very large table.

2015-07-13 Thread Andres Freund
On 2015-07-13 23:48:02 +0900, Sawada Masahiko wrote:
 But please imagine the case where the old cluster has a table which is very
 large, read-only, and on which a vacuum freeze has been done.
 In this case, the all-frozen bits of such a table will not be set in the new
 cluster unless we do a vacuum freeze again.
 The all-frozen information for such a table is lost.

So what? That's the situation today… Yes, it'll trigger an
anti-wraparound vacuum at some later point, and after that the map bits
will be set.




Re: [HACKERS] dead assignment src/bin/scripts/print.c line 421

2015-07-13 Thread Laurent Laborde
Should have been sent to the bugs ML sorry :-/

On Mon, Jul 13, 2015 at 3:56 PM, Laurent Laborde kerdez...@gmail.com
wrote:

 Friendly greetings !

 in file src/bin/scripts/print.c line 421 :
 need_recordsep = false;
 then set to true line 424.

 Now I'm pretty sure it's a meaningless bug without any consequence (the
 commit that introduced it is 15 years old).

 There are a lot of (apparently) dead assignments here and there, but some
 assignments could be used for debugging purposes, so ... why not. But this one?

 --
 Laurent ker2x Laborde
 DBA gandi.net \o/




-- 
Laurent ker2x Laborde


Re: [HACKERS] [BUGS] BUG #13126: table constraint loses its comment

2015-07-13 Thread Heikki Linnakangas

On 07/08/2015 08:12 AM, Michael Paquier wrote:

On Tue, Jul 7, 2015 at 6:24 PM, Petr Jelinek p...@2ndquadrant.com wrote:

On 2015-07-04 13:45, Michael Paquier wrote:


On Fri, Jul 3, 2015 at 11:59 PM, Petr Jelinek wrote:


Well, for indexes you don't really need to add the new AT command, as
IndexStmt has char *idxcomment, which it will automatically use as the
comment if not NULL. While I am not a huge fan of idxcomment, it doesn't
seem to be easy to remove it in the future, and it's what
transformTableLikeClause uses, so it might be good to be consistent with
that.



Oh, right, I completely missed your point and this field in IndexStmt.
Yes, it is better to be consistent in this case and to use it. It makes
the code easier to follow as well.
Btw, regarding the new AT routines, I am getting fond of them; they make
it easier to follow which command is added where in the command
queues.

Updated patch attached.



Cool, I am happy with the patch now. Marking as ready for committer.


Thanks for the review.


I don't much like splitting the code across multiple helper functions;
it makes the overall logic more difficult to follow, although I agree
that the indentation has made the code pretty hard to read as it is. I'm
planning to commit the attached two patches. The first one is just
reformatting changes to ATExecAlterColumnType(), turning the switch-case
statements into if-else blocks. If-else doesn't cause so much
indentation, and allows defining variables local to the cases more
naturally, so it alleviates the indentation problem somewhat. The second
patch is the actual bug fix.


There was one bug in this patch: the COMMENT statement that was
constructed didn't schema-qualify the relation, so if the ALTERed table
was not in search_path, the operation would fail with a "relation not
found" error (or add the comment to the wrong object). Fixed that.


I plan to commit the attached patches later today or tomorrow. But how 
do you feel about back-patching this? It should be possible to 
backpatch, although at a quick test it seems that there have been small 
changes to the affected code in many versions, so it would require some 
work. Also, in back-branches we'd need to put the new AT_ReAddComment 
type to the end of the list, like we've done when we added 
AT_ReAddConstraint in 9.3. I'm inclined to backpatch this to 9.5 only, 
even though this is clearly a bug fix, on the grounds that it's not a 
very serious bug and there's always some risk in backpatching.


- Heikki

From c4865eb873a9cafb7e247cc69b7030243b74f3be Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas heikki.linnakan...@iki.fi
Date: Mon, 13 Jul 2015 19:22:31 +0300
Subject: [PATCH 1/2] Reformat code in ATPostAlterTypeParse.

The code in ATPostAlterTypeParse was very deeply indented, mostly because
there were two nested switch-case statements, which add a lot of
indentation. Use if-else blocks instead, to make the code less indented
and more readable.

This is in preparation for the next patch that makes some actual changes to
the function. These cosmetic parts have been separated to make it easier
to see the real changes in the other patch.
---
 src/backend/commands/tablecmds.c | 104 +++
 1 file changed, 51 insertions(+), 53 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d394713..e7b23f1 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8645,69 +8645,67 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
 		Node	   *stm = (Node *) lfirst(list_item);
 		AlteredTableInfo *tab;
 
-		switch (nodeTag(stm))
+		tab = ATGetQueueEntry(wqueue, rel);
+
+		if (IsA(stm, IndexStmt))
+		{
+			IndexStmt  *stmt = (IndexStmt *) stm;
+			AlterTableCmd *newcmd;
+
+			if (!rewrite)
+				TryReuseIndex(oldId, stmt);
+
+			newcmd = makeNode(AlterTableCmd);
+			newcmd->subtype = AT_ReAddIndex;
+			newcmd->def = (Node *) stmt;
+			tab->subcmds[AT_PASS_OLD_INDEX] =
+				lappend(tab->subcmds[AT_PASS_OLD_INDEX], newcmd);
+		}
+		else if (IsA(stm, AlterTableStmt))
 		{
-			case T_IndexStmt:
+			AlterTableStmt *stmt = (AlterTableStmt *) stm;
+			ListCell   *lcmd;
+
+			foreach(lcmd, stmt->cmds)
+			{
+				AlterTableCmd *cmd = (AlterTableCmd *) lfirst(lcmd);
+
+				if (cmd->subtype == AT_AddIndex)
 				{
-					IndexStmt  *stmt = (IndexStmt *) stm;
-					AlterTableCmd *newcmd;
+					Assert(IsA(cmd->def, IndexStmt));
 
 					if (!rewrite)
-						TryReuseIndex(oldId, stmt);
+						TryReuseIndex(get_constraint_index(oldId),
+									  (IndexStmt *) cmd->def);
 
-					tab = ATGetQueueEntry(wqueue, rel);
-					newcmd = makeNode(AlterTableCmd);
-					newcmd->subtype = AT_ReAddIndex;
-					newcmd->def = (Node *) stmt;
+					cmd->subtype = AT_ReAddIndex;
 					tab->subcmds[AT_PASS_OLD_INDEX] =
-						lappend(tab->subcmds[AT_PASS_OLD_INDEX], newcmd);
-					break;
+						lappend(tab->subcmds[AT_PASS_OLD_INDEX], cmd);
 				}
-			case T_AlterTableStmt:
+				else

[HACKERS] [DESIGN] Incremental checksums

2015-07-13 Thread David Christensen
pgsql-hackers,

So I’ve put some time into a design for the incremental checksum feature and 
wanted to get some feedback from the group:

* Incremental Checksums

PostgreSQL users should have a way of upgrading their cluster to use data 
checksums without having to do a costly pg_dump/pg_restore; in particular, 
checksums should be able to be enabled/disabled at will, with the database 
enforcing the logic of whether the pages considered for a given database are 
valid.

One considered approach is having additional flags to pg_upgrade to set 
up the new cluster to use checksums where they did not before (or optionally 
turning these off).  That approach is a nice tool to have, but we also need to 
be able to support this process in a manner which keeps the database online 
while it is going through the initial checksum process.

In order to support the idea of incremental checksums, this design adds the 
following things:

** pg_control:

Keep data_checksum_version, but have it indicate *only* the algorithm version 
for checksums. i.e., it's no longer used for the data_checksum enabled/disabled 
state.

Add data_checksum_state, an enum with multiple states: "disabled", 
"enabling", "enforcing" (perhaps "revalidating" too; something to indicate that 
we are reprocessing a database that purports to have been completely 
checksummed already).

An explanation of the states as well as the behavior of the checksums for each.

- "disabled" = not in a checksum cycle; no read validation, no checksums 
written.  This is the current behavior for Postgres *without* checksums.

- "enabling" = in a checksum cycle; no read validation, write checksums.  Any 
page that gets written to disk will have a valid checksum.  This is required 
when transitioning a cluster which has never had checksums, as checksum 
validation on read would normally fail since the checksums are uninitialized.

- "enforcing" = not in a checksum cycle; read validation, write checksums.  This 
is the current behavior of Postgres *with* checksums.

 (caveat: I'm not certain the following state is needed (and the current 
version of this patch doesn't have it)):

- "revalidating" = in a checksum cycle; read validation, write checksums.  The 
difference between this and "enabling" is that we care if page reads fail, 
since by definition they should have been validly checksummed, and we should 
verify this.
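
A minimal sketch of how the states above might be represented; the names and layout are illustrative only, not a settled design:

    /* Illustrative only; the eventual pg_control representation may differ. */
    typedef enum DataChecksumState
    {
        DATA_CHECKSUMS_DISABLED,    /* no read validation, no checksums written */
        DATA_CHECKSUMS_ENABLING,    /* in a cycle: no read validation, write checksums */
        DATA_CHECKSUMS_ENFORCING,   /* read validation, write checksums */
        DATA_CHECKSUMS_REVALIDATING /* in a cycle: read validation, write checksums */
    } DataChecksumState;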

Add data_checksum_cycle, a counter that gets incremented with every checksum 
cycle change.  This is used as a flag to verify when new checksum actions take 
place, for instance if we wanted to upgrade/change the checksum algorithm, or 
if we just want to support periodic checksum validation.

This variable will be compared against new values in the system tables to keep 
track of which relations still need to be checksummed in the cluster.

** pg_database:

Add a field datlastchecksum which will be the last checksum cycle which has 
completed for all relations in that database.

** pg_class:

Add a field rellastchecksum which stores the last successful checksum cycle 
for each relation.

** The checksum bgworker:

Something needs to proactively checksum any relations which need to be 
validated, and this something is known as the checksum bgworker.  The checksum 
bgworker will operate similarly to the autovacuum daemons, and in fact in this 
initial pass, we'll hook into the autovacuum launcher due to similarities in 
catalog reading functionality, as well as to balance against other maintenance 
activity.

If autovacuum does not need to do any vacuuming work, it will check whether the 
cluster has requested a checksum cycle by checking if the state is "enabling" 
(or "revalidating").  If so, it will look for any database which needs a 
checksum update.  It checks the current value of the data_checksum_cycle 
counter and looks for any databases with datlastchecksum < data_checksum_cycle.

When all databases have datlastchecksum == data_checksum_cycle, we initiate 
checksumming of any global cluster heap files.  When the global cluster tables' 
heap files have been checksummed, then we consider the checksum cycle complete, 
change pg_control's data_checksum_state to "enforcing", and consider things 
fully up-to-date.

If it finds a database needing work, it iterates through that database's 
relations looking for rellastchecksum < data_checksum_cycle.  If it finds 
none (i.e., every record has rellastchecksum == data_checksum_cycle), then it 
marks the containing database as up-to-date by updating datlastchecksum = 
data_checksum_cycle.

For any relation that it finds in the database which is not checksummed, it 
starts an actual worker to handle the checksum process for this table.  Since 
the state of the cluster is already either "enforcing" or "revalidating", any 
block writes will get checksums added automatically, so the only thing the 
bgworker needs to do is load each block in the relation and explicitly mark it 
as dirty (unless that's not required for FlushBuffer() to do its thing).  After 
every block 

Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread Jim Nasby

On 7/13/15 3:39 PM, dinesh kumar wrote:

I don't really see the point of what you're describing here. Just do
something like RAISE WARNING which should normally be high enough to
make it into the logs. Or use a pl language that will let you write
your own logfile on the server (ie: plperlu).

True. Using plperlu, shall we bypass our log_* settings. If it's true, i
wasn't sure about it.


plperlu can do anything the server can do. Including fun things like 
appending to any file the server can write to or executing things like 
`rm -rf pg_xlog`.



I didn't mean that, we need to have this feature, since we have
it on
other RDBMS. I don't see a reason, why don't we have this in our
PG too.

I see the similar item in our development list
http://www.postgresql.org/message-id/53a8e96e.9060...@2ndquadrant.com.


That's not at all what that item is talking about. It's talking
about exposing ereport as a SQL function, without altering the rest
of our logging behavior.


Ah. It's' my bad interpretation. Let me work on it, and will send a new
patch as a wrapper sql function for ereport.


You might want to present a plan for that; it's not as trivial as it 
sounds due to how ereport works. In particular, I'd want to see (at 
minimum) the same functionality that plpgsql's RAISE command now 
provides (errdetail, errhint, etc).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread dinesh kumar
On Mon, Jul 13, 2015 at 1:56 PM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 7/13/15 3:39 PM, dinesh kumar wrote:

 I don't really see the point of what you're describing here. Just do
 something like RAISE WARNING which should normally be high enough to
 make it into the logs. Or use a pl language that will let you write
 your own logfile on the server (ie: plperlu).

 True. Using plperlu, shall we bypass our log_* settings. If it's true, i
 wasn't sure about it.


 plperlu can do anything the server can do. Including fun things like
 appending to any file the server can write to or executing things like `rm
 -rf pg_xlog`.


Thanks Much Jim.




  I didn't mean that, we need to have this feature, since we have
 it on
 other RDBMS. I don't see a reason, why don't we have this in our
 PG too.

 I see the similar item in our development list
 
 http://www.postgresql.org/message-id/53a8e96e.9060...@2ndquadrant.com.


 That's not at all what that item is talking about. It's talking
 about exposing ereport as a SQL function, without altering the rest
 of our logging behavior.


 Ah. It's' my bad interpretation. Let me work on it, and will send a new
 patch as a wrapper sql function for ereport.


 You might want to present a plan for that; it's not as trivial as it
 sounds due to how ereport works. In particular, I'd want to see (at
 minimum) the same functionality that plpgsql's RAISE command now provides
 (errdetail, errhint, etc).


Sure. Let me prepare a prototype for it, and will share with you before
implementing.


Best Regards,
Dinesh
manojadinesh.blogspot.com


 --
 Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
 Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] [DESIGN] Incremental checksums

2015-07-13 Thread Andres Freund
On 2015-07-13 15:50:44 -0500, Jim Nasby wrote:
 Another possibility is some kind of a page-level indicator of what binary
 format is in use on a given page. For checksums maybe a single bit would
 suffice (indicating that you should verify the page checksum). Another use
 case is using this to finally ditch all the old VACUUM FULL code in
 HeapTupleSatisfies*().

That's a bad idea, because that bit then wouldn't be protected by the
checksum.




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-13 Thread Robert Haas
On Tue, Jul 7, 2015 at 6:28 AM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Please forgive me for resending this message to fix some too-sad
 misspellings.

 # Waiting for heavy weight locks is somewhat confusing to spell..

 ===
 Hello,

 At Tue, 7 Jul 2015 16:27:38 +0900, Fujii Masao masao.fu...@gmail.com wrote 
 in CAHGQGwEJwov8YwvmbbWps3Rba6kF1yf7qL3S==Oy4D=gq9y...@mail.gmail.com
 Each backend reports its event when trying to take a lock. But
 the reported event is never reset until next event is reported.
 Is this OK? This means that the wait_event column keeps showing
 the *last* event while a backend is in idle state, for example.
 So, shouldn't we reset the reported event or report another one
 when releasing the lock?

 It seems so, but pg_stat_activity.waiting would indicate whether
 the event is lasting. However, .waiting reflects only the status
 of heavy-weight locks. It would be quite misleading.

 I think that pg_stat_activity.wait_event should be linked to
 .waiting; then .wait_event should be restricted to heavy-weight
 locks if the meaning of .waiting cannot be changed.

Yeah, that's clearly no good.  It only makes sense to have wait_event
show the most recent event if waiting tells you whether the wait is
still ongoing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread Tom Lane
Jim Nasby jim.na...@bluetreble.com writes:
 On 7/13/15 3:39 PM, dinesh kumar wrote:
 Ah. It's' my bad interpretation. Let me work on it, and will send a new
 patch as a wrapper sql function for ereport.

 You might want to present a plan for that; it's not as trivial as it 
 sounds due to how ereport works. In particular, I'd want to see (at 
 minimum) the same functionality that plpgsql's RAISE command now 
 provides (errdetail, errhint, etc).

The real question is why the existing functionality in plpgsql isn't
sufficient.  Somebody who wants a log from SQL function can easily
write a simple plpgsql function that does exactly what they want,
with no more or fewer bells-n-whistles than they need.  If we try
to create a SQL function that does all that, it's likely to be a mess
to use, even with named arguments.

I'm not necessarily against the basic idea, but I think inventing
something that actually offers an increment in usability compared
to the existing alternative is going to be harder than it sounds.

regards, tom lane




Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread Jim Nasby

On 7/13/15 12:39 PM, dinesh kumar wrote:

 As of now, we don't have an SQL function to write custom/application
 messages to log destination. We have RAISE clause which is controlled by
 log_ parameters. If we have an SQL function which works irrespective of 
log
 settings, that would be a good for many log parsers. What i mean is, in 
DBA
 point of view, if we route all our native OS stats to log files in a 
proper
 format, then we can have our log reporting tools to give most effective
 reports. Also, Applications can log their own messages to postgres log
 files, which can be monitored by DBAs too.

What's the actual use case for this feature other than it would be
good to have it?


That's a good question Michael.

When i was working as a DBA for a different RDBMS, developers used to
write some serious APP errors, Followed by instructions into some sort
of log and trace files.
Since, DBAs didn't have the permission to check app logs, which was
owned by Ops team.

In my old case, application was having serious OOM issues, which was
crashing frequently after the deployment. It wasn't the consistent
behavior from the app side, hence they used to sent  a copy all APP
metrics to trace files, and we were monitoring the DB what was happening
during the spike on app servers.


Spewing a bunch of stuff into the postgres log doesn't seem like an 
improvement here.


I don't really see the point of what you're describing here. Just do 
something like RAISE WARNING which should normally be high enough to 
make it into the logs. Or use a pl language that will let you write your 
own logfile on the server (ie: plperlu).



I didn't mean that, we need to have this feature, since we have it on
other RDBMS. I don't see a reason, why don't we have this in our PG too.

I see the similar item in our development list
http://www.postgresql.org/message-id/53a8e96e.9060...@2ndquadrant.com.


That's not at all what that item is talking about. It's talking about 
exposing ereport as a SQL function, without altering the rest of our 
logging behavior.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-13 Thread Robert Haas
On Mon, Jul 6, 2015 at 10:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
 According to his patch, the wait events that he was thinking to add were:

 + typedef enum PgCondition
 + {
 + PGCOND_UNUSED= 0,/* unused */
 +
 + /* 1 - CPU */
 + PGCOND_CPU= 1,/* generic cpu operations */
 + /* 11000 - CPU:PARSE */
 + PGCOND_CPU_PARSE= 11000,/* pg_parse_query */
 + PGCOND_CPU_PARSE_ANALYZE= 11100,/* parse_analyze */
 + /* 12000 - CPU:REWRITE */
 + PGCOND_CPU_REWRITE= 12000,/* pg_rewrite_query */
 + /* 13000 - CPU:PLAN */
 + PGCOND_CPU_PLAN= 13000,/* pg_plan_query */
 + /* 14000 - CPU:EXECUTE */
 + PGCOND_CPU_EXECUTE= 14000,/* PortalRun or
 PortalRunMulti */
[ etc. ]

Sorry to say it, but I think this design is a mess.

Suppose we are executing a query, and during the query we execute the
ILIKE operator, and within that we try to acquire a buffer content
lock (say, to detoast some data).  So at the outermost level our state
is PGCOND_CPU_EXECUTE, and then within that we are in state
PGCOND_CPU_ILIKE, and then within that we are in state
PGCOND_LWLOCK_PAGE.  When we exit each of the inner states, we've got
to restore the proper outer state, or time will be mis-attributed.
Error handling has got to pop all of the items off the stack that were
added since the PG_TRY() block started, and then push on a new state
for error handling, which gets popped when the PG_TRY block finishes.
Another problem is that some of these things are incredibly specific
(like running the ILIKE operator) and others are extremely general
(like executing the query).  Why does ILIKE get a code but
+(int4,int4) does not?  We need some less-arbitrary way of assigning
codes than what's shown here.
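
For illustration, a rough sketch of the per-backend bookkeeping such a scheme would force on every call site and error path; the names and structures here are assumptions, not anything from the proposed patch:

    /* Illustrative only, not the patch's data structures: nested states must
     * be pushed and popped so the enclosing state is restored, and error
     * recovery has to unwind everything pushed since the failed section began. */
    #include <assert.h>

    #define MAX_COND_DEPTH 32

    static int  cond_stack[MAX_COND_DEPTH];
    static int  cond_depth = 0;

    static void
    push_condition(int cond)
    {
        assert(cond_depth < MAX_COND_DEPTH);
        cond_stack[cond_depth++] = cond;    /* reported state = top of stack */
    }

    static void
    pop_condition(void)
    {
        assert(cond_depth > 0);
        cond_depth--;                       /* restores the enclosing state */
    }

    /* On error, pop back to the depth saved when the protected section was
     * entered, then push the error-handling state. */
    static void
    unwind_conditions_to(int saved_depth, int error_cond)
    {
        cond_depth = saved_depth;
        push_condition(error_cond);
    }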

Now, that's not to say there are no good ideas here. For example,
pg_stat_activity could expose a byte of state indicating which phase
of query processing is current: parse / parse analysis / rewrite /
plan / execute / none.  I think that'd be a fine thing to do, and I
support doing it, although maybe not in the same patch as my original
proposal.  On the flip side, I don't support trying to expose
information on the level of which C function are we currently
executing? because I think there's going to be absolutely no
reasonable way to make that sufficiently low-overhead, and also
because I don't see any way to make it less than nightmarishly onerous
from the point of view of code maintenance.  We could expose some
functions but not others, but that seems like a mess; I think unless
and until we have a better solution, the right answer to I need to
know which C function is running in each backend is that's what perf
is for.

In any case, I think the main point is that Itagaki-san's proposal is
not really a proposal for wait events.  It is a proposal to expose
some state about "what is the backend doing right now?", which might be
waiting or something else.  I believe those things should be
separated into several separate pieces of state.  It's entirely
reasonable to want to know whether we are in the parse phase or plan
phase separately from knowing whether we are waiting on an lwlock or
not, because then you could (for example) judge what percentage of
your lock waits are coming from parsing vs. what fraction are coming
from planning, which somebody might care about.  Or you might care
about ONLY the phase of query processing and not about wait events at
all, and then you can ignore one column and look at the other.  With
the above proposal, those things all get munged together in a way that
I think is bound to be awkward.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread dinesh kumar
On Mon, Jul 13, 2015 at 1:29 PM, Jim Nasby jim.na...@bluetreble.com wrote:

 On 7/13/15 12:39 PM, dinesh kumar wrote:

  As of now, we don't have an SQL function to write
 custom/application
  messages to log destination. We have RAISE clause which is
 controlled by
  log_ parameters. If we have an SQL function which works
 irrespective of log
  settings, that would be a good for many log parsers. What i mean
 is, in DBA
  point of view, if we route all our native OS stats to log files in
 a proper
  format, then we can have our log reporting tools to give most
 effective
  reports. Also, Applications can log their own messages to postgres
 log
  files, which can be monitored by DBAs too.

 What's the actual use case for this feature other than it would be
 good to have it?


 That's a good question Michael.

 When i was working as a DBA for a different RDBMS, developers used to
 write some serious APP errors, Followed by instructions into some sort
 of log and trace files.
 Since, DBAs didn't have the permission to check app logs, which was
 owned by Ops team.

 In my old case, application was having serious OOM issues, which was
 crashing frequently after the deployment. It wasn't the consistent
 behavior from the app side, hence they used to sent  a copy all APP
 metrics to trace files, and we were monitoring the DB what was happening
 during the spike on app servers.


 Spewing a bunch of stuff into the postgres log doesn't seem like an
 improvement here.


Agreed Jim.


 I don't really see the point of what you're describing here. Just do
 something like RAISE WARNING which should normally be high enough to make
 it into the logs. Or use a pl language that will let you write your own
 logfile on the server (ie: plperlu).

 True. Using plperlu, shall we bypass our log_* settings. If it's true, i
wasn't sure about it.


  I didn't mean that, we need to have this feature, since we have it on
 other RDBMS. I don't see a reason, why don't we have this in our PG too.

 I see the similar item in our development list
 http://www.postgresql.org/message-id/53a8e96e.9060...@2ndquadrant.com.


 That's not at all what that item is talking about. It's talking about
 exposing ereport as a SQL function, without altering the rest of our
 logging behavior.


Ah. It's' my bad interpretation. Let me work on it, and will send a new
patch as a wrapper sql function for ereport.


Thanks again.

Regards,
Dinesh
manojadinesh.blogspot.com

-- 
 Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
 Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] [DESIGN] Incremental checksums

2015-07-13 Thread David Christensen

 On Jul 13, 2015, at 3:50 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 
 On 7/13/15 3:26 PM, David Christensen wrote:
 * Incremental Checksums
 
 PostgreSQL users should have a way up upgrading their cluster to use data 
 checksums without having to do a costly pg_dump/pg_restore; in particular, 
 checksums should be able to be enabled/disabled at will, with the database 
 enforcing the logic of whether the pages considered for a given database are 
 valid.
 
 Considered approaches for this are having additional flags to pg_upgrade to 
 set up the new cluster to use checksums where they did not before (or 
 optionally turning these off).  This approach is a nice tool to have, but in 
 order to be able to support this process in a manner which has the database 
 online while the database is going throught the initial checksum process.
 
 It would be really nice if this could be extended to handle different page 
 formats as well, something that keeps rearing it's head. Perhaps that could 
 be done with the cycle idea you've described.

I had had this thought too, but the main issues I saw were that new page 
formats were not guaranteed to take up the same space/storage, so there was an 
inherent limitation on the ability to restructure things out *arbitrarily*; 
that being said, there may be a use-case for the types of modifications that 
this approach *would* be able to handle.

 Another possibility is some kind of a page-level indicator of what binary 
 format is in use on a given page. For checksums maybe a single bit would 
 suffice (indicating that you should verify the page checksum). Another use 
 case is using this to finally ditch all the old VACUUM FULL code in 
 HeapTupleSatisfies*().

There’s already a page version field, no?  I assume that would be sufficient 
for the page format indicator.  I don’t know about using a flag for verifying 
the checksum, as that would already modify the page that is to be checksummed, 
and we want to avoid having to rewrite a bunch of pages unnecessarily, no?  
And you’d presumably need to clear that state again, which would be an 
additional write.  This was the issue that the checksum cycle was meant to 
handle, since we store this information in the system catalogs and the types 
of modifications here would be idempotent.

David
--
David Christensen
PostgreSQL Team Manager
End Point Corporation
da...@endpoint.com
785-727-1171









Re: [HACKERS] [DESIGN] Incremental checksums

2015-07-13 Thread Jim Nasby

On 7/13/15 3:26 PM, David Christensen wrote:

* Incremental Checksums

PostgreSQL users should have a way up upgrading their cluster to use data 
checksums without having to do a costly pg_dump/pg_restore; in particular, 
checksums should be able to be enabled/disabled at will, with the database 
enforcing the logic of whether the pages considered for a given database are 
valid.

Considered approaches for this are having additional flags to pg_upgrade to set 
up the new cluster to use checksums where they did not before (or optionally 
turning these off).  This approach is a nice tool to have, but in order to be 
able to support this process in a manner which has the database online while 
the database is going throught the initial checksum process.


It would be really nice if this could be extended to handle different 
page formats as well, something that keeps rearing it's head. Perhaps 
that could be done with the cycle idea you've described.


Another possibility is some kind of a page-level indicator of what 
binary format is in use on a given page. For checksums maybe a single 
bit would suffice (indicating that you should verify the page checksum). 
Another use case is using this to finally ditch all the old VACUUM FULL 
code in HeapTupleSatisfies*().

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 11:53 AM, Marco Atzeri wrote:

On 7/13/2015 5:36 PM, Andrew Dunstan wrote:


On 07/12/2015 05:06 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 07/04/2015 11:02 AM, Tom Lane wrote:

It's not apparent to me how that works at all.

BTW, the .a files being linked to above are not like Unix .a static
archives - they are import library files, which I think they are only
used at link time, not run time. The path to the DLLs isn't being
hardcoded.
Ah, I see.  So then what Marco is suggesting is copying a coding pattern
that does work.

Unless there is further argument I propose to commit something very like
Marco's suggestion for hstore_plperl, hstore_plpython and ltree_plpython

No objection here.




OK, I tried the attached patch.

but when trying to build with python 3 I get this (no such problems with
python2, which builds and tests fine):

make -C hstore_plpython all
make[1]: Entering directory
'/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython'
ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -O2 -I../../src/pl/plpython
-I/usr/include/python3.4m -I../../contrib/hstore -I. -I.
-I../../src/include -I/usr/include/libxml2   -c -o hstore_plpython.o
hstore_plpython.c
ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -O2   -shared -o
hstore_plpython3.dll  hstore_plpython.o  -L../../src/port
-L../../src/common -Wl,--allow-multiple-definition
-Wl,--enable-auto-import -L/usr/lib  -L/usr/local/lib
-Wl,--as-needed   -L../../src/backend -lpostgres -L../hstore
-lhstore -L../../src/pl/plpython -lplpython3
-L/usr/lib/python3.4/config-3.4m -lpython3.4m -lpgcommon -lpgport
-lxslt -lxml2 -lssl -lcrypto -lz -lreadline -lcrypt
hstore_plpython.o: In function `hstore_to_plpython':

/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:
undefined reference to `PLyUnicode_FromStringAndSize'


[cut]

I'd like to get that fixed before applying anything. Marco, any ideas?

To build with python 3, set the environment like this:
PYTHON=/usr/bin/python3 - this can be done in the config file.


I am only building with python2, but on cygwin
there is an additional intl library for python3 binding

$ python2-config --libs
-lpython2.7 -ldl

$ python3-config --libs
-lpython3.4m -lintl -ldl



No this doesn't seem to be the problem. For some reason it's apparently 
not finding the symbol in plpython3.dll, where it should definitely 
exist, unless I'm completely off base. So I'm rather confused.


cheers

andrew




Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 07/13/2015 11:53 AM, Marco Atzeri wrote:
 On 7/13/2015 5:36 PM, Andrew Dunstan wrote:
 hstore_plpython.o: In function `hstore_to_plpython':
 /home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:
  
 undefined reference to `PLyUnicode_FromStringAndSize'

 No this doesn't seem to be the problem. For some reason it's apparently 
 not finding the symbol in plpython3.dll, where it should definitely 
 exist, unless I'm completely off base. So I'm rather confused.

Could hstore_plpython and plpython somehow have been built with different
ideas about PY_MAJOR_VERSION?  PLyUnicode_FromStringAndSize is
conditionally compiled, and the reference to it from hstore_plpython
depends on a conditionally-defined macro, and this error would make plenty
of sense if those conditions somehow diverged.  So I'd look for instance
at whether identical -I paths were used in both parts of the build.

regards, tom lane




Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Not AFAICT. Here is the contrib build:

Hm ... what does -DUSE_DL_IMPORT do?

regards, tom lane




Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 07:33 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

Not AFAICT. Here is the contrib build:

Hm ... what does -DUSE_DL_IMPORT do?




NFI - I tried building with that but it made no difference.

cheers

andrew





Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Michael Paquier
On Tue, Jul 14, 2015 at 8:49 AM, Andrew Dunstan and...@dunslane.net wrote:

 On 07/13/2015 07:33 PM, Tom Lane wrote:

 Andrew Dunstan and...@dunslane.net writes:

 Not AFAICT. Here is the contrib build:

 Hm ... what does -DUSE_DL_IMPORT do?

 NFI - I tried building with that but it made no difference.

Is your python3 build a 32b version?
-- 
Michael




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-13 Thread Michael Paquier
On Mon, Jul 13, 2015 at 10:34 PM, Fujii Masao wrote:
 On Fri, Jul 10, 2015 at 10:06 PM, Beena Emerson wrote:
 On Tue, Jul 7, 2015 at 2:19 PM, Michael Paquier wrote:

 Something like pg_syncinfo/ coupled with a LW lock, we already do
 something similar for replication slots with pg_replslot/.

 I was trying to figure out how the JSON metadata can be used.
 It would have to be set using a given set of functions.

 So we can use only such a set of functions to configure synch rep?
 I don't like that idea. Because it prevents us from configuring that
 while the server is not running.

If you store a JSON blob in a set of files in PGDATA, you could update
them manually there as well. That's perhaps re-inventing the wheel
with what is available with GUCs, though.

 Is JSON being preferred because it would be ALTER SYSTEM friendly and in a
 format already known to users?

 At least currently ALTER SYSTEM cannot accept JSON data
 (e.g., the return value of a JSON function like json_build_object())
 as the setting value. So I'm not sure how friendly ALTER SYSTEM
 and the JSON format really are. If you want to argue that, probably you
 need to improve ALTER SYSTEM so that JSON can be specified.

 In a real-life scenario, at most how many groups and nesting would be
 expected?

 I don't think that many groups and nestings are common.

Yeah, in most common configurations people are not going to have more
than 3 groups with only one level of nodes.
-- 
Michael




Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Andrew Dunstan


On 07/12/2015 05:06 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 07/04/2015 11:02 AM, Tom Lane wrote:

It's not apparent to me how that works at all.

BTW, the .a files being linked to above are not like Unix .a static
archives - they are import library files, which I think they are only
used at link time, not run time. The path to the DLLs isn't being hardcoded.

Ah, I see.  So then what Marco is suggesting is copying a coding pattern
that does work.


Unless there is further argument I propose to commit something very like
Marco's suggestion for hstore_plperl, hstore_plpython and ltree_plpython

No objection here.




OK, I tried the attached patch.

but when trying to build with python 3 I get this (no such problems with 
python2, which builds and tests fine):


   make -C hstore_plpython all
   make[1]: Entering directory
   '/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython'
   ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
   -Wdeclaration-after-statement -Wendif-labels
   -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
   -fwrapv -fexcess-precision=standard -g -O2 -I../../src/pl/plpython
   -I/usr/include/python3.4m -I../../contrib/hstore -I. -I.
   -I../../src/include -I/usr/include/libxml2   -c -o hstore_plpython.o
   hstore_plpython.c
   ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
   -Wdeclaration-after-statement -Wendif-labels
   -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
   -fwrapv -fexcess-precision=standard -g -O2   -shared -o
   hstore_plpython3.dll  hstore_plpython.o  -L../../src/port
   -L../../src/common -Wl,--allow-multiple-definition
   -Wl,--enable-auto-import -L/usr/lib  -L/usr/local/lib
   -Wl,--as-needed   -L../../src/backend -lpostgres -L../hstore
   -lhstore -L../../src/pl/plpython -lplpython3
   -L/usr/lib/python3.4/config-3.4m -lpython3.4m -lpgcommon -lpgport
   -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -lcrypt
   hstore_plpython.o: In function `hstore_to_plpython':
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:
   undefined reference to `PLyUnicode_FromStringAndSize'
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:(.text+0x99):
   relocation truncated to fit: R_X86_64_PC32 against undefined symbol
   `PLyUnicode_FromStringAndSize'
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:28:
   undefined reference to `PLyUnicode_FromStringAndSize'
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:28:(.text+0xf1):
   relocation truncated to fit: R_X86_64_PC32 against undefined symbol
   `PLyUnicode_FromStringAndSize'
   hstore_plpython.o: In function `plpython_to_hstore':
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:96:
   undefined reference to `PLyObject_AsString'
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:96:(.text+0x2cc):
   relocation truncated to fit: R_X86_64_PC32 against undefined symbol
   `PLyObject_AsString'
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:84:
   undefined reference to `PLyObject_AsString'
   
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:84:(.text+0x321):
   relocation truncated to fit: R_X86_64_PC32 against undefined symbol
   `PLyObject_AsString'
   collect2: error: ld returned 1 exit status
   ../../src/Makefile.shlib:358: recipe for target
   'hstore_plpython3.dll' failed
   make[1]: *** [hstore_plpython3.dll] Error 1
   make[1]: Leaving directory
   '/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython'
   Makefile:92: recipe for target 'all-hstore_plpython-recurse' failed
   make: *** [all-hstore_plpython-recurse] Error 2

I'd like to get that fixed before applying anything. Marco, any ideas?

To build with python 3, set the environment like this: 
PYTHON=/usr/bin/python3 - this can be done in the config file.



cheers

andrew

diff --git a/contrib/hstore_plperl/Makefile b/contrib/hstore_plperl/Makefile
index 19a8ab4..603ef52 100644
--- a/contrib/hstore_plperl/Makefile
+++ b/contrib/hstore_plperl/Makefile
@@ -30,6 +30,10 @@ override CPPFLAGS += -DPLPERL_HAVE_UID_GID -Wno-comment
 SHLIB_LINK += ../hstore/libhstore.a $(wildcard ../../src/pl/plperl/libperl*.a)
 endif
 
+ifeq ($(PORTNAME), cygwin)
+SHLIB_LINK += -L../hstore -l hstore $(perl_embed_ldflags)
+endif
+
 # As with plperl we need to make sure that the CORE directory is included
 # last, probably because it sometimes contains some header files with names
 # that clash with some of ours, or with some that we include, notably on
diff --git a/contrib/hstore_plpython/Makefile b/contrib/hstore_plpython/Makefile
index 6ee434b..dfff0fd 100644
--- a/contrib/hstore_plpython/Makefile
+++ b/contrib/hstore_plpython/Makefile
@@ -28,6 +28,12 @@ ifeq ($(PORTNAME), win32)
 SHLIB_LINK += ../hstore/libhstore.a $(wildcard 

Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Tom Lane
I wrote:
 TBH, I think the right thing to do at this point is to revert the entire
 patch and send it back for ground-up rework.  I think the high-level
 design is wrong in many ways and I have about zero confidence in most
 of the code details as well.

 I'll send a separate message about high-level issues,

And here's that.  I do *not* claim that this is a complete list of design
issues with the patch, it's just things I happened to notice in the amount
of time I've spent so far (which is already way more than I wanted to
spend on TABLESAMPLE right now).


I'm not sure that we need an extensible sampling-method capability at all,
let alone that an API for that needs to be the primary focus of a patch.
Certainly the offered examples of extension modules aren't inspiring:
tsm_system_time is broken by design (more about that below) and nobody
would miss tsm_system_rows if it weren't there.  What is far more
important is to get sampling capability into indexscans, and designing
an API ahead of time seems like mostly a distraction from that.

I'd think seriously about tossing the entire executor-stage part of the
patch, creating a stateless (history-independent) sampling filter that
just accepts or rejects TIDs, and sticking calls to that into all the
table scan node types.  Once you had something like that working well
it might be worth considering whether to try to expose an API to
generalize it.  But even then it's not clear that we really need any
more options than true-Bernoulli and block-level sampling.
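
To illustrate what is meant by a stateless filter here: acceptance can be
decided purely from the TID and the sampling seed, so every scan that sees
the same TID with the same seed makes the same decision.  A minimal sketch
(not code from the patch; the hash mixer is arbitrary, any decent one would
do):

#include <stdint.h>
#include <stdbool.h>

/*
 * History-independent Bernoulli filter over TIDs: the decision depends only
 * on (seed, block, offset), so repeated scans accept exactly the same tuples.
 */
static uint64_t
hash_tid(uint64_t seed, uint32_t blockno, uint16_t offset)
{
    uint64_t    x = seed ^ (((uint64_t) blockno << 16) | offset);

    /* splitmix64-style finalizer */
    x ^= x >> 30;
    x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27;
    x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return x;
}

/* Accept the TID iff its hash falls below the requested fraction (0..1). */
static bool
sample_accepts_tid(uint64_t seed, uint32_t blockno, uint16_t offset,
                   double fraction)
{
    /* compare the top 53 bits against fraction * 2^53 to avoid overflow */
    return (hash_tid(seed, blockno, offset) >> 11) <
        (uint64_t) (fraction * 9007199254740992.0);
}

A filter like this could be called from any scan type (seqscan, indexscan,
bitmap heap scan) without carrying any state between calls.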

The IBM paper I linked to in the other thread mentions that their
optimizer will sometimes choose to do Bernoulli sampling even if SYSTEM
was requested.  Probably that happens when it decides to do a simple
indexscan, because then there's no advantage to trying to cluster the
sampled rows.  But in the case of a bitmap scan, you could very easily do
either true Bernoulli or block-level sampling simply by adjusting the
rules about which bits you keep or clear in the bitmap (given that you
apply the filter between collection of the index bitmap and accessing the
heap, which seems natural).  The only case where a special scan type
really seems to be needed is if you want block-level sampling, the query
would otherwise use a seqscan, *and* the sampling percentage is pretty low
--- if you'd be reading every second or third block anyway, you're likely
better off with a plain seqscan so that the kernel sees sequential access
and does prefetching.  The current API doesn't seem to make it possible to
switch between seqscan and read-only-selected-blocks based on the sampling
percentage, but I think that could be an interesting optimization.
(Another bet that's been missed is having the samplescan logic request
prefetching when it is doing selective block reads.  The current API can't
support that AFAICS, since there's no expectation that nextblock calls
could be done asynchronously from nexttuple calls.)
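
To put numbers on the "every second or third block" point: under Bernoulli
sampling with fraction p and about n live tuples per block, the chance that
a given block contains at least one sampled tuple is 1 - (1 - p)^n.  With
p = 1% and 100 tuples per block that is already ~63% of all blocks, so block
skipping buys almost nothing; at p = 0.05% it drops to ~5%.  A
back-of-the-envelope helper (a sketch, not code from the patch):

#include <math.h>
#include <stdbool.h>

/*
 * Sketch: estimate the fraction of blocks that would contain at least one
 * sampled tuple, and fall back to a plain seqscan when skipping blocks would
 * not save much I/O anyway.
 */
static bool
prefer_seqscan(double sample_frac, double tuples_per_block)
{
    /* P(block has >= 1 sampled tuple) = 1 - (1 - p)^n */
    double      blocks_touched = 1.0 - pow(1.0 - sample_frac, tuples_per_block);

    /*
     * If we'd touch every second or third block anyway, sequential access
     * (and kernel prefetching) likely beats selective block reads.
     */
    return blocks_touched > 0.33;
}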

Another issue with the API as designed is the examinetuple() callback.
Allowing sample methods to see invisible tuples is quite possibly the
worst idea in the whole patch.  They can *not* do anything with such
tuples, or they'll totally break reproducibility: if the tuple is
invisible to your scan, it might well be or soon become invisible to
everybody, whereupon it would be subject to garbage collection at the
drop of a hat.  So if an invisible tuple affects the sample method's
behavior at all, repeated scans in the same query would not produce
identical results, which as mentioned before is required both by spec
and for minimally sane query behavior.  Moreover, examining the contents
of the tuple is unsafe (it might contain pointers to TOAST values that
no longer exist); and even if it were safe, what's the point?  Sampling
that pays attention to the data is the very definition of biased.  So
if we do re-introduce an API like the current one, I'd definitely lose
this bit and only allow sample methods to consider visible tuples.

On the point of reproducibility: the tsm_system_time method is utterly
incapable of producing identical results across repeated scans, because
there is no reason to believe it would get exactly as far in the same
amount of time each time.  This might be all right across queries if
the method could refuse to work with REPEATABLE clauses (but there's
no provision for such a restriction in the current API).  But it's not
acceptable within a query.  Another problem with tsm_system_time is that
its cost/rowcount estimation is based on nothing more than wishful
thinking, and can never be based on anything more than wishful thinking,
because planner cost units are not time-based.  Adding a comment that
says we'll nonetheless pretend they are milliseconds isn't a solution.
So that sampling method really has to go away and never come back,
whatever we might ultimately salvage from this patch otherwise.

(I'm not exactly convinced that the system or tsm_system_rows methods
are adequately 

Re: [HACKERS] intarray planning/execution problem with multiple operators

2015-07-13 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 I've found an interesting performance problem in the intarray extension
 module.  It doesn't seem to be version dependent, I've verified it in 9.4.4
 and 9.6devel.

Seems like this isn't specifically an issue for intarray, but rather one
with the core GIN code not being smart about the combination of selective
and unselective index conditions.  In particular, it seems like the smart
thing for GIN to do with this example is just ignore the @@ condition
altogether so far as the index search goes, and mark all the results as
needing recheck so that the @@ operator gets applied as a filter.

You could also complain about the core planner not considering leaving
some potentially-indexable quals out of the actual index condition,
but TBH I don't see that that would be a useful use of planner cycles.
In almost every case it would mean that if there are K quals potentially
usable with a given index, we'd cost out 2^K-1 index paths and immediately
reject all but the use-em-all alternative.  That's a lot of cycles to
spend to handle a corner case.  It's always been assumed to be the index
AM's problem to optimize use of the index quals it's handed, and I don't
see why that shouldn't be true here.

 The proposed selectivity estimate improvement patch for @@ does not fix
 this (and evaluating that patch was how I found this issue.)

Nah, it wouldn't: the cost estimates are correct, in the sense that
gincostestimate realizes that GIN will be horribly stupid about this.

regards, tom lane




Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-13 Thread Simon Riggs
On 13 July 2015 at 17:00, Tom Lane t...@sss.pgh.pa.us wrote:

 I wrote:
  TBH, I think the right thing to do at this point is to revert the entire
  patch and send it back for ground-up rework.  I think the high-level
  design is wrong in many ways and I have about zero confidence in most
  of the code details as well.

  I'll send a separate message about high-level issues,

 And here's that.  I do *not* claim that this is a complete list of design
 issues with the patch, it's just things I happened to notice in the amount
 of time I've spent so far (which is already way more than I wanted to
 spend on TABLESAMPLE right now).


Interesting that a time-based sample does indeed yield useful results. Good
to know.

That is exactly what this patch provides: a time-based sample, with reduced
confidence in the accuracy of the result. And other things too.


 I'm not sure that we need an extensible sampling-method capability at all,
 let alone that an API for that needs to be the primary focus of a patch.
 Certainly the offered examples of extension modules aren't inspiring:
 tsm_system_time is broken by design (more about that below) and nobody
 would miss tsm_system_rows if it weren't there.


Time based sampling isn't broken by design. It works exactly according to
the design.

Time-based sampling has been specifically requested by data visualization
developers who are trying to work out how to display anything useful from a
database beyond a certain size. The feature for PostgreSQL implements a
similar mechanism to that deployed already with BlinkDB, demonstrated at
VLDB 2012. I regard it as an Advanced feature worthy of deployment within
PostgreSQL, based upon user request.

Row-based sampling offers the capability to retrieve a sample of a fixed
size however big the data set is. I shouldn't need to point out that this is
already in use within the ANALYZE command. Since it implements a technique
already in use within PostgreSQL, I see no reason not to expose that to
users. As I'm sure you're aware, there is rigorous math backing up the use
of fixed-size sampling, with recent developments from my research
colleagues at Manchester University that further emphasise the usefulness
of this feature for us.


 What is far more
 important is to get sampling capability into indexscans, and designing
 an API ahead of time seems like mostly a distraction from that.


This has come up as a blocker on TABLESAMPLE before. I've got no evidence
to show anyone cares about that. I can't imagine a use for it myself; it
was not omitted from this patch because it's difficult, it was omitted
because it's just useless. If anyone ever cares, they can add it in a later
release.


 I'd think seriously about tossing the entire executor-stage part of the
 patch, creating a stateless (history-independent) sampling filter that
 just accepts or rejects TIDs, and sticking calls to that into all the
 table scan node types.  Once you had something like that working well
 it might be worth considering whether to try to expose an API to
 generalize it.  But even then it's not clear that we really need any
 more options than true-Bernoulli and block-level sampling.


See above.


 The IBM paper I linked to in the other thread mentions that their
 optimizer will sometimes choose to do Bernoulli sampling even if SYSTEM
 was requested.


That sounds broken to me. This patch gives the user what the user asks for.
BERNOULLI or SYSTEM.

If you particularly like the idea of mixing the two sampling methods, you
can do so *because* the sampling method API is extensible for the user. So
Mr.DB2 can get it his way if he likes and he can call it SYSNOULLI if he
likes; others can get it the way they like also. No argument required.

It was very, very obvious that whatever sampling code anybody dreamt up
would be objected to. Clearly, we need a way to allow people to implement
whatever technique they wish. Stratified sampling, modified sampling, new
techniques. Whatever.


 Probably that happens when it decides to do a simple
 indexscan, because then there's no advantage to trying to cluster the
 sampled rows.  But in the case of a bitmap scan, you could very easily do
 either true Bernoulli or block-level sampling simply by adjusting the
 rules about which bits you keep or clear in the bitmap (given that you
 apply the filter between collection of the index bitmap and accessing the
 heap, which seems natural).  The only case where a special scan type
 really seems to be needed is if you want block-level sampling, the query
 would otherwise use a seqscan, *and* the sampling percentage is pretty low
 --- if you'd be reading every second or third block anyway, you're likely
 better off with a plain seqscan so that the kernel sees sequential access
 and does prefetching.  The current API doesn't seem to make it possible to
 switch between seqscan and read-only-selected-blocks based on the sampling
 percentage, but I think that could be an interesting optimization.

Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 05:18 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 07/13/2015 11:53 AM, Marco Atzeri wrote:

On 7/13/2015 5:36 PM, Andrew Dunstan wrote:

hstore_plpython.o: In function `hstore_to_plpython':
/home/andrew/bf64/root/HEAD/pgsql/contrib/hstore_plpython/hstore_plpython.c:35:
undefined reference to `PLyUnicode_FromStringAndSize'

No this doesn't seem to be the problem. For some reason it's apparently
not finding the symbol in plpython3.dll, where it should definitely
exist, unless I'm completely off base. So I'm rather confused.

Could hstore_plpython and plpython somehow have been built with different
ideas about PY_MAJOR_VERSION?  PLyUnicode_FromStringAndSize is
conditionally compiled, and the reference to it from hstore_plpython
depends on a conditionally-defined macro, and this error would make plenty
of sense if those conditions somehow diverged.  So I'd look for instance
at whether identical -I paths were used in both parts of the build.
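
(The pattern in question looks roughly like the following -- an illustration
only, not the actual plpython headers; the wrapper macro name here is made
up.  If the two object files disagree on PY_MAJOR_VERSION, one of them ends
up referring to a function the other build never compiled, which is exactly
an "undefined reference".)

#include <Python.h>

#if PY_MAJOR_VERSION >= 3
/* only compiled for Python 3 builds */
extern PyObject *PLyUnicode_FromStringAndSize(const char *s, Py_ssize_t size);
#define PLyStringWrapper_FromStringAndSize(s, n) \
    PLyUnicode_FromStringAndSize(s, n)
#else
#define PLyStringWrapper_FromStringAndSize(s, n) \
    PyString_FromStringAndSize(s, n)
#endif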




Not AFAICT. Here is the contrib build:

   ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
   -Wdeclaration-after-statement -Wendif-labels
   -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
   -fwrapv -fexcess-precision=standard -g -O2 -I../../src/pl/plpython
   -I/usr/include/python3.4m -I../../contrib/hstore -I. -I.
   -I../../src/include -I/usr/include/libxml2   -c -o hstore_plpython.o
   hstore_plpython.c
   ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
   -Wdeclaration-after-statement -Wendif-labels
   -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
   -fwrapv -fexcess-precision=standard -g -O2   -shared -o
   hstore_plpython3.dll  hstore_plpython.o  -L../../src/port
   -L../../src/common -Wl,--allow-multiple-definition
   -Wl,--enable-auto-import -L/usr/lib  -L/usr/local/lib
   -Wl,--as-needed   -L../../src/backend -lpostgres -L../hstore
   -lhstore -L../../src/pl/plpython -lplpython3
   -L/usr/lib/python3.4/config-3.4m -lpython3.4m -lpgcommon -lpgport
   -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -lcrypt

and here is the plpython build:

   ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
   -Wdeclaration-after-statement -Wendif-labels
   -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
   -fwrapv -fexcess-precision=standard -g -O2  -I. -I.
   -I/usr/include/python3.4m -I../../../src/include
   -I/usr/include/libxml2  -DUSE_DL_IMPORT  -c -o plpy_util.o plpy_util.c
   ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
   -Wdeclaration-after-statement -Wendif-labels
   -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
   -fwrapv -fexcess-precision=standard -g -O2   -shared -o
   plpython3.dll  plpy_cursorobject.o plpy_elog.o plpy_exec.o
   plpy_main.o plpy_planobject.o plpy_plpymodule.o plpy_procedure.o
   plpy_resultobject.o plpy_spi.o plpy_subxactobject.o plpy_typeio.o
   plpy_util.o  -L../../../src/port -L../../../src/common
   -Wl,--allow-multiple-definition -Wl,--enable-auto-import -L/usr/lib 
   -L/usr/local/lib -Wl,--as-needed -L/usr/lib/python3.4/config-3.4m

   -lpython3.4m -lintl -ldl -L../../../src/backend -lpostgres
   -lpgcommon -lpgport -lxslt -lxml2 -lssl -lcrypto -lz -lreadline -lcrypt


The functions are in fact apparently built - the names are present in 
the object file and the DLL.


cheers

andrew



regards, tom lane







[HACKERS] Forensic recovery deleted pgdump custom format file

2015-07-13 Thread David Guimaraes
Hello. I need some help.

I have the following situation. My client deleted a number of old backups,
made by pg_dump with the custom format flag, from a disk drive. I could not
find any program that recovers pg_dump backup files in the custom/binary
format, so I decided to try to understand the file format generated by
pg_dump. Analyzing the pg_dump/restore source code, I concluded a few
things:

The header of the file always starts with the magic string PGDMP, followed
by the pg_dump version number used, then the int size, offset size, etc.,
followed by the TOCs.
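
For a first pass over the raw image, that PGDMP magic is enough to locate
candidate dump starts.  A minimal sketch of such a scan (not from pg_dump;
the three bytes printed after the magic are assumed, per the description
above, to be the dump version):

#include <stdio.h>
#include <string.h>

int
main(int argc, char **argv)
{
    const unsigned char magic[5] = {'P', 'G', 'D', 'M', 'P'};
    unsigned char buf[1 << 20];
    size_t      carry = 0;
    size_t      n;
    long long   consumed = 0;   /* bytes read from the image so far */
    FILE       *f;

    if (argc != 2 || (f = fopen(argv[1], "rb")) == NULL)
    {
        fprintf(stderr, "usage: %s <raw-image>\n", argv[0]);
        return 1;
    }

    while ((n = fread(buf + carry, 1, sizeof(buf) - carry, f)) > 0)
    {
        size_t      total = carry + n;
        size_t      i;

        consumed += (long long) n;
        for (i = 0; i + sizeof(magic) + 3 <= total; i++)
        {
            if (memcmp(buf + i, magic, sizeof(magic)) == 0)
                printf("PGDMP at offset %lld, next bytes %d %d %d\n",
                       consumed - (long long) total + (long long) i,
                       buf[i + 5], buf[i + 6], buf[i + 7]);
        }

        /* keep a short tail so a magic split across reads is still found */
        carry = (total >= 7) ? 7 : total;
        memmove(buf, buf + total - carry, carry);
    }

    fclose(f);
    return 0;
}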

My question is: how do I know where the file ends? Is there any signature
I can use, or would I have to analyze the whole file?

Thank you.

-- 
David


Re: [HACKERS] PostgreSQL 9.5 Alpha 1 build fail with perl 5.22

2015-07-13 Thread Andrew Dunstan


On 07/13/2015 07:55 PM, Michael Paquier wrote:

On Tue, Jul 14, 2015 at 8:49 AM, Andrew Dunstan and...@dunslane.net wrote:

On 07/13/2015 07:33 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

Not AFAICT. Here is the contrib build:

Hm ... what does -DUSE_DL_IMPORT do?

NFI - I tried building with that but it made no difference.

Is your python3 build a 32-bit version?


No, this is all 64 bit.

cheers

andrew




Re: [HACKERS] Forensic recovery deleted pgdump custom format file

2015-07-13 Thread Michael Paquier
On Tue, Jul 14, 2015 at 9:28 AM, David Guimaraes skys...@gmail.com wrote:
 So I decided to try to understand the file format generated by
 pgdump. Analyzing the source code of pgdump/recovery, i concluded a few
 things:

 The header of the file always starts with PGDMP followed by pgdump version
 number used, followed by int size, offset, etc. followed by TOCs.

 My question is how to know the end of the file? Are there any signature that
 I can use? Or would have to analyze the whole file?

Why are you trying to reinvent the wheel? Is pg_restore not available?
-- 
Michael




Re: [HACKERS] Support for N synchronous standby servers - take 2

2015-07-13 Thread Fujii Masao
On Tue, Jul 14, 2015 at 9:00 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Mon, Jul 13, 2015 at 10:34 PM, Fujii Masao wrote:
 On Fri, Jul 10, 2015 at 10:06 PM, Beena Emerson wrote:
 On Tue, Jul 7, 2015 at 2:19 PM, Michael Paquier wrote:

 Something like pg_syncinfo/ coupled with a LW lock, we already do
 something similar for replication slots with pg_replslot/.

 I was trying to figure out how the JSON metadata can be used.
 It would have to be set using a given set of functions.

 So we can use only such a set of functions to configure synch rep?
 I don't like that idea. Because it prevents us from configuring that
 while the server is not running.

 If you store a json blob in a set of files of PGDATA you could update
 them manually there as well. That's perhaps re-inventing the wheel
 with what is available with GUCs though.

Why don't we just use a GUC? If the quorum setting is not so complicated
in real scenarios, a GUC seems enough for that.

Regards,

-- 
Fujii Masao




Re: [HACKERS] optimizing vacuum truncation scans

2015-07-13 Thread Haribabu Kommi
On Mon, Jul 13, 2015 at 12:06 PM, Haribabu Kommi
kommi.harib...@gmail.com wrote:
 On Thu, Jul 9, 2015 at 5:36 PM, Haribabu Kommi kommi.harib...@gmail.com 
 wrote:

 I will do some performance tests and send you the results.

 Here are the performance results tested on my machine.


                   Head      vm patch   vm+prefetch patch

  First vacuum     120 sec   1 sec      1 sec
  Second vacuum    180 sec   180 sec    30 sec

 I made some modifications in the code to skip the vacuum truncation in
 the first vacuum command; this way I collected the time taken by the
 second vacuum.

 I combined your vm and prefetch patches into a single vm+prefetch patch
 without a GUC. I kept the prefetch window at 32 and ran the performance
 test. I chose 32 based on the current buffer access strategy, which is 32
 for vacuum at present, instead of making it a user option. I have
 attached the modified patch with both the vm and prefetch logic.

 I will do some tests on a machine with an SSD and let you know the
 results. Based on those results, we can decide whether we need a GUC or
 not, depending on the impact of prefetch on SSD machines.

Following are the performance readings on a machine with an SSD.
I increased the pgbench scale factor in the test from 500 to 1000
to show better performance numbers.

                 Head       vm patch   vm+prefetch patch

First vacuum     6.24 sec   2.91 sec   2.91 sec
Second vacuum    6.66 sec   6.66 sec   7.19 sec

There is a small performance impact on SSD with prefetch.
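
For reference, the effect the prefetch variant is after can be sketched
outside the server roughly like this (an illustration of the pattern only,
not the patch itself; the 8kB block size, the window of 32 and the backward
scan mirror the description above, and POSIX_FADV_WILLNEED is the kind of
hint buffer prefetching typically maps to on Linux):

#include <fcntl.h>
#include <unistd.h>

#define BLCKSZ          8192
#define PREFETCH_WINDOW 32

/*
 * Sketch: before reading a window of blocks (scanning backward from the end
 * of the file, as the truncation scan does), tell the kernel we will need
 * them so the reads can overlap with the I/O.
 */
static void
scan_backward_with_prefetch(int fd, long nblocks)
{
    char        buf[BLCKSZ];
    long        blkno = nblocks;

    while (blkno > 0)
    {
        long        start = (blkno >= PREFETCH_WINDOW) ? blkno - PREFETCH_WINDOW : 0;
        long        b;

        /* hint the whole window first ... */
        (void) posix_fadvise(fd, (off_t) start * BLCKSZ,
                             (off_t) (blkno - start) * BLCKSZ,
                             POSIX_FADV_WILLNEED);

        /* ... then read the blocks of the window, last one first */
        for (b = blkno - 1; b >= start; b--)
        {
            if (pread(fd, buf, BLCKSZ, (off_t) b * BLCKSZ) != BLCKSZ)
                return;
            /* a real implementation would check each block for data here */
        }
        blkno = start;
    }
}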

Regards,
Hari Babu
Fujitsu Australia




[HACKERS] intarray planning/execution problem with multiple operators

2015-07-13 Thread Jeff Janes
I've found an interesting performance problem in the intarray extension
module.  It doesn't seem to be version dependent, I've verified it in 9.4.4
and 9.6devel.

If I do a query that involves both an = op and a @@ op ANDed together, it
gives a high cost estimate, which is justified by the slow runtime.  If I
omit the @@ it gives a low estimate, also justified.  If I trick it into
thinking it cannot use the index to satisfy the @@ op, then it gives a low
estimate and low runtime, applying the @@ in the filter step and only the
fast = in the bitmap index scan.

Now it could use the index for the fast = operation and apply the @@ in the
recheck, rather than the filter.  But I guess it doesn't think of that,
despite knowing that applying the @@ as an index operation is slow.

So it seems to be missing a trick someplace, but I don't know if it is
reasonable to expect it to find that trick, or how easy/hard it would be to
implement.

The proposed selectivity estimate improvement patch for @@ does not fix
this (and evaluating that patch was how I found this issue.)


Set up:

create table foobar as
select (
  select
array_agg(floor(sqrt(random()*1000+g.y/100+f.x/1000))::int)
from generate_series(1,10) g(y)
  ) foo
  from generate_series(1,1000) f(x);
create index on foobar using gin(foo gin__int_ops);
analyze;

You can knock an order of magnitude off from the table size and should
still be able to see the problem.

example:

explain(analyze, buffers) select * from foobar where foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}' and foo @@
'!1'::query_int;

QUERY PLAN
---
 Seq Scan on foobar  (cost=0.00..263637.00 rows=1 width=61) (actual
time=9717.957..9717.957 rows=0 loops=1)
   Filter: ((foo @@ '!1'::query_int) AND (foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[]))
   Rows Removed by Filter: 1000
   Buffers: shared hit=64 read=113573
   I/O Timings: read=361.402
 Planning time: 0.101 ms
 Execution time: 9717.991 ms
(7 rows)


explain(analyze, buffers) select * from foobar where foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}' ;
   QUERY PLAN
-
 Bitmap Heap Scan on foobar  (cost=112.01..116.02 rows=1 width=61) (actual
time=0.027..0.027 rows=0 loops=1)
   Recheck Cond: (foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
   Buffers: shared hit=21
   ->  Bitmap Index Scan on foobar_foo_idx  (cost=0.00..112.01 rows=1
width=0) (actual time=0.023..0.023 rows=0 loops=1)
 Index Cond: (foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
 Buffers: shared hit=21
 Planning time: 0.097 ms
 Execution time: 0.061 ms

If I trick it into thinking the @@ operator cannot be used in the index,
then it gets really fast again:

explain(analyze, buffers) select * from foobar where foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}' and foo||'{}'
@@ '!1'::query_int;
   QUERY PLAN
-
 Bitmap Heap Scan on foobar  (cost=112.01..116.03 rows=1 width=61) (actual
time=0.082..0.082 rows=0 loops=1)
   Recheck Cond: (foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
   Filter: ((foo || '{}'::integer[]) @@ '!1'::query_int)
   Buffers: shared hit=21
   ->  Bitmap Index Scan on foobar_foo_idx  (cost=0.00..112.01 rows=1
width=0) (actual time=0.080..0.080 rows=0 loops=1)
 Index Cond: (foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
 Buffers: shared hit=21
 Planning time: 0.139 ms
 Execution time: 0.129 ms

Cheers,

Jeff


Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-13 Thread Shulgin, Oleksandr
On Sun, Jul 12, 2015 at 8:39 PM, Pavel Stehule pavel.steh...@gmail.com
wrote:

 The thing is - it's not only about whitespace, otherwise I would probably
 not bother with the generic interface. For my original problem, there is
 simply no way to do this correctly in an extension w/o copying over all of
 the logic from json.c, which I have to do right now, would rather not.

 I am sorry - we are talking about JSON, not about any styled document. I
 disagree, so it has not be implemented as extension - the backport of JSON
 support is a extension.


Hm... I'm having a hard time making sense of that statement, sorry.

To reiterate: for my problem, that is escaping numerics that can
potentially overflow[1] under ECMAScript standard, I want to be able to
override the code that outputs the numeric converted to string.  There is
no way in current implementation to do that *at all*, short of copying all
the code involved in producing JSON output and changing it at certain
points.  One could try re-parsing JSON instead, but that doesn't actually
solve the issue, because type information is lost forever at that point.
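
Concretely, the issue in [1] is that ECMAScript numbers are IEEE doubles, so
integers beyond 2^53 - 1 = 9007199254740991 silently lose precision.  The
kind of decision the overridable serializer would make is roughly this -- a
sketch only, the function name is mine and not part of the patch:

#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Sketch: given the decimal text of a numeric, decide whether to emit it as
 * a JSON string rather than a bare number, so ECMAScript clients cannot
 * silently lose precision.  2^53 - 1 is the largest integer a double holds
 * exactly; a fuller check might also look at the number of digits.
 */
static bool
needs_string_escape(const char *numeric_text)
{
    const double ecmascript_max_safe = 9007199254740991.0;  /* 2^53 - 1 */

    return fabs(strtod(numeric_text, NULL)) > ecmascript_max_safe;
}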

The whitespace unification was a mere side-effect of the original effort on
this patch.

--
Best regards,
Alex

[1]
http://stackoverflow.com/questions/307179/what-is-javascripts-highest-integer-value-that-a-number-can-go-to-without-losin


Re: [HACKERS] [PATCH] Generalized JSON output functions

2015-07-13 Thread Pavel Stehule
2015-07-13 9:30 GMT+02:00 Shulgin, Oleksandr oleksandr.shul...@zalando.de:

 On Sun, Jul 12, 2015 at 8:39 PM, Pavel Stehule pavel.steh...@gmail.com
 wrote:

 The thing is - it's not only about whitespace, otherwise I would probably
 not bother with the generic interface. For my original problem, there is
 simply no way to do this correctly in an extension w/o copying over all of
 the logic from json.c, which I have to do right now, would rather not.

 I am sorry - we are talking about JSON, not about any styled document. I
 disagree, so it has not be implemented as extension - the backport of JSON
 support is a extension.


 Hm... I'm having a hard time making sense of that statement, sorry.

 To reiterate: for my problem, that is escaping numerics that can
 potentially overflow[1] under ECMAScript standard, I want to be able to
 override the code that outputs the numeric converted to string.  There is
 no way in current implementation to do that *at all*, short of copying all
 the code involved in producing JSON output and changing it at certain
 points.  One could try re-parsing JSON instead, but that doesn't actually
 solve the issue, because type information is lost forever at that point.

 The whitespace unification was a mere side-effect of the original effort
 on this patch.


The dynamic type change is something I would not want to do in the database,
really :)

If you are afraid of overflow, then convert the numeric to a string
immediately - in this case, the client has to support both variants, so an
immediate cast should not be a problem.

Anyway, this check on the maximum number should be implemented in our JSON(b)
output functions (as a warning?).

Regards

Pavel



 --
 Best regards,
 Alex

 [1]
 http://stackoverflow.com/questions/307179/what-is-javascripts-highest-integer-value-that-a-number-can-go-to-without-losin




[HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread dinesh kumar
Hi All,

Greetings for the day.

Would like to discuss on below feature here.

Feature:
Having an SQL function, to write messages to log destination.

Justification:
As of now, we don't have an SQL function to write custom/application
messages to the log destination. We have the RAISE clause, which is
controlled by the log_ parameters. If we had an SQL function which works
irrespective of the log settings, that would be good for many log parsers.
What I mean is, from a DBA's point of view, if we route all our native OS
stats to the log files in a proper format, then our log reporting tools can
give more effective reports. Also, applications can log their own messages
to the postgres log files, which can be monitored by DBAs too.

Implementation:
I implemented a new function, pg_report_log, which takes one text argument
and returns void. I used the LOG prefix for all the reporting messages;
I wasn't sure whether to go with a new prefix for this, since these are
normal LOG messages. Let me know if I am wrong here.

Here is the attached patch.

Regards,
Dinesh
manojadinesh.blogspot.com
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 76f77cb..b2fc4cd 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17850,6 +17850,15 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
 Return information about a file.
</entry>
   </row>
+  <row>
+   <entry>
+<literal><function>pg_report_log(<parameter>message</> <type>text</type>])</function></literal>
+   </entry>
+   <entry><type>void</type></entry>
+   <entry>
+Write message into log file.
+   </entry>
+  </row>
  </tbody>
 </tgroup>
</table>
@@ -17918,6 +17927,18 @@ SELECT (pg_stat_file('filename')).modification;
 </programlisting>
</para>
 
+   <indexterm>
+<primary>pg_report_log</primary>
+   </indexterm>
+   <para>
+<function>pg_report_log</> is useful to write custom messages
+into current log destination and returns <type>void</type>.
+Typical usages include:
+<programlisting>
+SELECT pg_report_log('Message');
+</programlisting>
+   </para>
+
   </sect2>

   <sect2 id="functions-advisory-locks">
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index c0495d9..6c54f3a 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -76,6 +76,23 @@ current_query(PG_FUNCTION_ARGS)
 }
 
 /*
+ * pg_report_log()
+ *
+ * Printing custom log messages in log file.
+ */
+
+Datum
+pg_report_log(PG_FUNCTION_ARGS)
+{
+
+   ereport(MESSAGE,
+   (errmsg("%s", text_to_cstring(PG_GETARG_TEXT_P(0))),
+   errhidestmt(true)));
+
+   PG_RETURN_VOID();
+}
+
+/*
  * Send a signal to another backend.
  *
  * The signal is delivered if the user is either a superuser or the same
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 088c714..2e8f547 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -302,7 +302,7 @@ errstart(int elevel, const char *filename, int lineno,
elevel == INFO);
}
 
-   /* Skip processing effort if non-error message will not be output */
+   /* Skip processing effort if non-error, custom message will not be output */
 if (elevel < ERROR && !output_to_server && !output_to_client)
return false;
 
@@ -2062,6 +2062,7 @@ write_eventlog(int level, const char *line, int len)
case DEBUG3:
case DEBUG2:
case DEBUG1:
+   case MESSAGE:
case LOG:
case COMMERROR:
case INFO:
@@ -2917,6 +2918,7 @@ send_message_to_server_log(ErrorData *edata)
case DEBUG1:
syslog_level = LOG_DEBUG;
break;
+   case MESSAGE:
case LOG:
case COMMERROR:
case INFO:
@@ -3547,6 +3549,7 @@ error_severity(int elevel)
case DEBUG5:
prefix = _("DEBUG");
break;
+   case MESSAGE:
case LOG:
case COMMERROR:
prefix = _("LOG");
@@ -3666,6 +3669,9 @@ is_log_level_output(int elevel, int log_min_level)
/* Neither is LOG */
else if (elevel >= log_min_level)
return true;
+   /* If elevel is MESSAGE, then ignore log settings */
+   else if (elevel == MESSAGE)
+   return true;
 
return false;
 }
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 6fd1278..62c619a 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5344,6 +5344,11 @@ DESCR(tsm_bernoulli_reset(internal));
 DATA(insert OID = 3346 (  tsm_bernoulli_cost   PGNSP PGUID 12 1 0 0 0 
f f f f t f v 7 0 2278 2281 2281 2281 2281 2281 2281 2281 

Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-13 Thread Michael Paquier
On Mon, Jul 13, 2015 at 4:54 PM, dinesh kumar dineshkuma...@gmail.com wrote:
 Would like to discuss on below feature here.

 Feature:
 Having an SQL function, to write messages to log destination.

 Justification:
 As of now, we don't have an SQL function to write custom/application
 messages to log destination. We have RAISE clause which is controlled by
 log_ parameters. If we have an SQL function which works irrespective of log
 settings, that would be a good for many log parsers. What i mean is, in DBA
 point of view, if we route all our native OS stats to log files in a proper
 format, then we can have our log reporting tools to give most effective
 reports. Also, Applications can log their own messages to postgres log
 files, which can be monitored by DBAs too.

What's the actual use case for this feature, other than that it would be
good to have? A log message is there to give information about the state of
something that happens in the backend, but in the case of this function the
event that happens is the content of the function itself. It also adds a new
log level for something that has a single use, which looks like overkill to
me. Btw, you could do something more advanced simply with an extension if
you really want to play in this area... But I am dubious about what kind of
applications would use that.
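
For the record, the extension route is pretty small -- a sketch, with names
of my own choosing, of a C function an extension could expose through a
matching CREATE FUNCTION ... LANGUAGE C declaration:

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;

/* Write the argument to the server log at LOG level, hiding the statement. */
PG_FUNCTION_INFO_V1(report_log);

Datum
report_log(PG_FUNCTION_ARGS)
{
    ereport(LOG,
            (errmsg("%s", text_to_cstring(PG_GETARG_TEXT_PP(0))),
             errhidestmt(true)));

    PG_RETURN_VOID();
}
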
-- 
Michael

