Re: [HACKERS] Rowcounts marked by create_foreignscan_path()

2014-03-03 Thread Etsuro Fujita
(2014/02/18 12:37), Tom Lane wrote:
 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 (2014/02/18 12:03), Tom Lane wrote:
 The calling FDW is supposed to do that; note the header comment.
 
 However, ISTM postgresGetForeignPaths() doesn't work like
 that.  It uses the same rowcount for all paths of the same parameterization?
 
 That's what we want no?

Maybe I'm missing something.  But I don't think
postgresGetForeignPaths() marks the parameterized path with the correct
row estimate.  Also, that function doesn't seem to estimate the cost of
the parameterized path correctly.  Please find attached a patch.

 Anyway, the point of using ppi_rows would be to enforce that all the
 rowcount estimates for a given parameterized relation are the same.
 In the FDW case, all those estimates are the FDW's responsibility,
 and so making them consistent is also its responsibility IMO.
 
 Another way of looking at this is that none of the pathnode creation
 routines in pathnode.c are responsible for setting rowcount estimates.
 That's done by whatever is setting the cost estimate; this must be so,
 else the cost estimate is surely bogus.  So any way you slice it, the
 FDW has to get it right.

Understood.  Thank you for the analysis.

Sorry for the delay.

Best regards,
Etsuro Fujita
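
To make the point concrete, here is a minimal sketch, separate from the
attached patch, of how a parameterized foreign path can take its rowcount
from the relation's ParamPathInfo so that every path of the same
parameterization reports the same estimate. This uses the 9.3-era planner
API; the helper function name is invented for illustration:

#include "postgres.h"
#include "optimizer/pathnode.h"

/* Hypothetical helper: add one parameterized foreign path. */
static void
add_param_foreign_path(PlannerInfo *root, RelOptInfo *baserel,
					   Relids required_outer,
					   Cost startup_cost, Cost total_cost)
{
	ParamPathInfo *param_info;
	double		rows;

	/* Reuse the shared per-parameterization row estimate, if any. */
	param_info = get_baserel_parampathinfo(root, baserel, required_outer);
	rows = param_info ? param_info->ppi_rows : baserel->rows;

	add_path(baserel, (Path *)
			 create_foreignscan_path(root, baserel,
									 rows, startup_cost, total_cost,
									 NIL,		/* no pathkeys */
									 required_outer,
									 NIL));		/* no fdw_private */
}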
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 295,300  static bool postgresAnalyzeForeignTable(Relation relation,
--- 295,301 
  static void estimate_path_cost_size(PlannerInfo *root,
				RelOptInfo *baserel,
				List *join_conds,
+				List *param_conds,
				double *p_rows, int *p_width,
				Cost *p_startup_cost, Cost *p_total_cost);
  static void get_remote_estimate(const char *sql,
***
*** 493,499  postgresGetForeignRelSize(PlannerInfo *root,
 * values in fpinfo so we don't need to do it again to generate the
 * basic foreign path.
 */
!   estimate_path_cost_size(root, baserel, NIL,
			fpinfo->rows, fpinfo->width,
			fpinfo->startup_cost, fpinfo->total_cost);
  
--- 494,500 
 * values in fpinfo so we don't need to do it again to generate the
 * basic foreign path.
 */
! estimate_path_cost_size(root, baserel, NIL, NIL,
			fpinfo->rows, fpinfo->width,
			fpinfo->startup_cost, fpinfo->total_cost);
  
***
*** 523,529  postgresGetForeignRelSize(PlannerInfo *root,
set_baserel_size_estimates(root, baserel);
  
/* Fill in basically-bogus cost estimates for use later. */
!   estimate_path_cost_size(root, baserel, NIL,
			fpinfo->rows, fpinfo->width,
			fpinfo->startup_cost, fpinfo->total_cost);
}
--- 524,530 
set_baserel_size_estimates(root, baserel);
  
/* Fill in basically-bogus cost estimates for use later. */
!   estimate_path_cost_size(root, baserel, NIL, NIL,
			fpinfo->rows, fpinfo->width,
			fpinfo->startup_cost, fpinfo->total_cost);
}
***
*** 541,547  postgresGetForeignPaths(PlannerInfo *root,
--- 542,550 
PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) baserel->fdw_private;
ForeignPath *path;
List   *join_quals;
+   List   *param_quals;
Relids  required_outer;
+   ParamPathInfo *param_info;
double  rows;
int width;
Coststartup_cost;
***
*** 591,604  postgresGetForeignPaths(PlannerInfo *root,
if (!is_foreign_expr(root, baserel, rinfo->clause))
continue;
  
-   /*
-    * OK, get a cost estimate from the remote, and make a path.
-    */
-   join_quals = list_make1(rinfo);
-   estimate_path_cost_size(root, baserel, join_quals,
-			   rows, width,
-			   startup_cost, total_cost);
-
	/* Must calculate required outer rels for this path */
	required_outer = bms_union(rinfo->clause_relids,

Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-03-03 Thread Christian Kruse
Hi,

 Attached is a patch with the updated documentation (which now consistently
 says "huge pages"), a renamed GUC, consistent wording (always "huge
 pages"), and renamed variables.
 
 Hmm, I wonder if that could now be misunderstood to have something to do
 with the PostgreSQL page size? Maybe add the word "memory" or "operating
 system" in the first sentence in the docs, like this: "Enables/disables
 the use of huge memory pages."

Accepted, see attached patch.

<para>
 At present, this feature is supported only on Linux. The setting is
 ignored on other systems when set to <literal>try</literal>.
 <productname>PostgreSQL</productname> will
 refuse to start when set to <literal>on</literal>.
</para>
 
 Is it clear enough that PostgreSQL will only refuse to start up when it's
 set to on, *if the feature's not supported on the platform*? Perhaps just
 leave that last sentence out. It's mentioned later that "With
 <literal>on</literal>, failure to use huge pages will prevent the server
 from starting up."; that's probably enough.

Fixed.

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fa9ee37..065c1db 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1166,14 +1166,14 @@ include 'filename'
      </listitem>
     </varlistentry>
 
-    <varlistentry id="guc-huge-tlb-pages" xreflabel="huge_tlb_pages">
-     <term><varname>huge_tlb_pages</varname> (<type>enum</type>)</term>
+    <varlistentry id="guc-huge-pages" xreflabel="huge_pages">
+     <term><varname>huge_pages</varname> (<type>enum</type>)</term>
      <indexterm>
-      <primary><varname>huge_tlb_pages</> configuration parameter</primary>
+      <primary><varname>huge_pages</> configuration parameter</primary>
      </indexterm>
      <listitem>
       <para>
-       Enables/disables the use of huge TLB pages. Valid values are
+       Enables/disables the use of huge memory pages. Valid values are
        <literal>try</literal> (the default), <literal>on</literal>,
        and <literal>off</literal>.
       </para>
@@ -1181,19 +1181,16 @@ include 'filename'
      <para>
       At present, this feature is supported only on Linux. The setting is
       ignored on other systems when set to <literal>try</literal>.
-      <productname>PostgreSQL</productname> will
-      refuse to start when set to <literal>on</literal>.
      </para>
 
      <para>
-      The use of huge TLB pages results in smaller page tables and
-      less CPU time spent on memory management, increasing performance. For
-      more details, see
-      <xref linkend="linux-huge-tlb-pages">.
+      The use of huge pages results in smaller page tables and less CPU time
+      spent on memory management, increasing performance. For more details,
+      see <xref linkend="linux-huge-pages">.
      </para>
 
      <para>
-      With <varname>huge_tlb_pages</varname> set to <literal>try</literal>,
+      With <varname>huge_pages</varname> set to <literal>try</literal>,
       the server will try to use huge pages, but fall back to using
       normal allocation if that fails. With <literal>on</literal>, failure
       to use huge pages will prevent the server from starting up. With
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 5f9fa61..7f4a235 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1308,13 +1308,13 @@ echo -1000 > /proc/self/oom_score_adj
    </note>
   </sect2>
 
-  <sect2 id="linux-huge-tlb-pages">
-   <title>Linux huge TLB pages</title>
+  <sect2 id="linux-huge-pages">
+   <title>Linux huge pages</title>
 
    <para>
-    Using huge TLB pages reduces overhead when using large contiguous chunks
-    of memory, like PostgreSQL does. To enable this feature
-    in <productname>PostgreSQL</productname> you need a kernel
+    Using huge pages reduces overhead when using large contiguous chunks of
+    memory, like <productname>PostgreSQL</productname> does. To enable this
+    feature in <productname>PostgreSQL</productname> you need a kernel
     with <varname>CONFIG_HUGETLBFS=y</varname> and
     <varname>CONFIG_HUGETLB_PAGE=y</varname>. You also have to tune the system
     setting <varname>vm.nr_hugepages</varname>. To estimate the number of
@@ -1345,7 +1345,7 @@ $ <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
     <productname>PostgreSQL</productname> is to use them when possible and
     to fallback to normal pages when failing. To enforce the use of huge
     pages, you can set
-    <link linkend="guc-huge-tlb-pages"><varname>huge_tlb_pages</varname></link>
+    <link linkend="guc-huge-pages"><varname>huge_pages</varname></link>
     to <literal>on</literal>. Note that in this case
     <productname>PostgreSQL</productname> will fail to start if not enough huge
     pages are available.
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 65ad595..51c1a2b 100644
--- a/src/backend/port/sysv_shmem.c
+++ 
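
For reference, the estimation recipe that the truncated runtime.sgml hunk
above alludes to ("To estimate the number of ...") divides the postmaster's
peak memory footprint by the kernel's huge page size; the PID and sizes
below are illustrative only:

$ head -1 $PGDATA/postmaster.pid
4170
$ grep ^VmPeak /proc/4170/status
VmPeak:  6490428 kB
$ grep ^Hugepagesize /proc/meminfo
Hugepagesize:    2048 kB
$ sysctl -w vm.nr_hugepages=3170    # ceil(6490428 / 2048)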

Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-03-03 Thread Christian Kruse
Hi,

On 28/02/14 17:58, Peter Geoghegan wrote:
 On Fri, Feb 28, 2014 at 9:43 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  Hmm, I wonder if that could now be misunderstood to have something to do
  with the PostgreSQL page size? Maybe add the word memory or operating
  system in the first sentence in the docs, like this: Enables/disables the
  use of huge memory pages.
 
 Whenever I wish to emphasize that distinction, I tend to use the term
 MMU pages.

I don't want to diverge that much from Linux terminology; that may lead
to confusion. And using this term in only one place doesn't seem to make
sense either – naming would then be inconsistent, which also leads to
confusion. Do you agree?

Best regards,

-- 
 Christian Kruse   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] UNION ALL on partitioned tables won't use indices.

2014-03-03 Thread Kyotaro HORIGUCHI
Hello,

  Yes, the old dumped version of the typ2 patch did so. It flattened
  the appendrel tree for the query properly. Let me hear the reason you
  prefer to do it this way.
 
 Having reviewed my upthread reasoning for preferring one of those two
 approaches over the other, it's a weak preference.  They have similar runtime
 costs.  Putting the logic with the code that creates appendrels reduces the
 number of code sites one must examine to reason about possible plan
 structures.  We might not flatten RTE_RELATION appendrel parents exactly the
 same way we flatten RTE_SUBQUERY appendrel parents.  I would tend to leave
 inh=true for the former, for reasons explained in my notes on v7, but set
 inh=false for the latter to save needless work.

Unfortunately, RTE_SUBQUERYs with their inh flag cleared might
cause something inconvenient in preprocess_minmax_aggregates()
and/or add_rtes_to_flat_rtable().

# I haven't found that related to sepgsql, though :-(

I understand that your concern is to treat parent RTEs differently
according to their relkinds. But I think the inh flags cannot be
modified at all, for the reason mentioned above.

Finally, the seemingly safest way to exclude removed intermediate RTEs
from subsequent processing is simply to remove the AppendRelInfos
directly linked to them (as v7 did), for both kinds of parent,
RTE_RELATION and RTE_SUBQUERY. With that, the difference between the
two disappears.


At least as of now, inheritance tables define the bottom bound of an
appendrel tree while complex(?) UNION ALLs define the upper bound, and
both multilevel (simple) UNION ALLs and inheritances are flattened
individually, so the only possible inheritance tree to be collapsed by
this patch is, I think:
  Subquery(inh=1)[Relation-inhparent [Relation-child, child, child]]
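
For concreteness, a hypothetical query producing roughly that shape (an
inheritance parent as one arm of a UNION ALL; all names invented):

CREATE TABLE parent (id int PRIMARY KEY);
CREATE TABLE child1 (LIKE parent INCLUDING INDEXES) INHERITS (parent);
CREATE TABLE child2 (LIKE parent INCLUDING INDEXES) INHERITS (parent);
CREATE TABLE other (id int PRIMARY KEY);

SELECT id FROM parent   -- expands to Relation-inhparent plus children
UNION ALL
SELECT id FROM other
ORDER BY id;            -- should be able to use the indexes via MergeAppend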

 On the other hand, a flattening pass is less code overall and
 brings an attractive uniformity-by-default to the area.

Focusing only on the above structure, all we need to do to collapse this
tree is connect the children directly to the Subquery, then remove all
AppendRelInfos connecting to the Relation-inhparent. The inh flag need
not be modified.

# Umm. I strongly suspect that I overlooked something..

Even widening this to the general case, the picture doesn't seem to
change. All we need to do is link each child to its grandparent and
isolate the parent by removing its AppendRelInfos.

Thoughts?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Get more from indices.

2014-03-03 Thread Kyotaro HORIGUCHI
I marked this patch as 'Ready for Committer' by myself according
to the following discussion.

Thanks.

 At Tue, 25 Feb 2014 16:35:55 -0500, Robert Haas wrote
  On Tue, Feb 25, 2014 at 3:36 AM, Kyotaro HORIGUCHI
  horiguchi.kyot...@lab.ntt.co.jp wrote:
   May I mark this patch as Ready for Committer by myself, since
   it was already marked so in the last CF3 and has had no objection
   or new suggestion in this CF4 for more than a month?
  
  Sounds fair.
 
 Thank you, I will send this patch to committers after waiting
 another few days for more suggestions.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: show relation and tuple infos of a lock to acquire

2014-03-03 Thread Andres Freund
On 2014-02-28 20:55:20 +0530, Amit Kapila wrote:
 On Thu, Feb 27, 2014 at 4:14 PM, Christian Kruse
  I wouldn't be inclined to dump the whole tuple under any
  circumstances.  That could be a lot more data than what you want
  dumped in your log.  The PK could already be somewhat unreasonably
  large, but the whole tuple could be a lot more unreasonably large.
 
  Well, as I already stated: we don't. I copied the behavior we use in
  CHECK constraints (ExecBuildSlotValueDescription()).
 
 I think more people are now of the opinion that displaying the whole tuple
 is not useful. I believe it is good to go ahead by displaying just the
 primary key for this case and move on.

The patch does not, and never has, printed the full tuple. What it does
is copy the behaviour of printing the tuple for check constraint
violations, which truncates the tuple after a certain length.

I don't think changing this in an isolated manner to the primary key
would be a good idea. Let's use the same logic here, and if somebody
wants to argue for always using the primary key instead of a truncated
tuple, let them submit a separate patch.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: show relation and tuple infos of a lock to acquire

2014-03-03 Thread Andres Freund
On 2014-03-01 13:29:18 +0530, Amit Kapila wrote:
 With new patch, the message while updating locked rows will be displayed
 as below:
 
 LOG:  process 364 still waiting for ShareLock on transaction 678 after 1014.000 ms
 CONTEXT:  while attempting to lock tuple (0,2) with values (2) in relation public.t1 of database postgres
 
 LOG:  process 364 acquired ShareLock on transaction 678 after 60036.000 ms
 CONTEXT:  while attempting to lock tuple (0,2) with values (2) in relation public.t1 of database postgres
 
 Now I am not sure if the second message is an improvement, as what it
 sounds like is "while attempting to lock the tuple, it got a share lock
 on the transaction". If you, Robert, or others feel it is okay, then we
 can retain it as it is in the patch; otherwise I think we either need to
 rephrase it, or perhaps try some other way (a global variable) so that
 it appears only in the required case. I feel the way Robert suggested,
 i.e. making it a DETAIL of the particular message (we might need a
 global variable to pass certain info), is the better way and will have
 minimal impact on the cases where this additional information needs to
 be displayed.

I really don't understand the origins of your arguments here. Why
shouldn't a row lock caused by an UPDATE be relevant? It's currently very
hard to debug those, just as it's hard to debug tuple/row locks caused
by explicit FOR SHARE/UPDATE/...
And if your argument is that the message might be displayed when an
explicit query cancel (statement timeout) instead of a deadlock_timeout
arrives, so what? It's still correct and important information. After
all, it seems to have waited long enough to get cancelled.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSoC proposal

2014-03-03 Thread Albe Laurenz
 I'm applying for GSoC 2014 with PostgreSQL and would appreciate your
 comments on my proposal (attached). I'm looking for technical
 corrections/comments and your opinions on the project's viability. In
 particular, if the community has doubts about its usefulness, I would
 start working on an extra proposal from
 https://wiki.postgresql.org/wiki/GSoC_2014, perhaps on the RETURNING
 clause as a student named Karlik did last year.

I am sure that Simon had his reasons when he proposed
http://www.postgresql.org/message-id/CA+U5nMJGgJNt5VXqkR=crtdqxfmuyzwef23-fd5nusns+6n...@mail.gmail.com
but I cannot help asking some questions:

1) Why limit the feature to UTF8 strings?
   Shouldn't the technique work for all multibyte server encodings?

2) There is probably something that makes this necessary, but why should
   the decision of how TOAST data is sliced be attached to the data type?
   My (probably naive) idea would be to add a new TOAST strategy (e.g. SLICED)
   to PLAIN, MAIN, EXTERNAL and EXTENDED.

The feature only makes sense for string data types, right?
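
For reference, the existing per-column strategy assignment that a new
SLICED strategy would presumably sit alongside (table and column names
below are examples only; note that EXTERNAL already disables compression,
which lets substring fetches read only the TOAST chunks they need):

ALTER TABLE documents ALTER COLUMN body SET STORAGE EXTERNAL;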

Yours,
Laurenz Albe   

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gaussian distribution pgbench

2014-03-03 Thread KONDO Mitsumasa

(2014/03/03 16:51), Fabien COELHO wrote:
 \setrandom foo 1 10 [uniform]
 \setrandom foo 1 :size gaussian 3.6
 \setrandom foo 1 100 exponential 7.2
 It's a good design. I think it will have lower overhead in the parsing
 part of pgbench, because string comparisons will be reduced (maybe). And
 I'd like to remove [uniform], because we have to keep compatibility with
 old scripts, and a random function always means a uniform distribution
 in the common sense of programming.

 I just put uniform as an optional default, hence the brackets.
All right, I was misunderstanding. However, if we select this format, I'd
still like to remove it, because pgbench needs to check the argument
count, and if we allow optional brackets that check will not be simple.


 Otherwise, what I would have in mind if this would be designed from scratch:

\set foo 124
\set foo string value (?)
\set foo :variable
\set foo 12 + :shift

 And then

\set foo uniform 1 10
\set foo gaussian 1 10 4.2
\set foo exponential 1 100 5.2

 or maybe functions could be prepended with something like uniform.
 But that would be for another life :-)
I don't agree with that: it adds more overhead in the parsing part and is
more complex for the user.


 However, the new grammar is a little bit long in user scripts. It seems
 a trade-off between script readability and user writing cost.

 Yep.
OK. I'm not sure which idea is best, so I will wait for comments from the community :)

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Patch FORCE_NULL option for copy COPY in CSV mode

2014-03-03 Thread Michael Paquier
On Mon, Mar 3, 2014 at 1:13 PM, Andrew Dunstan and...@dunslane.net wrote:
 That difference actually made the file_fdw regression results plain wrong,
 in my view, in that they expected a quoted empty string to be turned to
 null even when the null string was something else.

 I've adjusted this and the docs and propose to apply the attached patch
 in the next day or two unless there are any objections.

 Unless I'm overlooking something, output from SELECT * FROM text_csv;
 in 'output/file_fdw.source' still needs updating?
While reading this patch, I found a small typo in copy2.[sql|out] =>
s/violiaton/violation/g
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heapgetpage() and -takenDuringRecovery

2014-03-03 Thread Robert Haas
On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund and...@2ndquadrant.com wrote:
 While reading around to see which references to SnapshotData's members
 exist, I once more came across the following tidbit in heapgetpage():
 /*
  * If the all-visible flag indicates that all tuples on the page are
  * visible to everyone, we can skip the per-tuple visibility tests.
  *
  * Note: In hot standby, a tuple that's already visible to all
  * transactions in the master might still be invisible to a read-only
  * transaction in the standby. We partly handle this problem by tracking
  * the minimum xmin of visible tuples as the cut-off XID while marking a
  * page all-visible on master and WAL log that along with the visibility
  * map SET operation. In hot standby, we wait for (or abort) all
  * transactions that can potentially may not see one or more tuples on
  * the page. That's how index-only scans work fine in hot standby. A
  * crucial difference between index-only scans and heap scans is that
  * the index-only scan completely relies on the visibility map where as
  * heap scan looks at the page-level PD_ALL_VISIBLE flag. We are not
  * sure if the page-level flag can be trusted in the same way, because
  * it might get propagated somehow without being explicitly WAL-logged,
  * e.g. via a full page write. Until we can prove that beyond doubt,
  * let's check each tuple for visibility the hard way.
  */
 all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;

 I don't think this is necessary >= 9.2. There are only two interesting
 places where PD_ALL_VISIBLE is set:
 a) lazy_vacuum_page(), where an xl_heap_clean is logged *before*
    PD_ALL_VISIBLE/the vm is touched, and that causes recovery
    conflicts. The heap page is locked for cleanup at that point. As the
    logging of xl_heap_clean sets the page's LSN, there's no way the page
    can appear on the standby too early.
 b) empty pages in lazy_scan_heap(). If they always were empty, there's
    no need for conflicts. The only other way I can see to end up there
    is a previous heap_page_prune() that repaired fragmentation. But that
    logs a WAL record with conflict information.

I don't think there's any reason to believe that lazy_scan_heap() can
only hit pages that are empty or have just been defragged.  Suppose
that there's a tuple on the page which was recently inserted; the
inserting transaction has committed but there are some backends that
still have older snapshots.  The page won't be marked all-visible
because it isn't.  Now, eventually those older snapshots will go away,
and sometime after that the relation will get vacuumed again, and
we'll once again look at the page.  But this time we notice that it is
all-visible, and mark it so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Defining macro LSNOID for pg_lsn in pg_type.h

2014-03-03 Thread Robert Haas
On Thu, Feb 27, 2014 at 8:12 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 When working on the datatype pg_lsn, we did not actually create a
 #define macro for its OID in pg_type.h, and this could be useful for
 extension developers. The simple patch attached corrects that by
 naming this macro LSNOID.

Thanks, committed.
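
For reference, a minimal usage sketch from an extension's point of view;
the helper function below is hypothetical, while the macro and its value
are the ones added to pg_type.h:

#include "postgres.h"
#include "catalog/pg_type.h"	/* now provides LSNOID (3220) */

/* Hypothetical helper: does this argument have type pg_lsn? */
static bool
arg_is_pg_lsn(Oid argtype)
{
	return argtype == LSNOID;
}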

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heapgetpage() and -takenDuringRecovery

2014-03-03 Thread Andres Freund
On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
 On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund and...@2ndquadrant.com wrote:
  While reading around to see which references to SnapshotData's members
  exist, I once more came across the following tidbit in heapgetpage():
  /*
   * If the all-visible flag indicates that all tuples on the page are
   * visible to everyone, we can skip the per-tuple visibility tests.
   *
   * Note: In hot standby, a tuple that's already visible to all
   * transactions in the master might still be invisible to a read-only
   * transaction in the standby. We partly handle this problem by tracking
   * the minimum xmin of visible tuples as the cut-off XID while marking a
   * page all-visible on master and WAL log that along with the visibility
   * map SET operation. In hot standby, we wait for (or abort) all
   * transactions that can potentially may not see one or more tuples on
   * the page. That's how index-only scans work fine in hot standby. A
   * crucial difference between index-only scans and heap scans is that
   * the index-only scan completely relies on the visibility map where as
   * heap scan looks at the page-level PD_ALL_VISIBLE flag. We are not
   * sure if the page-level flag can be trusted in the same way, because
   * it might get propagated somehow without being explicitly WAL-logged,
   * e.g. via a full page write. Until we can prove that beyond doubt,
   * let's check each tuple for visibility the hard way.
   */
  all_visible = PageIsAllVisible(dp) && !snapshot->takenDuringRecovery;
 
  I don't think this is necessary >= 9.2. There are only two interesting
  places where PD_ALL_VISIBLE is set:
  a) lazy_vacuum_page(), where an xl_heap_clean is logged *before*
     PD_ALL_VISIBLE/the vm is touched, and that causes recovery
     conflicts. The heap page is locked for cleanup at that point. As the
     logging of xl_heap_clean sets the page's LSN, there's no way the page
     can appear on the standby too early.
  b) empty pages in lazy_scan_heap(). If they always were empty, there's
     no need for conflicts. The only other way I can see to end up there
     is a previous heap_page_prune() that repaired fragmentation. But that
     logs a WAL record with conflict information.
 
 I don't think there's any reason to believe that lazy_scan_heap() can
 only hit pages that are empty or have just been defragged.  Suppose
 that there's a tuple on the page which was recently inserted; the
 inserting transaction has committed but there are some backends that
 still have older snapshots.  The page won't be marked all-visible
 because it isn't.  Now, eventually those older snapshots will go away,
 and sometime after that the relation will get vacuumed again, and
 we'll once again look at the page.  But this time we notice that it is
 all-visible, and mark it so.

Hm, right, that can happen. How about adding an LSN interlock in
visibilitymap_set() for those cases as well, not just for checksums?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Dump pageinspect to 1.2: page_header with pg_lsn datatype

2014-03-03 Thread Robert Haas
On Thu, Feb 27, 2014 at 8:05 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Feb 26, 2014 at 5:45 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
 Robert Haas wrote:
 On Mon, Feb 24, 2014 at 9:34 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  Yeah, erroring out seems good enough.  Particularly if you add a hint
  saying "please upgrade the extension".

 In past instances where this has come up, we have actually made the
 .so backward-compatible.  See pg_stat_statements in particular.  I'd
 prefer to keep to that precedent here.

 But pg_stat_statement is a user tool which is expected to have lots of
 use, and which has backwards-compatibility concerns because people might
 have written tools on top of it.  Not so with pageinspect.  I don't
 think we need to put in the same amount of effort.  (Even though,
 really, it's probably not all that difficult to support both cases.  I
 just don't see the point.)
 Actually, after a little bit of hacking I noticed that supporting both is
 about as complicated as supporting only pg_lsn... Here is for example how
 I can get page_header to work across versions:
 -   snprintf(lsnchar, sizeof(lsnchar), "%X/%X",
 -(uint32) (lsn >> 32), (uint32) lsn);

 -   values[0] = CStringGetTextDatum(lsnchar);
 +   /*
 +* Do some version-related checks. pageinspect >= 1.2 uses pg_lsn
 +* instead of text when using this function for the LSN field.
 +*/
 +   if (tupdesc->attrs[0]->atttypid == TEXTOID)
 +   {
 +   char    lsnchar[64];
 +   snprintf(lsnchar, sizeof(lsnchar), "%X/%X",
 +(uint32) (lsn >> 32), (uint32) lsn);
 +   values[0] = CStringGetTextDatum(lsnchar);
 +   }
 +   else
 +   values[0] = LSNGetDatum(lsn);
 In this case an upgraded 9.4 server using pageinspect 1.1 or older
 simply goes through the text datatype path... You can find that in the
 patch attached.
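
For reference, the function under discussion is typically invoked as in
the standard pageinspect example below; with pageinspect 1.2 on a 9.4
server the lsn column comes back as pg_lsn, with 1.1 or older it stays
text:

SELECT lsn FROM page_header(get_raw_page('pg_class', 0));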

Thanks, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heapgetpage() and -takenDuringRecovery

2014-03-03 Thread Robert Haas
On Mon, Mar 3, 2014 at 7:07 AM, Andres Freund and...@2ndquadrant.com wrote:
 I don't think there's any reason to believe that lazy_scan_heap() can
 only hit pages that are empty or have just been defragged.  Suppose
 that there's a tuple on the page which was recently inserted; the
 inserting transaction has committed but there are some backends that
 still have older snapshots.  The page won't be marked all-visible
 because it isn't.  Now, eventually those older snapshots will go away,
 and sometime after that the relation will get vacuumed again, and
 we'll once again look at the page.  But this time we notice that it is
 all-visible, and mark it so.

 Hm, right, that can happen. How about adding an LSN interlock in
 visibilitymap_set() for those cases as well, not just for checksums?

Well, if I'm correctly understanding what you're proposing, that would
involve emitting an FPI for each page when we vacuum the
newly-inserted portion of an insert-only table.  That's been
repeatedly proposed in the past, but I've opposed it on the grounds
that it makes vacuum much more expensive in such cases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Defining macro LSNOID for pg_lsn in pg_type.h

2014-03-03 Thread Michael Paquier
On Mon, Mar 3, 2014 at 9:06 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Feb 27, 2014 at 8:12 AM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 When working on the datatype pg_lsn, we actually did not create a
 define macro for its oid in pg_type.h and this could be useful for
 extension developers. The simple patch attached corrects that by
 naming this macro LSNOID.

 Thanks, committed.
Thanks.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VACUUM FULL/CLUSTER doesn't update pg_class's pg_class.relfrozenxid

2014-03-03 Thread Robert Haas
On Thu, Feb 27, 2014 at 1:06 PM, Andres Freund and...@2ndquadrant.com wrote:
 As Robert previously complained, a database-wide VACUUM FULL now (as of
 3cff1879f8d03) reliably increases the relfrozenxid for all tables but
 pg_class itself. That's a bit sad because it means doing a VACUUM FULL
 won't help in an anti-wraparound scenario.

 The reason for that is explained by the following comment:
 /*
  * Update the tuples in pg_class --- unless the target relation of the
  * swap is pg_class itself.  In that case, there is zero point in making
  * changes because we'd be updating the old data that we're about to
  * throw away.  Because the real work being done here for a mapped
  * relation is just to change the relation map settings, it's all right
  * to not update the pg_class rows in this case.
  */

 I think the easiest fix for that is to update pg_class's relfrozenxid in
 finish_heap_swap() after the indexes have been rebuilt; that's just a
 couple of lines. There are more complex solutions that would avoid the
 need for that special case, but I think this is sufficient. A patch doing
 that is attached.

So, this patch is obviously after the final CommitFest deadline, but
I'd like to commit it to 9.4 anyway, on the admittedly arguable theory that
it's tying up a loose end introduced by
3cff1879f8d03cb729368722ca823a4bf74c0cac.  Prior to that commit,
VACUUM FULL and CLUSTER *never* updated relfrozenxid; beginning with
that commit, they do so for all relations except pg_class.  This
tidies up that omission.

And I think that's a pretty worthwhile thing to do, because we get
periodic reports from people who have run VACUUM FULL on a database in
danger of wraparound and then wondered why it did not fix the problem.
The previously-mentioned commit did most of the heavy lifting as far
as tidying that up, but without this adjustment it won't quite get us
over the hump.

But all that having been said, a deadline is a deadline, so if anyone
wishes to declare this untimely please speak up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: VACUUM FULL/CLUSTER doesn't update pg_class's pg_class.relfrozenxid

2014-03-03 Thread Andres Freund
On 2014-03-03 07:52:23 -0500, Robert Haas wrote:
 And I think that's a pretty worthwhile thing to do, because we get
 periodic reports from people who have run VACUUM FULL on a database in
 danger of wraparound and then wondered why it did not fix the problem.
  The previously-mentioned commit did most of the heavy lifting as far
 as tidying that up, but without this adjustment it won't quite get us
 over the hump.
 
 But all that having been said, a deadline is a deadline, so if anyone
 wishes to declare this untimely please speak up.

Now, I am obviously not impartial here, but I think it doesn't make
sense to whack this code around in two releases if not necessary.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-03-03 Thread Andrew Dunstan


On 03/03/2014 02:00 AM, Tom Lane wrote:

Josh Berkus j...@agliodbs.com writes:

The only way I can see this being of real use to an attacker is if they
could use this exploit to create a wormed version of PostgreSQL on the
target build system.  Is that possible?

It's theoretically possible, since having broken into the build user's
account they could modify the already-built-but-not-yet-packaged PG
executables.

Having said that, though, I concur with the feeling that this probably
isn't a useful exploit in practice.  On Red Hat's build systems, for
example, different packages are built in different chroots.  So even if
a malicious package is being built concurrently, it could not reach the
postmaster's socket.  A breakin would only be possible for somebody who
had outside-the-chroots control of the build machine ... in which case
they can hack pretty much any built package pretty much any way they
want, without need for anything as fiddly as this.

Other vendors might do things differently, but it still seems likely
that there would be easier exploits available to anyone who's managed
to get control on a machine used for package building.





I'm less worried about vendor build systems and more about roll-your-own
systems like Gentoo, FreeBSD ports, and Homebrew.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] heapgetpage() and -takenDuringRecovery

2014-03-03 Thread Andres Freund
On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
 On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund and...@2ndquadrant.com wrote:
  I don't think this is necessary >= 9.2. There are only two interesting
  places where PD_ALL_VISIBLE is set:
  a) lazy_vacuum_page(), where an xl_heap_clean is logged *before*
     PD_ALL_VISIBLE/the vm is touched, and that causes recovery
     conflicts. The heap page is locked for cleanup at that point. As the
     logging of xl_heap_clean sets the page's LSN, there's no way the page
     can appear on the standby too early.
  b) empty pages in lazy_scan_heap(). If they always were empty, there's
     no need for conflicts. The only other way I can see to end up there
     is a previous heap_page_prune() that repaired fragmentation. But that
     logs a WAL record with conflict information.
 
 I don't think there's any reason to believe that lazy_scan_heap() can
 only hit pages that are empty or have just been defragged.  Suppose
 that there's a tuple on the page which was recently inserted; the
 inserting transaction has committed but there are some backends that
 still have older snapshots.  The page won't be marked all-visible
 because it isn't.  Now, eventually those older snapshots will go away,
 and sometime after that the relation will get vacuumed again, and
 we'll once again look at the page.  But this time we notice that it is
 all-visible, and mark it so.

Right now I am missing how this isn't an actual correctness problem
after a crash. Without an LSN interlock we could crash *after* the heap
page has been written out, but *before* the vm WAL record has been
flushed to disk. Combined with synchronous_commit=off there could be
transactions that appeared as safely committed for vacuum (i.e. are
below GetOldestXmin()), but which are actually aborted after the
commit.
Normal hint bits circumvent that by checking XLogNeedsFlush(commitLSN),
but that doesn't work here.

Am I missing something?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Triggers on foreign tables

2014-03-03 Thread Kohei KaiGai
I tried checking the latest (v8) patch again, and could not find a
problem in your design change from v7.
As Noah pointed out, it uses a per-query-depth tuplestore that is
released in AfterTriggerEndSubXact().

So, may I mark it as ready for committer?

2014-03-03 15:48 GMT+09:00 Ronan Dunklau ronan.dunk...@dalibo.com:
 Hello.

 Did you have time to review the latest version of this patch? Is there
 anything I can do to get this ready for committer?

 Thank you for all the work performed so far.




 On Tuesday, February 4, 2014 at 13:16:22, Ronan Dunklau wrote:
 On Monday, February 3, 2014 at 23:28:45, Noah Misch wrote:
   On Sun, Feb 02, 2014 at 11:53:51AM +0100, Ronan Dunklau wrote:
    On Thursday, January 30, 2014 at 14:05:08, Noah Misch wrote:
     On Thu, Jan 23, 2014 at 03:17:35PM +0100, Ronan Dunklau wrote:
      What do you think about this approach? Is there something I missed
      which would make it not sustainable?
    
     Seems basically reasonable.  I foresee multiple advantages from having
     one tuplestore per query level as opposed to one for the entire
     transaction.  You would remove the performance trap of backing up the
     tuplestore by rescanning.  It permits reclaiming memory and disk space
     in AfterTriggerEndQuery() rather than at end of transaction.  You
     could remove ate_ptr1 and ate_ptr2 from AfterTriggerEventDataFDW and
     just store the flags word: depending on AFTER_TRIGGER_2CTIDS, grab
     either the next one or the next two tuples from the tuplestore.
     Using work_mem per AfterTriggerBeginQuery() instead of per transaction
     is no problem.  What do you think of that design change?
  
   I agree that this design is better, but I have some objections.
  
   We can remove ate_ptr2 and rely on the AFTER_TRIGGER_2CTIDS flag, but
   the rescanning and ate_ptr1 (renamed ate_tupleindex in the attached
   patch) can't go away.
  
   Consider for example the case of a foreign table with more than one
   AFTER UPDATE trigger. Unless we store the tuples once for each trigger,
   we will have to rescan the tuplestore.
 
  Will we?  Within a given query level, when do (non-deferred) triggers
  execute in an order other than the enqueue order?

 Let me explain what I had in mind.

 Looking at the code in AfterTriggerSaveEvent:

 - we build a template AfterTriggerEvent, and store the tuple(s)
 - for each suitable after trigger that matches the trigger type, as well
   as the WHEN condition if any, a copy of the previously built
   AfterTriggerEvent is queued

 Later, those events are fired in order.

 This means that more than one event can be fired for one tuple.

 Take this example:

 CREATE TRIGGER trig_row_after1
 AFTER UPDATE ON rem2
 FOR EACH ROW
 WHEN (NEW.f1 % 5 < 3)
 EXECUTE PROCEDURE trigger_func('TRIG1');

 CREATE TRIGGER trig_row_after2
 AFTER UPDATE ON rem2
 FOR EACH ROW
 WHEN (NEW.f1 % 5 < 4)
 EXECUTE PROCEDURE trigger_func('TRIG2');

 UPDATE rem2 set f2 = 'something';

 Assuming 5 rows with f1 as a serial, the fired AfterTriggerEvents'
 ate_tupleindex values will be, in that order:

 0-0-2-2-4-8-8

 So, at least a backward seek is required for trig_row_after2 to be able to
 retrieve a tuple that was already consumed when firing trig_row_after1.

 On a side note, this made me realize that it is better to avoid storing a
 tuple entirely if there is no enabled trigger (the f1 = 4 case above). The
 attached patch does that, so the previous sequence becomes:

 0-0-2-2-4-6-6

 It also avoids initializing a tuplestore at all if it's not needed.

   To mitigate the effects of this behaviour, I added the option to
   perform a reverse_seek when the looked-up tuple is nearer to the
   current position than to the start.
 
  If there's still a need to seek within the tuplestore, that should get
  rid of the O(n^2) effect.  I'm hoping that per-query-level tuplestores
  will eliminate the need to seek entirely.

 I think the only case where seeking is still needed is when more than one
 after trigger needs to be fired, since the abovementioned change removes
 the need to seek in order to skip tuples.

    If you do pursue that change, make sure the code still does the right
    thing when it drops queued entries during subxact abort.
  
   I don't really understand what should be done at that stage. Since
   triggers on foreign tables are not allowed to be deferred, everything
   should be cleaned up at the end of each query, right? So, there
   shouldn't be any queued entries.
 
  I suspect that's right.  If you haven't looked over
  AfterTriggerEndSubXact(), please do so and ensure all its actions still
  make sense in the context of this new kind of trigger storage.

 You're right, I missed something here. When aborting a subxact, the
 tuplestores for queries below the subxact's query depth should be cleaned
 up, if any, because AfterTriggerEndQuery has not been called for the
 failing query.

 The attached patch fixes that.

 The attached 

Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2014-03-03 Thread Heikki Linnakangas

On 02/16/2014 01:51 PM, Amit Kapila wrote:

On Wed, Feb 5, 2014 at 5:29 PM, Heikki Linnakangas
hlinnakan...@vmware.com  wrote:

I'm pretty sure the overhead of that would be negligible, so we could always
enable it. There are certainly a lot of scenarios where prefix/suffix
detection alone wouldn't help, but so what.

Attached is a quick patch for that, if you want to test it.

I have updated the patch to correct a few problems, address the review
comments by Andres, and do some optimizations to improve CPU overhead in
the worst case.


Thanks. I have to agree with Robert though that using the pglz encoding 
when we're just checking for a common prefix/suffix is a pretty crappy 
way of going about it [1].


As the patch stands, it includes the NULL bitmap when checking for a
common prefix. That's probably not a good idea, because it defeats the
prefix detection in the common case that you update a field from NULL
to not-NULL or vice versa.


Attached is a rewritten version, which does the prefix/suffix tests
directly in heapam.c, and adds the prefix/suffix lengths directly as
fields in the WAL record. Could you take one more look at this version,
to check whether I've missed anything?


This ought to be tested with the new logical decoding stuff, as it
modifies the WAL update record format which the logical decoding stuff
also relies on, but I don't know anything about that.


[1] 
http://www.postgresql.org/message-id/ca+tgmozstdqdku7dhcljchvmbrh1_yfouse+fuxesevnc4j...@mail.gmail.com


- Heikki
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index de4befa..cf23db5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6594,10 +6594,15 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	xl_heap_header_len xlhdr;
 	xl_heap_header_len xlhdr_idx;
 	uint8		info;
+	uint16		prefix_suffix[2];
+	uint16		prefixlen = 0,
+suffixlen = 0;
 	XLogRecPtr	recptr;
-	XLogRecData rdata[7];
+	XLogRecData rdata[9];
 	Page		page = BufferGetPage(newbuf);
 	bool		need_tuple_data = RelationIsLogicallyLogged(reln);
+	int			nr;
+	Buffer		newbufref;
 
 	/* Caller should not call me on a non-WAL-logged relation */
 	Assert(RelationNeedsWAL(reln));
@@ -6607,6 +6612,54 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	else
 		info = XLOG_HEAP_UPDATE;
 
+	/*
+	 * If the old and new tuple are on the same page, we only need to log
+	 * the parts of the new tuple that were changed.  That saves on the amount
+	 * of WAL we need to write.  Currently, we just count any unchanged bytes
+	 * in the beginning or end of the tuples. That's quick to check, and
+	 * perfectly covers the common case that only one field is updated.
+	 *
+	 * We could do this even if the old and new tuple are on different pages,
+	 * but only if we don't make a full-page image of the old page, which is
+	 * difficult to know in advance.  Also, if the old tuple is corrupt for
+	 * some reason, it would allow the corruption to propagate to the new page,
+	 * so it seems best to avoid.  Under the general assumption that for long
+	 * runs most updates tend to create new tuple version on same page, there
+	 * should not be significant impact on WAL reduction or performance.
+	 *
+	 * Skip this if we're taking a full-page image of the new page, as we don't
+	 * include the new tuple in the WAL record in that case.
+	 */
+	if (oldbuf == newbuf && (need_tuple_data || !XLogCheckBufferNeedsBackup(newbuf)))
+	{
+		char	   *oldp = (char *) oldtup->t_data + oldtup->t_data->t_hoff;
+		char	   *newp = (char *) newtup->t_data + newtup->t_data->t_hoff;
+		int			oldlen = oldtup->t_len - oldtup->t_data->t_hoff;
+		int			newlen = newtup->t_len - newtup->t_data->t_hoff;
+
+		/* Check for common prefix between old and new tuple */
+		for (prefixlen = 0; prefixlen < Min(oldlen, newlen); prefixlen++)
+		{
+			if (newp[prefixlen] != oldp[prefixlen])
+break;
+		}
+		/*
+		 * Storing the length of the prefix takes 2 bytes, so we need to save
+		 * at least 3 bytes or there's no point.
+		 */
+		if (prefixlen < 3)
+			prefixlen = 0;
+
+		/* Same for suffix */
+		for (suffixlen = 0; suffixlen < Min(oldlen, newlen) - prefixlen; suffixlen++)
+		{
+			if (newp[newlen - suffixlen - 1] != oldp[oldlen - suffixlen - 1])
+break;
+		}
+		if (suffixlen < 3)
+			suffixlen = 0;
+	}
+
 	xlrec.target.node = reln->rd_node;
 	xlrec.target.tid = oldtup->t_self;
 	xlrec.old_xmax = HeapTupleHeaderGetRawXmax(oldtup->t_data);
@@ -6619,41 +6672,119 @@ log_heap_update(Relation reln, Buffer oldbuf,
 	xlrec.newtid = newtup->t_self;
 	if (new_all_visible_cleared)
 		xlrec.flags |= XLOG_HEAP_NEW_ALL_VISIBLE_CLEARED;
+	if (prefixlen > 0)
+		xlrec.flags |= XLOG_HEAP_PREFIX_FROM_OLD;
+	if (suffixlen > 0)
+		xlrec.flags |= XLOG_HEAP_SUFFIX_FROM_OLD;
 
-	rdata[0].data = (char *) &xlrec;
-	rdata[0].len = SizeOfHeapUpdate;
-	rdata[0].buffer = InvalidBuffer;
+	/* If new tuple is the single and first tuple on page... */
+	if (ItemPointerGetOffsetNumber(&(newtup->t_self)) == 

Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2014-03-03 Thread Andres Freund
On 2014-03-03 16:27:05 +0200, Heikki Linnakangas wrote:
 Thanks. I have to agree with Robert though that using the pglz encoding when
 we're just checking for a common prefix/suffix is a pretty crappy way of
 going about it [1].
 
 As the patch stands, it includes the NULL bitmap when checking for a common
 prefix. That's probably not a good idea, because it defeats the prefix
 detection in a the common case that you update a field from NULL to not-NULL
 or vice versa.
 
 Attached is a rewritten version, which does the prefix/suffix tests directly
 in heapam.c, and adds the prefix/suffix lengths directly as fields in the
 WAL record. If you could take one more look at this version, to check if
 I've missed anything.

Have you rerun the benchmarks? I'd guess the CPU overhead of this
version is lower than earlier versions', but seeing it tested won't be a
bad idea.

 This ought to be tested with the new logical decoding stuff as it modified
 the WAL update record format which the logical decoding stuff also relies,
 but I don't know anything about that.

Hm, I think all it needs to do is disable delta encoding if
need_tuple_data is set (which is dependent on wal_level=logical).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Merlin Moncure
On Fri, Feb 28, 2014 at 2:01 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-28 14:45:29 -0500, Andrew Dunstan wrote:
 Well, the jsonb portion of this is arguably the most ready, certainly it's
 had a lot more on-list review.

 Having crossread both patches I tend to agree with this. I don't think
 it's unrealistic to get jsonb committable, but the hstore bits are
 another story.

hm, do you have any specific concerns/objections about hstore?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Andres Freund
On 2014-03-03 08:57:59 -0600, Merlin Moncure wrote:
 On Fri, Feb 28, 2014 at 2:01 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-02-28 14:45:29 -0500, Andrew Dunstan wrote:
  Well, the jsonb portion of this is arguably the most ready, certainly it's
  had a lot more on-list review.
 
  Having crossread both patches I tend to agree with this. I don't think
  it's unrealistic to get jsonb committable, but the hstore bits are
  another story.
 
 hm, do you have any specific concerns/objections about hstore?

I've listed a fair number in various emails; some have since been
addressed, I think. But just take a look at the patch: at least when I
last looked, it was simply far from ready. And it's quite a bit of code,
so it's not something that can be addressed within 5min.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Triggers on foreign tables

2014-03-03 Thread Noah Misch
On Mon, Mar 03, 2014 at 11:10:30PM +0900, Kohei KaiGai wrote:
 I tried to check the latest (v8) patch again, then could not find
 problem in your design change from the v7.
 As Noah pointed out, it uses per query-depth tuplestore being released
 on AfterTriggerEndSubXact.
 
 So, may I mark it as ready for committer?

Yes.  Re-reviewing this is next on my list.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Oleg Bartunov
Andres,

you can always look at our development repository:
https://github.com/feodor/postgres/tree/hstore - hstore only,
https://github.com/feodor/postgres/tree/jsonb_and_hstore - hstore with jsonb

Since we were concentrating on the jsonb_and_hstore branch, we usually
wait for Andrew, who publishes the patches.  Your last issues were
addressed in both branches.

Oleg

PS.

We are not native English speakers and may not understand your criticism
well, but please try to be a little bit polite.  We are working
together and our common goal is to make Postgres better.  Your notes
are very important for the quality of Postgres, but sometimes you drive
us ...

On Mon, Mar 3, 2014 at 7:00 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-03-03 08:57:59 -0600, Merlin Moncure wrote:
 On Fri, Feb 28, 2014 at 2:01 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-02-28 14:45:29 -0500, Andrew Dunstan wrote:
  Well, the jsonb portion of this is arguably the most ready, certainly it's
  had a lot more on-list review.
 
  Having crossread both patches I tend to agree with this. I don't think
  it's unrealistic to get jsonb committable, but the hstore bits are
  another story.

 hm, do you have any specific concerns/objections about hstore?

 I've listed a fair number in various emails, some have been addressed
 since I think. But just take a look at the patch, at least last when I
 looked, it was simply far from ready. And it's quite a bit of code, so
 it's not something that can be addressed within 5min.

 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Robert Haas
On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Removing SELECT privilege while running a SELECT would be a different
 matter.  This is all a matter of definition; we can make up any rules
 we like. Doing so is IMHO a separate patch and not something to hold
 up the main patch.

So I think this is an interesting point.  There are various things
that could go wrong as a result of using the wrong lock level.  Worst
would be that the server crashes or corrupts data.  A little less bad
would be that sessions error out with inexplicable error conditions,
as in SnapshotNow days.  Alternatively, we could just have arguably
wrong behavior, like basing query results on the old version of the
table's metadata even after it's been changed.

I don't really care about that second category of behavior.  If
somebody changes some property of a table and existing sessions
continue to use the old value until eoxact, well, we can argue about
that, but at least until we have concrete reports of really
undesirable behavior, I don't think it's the primary issue.  What I'm
really concerned about is whether there are other things like the
SnapshotNow issues that can cause stuff to halt and catch fire.  I
don't know whether there are or are not, but that's my concern.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Andres Freund
Hi Oleg,

On 2014-03-03 19:17:12 +0400, Oleg Bartunov wrote:
 Since we were concentrating on the jsonb_and_hstore branch, we usually
 wait for Andrew, who publishes the patches.  Your last issues were
 addressed in both branches.

I'll try to have a look sometime soon.

 We are not native English speakers and may not understand your
 criticism well, but please try to be a little bit more polite.  We are
 working together and our common goal is to make postgres better.  Your
 notes are very important for the quality of postgres, but sometimes you
 drive us ...

I am sorry if I came across as impolite. I just tried to point at things I
thought needed improvement, and imo there were quite a few. A patch
needing polishing isn't something that carries shame, blame or
anything. It's just a state a patch can be in.

Greetings,

Andres Freund

PS: Not a native speaker either...

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Oleg Bartunov
On Mon, Mar 3, 2014 at 7:22 PM, Andres Freund and...@2ndquadrant.com wrote:
 Hi Oleg,

 On 2014-03-03 19:17:12 +0400, Oleg Bartunov wrote:
 Since we were concentrating on the jsonb_and_hstore branch, we usually
 wait for Andrew, who publishes the patches.  Your last issues were
 addressed in both branches.

 I'll try to have a look sometime soon.

 We are not native English speakers and may not understand your
 criticism well, but please try to be a little bit more polite.  We are
 working together and our common goal is to make postgres better.  Your
 notes are very important for the quality of postgres, but sometimes you
 drive us ...

 I am sorry if I came across as impolite. I just tried to point at things I
 thought needed improvement, and imo there were quite a few. A patch
 needing polishing isn't something that carries shame, blame or
 anything. It's just a state a patch can be in.

We don't have much time to go deep into 100-message threads, and
sometimes we just lose direction.


 Greetings,

 Andres Freund

 PS: Not a native speaker either...

That explains it all :)



 --
  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-03 Thread Robert Haas
On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 I don't see that parallelizing Append is any easier than any other
 problem in this space.  There's no parallel I/O facility, so you need
 a background worker per append branch to wait on I/O.  And you have
 all the problems of making sure that the workers have the same
 snapshot, making sure they don't self-deadlock, etc. that you have for
 any other case.

 Erm, my thought was to use a select() loop which sends out I/O requests
 and then loops around waiting to see which finishes first.  It doesn't
 parallelize the CPU cost of getting the rows back to the caller, but
 it'd at least parallelize the I/O, and if what's underneath is actually
 a remote FDW running a complex query (because the other side is actually
 a view), it would be a massive win to have all the remote FDWs executing
 concurrently instead of serially as we have today.

I can't really make sense of this.  In general, what's under each
branch of an append node is an arbitrary plan tree, and the only
operation you can count on being able to do for each is "get me the
next tuple" (i.e. ExecProcNode).  Append has no idea whether that
involves disk I/O, or for which blocks.  But even if it did, there's no
standard API for issuing an asynchronous read(), which is how we get
blocks into shared buffers.  We do have an API for requesting the
prefetch of a block on platforms with posix_fadvise(), but we can't find
out whether it has completed using select(), and even if we could we
still have to do the actual read() afterwards.

For FDWs, one idea might be to kick off the remote query at
ExecInitNode() time rather than ExecProcNode() time, at least if the
remote query doesn't depend on parameters that aren't available until
run time.  That actually would allow multiple remote queries to run
simultaneously or in parallel with local work.  It would also run them
in cases where the relevant plan node is never executed, which would
be bad but perhaps rare enough not to worry about.  Or we could add a
new API like ExecPrefetchNode() that tells nodes to prepare to have
tuples pulled, and they can do things like kick off asynchronous
queries.  But I still don't see any clean way for the Append node to
find out which one is ready to return results first.
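
To make the idea concrete, here is a minimal sketch of what such a hook
could look like (entirely hypothetical: neither ExecPrefetchNode() nor
ExecForeignScanPrefetch() exists; this is just the shape of the API, not
a proposal-quality implementation):

/*
 * Hypothetical sketch only.  A prefetch entry point dispatched like
 * ExecProcNode(): nodes that can usefully start work early (e.g. an FDW
 * kicking off its remote query) handle it; everyone else just recurses
 * so their children get a chance to prefetch.
 */
void
ExecPrefetchNode(PlanState *node)
{
	if (node == NULL)
		return;

	switch (nodeTag(node))
	{
		case T_ForeignScanState:
			/* hypothetical: ask the FDW to dispatch its remote query now */
			ExecForeignScanPrefetch((ForeignScanState *) node);
			break;
		default:
			if (outerPlanState(node))
				ExecPrefetchNode(outerPlanState(node));
			if (innerPlanState(node))
				ExecPrefetchNode(innerPlanState(node));
			break;
	}
}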

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] commit fest status and release timeline

2014-03-03 Thread Tomas Vondra
On 1.3.2014 18:01, Peter Eisentraut wrote:
 Status Summary. Needs Review: 36, Waiting on Author: 7, Ready for 
 Committer: 16, Committed: 43, Returned with Feedback: 8, Rejected:
 4. Total: 114.
 
 We're still on track to achieve about 50% committed patches, which
 would be similar to the previous few commit fests. So decent job so
 far.

I'm wondering what the best way is to select a patch to review. I mean,
there are many patches marked "needs review" (and often with no reviewer)
that have just one or two comments, but when I checked the email archives
there are often a lot of people discussing them.

Do we have a list of patches that didn't get a proper review yet / badly
need another one?

What about improving the commitfest page by displaying the number of
related e-mail messages / the number of people involved? It shouldn't be
difficult to get this from the mail archives ...

regards
Tomas




Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2014-03-03 Thread Andres Freund
On 2014-03-03 10:35:03 -0500, Robert Haas wrote:
 On Mon, Mar 3, 2014 at 9:57 AM, Andres Freund and...@2ndquadrant.com wrote:
  Hm, I think all it needs to do is disable delta encoding if
  need_tuple_data (which is dependent on wal_level=logical).
 
 Why does it need to do that?  The logical decoding stuff should be
 able to reverse out the delta encoding.

Against what should it perform the delta? Unless I misunderstand how the
patch works, it computes the delta against the old tuple in the heap
page?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Request improve pg_stat_statements module

2014-03-03 Thread Robert Haas
On Fri, Feb 28, 2014 at 8:01 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 Thanks for your patch!

 On Fri, Feb 28, 2014 at 4:18 PM,  pgsql...@postgresql.kr wrote:
 I patched to add one column to the pg_stat_statements module,
 and sent it to the author.
 I need the last time of a query, because I want to analyse ordering by recent time.
 Hm... I am not sure that this brings much to pg_stat_statements, which
 is meant to gather normalized information about the queries run
 on the server. For example, things like calculating the average time
 of a query using total_time/calls, or even the total time, to guess
 where an application bottleneck is are interesting on a busy system,
 while the latest timestamp is not really information that can be
 used for query tuning. Getting the latest timestamp when a query has
 run is particularly uninteresting on OLTP-like applications where
 many short transactions are running. It is more interesting to
 classify query results by either calls, total_time or the combination
 of both, IMO.

FWIW, I think it's a neat idea, but I think this proposal is probably
too late for 9.4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2014-03-03 Thread Robert Haas
On Mon, Mar 3, 2014 at 9:57 AM, Andres Freund and...@2ndquadrant.com wrote:
 Hm, I think all it needs to do is disable delta encoding if
 need_tuple_data (which is dependent on wal_level=logical).

Why does it need to do that?  The logical decoding stuff should be
able to reverse out the delta encoding.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_dump reporing version of server pg_dump as comments in the output

2014-03-03 Thread Euler Taveira
On 27-02-2014 21:10, Wang, Jing wrote:
 Using pg_dump, one can dump the data into a file with the format set to
 'c', 't', or plain text. In the existing version, the versions of server &
 pg_dump are already there when the format of the file is 'c' or 't'. And
 even for the plain-text format file, the versions of server & pg_dump are
 already there if using '--verbose' in pg_dump. Using '--verbose' leads
 to many other prints which are not always required.
 
I don't buy your argument. Why isn't the verbose option sufficient? Did
you read the old thread about this [1]?

AFAICS a lot of people compare pg_dump diffs. If we apply this patch, it
would break those applications. Also, it is *already* available if you
add the verbose option (which is sufficient to satisfy those that want
the client and/or server version) in plain mode (the other modes already
include the desired info by default). In the past, timestamps were
removed to avoid noise in diffs.


[1] http://www.postgresql.org/message-id/3677.1253912...@sss.pgh.pa.us


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-03 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost sfr...@snowman.net wrote:
  Erm, my thought was to use a select() loop which sends out I/O requests
  and then loops around waiting to see which finishes first.  It doesn't
  parallelize the CPU cost of getting the rows back to the caller, but
  it'd at least parallelize the I/O, and if what's underneath is actually
  a remote FDW running a complex query (because the other side is actually
  a view), it would be a massive win to have all the remote FDWs executing
  concurrently instead of serially as we have today.
 
 I can't really make sense of this.

Sorry, that was a bit hand-wavey since I had posted about it previously
here:
http://www.postgresql.org/message-id/20131104032604.gb2...@tamriel.snowman.net

It'd clearly be more involved than just building a select() loop and
would require adding an async mechanism.  I had been thinking about this
primarily with the idea of FDWs, and you're right that it'd require more
thought to deal with getting data into/through shared_buffers.  Still,
since we seqscan into a ring buffer, I'd think we could make it work, but
it would require additional work.
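
For illustration, the kind of loop meant above might look roughly like
this, assuming one libpq connection per Append branch whose remote
queries have already been chosen (a sketch only, using plain libpq
async calls, not code from any patch):

#include <sys/select.h>
#include "libpq-fe.h"

/*
 * Sketch: dispatch all remote queries without blocking, then select()
 * on the sockets and return the result of whichever finishes first.
 */
static PGresult *
wait_for_first_result(PGconn **conns, const char **queries, int n)
{
	int			i;

	for (i = 0; i < n; i++)
		PQsendQuery(conns[i], queries[i]);	/* asynchronous dispatch */

	for (;;)
	{
		fd_set		rfds;
		int			maxfd = -1;

		FD_ZERO(&rfds);
		for (i = 0; i < n; i++)
		{
			int			fd = PQsocket(conns[i]);

			FD_SET(fd, &rfds);
			if (fd > maxfd)
				maxfd = fd;
		}
		select(maxfd + 1, &rfds, NULL, NULL, NULL);	/* wait for any socket */

		for (i = 0; i < n; i++)
		{
			if (!FD_ISSET(PQsocket(conns[i]), &rfds))
				continue;
			PQconsumeInput(conns[i]);
			if (!PQisBusy(conns[i]))
				return PQgetResult(conns[i]);	/* first branch to finish */
		}
	}
}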

 For FDWs, one idea might be to kick off the remote query at
 ExecInitNode() time rather than ExecProcNode() time, at least if the
 remote query doesn't depend on parameters that aren't available until
 run time.

Right, I had speculated about that also (option #2 in my earlier email).

 That actually would allow multiple remote queries to run
 simultaneously or in parallel with local work.  It would also run them
 in cases where the relevant plan node is never executed, which would
 be bad but perhaps rare enough not to worry about.  

This was my primary concern, along with the fact that we explicitly say
"don't do that" in the docs for the FDW API.

 Or we could add a
 new API like ExecPrefetchNode() that tells nodes to prepare to have
 tuples pulled, and they can do things like kick off asynchronous
 queries.  But I still don't see any clean way for the Append node to
 find out which one is ready to return results first.

Yeah, that's tricky.

Thanks,

Stephen




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Simon Riggs
On 3 March 2014 15:19, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs si...@2ndquadrant.com wrote:

 What I'm
 really concerned about is whether there are other things like the
 SnapshotNow issues that can cause stuff to halt and catch fire.  I
 don't know whether there are or are not, but that's my concern.

Of course it's a concern, I feel it also. But that's why we have the beta
period: to handle the unknowns.

The question is: are there any specific areas of concern here? If not,
then we commit, because we've done a lot of work on it and at the
moment the balance is high benefit to users against a non-specific
feeling of risk.

@Noah - Last call...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] heapgetpage() and -takenDuringRecovery

2014-03-03 Thread Robert Haas
On Mon, Mar 3, 2014 at 8:33 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-03-03 06:57:00 -0500, Robert Haas wrote:
 On Sun, Mar 2, 2014 at 8:39 AM, Andres Freund and...@2ndquadrant.com wrote:
  I don't think this is necessary >= 9.2. There are only two interesting
  places where PD_ALL_VISIBLE is set:
  a) lazy_vacuum_page() where an xl_heap_clean is logged *before*
 PD_ALL_VISIBLE/the vm is touched and that causes recovery
 conflicts. The heap page is locked for cleanup at that point. As the
 logging of xl_heap_clean sets the page's LSN there's no way the page
 can appear on the standby too early.
  b) empty pages in lazy_scan_heap(). If they always were empty, there's
 no need for conflicts. The only other way I can see to end up there
 is a previous heap_page_prune() that repaired fragmentation. But that
 logs a WAL record with conflict information.

 I don't think there's any reason to believe that lazy_scan_heap() can
 only hit pages that are empty or have just been defragged.  Suppose
 that there's a tuple on the page which was recently inserted; the
 inserting transaction has committed but there are some backends that
 still have older snapshots.  The page won't be marked all-visible
 because it isn't.  Now, eventually those older snapshots will go away,
 and sometime after that the relation will get vacuumed again, and
 we'll once again look at the page.  But this time we notice that it is
 all-visible, and mark it so.

 Right now I am missing how this isn't an actual correctness problem
 after a crash. Without an LSN interlock we could crash *after* the heap
 page has been written out, but *before* the vm WAL record has been
 flushed to disk.

Yes.  In that case, the PD_ALL_VISIBLE bit will be set on the page,
but the corresponding visibility map bit will be unset.

 Combined with synchronous_commit=off there could be
 transactions that appeared as safely committed for vacuum (i.e. are
 below GetOldestXmin()), but which are actually aborted after a
 crash.
 Normal hint bits circumvent that by checking XLogNeedsFlush(commitLSN),
 but that doesn't work here.

Well, we'd better not try to mark a page all-visible if it's only
all-visible on the assumption that unwritten xlog will be successfully
flushed to disk.  But lazy_scan_heap() has code that only regards the
tuple as all-visible once the tuple is hinted committed, and there's
code elsewhere to keep hint bits from being set too early.  And
heap_page_is_all_visible() follows the same pattern.  So I think it's
OK, but maybe you see something I don't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2014-03-03 Thread Robert Haas
On Mon, Mar 3, 2014 at 10:38 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-03-03 10:35:03 -0500, Robert Haas wrote:
 On Mon, Mar 3, 2014 at 9:57 AM, Andres Freund and...@2ndquadrant.com wrote:
  Hm, I think all it needs to do is disable delta encoding if
  need_tuple_data (which is dependent on wal_level=logical).

 Why does it need to do that?  The logical decoding stuff should be
 able to reverse out the delta encoding.

 Against what should it perform the delta? Unless I misunderstand how the
 patch works, it computes the delta against the old tuple in the heap
 page?

Oh, maybe I need more caffeine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 3 March 2014 15:19, Robert Haas robertmh...@gmail.com wrote:
 What I'm
 really concerned about is whether there are other things like the
 SnapshotNow issues that can cause stuff to halt and catch fire.  I
 don't know whether there are or are not, but that's my concern.

 Of course it's a concern, I feel it also. But that's why we have the beta
 period: to handle the unknowns.

I have exactly zero faith that beta testing would catch low-probability
problems in this area.  What's needed, and hasn't happened AFAIK, is
detailed study of the patch by assorted senior hackers.

 The question is: are there any specific areas of concern here? If not,
 then we commit, because we've done a lot of work on it and at the
 moment the balance is high benefit to users against a non-specific
 feeling of risk.

This is backwards.  The default decision around here has never been
to commit when in doubt.

regards, tom lane




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-03 Thread Robert Haas
On Mon, Mar 3, 2014 at 10:43 AM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost sfr...@snowman.net wrote:
  Erm, my thought was to use a select() loop which sends out I/O requests
  and then loops around waiting to see which finishes first.  It doesn't
  parallelize the CPU cost of getting the rows back to the caller, but
  it'd at least parallelize the I/O, and if what's underneath is actually
  a remote FDW running a complex query (because the other side is actually
  a view), it would be a massive win to have all the remote FDWs executing
  concurrently instead of serially as we have today.

 I can't really make sense of this.

 Sorry, that was a bit hand-wavey since I had posted about it previously
 here:
 http://www.postgresql.org/message-id/20131104032604.gb2...@tamriel.snowman.net

Huh, somehow I can't remember reading that... but I didn't think I had
missed any posts, either.  Evidently I did.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Robert Haas
On Sun, Mar 2, 2014 at 4:50 AM, Simon Riggs si...@2ndquadrant.com wrote:
 v20 includes slightly re-ordered checks in GetLockLevel, plus more
 detailed comments on each group of subcommands.

 Also corrects grammar as noted by Vik.

 Plus adds an example of usage to the docs.

This patch contains a one line change to
src/bin/pg_dump/pg_backup_archiver.c which seems not to belong.

This hunk in ATRewriteCatalogs() looks scary:

+   /*
+* If we think we might need to add/re-add toast tables then
+* we currently need to hold an AccessExclusiveLock.
+*/
+   if (lockmode < AccessExclusiveLock)
+   return;

It would make sense to me to add an Assert() or elog() check inside
the subsequent loop to verify that the lock level is adequate ... but
just returning silently seems like a bad idea.
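
Concretely, something along these lines inside the loop, rather than the
early return (an illustrative sketch, not the patch's code):

	/* verify, for each subcommand, that the lock we hold is adequate */
	if (lockmode < AccessExclusiveLock)
		elog(ERROR, "lock level insufficient for adding toast tables");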

I have my doubts about whether it's safe to do AT_AddInherit,
AT_DropInherit, AT_AddOf, or AT_DropOf with less than a full lock.  All
of those can change the tuple descriptor, and we discussed, back when we
did this the first time, the fact that the executor may get *very* unhappy
if the tuple descriptor changes in mid-execution.  I strongly suspect
these are unsafe with less than a full AccessExclusiveLock.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-03 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
  http://www.postgresql.org/message-id/20131104032604.gb2...@tamriel.snowman.net
 
 Huh, somehow I can't remember reading that... but I didn't think I had
 missed any posts, either.  Evidently I did.

You and everyone else - you'll note it got exactly zero responses...

Ah well. :)

Thanks,

Stephen




Re: [HACKERS] GSoC on WAL-logging hash indexes

2014-03-03 Thread Robert Haas
On Sun, Mar 2, 2014 at 2:38 PM, Tan Tran tankimt...@gmail.com wrote:
 2. Proposal
 As a GSoC student, I will implement WAL recovery of hash indexes using the
 other index types' WAL code as a guide. Roughly, I will:
 - Devise a way to store and retrieve hashing data within the XLog data
 structures.
 - In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord *record)
 in hash.c, branch to code for the various redo operations: creating an
 index, inserting into an index, deleting an index, and page operations
 (split, delete, update?).
 - Code each branch by drawing on examples from btree_redo, gin_redo, and
 gist_redo, the existing XLog code of the other index types.

Unfortunately, I don't believe that it's possible to do this easily
today because of the way bucket splits are handled.  I wrote about
this previously here, with an idea for solving the problem:

http://www.postgresql.org/message-id/ca+tgmozymojsrfxhxq06g8jhjxqcskvdihb_8z_7nc7hj7i...@mail.gmail.com

Sadly, no one responded.  :-(
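
For reference, the filled-in dispatch the proposal describes would look
roughly like this (a sketch only: the XLOG_HASH_* record types and
hash_xlog_* helpers are hypothetical, since hash WAL-logging doesn't
exist yet):

void
hash_redo(XLogRecPtr lsn, XLogRecord *record)
{
	uint8		info = record->xl_info & ~XLR_INFO_MASK;

	switch (info)
	{
		case XLOG_HASH_INSERT:	/* hypothetical record type */
			hash_xlog_insert(lsn, record);
			break;
		case XLOG_HASH_SPLIT:	/* the hard case: a bucket split */
			hash_xlog_split(lsn, record);
			break;
		default:
			elog(PANIC, "hash_redo: unknown op code %u", info);
	}
}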

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GiST support for inet datatypes

2014-03-03 Thread Robert Haas
On Thu, Feb 27, 2014 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Florian Pflug f...@phlo.org writes:
 On Feb27, 2014, at 17:56 , Tom Lane t...@sss.pgh.pa.us wrote:
 That's not a bug, it's a feature, for much the same reasons that pg_dump
 tries to minimize explicit schema-qualification.

 I fail to see the value in this for opclasses. It's certainly nice for
 schema qualifications, because dumping one schema and restoring into a
 different schema is probably quite common.

 The value in it is roughly the same as the reason we don't include a
 version number when dumping CREATE EXTENSION.  If you had a default
 opclass in the source database, you probably want a default opclass
 in the destination, even if that's not bitwise the same as what you
 had before.  The implication is that you want best practice for the
 current PG version.

I don't think that argument holds a lot of water in this instance.
The whole reason for having multiple opclasses is that each one can
implement different comparison behavior.  It's unlikely that you want
an upgrade to change the comparison behavior you had before; you'd be
sad if, for example, the dump/restore process failed to preserve your
existing collation settings.

But even if that were desirable in general, suppressing it for binary
upgrade dumps certainly seems more than sane.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Simon Riggs
On 3 March 2014 15:53, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 On 3 March 2014 15:19, Robert Haas robertmh...@gmail.com wrote:
 What I'm
 really concerned about is whether there are other things like the
 SnapshotNow issues that can cause stuff to halt and catch fire.  I
 don't know whether there are or are not, but that's my concern.

  Of course it's a concern, I feel it also. But that's why we have the beta
  period: to handle the unknowns.

 I have exactly zero faith that beta testing would catch low-probability
 problems in this area.  What's needed, and hasn't happened AFAIK, is
 detailed study of the patch by assorted senior hackers.

  The question is: are there any specific areas of concern here? If not,
  then we commit, because we've done a lot of work on it and at the
  moment the balance is high benefit to users against a non-specific
  feeling of risk.

 This is backwards.  The default decision around here has never been
 to commit when in doubt.

I'm not in doubt.

If anybody can give me some more pointers of things to look at, I will.

But I've looked and I can't see anything more.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Fabrízio de Royes Mello
Hi all,

Is the TODO item "make an unlogged table logged" [1] a good GSoC project?

Regards,

[1]
http://www.postgresql.org/message-id/aanlktinenzbrxdcwohkqbba2bhubfy8_c5jwrxlod...@mail.gmail.com

-- 
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] commit fest status and release timeline

2014-03-03 Thread Magnus Hagander
On Mon, Mar 3, 2014 at 4:34 PM, Tomas Vondra t...@fuzzy.cz wrote:

 On 1.3.2014 18:01, Peter Eisentraut wrote:
  Status Summary. Needs Review: 36, Waiting on Author: 7, Ready for
  Committer: 16, Committed: 43, Returned with Feedback: 8, Rejected:
  4. Total: 114.
 
  We're still on track to achieve about 50% committed patches, which
  would be similar to the previous few commit fests. So decent job so
  far.

 I'm wondering what the best way is to select a patch to review. I mean,
 there are many patches marked "needs review" (and often with no reviewer)
 that have just one or two comments, but when I checked the email archives
 there are often a lot of people discussing them.

 Do we have a list of patches that didn't get a proper review yet / badly
 need another one?

 What about improving the commitfest page by displaying the number of
 related e-mail messages / the number of people involved? It shouldn't be
 difficult to get this from the mail archives ...


I have some code for that part that needs a couple of rounds of final
hacking and polish. I've had many targets for it, but right now the target
is to be done before pgcon, so we can put it in play for the next set of
commitfests. It's not going to happen for *this* one, and we don't want to
disrupt the flow even more by making big changes to the tooling in the
middle of it.

That said, there is definitely a need :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Simon Riggs
On 3 March 2014 16:06, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Mar 2, 2014 at 4:50 AM, Simon Riggs si...@2ndquadrant.com wrote:
 v20 includes slightly re-ordered checks in GetLockLevel, plus more
 detailed comments on each group of subcommands.

 Also corrects grammar as noted by Vik.

 Plus adds an example of usage to the docs.

 This patch contains a one line change to
 src/bin/pg_dump/pg_backup_archiver.c which seems not to belong.

 This hunk in ATRewriteCatalogs() looks scary:

 +   /*
 +* If we think we might need to add/re-add toast tables then
 +* we currently need to hold an AccessExclusiveLock.
 +*/
 +   if (lockmode < AccessExclusiveLock)
 +   return;

 It would make sense to me to add an Assert() or elog() check inside
 the subsequent loop to verify that the lock level is adequate ... but
 just returning silently seems like a bad idea.

OK, I will check elog.

 I have my doubts about whether it's safe to do AT_AddInherit,
 AT_DropInherit, AT_AddOf, or AT_DropOf with less than a full lock.  All
 of those can change the tuple descriptor, and we discussed, back when we
 did this the first time, the fact that the executor may get *very* unhappy
 if the tuple descriptor changes in mid-execution.  I strongly suspect
 these are unsafe with less than a full AccessExclusiveLock.

I'm happy to change those if you feel there is insufficient evidence.

I don't personally feel that it would matter to usability to keep
locks for those at AccessExclusiveLock, especially since they are
otherwise fast.

Some others might be kept higher also. I'm merely trying to balance
between requests to reduce to the minimal theoretical level and fears
that anything less than AccessExclusiveLock is a problem.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Robert Haas
On Mon, Mar 3, 2014 at 11:29 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 3 March 2014 16:06, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Mar 2, 2014 at 4:50 AM, Simon Riggs si...@2ndquadrant.com wrote:
 v20 includes slightly re-ordered checks in GetLockLevel, plus more
 detailed comments on each group of subcommands.

 Also corrects grammar as noted by Vik.

 Plus adds an example of usage to the docs.

 This patch contains a one line change to
 src/bin/pg_dump/pg_backup_archiver.c which seems not to belong.

 This hunk in ATRewriteCatalogs() looks scary:

 +   /*
 +* If we think we might need to add/re-add toast tables then
 +* we currently need to hold an AccessExclusiveLock.
 +*/
  +   if (lockmode < AccessExclusiveLock)
 +   return;

 It would make sense to me to add an Assert() or elog() check inside
 the subsequent loop to verify that the lock level is adequate ... but
 just returning silently seems like a bad idea.

 OK, I will check elog.

  I have my doubts about whether it's safe to do AT_AddInherit,
  AT_DropInherit, AT_AddOf, or AT_DropOf with less than a full lock.  All
  of those can change the tuple descriptor, and we discussed, back when we
  did this the first time, the fact that the executor may get *very* unhappy
  if the tuple descriptor changes in mid-execution.  I strongly suspect
  these are unsafe with less than a full AccessExclusiveLock.

 I'm happy to change those if you feel there is insufficient evidence.

Not sure what that means, but yes, I think the lock levels on those
need to be increased.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Robert Haas
On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 Is the TODO item "make an unlogged table logged" [1] a good GSoC project?

I'm pretty sure we found some problems in that design that we couldn't
figure out how to solve.  I don't have a pointer to the relevant
-hackers discussion off-hand, but I think there was one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GiST support for inet datatypes

2014-03-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Feb 27, 2014 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The value in it is roughly the same as the reason we don't include a
 version number when dumping CREATE EXTENSION.  If you had a default
 opclass in the source database, you probably want a default opclass
 in the destination, even if that's not bitwise the same as what you
 had before.  The implication is that you want best practice for the
 current PG version.

 I don't think that argument holds a lot of water in this instance.
 The whole reason for having multiple opclasses is that each one can
 implement different comparison behavior.

Well, I doubt we'd accept a proposal to change the default opclass
of a datatype to something that had incompatible behavior --- but
we might well accept one to change it to something that had improved
behavior, such as more operators.

The first couple dozen lines in GetIndexOpClass() make for interesting
reading in this context.  That approach to cross-version compatibility
probably doesn't work in the pg_upgrade universe, of course; but what I'd
like to point out here is that those kluges wouldn't have been necessary
in the first place if pg_dump had had the suppress-default-opclasses
behavior at the time.  (No, it didn't always do that; cf commits
e5bbf1965 and 1929a90b6.)

regards, tom lane




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Is the TODO item "make an unlogged table logged" [1] a good GSoC project?
 
 I'm pretty sure we found some problems in that design that we couldn't
 figure out how to solve.  I don't have a pointer to the relevant
 -hackers discussion off-hand, but I think there was one.

ISTR the discussion going something along the lines of "we'd have to WAL
log the entire table to do that, and if we have to do that, what's the
point?".

Thanks,

Stephen




Re: [HACKERS] UNION ALL on partitioned tables won't use indices.

2014-03-03 Thread Noah Misch
On Mon, Mar 03, 2014 at 07:01:09PM +0900, Kyotaro HORIGUCHI wrote:
   Yes, the old dumped version of the typ2 patch did so. It flattened the
   appendrel tree for the query properly. Let me hear the reason you
   prefer to do so.
  
  Having reviewed my upthread reasoning for preferring one of those two
  approaches over the other, it's a weak preference.  They have similar
  runtime costs.  Putting the logic with the code that creates appendrels
  reduces the
  number of code sites one must examine to reason about possible plan
  structures.  We might not flatten RTE_RELATION appendrel parents exactly the
  same way we flatten RTE_SUBQUERY appendrel parents.  I would tend to leave
  inh=true for the former, for reasons explained in my notes on v7, but set
  inh=false for the latter to save needless work.
 
  Unfortunately, RTE_SUBQUERY-es with their inh flag cleared might
  cause something inconvenient in preprocess_minmax_aggregates()
  and/or add_rtes_to_flat_rtable()..

preprocess_minmax_aggregates() only cares about RTEs reachable from the join
tree, and the appendrel parents made obsolete by flattening would not be
reachable from the join tree.  add_rtes_to_flat_rtable() might be unhappy.

 # I haven't found that related to sepgsql, though :-(
 
  I understood that your concern is to deal with parent RTEs differently
  according to their relkinds. But I think the inh flags could not
  be modified at all for the reason mentioned above.

That's fine, then.  It was a minor point.

If you are convinced that a separate flattening pass is best, that suffices
for me at this stage.  Please submit the patch you have in mind, incorporating
any improvements from the v7 patch that are relevant to both approaches.

  At least as of now, inheritance tables define the bottom bound of
  an appendrel tree while complex(?) union_alls define the upper
  bound, and both multilevel (simple) union_alls and inheritances are
  flattened individually, so the only possible inheritance tree to be
  collapsed by this patch is, I think,

   Subquery(inh=1)[Relation-inhparent [Relation-child, child, child]]
 
  On the other hand, a flattening pass is less code overall and
  brings an attractive uniformity-by-default to the area.
 
  Focusing only on the above structure, what we should do to
  collapse this tree is just connect the children to the Subquery
  directly, then remove all appendrelinfos connecting to the
  Relation-inhparent. The inh flag need not be modified.

  # Umm. I strongly suspect that I overlooked something..

  Then, even widening to the general case, things don't seem to
  change. All we need to do is link a child to its grandparent and
  isolate the parent by removing appendrelinfos.

I barely follow what you're saying here.  Do you speak of
union_inh_idx_typ2_v2_20131113.patch, unionall_inh_idx_typ3_v7.patch, or some
future design?  If we use a separate flattening pass, there's no small limit
on how many layers of appendrel we may need to flatten.  pull_up_subqueries()
can create many nested RTE_SUBQUERY appendrel layers; there may be more than
just child, parent and grandparent.  There's never more than one layer of
RTE_RELATION appendrel, though.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] patch: option --if-exists for pg_dump

2014-03-03 Thread Alvaro Herrera
Pavel Stehule escribió:
 This patch has a redesigned implementation of --if-exists for pg_dumpall.
 Now it is not propagated to pg_dump, but used at the pg_dumpall level.

Seems sane, thanks.


BTW after this patch, I still don't see an error-free output from
restoring a database on top of itself.  One problem is plpgsql, which is
now an extension, so pg_dump emits this error message:

ERROR:  cannot drop language plpgsql because extension plpgsql requires it
HINT:  You can drop extension plpgsql instead.


Another problem is that some DROP commands don't work.  For instance, if
the public schema in the target database contains objects that haven't
been dropped yet, the DROP command will fail:

ERROR:  cannot drop schema public because other objects depend on it
DETAIL:  function bt_metap(text) depends on schema public
function bt_page_items(text,integer) depends on schema public
function bt_page_stats(text,integer) depends on schema public
function f() depends on schema public
function get_raw_page(text,integer) depends on schema public
function heap_page_items(bytea) depends on schema public
function locate_tuple_corruption() depends on schema public
function page_header(bytea) depends on schema public
HINT:  Use DROP ... CASCADE to drop the dependent objects too.


(The way I got this was by using my 8.2 installation, on which I ran the
regression tests; then I dumped the resulting regression database.  The
database on which I restored wasn't clean, as it contained unrelated
junk in the public schema.)

Not sure what the right answer to this problem is, but it cannot be
attributed to this patch anyway.

I'm about to push this, since other than the above problems, this
functionality seems to be working as designed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
 Is the TODO item "make an unlogged table logged" [1] a good GSoC project?

 I'm pretty sure we found some problems in that design that we couldn't
 figure out how to solve.  I don't have a pointer to the relevant
 -hackers discussion off-hand, but I think there was one.

 ISTR the discussion going something along the lines of "we'd have to WAL
 log the entire table to do that, and if we have to do that, what's the
 point?".

IIRC, the reason you'd have to do that is to make the table contents
appear on slave servers.  If you don't consider replication then it might
seem easier.

regards, tom lane




Re: [HACKERS] GSoC on WAL-logging hash indexes

2014-03-03 Thread Tan Tran
Thanks for alerting me to your previous idea. While I don't know enough
about PostgreSQL internals to judge its merits yet, I'll write some
pseudocode based on it in my proposal, and I'll relegate it to a "reach"
proposal alongside a more straightforward one.

Tan

On Mon, Mar 3, 2014 at 8:12 AM, Robert Haas robertmh...@gmail.com wrote:

 On Sun, Mar 2, 2014 at 2:38 PM, Tan Tran tankimt...@gmail.com wrote:
  2. Proposal
  As a GSoC student, I will implement WAL recovery of hash indexes using
 the
  other index types' WAL code as a guide. Roughly, I will:
  - Devise a way to store and retrieve hashing data within the XLog data
  structures.
  - In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord
 *record)
  in hash.c, branch to code for the various redo operations: creating an
  index, inserting into an index, deleting an index, and page operations
  (split, delete, update?).
  - Code each branch by drawing on examples from btree_redo, gin_redo, and
  gist_redo, the existing XLog code of the other index types.

 Unfortunately, I don't believe that it's possible to do this easily
 today because of the way bucket splits are handled.  I wrote about
 this previously here, with an idea for solving the problem:


 http://www.postgresql.org/message-id/ca+tgmozymojsrfxhxq06g8jhjxqcskvdihb_8z_7nc7hj7i...@mail.gmail.com

 Sadly, no one responded.  :-(

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Hannu Krosing
On 03/03/2014 05:22 PM, Tom Lane wrote:
 Stephen Frost sfr...@snowman.net writes:
...
  ISTR the discussion going something along the lines of "we'd have to WAL
  log the entire table to do that, and if we have to do that, what's the
  point?".
 IIRC, the reason you'd have to do that is to make the table contents
 appear on slave servers.  If you don't consider replication then it might
 seem easier.
So switch on logging and then perform CLUSTER/VACUUM FULL?

Should this work, or is something extra needed?

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Andres Freund
On 2014-03-03 12:08:26 -0500, Stephen Frost wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
  On Mon, Mar 3, 2014 at 11:28 AM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
   Is the TODO item "make an unlogged table logged" [1] a good GSoC project?
  
  I'm pretty sure we found some problems in that design that we couldn't
  figure out how to solve.  I don't have a pointer to the relevant
  -hackers discussion off-hand, but I think there was one.
 
 ISTR the discussion going something along the lines of "we'd have to WAL
 log the entire table to do that, and if we have to do that, what's the
 point?".

I don't see that as particularly problematic. The primary reason to
want to convert an unlogged table to a logged one probably is that
it's easier to do so than to recreate the table + dependencies. Also the
overhead of logging full pages will be noticeably smaller than the
overhead of adding all rows individually, even if using
heap_multi_insert().
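
A sketch of what that page-at-a-time WAL logging could look like,
assuming we simply walk the main fork and log each page with the
existing log_newpage() (illustrative only; locking details, error
handling, and the other relation forks are glossed over):

static void
log_all_pages(Relation rel)
{
	BlockNumber nblocks = RelationGetNumberOfBlocks(rel);
	BlockNumber blkno;

	for (blkno = 0; blkno < nblocks; blkno++)
	{
		Buffer		buf = ReadBuffer(rel, blkno);

		LockBuffer(buf, BUFFER_LOCK_SHARE);
		/* emit one full-page WAL record for this block */
		log_newpage(&rel->rd_node, MAIN_FORKNUM, blkno,
					BufferGetPage(buf), true);
		UnlockReleaseBuffer(buf);
	}
}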

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Noah Misch
On Mon, Mar 03, 2014 at 10:19:55AM -0500, Robert Haas wrote:
 On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Removing SELECT privilege while running a SELECT would be a different
  matter.  This is all a matter of definition; we can make up any rules
  we like. Doing so is IMHO a separate patch and not something to hold
  up the main patch.
 
 So I think this is an interesting point.  There are various things
 that could go wrong as a result of using the wrong lock level.  Worst
 would be that the server crashes or corrupts data.  A little less bad
 would be that sessions error out with inexplicable error conditions,
 as in SnapshotNow days.  Alternatively, we could just have arguably
 wrong behavior, like basing query results on the old version of the
 table's metadata even after it's been changed.

I would order the concerns like this:

1. Data corruption
2. Transient, clearly-wrong answers without an error
3. Server crash
4. Catalog logical inconsistency
5. Inexplicable, transient errors
6. Valid behavior capable of surprising more than zero upgraders

 I don't really care about that second category of behavior.  If
 somebody changes some property of a table and existing sessions
 continue to use the old value until eoxact, well, we can argue about
 that, but at least until we have concrete reports of really
 undesirable behavior, I don't think it's the primary issue.  What I'm
 really concerned about is whether there are other things like the
 SnapshotNow issues that can cause stuff to halt and catch fire.  I
 don't know whether there are or are not, but that's my concern.

Since we can't know whether something qualifies as (2) or (6) without
analyzing it, I don't find waiting for user complaints to be a good strategy
here.  An ownership change not immediately affecting ACL checks does fall
under (6), for me.  (However, changing ownership without AccessExclusiveLock
might also create hazards in category (4) for concurrent DDL that performs
owner checks.)

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Noah Misch
On Mon, Mar 03, 2014 at 03:43:46PM +, Simon Riggs wrote:
 The question is: are there any specific areas of concern here? If not,
 then we commit, because we've done a lot of work on it and at the
 moment the balance is high benefit to users against a non-specific
 feeling of risk.
 
 @Noah - Last call...

I am not specifically aware of any outstanding problems.  I have planned to
give this a close look, but it will be at least two weeks before I dig out far
enough to do so.  If that makes it a post-commit review, so be it.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] UNION ALL on partitioned tables won't use indices.

2014-03-03 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 If you are convinced that a separate flattening pass is best, that suffices
 for me at this stage.  Please submit the patch you have in mind, incorporating
 any improvements from the v7 patch that are relevant to both approaches.

I went back and re-read the original message, and this time it struck me
that really the issue here is that add_child_rel_equivalences() doesn't
think it has to deal with the case of a parent rel being itself a child.
That's not inherent though; it's just that it didn't occur to me at the
time that such a situation could arise.  It takes only very small changes
to allow that to happen.  

If you do that, as in the attached changes to equivclass.c, you get
"could not find pathkey item to sort" errors from createplan.c; but
that's just because create_merge_append_plan() is likewise not expecting
the mergeappend's parent rel to be itself a child.  Fix for that is
a one-liner, ie, pass down relids.

That gets you to a point where the code generates a valid plan, but it's
got nested MergeAppends, which are an unnecessary expense.  Kyotaro-san's
original fix for that was overcomplicated.  It's sufficient to teach
accumulate_append_subpath() to flatten nested MergeAppendPaths.

In short, the attached patch seems to fix it, for a lot less added
complication than anything else that's been discussed on this thread.
I've only lightly tested it (it could use a new regression test case),
but unless someone can find a flaw I think we should use this approach.

regards, tom lane

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 03be7b1..5777cb2 100644
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
*** get_cheapest_parameterized_child_path(Pl
*** 1021,1030 
   * accumulate_append_subpath
   *		Add a subpath to the list being built for an Append or MergeAppend
   *
!  * It's possible that the child is itself an Append path, in which case
!  * we can cut out the middleman and just add its child paths to our
!  * own list.  (We don't try to do this earlier because we need to
!  * apply both levels of transformation to the quals.)
   */
  static List *
  accumulate_append_subpath(List *subpaths, Path *path)
--- 1021,1035 
   * accumulate_append_subpath
   *		Add a subpath to the list being built for an Append or MergeAppend
   *
!  * It's possible that the child is itself an Append or MergeAppend path, in
!  * which case we can cut out the middleman and just add its child paths to
!  * our own list.  (We don't try to do this earlier because we need to apply
!  * both levels of transformation to the quals.)
!  *
!  * Note that if we omit a child MergeAppend in this way, we are effectively
!  * omitting a sort step, which seems fine: if the parent is to be an Append,
!  * its result would be unsorted anyway, while if the parent is to be a
!  * MergeAppend, there's no point in a separate sort on a child.
   */
  static List *
  accumulate_append_subpath(List *subpaths, Path *path)
*** accumulate_append_subpath(List *subpaths
*** 1036,1041 
--- 1041,1053 
  		/* list_copy is important here to avoid sharing list substructure */
  		return list_concat(subpaths, list_copy(apath-subpaths));
  	}
+ 	else if (IsA(path, MergeAppendPath))
+ 	{
+ 		MergeAppendPath *mpath = (MergeAppendPath *) path;
+ 
+ 		/* list_copy is important here to avoid sharing list substructure */
+ 		return list_concat(subpaths, list_copy(mpath->subpaths));
+ 	}
  	else
  		return lappend(subpaths, path);
  }
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 35d2a83..ac12f84 100644
*** a/src/backend/optimizer/path/equivclass.c
--- b/src/backend/optimizer/path/equivclass.c
*** add_child_rel_equivalences(PlannerInfo *
*** 1937,1952 
  		if (cur_ec->ec_has_volatile)
  			continue;
  
! 		/* No point in searching if parent rel not mentioned in eclass */
! 		if (!bms_is_subset(parent_rel->relids, cur_ec->ec_relids))
  			continue;
  
  		foreach(lc2, cur_ec-ec_members)
  		{
  			EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
  
! 			if (cur_em->em_is_const || cur_em->em_is_child)
! continue;		/* ignore consts and children here */
  
  			/* Does it reference parent_rel? */
  			if (bms_overlap(cur_em->em_relids, parent_rel->relids))
--- 1937,1956 
  		if (cur_ec->ec_has_volatile)
  			continue;
  
! 		/*
! 		 * No point in searching if parent rel not mentioned in eclass; but
! 		 * we can't tell that for sure if parent rel is itself a child.
! 		 */
! 		if (parent_rel->reloptkind == RELOPT_BASEREL &&
! 			!bms_is_subset(parent_rel->relids, cur_ec->ec_relids))
  			continue;
  
  		foreach(lc2, cur_ec-ec_members)
  		{
  			EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
  
! 			if (cur_em->em_is_const)
! continue;		/* ignore consts here */
  
  			/* Does it 

Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)

2014-03-03 Thread Heikki Linnakangas

On 03/03/2014 11:34 AM, Christian Kruse wrote:
 Hi,

   Attached is a patch with the updated documentation (now consistently
   uses "huge pages") as well as a renamed GUC, consistent wording
   (always "huge pages") and renamed variables.

  Hmm, I wonder if that could now be misunderstood to have something to do
  with the PostgreSQL page size? Maybe add the word "memory" or "operating
  system" in the first sentence in the docs, like this: "Enables/disables
  the use of huge memory pages."

 Accepted, see attached patch.

Thanks, committed!

I spotted this in section 17.4.1 "Shared Memory and Semaphores":

 Linux

 The default maximum segment size is 32 MB, and the default maximum total
 size is 2097152 pages. A page is almost always 4096 bytes except in
 unusual kernel configurations with "huge pages" (use getconf PAGE_SIZE
 to verify).

It's not any more wrong now than it's always been, but I don't think
huge pages ever affect PAGE_SIZE... Could I cajole you into rephrasing
that, too?


- Heikki




Re: [HACKERS] integrate pg_upgrade analyze_new_cluster.sh into vacuumdb

2014-03-03 Thread Kevin Grittner
Jeff Janes jeff.ja...@gmail.com wrote:

 But I do wonder what experience people have with the 3-stage
 process: how useful is it empirically?  If you can't open the
 database for general use until the 3rd phase is done, then you
 would just jump to doing that stage, rather than working through
 all 3 of them.  If you can open the database and muddle through
 without statistics for a while, why not muddle through for the
 little bit longer that it would take to collect the full set
 right off the bat, rather than making intermediate passes?

It's not always a little bit of time.  For a description of my
experience with a home-grown 3 stage process before one was built
into pg_upgrade, see this post:

http://www.postgresql.org/message-id/1373465348.51692.yahoomail...@web162906.mail.bf1.yahoo.com

Basically, we cut our down time from hours to minutes without
serious impairment of performance.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Simon Riggs
On 3 March 2014 18:57, Noah Misch n...@leadboat.com wrote:
 On Mon, Mar 03, 2014 at 10:19:55AM -0500, Robert Haas wrote:
 On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs si...@2ndquadrant.com wrote:
  Removing SELECT privilege while running a SELECT would be a different
  matter.  This is all a matter of definition; we can make up any rules
  we like. Doing so is IMHO a separate patch and not something to hold
  up the main patch.

 So I think this is an interesting point.  There are various things
 that could go wrong as a result of using the wrong lock level.  Worst
 would be that the server crashes or corrupts data.  A little less bad
 would be that sessions error out with inexplicable error conditions,
 as in SnapshotNow days.  Alternatively, we could just have arguably
 wrong behavior, like basing query results on the old version of the
 table's metadata even after it's been changed.

 I would order the concerns like this:

 1. Data corruption
 2. Transient, clearly-wrong answers without an error
 3. Server crash
 4. Catalog logical inconsistency
 5. Inexplicable, transient errors
 6. Valid behavior capable of surprising more than zero upgraders

I like your model for risk assessment. How can we apply it in detail
in a way that helps us decide? Or do we just go on gut feel?

My experience with mentioning such topics is that without structure it
results in an assessment of unacceptable risk simply because
somebody has mentioned some scary words.


 I don't really care about that second category of behavior.  If
 somebody changes some property of a table and existing sessions
 continue to use the old value until eoxact, well, we can argue about
 that, but at least until we have concrete reports of really
 undesirable behavior, I don't think it's the primary issue.  What I'm
 really concerned about is whether there are other things like the
 SnapshotNow issues that can cause stuff to halt and catch fire.  I
 don't know whether there are or are not, but that's my concern.

 Since we can't know whether something qualifies as (2) or (6) without
 analyzing it, I don't find waiting for user complaints to be a good strategy
 here.  An ownership change not immediately affecting ACL checks does fall
 under (6), for me.  (However, changing ownership without AccessExclusiveLock
 might also create hazards in category (4) for concurrent DDL that performs
 owner checks.)

err, guys, you do realise that changing ownership is staying at
AccessExclusiveLock in this patch?
(and I haven't ever suggested lowering that).

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Gavin Flower

On 04/03/14 04:25, Oleg Bartunov wrote:

On Mon, Mar 3, 2014 at 7:22 PM, Andres Freund and...@2ndquadrant.com wrote:

[...]


PS: Not a native speaker either...
That's explain all :)


[...]

I AM a native English speaker born in England - though if you read some 
of my postings where I've been particularly careless, you might well assume 
otherwise!


My Chinese wife sometimes corrects my English, and from time to time she 
is right!


To the extent that I've read the postings of non-native English speakers 
like Oleg & Andres, I've not noticed any difficulty understanding what 
they meant - other than technical issues that would also be the same for 
me from gifted native English speakers!



Cheers,
Gavin





Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Peter Geoghegan
On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 Is the TODO item "make an unlogged table logged" [1] a good GSoC project?

Another interesting project around unlogged tables would be to make it
possible to have unlogged indexes on fully-logged tables. That is
something that there was some discussion of before, that might be
easier.

FWIW, I don't think that TODO page is a very good resource for finding
a starter project. Picking a good project is a skill in and of itself.
A lot of that stuff is aspirational, either because it's difficult,
or, more often, because it's difficult relative to the rewards, which
can be quite low. To be honest, if I have what I imagine to be a great
idea for a project, I don't put it on that page. Maybe I should, but I
don't, and I don't think that is uncommon. This is not because I'm
particularly guarded about sharing the information.

Why do you think that hash indexes still aren't WAL-logged after all
these years (a project that someone made noise about recently in
relation to GSoC), even though that's generally considered to be a
SMOP?

-- 
Peter Geoghegan




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Andres Freund
On 2014-03-03 12:44:26 -0800, Peter Geoghegan wrote:
 On Mon, Mar 3, 2014 at 8:28 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  Is the TODO item "make an unlogged table logged" [1] a good GSoC project?
 
 Another interesting project around unlogged tables would be to make it
 possible to have unlogged indexes on fully-logged tables. That is
 something that there was some discussion of before, that might be
 easier.

I'd actually say it's harder because it requires modifying the catalog
or transparently introducing hacks similar to what unlogged matviews are
doing, to make sure the index is marked invalid after a crash restart.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changeset Extraction v7.9

2014-03-03 Thread Robert Haas
On Mon, Mar 3, 2014 at 11:26 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-02-27 17:56:08 +0100, Andres Freund wrote:
 * do we modify struct SnapshotData to be polymorphic based on some tag
   or move comments there?

 I tried that, and it got far too invasive. So I've updated the relevant
 comment in snapshot.h, inl

 * How/whether to change the exclusive lock on the ProcArrayLock in
   CreateInitDecodingContext()

 I looked at this, and I believe the current code is the best
 solution. It's pretty far away from any hot codepath and it's a short
 operation. I liked the idea about using snapmgr.c for this in principle,
 but it doesn't have enough smarts by far...

 So, attached is the newest version:
 * Management of historic/timetravel snapshot is now done by snapmgr.c,
  not tqual.c anymore. No ->satisfies pointers are redirected anymore
 * removal of the suspend logic for historic snapshot, instead the one
   place that needed it, now explicitly uses a snapshot
 * removal of some pointless CREATE EXTENSIONs from the regression tests
 * splitoff of the slot tests that aren't committable into a separate
   commit.
 * minor doc adjustments

 I am not aware of any further things that need to be fixed now (in
 contrast to features for later releases of which there are aplenty).

OK, I've committed the 0001 patch, which is the core of this feature,
with a bit of minor additional hacking.

I'm sure there are some problems here yet and some things that people
will want fixed, as is inevitable for any patch of this size.  But I
don't have any confidence that further postponing commit is going to
be the best way to find those issues, so in it goes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] building pgadmin4

2014-03-03 Thread Willy-Bas Loos
Hi,

I'm trying to build pgadmin4, out of curiosity.
I'm on a ubuntu 13.10 desktop vm.
I added qt webkitwidgets, and now I run into the next error, which doesn't
seem to make much sense:
wbloos2@vm1:~/pgadmin4/runtime$ qmake
Project MESSAGE: Building for QT5+...
Project ERROR: Unknown module(s) in QT: quick

I haven't found the word "quick" in any of the code and there's no .qml
file.

any clues?

Cheers,

WBL


-- 
Quality comes from focus and clarity of purpose -- Mark Shuttleworth


Re: [HACKERS] building pgadmin4

2014-03-03 Thread Robert Haas
On Mon, Mar 3, 2014 at 4:53 PM, Willy-Bas Loos willy...@gmail.com wrote:
 I'm trying to build pgadmin4, out of curiosity.
 I'm on a ubuntu 13.10 desktop vm.
 I added qt webkitwidgets, and now I run into the next error, which doesn't
 seem to make much sense:
 wbloos2@vm1:~/pgadmin4/runtime$ qmake
 Project MESSAGE: Building for QT5+...
 Project ERROR: Unknown module(s) in QT: quick

 I haven't found the word quick in any of the code and there's no .qml
 file.

 any clues?

I suggest that you post to one of the pgadmin mailing lists:

http://www.postgresql.org/list/pgadmin-support/
http://www.postgresql.org/list/pgadmin-hackers/

pgAdmin is off-topic for this mailing list.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] plpgsql.warn_shadow

2014-03-03 Thread Joel Jacobson
On Wed, Jan 15, 2014 at 1:34 AM, Marko Tiikkaja ma...@joh.to wrote:
 Hi all!

 It's me again, trying to find a solution to the most common mistakes I make.
 This time it's accidental shadowing of variables, especially input
 variables.  I've wasted several hours banging my head against the wall while
 shouting "HOW CAN THIS VARIABLE ALWAYS BE NULL?". I can't believe I'm the
 only one.  To give you a rough idea on how it works:

+1

I've made the same mistake countless times.
For me, it always happens when I have a declared variable in a
function which I later turn into an input parameter and forget
to remove from the DECLARE section of the function.
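
A minimal sketch of the trap (hypothetical function, not from the patch):

CREATE FUNCTION add_one(x int) RETURNS int AS $$
DECLARE
    x int;          -- leftover declaration shadows the parameter
BEGIN
    RETURN x + 1;   -- this x is the inner, never-assigned variable
END;
$$ LANGUAGE plpgsql;

SELECT add_one(41);     -- returns NULL, not 42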

Regarding the warning vs error discussion, I think it depends on if we
want to maximize short-term or long-term user-friendliness.
In the short-term it's more user-friendly to lie to the user and
pretend everything is fine by not crashing the PL/pgSQL-application
and instead writing warning messages in the log file which might not
be seen. But in the long-term the user will run into problems caused
by the bugs.
With MySQL, invalid dates are accepted unless special settings are
turned on, but yes, warnings are emitted which most implementations
ignore. This is bad.

I strongly think it should be made an error, because it is most
certainly an error, and even if it's not, it's at least bad coding
style and the code should be fixed anyway, or if one is lazy, turn
this off in the config file and make it a warning instead.

I don't think we should be too afraid to break backward compatibility if
it only affects a theoretical percentage of the users in a new major
release.




Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-03 Thread Stephen Frost
KaiGai,

* Stephen Frost (sfr...@snowman.net) wrote:
 And I'm still unconvinced of this approach and worry that it's going to
 break more often than it works.  That's my 2c on it, but I won't get in
 the way if someone else wants to step up and support it.

Alright, having heard from Robert (thanks!) regarding his thoughts
(which are pretty similar to my own, though he doesn't anticipate issues
with API changes), I'm going to step back a bit from the above position.

 As I mentioned
 up-thread, I'd really like to see FDW join push-down, FDW aggregate
 push-down, parallel query execution, and parallel remote-FDW execution
 and I don't see this CustomScan approach as the right answer to any of
 those.

In accordance with the above, what I'd like to see with this patch is
removal of the postgres_fdw changes and any changes which were for that
support.  In addition, I'd like to understand why 'ctidscan' makes any
sense to have as an example of what to use this for- if that's valuable,
why wouldn't we simply implement that in core?  I do want an example in
contrib of how to properly use this capability, but I don't think that's
it.

For one thing, an example where you could have this CustomScan node
calling other nodes underneath would be interesting.  I realize the CTID
scan can't do that directly but I would think your GPU-based system
could; after all, if you're running a join or an aggregate with the GPU,
the rows could come from nearly anything.  Have you considered that, or
is the expectation that users will just go off and access the heap
and/or whatever indexes directly, like ctidscan does?  How would such a
requirement be handled?

Thanks,

Stephen




Re: [HACKERS] building pgadmin4

2014-03-03 Thread Willy-Bas Loos

 pgAdmin is off-topic for this mailing list.


so sorry, i misread the address in the readme file

cheers,

WBL

-- 
Quality comes from focus and clarity of purpose -- Mark Shuttleworth


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Noah Misch
On Mon, Mar 03, 2014 at 07:19:45PM +, Simon Riggs wrote:
 On 3 March 2014 18:57, Noah Misch n...@leadboat.com wrote:
  On Mon, Mar 03, 2014 at 10:19:55AM -0500, Robert Haas wrote:
  On Thu, Feb 27, 2014 at 3:12 AM, Simon Riggs si...@2ndquadrant.com wrote:
   Removing SELECT privilege while running a SELECT would be a different
   matter.  This is all a matter of definition; we can make up any rules
   we like. Doing so is IMHO a separate patch and not something to hold
   up the main patch.
 
  So I think this is an interesting point.  There are various things
  that could go wrong as a result of using the wrong lock level.  Worst
  would be that the server crashes or corrupts data.  A little less bad
  would be that sessions error out with inexplicable error conditions,
  as in SnapshotNow days.  Alternatively, we could just have arguably
  wrong behavior, like basing query results on the old version of the
  table's metadata even after it's been changed.
 
  I would order the concerns like this:
 
  1. Data corruption
  2. Transient, clearly-wrong answers without an error
  3. Server crash
  4. Catalog logical inconsistency
  5. Inexplicable, transient errors
  6. Valid behavior capable of surprising more than zero upgraders
 
 I like your model for risk assessment. How can we apply it in detail
 in a way that helps us decide? Or do we just go on gut feel?
 
 My experience with mentioning such topics is that without structure it
 results in an assessment of unacceptable risk simply because
 somebody has mentioned some scary words.

True; it's tough to make use of such a prioritization.  By the time you can
confidently assign something to a category, you can probably just fix it.
(Or, in the case of category (6), document/release-note it.)

Just to be clear, that list is not a commentary on the particular patch at
hand.  Those are merely the kinds of regressions to look for in a patch
affecting this area of the code.

 err, guys, you do realise that changing ownership is staying at
 AccessExclusiveLock in this patch?
 (and I haven't ever suggested lowering that).

Yep.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Changeset Extraction v7.9

2014-03-03 Thread Andres Freund
Hi Robert, Everyone!

On 2014-03-03 16:48:15 -0500, Robert Haas wrote:
 OK, I've committed the 0001 patch, which is the core of this feature,
 with a bit of minor additional hacking.

Many, many, thanks!

 I'm sure there are some problems here yet and some things that people
 will want fixed, as is inevitable for any patch of this size.  But I
 don't have any confidence that further postponing commit is going to
 be the best way to find those issues, so in it goes.

Unsurprisingly I do agree with this. It's a big feature, and there's
imperfection. But I think it's a good start.

A first such imperfection is that the buildfarm doesn't actually
exercise "make check" in contribs, just "make installcheck"... Which this
patch doesn't use because the tests require wal_level=logical and
max_replication_slots = 2. Andrew said on IRC that maybe it's a good
idea to add a make-contrib-check stage to the buildfarm.
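
For reference, a sketch of what those tests need in the temp install's
postgresql.conf:

wal_level = logical
max_replication_slots = 2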

A patch fixing a couple of absolutely trivial things is attached.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 4fb0974..3c56238 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -9,7 +9,7 @@
  *
  * NOTES
  *This file coordinates interaction between the various modules that
- *together providethe logical decoding, primarily by providing so
+ *together provide logical decoding, primarily by providing so
  *called LogicalDecodingContexts. The goal is to encapsulate most of the
  *internal complexity for consumers of logical decoding, so they can
  *create and consume a changestream with a low amount of code.
diff --git a/src/backend/replication/logical/logicalfuncs.c b/src/backend/replication/logical/logicalfuncs.c
index 3b8ae38..5fa1848 100644
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -2,7 +2,7 @@
  *
  * logicalfuncs.c
  *
- *	   Support functions for using logical decoding and managemnt of
+ *	   Support functions for using logical decoding and management of
  *	   logical replication slots via SQL.
  *
  *
@@ -400,7 +400,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool binary)
 			ctx->options.output_type != OUTPUT_PLUGIN_TEXTUAL_OUTPUT)
 			ereport(ERROR,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-	 errmsg("output plugin cannot produce text output")));
+	 errmsg("output plugin cannot produce binary output")));
 
 		ctx->output_writer_private = p;
 
diff --git a/src/include/utils/snapshot.h b/src/include/utils/snapshot.h
index 4b25607..8ee9285 100644
--- a/src/include/utils/snapshot.h
+++ b/src/include/utils/snapshot.h
@@ -36,7 +36,7 @@ typedef bool (*SnapshotSatisfiesFunc) (HeapTuple htup,
  * There are several different kinds of snapshots:
  * * Normal MVCC snapshots
  * * MVCC snapshots taken during recovery (in Hot-Standby mode)
- * * Historic MVCC snapshots used during logical decoding 
+ * * Historic MVCC snapshots used during logical decoding
  * * snapshots passed to HeapTupleSatisfiesDirty()
  * * snapshots used for SatisfiesAny, Toast, Self where no members are
  *   accessed.



Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-03 Thread Kouhei Kaigai
  As I mentioned
  up-thread, I'd really like to see FDW join push-down, FDW aggregate
  push-down, parallel query execution, and parallel remote-FDW execution
  and I don't see this CustomScan approach as the right answer to any of
  those.
 
 In accordance with the above, what I'd like to see with this patch is removal
 of the postgres_fdw changes and any changes which were for that support.
 
I don't argue against this approach. It might be useful to *demonstrate* how a
custom-scan node works as a replacement for a join, however.

 In addition, I'd like to understand why 'ctidscan' makes any sense to have
 as an example of what to use this for- if that's valuable, why wouldn't
 we simply implement that in core?  I do want an example in contrib of how
 to properly use this capability, but I don't think that's it.
 
Do you think it would make sense if my submission were only the interface
portion, without a working example? The purpose of the ctidscan module is,
similar to postgres_fdw, to demonstrate the usage of the custom-scan
interface at a small enough code scale. If tons of example code were
attached, nobody would want to review the patch.
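
For what it's worth, the kind of query ctidscan is meant to accelerate is
along these lines (a sketch; the table name is hypothetical) - a range
qualifier on the ctid system column, which a plain SeqScan cannot exploit:

SELECT * FROM my_table WHERE ctid >= '(1000,0)'::tid;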
The cache_scan module that I and Haribabu are discussing in another
thread also might be a good demonstration for custom-scan interface,
however, its code scale is a bit larger than ctidscan.

 For one thing, an example where you could have this CustomScan node calling
 other nodes underneath would be interesting.  I realize the CTID scan can't
 do that directly but I would think your GPU-based system could; after all,
 if you're running a join or an aggregate with the GPU, the rows could come
 from nearly anything.  Have you considered that, or is the expectation that
 users will just go off and access the heap and/or whatever indexes directly,
 like ctidscan does?  How would such a requirement be handled?
 
When a custom-scan node has underlying nodes, they shall be invoked using
ExecProcNode, as built-in nodes do, so the custom-scan node can fetch the
tuples coming from them. Of course, a custom-scan provider can present
tuples coming from somewhere else as if they came from the underlying
relation; that is the responsibility of the extension module. In some cases
it will be required to return junk system attributes, like ctid, for
row-level locks or table updates. That is also the responsibility of the
extension module (which should not add a custom path if its custom-scan
provider cannot perform as required).

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com




Re: [HACKERS] plpgsql.warn_shadow

2014-03-03 Thread Tom Lane
Joel Jacobson j...@trustly.com writes:
 I strongly think it should be made an error, because it is most
 certainly an error, and even if it's not, it's at least bad coding
 style and the code should be fixed anyway, or if one is lazy, turn
 this off in the config file and make it a warning instead.

You're reasoning from a false premise: it's *not* necessarily an error.
If this were such a bad idea as you claim, generations of programming
language designers wouldn't have made their languages work like this.

regards, tom lane




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Just to be clear, that list is not a commentary on the particular patch at
 hand.  Those are merely the kinds of regressions to look for in a patch
 affecting this area of the code.

A complaint on pgsql-bugs just now reminded me of a specific area that
needs to be looked at hard: how bad are the implications for pg_dump?

Up to now, pg_dump could be reasonably confident that once it had
AccessShareLock on every table it intended to dump, there would be no
schema changes happening on those tables until it got done.  This greatly
ameliorates the snapshot-skew problems that arise from its habit of doing
some things for itself and other things via backend-internal functions
(which historically used SnapshotNow and now use a fresh MVCC snapshot,
either way potentially quite newer than the transaction snapshot pg_dump's
own queries will use).
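
Roughly, pg_dump's opening sequence is the sketch below (table name
hypothetical); the catalog reads and the pg_get_*def() calls all happen
only afterwards:

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
LOCK TABLE public.t1 IN ACCESS SHARE MODE;
-- ... one LOCK per table to be dumped ...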

I suspect that lowering the lock requirements for anything that affects
what pg_dump can see is going to make things a whole lot worse in terms of
consistency of pg_dump output in the face of concurrent DDL.  Admittedly,
we're not perfect in that area now, but I'm not sure that's an adequate
excuse for making it worse.

regards, tom lane




Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Peter Geoghegan
On Fri, Feb 28, 2014 at 2:12 PM, Peter Geoghegan p...@heroku.com wrote:
 In order to make a rational decision to do the work incrementally, we
 need to know what we're putting off until 9.5. AFAICT, we have these
 operator classes that work fine with jsonb for the purposes of
 hstore-style indexing (the hstore operator classes). Wasn't that the
 point? When there is a dedicated set of jsonb operator classes, what
 will be different about them, other than the fact that they won't be
 hstore operator classes? A decision predicated on waiting for those to
 come in 9.5 must consider what we're actually waiting for, and right
 now that seems very hazy.

I really would like an answer to this question. Even if I totally
agreed with Andrew's assessment of the relative importance of having
jsonb be an in-core type, versus having some more advanced indexing
capabilities right away, this is still a very salient question.

I appreciate that the "put jsonb in hstore to get indexing right away"
trade-off is counter-intuitive, and it may even be that there is an
"everybody wins" third way that sees us factor out code
that is common to both jsonb and hstore as it exists today (although
I'm not optimistic about that). I would like to emphasize that if you
want to defer working on hstore-style jsonb operator classes for one
release, I don't necessarily have a problem with that. But, I must
insist on an answer here, from either you or Oleg or Teodor (it isn't
apparent that Oleg and Teodor concur with you on what's important):
what did we run out of time for? What will be different about the
jsonb operator classes that you're asking us to wait for in a future
release?

I understand that there are ambitious plans for a VODKA-am that will
support indexing operations on nested structures that are a lot more
advanced than those enabled by the hstore operator classes included in
these patches. However, surely these hstore operator classes have
independent value, or represent incremental progress?

-- 
Peter Geoghegan




Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-03 Thread Noah Misch
On Mon, Mar 03, 2014 at 01:29:00AM -0500, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  Concerning the immediate fix for non-Windows systems, does any modern system
  ignore modes of Unix domain sockets?  It appears to be a long-fixed problem:
 
 What I was envisioning was that we'd be relying on the permissions of the
 containing directory to keep out bad guys.  Permissions on the socket
 itself might be sufficient, but what does it save us to assume that?

My first preference is to use the simplest code that POSIX requires to have
the behavior we desire.  POSIX specifies as implementation-defined whether
connect() checks filesystem permissions.  That's true of both directory search
permissions and permissions on the socket itself.  POSIX alone can't help us
here.

My second preference is to use the simplest code known to be portable to all
credible PostgreSQL target systems.  Brief research was inconclusive, but it
turned up no solid evidence of a modern target ignoring socket permissions.
(It did turn up solid evidence of 15-year-old targets having that problem.)  I
found no evidence either way concerning the prevalence of systems that ignore
directory search permissions above sockets.

I don't care for interposing a directory based solely on the fact that some
ancient systems needed that.  Changing unix_socket_permissions is a one-liner
in each test driver.  Placing the socket in a directory entails setting PGHOST
in the psql and postmaster environments and cleaning up the directory on exit.
That would be fine if restricted to pg_regress, but it would also show up in
contrib/pg_upgrade/test.sh, perhaps eventually in vcregress.pl:upgradecheck(),
perhaps in the buildfarm code, in the DBD::Pg test suite, and in any other
test suite that creates a temporary cluster.  We should not lead all those
test drivers into using a temporary socket directory based on long-gone bugs
or cargo cult programming.  If there are notable systems today where it helps,
that's a different matter.
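
For concreteness, the one-liner would be something like this in each
driver's temporary postgresql.conf (a sketch):

unix_socket_permissions = 0700	# only the socket's owner may connect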

Also, test drivers should not be the sole place where we express doubt about
the reliability of socket permissions.  If they are unreliable on a noteworthy
target, then the unix_socket_permissions documentation ought to say so.

nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Josh Berkus
On 03/03/2014 04:50 PM, Peter Geoghegan wrote:
 I understand that there are ambitious plans for a VODKA-am that will
 support indexing operations on nested structures that are a lot more
 advanced than those enabled by the hstore operator classes included in
 these patches. However, surely these hstore operator classes have
 independent value, or represent incremental progress?

Primary value is that in theory the hstore2 opclasses are available
*now*, as opposed to a year from now.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Peter Geoghegan
On Mon, Mar 3, 2014 at 4:57 PM, Josh Berkus j...@agliodbs.com wrote:
 On 03/03/2014 04:50 PM, Peter Geoghegan wrote:
 I understand that there are ambitious plans for a VODKA-am that will
 support indexing operations on nested structures that are a lot more
 advanced than those enabled by the hstore operator classes included in
 these patches. However, surely these hstore operator classes have
 independent value, or represent incremental progress?

 Primary value is that in theory the hstore2 opclasses are available
 *now*, as opposed to a year from now.

Well, yes, that's right. Although we cannot assume that VODKA will get
into 9.5 - it's a big project. Nor is it obvious to me that a
VODKA-ized set of operator classes would not bring with them exactly
the same dilemma as we now face vis-a-vis hstore code reuse and GIN
operator classes. So it seems reasonable to me to suppose that VODKA
should not influence our decision here. Please correct me if I'm
mistaken.


-- 
Peter Geoghegan




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Andres Freund
On 2014-03-03 19:15:27 -0500, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  Just to be clear, that list is not a commentary on the particular patch at
  hand.  Those are merely the kinds of regressions to look for in a patch
  affecting this area of the code.
 
 A complaint on pgsql-bugs just now reminded me of a specific area that
 needs to be looked at hard: how bad are the implications for pg_dump?
 
 Up to now, pg_dump could be reasonably confident that once it had
 AccessShareLock on every table it intended to dump, there would be no
 schema changes happening on those tables until it got done.

The guarantee wasn't actually that strong. It was already quite possible
for indexes to get created or dropped during that time, which is probably
by far the most frequent DDL run in production.

 This greatly
 ameliorates the snapshot-skew problems that arise from its habit of doing
 some things for itself and other things via backend-internal functions
 (which historically used SnapshotNow and now use a fresh MVCC snapshot,
 either way potentially quite newer than the transaction snapshot pg_dump's
 own queries will use).

Yea, I wonder if we shouldn't start to make them use a different
snapshot. It's the pg_get_*def() functions, or is there something else?

Afair (I really haven't rechecked) all the actions that have a changed
locklevel affect things that pg_dump recreates clientside, using a
repeatable read snapshot, so there shouldn't be much change there?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Josh Berkus
On 03/03/2014 05:07 PM, Peter Geoghegan wrote:
 Primary value is that in theory the hstore2 opclasses are available
 *now*, as opposed to a year from now.
 
 Well, yes, that's right. Although we cannot assume that VODKA will get
 into 9.5 - it's a big project. Nor is it obvious to me that a
 VODKA-ized set of operator classes would not bring with them exactly
 the same dilemma as we now face vis-a-vis hstore code reuse and GIN
 operator classes. So it seems reasonable to me to suppose that VODKA
 should not influence our decision here. Please correct me if I'm
 mistaken.

I would agree with you.

Andrew was onsite with a client over the weekend, which is why you
haven't heard from him on this thread.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-03 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Mon, Mar 03, 2014 at 01:29:00AM -0500, Tom Lane wrote:
 What I was envisioning was that we'd be relying on the permissions of the
 containing directory to keep out bad guys.  Permissions on the socket
 itself might be sufficient, but what does it save us to assume that?

 My first preference is to use the simplest code that POSIX requires to have
 the behavior we desire.  POSIX specifies as implementation-defined whether
 connect() checks filesystem permissions.  That's true of both directory search
 permissions and permissions on the socket itself.

Surely you are misinterpreting that.  If it worked as you suggest,
connect() would provide a trivial method of bypassing directory
permissions, at least to the extent of being able to probe for the
existence of files within supposedly-unreadable directories.  I can
believe that there are implementations that don't examine the permissions
on the socket file itself, but not that there are any that disregard
directory permissions during the file lookup.

 I found no evidence either way concerning the prevalence of systems that
 ignore directory search permissions above sockets.

That's because the question is ridiculous on its face, so nobody ever
bothered to address it.  I think the burden is on you to show that there
has ever been any system that read the spec the way you propose.

 I don't care for interposing a directory based solely on the fact that some
 ancient systems needed that.  Changing unix_socket_permissions is a one-liner
 in each test driver.  Placing the socket in a directory entails setting PGHOST
 in the psql and postmaster environments and cleaning up the directory on exit.

Placing the socket anywhere besides the default location will require
setting PGHOST anyway, so I don't see that this argument holds much water.
The cleanup aspect is likewise not that exciting; pg_regress creates a lot
of stuff it doesn't remove.

regards, tom lane




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-03-03 19:15:27 -0500, Tom Lane wrote:
 This greatly
 ameliorates the snapshot-skew problems that arise from its habit of doing
 some things for itself and other things via backend-internal functions
 (which historically used SnapshotNow and now use a fresh MVCC snapshot,
 either way potentially quite newer than the transaction snapshot pg_dump's
 own queries will use).

 Yea, I wonder if we shouldn't start to make them use a different
 snapshot. It's the pg_get_*def() functions, or is there something else?

See past discussions.  By the time you trace all of ruleutils.c's
dependencies, a dauntingly large fraction of the backend's basic catalog
access support is implicated.  For example, you'd need some way of making
the catcaches return data that they know to be outdated.  And I'm pretty
sure pg_dump is making free use of stuff that isn't even in ruleutils.

I would like to see pg_dump doing something much more bulletproof, but
I'm afraid that there isn't any nice simple fix available.

One relatively narrow fix that would probably make it a lot better
*in the current state of affairs* is to provide a way whereby, once
pg_dump has locks on everything it wants to dump, it can advance
its transaction snapshot to current time.  Then at least both its own
queries and the backend's lookups will see post-locking state.  However,
if AccessShareLock isn't enough to block DDL on the tables then we're
still hosed.

 Afair (I really haven't rechecked) all the actions that have a changed
  locklevel affect things that pg_dump recreates clientside, using a
 repeatable read snapshot, so there shouldn't be much change there?

You're missing the point entirely if you think pg_dump recreates
everything client-side.  It's never done that 100%, and we've migrated
more and more stuff to the server side over time to avoid duplicate
implementations of things like index expression decompilation.  So while
a theoretical answer would be to remove all of pg_dump's dependency on
server-side functions, I am pretty sure that's never going to happen.

regards, tom lane




Re: [HACKERS] Row-security on updatable s.b. views

2014-03-03 Thread Craig Ringer
On 02/25/2014 01:28 AM, Dean Rasheed wrote:
 On 13 February 2014 04:12, Craig Ringer cr...@2ndquadrant.com wrote:

 It's crashing while pulling up the query over emp (hl7.employee) and
 part (hl7.participation).

 Given the simplicity of what the row-security code its self is doing,
 I'm wondering if this is a case that isn't handled in updatable s.b.
 views. I'll look into it.
 
 I'm not sure how much further you've got with this, but I think the
 issue is that the securityQuals that you're adding don't refer to the
 correct RTE. When adding securityQuals to an RTE, they are expected to
 have Vars whose varno matches the rt_index of the RTE (see for example
 the code in rewriteTargetView() which calls ChangeVarNodes() on
 viewqual before adding the qual to securityQuals or the main query
 jointree). prepend_row_security_quals() doesn't appear to have any
 similar code, and it would need to be passed the rt_index to do that.

Thanks for the pointer. That was indeed the issue.

I've pushed an update to the branch with the fix for varno handling.
Thanks. It's tagged rls-9.4-upd-sb-views-v8.

I've almost run out of time to spend on row security for this
commitfest, unfortunately. I'm putting a blog together with a current
status update. Frustrating, as it's coming together now.

Open issues include:

- Passing plan inval items from rewriter into planner
- COPY support pending
- Clear syntax in DDL

Most of the rest are solved; it's actually looking pretty good.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Andres Freund
On 2014-03-03 20:32:13 -0500, Tom Lane wrote:
  Afair (I really haven't rechecked) all the actions that have a changed
  locklevel affect things that pg_dump recreates clientside, using a
  repeatable read snapshot, so there shouldn't be much change there?
 
 You're missing the point entirely if you think pg_dump recreates
 everything client-side. 

No, I am obviously not thinking that. What I mean is that the things
that actually change their locking requirement in the proposed patch
primarily influence things that are reconstructed clientside by
pg_dump. E.g ALTER TABLE ... CLUSTER ON, SET(...), ...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Securing "make check" (CVE-2014-0067)

2014-03-03 Thread Tom Lane
I wrote:
 Placing the socket anywhere besides the default location will require
 setting PGHOST anyway, so I don't see that this argument holds much water.
 The cleanup aspect is likewise not that exciting; pg_regress creates a lot
 of stuff it doesn't remove.

There's another point here, if you think back to the discussion as it
stood before we noticed that there was a security problem.  What was
originally under discussion was making life easier for packagers who
want to build with a default socket location like /var/run/postgresql/.
In a build environment, that directory may not exist, and even if it
does, there is no way that the build user should have write permission
on it.  So it is already the case that some packagers have to override
the socket location if they want to do make check while packaging,
and what we were originally on about was making that part of the normal
pg_regress procedures instead of being something requiring patching.

So, whether or not unix_socket_permissions would be a bulletproof
security fix by itself, I'd still want to see some provisions made
for putting the socket into a local directory during "make check".

regards, tom lane




Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Peter Geoghegan
On Mon, Mar 3, 2014 at 5:09 PM, Josh Berkus j...@agliodbs.com wrote:
 On 03/03/2014 05:07 PM, Peter Geoghegan wrote:
 Primary value is that in theory the hstore2 opclasses are available
 *now*, as opposed to a year from now.

 Well, yes, that's right. Although we cannot assume that VODKA will get
 into 9.5 - it's a big project. Nor is it obvious to me that a
 VODKA-ized set of operator classes would not bring with them exactly
 the same dilemma as we now face vis-a-vis hstore code reuse and GIN
 operator classes. So it seems reasonable to me to suppose that VODKA
 should not influence our decision here. Please correct me if I'm
 mistaken.

 I would agree with you.

Good. Hopefully you also mean that you recognize the dilemma referred
to above - that the hstore code reuse made a certain amount of sense,
and that more than likely the best way forward is to work out a way to
make it work. I'm not immediately all that concerned about what the
least worst way of doing that is (I just favor putting everything in
hstore on practical grounds). Another way to solve this problem might
be to simply live with the code duplication between core and hstore on
the grounds that hstore will eventually be deprecated as people switch
to jsonb over time (so under that regime nothing new would ever be
added to hstore, which we'd eventually remove from contrib entirely,
while putting everything new here in core). I don't favor that
approach, but it wouldn't be totally unreasonable, based on the
importance that is attached to jsonb, and based on what I'd estimate
to be the actual amount of code redundancy that that would create
(assuming that hstore gets no new functions and operators, since an
awful lot of the hstore-local code after this patch is applied is new
to hstore). I wouldn't stand in the way of this approach.

In my view the most important thing right now is that before anything
is committed, at the very least there needs to be a strategy around
getting hstore-style GIN indexing in jsonb. I cannot understand how
you can have an operator class today that works fine for hstore-style
indexing of jsonb (as far as that goes), but that that code is out of
bounds just because it's nominally (mostly new) hstore code, and you
cannot figure out a way of making that work that is acceptable from a
code maintenance perspective. If you cannot figure that out in a few
days, why should you ever be able to figure it out? We need to bite
the bullet here, whatever that might actually entail. Can we agree on
that much?

-- 
Peter Geoghegan




Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2014-03-03 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-03-03 20:32:13 -0500, Tom Lane wrote:
 You're missing the point entirely if you think pg_dump recreates
 everything client-side. 

 No, I am obviously not thinking that. What I mean is that the things
 that actually change their locking requirement in the proposed patch
 primarily influence things that are reconstructed clientside by
 pg_dump. E.g ALTER TABLE ... CLUSTER ON, SET(...), ...

[ raised eyebrow... ]  I'm pretty sure that no such constraint was
part of the design discussion to start with.  Even if it accidentally
happens to be the case now, it sounds utterly fragile.

regards, tom lane




Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Andrew Dunstan


On 03/03/2014 07:50 PM, Peter Geoghegan wrote:

On Fri, Feb 28, 2014 at 2:12 PM, Peter Geoghegan p...@heroku.com wrote:

In order to make a rational decision to do the work incrementally, we
need to know what we're putting off until 9.5. AFAICT, we have these
operator classes that work fine with jsonb for the purposes of
hstore-style indexing (the hstore operator classes). Wasn't that the
point? When there is a dedicated set of jsonb operator classes, what
will be different about them, other than the fact that they won't be
hstore operator classes? A decision predicated on waiting for those to
come in 9.5 must consider what we're actually waiting for, and right
now that seems very hazy.

I really would like an answer to this question. Even if I totally
agreed with Andrew's assessment of the relative importance of having
jsonb be an in-core type, versus having some more advanced indexing
capabilities right away, this is still a very salient question.



(Taking a break from some frantic customer work)

My aim for 9.4, given constraints of both the development cycle and my 
time budget, has been to get jsonb to a point where it has equivalent 
functionality to json, so that nobody is forced to say "well, I'll have 
to use json because it lacks function x". For the processing functions, 
i.e. those that don't generate json from non-json, this should be true 
with what's proposed. The jsonb processing functions should be about as 
fast as, or in some cases significantly faster than, their json 
equivalents. Parsing text input takes a little longer (surprisingly 
little, actually), and reserializing takes significantly longer - I 
haven't had a chance to look and see if we can improve that. Both of 
these are more or less expected results.


For 9.5 I would hope that we have at least the equivalent of the 
proposed hstore classes. But that's really just a start. Frankly, I 
think we need to think a lot harder about how we want to be able to 
index this sort of data. The proposed hstore operators appear to me to 
be at best just scratching the surface of that. I'd like to be able to 
index jsonb's # and # operators, for example. Unanchored subpath 
operators could be an area that's interesting to implement and index.


I also would like to see some basic insert/update/delete/merge operators 
for json/jsonb - that's an area I wanted to work on for this release but 
wasn't able to arrange.


Note that future development is a major topic of my pgcon talk, and I'm 
hoping that we can get some good discussion going there.



cheers

andrew




Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Josh Berkus
On 03/03/2014 06:17 PM, Peter Geoghegan wrote:
 Good. Hopefully you also mean that you recognize the dilemma referred
 to above - that the hstore code reuse made a certain amount of sense,
 and that more than likely the best way forward is to work out a way to
 make it work. I'm not immediately all that concerned about what the
 least worst way of doing that is (I just favor putting everything in
 hstore on practical grounds).

Well, I don't see how this is on practical grounds at this point.
Whether we keep jsonb in core or not, we have an equal amount of work
ahead of us.  That's why I said the single-extension approach was
"conceptually simpler" rather than actually simpler.  It's easier to
understand, not necessarily easier to implement at this point.

Also, please recognize that the current implementation was what we
collectively decided on three months ago, and what Andrew worked pretty
hard to implement based on that collective decision.  So if we're going
to change course, we need a specific reason to change course, not just
it seems like a better idea now or I wasn't paying attention then.

The "one extension to rule them all" approach also has the issue of
Naming Things, but I think that could be solved with a symlink or two.

 Another way to solve this problem might
 be to simply live with the code duplication between core and hstore on

What code duplication?

 the grounds that hstore will eventually be deprecated as people switch
 to jsonb over time (so under that regime nothing new would ever be
 added to hstore, which we'd eventually remove from contrib entirely,
 while putting everything new here in core). I don't favor that
 approach, but it wouldn't be totally unreasonable, based on the
 importance that is attached to jsonb, and based on what I'd estimate
 to be the actual amount of code redundancy that that would create
 (assuming that hstore gets no new functions and operators, since an
 awful lot of the hstore-local code after this patch is applied is new
 to hstore). I wouldn't stand in the way of this approach.

Realistically, hstore will never go away.  I'll bet you a round or two
of pints that, if we get both hstore2 and jsonb, within 2 years the
users of jsonb will be an order of magnitude more numerous than the
users of hstore, but folks out there have apps already built against
hstore1 and they're going to keep on the hstore path.

In the discussion you haven't apparently caught up on yet, we did
discuss moving *hstore* into core to make this whole thing easier.
However, that fell afoul of the fact that we currently have no mechanism
to move types between extensions and core without breaking upgrades for
everyone.  So the only reason hstore is still an extension is because of
backwards-compatibility.

 In my view the most important thing right now is that before anything
 is committed, at the very least there needs to be a strategy around
 getting hstore-style GIN indexing in jsonb. 

I think it's a good idea to have a strategy.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Peter Geoghegan
On Mon, Mar 3, 2014 at 6:54 PM, Andrew Dunstan and...@dunslane.net wrote:
 My aim for 9.4, given constraints of both the development cycle and my time
 budget, has been to get jsonb to a point where it has equivalent
 functionality to json, so that nobody is forced to say well I'll have to
 use json because it lacks function x. For the processing functions, i.e.
 those that don't generate json from non-json, this should be true with
 what's proposed. The jsonb processing functions should be about as fast as,
 or in some cases significantly faster than, their json equivalents. Parsing
 text input takes a little longer (surprisingly little, actually), and
 reserializing takes significantly longer - I haven't had a chance to look
 and see if we can improve that. Both of these are more or less expected
 results.

Okay, that's fine. I'm sure that jsonb has some value without
hstore-style indexing. That isn't really in question. What is in
question is why you would choose to give up on those capabilities.

 For 9.5 I would hope that we have at least the equivalent of the proposed
 hstore classes.

But the equivalent code to the proposed hstore operator classes is
*exactly the same* C code. The two types are fully binary coercible in
the patch, so why delay? Why is that additional step appreciably
riskier than adopting jsonb? I don't see why the functions associated
with the operators that comprise, say, the gin_hstore_ops operator
class represent much additional risk, assuming that jsonb is itself in
good shape. For example, the new hstore_contains() looks fairly
innocuous compared to much of the code you are apparently intent on
including in the first cut at jsonb. Have I missed something? Why are
those operators riskier than the operators you are intent on
including?
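
For instance, the capability at stake is essentially this (a sketch;
table and column are hypothetical):

CREATE INDEX ON docs USING gin (attrs);		-- attrs is of type hstore
SELECT * FROM docs WHERE attrs @> 'color=>red';	-- can use the GIN index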

If it is true that you think that's a significant additional risk, a
risk too far, then it makes sense that you'd defer doing this. I would
like to know why that is, though, since I don't see it. Speaking of
missing operator classes, I'm pretty sure that it's ipso facto
unacceptable that there is no default btree operator class for the
type jsonb:

[local]/postgres=# \d+ bar
                         Table "public.bar"
 Column | Type  | Modifiers | Storage  | Stats target | Description 
--------+-------+-----------+----------+--------------+-------------
 i      | jsonb |           | extended |              | 
Has OIDs: no

[local]/postgres=# select * from bar order by i;
ERROR:  42883: could not identify an ordering operator for type jsonb
LINE 1: select * from bar order by i;
   ^
HINT:  Use an explicit ordering operator or modify the query.
LOCATION:  get_sort_group_operators, parse_oper.c:221
Time: 6.424 ms
[local]/postgres=# select distinct i from bar;
ERROR:  42883: could not identify an equality operator for type jsonb
LINE 1: select distinct i from bar;
^
LOCATION:  get_sort_group_operators, parse_oper.c:226
Time: 6.457 ms
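
In the meantime a cast supplies the explicit ordering operator the HINT
asks for (a sketch of the workaround):

select * from bar order by i::text;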

 But that's really just a start. Frankly, I think we need to
 think a lot harder about how we want to be able to index this sort of data.
 The proposed hstore operators appear to me to be at best just scratching the
 surface of that. I'd like to be able to index jsonb's #> and #>> operators,
 for example. Unanchored subpath operators could be an area that's
 interesting to implement and index.

I'm sure that's true, but it's not our immediate concern. We need to
think very hard about it to get everything we want, but we also need
to think somewhat harder about it in order to get even a basic jsonb
type committed.

-- 
Peter Geoghegan




Re: [HACKERS] Patch: show relation and tuple infos of a lock to acquire

2014-03-03 Thread Amit Kapila
On Mon, Mar 3, 2014 at 3:46 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-03-01 13:29:18 +0530, Amit Kapila wrote:
 With the new patch, the message while updating locked rows will be
 displayed as below:

 LOG:  process 364 still waiting for ShareLock on transaction 678 after
 1014.000 ms
 CONTEXT:  while attempting to lock tuple (0,2) with values (2) in relation
 public.t1 of database postgres

 LOG:  process 364 acquired ShareLock on transaction 678 after 60036.000 ms
 CONTEXT:  while attempting to lock tuple (0,2) with values (2) in relation
 public.t1 of database postgres

 Now I am not sure if the second message is an improvement, as what it
 sounds like is 'while attempting to lock tuple, it got a shared lock on
 transaction'. If you, Robert, or others feel it is okay, then we can
 retain it as it is in the patch; otherwise I think we either need to
 rephrase it or try some other way (a global variable) such that it appears
 only for the required case. I feel the way Robert has suggested, i.e. to
 make it a Detail of the particular message (we might need to use a global
 variable to pass certain info), is the better way and will have minimal
 impact on the cases where this additional information needs to be
 displayed.

 I really don't understand the origins of your arguments here. Why
 shouldn't a row lock caused by an UPDATE be relevant?

The point I am trying to make is about the below part of the statement in
the Context message:
while attempting to lock tuple (0,2) ...

In the above situation, we are actually trying to acquire a lock on a
transaction in order to operate on a tuple, so I think it would be better
to rephrase it (maybe by using 'operate on' instead of 'lock').
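
For reference, the messages quoted above can be produced with a scenario
like the following (illustrative; it assumes the patch is applied,
log_lock_waits is on, and a table t1(c1 int) contains a row with c1 = 2):

-- Session 1: take the row lock and hold it
BEGIN;
UPDATE t1 SET c1 = c1 WHERE c1 = 2;

-- Session 2: blocks on session 1's transaction; once deadlock_timeout
-- elapses the server logs 'still waiting for ShareLock on transaction
-- ...' together with the CONTEXT line under discussion
UPDATE t1 SET c1 = 3 WHERE c1 = 2;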


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] jsonb and nested hstore

2014-03-03 Thread Peter Geoghegan
On Mon, Mar 3, 2014 at 7:39 PM, Peter Geoghegan p...@heroku.com wrote:
 But that's really just a start. Frankly, I think we need to
 think a lot harder about how we want to be able to index this sort of data.
 The proposed hstore operators appear to me to be at best just scratching the
 surface of that. I'd like to be able to index jsonb's #> and #>> operators,
 for example. Unanchored subpath operators could be an area that's
 interesting to implement and index.

 I'm sure that's true, but it's not our immediate concern. We need to
 think very hard about it to get everything we want, but we also need
 to think somewhat harder about it in order to get even a basic jsonb
 type committed.

By the way, I think it would be fine to defer adding many of the new
hstore operators and functions until 9.5 (as hstore infrastructure, or
in-core jsonb infrastructure, or anything else), if you felt you had
to, provided that you included just those sufficient to create jsonb
operator classes (plus the operator classes themselves, of course).
There is absolutely no question about having to do this for B-Tree... so
why not go a couple of operator classes further?

-- 
Peter Geoghegan




Re: [HACKERS] GSoC proposal - make an unlogged table logged

2014-03-03 Thread Fabrízio de Royes Mello
On Mon, Mar 3, 2014 at 2:40 PM, Hannu Krosing ha...@2ndquadrant.com wrote:

 On 03/03/2014 05:22 PM, Tom Lane wrote:
  Stephen Frost sfr...@snowman.net writes:
 ...
  ISTR the discussion going something along the lines of we'd have to
  WAL-log the entire table to do that, and if we have to do that, what's
  the point?
  IIRC, the reason you'd have to do that is to make the table contents
  appear on slave servers.  If you don't consider replication then it
  might seem easier.
 So switch on logging and then perform CLUSTER/VACUUM FULL?

 Should this work, or is something extra needed?


Today I do something like that:

1) create unlogged table tmp_foo ...
2) populate 'tmp_foo' table (ETL scripts or whatever)
3) start transaction
4) lock table tmp_foo in access exclusive mode
5) update pg_class set relpersistence = 'p' where oid = 'tmp_foo'::regclass
6) drop table foo; -- the old foo table
7) alter table tmp_foo rename to foo;
8) end transaction
9) run pg_repack on table 'foo'

I know it's very ugly, but it works... and it works for standbys too... :-)
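
Spelled out as SQL, it looks like this (a sketch of the steps above; the
LIKE clause and the pg_repack invocation are my own additions, and the
direct pg_class update is of course unsupported, so take it as an
illustration rather than a recommendation):

create unlogged table tmp_foo (like foo including all);
-- ... populate tmp_foo (ETL scripts or whatever) ...
begin;
lock table tmp_foo in access exclusive mode;
update pg_class set relpersistence = 'p'
 where oid = 'tmp_foo'::regclass;
drop table foo;
alter table tmp_foo rename to foo;
commit;
-- then rewrite the table so its contents are WAL-logged and reach the
-- standbys, e.g.: pg_repack --table=foo mydb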

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-03-03 Thread Ashutosh Bapat
On Mon, Mar 3, 2014 at 9:13 PM, Stephen Frost sfr...@snowman.net wrote:

 * Robert Haas (robertmh...@gmail.com) wrote:
  On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost sfr...@snowman.net wrote:
   Erm, my thought was to use a select() loop which sends out I/O requests
   and then loops around waiting to see who finishes it.  It doesn't
   parallelize the CPU cost of getting the rows back to the caller, but
   it'd at least parallelize the I/O, and if what's underneath is actually
   a remote FDW running a complex query (because the other side is actually
   a view), it would be a massive win to have all the remote FDWs executing
   concurrently instead of serially as we have today.
 
  I can't really make sense of this.

 Sorry, that was a bit hand-wavey since I had posted about it previously
 here:

 http://www.postgresql.org/message-id/20131104032604.gb2...@tamriel.snowman.net

 It'd clearly be more involved than just 'build a select() loop' and
 would require adding an async mechanism.  I had been thinking about this
 primarily with the idea of FDWs and you're right that it'd require more
 thought to deal with getting data into/through shared_buffers.  Still,
 since we seqscan into a ring buffer, I'd think we could make it work, but
 it would require additional work.

  For FDWs, one idea might be to kick off the remote query at
  ExecInitNode() time rather than ExecProcNode() time, at least if the
  remote query doesn't depend on parameters that aren't available until
  run time.

 Right, I had speculated about that also (option #2 in my earlier email).


During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries
to foreign servers, those would be fired while EXPLAINing a query as well.
We want to avoid that. Instead, we could run EXPLAIN on that query at the
foreign server. But then again, not all foreign servers would be able to
EXPLAIN the query, e.g. file_fdw. Or we could avoid firing the query
during ExecInitNode() altogether when it is for EXPLAIN (except perhaps
for EXPLAIN ANALYZE).
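
As an aside, the kind of concurrent remote execution being discussed can
already be approximated by hand with contrib/dblink's asynchronous
functions; the connection strings, table, and column below are only
placeholders:

SELECT dblink_connect('node1', 'host=node1 dbname=postgres');
SELECT dblink_connect('node2', 'host=node2 dbname=postgres');

-- kick off both remote queries without waiting for their results
SELECT dblink_send_query('node1', 'SELECT count(*) FROM big_table');
SELECT dblink_send_query('node2', 'SELECT count(*) FROM big_table');

-- the two queries now run concurrently; collect results as they finish
SELECT * FROM dblink_get_result('node1') AS t(cnt bigint);
SELECT * FROM dblink_get_result('node2') AS t(cnt bigint);

SELECT dblink_disconnect('node1');
SELECT dblink_disconnect('node2');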


  That actually would allow multiple remote queries to run
  simultaneously or in parallel with local work.  It would also run them
  in cases where the relevant plan node is never executed, which would
  be bad but perhaps rare enough not to worry about.

 This was my primary concern, along with the fact that we explicitly say
 don't do that in the docs for the FDW API.

  Or we could add a
  new API like ExecPrefetchNode() that tells nodes to prepare to have
  tuples pulled, and they can do things like kick off asynchronous
  queries.  But I still don't see any clean way for the Append node to
  find out which one is ready to return results first.

 Yeah, that's tricky.

 Thanks,

 Stephen




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

