Re: [HACKERS] scan on inheritance parent with no children in current session

2017-08-13 Thread Ashutosh Bapat
On Fri, Aug 11, 2017 at 9:08 PM, Robert Haas  wrote:
>
> rhaas=# create table parent (a int) partition by list (a);
> CREATE TABLE
> rhaas=# create temp table child partition of parent for values in (1);
> CREATE TABLE
> rhaas=# explain verbose select * from parent;
>QUERY PLAN
> -
>  Append  (cost=0.00..35.50 rows=2550 width=4)
>->  Seq Scan on pg_temp_3.child  (cost=0.00..35.50 rows=2550 width=4)
>  Output: child.a
> (3 rows)
>
> But the comments say:
>
>  * A childless table is never considered to be an inheritance set; therefore
>  * a parent RTE must always have at least two associated AppendRelInfos.
>
> Yet, not.  So at least the comments need to be updated;

A partitioned table with at least one partition is not childless, so
the above comment doesn't apply here. With regular inheritance there
will be at least two AppendRelInfos: one representing a scan on the
parent table itself and one or more representing scans on the children.
The comment needs to distinguish between regular inheritance and
partitioned inheritance. I have modified the comments that way.

> not sure if we
> want to try to eliminate the Append node in this case also.
>

The rest of the query tree and plan tree expects the parent's targetlist,
which may differ from that of the child. A notable difference is
whole-row expressions, which get translated using ConvertRowtypeExpr
nodes; those are not expected to pop up in a scan's targetlist. So we
can't eliminate the Append node.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index cf46b74..23c66f2 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -1360,8 +1360,12 @@ expand_inherited_tables(PlannerInfo *root)
  * table, but with inh = false, to represent the parent table in its role
  * as a simple member of the inheritance set.
  *
- * A childless table is never considered to be an inheritance set; therefore
- * a parent RTE must always have at least two associated AppendRelInfos.
+ * A childless table is never considered to be an inheritance set. In case of
+ * regular inheritance, a parent RTE must always have at least two associated
+ * AppendRelInfos: one corresponding to the parent table as a simple member of
+ * the inheritance set and one or more corresponding to the actual children.
+ * We do not scan a partitioned table and hence do not create an AppendRelInfo
+ * for it, so a partitioned table must have at least one associated AppendRelInfo.
  */
 static void
 expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
@@ -1374,7 +1378,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 	List	   *inhOIDs;
 	List	   *appinfos;
 	ListCell   *l;
-	bool		need_append;
+	bool		has_child;
 	PartitionedChildRelInfo *pcinfo;
 	List	   *partitioned_child_rels = NIL;
 
@@ -1448,7 +1452,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 
 	/* Scan the inheritance set and expand it */
 	appinfos = NIL;
-	need_append = false;
+	has_child = false;
 	foreach(l, inhOIDs)
 	{
 		Oid			childOID = lfirst_oid(l);
@@ -1502,7 +1506,10 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 		 */
 		if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
 		{
-			need_append = true;
+			/* Remember if we saw a real child. */
+			if (childOID != parentOID)
+				has_child = true;
+
 			appinfo = makeNode(AppendRelInfo);
 			appinfo->parent_relid = rti;
 			appinfo->child_relid = childRTindex;
@@ -1582,7 +1589,7 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 	 * the parent table is harmless, so we don't bother to get rid of it;
 	 * ditto for the useless PlanRowMark node.
 	 */
-	if (!need_append)
+	if (!has_child)
 	{
 		/* Clear flag before returning */
 		rte->inh = false;



Re: [HACKERS] reload-through-the-top-parent switch the partition table

2017-08-13 Thread Rushabh Lathia
On Fri, Aug 11, 2017 at 10:50 PM, Robert Haas  wrote:

> On Fri, Aug 11, 2017 at 5:36 AM, Rushabh Lathia
>  wrote:
> > Please find attach patch with the changes.
>
> I found the way that you had the logic structured in flagInhTables()
> to be quite hard to follow, so I rewrote it in the attached version.
> This version also has a bunch of minor cosmetic changes.  Barring
> objections, I'll commit it once the tree opens for v11 development.
>
>
Thanks Robert.

The patch changes look good to me.



regards,
Rushabh Lathia
www.EnterpriseDB.com


Re: [HACKERS] WIP: Failover Slots

2017-08-13 Thread Craig Ringer
On 12 August 2017 at 08:03, Andres Freund  wrote:

> On 2017-08-02 16:35:17 -0400, Robert Haas wrote:
> > I actually think failover slots are quite desirable, especially now
> > that we've got logical replication in core.  In a review of this
> > thread I don't see anyone saying otherwise.  The debate has really
> > been about the right way of implementing that.
>
> Given that I presumably was one of the people pushing back more
> strongly: I agree with that.  Besides disagreeing with the proposed
> implementation our disagreements solely seem to have been about
> prioritization.
>
> I still think we should have a halfway agreed upon *design* for logical
> failover, before we introduce a concept that's quite possibly going to
> be incompatible with that, however. But that doesn't mean it has to
> be submitted/merged to core.
>

How could it be incompatible? The idea here is to make physical failover
transparent to logical decoding clients. That's not meant to sound
confrontational; I mean that I can't personally see any way it would be,
and I could use your ideas.

I understand that it might be *different* and you'd like to see more
closely aligned approaches that work more similarly. For which we first
need to know more clearly how logical failover will look. But it's hard not
to also see this as delaying and blocking until your preferred approach via
pure logical rep and logical failover gets in, and physical failover can be
dismissed with "we don't need that anymore". I'm sure that's not your
intent, I just struggle not to see it that way anyway when there's always
another reason not to proceed to solve this problem because of a loosely
related development effort on another problem.

I think there's a couple design goals we need to agree upon, before
> going into the weeds of how exactly we want this to work. Some of the
> axis I can think of are:
>
> - How do we want to deal with cascaded setups, do slots have to be
>   available everywhere, or not?
>

Personally, I don't care either way.


> - What kind of PITR integration do we want? Note that simple WAL based
>   slots do *NOT* provide proper PITR support, there's not enough
>   interlock easily available (you'd have to save slots at the end, then
>   increment minRecoveryLSN to a point later than the slot saving)
>

Interesting. I haven't fully understood this, but think I see what you're
getting at.

As outlined in the prior mail, I'd like to have working PITR with logical
slots but think it's pretty niche as it can't work usefully without plenty
of co-operation from the rest of the logical replication software in use.
You can't just restore and resume normal operations. So I don't think it's
worth making it a priority.

It's possible to make PITR safe with slots by blocking further advance of
catalog_xmin on the running master for the life of the PITR base backup
using a slot for retention. There's plenty of room for operator error
until/unless we add something like catalog_xmin advance xlog'ing, but it
can be done now with external tools if you're careful. Details in the prior
mail.

I don't think PITR for logical slots is important given there's a
workaround and it's not simple to actually do anything with it if you have
it.


> - How much divergence are we going to accept between logical decoding on
>   standbys, and failover slots. I'm probably a lot closer to "closer" than
>   Craig is.
>

They're different things to me, but I think you're asking "to what extent
should failover slots functionality be implemented strictly on top of
decoding on standby?"

"Failover slots" provides a mechanism by which a logical decoding client
can expect a slot it creates on a master (or physical streaming replica
doing decoding on standby) to continue to exist. The client can ignore
physical HA and promotions of the master, which can continue to be managed
using normal postgres tools. It's the same as, say, an XA transaction
manager expecting that if your master dies and you fail over to a standby,
the TM shouldn't have to have been doing special housekeeping on the
promotion candidate before promotion in order for 2PC to continue to work.
It Just Works.

Logical decoding on standby is useful with or without failover slots, as
you can use it to extract data from a replica, and now that decoding
timeline following is in, a decoding connection on a replica will survive
promotion to master.

But in addition to its main purpose of allowing logical decoding from a
standby server to offload work, it can be used to implement client-managed
support for failover to physical replicas. For this, the client must have
an inventory of promotion-candidates of the master and their connstrings so
it can maintain slots on them too. The client must be able to connect to
all promotion-candidates and advance their slots via decoding along with
the master slots it's actually replaying from. If a client isn't "told"
about a promotion candidate, decoding will break when 

[HACKERS] Simplify ACL handling for large objects and removal of superuser() checks

2017-08-13 Thread Michael Paquier
Hi all,

Recent commit 8d98819 added a missing permission check to lo_put() to
make sure that write permission on the object is verified before
writing to a large object. While studying the problem, we bumped into
the fact that it is possible to simplify those ACL checks and replace
them with checks done when opening the large object. This moves all
the checks now in be-fsstubs.c into inv_api.c, which is where all the
large object handling happens at a lower level. This way, it is also
possible to remove the extra logic currently needed to make the
permission checks happen only once.
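
To illustrate the idea (a rough sketch only, not the actual patch), the
check boils down to something like this at open time, after which the
read/write paths no longer need to care about permissions themselves:

#include "postgres.h"
#include "libpq/be-fsstubs.h"        /* lo_compat_privileges */
#include "miscadmin.h"               /* GetUserId() */
#include "storage/large_object.h"    /* INV_WRITE */
#include "utils/acl.h"
#include "utils/snapshot.h"

/* Illustrative only -- names and details differ in the real patch. */
static void
check_lo_open_permission(Oid lobjId, int flags, Snapshot snapshot)
{
    AclMode     required = (flags & INV_WRITE) ? ACL_UPDATE : ACL_SELECT;

    if (!lo_compat_privileges &&
        pg_largeobject_aclcheck_snapshot(lobjId, GetUserId(),
                                         required, snapshot) != ACLCHECK_OK)
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("permission denied for large object %u", lobjId)));
}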

At the same time, we have bumped into two things:
- ALLOW_DANGEROUS_LO_FUNCTIONS was introduced in 1999, so it could be
time to let it go. I know of no place where it is actively used.
- lo_import and lo_export on the backend have superuser checks. We
could remove them and replace them with proper REVOKE permissions to
allow administrators to give access to those functions.

Attached is a set of 3 patches:
- 0001 removes ALLOW_DANGEROUS_LO_FUNCTIONS
- 0002 replaces the superuser checks with GRANT permissions
- 0003 refactors the LO code to improve the ACL handling. Note that
this patch touches a more debatable area: currently there is no
distinction between INV_WRITE and INV_WRITE|INV_READ, as INV_READ is
implied when INV_WRITE is used, and that behavior is documented. The
patch as proposed does a complete split of the two permissions, which
gives a system that is more natural and more flexible. Please note as
well that it would be possible to implement a patch that keeps
compatibility instead, but I would welcome a debate on the matter.
This patch also owes a good deal to Tom.

I am parking these in the next commit fest, for PG11.

Thanks,
-- 
Michael
From 2a2b2beddc5b01ffe6de7e442e20e00b4e518859 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 14 Aug 2017 12:01:58 +0900
Subject: [PATCH 3/3] Move ACL checks for large objects when opening them

Up to now, ACL checks for large objects happened at the level of the
SQL-callable functions, which led to CVE-2017-7548 because of a
missing check. It is actually possible to do those checks when a large
object is opened, which simplifies the code performing them: there is
no need to track whether a check has already been done, and it removes
any risk of again missing a permission check if another function
related to large objects is added in the future.

A more debatable change that this patch introduces is turning the
documented behavior that opening a large object for write is
equivalent to opening it read-write into a complete split of the
checks. This gives the user more flexibility at the risk of breaking
some applications, and makes the new behavior more natural.

Patch by Tom Lane and Michael Paquier.
---
 doc/src/sgml/lobj.sgml |  25 +++
 src/backend/catalog/objectaddress.c|   2 +-
 src/backend/libpq/be-fsstubs.c |  88 +---
 src/backend/storage/large_object/inv_api.c | 103 ++---
 src/backend/utils/misc/guc.c   |   2 +-
 src/include/libpq/be-fsstubs.h |   5 --
 src/include/storage/large_object.h |  13 ++--
 src/test/regress/expected/privileges.out   |   8 +--
 src/test/regress/sql/privileges.sql|   2 +-
 9 files changed, 121 insertions(+), 127 deletions(-)

diff --git a/doc/src/sgml/lobj.sgml b/doc/src/sgml/lobj.sgml
index 7757e1e441..2025545c75 100644
--- a/doc/src/sgml/lobj.sgml
+++ b/doc/src/sgml/lobj.sgml
@@ -276,20 +276,17 @@ int lo_open(PGconn *conn, Oid lobjId, int mode);
 
 
 
- The server currently does not distinguish between modes
- INV_WRITE and INV_READ |
- INV_WRITE: you are allowed to read from the descriptor
- in either case.  However there is a significant difference between
- these modes and INV_READ alone: with INV_READ
- you cannot write on the descriptor, and the data read from it will
- reflect the contents of the large object at the time of the transaction
- snapshot that was active when lo_open was executed,
- regardless of later writes by this or other transactions.  Reading
- from a descriptor opened with INV_WRITE returns
- data that reflects all writes of other committed transactions as well
- as writes of the current transaction.  This is similar to the behavior
- of REPEATABLE READ versus READ COMMITTED transaction
- modes for ordinary SQL SELECT commands.
+ With only INV_READ you cannot write on the descriptor,
+ and the data read from it will reflect the contents of the large object
+ at the time of the transaction snapshot that was active when
+ lo_open was executed, regardless of later writes by this
+ or other transactions.  Reading from a descriptor opened with
+ INV_WRITE or INV_READ |
+ INV_WRITE returns data that reflects all writes of
+ other committed transactions as well as writes of 

Re: [HACKERS] Comment in snapbuild.c file

2017-08-13 Thread Masahiko Sawada
On Sun, Aug 13, 2017 at 12:28 PM, Alvaro Herrera
 wrote:
> Alvaro Herrera wrote:
>> Masahiko Sawada wrote:
>> > Hi all,
>> >
>> > In snapbuild.c file, there is a comment as follows.
>> >
>> >* NB: Because of that xmax can be lower than xmin, because we only
>> >* increase xmax when a catalog modifying transaction commits. While odd
>> >* looking, it's correct and actually more efficient this way since we 
>> > hit
>> >* fast paths in tqual.c.
>> >*/
>>
>> I think the whole para needs to be rethought.
>
> Pushed, thanks.
>

Sorry for the late response. I agreed with your proposal. Thank you
for committing!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Thoughts on unit testing?

2017-08-13 Thread Craig Ringer
On 14 August 2017 at 03:19, Peter Geoghegan  wrote:

> On Thu, Aug 10, 2017 at 2:53 PM, Thomas Munro
>  wrote:
> > The current regression tests, isolation tests and TAP tests are very
> > good (though I admit my experience with TAP is limited), but IMHO we
> > are lacking support for C-level unit testing.  Complicated, fiddly
> > things with many states, interactions, edge cases etc can be hard to
> > get full test coverage on from the outside.  Consider
> > src/backend/utils/mmgr/freepage.c as a case in point.
>
> It is my understanding that much of the benefit of unit testing comes
> from maintainability. It's something that goes well with design by
> contract. Writing unit tests is a forcing function. It requires
> testable units, making the units more composable. The programmer must
> be very deliberate about state, and must delineate code as testable
> units. Unit tests are often supposed to be minimal, to serve as a kind
> of design document.


This is why I personally only find true unit tests useful as part of large
software packages like Pg when they cover functional units that can be
fairly well isolated.

However, I'm not sure that Thomas meant unit tests in the formal sense,
isolated and mocked, but rather test procedures written in C to exercise
portions of Pg's code that are not easily/conveniently tested from SQL.

Pg lacks the strict module delineation needed to make formal unit testing
practical, as you have noted. That doesn't mean test frameworks couldn't be
useful. I routinely write tests for reasonably isolated functionality in my
code by writing throwaway SQL-callable functions to exercise it and
pg_regress tests to run them. It's not strict unit testing as the rest of
Pg's APIs aren't mocked away, but it's very practical small-unit
integration testing that helps catch issues.
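
For concreteness, such a throwaway helper is usually only a few lines of C;
something along these lines (the function name and the thing it exercises
are made up purely for illustration):

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(test_palloc_roundtrip);

/*
 * Throwaway SQL-callable test helper: exercise an internal code path
 * (here just a trivial palloc/pfree round trip) and elog(ERROR) on
 * anything unexpected, so a pg_regress test can simply SELECT it.
 */
Datum
test_palloc_roundtrip(PG_FUNCTION_ARGS)
{
    char       *buf = palloc(1024);

    memset(buf, 0x7f, 1024);
    if (buf[512] != 0x7f)
        elog(ERROR, "palloc'd buffer does not hold its contents");
    pfree(buf);

    PG_RETURN_BOOL(true);
}

Register it with CREATE FUNCTION ... LANGUAGE C, SELECT it from a
pg_regress .sql file, and you have a small-unit test that runs in-process
with everything else available.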

I wouldn't mind having an easier and nicer way to do that built in to Pg,
but don't have many ideas about practical, low-maintenance ways to achieve
it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Comment typo in plannodes.h

2017-08-13 Thread Etsuro Fujita

On 2017/08/11 2:18, Robert Haas wrote:

On Thu, Aug 10, 2017 at 8:25 AM, Etsuro Fujita
 wrote:

Here is a small patch for fixing a comment typo in plannodes.h: s/all the
partitioned table/all the partitioned tables/.


Committed.


Thank you.

Best regards,
Etsuro Fujita





Re: [HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-13 Thread Tom Lane
Andres Freund  writes:
> You both are obviously right.  Another point of potential concern could
> be that we'd pretty much fully rely on dsm/dht's being available, for
> the server to function correctly. Are we ok with that? Right now
> postgres still works perfectly well, leaving parallelism aside, with
> dynamic_shared_memory_type = none.

In principle we could keep the existing mechanism as a fallback.
Whether that's worth the trouble is debatable.  The current code
in initdb believes that every platform has some type of DSM support
(see choose_dsm_implementation).  Nobody's complained about that,
and it certainly works on every buildfarm animal.  So for all we know,
dynamic_shared_memory_type = none is broken already.

> I think there's two other relevant points here:

> a) It'd be quite useful to avoid needing a whole cluster's stats in
>memory. Even if $subject would save memory, I'm hesitant committing
>to something requiring all stats to be in memory forever. As a first
>step it seems reasonable to e.g. not require state for all databases
>to be in memory. The first time per-database stats are required, it
>could be "paged in". Could even be more aggressive and do that on a
>per-table level and just have smaller placeholder entries for
>non-accessed tables, but that seems more work.

Huh?  As long as we don't lock the shared memory into RAM, which on most
platforms we couldn't do without being root anyway, ISTM the kernel should
do just fine at paging out little-used areas of the stats.  Let's not
reinvent that wheel until there's demonstrable proof of need.

> b) I think our tendency to dump all stats whenever we crash isn't really
>tenable, given how autovacuum etc are tied to them.

Eh, maybe.  I don't think crashes are really so common on production
systems.  As developers we have an inflated view of their frequency ;-)

regards, tom lane




Re: [HACKERS] pgsql 10: hash indexes testing

2017-08-13 Thread Amit Kapila
On Mon, Aug 14, 2017 at 6:10 AM, AP  wrote:
> On Fri, Aug 11, 2017 at 07:33:51AM +0530, Amit Kapila wrote:
>> On Thu, Aug 10, 2017 at 4:11 PM, AP  wrote:
>> > mdstash=# select * from pgstathashindex('link_datum_id_idx');
>> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
>> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>> >        4 |     12391325 |        5148912 |          158 |       191643 | 4560007478 |          0 | 1894.29056075982
>> > (1 row)
>>
>> The free_percent calculation seems to be wrong.  Can you please once
>> try after recent commit 0b7ba3d6474b8f58e74dba548886df3250805cdf?  I
>> feel this should be fixed by that commit.
>
> Sorry I couldn't get to help you debugging this myself. Work got annoying. :/
>
> That said, I think that this is the first time that I've seen the value be
> under 100:
>

Thanks for confirming the same.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] shared memory based stat collector (was: Sharing record typmods between backends)

2017-08-13 Thread Andres Freund
Hi,

Since we're getting a bit into the weeds of a different topic, and since
I think it's an interesting feature, I'm detaching this into a separate
thread.

On 2017-08-12 23:37:27 -0400, Tom Lane wrote:
> >> On 2017-08-12 22:52:57 -0400, Robert Haas wrote:
> >>> I think it'd be pretty interesting to look at replacing parts of the
> >>> stats collector machinery with something DHT-based.
> > On Sat, Aug 12, 2017 at 11:30 PM, Andres Freund  wrote:
> >> That seems to involve a lot more than this though, given that currently
> >> the stats collector data doesn't entirely have to be in memory. I've
> >> seen sites with a lot of databases with quite some per-database stats
> >> data. Don't think we can just require that to be in memory :(
>
> Robert Haas  writes:
> > Hmm.  I'm not sure it wouldn't end up being *less* memory.  Don't we
> > end up caching 1 copy of it per backend, at least for the database to
> > which that backend is connected?  Accessing a shared copy would avoid
> > that sort of thing.
>
> Yeah ... the collector itself has got all that in memory anyway.
> We do need to think about synchronization issues if we make that
> memory globally available, but I find it hard to see how that would
> lead to more memory consumption overall than what happens now.

You both are obviously right.  Another point of potential concern could
be that we'd pretty much fully rely on dsm/dht's being available, for
the server to function correctly. Are we ok with that? Right now
postgres still works perfectly well, leaving parallelism aside, with
dynamic_shared_memory_type = none.


What are your thoughts about how to actually implement this? It seems
we'd have to do something like:

1) Keep the current per-backend & per-transaction state in each
   backend. That allows both to throw away the information and avoids
   increasing contention quite noticeably.

2) Some plain shared memory with metadata.  A set of shared hashtables
   for per database, per relation contents.

3) Individual database/relation entries are either individual atomics
   (we don't rely on consistency anyway), or seqcount (like
   st_changecount) based.

4) Instead of sending stats at transaction end, copy them into a
   "pending" entry.  Nontransactional contents can be moved to
   the pending entry more frequently.

5) Occasionally, try to flush the pending array into the global hash.
   The lookup in the table would be protected by something
   LWLockConditionalAcquire() based, to avoid blocking - don't want to
   introduce chokepoints due to commonly used tables and such.  Updating
   the actual stats can happen without the partition locks being held.
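
To make 5) a bit more concrete, here's a tiny standalone analogue of the
"flush under a conditional lock" idea, with pthread_mutex_trylock standing
in for LWLockConditionalAcquire and a single struct standing in for a
shared hash entry; the names are illustrative, not a proposed API:

#include <pthread.h>
#include <stdbool.h>

typedef struct TableCounters { long inserted; long updated; } TableCounters;

static pthread_mutex_t partition_lock = PTHREAD_MUTEX_INITIALIZER;
static TableCounters shared_entry;   /* stand-in for one shared hash entry */

/*
 * Try to push this backend's pending counters into shared memory.  If the
 * lock is contended, give up immediately; the counters keep accumulating
 * locally and get flushed on a later attempt, so there's no chokepoint on
 * commonly used tables.
 */
static bool
flush_pending(TableCounters *pending)
{
    if (pthread_mutex_trylock(&partition_lock) != 0)
        return false;
    shared_entry.inserted += pending->inserted;
    shared_entry.updated += pending->updated;
    pthread_mutex_unlock(&partition_lock);
    pending->inserted = 0;
    pending->updated = 0;
    return true;
}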

I think there's two other relevant points here:

a) It'd be quite useful to avoid needing a whole cluster's stats in
   memory. Even if $subject would save memory, I'm hesitant committing
   to something requiring all stats to be in memory forever. As a first
   step it seems reasonable to e.g. not require state for all databases
   to be in memory. The first time per-database stats are required, it
   could be "paged in". Could even be more aggressive and do that on a
   per-table level and just have smaller placeholder entries for
   non-accessed tables, but that seems more work.

   On the other hand, autovacuum is likely going to make that approach
   useless anyway, given it's probably going to access otherwise unneeded
   stats regularly.

b) I think our tendency to dump all stats whenever we crash isn't really
   tenable, given how autovacuum etc are tied to them. We should think
   about ways to avoid that if we're going to do a major rewrite of the
   stats stuff, which this certainly sounds like.


If there weren't HS to worry about, these two points kinda sound like
the data should be persisted into an actual table, rather than some
weird other storage format. But HS seems to make that untenable.

Greetings,

Andres Freund




[HACKERS] PATCH : Generational memory allocator (was PATCH: two slab-like memory allocators)

2017-08-13 Thread Tomas Vondra

Hi,

Attached is a rebased version of the Generational context, originally 
submitted with SlabContext (which was already committed into Pg 10).


The main change is that I've abandoned the pattern of defining a Data 
structure and then a pointer typedef, i.e.


typedef struct GenerationContextData { ... } GenerationContextData;
typedef struct GenerationContextData *GenerationContext;

Now it's just

typedef struct GenerationContext { ... } GenerationContext;

mostly because SlabContext was committed like that, and because Andres 
was complaining about this code pattern ;-)


Otherwise the design is the same as repeatedly discussed before.

To show that this is still a valuable change (even after SlabContext and 
adding doubly-linked list to AllocSet), I've repeated the test done by 
Andres in [1] using the test case described in [2], that is


  -- generate data
  SELECT COUNT(*) FROM (SELECT test1()
  FROM generate_series(1, 5)) foo;

  -- benchmark (measure time and VmPeak)
  SELECT COUNT(*) FROM (SELECT *
  FROM pg_logical_slot_get_changes('test', NULL,
NULL, 'include-xids', '0')) foo;

with different values passed to the first step (instead of the 5). 
The VmPeak numbers look like this:


 N        master     patched
-----------------------------
10   1155220 kB  361604 kB
20   2020668 kB  434060 kB
30   2890236 kB  502452 kB
40   3751592 kB  570816 kB
50   4621124 kB  639168 kB

and the timing (on assert-enabled build):

 N        master     patched
-----------------------------
10  1103.182 ms 412.734 ms
20  2216.711 ms 820.438 ms
30  3320.095 ms1223.576 ms
40  4584.919 ms1621.261 ms
50  5590.444 ms2113.820 ms

So it seems it's still a significant improvement, both in terms of 
memory usage and timing. Admittedly, this is a single test, so ideas of 
other useful test cases are welcome.
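
For anyone who hasn't followed the earlier discussion, the free-space
management really is as simple as the commit message below says; stripped
to its bones it looks roughly like this (illustrative standalone C, not
the actual generation.c structures):

#include <stdlib.h>

/* Per-block bookkeeping: no freelist, just two counters. */
typedef struct GenBlock
{
    int     nchunks;    /* chunks handed out from this block */
    int     nfreed;     /* chunks returned so far */
    char   *freeptr;    /* next free byte within the block */
    char   *endptr;     /* first byte past the end of the block */
} GenBlock;

/*
 * Freeing a chunk only bumps a counter; the space is never reused.  Once
 * the last chunk is gone, the whole block is released at once, which is
 * what makes this so cheap for queue-like allocation patterns.
 */
static void
gen_free_chunk(GenBlock *block)
{
    block->nfreed++;
    if (block->nfreed == block->nchunks)
        free(block);
}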


regards


[1] 
https://www.postgresql.org/message-id/20170227111732.vrx5v72ighehwpkf%40alap3.anarazel.de


[2] 
https://www.postgresql.org/message-id/20160706185502.1426.28143%40wrigleys.postgresql.org


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 1c46d25ffa9bb104c415cba7c7b3a013958b6ab5 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Mon, 14 Aug 2017 01:52:50 +0200
Subject: [PATCH] Generational memory allocator

This memory context is based on the assumption that the allocated chunks
have similar lifespan, i.e. that chunks allocated close from each other
(by time) will also be freed in close proximity, and mostly in the same
order. This is typical for various queue-like use cases, i.e. when
tuples are constructed, processed and then thrown away.

The memory context uses a very simple approach to free space management.
Instead of a complex global freelist, each block tracks a number
of allocated and freed chunks. The space released by freed chunks is not
reused, and once all chunks are freed (i.e. when nallocated == nfreed),
the whole block is thrown away. When the allocated chunks have similar
lifespan, this works very well and is extremely cheap.
---
 src/backend/replication/logical/reorderbuffer.c |  74 +--
 src/backend/utils/mmgr/Makefile |   2 +-
 src/backend/utils/mmgr/generation.c | 768 
 src/include/nodes/memnodes.h|   4 +-
 src/include/nodes/nodes.h   |   1 +
 src/include/replication/reorderbuffer.h |  15 +-
 src/include/utils/memutils.h|   5 +
 7 files changed, 790 insertions(+), 79 deletions(-)
 create mode 100644 src/backend/utils/mmgr/generation.c

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5567bee..5309170 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -150,15 +150,6 @@ typedef struct ReorderBufferDiskChange
  */
 static const Size max_changes_in_memory = 4096;
 
-/*
- * We use a very simple form of a slab allocator for frequently allocated
- * objects, simply keeping a fixed number in a linked list when unused,
- * instead pfree()ing them. Without that in many workloads aset.c becomes a
- * major bottleneck, especially when spilling to disk while decoding batch
- * workloads.
- */
-static const Size max_cached_tuplebufs = 4096 * 2;	/* ~8MB */
-
 /* ---
  * primary reorderbuffer support routines
  * ---
@@ -248,6 +239,10 @@ ReorderBufferAllocate(void)
 			SLAB_DEFAULT_BLOCK_SIZE,
 			sizeof(ReorderBufferTXN));
 
+	

Re: [HACKERS] POC: Sharing record typmods between backends

2017-08-13 Thread Andres Freund
Hi,

On 2017-08-11 20:39:13 +1200, Thomas Munro wrote:
> Please find attached a new patch series.

Review for 0001:

I think you made a few long lines even longer, like:

@@ -1106,11 +1106,11 @@ pltcl_trigger_handler(PG_FUNCTION_ARGS, 
pltcl_call_state *call_state,
Tcl_ListObjAppendElement(NULL, tcl_trigtup, Tcl_NewObj());
for (i = 0; i < tupdesc->natts; i++)
{
-   if (tupdesc->attrs[i]->attisdropped)
+   if (TupleDescAttr(tupdesc, i)->attisdropped)
Tcl_ListObjAppendElement(NULL, tcl_trigtup, 
Tcl_NewObj());
else
Tcl_ListObjAppendElement(NULL, tcl_trigtup,
-   
 Tcl_NewStringObj(utf_e2u(NameStr(tupdesc->attrs[i]->attname)), -1));
+   
 Tcl_NewStringObj(utf_e2u(NameStr(TupleDescAttr(tupdesc, i)->attname)), -1));


as it's not particularly pretty to access tupdesc->attrs[i] repeatedly,
it'd be good if you instead had a local variable for the individual
attribute.
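
I.e. something like this (untested, just to show the shape I mean):

    Form_pg_attribute att = TupleDescAttr(tupdesc, i);

    if (att->attisdropped)
        Tcl_ListObjAppendElement(NULL, tcl_trigtup, Tcl_NewObj());
    else
        Tcl_ListObjAppendElement(NULL, tcl_trigtup,
                                 Tcl_NewStringObj(utf_e2u(NameStr(att->attname)), -1));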

Similar:
if 
(OidIsValid(get_base_element_type(TupleDescAttr(tupdesc, i)->atttypid)))
sv = plperl_ref_from_pg_array(attr, 
TupleDescAttr(tupdesc, i)->atttypid);
else if ((funcid = 
get_transform_fromsql(TupleDescAttr(tupdesc, i)->atttypid, 
current_call_data->prodesc->lang_oid, current_call_data->prodesc->trftypes)))
sv = (SV *) 
DatumGetPointer(OidFunctionCall1(funcid, attr));


@@ -150,7 +148,7 @@ ValuesNext(ValuesScanState *node)
 */
values[resind] = 
MakeExpandedObjectReadOnly(values[resind],

isnull[resind],
-   
att[resind]->attlen);
+   
TupleDescAttr(slot->tts_tupleDescriptor, 
resind)->attlen);

@@ -158,9 +158,9 @@ convert_tuples_by_position(TupleDesc indesc,
 * must agree.
 */
if (attrMap[i] == 0 &&
-   indesc->attrs[i]->attisdropped &&
-   indesc->attrs[i]->attlen == 
outdesc->attrs[i]->attlen &&
-   indesc->attrs[i]->attalign == 
outdesc->attrs[i]->attalign)
+   TupleDescAttr(indesc, i)->attisdropped &&
+   TupleDescAttr(indesc, i)->attlen == 
TupleDescAttr(outdesc, i)->attlen &&
+   TupleDescAttr(indesc, i)->attalign == 
TupleDescAttr(outdesc, i)->attalign)
continue;


I think you get the drift, there's more

Otherwise this seems fairly boring...


Review for 0002:

@@ -71,17 +71,17 @@ typedef struct tupleConstr
 typedef struct tupleDesc
 {
int natts;  /* number of attributes 
in the tuple */
-   Form_pg_attribute *attrs;
-   /* attrs[N] is a pointer to the description of Attribute Number N+1 */
TupleConstr *constr;/* constraints, or NULL if none */
Oid tdtypeid;   /* composite type ID 
for tuple type */
int32   tdtypmod;   /* typmod for tuple type */
booltdhasoid;   /* tuple has oid attribute in 
its header */
int tdrefcount; /* reference count, or 
-1 if not counting */
+   /* attrs[N] is the description of Attribute Number N+1 */
+   FormData_pg_attribute attrs[FLEXIBLE_ARRAY_MEMBER];
 } *TupleDesc;

sorry if I'm beating on my hobby horse, but if we're re-ordering anyway,
can you move TupleConstr to the second-to-last? That a) seems more
consistent but b) (hobby horse, sorry) avoids unnecessary alignment
padding.


@@ -734,13 +708,13 @@ BuildDescForRelation(List *schema)
/* Override TupleDescInitEntry's settings as requested */
TupleDescInitEntryCollation(desc, attnum, attcollation);
if (entry->storage)
-   desc->attrs[attnum - 1]->attstorage = entry->storage;
+   desc->attrs[attnum - 1].attstorage = entry->storage;

/* Fill in additional stuff not handled by TupleDescInitEntry */
-   desc->attrs[attnum - 1]->attnotnull = entry->is_not_null;
+   desc->attrs[attnum - 1].attnotnull = entry->is_not_null;
has_not_null |= entry->is_not_null;
-   desc->attrs[attnum - 1]->attislocal 

Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-13 Thread Peter Geoghegan
On Sun, Aug 13, 2017 at 2:22 PM, Andres Freund  wrote:
> On 2017-08-13 16:55:33 -0400, Tom Lane wrote:
>> Peter Geoghegan  writes:
>> > I think that it's useful for these things to be handled in an
>> > adversarial manner, in the same way that litigation is adversarial in
>> > a common law court. I doubt that Noah actually set out to demoralize
>> > anyone. He is just doing the job he was assigned.
>>
>> FWIW, I agree that Noah is just carrying out the RMT's task as
>> assigned.
>
> Well, then that's a sign that the tasks/process need to be rescoped.

Why? I might agree if the RMT had an outsized influence on the final
outcome. If that's the case, then it's something that I missed.

I also don't think that it's fair to expect Noah to spend a lot of
time poring over whether sending out one of those pro forma status
update policy violation mails is warranted. I expect someone in his
position to aim to err on the side of sending out too many of those.
It's easy for the patch author to explain the situation, but hard for
the RMT to understand the situation fully at all times.

-- 
Peter Geoghegan




Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-13 Thread Andres Freund
On 2017-08-13 17:43:10 -0400, Robert Haas wrote:
> On Sun, Aug 13, 2017 at 5:24 PM, Tom Lane  wrote:
> >> I'd vote for including this in v10.  There doesn't seem to be any
> >> downside to this: it's a no brainer to avoid our exploding hash table
> >> case when we can see it coming.
> >
> > Anybody else want to vote that way?  For myself it's getting a bit late
> > in the beta process to be including inessential changes, but I'm willing
> > to push it to v10 not just v11 if there's multiple people speaking for
> > that.
> 
> I'd vote for waiting until v11.  I think it's too late to be doing
> things that might change good plans into bad ones or vice versa;
> that's a recipe for having to put out 10.1 and 10.2 a little quicker
> than I'd like.

Similar here, there doesn't seem to be that much urgency.

- Andres




Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-13 Thread Robert Haas
On Sun, Aug 13, 2017 at 3:05 PM, Tom Lane  wrote:
> Not having heard anyone arguing against that, I'll go make it so,
> ie AtEOXact_CatCache is toast in all branches.

Great, thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-13 Thread Robert Haas
On Sun, Aug 13, 2017 at 5:24 PM, Tom Lane  wrote:
>> I'd vote for including this in v10.  There doesn't seem to be any
>> downside to this: it's a no brainer to avoid our exploding hash table
>> case when we can see it coming.
>
> Anybody else want to vote that way?  For myself it's getting a bit late
> in the beta process to be including inessential changes, but I'm willing
> to push it to v10 not just v11 if there's multiple people speaking for
> that.

I'd vote for waiting until v11.  I think it's too late to be doing
things that might change good plans into bad ones or vice versa;
that's a recipe for having to put out 10.1 and 10.2 a little quicker
than I'd like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-13 Thread Tom Lane
Andres Freund  writes:
> As a measure of last resort we could add a libpq workaround that forces
> a pqSocketCheck() at the right moment, while still establishing a
> connection.  That's not good from an interruptability perspective, but
> better than blocking for the entire connection establishment.

Probably a better idea is to fix things so that can't-send-yet results
in returning PGRES_POLLING_WRITING rather than a failure.  That would
result in a busy-wait in this scenario on Windows, which is arguably
better than blocking.  It would also make this part of libpq more robust
against applications being sloppy about socket readiness checks (which,
you could argue, is exactly what libpqwalreceiver is being).  But it
would be a somewhat ticklish change because of the portability hazards,
so I'm really disinclined to do it this late in beta.
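
For reference, the documented way for an application to drive
PQconnectPoll looks roughly like this (sketch only), which is why
returning PGRES_POLLING_WRITING just causes another wait-and-retry cycle
instead of a hard failure:

#include <sys/select.h>
#include "libpq-fe.h"

static PGconn *
connect_polling(const char *conninfo)
{
    PGconn     *conn = PQconnectStart(conninfo);
    PostgresPollingStatusType status = PGRES_POLLING_WRITING;

    if (conn == NULL || PQstatus(conn) == CONNECTION_BAD)
        return conn;            /* out of memory, or failed immediately */

    while (status != PGRES_POLLING_OK && status != PGRES_POLLING_FAILED)
    {
        int         sock = PQsocket(conn);
        fd_set      rfds, wfds;

        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        if (status == PGRES_POLLING_READING)
            FD_SET(sock, &rfds);
        else
            FD_SET(sock, &wfds);
        (void) select(sock + 1, &rfds, &wfds, NULL, NULL);  /* wait for readiness */

        status = PQconnectPoll(conn);   /* then let libpq make progress */
    }
    return conn;                /* caller checks PQstatus(conn) */
}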

regards, tom lane




Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-13 Thread Tom Lane
Thomas Munro  writes:
> On Sat, Aug 12, 2017 at 3:24 AM, Tom Lane  wrote:
>> 1. check-hash-bucket-size-against-work_mem-2.patch from
>> https://www.postgresql.org/message-id/13698.1487283...@sss.pgh.pa.us

> +1

> I'd vote for including this in v10.  There doesn't seem to be any
> downside to this: it's a no brainer to avoid our exploding hash table
> case when we can see it coming.

Anybody else want to vote that way?  For myself it's getting a bit late
in the beta process to be including inessential changes, but I'm willing
to push it to v10 not just v11 if there's multiple people speaking for
that.

regards, tom lane




Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-13 Thread Andres Freund
On 2017-08-13 16:55:33 -0400, Tom Lane wrote:
> Peter Geoghegan  writes:
> > I think that it's useful for these things to be handled in an
> > adversarial manner, in the same way that litigation is adversarial in
> > a common law court. I doubt that Noah actually set out to demoralize
> > anyone. He is just doing the job he was assigned.
> 
> FWIW, I agree that Noah is just carrying out the RMT's task as
> assigned.

Well, then that's a sign that the tasks/process need to be rescoped.


> However, the only thing that Peter could really do about this of his own
> authority is to revert 1e8a850, which I certainly think should be the last
> resort not the first.  We are continuing to make progress towards finding
> a better solution, I think, and since no particular deadline is imminent
> we should let that process play out.

I agree that that'd be a bad fix. There's other things that should, but
don't, really use the asynchronous API, e.g. postgres_fdw.

As a measure of last resort we could add a libpq workaround that forces
a pqSocketCheck() at the right moment, while still establishing a
connection.  That's not good from an interruptability perspective, but
better than blocking for the entire connection establishment.

Greetings,

Andres Freund




Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-13 Thread Tom Lane
"Augustine, Jobin"  writes:
> Appears that patch is not helping.
> Errors like below are still appearing in the log
> ===
> 2017-08-11 12:22:35 UTC [2840]: [1-1] user=,db=,app=,client= FATAL:  could
> not connect to the primary server: could not send data to server: Socket is
> not connected (0x2749/10057)
> could not send startup packet: Socket is not connected
> (0x2749/10057)

BTW, I notice that this is a bit different from your first report, because
that mentioned "could not send SSL negotiation packet" rather than "could
not send startup packet".  If the initial report was about a setup in
which SSL was enabled, and in this report's setup it isn't, then that
difference is unsurprising; but otherwise it needs explanation.

regards, tom lane




Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-13 Thread Tom Lane
Peter Geoghegan  writes:
> I think that it's useful for these things to be handled in an
> adversarial manner, in the same way that litigation is adversarial in
> a common law court. I doubt that Noah actually set out to demoralize
> anyone. He is just doing the job he was assigned.

FWIW, I agree that Noah is just carrying out the RMT's task as assigned.
However, the only thing that Peter could really do about this of his own
authority is to revert 1e8a850, which I certainly think should be the last
resort not the first.  We are continuing to make progress towards finding
a better solution, I think, and since no particular deadline is imminent
we should let that process play out.

I think what Peter should do to satisfy the RMT process is to state that
if no better solution is found by date X, he will revert 1e8a850 (or
somehow mask the problem by removing functionality), and he sees no
need for additional progress reports before that.  Date X could be
sometime early in September, perhaps.

regards, tom lane




Re: [HACKERS] Allow INSTEAD OF DELETE triggers to modify the tuple for RETURNING

2017-08-13 Thread Marko Tiikkaja
On Fri, Jul 1, 2016 at 2:12 AM, I wrote:

> Currently the tuple returned by INSTEAD OF triggers on DELETEs is only
> used to determine whether to pretend that the DELETE happened or not, which
> is often not helpful enough; for example, the actual tuple might have been
> concurrently UPDATEd by another transaction and one or more of the columns
> now hold values different from those in the planSlot tuple. Attached is an
> example case which is impossible to properly implement under the current
> behavior.  For what it's worth, the current behavior seems to be an
> accident; before INSTEAD OF triggers either the tuple was already locked
> (in case of BEFORE triggers) or the actual pre-DELETE version of the tuple
> was fetched from the heap.
>
> So I'm suggesting to change this behavior and allow INSTEAD OF DELETE
> triggers to modify the OLD tuple and use that for RETURNING instead of
> returning the tuple in planSlot.  Attached is a WIP patch implementing that.
>
> Is there any reason why we wouldn't want to change the current behavior?


Since nobody seems to have come up with a reason, here's a patch for that
with test cases and some documentation changes.  I'll also be adding this
to the open commit fest, as is customary.


.m


instead_of_delete_returning_v2.patch
Description: Binary data



Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-13 Thread Tom Lane
Andres Freund  writes:
> On 2017-08-11 18:11:03 +0530, Augustine, Jobin wrote:
>> Appears that patch is not helping.

> That's too bad. Any chance you could install
> https://docs.microsoft.com/en-us/sysinternals/downloads/procmon and
> activate monitoring just for that phase? I think it can export to a text
> file...

It strikes me that maybe the misuse of io_flag could be contributing
to this: if the walreceiver process's latch were set, we'd end up calling
PQconnectPoll before the socket had necessarily come ready, which would
produce the described symptom.  That's grasping at straws admittedly,
because I'm not sure why the walreceiver process's latch would be set
at this point; but it seems like we ought to test a version of the patch
that we believe correct before deciding that we still have a problem.

To move things along, here's a corrected patch --- Jobin, please test.

regards, tom lane

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index de03362..37b481c 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -168,13 +168,18 @@ libpqrcv_connect(const char *conninfo, bool logical, const char *appname,
 	status = PGRES_POLLING_WRITING;
 	do
 	{
-		/* Wait for socket ready and/or other events. */
 		int			io_flag;
 		int			rc;
 
-		io_flag = (status == PGRES_POLLING_READING
-   ? WL_SOCKET_READABLE
-   : WL_SOCKET_WRITEABLE);
+		if (status == PGRES_POLLING_READING)
+			io_flag = WL_SOCKET_READABLE;
+#ifdef WIN32
+		/* Windows needs a different test while waiting for connection-made */
+		else if (PQstatus(conn->streamConn) == CONNECTION_STARTED)
+			io_flag = WL_SOCKET_CONNECTED;
+#endif
+		else
+			io_flag = WL_SOCKET_WRITEABLE;
 
 		rc = WaitLatchOrSocket(MyLatch,
 			   WL_POSTMASTER_DEATH |
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 07b1364..c7345de 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -374,11 +374,11 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 		AddWaitEventToSet(set, WL_POSTMASTER_DEATH, PGINVALID_SOCKET,
 		  NULL, NULL);
 
-	if (wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE))
+	if (wakeEvents & WL_SOCKET_MASK)
 	{
 		int			ev;
 
-		ev = wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
+		ev = wakeEvents & WL_SOCKET_MASK;
 		AddWaitEventToSet(set, ev, sock, NULL, NULL);
 	}
 
@@ -390,8 +390,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 	{
 		ret |= event.events & (WL_LATCH_SET |
 			   WL_POSTMASTER_DEATH |
-			   WL_SOCKET_READABLE |
-			   WL_SOCKET_WRITEABLE);
+			   WL_SOCKET_MASK);
 	}
 
 	FreeWaitEventSet(set);
@@ -640,10 +639,12 @@ FreeWaitEventSet(WaitEventSet *set)
  * Add an event to the set. Possible events are:
  * - WL_LATCH_SET: Wait for the latch to be set
  * - WL_POSTMASTER_DEATH: Wait for postmaster to die
- * - WL_SOCKET_READABLE: Wait for socket to become readable
- *	 can be combined in one event with WL_SOCKET_WRITEABLE
- * - WL_SOCKET_WRITEABLE: Wait for socket to become writeable
- *	 can be combined with WL_SOCKET_READABLE
+ * - WL_SOCKET_READABLE: Wait for socket to become readable,
+ *	 can be combined in one event with other WL_SOCKET_* events
+ * - WL_SOCKET_WRITEABLE: Wait for socket to become writeable,
+ *	 can be combined with other WL_SOCKET_* events
+ * - WL_SOCKET_CONNECTED: Wait for socket connection to be established,
+ *	 can be combined with other WL_SOCKET_* events
  *
  * Returns the offset in WaitEventSet->events (starting from 0), which can be
  * used to modify previously added wait events using ModifyWaitEvent().
@@ -652,9 +653,9 @@ FreeWaitEventSet(WaitEventSet *set)
  * i.e. it must be a process-local latch initialized with InitLatch, or a
  * shared latch associated with the current process by calling OwnLatch.
  *
- * In the WL_SOCKET_READABLE/WRITEABLE case, EOF and error conditions are
- * reported by returning the socket as readable/writable or both, depending on
- * WL_SOCKET_READABLE/WRITEABLE being specified.
+ * In the WL_SOCKET_READABLE/WRITEABLE/CONNECTED cases, EOF and error
+ * conditions are reported by returning the socket as readable/writable or
+ * both, depending on WL_SOCKET_READABLE/WRITEABLE/CONNECTED being specified.
  *
  * The user_data pointer specified here will be set for the events returned
  * by WaitEventSetWait(), allowing to easily associate additional data with
@@ -685,8 +686,7 @@ AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd, Latch *latch,
 	}
 
 	/* waiting for socket readiness without a socket indicates a bug */
-	if (fd == PGINVALID_SOCKET &&
-		(events & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)))
+	if (fd == PGINVALID_SOCKET && (events & (WL_SOCKET_MASK)))
 		elog(ERROR, 

Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-13 Thread Peter Geoghegan
On Sun, Aug 13, 2017 at 12:57 PM, Andres Freund  wrote:
> FWIW, I'm personally quite demotivated by this style of handling
> issues. You're essentially saying that any code change, even if it just
> increases exposure of a preexisting bug, needs to be handled by the
> committer of the exposing change. And even if that bug is on a platform
> the committer doesn't have.  And all that despite the issue getting
> attention.

I don't think you can generalize from what Noah said like that,
because it's always a matter of degree (the degree to which the
preexisting bug was a problem). Abbreviated keys for collated text
were disabled, though not due to bug in strxfrm(). Technically, it was
due to a bug in strcoll(), which glibc always had. strxfrm() therefore
only failed to be bug compatible with glibc's strcoll(). Does that
mean that we were wrong to disable the use of strxfrm() for
abbreviated keys?

I think that it's useful for these things to be handled in an
adversarial manner, in the same way that litigation is adversarial in
a common law court. I doubt that Noah actually set out to demoralize
anyone. He is just doing the job he was assigned.

-- 
Peter Geoghegan




Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-13 Thread Andres Freund
Hi,

On 2017-08-11 18:11:03 +0530, Augustine, Jobin wrote:
> Appears that patch is not helping.
> Errors like below are still appearing in the log
> ===
> 2017-08-11 12:22:35 UTC [2840]: [1-1] user=,db=,app=,client= FATAL:  could
> not connect to the primary server: could not send data to server: Socket is
> not connected (0x2749/10057)
> could not send startup packet: Socket is not connected
> (0x2749/10057)
> 2017-08-11 12:22:40 UTC [1964]: [1-1] user=,db=,app=,client= FATAL:  could
> not connect to the primary server: could not send data to server: Socket is
> not connected (0x2749/10057)
> could not send startup packet: Socket is not connected
> (0x2749/10057)
> 2017-08-11 12:22:45 UTC [248]: [1-1] user=,db=,app=,client= FATAL:  could
> not connect to the primary server: could not send data to server: Socket is
> not connected (0x2749/10057)
> could not send startup packet: Socket is not connected
> (0x2749/10057)
> ===

That's too bad. Any chance you could install
https://docs.microsoft.com/en-us/sysinternals/downloads/procmon and
activate monitoring just for that phase? I think it can export to a text
file...

Greetings,

Andres




Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-13 Thread Andres Freund
On 2017-08-11 20:56:22 -0700, Noah Misch wrote:
> > > If nobody volunteers, you could always resolve this by reverting 1e8a850 
> > > and
> > > successors.
> > 
> > I think you're blaming the victim.  Our current theory about the cause
> > of this is that on Windows, WaitLatchOrSocket cannot be used to wait for
> > completion of a nonblocking connect() call.  That seems pretty broken
> > independently of whether libpqwalreceiver needs the capability.
> 
> Yes, the theorized defect lies in APIs commit 1e8a850 used, not in the commit
> itself.  Nonetheless, commit 1e8a850 promoted the defect from one reachable
> only by writing C code to one reachable by merely configuring replication on
> Windows according to the documentation.  For that, its committer owns this
> open item.  Besides the one approach I mentioned, there exist several other
> fine ways to implement said ownership.

FWIW, I'm personally quite demotivated by this style of handling
issues. You're essentially saying that any code change, even if it just
increases exposure of a preexisting bug, needs to be handled by the
committer of the exposing change. And even if that bug is on a platform
the committer doesn't have.  And all that despite the issue getting
attention.

I'm not ok with this.




Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-13 Thread Noah Misch
On Fri, Aug 11, 2017 at 08:56:22PM -0700, Noah Misch wrote:
> On Thu, Aug 10, 2017 at 09:59:40PM -0400, Tom Lane wrote:
> > Noah Misch  writes:
> > > On Tue, Aug 08, 2017 at 07:25:37PM -0400, Peter Eisentraut wrote:
> > >> I don't think I can usefully contribute to this.  Could someone else
> > >> take it?
> 
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 10 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2017-08-14 20:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] pgbench - allow to store select results into variables

2017-08-13 Thread Pavel Stehule
Hi

2017-08-13 20:33 GMT+02:00 Fabien COELHO :

>
> Here is a v11.
>
> It is basically a simple rebase after Tom committed the "pgbench -M order"
> patch. It interfered because the compound command management also needs
> to delay part of the SQL command initialization. Some patch are luckier
> than others:-)
>
> Here is a v10:
>>
>> - does not talk about ASCII variable name constraint, as a patch has been
>>   submitted independently to lift this constraint.
>>
>> - rename gcset to cset (compound set, \; + \set), where gset is ; + \set,
>>   because "\gcset" looked really strange.
>>
>> - simplify the code a little bit.
>>
>> Also attached is an updated test script.
>>
>>
looks a little bit strange, but it makes sense

+1

Pavel


>>
> --
> Fabien.


[HACKERS] Re: ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

2017-08-13 Thread Noah Misch
On Thu, Aug 10, 2017 at 04:51:16AM +, Noah Misch wrote:
> On Mon, Aug 07, 2017 at 06:23:56PM -0400, Tom Lane wrote:
> > Peter Eisentraut  writes:
> > > On 8/6/17 20:07, Peter Geoghegan wrote:
> > >> I've looked into this. I'll give an example of what keyword variants
> > >> there are for Greek, and then discuss what I think each is.
> > 
> > > I'm not sure why we want to get into editorializing this.  We query ICU
> > > for the names of distinct collations and use that.  It's more than most
> > > people need, sure, but it doesn't cost us anything.
> > 
> > Yes, *it does*.  The cost will be borne by users who get screwed at update
> > time, not by developers, but that doesn't make it insignificant.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] Thoughts on unit testing?

2017-08-13 Thread Peter Geoghegan
On Thu, Aug 10, 2017 at 2:53 PM, Thomas Munro
 wrote:
> The current regression tests, isolation tests and TAP tests are very
> good (though I admit my experience with TAP is limited), but IMHO we
> are lacking support for C-level unit testing.  Complicated, fiddly
> things with many states, interactions, edge cases etc can be hard to
> get full test coverage on from the outside.  Consider
> src/backend/utils/mmgr/freepage.c as a case in point.

It is my understanding that much of the benefit of unit testing comes
from maintainability. It's something that goes well with design by
contract. Writing unit tests is a forcing function. It requires
testable units, making the units more composable. The programmer must
be very deliberate about state, and must delineate code as testable
units. Unit tests are often supposed to be minimal, to serve as a kind
of design document.

In order to unit test something like an index access method, or
VACUUM, or the transaction manager, it would be necessary to mock the
buffer manager, and have a large amount of scaffolding code. This
would be an awful lot of work to do as an afterthought. I wouldn't
know where to draw the line between one unit and another, because the
code just wasn't written with that in mind to begin with. Is snapshot
acquisition a unit that includes heapam.c? If so, does that mean that
heapam.c has to have its own unit for other functionality, or does
that get lumped in with the snapshot acquisition unit?

As I've said a couple of times already, one thought behind amcheck was
that it would allow us to formalize the design of access methods in a
way that somewhat resembles unit testing. The code that we already
have for nbtree is extremely well commented, and is intended to
document how nbtree really works in terms of invariants. Although the
tests are very comprehensive, there really isn't all that much code,
because it's supposed to minimally describe correct states for a
B-Tree. It manages to do this well, perhaps because the Lehman & Yao
design formalizes everything as a small number of per-page atomic
operations, and is very clear about legal and illegal states.
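
To make the idea concrete, here is a minimal, self-contained sketch of what
an amcheck-style invariant check boils down to, using plain integer keys
instead of real index tuples (this is only an illustration, not amcheck code):

#include <assert.h>
#include <stddef.h>

/*
 * Check two Lehman & Yao style invariants for one "page": items are in
 * non-decreasing order, and the page's high key bounds every item.
 */
static void
check_page_invariants(const int *items, size_t nitems, int high_key)
{
    for (size_t i = 0; i < nitems; i++)
    {
        if (i > 0)
            assert(items[i - 1] <= items[i]);   /* items must be ordered */
        assert(items[i] <= high_key);           /* high key bounds all items */
    }
}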

It would be nice if amcheck eventually *guided* the development of the
heap access method. If comprehensive testing of invariants for heapam
becomes very complicated, it could be instructive to consider how to
refactor heapam to make the required verification code simpler but no
less comprehensive. Mostly, this is desirable because it will lead to
an overall simpler, more robust design, and because it lowers the
barrier to entry to assess the correctness of the code. Of course,
this requires buy-in from patch authors to work.

To put it another way, a patch author whose patch touches storage
implicitly asserts that their patch is correct. It would be useful if
they provided a precise falsifiable statement about its correctness up
front, in the form of verification code.

-- 
Peter Geoghegan




Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-13 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 8, 2017 at 10:48 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Tue, Aug 8, 2017 at 4:34 PM, Tom Lane  wrote:
 In the meantime, I think my vote would be to remove AtEOXact_CatCache.

>>> In all supported branches?

>> Whatever we do about this issue, I don't feel a need to do it further
>> back than HEAD.  It's a non-problem except in an assert-enabled build,
>> and we don't recommend running those for production, only development.

> Sure, but people still do testing and development against older
> branches - bug fixes, for example.  It doesn't make much sense to me
> to leave code that we know does the wrong thing in the back branches.

Not having heard anyone arguing against that, I'll go make it so,
ie AtEOXact_CatCache is toast in all branches.

regards, tom lane




Re: [HACKERS] pgbench - allow to store select results into variables

2017-08-13 Thread Fabien COELHO


Here is a v11.

It is basically a simple rebase after Tom committed the "pgbench -M order" 
patch. It interfered because the compound command management also needs
to delay part of the SQL command initialization. Some patches are luckier
than others :-)



Here is a v10:

- does not talk about ASCII variable name constraint, as a patch has been
  submitted independently to lift this constraint.

- rename gcset to cset (compound set, \; + \set), where gset is ; + \set,
  because "\gcset" looked really strange.

- simplify the code a little bit.

Also attached is an updated test script.




--
Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 03e1212..ea154bc 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -816,6 +816,51 @@ pgbench  options  dbname
   
 
   
+   
+
+ \cset [prefix] or
+ \gset [prefix]
+
+
+
+ 
+  These commands may be used to end SQL queries, replacing a semicolon.
+  \cset replaces an embedded semicolon (\;) within
+  a compound SQL command, and \gset replaces a final
+  (;) semicolon which ends the SQL command. 
+ 
+
+ 
+  When these commands are used, the preceding SQL query is expected to
+  return one row, the columns of which are stored into variables named after
+  column names, and prefixed with prefix if provided.
+ 
+
+ 
+  The following example puts the final account balance from the first query
+  into variable abalance, and fills variables
+  one, two and p_three with
+  integers from a compound query.
+
+UPDATE pgbench_accounts
+  SET abalance = abalance + :delta
+  WHERE aid = :aid
+  RETURNING abalance \gset
+-- compound of two queries
+SELECT 1 AS one, 2 AS two \cset
+SELECT 3 AS three \gset p_
+
+ 
+
+ 
+  
+\cset and \gset commands do not work when
+empty SQL queries appear within a compound SQL command.
+  
+ 
+
+   
+

 
  \set varname expression
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ae78c7b..b1488c0 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -375,11 +375,14 @@ static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
 
 typedef struct
 {
-	char	   *line;			/* text of command line */
+	char	   *line;			/* first line for short display */
+	char	   *lines;			/* full multi-line text of command */
 	int			command_num;	/* unique index of this Command struct */
 	int			type;			/* command type (SQL_COMMAND or META_COMMAND) */
 	int			argc;			/* number of command words */
 	char	   *argv[MAX_ARGS]; /* command word list */
+	int			compound;   /* last compound command (number of \;) */
+	char	  **gset;   /* per-compound command prefix */
 	PgBenchExpr *expr;			/* parsed expression, if needed */
 	SimpleStats stats;			/* time spent in this command */
 } Command;
@@ -1225,6 +1228,104 @@ getQueryParams(CState *st, const Command *command, const char **params)
 		params[i] = getVariable(st, command->argv[i + 1]);
 }
 
+/* read all responses from backend */
+static bool
+read_response(CState *st, char **gset)
+{
+	PGresult   *res;
+	int			compound = 0;
+
+	while ((res = PQgetResult(st->con)) != NULL)
+	{
+		switch (PQresultStatus(res))
+		{
+			case PGRES_COMMAND_OK: /* non-SELECT commands */
+			case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
+if (gset[compound] != NULL)
+{
+	fprintf(stderr,
+			"client %d file %d command %d compound %d: "
+			"\\gset expects a row\n",
+			st->id, st->use_file, st->command, compound);
+	st->ecnt++;
+	return false;
+}
+break; /* OK */
+
+			case PGRES_TUPLES_OK:
+if (gset[compound] != NULL)
+{
+	/* store result into variables */
+	int ntuples = PQntuples(res),
+		nfields = PQnfields(res),
+		f;
+
+	if (ntuples != 1)
+	{
+		fprintf(stderr,
+"client %d file %d command %d compound %d: "
+"expecting one row, got %d\n",
+st->id, st->use_file, st->command, compound, ntuples);
+		st->ecnt++;
+		PQclear(res);
+		discard_response(st);
+		return false;
+	}
+
+	for (f = 0; f < nfields ; f++)
+	{
+		char *varname = PQfname(res, f);
+		if (*gset[compound] != '\0')
+			varname = psprintf("%s%s", gset[compound], varname);
+
+		/* store result as a string */
+		if (!putVariable(st, "gset", varname,
+		 PQgetvalue(res, 0, f)))
+		{
+			/* internal error, should it rather abort? */
+			fprintf(stderr,
+	"client %d file %d command %d compound %d: "
+	"error storing into var %s\n",
+	st->id, st->use_file, st->command, compound,
+	varname);
+			st->ecnt++;
+			PQclear(res);
+			discard_response(st);
+			return false;
+		}
+
+		if (*gset[compound] != '\0')
+			free(varname);
+	}
+}
+break;	/* OK */
+
+			default:
+/* 

Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-13 Thread Tom Lane
Andreas Seltenreich  writes:
> Tom Lane writes:
>> I wonder if Andreas would be interested in trying the randomly-timed-
>> SIGTERM thing with sqlsmith.

> So far, most of the core dumps generated are Jeevan's assertion failing
> with backtraces through SearchCatCacheList.  The rest is failing this
> assertion:
> TRAP: FailedAssertion("!(portal->cleanup == ((void *)0))", File: 
> "portalmem.c", Line: 846)
> Example backtrace below.  They all happened during a rollback statement.
> Testing was done on master at 2336f84284.

Interesting.  I can reproduce this by forcing a FATAL exit right there,
eg by adding

    if (pstmt->utilityStmt && IsA(pstmt->utilityStmt, TransactionStmt) &&
        ((TransactionStmt *) pstmt->utilityStmt)->kind == TRANS_STMT_ROLLBACK)
        InterruptPending = ProcDiePending = true;

before PortalRunMulti's CHECK_FOR_INTERRUPTS.  But it only fails if the
rollback is trying to exit a transaction that's already suffered an error.
The explanation seems to be:

1. Because we already aborted the transaction, xact.c is in state
blockState == TBLOCK_ABORT.

2. This causes AbortOutOfAnyTransaction to think that it can skip
doing AbortTransaction() and go straight to CleanupTransaction().

3. AtCleanup_Portals() is expecting that we already ran, and cleared,
the portal cleanup hook (PortalCleanup) for every live portal.  But
we have not done so for the active portal that we were in the midst
of running ROLLBACK in.  So it fails the mentioned assertion.

Thus, the basic problem is that we get to CleanupTransaction() without
having done PortalDrop on the portal we're running ROLLBACK in.

We could take this as an argument for simply removing that assertion,
which would mean that when AtCleanup_Portals calls PortalDrop, the cleanup
hook would get run, the same as it would have if exec_simple_query had had
a chance to drop the portal.  However, I'm still pretty afraid of allowing
arbitrary code to execute during CleanupTransaction().  What seems like
a better idea is to make AtCleanup_Portals just summarily clear the
cleanup hook, perhaps while emitting a warning.

I tried that (see portalmem.c changes in attached patch), and what I got
was this:

regression=# rollback;
WARNING:  skipping cleanup for portal ""
FATAL:  terminating connection due to administrator command
FATAL:  cannot drop active portal ""
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

or in the server log:

2017-08-13 13:49:50.312 EDT [8737] FATAL:  terminating connection due to 
administrator command
2017-08-13 13:49:50.312 EDT [8737] STATEMENT:  rollback;
2017-08-13 13:49:50.312 EDT [8737] WARNING:  skipping cleanup for portal ""
2017-08-13 13:49:50.312 EDT [8737] FATAL:  cannot drop active portal ""

Well, we could tolerate that maybe, but it doesn't seem like it's actually
acting nicely; the active portal is still creating problems for us.

After some further thought, I decided to try making
AbortOutOfAnyTransaction call AtAbort_Portals() in this situation, thus
basically doing the minimum part of AbortTransaction() needed to clean up
the active portal.  That almost worked --- my initial try got an assertion
failure in mcxt.c, because somebody was trying to drop the
CurrentMemoryContext.  So the minimum part of AbortTransaction that we
need here is really AtAbort_Memory + AtAbort_Portals.  After further
thought I decided it'd be a good idea to phrase that as an unconditional
AtAbort_Memory at the top of AbortOutOfAnyTransaction, thus making sure
we are in some valid context to start with; and then, in case the loop
itself doesn't have anything to do, we need AtCleanup_Memory() at the
bottom of the function to revert CurrentMemoryContext to TopMemoryContext.

In short, the attached patch seems to fix it nicely.  We definitely need
the xact.c changes, but the ones in portalmem.c are perhaps optional, as
in theory now the assertion will never be violated.  But I'm inclined
to keep the portalmem changes anyway, as that will make it more robust
if the situation ever happens in a non-assert build.

Note: there are a couple of comments elsewhere in portalmem.c that need
adjustment if we keep the portalmem changes.  I have not bothered to do
that yet, as it's just cosmetic.

Comments?

regards, tom lane

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 50c3c3b..1eabc67 100644
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*** AbortOutOfAnyTransaction(void)
*** 4233,4238 
--- 4233,4241 
  {
  	TransactionState s = CurrentTransactionState;
  
+ 	/* Ensure we're not running in a doomed memory context */
+ 	AtAbort_Memory();
+ 
  	/*
  	 * Get out of any transaction or nested transaction
  	 */
*** 

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-13 Thread Fabien COELHO


Hello Alik,


Now “a” does not have an upper bound; that’s why, when using the iterative
algorithm with a >= 1, the program will get stuck in an infinite loop because of
the following line of code:
double b = pow(2.0, s - 1.0);
After overflow, “b” becomes “+Inf”.


Yep, overflow can happen.


So should an upper bound for “a” be set?


Yes, I agree. a >= 1 does not make much sense... If you want uniform 
you should use random(), not call random_zipfian with a = 1. Basically 
it suggests that too large values of "a" should be rejected. Not sure 
where to put the limit, though.
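
For instance, a guard along these lines could be placed in front of the
iterative method (a sketch only; MAX_ZIPFIAN_PARAM is a made-up cap, as we
have not settled on the actual limit):

#include <math.h>

#define MAX_ZIPFIAN_PARAM 1000.0    /* hypothetical upper bound */

/*
 * pow(2.0, s - 1.0) overflows to +Inf once s - 1.0 exceeds the double
 * exponent range (roughly 1024), after which the rejection loop can never
 * terminate, so "s" must be bounded before the iterative method is entered.
 */
static int
zipfian_param_ok(double s)
{
    return s > 1.0 && s <= MAX_ZIPFIAN_PARAM && isfinite(pow(2.0, s - 1.0));
}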


Should I mention in the docs that two algorithms are used
depending on the value of a (s/theta)?


Yes, as a general principle I think that the documentation should reflect 
the implementation.


In the attached patch, I have added a computeIterativeZipfian method and its
usage in getZipfianRand. Would it be better to move the cache-based computation
to a new method, so that getZipfianRand contains only the two
computeXXXZipfian method calls?


I have not looked in detail, but from what you say I would agree that the 
implementation should be symmetric, so having one function calling one 
method or the other sounds good.


--
Fabien.


Re: [HACKERS] initdb failure on Debian sid/mips64el in EventTriggerEndCompleteQuery

2017-08-13 Thread Tom Lane
Christoph Berg  writes:
> Seems to be a gcc-7 problem affecting several packages on mips64el:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=871514

Hm, unless there is a use of sigsetjmp earlier in that clamav
routine, I would not assume that that's the same issue.  The bug
I suspect we are looking at here is very specific to sigsetjmp
callers: it usually amounts to the compiler unsafely trying to
use the same temporary location for multiple purposes.

regards, tom lane




Re: [HACKERS] initdb failure on Debian sid/mips64el in EventTriggerEndCompleteQuery

2017-08-13 Thread Tom Lane
Christoph Berg  writes:
> 10beta3 and 9.6.4 are both failing during initdb on mips64el on
> Debian/sid (unstable):
> All other architectures have succeeded, as well as the 9.6.4 build for
> Debian/stretch (stable) on mips64el. The difference might be the
> compiler version (6.3.0 vs 7.1.0).

It's hard to explain that stack trace other than as a compiler bug.
There shouldn't be any event triggers active here, so 
EventTriggerBeginCompleteQuery should have done nothing and returned
false.  I don't put complete faith in gdb reports of local variable
values, but it says
needCleanup = 0 '\000'
which agrees with that.  Also the core dump appears to be because
currentEventTriggerState is NULL (please check that), which is
expected if EventTriggerBeginCompleteQuery did nothing.  However, then
EventTriggerEndCompleteQuery should not have gotten called at all.

I suspect you could work around this with

    bool        isCompleteQuery = (context <= PROCESS_UTILITY_QUERY);
-   bool        needCleanup;
+   volatile bool   needCleanup;
    bool        commandCollected = false;

If that fixes it, it's definitely a compiler bug.  That function does
not change needCleanup after the sigsetjmp call, so per POSIX it
should not have to label the variable volatile.  This is far from
being the first such bug we've seen though.
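
For reference, the POSIX rule at issue is that only automatic variables
modified between setjmp/sigsetjmp and the longjmp have indeterminate values
afterwards unless declared volatile.  A minimal standalone illustration of
that rule (plain setjmp, not PostgreSQL code):

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

int
main(void)
{
    /* modified after setjmp(), so it must be volatile to survive longjmp() */
    volatile int cleaned_up = 0;

    if (setjmp(env) == 0)
    {
        cleaned_up = 1;
        longjmp(env, 1);        /* control returns to the setjmp() above */
    }
    else
        printf("cleaned_up = %d\n", cleaned_up);    /* reliably prints 1 */

    return 0;
}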

regards, tom lane




Re: [HACKERS] initdb failure on Debian sid/mips64el in EventTriggerEndCompleteQuery

2017-08-13 Thread Christoph Berg
Re: To PostgreSQL Hackers 2017-08-13 
<20170813130127.g3tcyzzvuvlpz...@msg.df7cb.de>
> 10beta3 and 9.6.4 are both failing during initdb on mips64el on
> Debian/sid (unstable):
> 
> https://buildd.debian.org/status/fetch.php?pkg=postgresql-9.6=mips64el=9.6.4-1=1502374949=0
> https://buildd.debian.org/status/fetch.php?pkg=postgresql-10=mips64el=10%7Ebeta3-1=1502535836=0
> 
> All other architectures have succeeded, as well as the 9.6.4 build for
> Debian/stretch (stable) on mips64el. The difference might be the
> compiler version (6.3.0 vs 7.1.0).

Seems to be a gcc-7 problem affecting several packages on mips64el:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=871514

Christoph




Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian a bit)

2017-08-13 Thread Sokolov Yura
On Fri, 11 Aug 2017 14:05:08 -0400,
Robert Haas  wrote:

> On Thu, Aug 10, 2017 at 11:12 AM, Alexander Korotkov
>  wrote:
> > These results look very cool!
> > I think CSN is eventually inevitable, but it's a long distance
> > feature. Thus, this optimization could make us a good serve before
> > we would have CSN. Do you expect there are cases when this patch
> > causes slowdown?  What about case when we scan each xip array only
> > once (not sure how to simulate that using pgbench)?  
> 
> Just a random thought here:
> 
> The statements pgbench executes are pretty simple: they touch one row
> in one table.  You wouldn't expect them to scan the xip array very
> many times.  If even those statements touch the array enough for this
> to win, it seems like it might be hard to construct an even worse
> case.  I might be missing something, though.

With a zipfian distribution, many concurrent transactions try to update
the same row. Every transaction eventually updates this row (perhaps
after waiting), so there are a lot of concurrent row versions, and a
transaction scans through its snapshot many times.

> 
> We're not going to accept code like this, though:
> 
> +d = xip[i] >> 6;
> +j = k = xip[i] & mask;
> +while (xiphash[j] != InvalidTransactionId)
> +{
> +j = (k + d) & mask;
> +d = d*5 + 1;
> +}
> 
> Single-character variable names, hard-coded constants, and no
> comments!

I totally agree. The first version (which you've cited) was clearly a POC.
The second version (sent on Thursday) looks a bit better, but I agree
there is room for improvement. I'll try to make it prettier.
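
For what it's worth, a commented reading of the loop you quoted might look
roughly like this (single-letter names spelled out; this is only my sketch
of the apparent intent, not the actual patch):

#include <stdint.h>

#define InvalidTransactionId 0

static void
hash_insert(uint32_t *table, uint32_t mask, uint32_t xid)
{
    uint32_t home = xid & mask;     /* "k": home bucket from the low bits */
    uint32_t step = xid >> 6;       /* "d": initial probe step from higher bits */
    uint32_t slot = home;           /* "j": current probe position */

    /* open addressing: walk candidate slots until an empty one is found */
    while (table[slot] != InvalidTransactionId)
    {
        slot = (home + step) & mask;
        step = step * 5 + 1;        /* full-period LCG mod 2^k, so every slot is eventually probed */
    }
    table[slot] = xid;              /* store the xid in the first empty slot */
}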

BTW, I have a question: there is a CopySnapshot function, so a snapshot is
clearly changed after copying. But I could not follow: do the xip and subxip
arrays change in a copy, or do they remain the same as in the original, with
only other snapshot fields changing?

> I kind of doubt as a general point that we really want another
> open-coded hash table -- I wonder if this could be made to use
> simplehash.

I like simplehash very much (although I'm missing a couple of features).
Unfortunately, simplehash is not simple enough for this use case:
- it will use at least twice as much memory for the hash table itself
  (because it has to have an 'entry->status' field),
- it has a large header (the ad-hoc hash table consumes 1 byte in the
  SnapshotData structure),
- its allocation will be trickier (the custom hash table is co-allocated
  after the xip array itself),
- its insert algorithm looks to be much more expensive (at least, it is
  more complex in instruction count).

I think using simplehash here will lead to a much more invasive patch
than this ad-hoc table. And I believe it will be slower (and this place
is very performance critical), though I will not bet on it.

BTW, ad-hoc hash tables already exist: at least the resource owner uses one.

-- 
With regards,
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company




Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-13 Thread Andreas Seltenreich
Tom Lane writes:
> I wonder if Andreas would be interested in trying the randomly-timed-
> SIGTERM thing with sqlsmith.

So far, most of the core dumps generated are Jeevan's assertion failing
with backtraces through SearchCatCacheList.  The rest is failing this
assertion:

TRAP: FailedAssertion("!(portal->cleanup == ((void *)0))", File: 
"portalmem.c", Line: 846)

Example backtrace below.  They all happened during a rollback statement.
Testing was done on master at 2336f84284.

regards,
Andreas

Core was generated by `postgres: smith regression [local] ROLLBACK  
'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
#1  0x7f4c26d3240a in __GI_abort () at abort.c:89
#2  0x559d18897a73 in ExceptionalCondition 
(conditionName=conditionName@entry=0x559d18a81370 "!(portal->cleanup == ((void 
*)0))", errorType=errorType@entry=0x559d188e3f7d "FailedAssertion", 
fileName=fileName@entry=0x559d18a81013 "portalmem.c", 
lineNumber=lineNumber@entry=846) at assert.c:54
#3  0x559d188c42f1 in AtCleanup_Portals () at portalmem.c:846
#4  0x559d18536cb7 in CleanupTransaction () at xact.c:2652
#5  0x559d1853b825 in AbortOutOfAnyTransaction () at xact.c:4278
#6  0x559d188a7799 in ShutdownPostgres (code=, 
arg=) at postinit.c:1146
#7  0x559d1876b4e9 in shmem_exit (code=code@entry=1) at ipc.c:228
#8  0x559d1876b5fa in proc_exit_prepare (code=code@entry=1) at ipc.c:185
#9  0x559d1876b688 in proc_exit (code=code@entry=1) at ipc.c:102
#10 0x559d188999b1 in errfinish (dummy=) at elog.c:543
#11 0x559d1878fefa in ProcessInterrupts () at postgres.c:2841
#12 0x559d18790829 in ProcessInterrupts () at postgres.c:2828
#13 0x559d18795395 in PortalRunMulti (portal=portal@entry=0x559d197f2bf0, 
isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 
'\000', dest=dest@entry=0x559d19850c40, altdest=altdest@entry=0x559d19850c40, 
completionTag=completionTag@entry=0x7ffc04f1b560 "") at pquery.c:1239
#14 0x559d18796069 in PortalRun (portal=portal@entry=0x559d197f2bf0, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
run_once=run_once@entry=1 '\001', dest=dest@entry=0x559d19850c40, 
altdest=altdest@entry=0x559d19850c40, completionTag=0x7ffc04f1b560 "") at 
pquery.c:799
#15 0x559d18791dca in exec_simple_query (query_string=0x559d1984fe00 
"ROLLBACK;") at postgres.c:1099
#16 0x559d18793af1 in PostgresMain (argc=, 
argv=argv@entry=0x559d197fa078, dbname=, username=) at postgres.c:4090
#17 0x559d184a3428 in BackendRun (port=0x559d197e8f00) at postmaster.c:4357
#18 BackendStartup (port=0x559d197e8f00) at postmaster.c:4029
#19 ServerLoop () at postmaster.c:1753
#20 0x559d1871ad65 in PostmasterMain (argc=3, argv=0x559d197be5a0) at 
postmaster.c:1361
#21 0x559d184a4a6d in main (argc=3, argv=0x559d197be5a0) at main.c:228




Re: [HACKERS] Pluggable storage

2017-08-13 Thread Haribabu Kommi
On Sun, Aug 13, 2017 at 5:21 PM, Amit Kapila 
wrote:

> On Sat, Aug 12, 2017 at 10:34 AM, Haribabu Kommi
>  wrote:
> >
> > On Tue, Aug 8, 2017 at 2:21 PM, Amit Kapila 
> wrote:
> >>
> >> +typedef struct StorageAmRoutine
> >> +{
> >>
> >> In this structure, you have already covered most of the API's that a
> >> new storage module needs to provide, but I think there could be more.
> >> One such API could be heap_hot_search.  This seems specific to current
> >> heap where we have the provision of HOT.  I think we can provide a new
> >> API tuple_search or something like that.
> >
> >
> > Thanks for the review.
> >
> > Yes, the storageAmRoutine needs more function pointers. Currently I am
> > adding all the functions that are present in the heapam.h and some slot
> > related function from tuptable.h.
> >
>
> Hmm, this API is exposed via heapam.h.  Am I missing something?
>

Sorry, I was not explaining clearly in my earlier mail. Yes, you are right
that the heap_hot_search function exists in heapam.h, but I have yet to
add all the exposed functions that are present in the heapam.h file to
the StorageAmRoutine structure.
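
Just to illustrate the shape being discussed, a cut-down, self-contained
sketch of such a function-pointer table might look like the following
(member names and signatures here are illustrative only, not what the WIP
patch actually contains):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ItemPointer;           /* stand-in for a (block, offset) TID */
typedef struct StorageTuple StorageTuple;
typedef struct Snapshot Snapshot;

typedef struct StorageAmRoutine
{
    /* fetch a tuple by TID; returns NULL if it does not exist */
    StorageTuple *(*tuple_fetch) (void *relation, ItemPointer tid);

    /* follow HOT chains (or the format's equivalent) to a visible tuple */
    bool (*tuple_search) (void *relation, ItemPointer tid,
                          Snapshot *snapshot, StorageTuple **result);

    /* visibility check, replacing direct calls to HeapTupleSatisfies* */
    bool (*tuple_satisfies) (StorageTuple *tuple, Snapshot *snapshot);
} StorageAmRoutine;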

> Once I stabilize the code and API's that
> > are
> > currently added, then I will further enhance it with remaining functions
> > that
> > are necessary to support pluggable storage API.
> >
>
> Sure, but I think if we find anything during development/review,
> then we should either add it immediately or at the very least add a
> FIXME in a patch to avoid forgetting the finding.


OK. I will add all the functions identified so far.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] parallelize queries containing initplans

2017-08-13 Thread Haribabu Kommi
On Fri, Aug 11, 2017 at 1:18 AM, Amit Kapila 
wrote:

> On Wed, Aug 9, 2017 at 6:51 PM, Haribabu Kommi 
> wrote:
> >
> >
> > + if (IsA(plan, Gather))
> > + ((Gather *) plan)->initParam = bms_intersect(plan->lefttree->extParam,
> > initSetParam);
> > + else if (IsA(plan, GatherMerge))
> > + ((GatherMerge *) plan)->initParam =
> > bms_intersect(plan->lefttree->extParam, initSetParam);
> > + else
> > + elog(ERROR, "unrecognized node type: %d", nodeTag(plan));
> >
> > The else case is not possible, because it is already validated for
> > Gather or
> > GatherMerge.
> > Can we change it to a simple if and else?
> >
>
> As we already have an assert in this function to protect from any
> other node type (nodes other than Gather and Gather Merge), it makes
> sense to change the code to just if...else, so changed accordingly.


Thanks for the updated patch. Patch looks fine. I just have some
minor comments.

+ * ExecEvalParamExecParams
+ *
+ * Execute the subplan stored in PARAM_EXEC initplans params, if not
executed
+ * till now.
+ */
+void
+ExecEvalParamExecParams(Bitmapset *params, EState *estate)

I feel it would be better to explain when this function executes the subplans
that have not been executed yet. That is, in what scenario does that happen?
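
For example, something along these lines (the wording is only a suggestion,
based on my understanding that the leader evaluates these initplans before
launching workers):

/*
 * ExecEvalParamExecParams
 *
 * Evaluate, in the leader, any PARAM_EXEC params in 'params' whose
 * initplans have not been executed yet, so that their computed values
 * are available to be passed down to parallel workers.
 */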


+ if (params == NULL)
+ nparams = 0;
+ else
+ nparams = bms_num_members(params);

bms_num_members returns 0 if params is NULL.
Is the explicit NULL check kept for a performance benefit, or can it just
be removed? Either is fine with me.


+ if (IsA(plan, Gather))
+ ((Gather *) plan)->initParam = bms_intersect(plan->lefttree->extParam,
initSetParam);
+ else
+ ((GatherMerge *) plan)->initParam =
bms_intersect(plan->lefttree->extParam, initSetParam);


I think the above code finds the common parameters that are present in the
child's external params and the initplan output params. It may be better to
explain the logic in a comment.
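
For instance, something like this (the wording is just a suggestion):

/*
 * The initplan params this Gather (Merge) node must provide to its
 * workers are exactly the params set by initplans (initSetParam) that
 * the child plan tree actually references (its extParam set), i.e.
 * their intersection.
 */
if (IsA(plan, Gather))
    ((Gather *) plan)->initParam =
        bms_intersect(plan->lefttree->extParam, initSetParam);
else
    ((GatherMerge *) plan)->initParam =
        bms_intersect(plan->lefttree->extParam, initSetParam);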


Regards,
Hari Babu
Fujitsu Australia


[HACKERS] initdb failure on Debian sid/mips64el in EventTriggerEndCompleteQuery

2017-08-13 Thread Christoph Berg
10beta3 and 9.6.4 are both failing during initdb on mips64el on
Debian/sid (unstable):

https://buildd.debian.org/status/fetch.php?pkg=postgresql-9.6=mips64el=9.6.4-1=1502374949=0
https://buildd.debian.org/status/fetch.php?pkg=postgresql-10=mips64el=10%7Ebeta3-1=1502535836=0

All other architectures have succeeded, as well as the 9.6.4 build for
Debian/stretch (stable) on mips64el. The difference might be the
compiler version (6.3.0 vs 7.1.0).

Command was: "initdb" -D 
"/home/myon/postgresql-9.6/postgresql-9.6-9.6.3/build/src/t
est/regress/./tmp_check/data" --noclean --nosync > 
"/home/myon/postgresql-9.6/postgr
esql-9.6-9.6.3/build/src/test/regress/log/initdb.log" 2>&1

 build/src/test/regress/log/initdb.log 
Running in noclean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "myon".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  de_DE.utf8
  CTYPE:de_DE.utf8
  MESSAGES: C
  MONETARY: de_DE.utf8
  NUMERIC:  de_DE.utf8
  TIME: de_DE.utf8
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "german".

Data page checksums are disabled.

creating directory 
/home/myon/postgresql-9.6/postgresql-9.6-9.6.3/build/src/test/regress/./tmp_check/data
 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... Segmentation fault (core dumped)
child process exited with exit code 139


$ gdb build/tmp_install/usr/lib/postgresql/9.6/bin/postgres 
build/src/test/regress/tmp_check/data/core 
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "mips64el-linux-gnuabi64".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from 
build/tmp_install/usr/lib/postgresql/9.6/bin/postgres...done.
[New LWP 24217]
[Thread debugging using libthread_db enabled]
Using host libthread_db library 
"/lib/mips64el-linux-gnuabi64/libthread_db.so.1".
Core was generated by 
`/home/myon/postgresql-9.6/postgresql-9.6-9.6.3/build/tmp_install/usr/lib/postgr'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00aababa6634 in EventTriggerEndCompleteQuery ()
at ./build/../src/backend/commands/event_trigger.c:1263
1263MemoryContextDelete(currentEventTriggerState->cxt);
(gdb) bt full
#0  0x00aababa6634 in EventTriggerEndCompleteQuery ()
at ./build/../src/backend/commands/event_trigger.c:1263
prevstate = 
#1  0x00aabad6d508 in ProcessUtilitySlow (
parsetree=parsetree@entry=0xaac3688428, 
queryString=queryString@entry=0xaac3687888 "REVOKE ALL on pg_authid FROM 
public;\n", context=context@entry=PROCESS_UTILITY_TOPLEVEL, 
params=params@entry=0x0, 
completionTag=completionTag@entry=0x985218 "", 
dest=0xaabb0a0378 ) at 
./build/../src/backend/tcop/utility.c:1582
isTopLevel = 1 '\001'
isCompleteQuery = 1 '\001'
needCleanup = 0 '\000'
commandCollected = 
address = {classId = 2, objectId = 0, objectSubId = 0}
secondaryObject = {classId = 0, objectId = 0, objectSubId = 0}
#2  0x00aabad6c6cc in standard_ProcessUtility (parsetree=0xaac3688428, 
queryString=0xaac3687888 "REVOKE ALL on pg_authid FROM public;\n", 
context=, params=0x0, dest=0xaabb0a0378 , 
completionTag=0x985218 "") at ./build/../src/backend/tcop/utility.c:907
isTopLevel = 1 '\001'
__func__ = "standard_ProcessUtility"
#3  0x00aabad6d33c in ProcessUtility (parsetree=, 
queryString=, context=, params=, 
dest=, completionTag=)
at ./build/../src/backend/tcop/utility.c:336
No locals.
#4  0x00aabad68e80 in PortalRunUtility (portal=portal@entry=0xaac368a8a8, 
utilityStmt=utilityStmt@entry=0xaac3688428, 
isTopLevel=isTopLevel@entry=1 '\001', 
setHoldSnapshot=setHoldSnapshot@entry=0 '\000', 
dest=0xaabb0a0378 , completionTag=0x985218 "")
at ./build/../src/backend/tcop/pquery.c:1193
snapshot = 0xaac368c8e8
__func__ = "PortalRunUtility"
#5  0x00aabad69d70 in PortalRunMulti (portal=portal@entry=0xaac368a8a8, 

Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-08-13 Thread Alik Khilazhev
Hello Fabien,

> 
> I think that this method should be used for a>1, and the other very rough one 
> can be kept for parameter a in [0, 1), a case which does not make much sense 
> to a mathematician as it diverges if unbounded.

Now “a” does not have an upper bound; that’s why, when using the iterative
algorithm with a >= 1, the program will get stuck in an infinite loop because
of the following line of code:
double b = pow(2.0, s - 1.0);
After overflow, “b” becomes “+Inf”.

So should an upper bound for “a” be set?

Should I mention in the docs that two algorithms are used depending on
the value of a (s/theta)?

In the attached patch, I have added a computeIterativeZipfian method and its usage
in getZipfianRand.
Would it be better to move the cache-based computation to a new method, so that
getZipfianRand contains only the two computeXXXZipfian method calls?


pgbench-zipf-06v.patch
Description: Binary data
—
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Patches I'm thinking of pushing shortly

2017-08-13 Thread Gavin Flower

On 13/08/17 16:19, Thomas Munro wrote:

On Sat, Aug 12, 2017 at 3:24 AM, Tom Lane  wrote:

[...]


I'd vote for including this in v10.  There doesn't seem to be any
downside to this: it's a no brainer to avoid our exploding hash table
case when we can see it coming.


But explosions are fun!
< ducks, and runs away VERY rapidly>

More seriously:
Up to now I'd been avoiding hash indexes, so great news!


Cheers,
Gavin





Re: [HACKERS] Pluggable storage

2017-08-13 Thread Amit Kapila
On Sat, Aug 12, 2017 at 10:34 AM, Haribabu Kommi
 wrote:
>
> On Tue, Aug 8, 2017 at 2:21 PM, Amit Kapila  wrote:
>>
>> On Tue, Jun 13, 2017 at 7:20 AM, Haribabu Kommi
>>  wrote:
>> >
>> >
>> > On Fri, Oct 14, 2016 at 7:26 AM, Alvaro Herrera
>> > 
>> > wrote:
>> >>
>> >> I have sent the partial patch I have to Hari Babu Kommi.  We expect
>> >> that
>> >> he will be able to further this goal some more.
>> >
>> >
>> > Thanks Alvaro for sharing your development patch.
>> >
>> > Most of the patch design is same as described by Alvaro in the first
>> > mail
>> > [1].
>> > I will detail the modifications, pending items and open items (needs
>> > discussion)
>> > to implement proper pluggable storage.
>> >
>> > Here I attached WIP patches to support pluggable storage. The patch
>> > series
>> > are may not work individually. Still so many things are under
>> > development.
>> > These patches are just to share the approach of the current development.
>> >
>>
>> +typedef struct StorageAmRoutine
>> +{
>>
>> In this structure, you have already covered most of the API's that a
>> new storage module needs to provide, but I think there could be more.
>> One such API could be heap_hot_search.  This seems specific to current
>> heap where we have the provision of HOT.  I think we can provide a new
>> API tuple_search or something like that.
>
>
> Thanks for the review.
>
> Yes, the storageAmRoutine needs more function pointers. Currently I am
> adding all the functions that are present in the heapam.h and some slot
> related function from tuptable.h.
>

Hmm, this API is exposed via heapam.h.  Am I missing something?

> Once I stabilize the code and API's that
> are
> currently added, then I will further enhance it with remaining functions
> that
> are necessary to support pluggable storage API.
>

Sure, but I think if we find anything during development/review,
then we should either add it immediately or at the very least add a
FIXME in a patch to avoid forgetting the finding.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Pluggable storage

2017-08-13 Thread Amit Kapila
On Sat, Aug 12, 2017 at 10:31 AM, Haribabu Kommi
 wrote:
>>
>> Why do we need to store handler function in TupleDesc?  As of now, the
>> above patch series has it available in RelationData and
>> TupleTableSlot, I am not sure if instead of that keeping it in
>> TupleDesc is a good idea.  Which all kind of places require TupleDesc
>> to contain handler?  If those are few places, can we think of passing
>> it as a parameter?
>
>
> Till now I have been able to proceed without adding any storage handler
> functions to
> the TupleDesc structure. Sure, I will try passing it as a parameter when
> there is a need for it.
>

Okay, I think it is better if you discuss such locations before
directly modifying them.

> While making progress on the patch, I am facing problems in designing the
> storage API
> regarding the Buffer. For example, to replace all the HeapTupleSatisfiesMVCC
> and
> related functions with function pointers: in the HeapTuple format, the tuple
> belongs
> to one buffer, so the buffer is passed to the HeapTupleSatisfies***
> functions along
> with the tuple. But in the case of other storage formats, a single buffer may not
> contain
> the actual data.
>

Also, it is quite possible that some of the storage AMs don't even
want to return a bool from the HeapTupleSatisfies* APIs.  I
guess what we need here is to provide a way for different storage
AMs to register their own function pointer as the equivalent of the
satisfies function.  So, we need to change
SnapshotData.SnapshotSatisfiesFunc in some way so that different
handlers can register their function instead of using that directly.
I think that should address the problem you are planning to solve by
omitting the buffer parameter.

> This buffer is used to set the hint bits and to mark the
> buffer as dirty.
> If the buffer is not available, performance of the
> following
> queries may suffer because the hint bits are not set.
>

I don't think it is advisable to change that for the current heap.

> Also, the Buffer is obtained from heap_fetch, heap_lock_tuple and
> related
> functions to check tuple visibility, but currently returning a buffer
> from the above
> heap_** functions is not possible for other formats.
>

Why not?  I mean, if we consider that all the formats we are worried about at
this stage have a TID (block number, tuple location), then we can get
the buffer.  We might want to consider passing the TID as a parameter to
these APIs if required to make that possible.  You also agreed above
[1] that we can first design the API considering storage formats
having TIDs.

> Also, for
> HeapTuple data,
> the tuple data is copied into a palloc'd buffer instead of pointing directly
> to the page.
> So, is returning a Buffer valid here or not?
>

Yeah, but for the sake of compatibility, and of not changing the current
APIs' signatures too much, I think we should try to avoid it.

> Currently I am proceeding to remove the Buffer as a parameter in the API and
> proceed
> further. If it affects performance, we will need to find a
> different approach
> to handling the hint bits.
>

Leaving aside the performance concern, I am not convinced that it is a
good idea to remove Buffer as a parameter from the APIs you have
mentioned above.  Would you mind thinking it over once again, keeping in mind
the suggestions provided above in this email, to see if we can avoid
removing Buffer as a parameter?


[1] - 
https://www.postgresql.org/message-id/CAJrrPGd8%2Bi8sqZCdhfvBhs2d1akEb_kEuBvgRHSPJ9z2Z7VBJw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

