Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-10-18 Thread Amit Langote
On 2016/10/19 12:20, Etsuro Fujita wrote:
> Having said that, I like the latest version (v6), so I'd vote for marking
> this as Ready For Committer.

+1, thanks a lot!

Regards,
Amit






Re: [HACKERS] Parallel bitmap heap scan

2016-10-18 Thread Dilip Kumar
On Tue, Oct 18, 2016 at 1:45 AM, Andres Freund  wrote:
> I don't quite understand why the bitmap has to be parallel at all. As
> far as I understand your approach as described here, the only thing that
> needs to be shared are the iteration arrays.  Since they never need to
> be resized and such, it seems to make a lot more sense to just add an
> API to share those, instead of the whole underlying hash.

You are right that we only share the iteration arrays. The only point is
that each entry of the iteration array is just a pointer to a hash entry.
So either we need to build the hash in shared memory (my current approach)
or we need to copy each hash element to a shared location (which I think
is going to be expensive).

Let me know if I am missing something..
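
To illustrate, here is a simplified sketch (not the actual tidbitmap.c
layout; all names and sizes are illustrative): each slot of the shared
iteration array is just a pointer into the page-table hash, so sharing only
the array does not help unless the hash entries it points to are also in
memory every worker can see.

#include <stdint.h>

typedef uint32_t BlockNumber;

typedef struct PageEntry            /* one hash entry per heap page */
{
    BlockNumber blockno;            /* hash key: heap page number */
    uint64_t    words[4];           /* bitmap of tuple offsets (size illustrative) */
} PageEntry;

typedef struct SharedIterState      /* what the workers would share */
{
    int         nentries;           /* number of pages to visit */
    int         next;               /* next slot to hand out (atomic in reality) */
    PageEntry  *entries[1];         /* pointers into the hash -- hence the problem */
} SharedIterState;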

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Parallel bitmap heap scan

2016-10-18 Thread Dilip Kumar
On Tue, Oct 18, 2016 at 1:42 AM, Robert Haas  wrote:
> But what's the impact on performance?  Presumably parallel bitmap heap
> scan was already slower than the non-parallel version, and that commit
> presumably widens the gap.  Seems like something to worry about...

I have checked the performance on my local machine and there is no
impact on the gap.
If you look at the explain analyze output of all the queries that
benefited from parallel bitmap heap scan, the BitmapIndex node takes
very little time compared to the BitmapHeap node.


Actual execution time on head (before efficient hash table patch)

 Query   BitmapHeapNode   BitmapIndexNode
 Q6      38997            6951
 Q14     14516            569
 Q15     28530            1442

Out of the 4 queries, Q4 is converted from a parallel seqscan to a parallel
bitmap scan, so there is no impact there.
For Q14 and Q15, the time spent in the BitmapIndex node is < 5% of the time
spent in the BitmapHeap node. For Q6 it's about 20%, but I did not see much
impact on this on my local machine. However, I will take complete
performance readings and post the data from my actual performance machine.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Change of schedule for the next batch of PG update releases

2016-10-18 Thread Tom Lane
We previously stated that the next regularly-scheduled set of PG
back-branch updates would be the week of Nov. 7th, announcing on
the 10th.  In view of some particularly nasty bugs that have
surfaced in 9.6.0 (e.g. pg_upgrade's corruption of visibility maps),
we're going to move that up by two weeks: we'll wrap a set of releases
on Monday Oct 24 for announcement on Thursday Oct 27.

Note that this will be the last release in the 9.1.X series.

regards, tom lane




Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-10-18 Thread Etsuro Fujita

On 2016/10/18 22:04, Ashutosh Bapat wrote:

On Tue, Oct 18, 2016 at 6:18 PM, Etsuro Fujita
 wrote:

On 2016/10/17 20:12, Ashutosh Bapat wrote:

On 2016/10/07 10:26, Amit Langote wrote:

I think this (v4) patch is in the best shape so far.



+1, except one small change.



The comment "... On relcache invalidation events or the relevant
syscache invalidation events, ..." specifies relcache separately. I
think it either needs to specify all the syscaches or just mention
"On corresponding syscache invalidation events, ...".

The attached patch uses the latter version. Let me know if this looks good.
If you are fine with it, I think we should mark this as "Ready for committer".



I'm not sure that is a good idea because ISTM the comments there use the
words "syscache" and "relcache" properly.  I'd like to leave that for
committers.



Fine with me.


OK


One thing I'd like to propose to improve the patch is to update the
following comment for PlanCacheFuncCallback: "Note that the coding would
support use for multiple caches, but right now only user-defined functions
are tracked this way.".  We now use this for tracking FDW-related objects as
well, so the comment needs to be updated. Please find attached an updated
version.



Fine with me.


OK


Sorry for bringing this up only now, but one thing I'm not sure about in the
patch is whether we should track the dependencies on FDW-related objects more
accurately; we modified extract_query_dependencies so that the query tree's
dependencies are tracked regardless of the command type, but the query tree
would only be affected by those objects in AddForeignUpdateTargets, so it
would be enough to collect the dependencies for the case where the given
query is UPDATE/DELETE.  But I thought the patch would probably be fine as
proposed, because we expect updates on such catalogs to be infrequent.  (I
guess the changes needed for the accuracy would be small, though.)



In fact, I am not in
favour of tracking the query dependencies for UPDATE/DELETE since we
don't have any concrete example as to when that would be needed.


Right, but as I said before, some FDW might consult FDW options stored 
in those objects during AddForeignUpdateTargets, so we should do that.



Besides
that, I modified add_rte_to_flat_rtable so that the plan's dependencies are
tracked, but that would lead to tracking the dependencies of unreferenced
foreign tables in dead subqueries or the dependencies of foreign tables
excluded from the plan by, e.g., constraint exclusion.  But I thought that
would also be OK for the same reason as above.  (Another reason for that was
that it seemed better to me to collect the dependencies in the same place as
for relation OIDs.)



If those unreferenced relations become referenced because of the
changes in options, we will require those query dependencies to be
recorded. So, if we are recording query dependencies, we should record
the ones on unreferenced relations as well.


I mean plan dependencies here, not query dependencies.

Having said that, I like the latest version (v6), so I'd vote for 
marking this as Ready For Committer.


Best regards,
Etsuro Fujita






Re: [HACKERS] "make check" and pg_hba.conf

2016-10-18 Thread Peter Eisentraut
On 10/18/16 6:31 PM, Jeff Janes wrote:
> I would want to do that so that the code dealing with password-based
> logins doesn't go completely untested by "make check", like it currently is.

Write a TAP test for it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Hash Indexes

2016-10-18 Thread Amit Kapila
On Wed, Oct 19, 2016 at 5:57 AM, Amit Kapila  wrote:
> On Tue, Oct 18, 2016 at 10:52 PM, Robert Haas  wrote:
>> On Tue, Oct 18, 2016 at 5:37 AM, Amit Kapila  wrote:
>>> On Wed, Oct 5, 2016 at 10:22 PM, Robert Haas  wrote:
 On Tue, Oct 4, 2016 at 12:36 AM, Amit Kapila  
 wrote:
> I think one way to avoid the risk of deadlock in above scenario is to
> take the cleanup lock conditionally, if we get the cleanup lock then
> we will delete the items as we are doing in patch now, else it will
> just mark the tuples as dead and ensure that it won't try to remove
> tuples that are moved-by-split.  Now, I think the question is how will
> these dead tuples be removed.  We anyway need a separate mechanism to
> clear dead tuples for hash indexes as during scans we are marking the
> tuples as dead if corresponding tuple in heap is dead which are not
> removed later.  This is already taken care in btree code via
> kill_prior_tuple optimization.  So I think clearing of dead tuples can
> be handled by a separate patch.

 That seems like it could work.
>>>
>>> I have implemented this idea and it works for MVCC scans.  However, I
>>> think this might not work for non-MVCC scans.  Consider a case where
>>> in Process-1, a hash scan has returned one row and, before it could check
>>> its validity in the heap, vacuum marks that tuple as dead, removes the
>>> entry from the heap, and some new tuple has been placed at that offset in
>>> the heap.
>>
>> Oops, that's bad.
>>
>>> Now when Process-1 checks the validity in the heap, it will check
>>> a different tuple than what the index tuple was supposed to check.
>>> If we want, we can make it work similarly to what btree does, as being
>>> discussed on thread [1], but for that we need to introduce page-scan
>>> mode in hash indexes as well.   However, do we really want to solve
>>> this problem as part of this patch when it exists for other index AMs
>>> as well?
>>
>> For what other index AM does this problem exist?
>>
>
> By this problem, I mean deadlocks for suspended scans, which can
> happen in btree for non-MVCC or other types of scans where we don't
> release the pin during the scan.  In my mind, we have the below options:
>
> a. problem of deadlocks for suspended scans should be tackled as a
> separate patch as it exists for other indexes (at least for some type
> of scans).
> b. Implement page-scan mode and then we won't have deadlock problem
> for MVCC scans.
> c. Let's not care for non-MVCC scans unless we have some way to hit
> those for hash indexes and proceed with Dead tuple marking idea.  I
> think even if we don't care for non-MVCC scans, we might hit this
> problem (deadlocks) when the index relation is unlogged.
>

oops, my last sentence is wrong. What I wanted to say is: "I think
even if we don't care for non-MVCC scans, we might hit the problem of
TIDs reuse when the index relation is unlogged."

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Query cancel seems to be broken in master since Oct 17

2016-10-18 Thread Michael Paquier
On Tue, Oct 18, 2016 at 11:40 PM, Tom Lane  wrote:
> I wrote:
>> The cleanest fix might be to change those various "long" variables
>> to uint32.  You'd have to think about how to handle the ntohl/htonl
>> calls that are used on them, though.
>
> Or actually, no, you wouldn't have to think very hard.  I was supposing
> that those calls were declared to traffic in "long"s, but they aren't
> and never have been, at least not since SUSv2:
>
> uint32_t htonl(uint32_t hostlong);
> uint16_t htons(uint16_t hostshort);
> uint32_t ntohl(uint32_t netlong);
> uint16_t ntohs(uint16_t netshort);
>
> So s/long/uint32/ would probably fix it just fine.

The pg_strong_random patch has been reverted; what's done is done.
FWIW, only the following is needed to fix the problem, as already
mentioned by Tom:
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index f232083..634578d 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -38,7 +38,7 @@ volatile uint32 CritSectionCount = 0;
 int            MyProcPid;
 pg_time_t      MyStartTime;
 struct Port *MyProcPort;
-long           MyCancelKey;
+uint32         MyCancelKey;
 int            MyPMChildSlot;

 /*
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 78545da..5e623a1 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -162,7 +162,7 @@ extern PGDLLIMPORT int MyProcPid;
 extern PGDLLIMPORT pg_time_t MyStartTime;
 extern PGDLLIMPORT struct Port *MyProcPort;
 extern PGDLLIMPORT struct Latch *MyLatch;
-extern long MyCancelKey;
+extern uint32 MyCancelKey;
 extern int MyPMChildSlot;

 extern char OutputFileName[];

I'll send an update for that, as well as a solution for the pademelon
problem on the SCRAM thread.
-- 
Michael




Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-10-18 Thread Haribabu Kommi
On Wed, Oct 19, 2016 at 5:11 AM, Robert Haas  wrote:

> On Thu, Sep 29, 2016 at 1:45 AM, Haribabu Kommi
>  wrote:
> > Currently, the SQL stats is a fixed-size counter to track all the ALTER
> > cases as a single counter. So while sending the stats from the backend to
> > the stats collector at the end of the transaction, the cost is the same,
> > because of its fixed size. The overhead this approach adds to send and
> > read the stats is minimal.
> >
> > With the following approach, I feel it is possible to support the counter
> > at command tag level.
> >
> > Add a global and a local hash to keep track of the counters by using the
> > command tag as the key; this hash table grows dynamically whenever a new
> > type of SQL command gets executed. The local hash data is passed to the
> > stats collector whenever the transaction gets committed.
> >
> > The problem I am thinking of is that sending data from the hash and
> > populating the hash from the stats file for all the command tags adds
> > some overhead.
>
> Yeah, I'm not very excited about that overhead.  This seems useful as
> far as it goes, but I don't really want to incur measurable overhead
> when it's in use.  Having a hash table rather than a fixed array of
> slots means that you have to pass this through the stats collector
> rather than updating shared memory directly, which is fairly heavy
> weight.  If each backend could have its own copy of the slot array and
> just update that, and readers added up the values across the whole
> array, this could be done without any locking at all, and it would
> generally be much lighter-weight than this approach.


With limited information, like combining all the ALTER XXX cases into a
single Alter counter, the fixed-array logic works fine without any
problems. But people are suggesting providing more details, like ALTER
VIEW and so on, so I checked the following approaches:

1. Using the nodetag as an array index to update the counter. But this
approach doesn't work for some cases like T_DropStmt, where the tag
varies based on the removeType of the DropStmt, so I decided to drop
this approach.

2. Using the tag name to search a fixed-size array that is sorted with
all the command tags possible in PostgreSQL. Using a binary search, find
the location and update the counter. In this approach, the array first
needs to be filled with tag information in sorted order.

3. Using a hash table to store the counter information with the tag name
as the key.

I chose approach 3 as it can scale to any additional commands.

Are there any better alternatives that can directly point to the array
index?

If we are going with the approach of combining all ALTER XXX cases into a
single Alter counter, then I will change the code to a fixed-array
approach.
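
For reference, a minimal sketch of the lock-free per-backend slot array
Robert describes above: each backend writes only its own fixed-size row in
shared memory, and readers sum one column across all backends. All names
and sizes here are illustrative, not the actual pg_stat_sql code.

#include <stdint.h>

#define MAX_BACKENDS   128          /* stand-in for MaxBackends */
#define NUM_TAG_SLOTS   32          /* one slot per tracked command tag */

/* One row per backend, living in shared memory. */
typedef struct SqlStatCounters
{
    uint64_t counts[MAX_BACKENDS][NUM_TAG_SLOTS];
} SqlStatCounters;

/* Writer: a backend bumps only its own row, so no lock is needed. */
static inline void
count_command(SqlStatCounters *stats, int backend_id, int tag_slot)
{
    stats->counts[backend_id][tag_slot]++;
}

/* Reader: the view function sums one column across all backends. */
static inline uint64_t
total_for_tag(const SqlStatCounters *stats, int tag_slot)
{
    uint64_t total = 0;

    for (int i = 0; i < MAX_BACKENDS; i++)
        total += stats->counts[i][tag_slot];
    return total;
}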

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Hash Indexes

2016-10-18 Thread Amit Kapila
On Tue, Oct 18, 2016 at 10:52 PM, Robert Haas  wrote:
> On Tue, Oct 18, 2016 at 5:37 AM, Amit Kapila  wrote:
>> On Wed, Oct 5, 2016 at 10:22 PM, Robert Haas  wrote:
>>> On Tue, Oct 4, 2016 at 12:36 AM, Amit Kapila  
>>> wrote:
 I think one way to avoid the risk of deadlock in above scenario is to
 take the cleanup lock conditionally, if we get the cleanup lock then
 we will delete the items as we are doing in patch now, else it will
 just mark the tuples as dead and ensure that it won't try to remove
 tuples that are moved-by-split.  Now, I think the question is how will
 these dead tuples be removed.  We anyway need a separate mechanism to
 clear dead tuples for hash indexes as during scans we are marking the
 tuples as dead if corresponding tuple in heap is dead which are not
 removed later.  This is already taken care in btree code via
 kill_prior_tuple optimization.  So I think clearing of dead tuples can
 be handled by a separate patch.
>>>
>>> That seems like it could work.
>>
>> I have implemented this idea and it works for MVCC scans.  However, I
>> think this might not work for non-MVCC scans.  Consider a case where
>> in Process-1, a hash scan has returned one row and, before it could check
>> its validity in the heap, vacuum marks that tuple as dead, removes the
>> entry from the heap, and some new tuple has been placed at that offset in
>> the heap.
>
> Oops, that's bad.
>
>> Now when Process-1 checks the validity in the heap, it will check
>> a different tuple than what the index tuple was supposed to check.
>> If we want, we can make it work similarly to what btree does, as being
>> discussed on thread [1], but for that we need to introduce page-scan
>> mode in hash indexes as well.   However, do we really want to solve
>> this problem as part of this patch when it exists for other index AMs
>> as well?
>
> For what other index AM does this problem exist?
>

By this problem, I mean deadlocks for suspended scans, which can
happen in btree for non-MVCC or other types of scans where we don't
release the pin during the scan.  In my mind, we have the below options:

a. problem of deadlocks for suspended scans should be tackled as a
separate patch as it exists for other indexes (at least for some type
of scans).
b. Implement page-scan mode and then we won't have deadlock problem
for MVCC scans.
c. Let's not care for non-MVCC scans unless we have some way to hit
those for hash indexes and proceed with Dead tuple marking idea.  I
think even if we don't care for non-MVCC scans, we might hit this
problem (deadlocks) when the index relation is unlogged.

Here, even if we want to go with (b), I think we can handle it in a
separate patch, unless you think otherwise.
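
For illustration, a rough sketch of the conditional-cleanup-lock idea
quoted above (all names here are stand-ins, not the hash AM's actual
functions): if the cleanup lock is available, dead items are removed at
once; otherwise they are only flagged, skipping moved-by-split tuples, so
a later pass can reclaim them.

#include <stdbool.h>

typedef struct IndexItem
{
    bool    dead;               /* flagged for later reclamation */
    bool    moved_by_split;     /* relocated by a concurrent bucket split */
    bool    heap_tuple_dead;    /* the heap tuple it points to is gone */
} IndexItem;

/* Stand-ins for ConditionalLockBufferForCleanup() and item removal. */
extern bool try_cleanup_lock(void *page);
extern void remove_item(void *page, IndexItem *item);

static void
vacuum_bucket_page(void *page, IndexItem *items, int nitems)
{
    if (try_cleanup_lock(page))
    {
        /* No scan holds a pin: physically delete, as the patch does today. */
        for (int i = 0; i < nitems; i++)
            if (items[i].heap_tuple_dead)
                remove_item(page, &items[i]);
    }
    else
    {
        /* Could not get the lock: only mark items dead, and never touch
         * tuples that were moved by a split. */
        for (int i = 0; i < nitems; i++)
            if (items[i].heap_tuple_dead && !items[i].moved_by_split)
                items[i].dead = true;
    }
}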


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2016-10-18 Thread Bruce Momjian
On Wed, Oct 12, 2016 at 08:33:00AM -0400, Tom Lane wrote:
> As others have noted, there is no likelihood that we'd take a disk-format-
> compatibility-breaking patch for v10.  Even if we wanted to do that, the
> above proposal would also break send/recv (binary COPY) compatibility for
> macaddr.
> 
> I think that probably the best bet here is to have two types and put some
> thought into making them interoperate where appropriate, as the various
> sizes of int do.  It's kind of a shame that this won't look like the
> approach used for inet addresses, but we're stuck.

It feels like we are going into VARCHAR2 territory where we end up
telling people to use an oddly-named data type forever.  Some are
suggesting JSONB is in that category.

I wish I had a suggestion, but I am not above adding trickery to
pg_upgrade to improve the outcome.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Indirect indexes

2016-10-18 Thread Claudio Freire
On Tue, Oct 18, 2016 at 5:48 PM, Simon Riggs  wrote:
> On 18 October 2016 at 22:04, Claudio Freire  wrote:
>> On Tue, Oct 18, 2016 at 3:28 PM, Alvaro Herrera
>>  wrote:
>>> I propose we introduce the concept of "indirect indexes".  I have a toy
>>> implementation and before I go further with it, I'd like this assembly's
>>> input on the general direction.
>>>
>>> Indirect indexes are similar to regular indexes, except that instead of
>>> carrying a heap TID as payload, they carry the value of the table's
>>> primary key.  Because this is laid out on top of existing index support
>>> code, values indexed by the PK can only be six bytes long (the length of
>>> ItemPointerData); in other words, 281,474,976,710,656 rows are
>>> supported, which should be sufficient for most use cases.[1]
>>
>>
>> You don't need that limitation (and vacuum will be simpler) if you add
>> the PK as another key, akin to:
>>
>> CREATE INDIRECT INDEX idx ON tab (a, b, c);
>>
>> turns into
>>
>> CREATE INDEX idx ON tab (a, b, c, pk);
>>
>> And is queried appropriately (using an index-only scan, extracting the
>> PK from the index tuple, and then querying the PK index to get the
>> tids).
>>
>> In fact, I believe that can work with all index ams supporting index-only 
>> scans.
>
> But if you did that, an UPDATE of a b or c would cause a non-HOT
> update, so would defeat the purpose of indirect indexes.

I meant besides all the other work, omitting the tid from the index
(as only the PK matters), marking them indirect, and all that.




Re: [HACKERS] "make check" and pg_hba.conf

2016-10-18 Thread Tom Lane
Jeff Janes  writes:
> On Tue, Oct 18, 2016 at 2:28 PM, Tom Lane  wrote:
>> No.  Why would you want that?  External connections to the test DB seem
>> like exactly what you *don't* want, for reproducibility's sake.

> I would want to do that so that the code dealing with password-based logins
> doesn't go completely untested by "make check", like it currently is.

Well, unless you create a user with a password and use that password to
log in, you're not performing much of a test.  Getting the hba entry
in there seems to me to be about the least of the worries for making
a usable test case.

regards, tom lane




Re: [HACKERS] "make check" and pg_hba.conf

2016-10-18 Thread Jeff Janes
On Tue, Oct 18, 2016 at 2:28 PM, Tom Lane  wrote:

> Jeff Janes  writes:
> > Is there a way to get "make check" to install a custom pg_hba.conf for
> its
> > temporary installation? Something like pre-prending the line:
> > local  all  password_user  md5
> > To the default pg_hba.conf?
>
> No.  Why would you want that?  External connections to the test DB seem
> like exactly what you *don't* want, for reproducibility's sake.
>

I would want to do that so that the code dealing with password-based logins
doesn't go completely untested by "make check", like it currently is.

I don't see how connecting to an obscure unix socket with a password is
more "external" than connecting to the same obscure unix socket without a
password.

Cheers,

Jeff


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-18 Thread Petr Jelinek
On 18/10/16 22:25, Robert Haas wrote:
> On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra
>  wrote:
>> attached is v3 of the patches, with a few minor fixes in Slab, and much
>> larger fixes in GenSlab.
>>
>> Slab (minor fixes)
>> --
>> - Removed the unnecessary memset() of new blocks in SlabAlloc(), although we
>> still need to zero the free bitmap at the end of the block.
>> - Renamed minFreeCount to minFreeChunks, added a few comments explaining
>> why/how firstFreeChunk and minFreeChunks are maintained.
>> - Fixed / improved a bunch of additional comments, based on feedback.
> 
> I had a look at 0001 today, but it seems to me that it still needs
> work.  It's still got a lot of remnants of where you've
> copy-and-pasted aset.c.  I dispute this allegation:
> 
> + * SlabContext is our standard implementation of MemoryContext.
> 

Are you looking at old version of the patch? I complained about this as
well and Tomas has changed that.

> And then there's this:
> 
> +#ifdef HAVE_ALLOCINFO
> +#define SlabFreeInfo(_cxt, _chunk) \
> +fprintf(stderr, "AllocFree: %s: %p, %d\n", \
> +(_cxt)->header.name, (_chunk), (_chunk)->size)
> +#define SlabAllocInfo(_cxt, _chunk) \
> +fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
> +(_cxt)->header.name, (_chunk), (_chunk)->size)
> 
> Well, it's pretty stupid that AllocSetAlloc is reporting its name as
> AllocAlloc, a thing that, as far as I can tell, is not real.  But
> having this new context type also pretend to be AllocAlloc is even
> dumber.

You are definitely looking at old version.

> 
> More broadly, I'm not sure I like this design very much.  The whole
> point of a slab context is that all of the objects are the same size.
> I wouldn't find it too difficult to support this patch if we were
> adding an allocator for fixed-size objects that was then being used to
> allocate objects which are fixed size.  However, what we seem to be
> doing is creating an allocator for fixed-size objects and then using
> it for variable-size tuples.  That's really pretty weird.  Isn't the
> root of this problem that aset.c is utterly terrible at handling large
> number of allocations?  Maybe we should try to attack that problem
> more directly.

It's used for TXNs, which are fixed size, and for some tuples (there is an
assumption that the decoded tuples have a more or less normal
distribution).

I agree though that the usability beyond the ReorderBuffer is limited,
because anything that wants to use it for part of its allocations will
get much more complicated by the fact that it will have to use two
different allocators.

I was wondering if, rather than trying to implement a new allocator, we
should maybe implement palloc_fixed, which would use some optimized
algorithm for fixed-size objects in our current allocator.  The
advantage of that would be that we could, for example, use it for things
like ListCell easily (the memory management of which I see quite often
in profiles).
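
A tiny sketch of why a fixed-size path can be so cheap (this only
illustrates the general technique, not the proposed palloc_fixed API):
with a single chunk size, free chunks can be threaded onto a singly linked
free list, making both allocation and free O(1) pointer swaps.

#include <stdlib.h>

typedef struct FreeChunk
{
    struct FreeChunk *next;     /* free chunks are linked through their own bytes */
} FreeChunk;

typedef struct FixedPool
{
    size_t      chunk_size;     /* every chunk has this size (>= sizeof(FreeChunk)) */
    FreeChunk  *freelist;       /* head of the recycled-chunk list */
} FixedPool;

static void *
pool_alloc(FixedPool *pool)
{
    if (pool->freelist != NULL)
    {
        FreeChunk *chunk = pool->freelist;

        pool->freelist = chunk->next;       /* pop: O(1) */
        return chunk;
    }
    /* Nothing recycled; a real allocator would carve chunks from a block. */
    return malloc(pool->chunk_size);
}

static void
pool_free(FixedPool *pool, void *ptr)
{
    FreeChunk  *chunk = (FreeChunk *) ptr;

    chunk->next = pool->freelist;           /* push: O(1) */
    pool->freelist = chunk;
}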

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Renaming of pg_xlog and pg_clog

2016-10-18 Thread Bruce Momjian
On Fri, Oct 14, 2016 at 07:19:02PM +0200, Christoph Berg wrote:
> Re: Stephen Frost 2016-10-14 <20161014113523.gz13...@tamriel.snowman.net>
> > > We have a tool called pg_xlogdump in the standard installation.  initdb
> > > has an option --xlogdir, pg_basebackup has --xlog and others.  Renaming
> > > the xlog directory would make this all a bit confusing, unless we're
> > > prepared to rename the programs and options as well.
> > 
> > pg_xlogdump is not a user-facing tool, frankly, so I don't believe we
> > should be terribly concerned about either leaving it named as-is or
> > renaming it.  I agree that we should consider adding alternative names
> > to the options for initdb and pg_basebackup.
> 
> There's actually another instance of "rename so people shoot their
> feet less often" here: pg_resetxlog, which is a user-facing tool.
> Folks on #postgresql have repeatedly been joking that it should rather
> be named pg_eatmydata, given the number of "howtos" on the web that
> make use of it. Maybe this is the chance to find a less
> innocent-sounding name. (Or add a mandatory --rewind-my-data switch.)

I wonder how many of the uses of pg_resetxlog were caused by mistakenly
removing the pg_xlog directory.  My point is that renaming pg_xlog might
reduce mistaken use of pg_resetxlog.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Mention column name in error messages

2016-10-18 Thread Michael Paquier
On Wed, Oct 19, 2016 at 3:14 AM, Robert Haas  wrote:
> On Mon, Oct 17, 2016 at 3:18 AM, Michael Paquier
>  wrote:
>> Andres wrote:
>>> +/* Support for TransformExprCallback */
>>> +typedef struct TransformExprState
>>> +{
>>> + const char *column_name;
>>> + Oid expected_type;
>>> +} TransformExprState;
>>> I see no need for this to be a struct defined in the header. Given that
>>> TransformExprCallback isn't public, and the whole thing is specific to
>>> TransformExprState...
>>
>> That's a matter of taste, really. Personally I find it cleaner to declare
>> that with the other static declarations at the top of the file, and
>> keep the comments of transformAssignedExpr clean of everything.
>
> It's pretty standard practice for PostgreSQL to keep declarations
> private to particular files whenever they are used only in that file.
> And I'd argue that it's good practice in general.

Yes, definitely. My comment was referring to the fact that the
declaration of this structure was not at the top of this .c file, as
the last version of the patch has it (it would be better to declare
it at the beginning of the .c file for clarity, actually). It seems
like I did not get Andres' review point; sorry for that.
-- 
Michael




Re: [HACKERS] Indirect indexes

2016-10-18 Thread Alexander Korotkov
On Wed, Oct 19, 2016 at 12:21 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Oct 18, 2016 at 9:28 PM, Alvaro Herrera 
> wrote:
>
>> Vacuuming presents an additional challenge: in order to remove index
>> items from an indirect index, it's critical to scan the PK index first
>> and collect the PK values that are being removed.  Then scan the
>> indirect index and remove any items that match the PK items removed.
>> This is a bit problematic because of the additional memory needed to
>> store the array of PK values.  I haven't implemented this yet.
>>
>
> Imagine another situation: PK column was not updated, but indirect indexed
> column was updated.
> Thus, for single heap tuple we would have single PK tuple and two indirect
> index tuples (correct me if I'm wrong).
> How are we going to delete old indirect index tuple?
>

Let me explain it in more detail.

There is a table with two columns and indirect index on it.

CREATE TABLE tbl (id integer primary key, val integer);
CREATE INDIRECT INDEX tbl_val_indirect_idx ON tbl (val);

Then do insert and update.

INSERT INTO tbl VALUES (1, 1);
UPDATE tbl SET val = 2 WHERE id = 1;

Then heap would contain two tuples.

 ctid  | id | val
-------+----+-----
 (0;1) |  1 |   1
 (0;2) |  1 |   2

tbl_pk_idx would contain another two tuples

 id | item_pointer
----+--------------
  1 | (0;1)
  1 | (0;2)

And tbl_val_indirect_idx would have also two tuples

 val | id
-----+----
   1 |  1
   2 |  1

Then vacuum removes (0;1) from the heap and the reference to (0;1) from
tbl_pk_idx.  But how will it remove the (1,1) tuple from
tbl_val_indirect_idx?  Thus, before vacuuming tbl_val_indirect_idx we
should know not only the values of id which are being removed, but
actually the (id, val) pairs which are being removed.  Should we collect
those pairs while scanning the heap?  But we should also take into account
that multiple heap tuples might have the same (id, val) pair values
(assuming there could be other columns being updated).  Therefore, we
should take into account when the last instance of a particular (id, val)
pair value was deleted from the heap.  That would be a very large change to
vacuum, maybe even requiring a much more complex vacuum algorithm written
from scratch.  Probably you see a better solution to this problem.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Remove vacuum_defer_cleanup_age

2016-10-18 Thread Josh Berkus
On 10/18/2016 01:37 PM, Andres Freund wrote:
> Hi,
> 
> On 2016-10-09 21:51:07 -0700, Josh Berkus wrote:
>> Given that hot_standby_feedback is pretty bulletproof now, and a lot of
>> the work in reducing replay conflicts, I think the utility of
>> vacuum_defer_cleanup_age is at an end.  I really meant so submit a patch
>> to remove it to 9.6, but it got away from me.
> 
> HS feedback doesn't e.g. work well with delayed and/or archived replay,
> whereas defer_cleanup does.

Oh, point!  See, that's why I polled, I knew there was something I was
forgetting about.

> On the other hand, removing it would make some of the reasoning around
> GetOldestXmin() a bit easier.

Enough to make it worth breaking the above?

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] "make check" and pg_hba.conf

2016-10-18 Thread Tom Lane
Jeff Janes  writes:
> Is there a way to get "make check" to install a custom pg_hba.conf for its
> temporary installation? Something like pre-prending the line:
> local  all  password_user  md5
> To the default pg_hba.conf?

No.  Why would you want that?  External connections to the test DB seem
like exactly what you *don't* want, for reproducibility's sake.

Also, this seems like it's overlapping with the use case for making
an installation and using make installcheck.

regards, tom lane




[HACKERS] "make check" and pg_hba.conf

2016-10-18 Thread Jeff Janes
Is there a way to get "make check" to install a custom pg_hba.conf for its
temporary installation? Something like pre-prending the line:

local  all  password_user  md5

To the default pg_hba.conf?


Thanks,

Jeff


Re: [HACKERS] Indirect indexes

2016-10-18 Thread Alexander Korotkov
Hi, Alvaro!

Thank you for your proposal.  One question about vacuum excites me most.

On Tue, Oct 18, 2016 at 9:28 PM, Alvaro Herrera 
wrote:

> Vacuuming presents an additional challenge: in order to remove index
> items from an indirect index, it's critical to scan the PK index first
> and collect the PK values that are being removed.  Then scan the
> indirect index and remove any items that match the PK items removed.
> This is a bit problematic because of the additional memory needed to
> store the array of PK values.  I haven't implemented this yet.
>

Imagine another situation: PK column was not updated, but indirect indexed
column was updated.
Thus, for single heap tuple we would have single PK tuple and two indirect
index tuples (correct me if I'm wrong).
How are we going to delete old indirect index tuple?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Indirect indexes

2016-10-18 Thread Simon Riggs
On 18 October 2016 at 22:04, Claudio Freire  wrote:
> On Tue, Oct 18, 2016 at 3:28 PM, Alvaro Herrera
>  wrote:
>> I propose we introduce the concept of "indirect indexes".  I have a toy
>> implementation and before I go further with it, I'd like this assembly's
>> input on the general direction.
>>
>> Indirect indexes are similar to regular indexes, except that instead of
>> carrying a heap TID as payload, they carry the value of the table's
>> primary key.  Because this is laid out on top of existing index support
>> code, values indexed by the PK can only be six bytes long (the length of
>> ItemPointerData); in other words, 281,474,976,710,656 rows are
>> supported, which should be sufficient for most use cases.[1]
>
>
> You don't need that limitation (and vacuum will be simpler) if you add
> the PK as another key, akin to:
>
> CREATE INDIRECT INDEX idx ON tab (a, b, c);
>
> turns into
>
> CREATE INDEX idx ON tab (a, b, c, pk);
>
> And is queried appropriately (using an index-only scan, extracting the
> PK from the index tuple, and then querying the PK index to get the
> tids).
>
> In fact, I believe that can work with all index ams supporting index-only 
> scans.

But if you did that, an UPDATE of a b or c would cause a non-HOT
update, so would defeat the purpose of indirect indexes.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Remove vacuum_defer_cleanup_age

2016-10-18 Thread Andres Freund
Hi,

On 2016-10-09 21:51:07 -0700, Josh Berkus wrote:
> Given that hot_standby_feedback is pretty bulletproof now, and a lot of
> the work in reducing replay conflicts, I think the utility of
> vacuum_defer_cleanup_age is at an end.  I really meant so submit a patch
> to remove it to 9.6, but it got away from me.

HS feedback doesn't e.g. work well with delayed and/or archived replay,
whereas defer_cleanup does.

On the other hand, removing it would make some of the reasoning around
GetOldestXmin() a bit easier.

Andres




Re: [HACKERS] Remove vacuum_defer_cleanup_age

2016-10-18 Thread Josh Berkus
On 10/18/2016 01:28 PM, Robert Haas wrote:
> On Tue, Oct 18, 2016 at 4:18 PM, Josh Berkus  wrote:
>> On 10/12/2016 05:00 PM, Robert Haas wrote:
>>> On Sun, Oct 9, 2016 at 9:51 PM, Josh Berkus  wrote:
 Given that hot_standby_feedback is pretty bulletproof now, and a lot of
 the work in reducing replay conflicts, I think the utility of
 vacuum_defer_cleanup_age is at an end.  I really meant so submit a patch
 to remove it to 9.6, but it got away from me.

 Any objections to removing the option in 10?
>>>
>>> I'm not sure I see the point.
>>
>> Reducing the number of configuration variables is an a priori good.  In
>> aggregate, the more knobs we have, the harder it is to learn how to
>> admin Postgres.  Therefore any time a config variable becomes obsolete,
>> we should remove it.
> 
> Meh.  I agree that more configuration knobs makes it harder to learn
> to configure the system, but we've got enough of them that removing
> exactly one isn't going to make a material difference.  Against that,
> if you are wrong about it being obsolete and there are actually people
> relying on it heavily, those people will be very sad if we remove it,
> and unless they read this mailing list, we probably won't find out
> until it's too late.

Based on that argument, we would never be able to remove any
configuration parameter ever.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Remove vacuum_defer_cleanup_age

2016-10-18 Thread Robert Haas
On Tue, Oct 18, 2016 at 4:18 PM, Josh Berkus  wrote:
> On 10/12/2016 05:00 PM, Robert Haas wrote:
>> On Sun, Oct 9, 2016 at 9:51 PM, Josh Berkus  wrote:
>>> Given that hot_standby_feedback is pretty bulletproof now, and a lot of
>>> the work in reducing replay conflicts, I think the utility of
>>> vacuum_defer_cleanup_age is at an end.  I really meant so submit a patch
>>> to remove it to 9.6, but it got away from me.
>>>
>>> Any objections to removing the option in 10?
>>
>> I'm not sure I see the point.
>
> Redusing the number of configuration variables is an a-priori good.  In
> aggregate, the more knobs we have, the harder it is to learn how to
> admin Postgres.  Therefore any time a config variable becomes obsolete,
> we should remove it.

Meh.  I agree that more configuration knobs makes it harder to learn
to configure the system, but we've got enough of them that removing
exactly one isn't going to make a material difference.  Against that,
if you are wrong about it being obsolete and there are actually people
relying on it heavily, those people will be very sad if we remove it,
and unless they read this mailing list, we probably won't find out
until it's too late.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-18 Thread Robert Haas
On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra
 wrote:
> attached is v3 of the patches, with a few minor fixes in Slab, and much
> larger fixes in GenSlab.
>
> Slab (minor fixes)
> --
> - Removed the unnecessary memset() of new blocks in SlabAlloc(), although we
> still need to zero the free bitmap at the end of the block.
> - Renamed minFreeCount to minFreeChunks, added a few comments explaining
> why/how firstFreeChunk and minFreeChunks are maintained.
> - Fixed / improved a bunch of additional comments, based on feedback.

I had a look at 0001 today, but it seems to me that it still needs
work.  It's still got a lot of remnants of where you've
copy-and-pasted aset.c.  I dispute this allegation:

+ * SlabContext is our standard implementation of MemoryContext.

And all of this is just a direct copy-paste; I don't really want two copies:

+ *  When running under Valgrind, we want a NOACCESS memory region both before
+ *  and after the allocation.  The chunk header is tempting as the preceding
+ *  region, but mcxt.c expects to able to examine the standard chunk header
+ *  fields.  Therefore, we use, when available, the requested_size field and
+ *  any subsequent padding.  requested_size is made NOACCESS before returning
+ *  a chunk pointer to a caller.  However, to reduce client request traffic,
+ *  it is kept DEFINED in chunks on the free list.

And then there's this:

+#ifdef HAVE_ALLOCINFO
+#define SlabFreeInfo(_cxt, _chunk) \
+fprintf(stderr, "AllocFree: %s: %p, %d\n", \
+(_cxt)->header.name, (_chunk), (_chunk)->size)
+#define SlabAllocInfo(_cxt, _chunk) \
+fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
+(_cxt)->header.name, (_chunk), (_chunk)->size)

Well, it's pretty stupid that AllocSetAlloc is reporting its name as
AllocAlloc, a thing that, as far as I can tell, is not real.  But
having this new context type also pretend to be AllocAlloc is even
dumber.

+static void
+randomize_mem(char *ptr, size_t size)
+{
+static int  save_ctr = 1;
+size_t  remaining = size;
+int ctr;
+
+ctr = save_ctr;
+VALGRIND_MAKE_MEM_UNDEFINED(ptr, size);
+while (remaining-- > 0)
+{
+*ptr++ = ctr;
+if (++ctr > 251)
+ctr = 1;
+}
+VALGRIND_MAKE_MEM_UNDEFINED(ptr - size, size);
+save_ctr = ctr;
+}
+#endif   /* RANDOMIZE_ALLOCATED_MEMORY */

Another copy of this doesn't seem like a good idea, either.

More broadly, I'm not sure I like this design very much.  The whole
point of a slab context is that all of the objects are the same size.
I wouldn't find it too difficult to support this patch if we were
adding an allocator for fixed-size objects that was then being used to
allocate objects which are fixed size.  However, what we seem to be
doing is creating an allocator for fixed-size objects and then using
it for variable-size tuples.  That's really pretty weird.  Isn't the
root of this problem that aset.c is utterly terrible at handling large
number of allocations?  Maybe we should try to attack that problem
more directly.

On a related note, the autodestruct thing is a weird hack that's only
necessary because of the hijinks already discussed in the previous
paragraph.  The context has no fixed lifetime; we're just trying to
find a way of coping with possibly-shifting tuple sizes over time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Remove vacuum_defer_cleanup_age

2016-10-18 Thread Josh Berkus
On 10/12/2016 05:00 PM, Robert Haas wrote:
> On Sun, Oct 9, 2016 at 9:51 PM, Josh Berkus  wrote:
>> Given that hot_standby_feedback is pretty bulletproof now, and a lot of
>> the work in reducing replay conflicts, I think the utility of
>> vacuum_defer_cleanup_age is at an end.  I really meant so submit a patch
>> to remove it to 9.6, but it got away from me.
>>
>> Any objections to removing the option in 10?
> 
> I'm not sure I see the point.

Reducing the number of configuration variables is an a priori good.  In
aggregate, the more knobs we have, the harder it is to learn how to
admin Postgres.  Therefore any time a config variable becomes obsolete,
we should remove it.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Indirect indexes

2016-10-18 Thread Claudio Freire
On Tue, Oct 18, 2016 at 3:28 PM, Alvaro Herrera
 wrote:
> I propose we introduce the concept of "indirect indexes".  I have a toy
> implementation and before I go further with it, I'd like this assembly's
> input on the general direction.
>
> Indirect indexes are similar to regular indexes, except that instead of
> carrying a heap TID as payload, they carry the value of the table's
> primary key.  Because this is laid out on top of existing index support
> code, values indexed by the PK can only be six bytes long (the length of
> ItemPointerData); in other words, 281,474,976,710,656 rows are
> supported, which should be sufficient for most use cases.[1]


You don't need that limitation (and vacuum will be simpler) if you add
the PK as another key, akin to:

CREATE INDIRECT INDEX idx ON tab (a, b, c);

turns into

CREATE INDEX idx ON tab (a, b, c, pk);

And is queried appropriately (using an index-only scan, extracting the
PK from the index tuple, and then querying the PK index to get the
tids).

In fact, I believe that can work with all index ams supporting index-only scans.




Re: [HACKERS] Indirect indexes

2016-10-18 Thread Simon Riggs
On 18 October 2016 at 21:41, Joshua D. Drake  wrote:

> Are we
> trading initial performance gains for performance degradation through
> maintenance?

Eh? That's backwards, so No. The whole point of this is it avoids long
term degradation currently caused by non-HOT updates.

Normal UPDATEs that don't change PKs won't generate any changes to
VACUUM away, so only actions that remove PK values will cause anything
to be collected and removed from indexes.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] "Some tests to cover hash_index"

2016-10-18 Thread Robert Haas
On Wed, Oct 12, 2016 at 1:17 AM, Mithun Cy  wrote:
> On Wed, Sep 28, 2016 at 9:56 PM, Robert Haas  wrote:
>> something committable will come from it, but with 2 days left it's not
>> going to happen this CF.
> Adding a new patch. This one uses generate series instead of INSERT INTO
> SELECT and fixed comments from Alvaro.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Indirect indexes

2016-10-18 Thread Joshua D. Drake

On 10/18/2016 11:28 AM, Alvaro Herrera wrote:


Vacuuming presents an additional challenge: in order to remove index
items from an indirect index, it's critical to scan the PK index first
and collect the PK values that are being removed.  Then scan the
indirect index and remove any items that match the PK items removed.
This is a bit problematic because of the additional memory needed to
store the array of PK values.  I haven't implemented this yet.


As a whole I think the idea is interesting but the above scares me. Are
we trading initial performance gains for performance degradation through
maintenance? Since autovacuum launches at indeterminate times, we could
have a situation where even a moderately update-laden table becomes a
source of contention for IO and memory resources. I don't know that we
would see issues on modern bare metal, but considering our deployment
space is now places like RDS and GCE, this is a serious consideration.


Sincerely,

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] [ADMIN] 9.5 new setting "cluster name" and logging

2016-10-18 Thread Tom Lane
Robert Haas  writes:
> On Mon, Oct 17, 2016 at 6:24 PM, Stephen Frost  wrote:
>> I'm with Thomas on this and I disagree that the "csvlog bloat" argument
>> has merit.  If we're worried about bloat in csv then we should provide a
>> way for users to control what goes into the csvlog, not argue that
>> something which is clearly useful be excluded.

> I agree, but I think if we do that it would be highly desirable to do
> something to make the format discoverable.  For example, at the
> beginning of each file and whenever the format changes, we write some
> kind of unmistakable header line into the file identifying what fields
> will be present on each line thereafter.  It's undesirable for log
> analysis tools to need to contain logic that tries (imperfectly, no
> doubt) to reverse-engineer the field list.

Automatic "csv header" at the start of each logfile, perhaps?  And
force a logfile switch if it changes.

regards, tom lane




Re: [HACKERS] [ADMIN] 9.5 new setting "cluster name" and logging

2016-10-18 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Oct 17, 2016 at 6:24 PM, Stephen Frost  wrote:
> > I'm with Thomas on this and I disagree that the "csvlog bloat" argument
> > has merit.  If we're worried about bloat in csv then we should provide a
> > way for users to control what goes into the csvlog, not argue that
> > something which is clearly useful be excluded.
> 
> I agree, but I think if we do that it would be highly desirable to do
> something to make the format discoverable.  For example, at the
> beginning of each file and whenever the format changes, we write some
> kind of unmistakable header line into the file identifying what fields
> will be present on each line thereafter.  It's undesirable for log
> analysis tools to need to contain logic that tries (imperfectly, no
> doubt) to reverse-engineer the field list.

Hmm, I wouldn't use the same file for different formats.  If the format
is changed, a different file should be used.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Indirect indexes

2016-10-18 Thread Alvaro Herrera
I propose we introduce the concept of "indirect indexes".  I have a toy
implementation and before I go further with it, I'd like this assembly's
input on the general direction.

Indirect indexes are similar to regular indexes, except that instead of
carrying a heap TID as payload, they carry the value of the table's
primary key.  Because this is laid out on top of existing index support
code, values indexed by the PK can only be six bytes long (the length of
ItemPointerData); in other words, 281,474,976,710,656 rows are
supported, which should be sufficient for most use cases.[1]

A new flag is added to the index AM routine, indicating whether it can
support indirect indexes.  Initially, only the b-tree AM would support
indirect indexes, but I plan to develop support for GIN indirect soon
afterwards, which seems most valuable.

To create an indirect index, the command
CREATE INDIRECT INDEX
is used.  Currently this always uses the defined primary key index[2].

Implementation-wise, to find a value using an indirect index, that index
is scanned first; this produces a PK value, which can be used to scan
the primary key index, the result of which is returned.
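
In rough pseudocode, the lookup path looks like this (types and helper
names are stand-ins for illustration, not the actual executor code from
the prototype):

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t PKValue;                       /* value stored as the index payload */
typedef struct { uint32_t block; uint16_t off; } TupleId;
typedef struct HeapTupleStub HeapTupleStub;     /* opaque heap tuple */

/* Stand-ins for index and heap access routines. */
extern bool index_lookup_pk(void *indirect_idx, const void *key, PKValue *pk);
extern bool index_lookup_tid(void *pk_idx, PKValue pk, TupleId *tid);
extern HeapTupleStub *heap_fetch_tuple(void *heap, TupleId tid);

HeapTupleStub *
indirect_index_fetch(void *indirect_idx, void *pk_idx, void *heap, const void *key)
{
    PKValue pk;
    TupleId tid;

    /* 1. The indirect index yields a PK value instead of a heap TID. */
    if (!index_lookup_pk(indirect_idx, key, &pk))
        return NULL;
    /* 2. A second descent, through the PK index, yields the heap TID. */
    if (!index_lookup_tid(pk_idx, pk, &tid))
        return NULL;
    /* 3. Fetch the heap tuple exactly as a regular index scan would. */
    return heap_fetch_tuple(heap, tid);
}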

There are two big advantages to indirect indexes, both of which are
related to UPDATE's "write amplification":

1. UPDATE is faster.  Indirect indexes on columns that are not modified
   by the update do not need to be updated.
2. HOT can be used more frequently.  Columns indexed only by indirect
   indexes do not need to be considered for whether an update needs to
   be non-HOT, so this further limits "write amplification".

The biggest downside is that in order to find a heap tuple using the
index we need to descend two indexes (the indirect and the PK) instead
of one, so it's slower.  For many use cases the tradeoff is justified.

I measured the benefits with the current prototype implementation.  In
two separate schemas, I created a pgbench_accounts table, with 12
"filler" columns, and indexed them all; one schema used regular indexes,
the other used indirect indexes.  Filled them both to the equivalent of
scale 50, which results in a table of some 2171 MB; the 12 indexes are
282 MB each, and the PK index is 107 MB).  I then ran a pgbench with a
custom script that updates a random one of those columns and leaves the
others alone on both schemas (not simultaneously).  I ran 100k updates
for each case, 5 times:

  method  │   TPS: min / avg (stddev) / max   │        Duration: min / avg / max         │ avg_wal
──────────┼───────────────────────────────────┼──────────────────────────────────────────┼─────────
 direct   │  601.2 / 1029.9 ( 371.9) / 1520.9 │ 00:01:05.76 / 00:01:48.58 / 00:02:46.39  │ 4841 MB
 indirect │ 2165.1 / 3081.6 ( 574.8) / 3616.4 │ 00:00:27.66 / 00:00:33.56 / 00:00:46.2   │ 1194 MB
(2 rows)

This is a pretty small test (not long enough for autovacuum to trigger
decently) but I think this should be compelling enough to present the
case.

Please discuss.

Implementation notes:

Executor-wise, we could have a distinct IndirectIndexScan node, or we
could just hide the second index scan inside a regular IndexScan.  I
think from a cleanliness POV there is no reason to have a separate node;
efficiency wise I think a separate node leads to less branches in the
code.  (In my toy patch I actually have the second indexscan hidden
inside a separate "ibtree" AM; not what I really propose for commit.)
Additionally, executor will have to keep track of the values in the PK
index so that they can be passed down on insertion to each indirect
index.

Planner-wise, I don't think we need to introduce a distinct indirect
index Path.  We can just let the cost estimator attach the true cost of
the two scans to a regular index scan path, and the correct executor
node is produced later if that index is chosen.

In relcache we'll need an additional bitmapset of columns indexed by
indirect indexes.  This is used so that heap_update can return an output
bitmapset of such columns that were not modified (this is computed by
HeapSatisfiesHOTandKeyUpdate).  The executor uses this to know when to
skip updating each indirect index.

Vacuuming presents an additional challenge: in order to remove index
items from an indirect index, it's critical to scan the PK index first
and collect the PK values that are being removed.  Then scan the
indirect index and remove any items that match the PK items removed.
This is a bit problematic because of the additional memory needed to 
store the array of PK values.  I haven't implemented this yet.


Items I haven't thought about yet:
* UNIQUE INDIRECT?  I think these should work, but require some
  tinkering.
* Deferred unique indexes?  See unique_key_recheck.
* CREATE INDEX CONCURRENTLY.


[1]  Eventually we can expand this to allow for "normal" datatypes, say
bigint, but that's likely to require a much bigger patch in order to
change IndexTuple to support it.  I would defer that project to a later
time.

[2] It is possible 

Re: [HACKERS] [ADMIN] 9.5 new setting "cluster name" and logging

2016-10-18 Thread Robert Haas
On Mon, Oct 17, 2016 at 6:24 PM, Stephen Frost  wrote:
> I'm with Thomas on this and I disagree that the "csvlog bloat" argument
> has merit.  If we're worried about bloat in csv then we should provide a
> way for users to control what goes into the csvlog, not argue that
> something which is clearly useful be excluded.

I agree, but I think if we do that it would be highly desirable to do
something to make the format discoverable.  For example, at the
beginning of each file and whenever the format changes, we write some
kind of unmistakable header line into the file identifying what fields
will be present on each line thereafter.  It's undesirable for log
analysis tools to need to contain logic that tries (imperfectly, no
doubt) to reverse-engineer the field list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] minor issue: \c without parameter disconnect current user

2016-10-18 Thread Tom Lane
Alvaro Herrera  writes:
> Robert Haas wrote:
>> On Mon, Oct 17, 2016 at 12:49 AM, Pavel Stehule  
>> wrote:
>>> I expect so \c without parameters has only informational character. But \c
>>> reset user.

>> Yeah, I use that feature all the time.

> And so do the regression tests.

pg_dump relies on it too, I think.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Mention column name in error messages

2016-10-18 Thread Robert Haas
On Mon, Oct 17, 2016 at 3:18 AM, Michael Paquier
 wrote:
> Andres wrote:
>> +/* Support for TransformExprCallback */
>> +typedef struct TransformExprState
>> +{
>> + const char *column_name;
>> + Oid expected_type;
>> +} TransformExprState;
>> I see no need for this to be a struct defined in the header. Given that
>> TransformExprCallback isn't public, and the whole thing is specific to
>> TransformExprState...
>
> That's a matter of taste, really. Personally I find it cleaner to declare
> that with the other static declarations at the top of the file, and
> keep the comments of transformAssignedExpr clean of everything.

It's pretty standard practice for PostgreSQL to keep declarations
private to particular files whenever they are used only in that file.
And I'd argue that it's good practice in general.
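
As a generic illustration of that practice (names invented, unrelated to
the actual patch): the helper type and helper function live only in the
.c file, and the header would expose nothing but the public entry point.

/* widget.c -- sketch only; the helper type and helper function are private
 * to this file, so nothing about them appears in widget.h. */
#include <stdio.h>

typedef struct WidgetState      /* visible to this translation unit only */
{
    const char *column_name;
    int         expected_type;
} WidgetState;

static void
widget_check(const WidgetState *state)      /* file-private helper */
{
    printf("checking column \"%s\" against type %d\n",
           state->column_name, state->expected_type);
}

void
widget_run(const char *column_name, int expected_type)  /* public entry point */
{
    WidgetState state = {column_name, expected_type};

    widget_check(&state);
}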

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New SQL counter statistics view (pg_stat_sql)

2016-10-18 Thread Robert Haas
On Thu, Sep 29, 2016 at 1:45 AM, Haribabu Kommi
 wrote:
> Currently, the SQL stats use a fixed-size counter to track all the ALTER
> cases as a single counter. So while sending the stats from the backend to
> the stats collector at the end of the transaction, the cost is the same,
> because of its fixed size. The overhead this approach adds to send and
> read the stats is minimal.
>
> With the following approach, I feel it is possible to support the counter at
> command tag level.
>
> Add a global and a local hash to keep track of the counters by using the
> command tag as the key; this hash table grows dynamically whenever
> a new type of SQL command gets executed. The local hash data is passed
> to the stats collector whenever the transaction gets committed.
>
> The problem I am thinking of is that sending data from the hash and
> populating the hash from the stats file for all the command tags adds
> some overhead.

Yeah, I'm not very excited about that overhead.  This seems useful as
far as it goes, but I don't really want to incur measurable overhead
when it's in use.  Having a hash table rather than a fixed array of
slots means that you have to pass this through the stats collector
rather than updating shared memory directly, which is fairly heavy
weight.  If each backend could have its own copy of the slot array and
just update that, and readers added up the values across the whole
array, this could be done without any locking at all, and it would
generally be much lighter-weight than this approach.
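
A minimal standalone sketch of that layout (sizes and names invented; in
reality the array would sit in shared memory and a concurrent reader
might see slightly stale values): each backend increments only its own
fixed-size row, so writers never contend and no lock is taken, and a
reader obtains totals by summing a column across all rows.

#include <stdint.h>
#include <stdio.h>

#define MAX_BACKENDS    8
#define NUM_TAGS        4       /* e.g. SELECT, INSERT, UPDATE, DELETE */

/* In a real implementation this array would live in shared memory. */
static uint64_t sql_counts[MAX_BACKENDS][NUM_TAGS];

static void
count_statement(int backend_id, int tag)
{
    sql_counts[backend_id][tag]++;      /* each backend writes only its own row */
}

static uint64_t
total_for_tag(int tag)
{
    uint64_t    sum = 0;

    for (int b = 0; b < MAX_BACKENDS; b++)
        sum += sql_counts[b][tag];      /* readers just sum; no locking */
    return sum;
}

int
main(void)
{
    count_statement(0, 0);              /* backend 0 ran one tag-0 statement */
    count_statement(1, 0);              /* backend 1 ran two statements */
    count_statement(1, 2);

    printf("total for tag 0: %llu\n", (unsigned long long) total_for_tag(0));
    return 0;
}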

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] minor issue: \c without parameter disconnect current user

2016-10-18 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Oct 17, 2016 at 12:49 AM, Pavel Stehule  
> wrote:
> > I expect so \c without parameters has only informational character. But \c
> > reset user.
> 
> Yeah, I use that feature all the time.

And so do the regression tests.

I think what Pavel wants is \conninfo.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] simplehash.h typo fix

2016-10-18 Thread Andres Freund
Hi,

On 2016-10-18 09:41:14 +0200, Erik Rijkers wrote:
> it's -> its
> the the -> the

Thanks! Committed.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] minor issue: \c without parameter disconnect current user

2016-10-18 Thread Robert Haas
On Mon, Oct 17, 2016 at 12:49 AM, Pavel Stehule  wrote:
> I expect so \c without parameters has only informational character. But \c
> reset user.

Yeah, I use that feature all the time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor improvement to delete.sgml

2016-10-18 Thread Robert Haas
On Fri, Oct 14, 2016 at 12:05 AM, Etsuro Fujita
 wrote:
> I think it's better to mention that an alias is needed for the target table
> specified in the USING clause of a DELETE statement, to set up a self-join,
> as the documentation on the from_list parameter of UPDATE does.  Please find
> attached a patch.

The statement you are proposing to add to the documentation isn't true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multiple psql history files

2016-10-18 Thread David G. Johnston
On Tue, Oct 18, 2016 at 10:21 AM, Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Tue, Oct 18, 2016 at 9:45 AM, Tom Lane  wrote:
> >> One interesting point, if you wish to consider history as being
> >> connection-specific, is what happens during a \c command.  Right
> >> now the answer is "nothing" but you might wish it were different.
>
> > ​Just to clarify/confirm a point inferred from the docs...
> > If you place "\set HISTFILE​ ~/.psql_history- :DBNAME"
> > ​into your .psqlrc file then when you perform a "\c" the .psqlrc file is
> > re-read and the ​new value for DBNAME is used to generate a new history
> > file name.
>
> Um, no, I see no indication of that in the code.  Where did you read that
> in the docs?
>
>
 DBNAME
The name of the database you are currently connected to. This is set every
time you connect to a database (including program start-up), but can be
unset.

HISTFILE
The file name that will be used to store the history list. The default
value is ~/.psql_history. For example, putting:
\set HISTFILE ~/.psql_history- :DBNAME
in ~/.psqlrc will cause psql to maintain a separate history for each
database.

​The "including program start-up" aspect to DBNAME means that it is changed
upon using "\c"

I inferred the part about .psqlrc being re-read and thus taking on the new
value of :DBNAME in the example.

psqlrc is later defined to be "the [system|user] startup file" so it was
wrong to conclude that it was re-read upon issuing "\c" - though
"connection startup" isn't a totally unreasonable interpretation of the
timing at which this file is read.  Not everyone is going to associate the
"rc" suffix with the file only being read during program startup.


> If we wanted the history file to change at \c, I think the best way would
> be to invent some escape-patterns that could be placed in the value of
> HISTFILE, say along the lines of "\set HISTFILE ~/.psql_history-%h-%p",
> and then specify that if the value contains any such patterns we'll dump
> and reload the history file when reconnecting.  But TBH I don't think
> it's worth the trouble.  I'd sure like to have seen multiple requests
> for such functionality before we go to the trouble.
>

A slightly less invasive approach would be to have a "connection startup
script" file..."psql-conn-rc"...that is re-read immediately after a
successful connection is made to a database.

David J.


Re: [HACKERS] Hash Indexes

2016-10-18 Thread Andres Freund
On 2016-10-18 13:38:14 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Tue, Oct 18, 2016 at 5:37 AM, Amit Kapila  
> > wrote:
> >> I have implemented this idea and it works for MVCC scans.  However, I
> >> think this might not work for non-MVCC scans.  Consider a case where
> >> in Process-1, hash scan has returned one row and before it could check
> >> it's validity in heap, vacuum marks that tuple as dead and removed the
> >> entry from heap and some new tuple has been placed at that offset in
> >> heap.
> 
> > Oops, that's bad.
> 
> Do we care?  Under what circumstances would a hash index be used for a
> non-MVCC scan?

Uniqueness checks, are the most important one that comes to mind.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typo in foreign.h

2016-10-18 Thread Robert Haas
On Wed, Oct 12, 2016 at 9:20 PM, Amit Langote
 wrote:
> Attached fixes a minor typo: s/Thes/These/g

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Indexes

2016-10-18 Thread Tom Lane
Robert Haas  writes:
> On Tue, Oct 18, 2016 at 5:37 AM, Amit Kapila  wrote:
>> I have implemented this idea and it works for MVCC scans.  However, I
>> think this might not work for non-MVCC scans.  Consider a case where
>> in Process-1, hash scan has returned one row and before it could check
>> it's validity in heap, vacuum marks that tuple as dead and removed the
>> entry from heap and some new tuple has been placed at that offset in
>> heap.

> Oops, that's bad.

Do we care?  Under what circumstances would a hash index be used for a
non-MVCC scan?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server

2016-10-18 Thread Robert Haas
On Wed, Oct 12, 2016 at 9:18 AM, Jeevan Chalke
 wrote:
> I did performance testing for aggregate push down and see good performance
> with the patch.

Are you planning another update to this patch based on Ashutosh's comments?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Hash Indexes

2016-10-18 Thread Robert Haas
On Tue, Oct 18, 2016 at 5:37 AM, Amit Kapila  wrote:
> On Wed, Oct 5, 2016 at 10:22 PM, Robert Haas  wrote:
>> On Tue, Oct 4, 2016 at 12:36 AM, Amit Kapila  wrote:
>>> I think one way to avoid the risk of deadlock in above scenario is to
>>> take the cleanup lock conditionally, if we get the cleanup lock then
>>> we will delete the items as we are doing in patch now, else it will
>>> just mark the tuples as dead and ensure that it won't try to remove
>>> tuples that are moved-by-split.  Now, I think the question is how will
>>> these dead tuples be removed.  We anyway need a separate mechanism to
>>> clear dead tuples for hash indexes as during scans we are marking the
>>> tuples as dead if corresponding tuple in heap is dead which are not
>>> removed later.  This is already taken care in btree code via
>>> kill_prior_tuple optimization.  So I think clearing of dead tuples can
>>> be handled by a separate patch.
>>
>> That seems like it could work.
>
> I have implemented this idea and it works for MVCC scans.  However, I
> think this might not work for non-MVCC scans.  Consider a case where
> in Process-1, hash scan has returned one row and before it could check
> it's validity in heap, vacuum marks that tuple as dead and removed the
> entry from heap and some new tuple has been placed at that offset in
> heap.

Oops, that's bad.

> Now when Process-1 checks the validity in heap, it will check
> for different tuple then what the index tuple was suppose to check.
> If we want, we can make it work similar to what btree does as being
> discussed on thread [1], but for that we need to introduce page-scan
> mode as well in hash indexes.   However, do we really want to solve
> this problem as part of this patch when this exists for other index am
> as well?

For what other index AM does this problem exist?  Kevin has been
careful not to create this problem for btree, or at least I think he
has.  That's why the pin still has to be held on the index page when
it's a non-MVCC scan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multiple psql history files

2016-10-18 Thread Tom Lane
"David G. Johnston"  writes:
> On Tue, Oct 18, 2016 at 9:45 AM, Tom Lane  wrote:
>> One interesting point, if you wish to consider history as being
>> connection-specific, is what happens during a \c command.  Right
>> now the answer is "nothing" but you might wish it were different.

> ​Just to clarify/confirm a point inferred from the docs...
> If you place "\set HISTFILE​ ~/.psql_history- :DBNAME"
> ​into your .psqlrc file then when you perform a "\c" the .psqlrc file is
> re-read and the ​new value for DBNAME is used to generate a new history
> file name.

Um, no, I see no indication of that in the code.  Where did you read that
in the docs?

If we wanted the history file to change at \c, I think the best way would
be to invent some escape-patterns that could be placed in the value of
HISTFILE, say along the lines of "\set HISTFILE ~/.psql_history-%h-%p",
and then specify that if the value contains any such patterns we'll dump
and reload the history file when reconnecting.  But TBH I don't think
it's worth the trouble.  I'd sure like to have seen multiple requests
for such functionality before we go to the trouble.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multiple psql history files

2016-10-18 Thread David G. Johnston
On Tue, Oct 18, 2016 at 9:45 AM, Tom Lane  wrote:

> Jonathan Jacobson  writes:
> > The .psql_history file is naturally used by different DB connections
> > (distinguished by a different combination of host + port + database +
> user).
> > At least in my multi-database working environment, this leads sometimes
> to
> > frustration because there are commands that cannot or should not be used
> by
> > different connections.
> > To solve this, psql could keep a separate command history file for each
> > connection.
> > I will be happy to make this my first contribution to PostgreSQL's code.
> > What do you say?
>
> One interesting point, if you wish to consider history as being
> connection-specific, is what happens during a \c command.  Right
> now the answer is "nothing" but you might wish it were different.
>

​Just to clarify/confirm a point inferred from the docs...

If you place "\set HISTFILE ~/.psql_history- :DBNAME"
into your .psqlrc file then when you perform a "\c" the .psqlrc file is
re-read and the new value for DBNAME is used to generate a new history
file name.

As Julien pointed out cross-thread, this really does seem to be the best
place to implement such logic - though shell functions and aliases can be
used to some good effect as well.

David J.


Re: [HACKERS] Multiple psql history files

2016-10-18 Thread Tom Lane
Jonathan Jacobson  writes:
> The .psql_history file is naturally used by different DB connections
> (distinguished by a different combination of host + port + database + user).
> At least in my multi-database working environment, this leads sometimes to
> frustration because there are commands that cannot or should not be used by
> different connections.
> To solve this, psql could keep a separate command history file for each
> connection.
> I will be happy to make this my first contribution to PostgreSQL's code.
> What do you say?

Personally, I'd be strongly against that because I frequently *want*
to re-use the same command on different connections.  As an example,
comparing the behavior of the same command in different PG versions
(hence different postmasters) is an everyday task for me.

I can see that others might have different needs, but surely this
is going to be a use-case-specific requirement.

It's already possible to control which history file is used via psql's
HISTFILE variable and/or the PSQL_HISTORY environment variable.  Perhaps
you can solve your problem today by manipulating those?

One interesting point, if you wish to consider history as being
connection-specific, is what happens during a \c command.  Right
now the answer is "nothing" but you might wish it were different.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multiple psql history files

2016-10-18 Thread David G. Johnston
On Tue, Oct 18, 2016 at 9:26 AM, Jonathan Jacobson  wrote:

> The .psql_history file is naturally used by different DB connections
> (distinguished by a different combination of host + port + database + user).
> At least in my multi-database working environment, this leads sometimes to
> frustration because there are commands that cannot or should not be used by
> different connections.
> To solve this, psql could keep a separate command history file for each
> connection.
> I will be happy to make this my first contribution to PostgreSQL's code.
> What do you say?
>
>
Existing capabilities notwithstanding, the feature itself would probably
be considered, but I'm doubtful it would replace the default behavior.

The user would, at minimum, have to designate a directory into which the
separate history files would be created - polluting the home directory is
unappealing.

If you want to propose the specifics of a feature that meets those two
criteria, specific comments and opinions on include-ability can be made.

David J.


Re: [HACKERS] Multiple psql history files

2016-10-18 Thread Julien Rouhaud
On 18/10/2016 18:26, Jonathan Jacobson wrote:
> The .psql_history file is naturally used by different DB connections
> (distinguished by a different combination of host + port + database + user).
> At least in my multi-database working environment, this leads sometimes
> to frustration because there are commands that cannot or should not be
> used by different connections.
> To solve this, psql could keep a separate command history file for each
> connection.

You can already do this, for instance in your .psqlrc:

\set HISTFILE ~/.psql_history- :HOST - :PORT - :DBNAME - :USER


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Multiple psql history files

2016-10-18 Thread David G. Johnston
On Tue, Oct 18, 2016 at 9:26 AM, Jonathan Jacobson  wrote:

> The .psql_history file is naturally used by different DB connections
> (distinguished by a different combination of host + port + database + user).
> At least in my multi-database working environment, this leads sometimes to
> frustration because there are commands that cannot or should not be used by
> different connections.
> To solve this, psql could keep a separate command history file for each
> connection.
> I will be happy to make this my first contribution to PostgreSQL's code.
> What do you say?
>

I would say that users needing such fine-grained control of the history
file should avail themselves of HISTFILE and other related psql variables
and/or the PSQL_HISTORY environment variable.

David J.


Re: [HACKERS] Multiple psql history files

2016-10-18 Thread Corey Huinker
On Tue, Oct 18, 2016 at 12:26 PM, Jonathan Jacobson 
wrote:

> The .psql_history file is naturally used by different DB connections
> (distinguished by a different combination of host + port + database + user).
> At least in my multi-database working environment, this leads sometimes to
> frustration because there are commands that cannot or should not be used by
> different connections.
> To solve this, psql could keep a separate command history file for each
> connection.
> I will be happy to make this my first contribution to PostgreSQL's code.
> What do you say?
>
> Regards,
> Jonathan
>

That's settable with HISTFILE

\set HISTFILE ~/.psql_history- :DBNAME

There's also :PORT and a few other vars.

(example borrowed from
http://stackoverflow.com/questions/17924906/where-is-psql-client-history-kept-psql-history-non-existant
)


[HACKERS] Multiple psql history files

2016-10-18 Thread Jonathan Jacobson
The .psql_history file is naturally used by different DB connections
(distinguished by a different combination of host + port + database + user).
At least in my multi-database working environment, this leads sometimes to
frustration because there are commands that cannot or should not be used by
different connections.
To solve this, psql could keep a separate command history file for each
connection.
I will be happy to make this my first contribution to PostgreSQL's code.
What do you say?

Regards,
Jonathan


Re: [HACKERS] [PATCH] pgpassfile connection option

2016-10-18 Thread Robert Haas
On Tue, Oct 11, 2016 at 5:06 PM, Oskari Saarenmaa  wrote:
>   $ PASSWORD=xyz psql 'password=$PASSWORD dbname=foo'
>
> This does have the hazard of making it very easy to accidentally use double
> quotes instead of single quotes and have the shell expand the variable
> making it visible in process listing though.

It has the hazard that environment variables are visible in the
process listing anyway on many platforms.  On Linux, try "ps auxeww";
on MacOS X, try "ps -efEww".  At a quick glance, it seems that on both
of those platforms you have to either be root or be the same user that
owns the process, but I'm not sure that every platform will have it
locked down that tightly and even that might be more exposure than you
really want.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-18 Thread Tom Lane
Albe Laurenz  writes:
> The buildfarm log at
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thrips&dt=2016-10-12%2018%3A37%3A26
> shows the build failing (among others) for contrib/tablefunc
> (close to the bottom end of the log).

That's a build of 9.6.

> Now when I look at the source, there is only one C file, and the
> failing functions are *not* declared anywhere, neither in the C file
> nor in the (almost empty) header file.

The declarations in question seem to have been removed in
0665023b4435a469e42289d7065c436967a022b6, which is only in HEAD.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VACUUM's ancillary tasks

2016-10-18 Thread Robert Haas
On Sun, Oct 16, 2016 at 3:35 PM, Jeff Janes  wrote:
> On Fri, Oct 7, 2016 at 6:14 AM, Robert Haas  wrote:
>> On Thu, Oct 6, 2016 at 8:40 PM, Jeff Janes  wrote:
>> > In commit 37484ad2aacef5ec7, you changed the way that frozen tuples were
>> > represented, so that we could make freezing more aggressive without
>> > losing
>> > forensic evidence.  But I don't think we ever did anything to actually
>> > make
>> > the freezing more aggressive.
>>
>> See 3cff1879f8d03cb729368722ca823a4bf74c0cac.  The objection to doing
>> it in other cases is that it adds write-ahead log volume which might
>> cause its own share of problems.  There might be some way of getting
>> ahead of that, though.  I think if we piggyback on an existing WAL
>> record like XLOG_HEAP2_CLEAN or XLOG_HEAP2_VISIBLE the impact might be
>> minimal, but I haven't been dedicated enough to try to write the
>> patch.
>>
>> > When I applied the up-thread patch so that pgbench_history gets autovac,
>> > those autovacs don't actually cause any pages to get frozen until the
>> > wrap
>> > around kicks in, even when all the tuples on the early pages should be
>> > well
>> > beyond vacuum_freeze_min_age.
>>
>> If the pages are already all-visible, they'll be skipped until
>> vacuum_freeze_table_age is reached.
>
> So if I set vacuum_freeze_min_age to zero, then they should become
> all-visible and all-frozen at the same time, and avoid that problem?

Hmm.  I *think* so...

> From the docs on vacuum_freeze_min_age: "Increasing this setting may avoid
> unnecessary work if the rows that would otherwise be frozen will soon be
> modified again".  How much work is that? Presumably they are already getting
> marked visible, is marking them frozen too a meaningful amount of extra
> work?  Is it just the extra WAL record?

Yeah, the extra WAL record is the main thing, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-18 Thread Albe Laurenz
Tom Lane wrote:
> No, it's cross *file* references within a single contrib module that
> would be likely to need extern declarations in a header file.  That's
> not especially weird IMO.  I'm not sure how many cases there actually
> are though.

I went through the contrib moules and tried to look that up,
and now I am even more confused than before.

The buildfarm log at
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thrips&dt=2016-10-12%2018%3A37%3A26
shows the build failing (among others) for contrib/tablefunc
(close to the bottom end of the log).

Now when I look at the source, there is only one C file, and the
failing functions are *not* declared anywhere, neither in the C file
nor in the (almost empty) header file.

So it must be something other than double declaration that makes
the build fail.
Could it be that the .DEF files have something to do with that?

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-18 Thread Tom Lane
I wrote:
> No, it's cross *file* references within a single contrib module that
> would be likely to need extern declarations in a header file.  That's
> not especially weird IMO.  I'm not sure how many cases there actually
> are though.

I poked at this a little bit.  AFAICT, the only actual cross-file
references are in contrib/ltree/, which has quite a few.  Maybe we
could hold our noses and attach PGDLLEXPORT to the declarations in
ltree.h.

hstore's HSTORE_POLLUTE macro would also need PGDLLEXPORT-ification,
but that's just within the macro so it wouldn't be too ugly.

The other problem is xml2's xml_is_well_formed(), which duplicates
the name of a core backend function.  If we PGDLLEXPORT-ify that,
we get conflicts against the core's declaration in utils/xml.h.
I do not see any nice way to dodge that.  Maybe we could decide that
six releases' worth of backwards compatibility is enough and
just drop it from xml2 --- AFAICS, that would only break applications
that had never updated to the extension-ified version of xml2 at all.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2016-10-18 Thread Robert Haas
On Fri, Oct 14, 2016 at 12:37 AM, Ashutosh Bapat
 wrote:
>> Have you tested the effect of this patch on planner memory consumption
>> with multi-way joins between tables with many partitions?  If you
>> haven't, you probably should. (Testing runtime would be good, too.)
>> Does it grow linearly?  Quadratically?  Exponentially?  Minor leaks
>> don't matter, but if we're generating too much garbage we'll have to
>> make sure it gets cleaned up soon enough to prevent runaway memory
>> usage.
>
> I tried to check memory usage with various combinations of number of
> partitions and number of relations being joined. For higher number of
> relations being joined like 10 with 100 partitions, OOM killer kicked
> in during the planning phase. I am suspecting
> adjust_partitionrel_attrs() (changed that name to
> adjust_join_appendrel_attrs() to be in sync with
> adjust_appendrel_attrs()) to be the culprit. It copies expression
> trees every time for joining two children. That's an exponentially
> increasing number as the number of legal joins increases
> exponentially. I am still investigating this.

I think the root of this problem is that the existing paths share a
lot more substructure than the ones created by the new code.  Without
a partition-wise join, the incremental memory usage for a joinrel
isn't any different whether the underlying rel is partitioned or not.
If it's partitioned, we'll be pointing to an AppendPath; if not, we'll
be pointing to some kind of Scan.  But the join itself creates exactly
the same amount of new stuff regardless of what's underneath it.  With
partitionwise join, that ceases to be true.  Every joinrel - and the
number of those grows exponentially in the number of baserels, IIUC -
needs its own list of paths for every member rel.  So if a
non-partition-wise join created X paths, and there are K partitions, a
partition-wise join creates X * K paths.  That's a lot.

Although we might be able to save some memory by tightening things up
here and there - for example, right now the planner isn't real smart
about recycling paths that are evicted by add_path(), and there's
probably other wastage as well - I suspect that what this shows is
that the basic design of this patch is not going to be viable.
Intuitively, it's often going to be the case that we want the "same
plan" for every partition-set.  That is, if we have A JOIN B ON A.x =
B.x JOIN C ON A.y = C.y, and if A, B, and C are all compatibly
partitioned, then the result should be an Append plan with 100 join
plans under it, and all 100 of those plans should be basically mirror
images of each other.  Of course, that's not really right in general:
for example, it could be that A1 is big and A2 is small while B1 is
small and B2 is big, so that the right plan for (A1 JOIN B1) and for
(A2 JOIN B2) are totally different from each other.  But in many
practical cases we'll want to end up with a plan of precisely the same
shape for all children, and the current design ignores this, expending
both memory and CPU time to compute essentially-equivalent paths
across all children.

One way of attacking this problem is to gang together partitions which
are equivalent for planning purposes, as discussed in the paper "Join
Optimization Techniques for Partitioned Tables" by Herodotou, Borisov,
and Babu.  However, it's not exactly clear how to do this: we could
gang together partitions that have the same index definitions, but the
sizes of the heaps, the sizes of their indexes, and the row counts
will vary from one partition to the next, and any of those things
could cause the plan choice to be different for one partition vs. the
next.  We could try to come up with heuristics for when those things
are likely to be true.  For example, suppose we compute the set of
partitions such that all joined relations have matching index
definitions on all tables; then, we take the biggest table in the set
and consider all tables more than half that size as part of one gang.
The biggest table becomes the leader and we compute partition-wise
paths for just that partition; the other members of the gang will
eventually get a plan that is of the same shape, but we don't actually
create that plan until after scan/join planning is concluded.
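
A standalone sketch of just that size cutoff (numbers invented; the
matching-index-definition check described above is assumed to have
already selected these partitions): the largest partition becomes the
leader, and every partition more than half its size is ganged with it
and later reuses the leader's plan shape.

#include <stdio.h>

#define NPARTS 5

int
main(void)
{
    /* estimated sizes of the matching partitions, in pages (made up) */
    double      sizes[NPARTS] = {120.0, 95.0, 70.0, 40.0, 10.0};
    double      biggest = 0.0;

    for (int i = 0; i < NPARTS; i++)
        if (sizes[i] > biggest)
            biggest = sizes[i];

    /* the biggest partition leads; anything over half its size joins the gang */
    for (int i = 0; i < NPARTS; i++)
        printf("partition %d (%.0f pages): %s\n", i, sizes[i],
               sizes[i] > biggest / 2.0 ? "gang member, reuses leader's plan shape"
                                        : "planned on its own");
    return 0;
}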

Another idea is to try to reduce peak memory usage by performing
planning separately for each partition-set.  For example, suppose we
decide to do a partition-wise join of A, B, and C.  Initially, this
gets represented as a PartitionJoinPath tree, like this:

PartitionJoinPath
-> AppendPath for A
-> PartitionJoinPath
  -> AppendPath for B
  -> AppendPath for C

Because we haven't created individual join paths for the members, this
doesn't use much memory.  Somehow, we come up with a cost for the
PartitionJoinPath; it probably won't be entirely accurate.  Once
scan/join planning is concluded, if our final path contains a
PartitionJoinPath, we go back and loop over the partitions.  For each

Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-18 Thread Tom Lane
Craig Ringer  writes:
> On 18 October 2016 at 04:11, Tom Lane  wrote:
>> As for the core problem, I wonder why we aren't recommending that
>> third-party modules be built using the same infrastructure contrib
>> uses, rather than people ginning up their own infrastructure and
>> then finding out the hard way that that means they need PGDLLEXPORT
>> marks.

> Effectively, "PGXS for Windows".

Yeah.

> I've kind of been hoping the CMake work would make the whole mess of
> Perl build stuff go away. CMake would solve this quite neatly since we
> can bundle CMake parameters file for inclusion in other projects and
> could also tell pg_config how to point to it. Extensions then become
> trivial CMake projects.

Agreed, it'd be wise not to put any effort into that until we see
whether the CMake conversion succeeds.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-18 Thread Tom Lane
Craig Ringer  writes:
> On 18 October 2016 at 04:19, Andres Freund  wrote:
>> On 2016-10-17 16:16:37 -0400, Robert Haas wrote:
>>> I wouldn't think that cross-file references would be especially
>>> common.  Functions that take PG_FUNCTION_ARGS and return Datum aren't
>>> a lot of fun to call from C.  But maybe I'm wrong.

>> There's a fair number of DirectFunctionCall$Ns over the backend.

> It's only an issue if one contrib calls another contrib (or the core
> backend code calls into a contrib, but that's unlikely) via
> DirectFunctionCall .

No, it's cross *file* references within a single contrib module that
would be likely to need extern declarations in a header file.  That's
not especially weird IMO.  I'm not sure how many cases there actually
are though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move pg_largeobject to a different tablespace *without* turning on system_table_mods.

2016-10-18 Thread Andreas Joseph Krogh
On Tuesday, 18 October 2016 at 16:26:37, Euler Taveira <
eu...@timbira.com.br > wrote:
On 18-10-2016 10:13, Andreas Joseph Krogh wrote:
 > From time to time pg_largeobject comes up as an issue with being
 > implemented as a system-catalog.
 > 
 Did you read the archives [1]?
 
 
Yes..
 
> As I see it, there are 2 relevant use-cases for improving the situation:
 > 1. Being able to pg_dump *without* any LOs (think of it as
 >    without the contents of pg_largeobject). This is very handy
 >    for testing/troubleshooting.
 >
 It could be an option (--no-blobs). The -b option has a limited use case.
 
 
Yes, it definitely should be an option to pg_dump. I guess because
pg_largeobject is a system catalog, it adds additional difficulties to
implementing it?
 
> 2. Being able to move pg_largeobject to a different tablespace
 >    *without* turning on system_table_mods. This is important for
 >    people storing LOTS of large-objects on separate
 >    disks (non-SSD) and hence in a different tablespace.
 > Anyone willing to discuss this?
 > 
 This was proposed a few years ago but no one cared to draft a patch.
 
So that's why I'm re-raising the issue :-)
Having "everything in the database" adds lots of benefits, conceptually 
(follows tx-semantics, consistent backups etc.), however it's currently not so 
easy in practice.
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 
 


 


Re: [HACKERS] Query cancel seems to be broken in master since Oct 17

2016-10-18 Thread Tom Lane
I wrote:
> The cleanest fix might be to change those various "long" variables
> to uint32.  You'd have to think about how to handle the ntohl/htonl
> calls that are used on them, though.

Or actually, no, you wouldn't have to think very hard.  I was supposing
that those calls were declared to traffic in "long"s, but they aren't
and never have been, at least not since SUSv2:

uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);

So s/long/uint32/ would probably fix it just fine.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move pg_largeobject to a different tablespace *without* turning on system_table_mods.

2016-10-18 Thread Euler Taveira
On 18-10-2016 10:13, Andreas Joseph Krogh wrote:
> From time to time pg_largeobject comes up as an issue with being
> implemented as a system-catalog.
>  
Did you read the archives [1]?

> As I see it, there are 2 relevant use-cases for improving the situation:
> 1. Being able to pg_dump *without* any LOs (think of it as
>without the contents of pg_largeobject). This is very handy
>for testing/troubleshooting.
>
It could be an option (--no-blobs). The -b option has a limited use case.

> 2. Being able to move pg_largeobject to a different tablespace
>*without* turning on system_table_mods. This is important for
>people storing LOTS of large-objects on separate
>disks (non-SSD) and hence in a different tablespace.
> Anyone willing to discuss this?
>  
This was proposed a few years ago but no one cared to draft a patch.


[1]
https://www.postgresql.org/message-id/3073cc9b0910051618t693d15f3u265872908240d...@mail.gmail.com


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query cancel seems to be broken in master since Oct 17

2016-10-18 Thread Tom Lane
Heikki Linnakangas  writes:
> On 10/18/2016 04:13 PM, Tom Lane wrote:
>> There's a smoking gun in the postmaster log:
>> 2016-10-18 09:10:34.547 EDT [18502] LOG:  wrong key in cancel request for 
>> process 18491

> Ok, I've reverted that commit for now. It clearly needs more thought, 
> because of this, and the pademelon failure discussed on the other thread.

I think that was an overreaction.  The problem is pretty obvious after
adding some instrumentation:

2016-10-18 09:57:47.508 EDT [21229] LOG:  wrong key (0x7B7E4D5E, expected 
0xF0F804017B7E4D5E) in cancel request for process 21228

To wit, the various cancel_key backend variables are declared as "long",
and the new code

if (!pg_strong_random(&MyCancelKey, sizeof(MyCancelKey)))

is therefore computing an 8-byte random value on 64-bit-long machines.
But only 4 bytes go to the client and come back.
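
A tiny standalone illustration of the mismatch (assuming a 64-bit "long",
as on the machine that produced the log line above; the constant is just
the key from that log): only the low 32 bits survive the 4-byte round
trip, so the comparison against the stored 8-byte value can never succeed.

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    /* pretend pg_strong_random filled all eight bytes, as in the log above */
    unsigned long generated = 0xF0F804017B7E4D5EUL;
    uint32_t    on_the_wire = htonl((uint32_t) generated); /* only 4 bytes are sent */
    uint32_t    came_back = ntohl(on_the_wire);

    printf("stored key:           0x%lX\n", generated);
    printf("key after round trip: 0x%X\n", came_back);
    printf("keys match: %s\n",
           (unsigned long) came_back == generated ? "yes" : "no");
    return 0;
}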

The cleanest fix might be to change those various "long" variables
to uint32.  You'd have to think about how to handle the ntohl/htonl
calls that are used on them, though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] emergency outage requiring database restart

2016-10-18 Thread Merlin Moncure
On Mon, Oct 17, 2016 at 2:04 PM, Alvaro Herrera
 wrote:
> Merlin Moncure wrote:
>
>> castaging=# CREATE OR REPLACE VIEW vw_ApartmentSample AS
>> castaging-#   SELECT ...
>> ERROR:  42809: "pg_cast_oid_index" is an index
>> LINE 11:   FROM ApartmentSample s
>> ^
>> LOCATION:  heap_openrv_extended, heapam.c:1304
>>
>> should I be restoring from backups?
>
> It's pretty clear to me that you've got catalog corruption here.  You
> can try to fix things manually as they emerge, but that sounds like a
> fool's errand.

Yeah.  Believe me -- I know the drill.  Most or all the damage seemed
to be to the system catalogs with at least two critical tables dropped
or inaccessible in some fashion.  A lot of the OIDs seemed to be
pointing at the wrong thing.  Couple more datapoints here.

*) This database is OLTP, doing ~ 20 tps avg (but very bursty)
*) Another database on the same cluster was not impacted.  However
it's more olap style and may not have been written to during the
outage

Now, this infrastructure running this system is running maybe 100ish
postgres clusters and maybe 1000ish sql server instances with
approximately zero unexplained data corruption issues in the 5 years
I've been here.  Having said that, this definitely smells and feels
like something on the infrastructure side.  I'll follow up if I have
any useful info.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query cancel seems to be broken in master since Oct 17

2016-10-18 Thread Heikki Linnakangas

On 10/18/2016 04:13 PM, Tom Lane wrote:

Magnus Hagander  writes:

On Tue, Oct 18, 2016 at 1:00 AM, Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

The test executes "select pg_sleep(10)" and tries to cancel it. In recent
master builds, cancel seems to be ignored, and the statement lasts for 10
seconds.



My guess is it's related to this:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9e083fd4683294f41544e6d0d72f6e258ff3a77c
That's certainly not intended to break things, but that was changed on Oct
17 and it relates to cancel keys.
What platform does the postgres server run on? Can you check if query
cancel works on libpq or if it's completely broken?


I can confirm that query cancel is broken in HEAD on RHEL6.

regression=# select pg_sleep(10);
^CCancel request sent
... nothing happens for the balance of the 10 seconds ...
regression=#

There's a smoking gun in the postmaster log:

2016-10-18 09:10:34.547 EDT [18502] LOG:  wrong key in cancel request for 
process 18491


Ok, I've reverted that commit for now. It clearly needs more thought, 
because of this, and the pademelon failure discussed on the other thread.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query cancel seems to be broken in master since Oct 17

2016-10-18 Thread Tom Lane
Magnus Hagander  writes:
> On Tue, Oct 18, 2016 at 1:00 AM, Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
>> The test executes "select pg_sleep(10)" and tries to cancel it. In recent
>> master builds, cancel seems to be ignored, and the statement lasts for 10
>> seconds.

> My guess is it's related to this:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9e083fd4683294f41544e6d0d72f6e258ff3a77c
> That's certainly not intended to break things, but that was changed on Oct
> 17 and it relates to cancel keys.
> What platform does the postgres server run on? Can you check if query
> cancel works on libpq or if it's completely broken?

I can confirm that query cancel is broken in HEAD on RHEL6.

regression=# select pg_sleep(10);
^CCancel request sent
... nothing happens for the balance of the 10 seconds ...
regression=# 

There's a smoking gun in the postmaster log:

2016-10-18 09:10:34.547 EDT [18502] LOG:  wrong key in cancel request for 
process 18491

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Move pg_largeobject to a different tablespace *without* turning on system_table_mods.

2016-10-18 Thread Andreas Joseph Krogh
Hi -hackers.
 
From time to time pg_largeobject comes up as an issue with being implemented 
as a system-catalog.
 
As I see it, there are 2 relevant use-cases for improving the situation:
1. Being able to pg_dump *without* any LOs (think of it as
    without the contents of pg_largeobject). This is very handy
    for testing/troubleshooting.
2. Being able to move pg_largeobject to a different tablespace
    *without* turning on system_table_mods. This is important for
    people storing LOTS of large-objects on separate
    disks (non-SSD) and hence in a different tablespace.

Anyone willing to discuss this?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 
 




Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-10-18 Thread Ashutosh Bapat
On Tue, Oct 18, 2016 at 6:18 PM, Etsuro Fujita
 wrote:
> On 2016/10/17 20:12, Ashutosh Bapat wrote:
>
>>> On 2016/10/07 10:26, Amit Langote wrote:

 I think this (v4) patch is in the best shape so far.
>>
>> +1, except one small change.
>
>
>> The comment "... On relcache invalidation events or the relevant
>> syscache invalidation events, ..." specifies relcache separately. I
>> think, it should either needs to specify all the syscaches or just
>> mention "On corresponding syscache invalidation events, ...".
>>
>> Attached patch uses the later version. Let me know if this looks good.
>> If you are fine, I think we should mark this as "Ready for committer".
>
>
> I'm not sure that is a good idea because ISTM the comments there use the
> words "syscache" and "relcache" properly.  I'd like to leave that for
> committers.

Fine with me.

>
>> The patch compiles cleanly, and make check in regress and that in
>> postgres_fdw shows no failures.
>
>
> Thanks for the review and the updated patch!
>
> One thing I'd like to propose to improve the patch is to update the
> following comments for PlanCacheFuncCallback: "Note that the coding would
> support use for multiple caches, but right now only user-defined functions
> are tracked this way.".  We now use this for tracking FDW-related objects as
> well, so the comments needs to be updated. Please find attached an updated
> version.

Fine with me.

>
> Sorry for speaking this now, but one thing I'm not sure about the patch is
> whether we should track the dependencies on FDW-related objects more
> accurately; we modified extract_query_dependencies so that the query tree's
> dependencies are tracked, regardless of the command type, but the query tree
> would be only affected by those objects in AddForeignUpdateTargets, so it
> would be enough to collect the dependencies for the case where the given
> query is UPDATE/DELETE.  But I thought the patch would be probably fine as
> proposed, because we expect updates on such catalogs to be infrequent.  (I
> guess the changes needed for the accuracy would be small, though.)

If we do as you suggest, we will have to add separate code for
tracking plan dependencies for SELECT queries. In fact, I am not in
favour of tracking the query dependencies for UPDATE/DELETE since we
don't have any concrete example as to when that would be needed.

> Besides
> that, I modified add_rte_to_flat_rtable so that the plan's dependencies are
> tracked, but that would lead to tracking the dependencies of unreferenced
> foreign tables in dead subqueries or the dependencies of foreign tables
> excluded from the plan by eg, constraint exclusion.  But I thought that
> would be also OK by the same reason as above.  (Another reason for that was
> it seemed better to me to collect the dependencies in the same place as for
> relation OIDs.)

If those unreferenced relations become references because of the
changes in options, we will require those query dependencies to be
recorded. So, if we are recording query dependencies, we should record
the ones on unreferenced relations as well.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

2016-10-18 Thread Etsuro Fujita

On 2016/10/17 20:12, Ashutosh Bapat wrote:


On 2016/10/07 10:26, Amit Langote wrote:

I think this (v4) patch is in the best shape so far.

+1, except one small change.



The comment "... On relcache invalidation events or the relevant
syscache invalidation events, ..." specifies relcache separately. I
think, it should either needs to specify all the syscaches or just
mention "On corresponding syscache invalidation events, ...".

Attached patch uses the later version. Let me know if this looks good.
If you are fine, I think we should mark this as "Ready for committer".


I'm not sure that is a good idea because ISTM the comments there use the 
words "syscache" and "relcache" properly.  I'd like to leave that for 
committers.



The patch compiles cleanly, and make check in regress and that in
postgres_fdw shows no failures.


Thanks for the review and the updated patch!

One thing I'd like to propose to improve the patch is to update the 
following comments for PlanCacheFuncCallback: "Note that the coding 
would support use for multiple caches, but right now only user-defined 
functions are tracked this way.".  We now use this for tracking 
FDW-related objects as well, so the comments needs to be updated. 
Please find attached an updated version.


Sorry for speaking this now, but one thing I'm not sure about the patch 
is whether we should track the dependencies on FDW-related objects more 
accurately; we modified extract_query_dependencies so that the query 
tree's dependencies are tracked, regardless of the command type, but the 
query tree would be only affected by those objects in 
AddForeignUpdateTargets, so it would be enough to collect the 
dependencies for the case where the given query is UPDATE/DELETE.  But I 
thought the patch would be probably fine as proposed, because we expect 
updates on such catalogs to be infrequent.  (I guess the changes needed 
for the accuracy would be small, though.)  Besides that, I modified 
add_rte_to_flat_rtable so that the plan's dependencies are tracked, but 
that would lead to tracking the dependencies of unreferenced foreign 
tables in dead subqueries or the dependencies of foreign tables excluded 
from the plan by eg, constraint exclusion.  But I thought that would be 
also OK by the same reason as above.  (Another reason for that was it 
seemed better to me to collect the dependencies in the same place as for 
relation OIDs.)


Best regards,
Etsuro Fujita


foreign_plan_cache_inval_v6.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Replace PostmasterRandom() with a stronger way of generating ran

2016-10-18 Thread Michael Paquier
On Tue, Oct 18, 2016 at 12:34 PM, Tom Lane  wrote:
> Michael Paquier  writes:
>> And actually, enabling prngd would need to be controlled by a
>> configure switch as well disabled by default, no?
>
> AFAICT, openssl has no configuration options related to prngd; they
> seem to be able to use it automatically when /dev/[u]random isn't there.
> This surprises me a bit because the location of prngd's random-data socket
> is evidently variable.  I've not dug into exactly how openssl figures that
> out, but I'm sure a little quality time with the openssl sources would
> explain it.

I dug a bit into the code around RAND_egd and how it gets into
fetching a method to get random bytes but got tired of the game for
now. The man page means visibly that OpenSSL connects directly to the
daemon:
https://www.openssl.org/docs/manmaster/crypto/RAND_egd.html

By the way, after a short chat with Heikki, we came up with an extra
idea: resurrect unix_std from pgcrypto in pg_strong_random as the last
fallback method and instead of using sha1, use sha2 as SCRAM is going
to need those functions in src/common/ as well. Having a configure
switch to enable it may be a good idea.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to implement pg_current_logfile() function

2016-10-18 Thread Gilles Darold
On 14/10/2016 at 20:45, Christoph Berg wrote:
> Re: Gilles Darold 2016-10-14 
>> Agree, the usability of the current version is really a pain. What I've
>> thought is that the function could return the csv log in all case when
>> csvlog is set in log_destination and the stderr log file when csvlog is
>> not defined. We can also have a parametrer to ask for a specific log
>> format, like pg_current_logfile('csv') or pg_current_logfile('stderr').
> Good idea to add an optional parameter.
>
> pg_current_logfile(type text DEFAULT 'auto')
>
> pg_current_logfile()
> pg_current_logfile('stderr')
> pg_current_logfile('csvlog')
> pg_current_logfile('auto')
>
> I think I'd prefer the stderr variant by default when both variants
> are configured.
>
> Christoph
>
>

Here is v6 of the patch. Below is the description of the
pg_current_logfile() function; I have tried to keep things as simple as
possible:

pg_current_logfile( [ destination text ] )
------------------------------------------

Returns the name of the current log file used by the logging collector,
as text.  Log collection must be active or the return value is undefined.
When csvlog is used as the log destination, the csv filename is returned;
when it is set to stderr, the stderr filename is returned.  When both are
used, it returns the stderr filename.

There is an optional parameter of type text to determine which log
filename to return, depending on the log format; values can be 'csvlog'
or 'stderr'.  When the requested log format is not used as a log
destination, the return value is undefined.

Example of use when log_destination = 'csvlog,stderr'

postgres=# select pg_current_logfile();
           pg_current_logfile
-----------------------------------------
 pg_log/postgresql-2016-10-18_121326.log
(1 row)

postgres=# select pg_current_logfile('csvlog');
           pg_current_logfile
-----------------------------------------
 pg_log/postgresql-2016-10-18_121326.csv
(1 row)

postgres=# select pg_current_logfile('stderr');
           pg_current_logfile
-----------------------------------------
 pg_log/postgresql-2016-10-18_121326.log
(1 row)

postgres=# select pg_current_logfile('csv');
ERROR:  log format csv not supported, possible values are stderr or csvlog

postgres=# select pg_read_file(pg_current_logfile());
                                            pg_read_file
-----------------------------------------------------------------------------------------------------
 2016-10-18 14:06:30.576 CEST [8113] ERROR:  invalid input syntax for integer: "A" at character 10+
 2016-10-18 14:06:30.576 CEST [8113] STATEMENT:  select 1='A';                                     +
 
(1 row)

I've also fixed the crash when current_logfile is removed by hand.


-- 
Gilles Darold
Consultant PostgreSQL
http://dalibo.com - http://dalibo.org

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a588350..3cd9535 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15480,6 +15480,12 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
   
 
   
+   pg_current_logfile( destination text)
+   text
+   current log file used by the logging collector
+  
+
+  
session_user
name
session user name
@@ -15686,6 +15692,23 @@ SET search_path TO schema , schema, ..
 pg_notification_queue_usage

 
+   
+pg_current_logfile
+   
+
+   
+pg_current_logfile returns the name of the
+current log file used by the logging collector, as text.
+Log collection must be active or the return value is undefined. When
+csvlog is used as log destination, the csv filename is returned; when
+it is set to stderr, the stderr filename is returned. When both are
+used, it returns the stderr filename.
+There is an optional parameter of type text that determines
+the log filename to return following the log destination; values can
+be 'csvlog' or 'stderr'. When the requested log format is not used as log
+destination then the return value is undefined.
+   
+

 pg_listening_channels returns a set of names of
 asynchronous notification channels that the current session is listening
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 1b812bd..8f909a7 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -170,6 +170,13 @@ last started with
   (this file is not present after server shutdown)
 
 
+
+ pg_log_file
+ A file recording the current log file(s) used by the syslogger
+  when log collection is active (this file is not present when logging_collector is not activated)
+
+
+
 
 
 
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index fd62d66..7b09349 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -146,6 +146,7 @@ static char 

Re: [HACKERS] Parallel Index Scans

2016-10-18 Thread Amit Kapila
On Tue, Oct 18, 2016 at 4:08 PM, Rahila Syed  wrote:
>>Another point which needs some thoughts is whether it is good idea to
>>use index relation size to calculate parallel workers for index scan.
>>I think ideally for index scans it should be based on number of pages
>>to be fetched/scanned from index.
> IIUC, its not possible to know the exact number of pages scanned from an
> index
> in advance.

We can't find the exact number of index pages to be scanned, but I
think we can find an estimated number of pages to be fetched (refer to
cost_index).
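
As a hedged sketch of that idea (not the patch's code; the function name,
the 1000-page threshold, and the tripling factor below are assumptions for
illustration only), the calculation could look roughly like this:

#include <math.h>
#include <stdio.h>

/*
 * Sketch only: estimate index pages to be fetched from the index
 * selectivity (in the spirit of cost_index) and grow the worker count
 * logarithmically with that estimate.
 */
static int
suggest_index_scan_workers(double index_selectivity, double index_pages,
						   int max_workers)
{
	double		pages_fetched = ceil(index_selectivity * index_pages);
	int			workers = 0;

	while (pages_fetched >= 1000.0 && workers < max_workers)
	{
		workers++;
		pages_fetched /= 3.0;	/* assumed scaling factor */
	}
	return workers;
}

int
main(void)
{
	/* e.g. 1% selectivity over a 1,000,000-page index -> 3 workers */
	printf("%d\n", suggest_index_scan_workers(0.01, 1000000.0, 8));
	return 0;
}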

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gather Merge

2016-10-18 Thread Rushabh Lathia
Thanks Amit for reviewing this patch.

On Mon, Oct 17, 2016 at 2:26 PM, Amit Kapila 
wrote:

> On Wed, Oct 5, 2016 at 11:35 AM, Rushabh Lathia
>  wrote:
> > Hi hackers,
> >
> > Attached is the patch to implement Gather Merge.
> >
>
> Couple of review comments:
>
> 1.
> ExecGatherMerge()
> {
> ..
> + /* No workers? Then never mind. */
> + if (!got_any_worker
> ||
> + node->nreaders < 2)
> + {
> +
> ExecShutdownGatherMergeWorkers(node);
> + node->nreaders = 0;
> +
> }
>
> Are you planning to restrict the use of gather merge based on number
> of workers, if there is a valid reason, then I think comments should
> be updated for same?
>
>
Thanks for catching this. This is left over from the earlier design patch.
With the current design we don't have any limitation on the number of
workers. I did performance testing with max_parallel_workers_per_gather
set to 1 and didn't notice any performance degradation, so I removed this
limitation in the attached patch.

> 2.
> +ExecGatherMerge(GatherMergeState * node){
> ..
> + if (!node->initialized)
> + {
> + EState   *estate = node->ps.state;
> +
> GatherMerge *gm = (GatherMerge *) node->ps.plan;
> +
> + /*
> + * Sometimes we
> might have to run without parallelism; but if parallel
> + * mode is active then we can try to
> fire up some workers.
> + */
> + if (gm->num_workers > 0 && IsInParallelMode())
> +
> {
> + ParallelContext *pcxt;
> + bool got_any_worker =
> false;
> +
> + /* Initialize the workers required to execute Gather node. */
> +
> if (!node->pei)
> + node->pei = ExecInitParallelPlan(node-
> >ps.lefttree,
> +
>  estate,
> +
>  gm->num_workers);
> ..
> }
>
> There is lot of common code between ExecGatherMerge and ExecGather.
> Do you think it makes sense to have a common function to avoid the
> duplicity?
>
> I see there are small discrepancies in both the codes like I don't see
> the use of single_copy flag, as it is present in gather node.
>
>
Yes, I also thought about centralizing some things between ExecGather and
ExecGatherMerge, but that code is not really settled yet, and I expect it
may still change, particularly for Gather Merge. Also, as explained by
Robert, single_copy doesn't make sense for Gather Merge. I will still look
into whether something can be centralized.


> 3.
> +gather_merge_readnext(GatherMergeState * gm_state, int reader, bool
> force)
> {
> ..
> + tup = gm_readnext_tuple(gm_state, reader, force, NULL);
> +
> + /*
> +
>  * try to read more tuple into nowait mode and store it into the tuple
> + * array.
> +
>  */
> + if (HeapTupleIsValid(tup))
> + fill_tuple_array(gm_state, reader);
>
> How the above read tuple is stored in array?  In anycase the above
> interface seems slightly awkward to me. Basically, I think what you
> are trying to do here is after reading first tuple in waitmode, you
> are trying to fill the array by reading more tuples.  So, can't we
> push reading of this fist tuple into that function and name it as
> form_tuple_array().
>
>
Yes, you are right. First it tries to read a tuple in wait mode, and once
it finds a tuple it tries to fill the tuple array (which basically reads
tuples in nowait mode). The reason I keep them separate is that when
initializing the gather merge, if we are unable to read a tuple from every
worker, then while re-reading we again try to fill the tuple array for the
workers that already produced at least one tuple (see gather_merge_init()
for more details). Also, I thought that filling the tuple array (reading
tuples in nowait mode) and reading a tuple (in wait mode) are two separate
tasks, and keeping them in separate functions makes the code clearer. If
you have any suggestion for the function name (fill_tuple_array) then I am
open to changing it. A sketch of the flow is below.
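
To make the flow concrete, here is a self-contained sketch of that control
pattern (not the patch's code; read_one() just simulates gm_readnext_tuple()
with a canned stream, and the array bound mirrors MAX_TUPLE_STORE):

#include <stdbool.h>
#include <stdio.h>

#define MAX_TUPLE_STORE 10		/* mirrors the patch's array size */
#define STREAM_LEN 10

/*
 * Simulated tuple source: values >= 0 are "tuples", -1 means nothing is
 * immediately available from the worker.
 */
static int	stream[STREAM_LEN] = {1, 2, 3, -1, 4, 5, -1, -1, 6, -1};
static int	pos = 0;

static int
read_one(bool wait)
{
	while (pos < STREAM_LEN)
	{
		int			v = stream[pos++];

		if (v >= 0)
			return v;			/* got a tuple */
		if (!wait)
			return -1;			/* nowait mode: give up immediately */
		/* wait mode: keep "waiting" for the next value */
	}
	return -1;					/* end of stream */
}

int
main(void)
{
	int			tuple_array[MAX_TUPLE_STORE];
	int			ntuples = 0;
	int			first;

	/* Wait-mode read: block until the first tuple (or end of stream). */
	first = read_one(true);
	printf("first tuple: %d\n", first);

	/* Then opportunistically buffer more tuples in nowait mode. */
	while (ntuples < MAX_TUPLE_STORE)
	{
		int			next = read_one(false);

		if (next < 0)
			break;
		tuple_array[ntuples++] = next;
	}
	for (int i = 0; i < ntuples; i++)
		printf("buffered tuple: %d\n", tuple_array[i]);
	return 0;
}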



> 4.
> +create_gather_merge_path(..)
> {
> ..
> +  0 /* FIXME: pathnode->limit_tuples*/);
>
> What exactly you want to fix in above code?
>
>
Fixed.


> 5.
>  +/* Tuple array size */
> +#define MAX_TUPLE_STORE 10
>
> Have you tried with other values of MAX_TUPLE_STORE?  If yes, then
> what are the results?  I think it is better to add a comment why array
> size is best for performance.
>
>

Actually I was thinking about that, but I didn't want to add it because
those are just performance numbers from my machine. Anyway, I have added
the comments.


>
> 6.
> +/* INTERFACE ROUTINES
> + * ExecInitGatherMerge - initialize the MergeAppend
> node
> + * ExecGatherMerge - retrieve the next tuple from the node
> + *
> ExecEndGatherMerge - shut down the MergeAppend node
> + *
> ExecReScanGatherMerge - rescan the MergeAppend node
>
> typo.  /MergeAppend/GatherMerge
>
>
Fixed.


> 7.
> +static TupleTableSlot *gather_merge_getnext(GatherMergeState * gm_state);
> +static HeapTuple
> gm_readnext_tuple(GatherMergeState * gm_state, int nreader, bool
> force, bool *done);
>
> Formatting near GatherMergeState doesn't seem to be appropriate.  I
> think you need to add GatherMergeState in typedefs.list and 

Re: [HACKERS] Parallel Index Scans

2016-10-18 Thread Rahila Syed
>Another point which needs some thoughts is whether it is good idea to
>use index relation size to calculate parallel workers for index scan.
>I think ideally for index scans it should be based on number of pages
>to be fetched/scanned from index.
IIUC, it's not possible to know the exact number of pages scanned from an
index in advance.
What we are essentially making parallel is the scan of the leaf pages,
so it would make sense to base the number of workers on the number of
leaf pages.
Having said that, I think it will not make much difference compared to
the existing method, because currently the total number of index pages is
used to calculate the number of workers. As far as I understand, in large
indexes the difference between the number of leaf pages and the total
number of pages is not significant. In other words, internal pages form a
small fraction of the total pages.
Also, the calculation is based on the log of the number of pages, so it
will make even less difference.
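
For illustration (numbers assumed): an index with 30,000 total pages of
which 29,900 are leaf pages gives log3(30,000) of about 9.38 and
log3(29,900) of about 9.38 as well, so any heuristic that derives the
worker count from the logarithm of the page count lands on the same value
either way.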

Thank you,
Rahila Syed






On Tue, Oct 18, 2016 at 8:38 AM, Amit Kapila 
wrote:

> On Thu, Oct 13, 2016 at 8:48 AM, Amit Kapila 
> wrote:
> > As of now, the driving table for parallel query is accessed by
> > parallel sequential scan which limits its usage to a certain degree.
> > Parallelising index scans would further increase the usage of parallel
> > query in many more cases.  This patch enables the parallelism for the
> > btree scans.  Supporting parallel index scan for other index types
> > like hash, gist, spgist can be done as separate patches.
> >
>
> I would like to have an input on the method of selecting parallel
> workers for scanning index.  Currently the patch selects number of
> workers based on size of index relation and the upper limit of
> parallel workers is max_parallel_workers_per_gather.  This is quite
> similar to what we do for parallel sequential scan except for the fact
> that in parallel seq. scan, we use the parallel_workers option if
> provided by user during Create Table.  User can provide
> parallel_workers option as below:
>
> Create Table  With (parallel_workers = 4);
>
> Is it desirable to have similar option for parallel index scans, if
> yes then what should be the interface for same?  One possible way
> could be to allow user to provide it during Create Index as below:
>
> Create Index  With (parallel_workers = 4);
>
> If above syntax looks sensible, then we might need to think what
> should be used for parallel index build.  It seems to me that parallel
> tuple sort patch [1] proposed by Peter G. is using above syntax for
> getting the parallel workers input from user for parallel index
> builds.
>
> Another point which needs some thoughts is whether it is good idea to
> use index relation size to calculate parallel workers for index scan.
> I think ideally for index scans it should be based on number of pages
> to be fetched/scanned from index.
>
>
> [1] - https://www.postgresql.org/message-id/CAM3SWZTmkOFEiCDpUNaO4n9-
> 1xcmWP-1NXmT7h0Pu3gM2YuHvg%40mail.gmail.com
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] [PATCH] Better logging of COPY queries if log_statement='all'

2016-10-18 Thread Aleksander Alekseev
> > According to my colleagues it would be very nice to have this feature.
> > For instance, if you are trying to optimize PostgreSQL for application
> > that uses COPY and you don't have access to or something like this. 
> > It could also be useful in some other cases.
> 
> This use-case doesn't really make much sense to me.  Can you explain it
> in more detail?  Is the goal here to replicate all of the statements
> that are changing data in the database?

The idea is to record the application workload in a real environment and
write a benchmark based on this record. Then, using this benchmark, we
could try different OS/DBMS configurations (or maybe hardware), find an
extremum, and then change the configuration in the production environment.

It's not always possible to change the application or even the database
(e.g. to use triggers) for this purpose, for instance if the DBMS is
provided as a service.

Currently PostgreSQL allows recording all of the workload _except_ COPY
queries. Considering how easily this could be done, I think that's wrong.
Basically the only real question here is what it should look like in
postgresql.conf.
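
For example, one purely hypothetical shape for it (log_copy is not an
existing GUC, just a name used here for illustration):

log_statement = 'all'
log_copy = on			# hypothetical: include COPY in the statement log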

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: [HACKERS] Hash Indexes

2016-10-18 Thread Amit Kapila
On Wed, Oct 5, 2016 at 10:22 PM, Robert Haas  wrote:
> On Tue, Oct 4, 2016 at 12:36 AM, Amit Kapila  wrote:
>> I think one way to avoid the risk of deadlock in above scenario is to
>> take the cleanup lock conditionally, if we get the cleanup lock then
>> we will delete the items as we are doing in patch now, else it will
>> just mark the tuples as dead and ensure that it won't try to remove
>> tuples that are moved-by-split.  Now, I think the question is how will
>> these dead tuples be removed.  We anyway need a separate mechanism to
>> clear dead tuples for hash indexes as during scans we are marking the
>> tuples as dead if corresponding tuple in heap is dead which are not
>> removed later.  This is already taken care in btree code via
>> kill_prior_tuple optimization.  So I think clearing of dead tuples can
>> be handled by a separate patch.
>
> That seems like it could work.
>

I have implemented this idea and it works for MVCC scans.  However, I
think this might not work for non-MVCC scans.  Consider a case where,
in Process-1, a hash scan has returned one row, and before it can check
the row's validity in the heap, vacuum marks that tuple as dead, removes
the entry from the heap, and some new tuple is placed at that offset in
the heap.  Now when Process-1 checks the validity in the heap, it will
check a different tuple than the one the index tuple was supposed to
refer to.  If we want, we can make it work similarly to what btree does,
as is being discussed on thread [1], but for that we need to introduce a
page-scan mode in hash indexes as well.   However, do we really want to
solve this problem as part of this patch when it exists for other index
AMs as well?
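
Roughly, the conditional-lock part of the idea looks like this (a sketch
only, not the patch and not runnable outside the backend;
ConditionalLockBufferForCleanup() and LockBuffer() are the existing bufmgr
calls, while the two helpers and the local variables are assumed names
standing in for the real work):

	/* Try for the cleanup lock; fall back to hinting if we can't get it. */
	if (ConditionalLockBufferForCleanup(buf))
	{
		/* Cleanup lock acquired: physically remove the dead index tuples. */
		hash_remove_dead_items(rel, buf, deletable, ndeletable);	/* assumed */
		UnlockReleaseBuffer(buf);
	}
	else
	{
		/*
		 * No cleanup lock without deadlock risk: just mark the tuples dead
		 * and leave removal to a later pass, taking care never to remove
		 * tuples that are moved-by-split.
		 */
		LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
		hash_mark_items_dead(rel, buf, deletable, ndeletable);		/* assumed */
		UnlockReleaseBuffer(buf);
	}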


[1]  - 
https://www.postgresql.org/message-id/CACjxUsNtBXe1OfRp%3DacB%2B8QFAVWJ%3Dnr55_HMmqQYceCzVGF4tQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query cancel seems to be broken in master since Oct 17

2016-10-18 Thread Vladimir Gordiychuk
> What platform does the postgres server run on?

Ubuntu
OS name: "linux", version: "3.19.0-66-generic", arch: "amd64", family:
"unix"

2016-10-18 11:05 GMT+03:00 Magnus Hagander :

>
>
> On Tue, Oct 18, 2016 at 1:00 AM, Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
>
>> Hi,
>>
>> In pgjdbc we have regular regression testing against "build from
>> master" PostgreSQL, and recent master builds fail for "statement cancel"
>> test.
>>
>> The PostgreSQL as of Mon Oct 17 00:09:39 UTC 2016 was fine,
>> then "statement cancel" started to fail.
>> The test executes "select pg_sleep(10)" and tries to cancel it. In recent
>> master builds, cancel seems to be ignored, and the statement lasts for 10
>> seconds.
>>
>> Exactly the same driver and test version works fine for 8.4, 9.1, 9.2,
>> 9.3, 9.4, 9.5, and 9.6:
>> Here's a sample build report: https://travis-ci.org/
>> pgjdbc/pgjdbc/builds/168444341
>>
>> Any hints what could be the issue?
>> Was the breakage intentional?
>>
>>
> My guess is it's related to this: https://git.postgresql.org/
> gitweb/?p=postgresql.git;a=commitdiff;h=9e083fd4683294f41544e6d0d72f6e
> 258ff3a77c
>
> That's certainly not intended to break things, but that was changed on Oct
> 17 and it relates to cancel keys.
>
> What platform does the postgres server run on? Can you check if query
> cancel works on libpq or if it's completely broken?
>
>
> --
>  Magnus Hagander
>  Me: http://www.hagander.net/
>  Work: http://www.redpill-linpro.com/
>


Re: [HACKERS] Batches, error handling and transaction in the protocol

2016-10-18 Thread Shay Rojansky
> More generally speaking, the protocol appears to couple two different
> things which may be unrelated. On the one hand, we have a protocol
> sync mechanism for error recovery (skip until Sync). On the other hand,
> we have an implicit transaction for extended query messages until
> that same Sync. It seems valid to want to have error recovery without an
> implicit transaction, but this doesn't seem supported by the current
> protocol (I could add a note for v4).

In the absence of any response to my message from September 28th, I've
added a TODO item for wire protocol v4 (separate transaction delineation
from protocol error recovery).

Note that the same issue was discussed with Craig Ringer in
https://www.postgresql.org/message-id/CAMsr%2BYEgnJ8ZAWPLx5%3DBCbYYq9SNTdwbwvUcb7V-vYm5d5uhbQ%40mail.gmail.com
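
For readers skimming the thread, the two batching shapes being contrasted
are:

  one implicit transaction per batch:
    Parse1 Bind1 Describe1 Execute1 Parse2 Bind2 Describe2 Execute2 Sync

  autocommit-like batching (a Sync after every Execute):
    Parse1 Bind1 Describe1 Execute1 Sync Parse2 Bind2 Describe2 Execute2 Sync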

On Wed, Sep 28, 2016 at 6:04 PM, Shay Rojansky  wrote:

> Hi everyone, I'd appreciate some guidance on an issue that's been raised
> with Npgsql, input from other driver writers would be especially helpful.
>
> Npgsql currently supports batching (or pipelining) to avoid roundtrips,
> and sends a Sync message only at the end of the batch (so
> Parse1/Bind1/Describe1/Execute1/Parse2/Bind2/Describe2/Execute2/Sync).
> The reasoning is that if the first statement in the batch fails, the others
> shouldn't be processed. This seems to be the standard approach (the
> proposed patch for libpq seems to do the same).
>
> At the same time, if the batch doesn't occur within an explicit
> transaction (i.e. after BEGIN), it is automatically wrapped in an implicit
> transaction, with Sync committing it. This can, for example, provoke
> deadlocks if two batches try to update the same rows in reverse order. The
> problem is that the user didn't request a transaction in any way - they're
> just using batching to avoid roundtrips and their intention is to be in
> autocommit mode.
>
> One possible solution for this would be to insert a Sync after every
> execute in the batch, rather than a single Sync at the very end. This would
> make batches work the same as unbatched statements, and would resolve the
> deadlocks. However, behavior in case of error would be problematic:
> PostgreSQL would continue executing later messages if earlier ones failed,
> Npgsql would have to deal with multiple errors, etc.
>
> More generally speaking, the protocol appears to couple two different
> things which may be unrelated. On the one hand, we have a protocol sync
> mechanism for error recovery (skip until Sync). One the other hand, we have
> an implicit transaction for extended query messages until that same Sync.
> It seems valid to want to have error recovery without an implicit
> transaction, but this doesn't seem supported by the current protocol (I
> could add a note for v4).
>
> Finally, to give more context, a Microsoft developer ran into this while
> running ASP.NET benchmarks over Npgsql and its Entity Framework Core ORM
> provider. One of EFCore's great new features is that it batches database
> updates into a single roundtrip, but this triggered deadlocks. Whereas in
> many cases it's OK to tell users to solve the deadlocks by properly
> ordering their statements, when an ORM is creating the batch it's a more
> difficult proposition.
>
> Thanks for any thoughts or guidance!
>
> Shay
>


Re: [HACKERS] Query cancel seems to be broken in master since Oct 17

2016-10-18 Thread Magnus Hagander
On Tue, Oct 18, 2016 at 1:00 AM, Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Hi,
>
> In pgjdbc we have regular regression testing against "build from
> master" PostgreSQL, and recent master builds fail for "statement cancel"
> test.
>
> The PostgreSQL as of Mon Oct 17 00:09:39 UTC 2016 was fine,
> then "statement cancel" started to fail.
> The test executes "select pg_sleep(10)" and tries to cancel it. In recent
> master builds, cancel seems to be ignored, and the statement lasts for 10
> seconds.
>
> Exactly the same driver and test version works fine for 8.4, 9.1, 9.2,
> 9.3, 9.4, 9.5, and 9.6:
> Here's a sample build report: https://travis-ci.org/
> pgjdbc/pgjdbc/builds/168444341
>
> Any hints what could be the issue?
> Was the breakage intentional?
>
>
My guess is it's related to this:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9e083fd4683294f41544e6d0d72f6e258ff3a77c

That's certainly not intended to break things, but that was changed on Oct
17 and it relates to cancel keys.

What platform does the postgres server run on? Can you check if query
cancel works on libpq or if it's completely broken?
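
For example, a minimal standalone libpq check could look like this (the
connection string and the one-second delay are assumptions; just a sketch,
build with -lpq):

#include <stdio.h>
#include <unistd.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");	/* assumed conninfo */
	PGcancel   *cancel;
	PGresult   *res;
	char		errbuf[256];

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	/* Start a long-running query without waiting for its result. */
	PQsendQuery(conn, "SELECT pg_sleep(10)");
	sleep(1);

	/* Ask the server to cancel it, as the pgjdbc test does. */
	cancel = PQgetCancel(conn);
	if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
		fprintf(stderr, "cancel request failed: %s\n", errbuf);
	PQfreeCancel(cancel);

	/* If cancel works, the query should fail quickly instead of sleeping. */
	while ((res = PQgetResult(conn)) != NULL)
	{
		fprintf(stderr, "result status: %s\n",
				PQresStatus(PQresultStatus(res)));
		PQclear(res);
	}

	PQfinish(conn);
	return 0;
}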


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] Query cancel seems to be broken in master since Oct 17

2016-10-18 Thread Vladimir Sitnikov
Hi,

In pgjdbc we have regular regression testing against "build from
master" PostgreSQL, and recent master builds fail for "statement cancel"
test.

The PostgreSQL as of Mon Oct 17 00:09:39 UTC 2016 was fine, then "statement
cancel" started to fail.
The test executes "select pg_sleep(10)" and tries to cancel it. In recent
master builds, cancel seems to be ignored, and the statement lasts for 10
seconds.

Exactly the same driver and test version works fine for 8.4, 9.1, 9.2, 9.3,
9.4, 9.5, and 9.6:
Here's a sample build report:
https://travis-ci.org/pgjdbc/pgjdbc/builds/168444341

Any hints what could be the issue?
Was the breakage intentional?

Vladimir


[HACKERS] simplehash.h typo fix

2016-10-18 Thread Erik Rijkers

it's -> its
the the -> the

--- ./src/include/lib/simplehash.h.orig	2016-10-18 09:31:22.712028458 +0200
+++ ./src/include/lib/simplehash.h	2016-10-18 09:37:00.050970588 +0200
@@ -256,7 +256,7 @@
 	return curelem;
 }
 
-/* return distance between bucket and it's optimal position */
+/* return distance between bucket and its optimal position */
 static inline uint32
 SH_DISTANCE_FROM_OPTIMAL(SH_TYPE *tb, uint32 optimal, uint32 bucket)
 {
@@ -349,7 +349,7 @@
 	 *
 	 * To be able to simply move entries over, we have to start not at the
 	 * first bucket (i.e olddata[0]), but find the first bucket that's either
-	 * empty, or is occupied by an entry at it's optimal position. Such a
+	 * empty, or is occupied by an entry at its optimal position. Such a
 	 * bucket has to exist in any table with a load factor under 1, as not all
 	 * buckets are occupied, i.e. there always has to be an empty bucket.  By
 	 * starting at such a bucket we can move the entries to the larger table,
@@ -485,9 +485,9 @@
 		/*
 		 * If the bucket is not empty, we either found a match (in which case
 		 * we're done), or we have to decide whether to skip over or move the
-		 * colliding entry. When the the colliding elements distance to it's
+		 * colliding entry. When the colliding elements distance to its
 		 * optimal position is smaller than the to-be-inserted entry's, we
-		 * shift the colliding entry (and it's followers) forward by one.
+		 * shift the colliding entry (and its followers) forward by one.
 		 */
 
 		if (SH_COMPARE_KEYS(tb, hash, key, entry))

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers