Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-09 Thread Amit Kapila
On Thu, Mar 9, 2017 at 3:11 AM, Robert Haas  wrote:
> On Tue, Mar 7, 2017 at 6:41 PM, Robert Haas  wrote:
 Great, thanks.  0001 looks good to me now, so committed.
>>>
>>> Committed 0002.
>>
>> Here are some initial review thoughts on 0003 based on a first read-through.
>
> More thoughts on the main patch:
>
> +mode anyway).  It would seem natural to complete the split in VACUUM, but 
> since
> +splitting a bucket might require allocating a new page, it might fail if you
> +run out of disk space.  That would be bad during VACUUM - the reason for
> +running VACUUM in the first place might be that you run out of disk space,
> +and now VACUUM won't finish because you're out of disk space.  In contrast,
> +an insertion can require enlarging the physical file anyway.
>
> But VACUUM actually does try to complete the splits, so this is wrong, I 
> think.
>

Vacuum just tries to clean the remaining tuples from the old bucket.  Only
an insert or split operation tries to complete the split.

>
> +if (!xlrec.is_primary_bucket_page)
> +XLogRegisterBuffer(0, bucket_buf, REGBUF_STANDARD);
>
> So, in hashbucketcleanup, the page is registered and therefore an FPI
> will be taken if this is the first reference to this page since the
> checkpoint, but hash_xlog_delete won't restore the FPI.  I think that
> exposes us to a torn-page hazard.
> _hash_freeovflpage/hash_xlog_squeeze_page seems to have the same
> disease, as does _hash_squeezebucket/hash_xlog_move_page_contents.
>

Right, if we use XLogReadBufferForRedoExtended() instead of
XLogReadBufferExtended()/LockBufferForCleanup in the replay routine,
then it will restore the FPI when required and can serve our purpose of
acquiring a cleanup lock on the primary bucket page.  Yet another way
would be to store some information about the primary bucket (block
number, relfilenode, forknum; maybe we can get away with less) in the
fixed part of each of these records, and then use that to read the page
and take a cleanup lock on it.  But since in most cases this buffer
needs to be registered only when there are multiple overflow pages, I
think carrying that fixed information might hurt in some cases.
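
To make the first option concrete, here is a minimal sketch of what such a
redo routine could do (illustrative only, not the patch text; it assumes
block 0 is the registered primary bucket page, and the function name is
made up for the example):

static void
hash_xlog_delete_sketch(XLogReaderState *record)
{
    Buffer      bucketbuf = InvalidBuffer;

    /*
     * With get_cleanup_lock = true, this restores the full-page image
     * when one is present and takes a cleanup lock rather than a plain
     * exclusive lock, mirroring what the do-routine held.
     */
    if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
                                      true, /* get_cleanup_lock */
                                      &bucketbuf) == BLK_NEEDS_REDO)
    {
        /* ... apply the logged changes to the bucket page ... */
    }

    if (BufferIsValid(bucketbuf))
        UnlockReleaseBuffer(bucketbuf);
}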

>
> +/*
> + * As bitmap page doesn't have standard page layout, so this will
> + * allow us to log the data.
> + */
> +XLogRegisterBuffer(2, mapbuf, REGBUF_STANDARD);
> +XLogRegisterBufData(2, (char *) _page_bit, 
> sizeof(uint32));
>
> If the page doesn't have a standard page layout, why are we passing
> REGBUF_STANDARD?  But I think you've hacked things so that bitmap
> pages DO have a standard page layout, per the change to
> _hash_initbitmapbuffer, in which case the comment seems wrong.
>

Yes, the comment is obsolete and I have removed it.  We need a standard
page layout for the bitmap page so that full-page writes won't exclude
the bitmap page data.
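
To illustrate why the layout matters, a sketch along the lines of
_hash_initbitmapbuffer() (not the committed code; buf and bmsize are
assumed to come from the caller):

    Page        pg = BufferGetPage(buf);
    uint32     *freep = HashPageGetBitmap(pg);

    /* initially all overflow pages are free, so set every bit */
    MemSet(freep, 0xFF, bmsize);

    /*
     * Advance pd_lower past the bitmap words.  With a standard page
     * layout, a full-page image may omit the pd_lower..pd_upper "hole",
     * so keeping the bitmap below pd_lower ensures it is always part of
     * the image.
     */
    ((PageHeader) pg)->pd_lower = ((char *) freep + bmsize) - (char *) pg;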

>
> +/*
> + * Note that we still update the page even if it was restored from a full
> + * page image, because the bucket flag is not included in the image.
> + */
>
> Really?  Why not?  (Because it's part of the special space?)
>

Yes, because it's part of the special space.  Do you want that
mentioned in the comments?

> Doesn't hash_xlog_split_allocpage need to acquire cleanup locks to
> guard against concurrent scans?
>

I don't think so, because the regular code takes it on the old bucket to
protect the split against concurrent inserts, and the cleanup lock on the
new bucket is just for the sake of consistency.

> And also hash_xlog_split_complete?

In the regular code, we are not taking a cleanup lock for this operation.

> And hash_xlog_delete?  If the regular code path is getting a cleanup
> lock to protect against concurrent scans, recovery better do it, too.
>

We are already taking a cleanup lock for this, so I'm not sure what
exactly is missed here; can you be more specific?

> +/*
> + * There is no harm in releasing the lock on old bucket before new bucket
> + * during replay as no other bucket splits can be happening concurrently.
> + */
> +if (BufferIsValid(oldbuf))
> +UnlockReleaseBuffer(oldbuf);
>
> And I'm not sure this is safe either.  The issue isn't that there
> might be another bucket split happening concurrently, but that there
> might be scans that get confused by seeing the bucket split flag set
> before the new bucket is created or the metapage updated.
>

During a scan we check whether the bucket is being populated, so this should be safe.

>  Seems like
> it would be safer to keep all the locks for this one.
>

If you feel better that way, I can change it and add a comment saying
something like "we could release the lock on the bucket being split early
as well, but we do it here to be consistent with normal operation".

>
> It seems like a good test to do with this patch would be to set up a
> pgbench test on the master with a hash index replacing the usual btree
> index.  Then, set 

Re: [HACKERS] [bug fix] dblink leaks unnamed connections

2017-03-09 Thread Fujii Masao
On Wed, Mar 8, 2017 at 3:52 PM, Tsunakawa, Takayuki
 wrote:
> Hello,
>
> dblink fails to close the unnamed connection as follows when a new unnamed 
> connection is requested.  The attached patch fixes this.

This issue was reported about ten years ago and added as a TODO item.
http://archives.postgresql.org/pgsql-hackers/2007-10/msg00895.php

I agree that this is a bug, and am tempted to back-patch it to all the supported
versions.  But it went unfixed for many years after the first report of the
issue, so I'm not sure whether it's OK to just treat it as a bug right now and
back-patch, or whether we should fix it only in HEAD.  Anyway, I'd like to hear
more opinions about this.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Declarative partitioning optimization for large amount of partitions

2017-03-09 Thread Aleksander Alekseev
Hi, Andres

Thanks a lot for the review!

> Why are we keeping that list / the "batch" allocation pattern? It
> doesn't actually seem useful to me after these changes.  Given that we
> don't shrink hash-tables, we could just use the hash-table memory for
> this, no?

I don't think we can do that. According to comments:

```
 * NOTE: once allocated, TabStatusArray structures are never moved or deleted
 [...]
 * This allows relcache pgstat_info pointers to be treated as long-lived data,
 * avoiding repeated searches in pgstat_initstats() when a relation is
 * repeatedly opened during a transaction.
```

It is my understanding that dynahash can't give the same guarantees.
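
For concreteness, the lookup helper referenced by the patch
(find_tabstat_entry) can look roughly like this; the body below is a
sketch reconstructed from the quoted hunks, not copied from the patch:

static PgStat_TableStatus *
find_tabstat_entry(Oid rel_id)
{
    TabStatHashEntry *hash_entry;

    if (pgStatTabHash == NULL)
        return NULL;

    hash_entry = hash_search(pgStatTabHash, &rel_id, HASH_FIND, NULL);
    if (hash_entry == NULL)
        return NULL;

    /*
     * Safe to hand out: the pointed-to entry lives in a TabStatusArray
     * chunk, which is never moved or deleted, unlike dynahash entries.
     */
    return hash_entry->tsa_entry;
}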

> 'faster ... lookup' doesn't strike me as a useful comment, because it's
> only useful in relation of the current code to the new code.

Agree, fixed to 'O(1) ... lookup'.

> It's not really freed...

Right, it's actually zeroed. Fixed.

> Arguably it'd be better to HASH_ENTER directly here, instead of doing
> two lookups.

Good point. Fixed.

> Think it'd be better to move the hash creation into its own function.

Agree, fixed.

On Mon, Mar 06, 2017 at 12:01:50PM -0800, Andres Freund wrote:
> Hi,
> 
> This issue has bothered me in non-partitioned use-cases recently, so
> thanks for taking it up.
> 
> 
> On 2017-03-06 18:22:17 +0300, Aleksander Alekseev wrote:
> > diff --git a/src/backend/postmaster/pgstat.c 
> > b/src/backend/postmaster/pgstat.c
> > index 2fb9a8bf58..fa906e7930 100644
> > --- a/src/backend/postmaster/pgstat.c
> > +++ b/src/backend/postmaster/pgstat.c
> > @@ -160,6 +160,16 @@ typedef struct TabStatusArray
> >  
> >  static TabStatusArray *pgStatTabList = NULL;
> 
> Why are we keeping that list / the "batch" allocation pattern? It
> doesn't actually seem useful to me after these changes.  Given that we
> don't shrink hash-tables, we could just use the hash-table memory for
> this, no?
> 
> I think a separate list that only contains things that are "active" in
> the current transaction makes sense, because the current logic requires
> iterating over a potentially very large array at transaction commit.
> 
> 
> > +/* pgStatTabHash entry */
> > +typedef struct TabStatHashEntry
> > +{
> > +   Oid t_id;
> > +   PgStat_TableStatus* tsa_entry;
> > +} TabStatHashEntry;
> > +
> > +/* Hash table for faster t_id -> tsa_entry lookup */
> > +static HTAB *pgStatTabHash = NULL;
> > +
> 
> 'faster ... lookup' doesn't strike me as a useful comment, because it's
> only useful in relation of the current code to the new code.
> 
> 
> 
> >  /*
> >   * Backends store per-function info that's waiting to be sent to the 
> > collector
> >   * in this hash table (indexed by function OID).
> > @@ -815,7 +825,13 @@ pgstat_report_stat(bool force)
> > pgstat_send_tabstat(this_msg);
> > this_msg->m_nentries = 0;
> > }
> > +
> > +   /*
> > +* Entry will be freed soon so we need to remove it 
> > from the lookup table.
> > +*/
> > +   hash_search(pgStatTabHash, &entry->t_id, HASH_REMOVE, NULL);
> > }
> 
> It's not really freed...
> 
> 
> >  static PgStat_TableStatus *
> >  get_tabstat_entry(Oid rel_id, bool isshared)
> >  {
> > +   TabStatHashEntry* hash_entry;
> > PgStat_TableStatus *entry;
> > TabStatusArray *tsa;
> > -   TabStatusArray *prev_tsa;
> > -   int i;
> > +
> > +   /* Try to find an entry */
> > +   entry = find_tabstat_entry(rel_id);
> > +   if(entry != NULL)
> > +   return entry;
> 
> Arguably it'd be better to HASH_ENTER directly here, instead of doing
> two lookups.
> 
> 
> > /*
> > -* Search the already-used tabstat slots for this relation.
> > +* Entry doesn't exist - creating one.
> > +* First make sure there is a free space in a last element of 
> > pgStatTabList.
> >  */
> > -   prev_tsa = NULL;
> > -   for (tsa = pgStatTabList; tsa != NULL; prev_tsa = tsa, tsa = 
> > tsa->tsa_next)
> > +   if (!pgStatTabList)
> > {
> > -   for (i = 0; i < tsa->tsa_used; i++)
> > -   {
> > -   entry = &tsa->tsa_entries[i];
> > -   if (entry->t_id == rel_id)
> > -   return entry;
> > -   }
> > +   HASHCTL ctl;
> >  
> > -   if (tsa->tsa_used < TABSTAT_QUANTUM)
> > +   Assert(!pgStatTabHash);
> > +
> > +   memset(&ctl, 0, sizeof(ctl));
> > +   ctl.keysize = sizeof(Oid);
> > +   ctl.entrysize = sizeof(TabStatHashEntry);
> > +   ctl.hcxt = TopMemoryContext;
> > +
> > +   pgStatTabHash = hash_create("pgstat t_id to tsa_entry lookup table",
> > +   TABSTAT_QUANTUM, &ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
> 
> Think it'd be better to move the hash creation into its own function.
> 
> 
> - Andres

-- 
Best regards,
Aleksander Alekseev
diff --git a/src/backend/postmaster/pgstat.c 

Re: [HACKERS] Gather Merge

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 8:21 AM, Rushabh Lathia  wrote:
> Thanks Robert for committing this.
>
> My colleague Neha Sharma found one regression with the patch. I was about
> to send this mail and noticed that you committed the patch.

Oops.  Bad timing.

> postgres=# explain select aid from pgbench_accounts where aid % 25= 0 group
> by aid;
> ERROR:  ORDER/GROUP BY expression not found in targetlist

I think your fix for this looks right, although I would write it this way:

-gm_plan->plan.targetlist = subplan->targetlist;
+gm_plan->plan.targetlist = build_path_tlist(root, &best_path->path);

The second part of your fix looks wrong.  I think you want this:

 create_gather_merge_path(root,
  grouped_rel,
  subpath,
- NULL,
+ partial_grouping_target,
  root->group_pathkeys,
  NULL,
  &total_groups);

That will match the create_gather_path case.

This test case is still failing for me even with those fixes:

rhaas=# select aid+1 from pgbench_accounts group by aid+1;
ERROR:  could not find pathkey item to sort

So evidently there is at least one more bug here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-09 Thread Masahiko Sawada
On Wed, Mar 8, 2017 at 1:43 AM, Peter Geoghegan  wrote:
> On Sat, Mar 4, 2017 at 1:30 AM, Amit Kapila  wrote:
>>> While I can't see this explained anywhere, I'm
>>> pretty sure that that's supposed to be impossible, which this patch
>>> changes.
>>>
>>
>> What makes you think that patch will allow pg_class.relfrozenxid to be
>> advanced past opaque->btpo.xact which was previously not possible?
>
> By not reliably recycling pages in a timely manner, we won't change
> anything about the dead page. It just sticks around. This is mostly
> fine, but we still need VACUUM to be able to reason about it (to
> determine if it is in fact recyclable), which is what I'm concerned
> about breaking here. It still needs to be *possible* to recycle any
> recyclable page at some point (e.g., when we find it convenient).
>
> pg_class.relfrozenxid is InvalidTransactionId for indexes because
> indexes generally don't store XIDs. This is the one exception that I'm
> aware of, presumably justified by the fact that it's only for
> recyclable pages anyway, and those are currently *guaranteed* to get
> recycled quickly. In particular, they're guaranteed to get recycled by
> the next VACUUM. They may be recycled in the course of anti-wraparound
> VACUUM, even if VACUUM has no garbage tuples to kill (even if we only
> do lazy_cleanup_index() instead of lazy_vacuum_index()). This is the
> case that this patch proposes to have us skip touching indexes for.
>

To prevent this, I think we need to not skip lazy_cleanup_index
during an anti-wraparound vacuum (aggressive = true), even if the number of
scanned pages is less than the threshold.  This can ensure that
pg_class.relfrozenxid is not advanced past opaque->btpo.xact with
minimal pain.  Even if the btree dead page is left around, subsequent
modifications create garbage on the heap, and autovacuum then recycles that
page while vacuuming the index (lazy_vacuum_index).
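
A sketch of the proposed condition at the index-cleanup call site
(variable and GUC names here are illustrative, not from the actual patch):

    /*
     * Run index cleanup if this is an anti-wraparound (aggressive)
     * vacuum, or if enough pages were scanned to cross the threshold.
     * Skipping it during aggressive vacuums is what could let
     * relfrozenxid advance past a dead page's btpo.xact.
     */
    if (aggressive || vacrelstats->scanned_pages >= cleanup_index_threshold)
    {
        for (i = 0; i < nindexes; i++)
            lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
    }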

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread Dilip Kumar
On Thu, Mar 9, 2017 at 10:02 PM, Dilip Kumar  wrote:
> I slightly modified your query to reproduce this issue.
>
> explain analyze select * from r1 where value<555;
>
> Patch is attached to fix the problem.

I forgot to mention the cause of the problem.

if (istate->schunkptr < istate->nchunks)
{
    PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
    PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
    BlockNumber chunk_blockno;

In the above if condition we have only checked istate->schunkptr <
istate->nchunks, which means some chunk is left and we are safe to
access idxchunks.  But just after that we access
ptbase[idxpages[istate->spageptr]] without checking whether accessing
idxpages is safe.

tbm_iterate() already handles this case; I broke it in tbm_shared_iterate().
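
For reference, a sketch of the shape such a fix can take (not necessarily
the committed text): the per-page array is dereferenced only after
spageptr has been checked against npages, matching the order of checks
in tbm_iterate():

    if (istate->schunkptr < istate->nchunks)
    {
        PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
        BlockNumber chunk_blockno = chunk->blockno + istate->schunkbit;

        /* short-circuit keeps us from touching idxpages when out of range */
        if (istate->spageptr >= istate->npages ||
            chunk_blockno < ptbase[idxpages[istate->spageptr]].blockno)
        {
            /* safe to return a lossy-page indicator from the chunk */
        }
    }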

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-09 Thread Amit Kapila
On Thu, Mar 9, 2017 at 12:25 AM, Robert Haas  wrote:
> On Wed, Mar 8, 2017 at 7:45 AM, Amit Kapila  wrote:
>> Okay, I can try, but note that currently there is no test related to
>> "snapshot too old" for any other indexes.
>
> Wow, that's surprising.  It seems the snapshot_too_old test only
> checks that this works for a table that has no indexes.  Have you,
> anyway, tested it manually?
>

Yes, I have tested it manually.  I think we need to ensure that the
modified tuple falls on the same page as the old tuple to make the test
work.  The slight difficulty with an index is ensuring that the modified
tuple is inserted into the same page as the old tuple; this is even more
true for hash indexes.  Also, for the heap I think it relies on the
hot-pruning stuff, whereas for an index we need to perform a manual
vacuum.  Basically, we can write a test for an index if we want, but I'm
not sure it is worth the pain to write one for hash indexes when the test
for btree is not there.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread Dilip Kumar
On Thu, Mar 9, 2017 at 9:17 PM, David Rowley
 wrote:
> patch with [1]
>
> =# create table r1(value int);
> CREATE TABLE
> =# insert into r1 select (random()*1000)::int from
> generate_Series(1,100);
> INSERT 0 100
> =# create index on r1 using brin(value);
> CREATE INDEX
> =# set enable_seqscan=0;
> SET
> =# explain select * from r1 where value=555;

I am looking into the issue and have already reproduced it.  I will post
an update soon.

Thanks for reporting.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Declarative partitioning optimization for large amount of partitions

2017-03-09 Thread Aleksander Alekseev
Hi Amit,

> Sorry, I didn't mean to dissuade you from trying those
> micro-optimizations.  If general inheritance cases can benefit from it
> (which, until we have a different method, will be used by partitioned
> tables as well), I think we should try it.

OK, I'll see what can be done here as well, then.

On Tue, Mar 07, 2017 at 10:55:12AM +0900, Amit Langote wrote:
> Hi Aleksander,
> 
> On 2017/03/07 0:22, Aleksander Alekseev wrote:
> > Hello.
> > 
> > OK, here is a patch.
> > 
> > Benchmark, before:
> > 
> > ```
> > number of transactions actually processed: 1823
> > latency average = 1153.495 ms
> > latency stddev = 154.366 ms
> > tps = 6.061104 (including connections establishing)
> > tps = 6.061211 (excluding connections establishing)
> > ```
> > 
> > Benchmark, after:
> > 
> > ```
> > number of transactions actually processed: 2462
> > latency average = 853.862 ms
> > latency stddev = 112.270 ms
> > tps = 8.191861 (including connections establishing)
> > tps = 8.192028 (excluding connections establishing)
> > ```
> > 
> > +35% TPS, just as expected. Feel free to run your own benchmarks on
> > different datasets and workloads. `perf top` shows that first bottleneck
> > is completely eliminated.
> 
> That seems like a good gain.
> 
> > I did nothing about the second bottleneck
> > since as Amit mentioned partition-pruning should solve this anyway and
> > apparently any micro-optimizations don't worth an effort.
> 
> Sorry, I didn't mean to dissuade you from trying those
> micro-optimizations.  If general inheritance cases can benefit from it
> (which, until we have a different method, will be used by partitioned
> tables as well), I think we should try it.
> 
> Thanks,
> Amit
> 
> 

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] use SQL standard error code for nextval

2017-03-09 Thread Mark Dilger

> On Mar 9, 2017, at 7:59 AM, Peter Eisentraut 
>  wrote:
> 
> On 2/28/17 22:15, Peter Eisentraut wrote:
>> The SQL standard defines a separate error code for nextval exhausting
>> the sequence space.  I haven't found any discussion of this in the
>> archives, so it seems this was just not considered or not yet in
>> existence when the error codes were introduced.  Here is a patch to
>> correct it.
> 
> committed

Perhaps you should add something to the release notes.  Somebody could be
testing for the old error code.

Mark Dilger



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 11:50 AM, Dilip Kumar  wrote:
> On Thu, Mar 9, 2017 at 10:02 PM, Dilip Kumar  wrote:
>> I slightly modified your query to reproduce this issue.
>>
>> explain analyze select * from r1 where value<555;
>>
>> Patch is attached to fix the problem.
>
> I forgot to mention the cause of the problem.
>
> if (istate->schunkptr < istate->nchunks)
> {
>    PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
> PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
> BlockNumber chunk_blockno;
>
> In the above if condition we have only checked istate->schunkptr <
> istate->nchunks, which means some chunk is left and we are safe to
> access idxchunks.  But just after that we access
> ptbase[idxpages[istate->spageptr]] without checking whether accessing
> idxpages is safe.
>
> tbm_iterate() already handles this case; I broke it in tbm_shared_iterate().

I don't know if this is the only problem -- it would be good if David
could retest -- but it's certainly *a* problem, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] partial indexes and bitmap scans

2017-03-09 Thread Tom Lane
Stephen Frost  writes:
> We have already figured out that the user's predicate results in a
> subset of the index's or we wouldn't be able to use that index though,
> right?  Do we really need to spend cycles re-discovering that?  Are
> there cases where we actually need the index's predicate to ever be
> included for correctness..?

In a bitmap indexscan, yes, because the entire point of the recheck
condition is that we're going to scan a whole page and possibly see
tuples that don't satisfy the index predicate at all.  Another point
that's mentioned in the comments in createplan.c is that if you're
considering the result of a BitmapOr or BitmapAnd, there's not necessarily
a single index involved so it's much harder to reason about which part
of the qual is an index predicate.

I do notice that createplan.c makes some effort to get rid of filter
conditions that are provably true when the index conditions are.
Doesn't look like it considers the reverse direction.  Not sure if
that's missing a bet.

regards, tom lane




[HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread David Rowley
I was just doing some testing on [1] when I noticed that there's a problem
with parallel bitmap index scans.

Test case:

patch with [1]

=# create table r1(value int);
CREATE TABLE
=# insert into r1 select (random()*1000)::int from
generate_Series(1,100);
INSERT 0 100
=# create index on r1 using brin(value);
CREATE INDEX
=# set enable_seqscan=0;
SET
=# explain select * from r1 where value=555;
   QUERY PLAN

-
 Gather  (cost=3623.52..11267.45 rows=5000 width=4)
   Workers Planned: 2
   ->  Parallel Bitmap Heap Scan on r1  (cost=2623.52..9767.45 rows=2083
width=4)
 Recheck Cond: (value = 555)
 ->  Bitmap Index Scan on r1_value_idx  (cost=0.00..2622.27
rows=522036 width=0)
   Index Cond: (value = 555)
(6 rows)

=# explain analyze select * from r1 where value=555;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

The crash occurs in tbm_shared_iterate() at:

PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];


I see that in tbm_prepare_shared_iterate() tbm->npages is zero.  I'm unsure
whether bringetbitmap() does something different from btgetbitmap() around
setting npages.

But anyway, because npages is 0, tbm->ptpages is not allocated
in tbm_prepare_shared_iterate():

if (tbm->npages)
{
tbm->ptpages = dsa_allocate(tbm->dsa, sizeof(PTIterationArray) +
                            tbm->npages * sizeof(int));

so when tbm_shared_iterate() runs this code:

/*
* If both chunk and per-page data remain, must output the numerically
* earlier page.
*/
if (istate->schunkptr < istate->nchunks)
{
PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
BlockNumber chunk_blockno;

chunk_blockno = chunk->blockno + istate->schunkbit;

if (istate->spageptr >= istate->npages ||
chunk_blockno < page->blockno)
{
/* Return a lossy page indicator from the chunk */
output->blockno = chunk_blockno;
output->ntuples = -1;
output->recheck = true;
istate->schunkbit++;

LWLockRelease(&istate->lock);
return output;
}
}

it fails, because idxpages points to random memory.

Probably this is a simple fix for the authors, so I'm passing it along; I
can't quite see how the part above is meant to work.

[1]
https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Gather Merge

2017-03-09 Thread Rushabh Lathia
On Thu, Mar 9, 2017 at 9:42 PM, Robert Haas  wrote:

> On Thu, Mar 9, 2017 at 8:21 AM, Rushabh Lathia 
> wrote:
> > Thanks Robert for committing this.
> >
> > My colleague Neha Sharma found one regression with the patch. I was about
> > to send this mail and noticed that you committed the patch.
>
> Oops.  Bad timing.
>
> > postgres=# explain select aid from pgbench_accounts where aid % 25= 0
> group
> > by aid;
> > ERROR:  ORDER/GROUP BY expression not found in targetlist
>
> I think your fix for this looks right, although I would write it this way:
>
> -gm_plan->plan.targetlist = subplan->targetlist;
> +gm_plan->plan.targetlist = build_path_tlist(root, &best_path->path);
>
> The second part of your fix looks wrong.  I think you want this:
>
>  create_gather_merge_path(root,
>   grouped_rel,
>   subpath,
> - NULL,
> + partial_grouping_target,
>   root->group_pathkeys,
>   NULL,
>   &total_groups);
>
> That will match the create_gather_path case.
>
>
Right, I made that change and performed the tests with the fix, and I
don't see any regression now.


> This test case is still failing for me even with those fixes:
>
> rhaas=# select aid+1 from pgbench_accounts group by aid+1;
> ERROR:  could not find pathkey item to sort
>
>
I don't see this failure with the patch.  I even forced gather merge
in the above query, and it works fine.

Attaching patch, with the discussed changes.



Thanks,
Rushabh Lathia


fix_target_gm_v2.patch
Description: application/download



Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread Dilip Kumar
On Thu, Mar 9, 2017 at 9:37 PM, Dilip Kumar  wrote:
>> =# create table r1(value int);
>> CREATE TABLE
>> =# insert into r1 select (random()*1000)::int from
>> generate_Series(1,100);
>> INSERT 0 100
>> =# create index on r1 using brin(value);
>> CREATE INDEX
>> =# set enable_seqscan=0;
>> SET
>> =# explain select * from r1 where value=555;
>
> I am looking into the issue, I have already reproduced it.  I will
> update on this soon.
>
> Thanks for reporting.

I slightly modified your query to reproduce this issue.

explain analyze select * from r1 where value<555;

Patch is attached to fix the problem.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


parallel_bitmap_fix.patch
Description: Binary data



Re: [HACKERS] [bug fix] dblink leaks unnamed connections

2017-03-09 Thread Joe Conway
On 03/09/2017 07:54 AM, Tom Lane wrote:
> Fujii Masao  writes:
>> On Wed, Mar 8, 2017 at 3:52 PM, Tsunakawa, Takayuki
>>  wrote:
>>> dblink fails to close the unnamed connection as follows when a new unnamed 
>>> connection is requested.  The attached patch fixes this.
> 
>> This issue was reported about ten years ago and added as TODO item.
>> http://archives.postgresql.org/pgsql-hackers/2007-10/msg00895.php
> 
>> I agree that this is a bug, and am tempted to back-patch to all the supported
>> versions. But it had not been fixed in many years since the first report of
>> the issue. So I'm not sure if it's ok to just treat this as a bug right now 
>> and
>> back-patch. Or we should fix this only in HEAD? Anyway I'd like to hear
>> more opinions about this.
> 
> It looks to me like the issue simply fell through the cracks because Joe
> wasn't excited about fixing it.  Now that we have a second complaint,
> I think it's worth treating as a bug and back-patching.
> 
> (I've not read this particular patch and am not expressing an opinion
> whether it's correct.)

Ok, will take another look.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Performance issue after upgrading from 9.4 to 9.6

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 7:47 AM, Naytro Naytro  wrote:
> We are having some performance issues after we upgraded to the newest
> version of PostgreSQL; before, everything was fast and smooth.
>
> Upgrade was done by pg_upgrade from 9.4 directly to 9.6.1. Now we
> upgraded to 9.6.2 with no improvement.
>
> Some information about our setup: Freebsd, Solaris (SmartOS), simple
> master-slave using streaming replication.
>
> Problem:
> Very high system CPU when master is streaming replication data, CPU
> goes up to 77%. Only one process is generating this load, it's a
> postgresql startup process. When I attached a truss to this process I
> saw a lot of read calls with almost the same number of errors (EAGAIN).
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
>
> Descriptor 6 is a pipe
>
> The read call tries to read one byte over and over.  I looked at the source
> code, and I think this file is responsible for this behavior:
> src/backend/storage/ipc/latch.c. There was no such file in 9.4.

Our latch implementation did get overhauled pretty thoroughly in 9.6;
see primarily commit 98a64d0bd713cb89e61bef6432befc4b7b5da59e.  But I
can't guess what is going wrong here based on this information.  It
might help if you can pull some stack backtraces from the startup
process.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] use SQL standard error code for nextval

2017-03-09 Thread Peter Eisentraut
On 2/28/17 22:15, Peter Eisentraut wrote:
> The SQL standard defines a separate error code for nextval exhausting
> the sequence space.  I haven't found any discussion of this in the
> archives, so it seems this was just not considered or not yet in
> existence when the error codes were introduced.  Here is a patch to
> correct it.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Gather Merge

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 11:25 AM, Rushabh Lathia
 wrote:
> I don't see this failure with the patch. Even I forced the gather merge
> in the above query and that just working fine.
>
> Attaching patch, with the discussed changes.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] New CORRESPONDING clause design

2017-03-09 Thread Pavel Stehule
2017-03-09 13:18 GMT+01:00 Surafel Temesgen :

> Hi ,
>
> Here is a patch corrected per your feedback, except for the missed test
> cases, because the CORRESPONDING BY clause is implemented on top of set
> operations and you can't do that to a set operation without a
> CORRESPONDING BY clause either
>

I don't understand.

The following statement should work:

postgres=# select 10 as a, 20 as b union corresponding select 20 as a, 30
as b, 40 as c;

ERROR:  each UNION query must have the same number of columns
LINE 1: ...elect 10 as a, 20 as b union corresponding select 20 as a, 3...


The CORRESPONDING clause should work like a projection filter.

Regards

Pavel


>
> Eg
>
>
> postgres=# SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;
>
> ERROR:  each UNION query must have the same number of columns
>
> LINE 1: SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;
>
>   ^
>
> postgres=# create table t1(a int, b int, c int);
>
> CREATE TABLE
>
> postgres=# create table t2(a int, b int);
>
> CREATE TABLE
>
>
>
> postgres=# select * from t1 union select * from t2;
>
> ERROR:  each UNION query must have the same number of columns
>
> LINE 1: select * from t1 union select * from t2;
>
>
>
> Thanks
>
> Surafel
>
> On Tue, Mar 7, 2017 at 10:26 PM, Pavel Stehule 
> wrote:
>
>> Hi
>>
>> I am sending a review of this interesting feature.
>>
>> I found following issues, questions:
>>
>> 1. unclosed tags  in documentation
>> 2. bad name "changeTargetEntry" - should be makeTargetEntry?
>> 3. Why did you remove a lot of asserts in prepunion.c? These asserts
>> should still be valid
>> 4. make_coresponding_target has wrong formatting
>> 5. error "%s queries with a CORRESPONDING clause must have at least one
>> column with the same name" has wrong formatting, you can show position
>> 6. previous issue is repeated - look on formatting ereport function,
>> please, you can use DETAIL and HINT fields
>> 7. the corresponding clause should contain a column list (I am looking at
>> ANSI/SQL 99) - you are using expr_list, which makes no sense and probably
>> has an impact on the whole implementation.
>> 8. typo orderCorrespondingLsit(List *targetlist)
>> 9. I miss more tests for CORRESPONDING BY
>> 10. if I understand to this feature, this query should to work
>>
>> postgres=# SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 
>> b, 6 c, 8 d;
>> ERROR:  each UNION query must have the same number of columns
>> LINE 1: ...1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, ...
>>
>> postgres=# create table t1(a int, b int, c int);
>> CREATE TABLE
>> Time: 63,260 ms
>> postgres=# create table t2(a int, b int);
>> CREATE TABLE
>> Time: 57,120 ms
>> postgres=# select * from t1 union corresponding select * from t2;
>> ERROR:  each UNION query must have the same number of columns
>> LINE 1: select * from t1 union corresponding select * from t2;
>>
>> If it is your first patch to Postgres, then it is perfect work!
>>
>> Item @7 is probably the most significant one - I dislike an expression
>> list there; name_list would be better.
>>
>> Regards
>>
>> Pavel
>>
>>
>


Re: [HACKERS] partial indexes and bitmap scans

2017-03-09 Thread Tom Lane
Stephen Frost  writes:
> Perhaps I'm missing something obvious, but isn't it a bit redundant to
> have both a Recheck condition (which is the predicate of the index) and
> a Filter condition (which is the user's predicate) when we've already
> decided that the user's predicate must result in a subset of the
> index's, as, otherwise, we wouldn't be able to use the index in the
> first place?

Yeah, I think this is just something that the planner doesn't see fit
to expend cycles on detecting.

regards, tom lane




Re: [HACKERS] partial indexes and bitmap scans

2017-03-09 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Perhaps I'm missing something obvious, but isn't it a bit redundant to
> > have both a Recheck condition (which is the predicate of the index) and
> > a Filter condition (which is the user's predicate) when we've already
> > decided that the user's predicate must result in a subset of the
> > index's, as, otherwise, we wouldn't be able to use the index in the
> > first place?
> 
> Yeah, I think this is just something that the planner doesn't see fit
> to expend cycles on detecting.

We have already figured out that the user's predicate results in a
subset of the index's or we wouldn't be able to use that index though,
right?  Do we really need to spend cycles re-discovering that?  Are
there cases where we actually need the index's predicate to ever be
included for correctness..?

This seems like we're going out of our way to add in an additional check
for something that we've already determined must always be true and that
strikes me as odd.

Thanks!

Stephen




Re: [HACKERS] [bug fix] dblink leaks unnamed connections

2017-03-09 Thread Tom Lane
Fujii Masao  writes:
> On Wed, Mar 8, 2017 at 3:52 PM, Tsunakawa, Takayuki
>  wrote:
>> dblink fails to close the unnamed connection as follows when a new unnamed 
>> connection is requested.  The attached patch fixes this.

> This issue was reported about ten years ago and added as TODO item.
> http://archives.postgresql.org/pgsql-hackers/2007-10/msg00895.php

> I agree that this is a bug, and am tempted to back-patch to all the supported
> versions. But it had not been fixed in many years since the first report of
> the issue. So I'm not sure if it's ok to just treat this as a bug right now 
> and
> back-patch. Or we should fix this only in HEAD? Anyway I'd like to hear
> more opinions about this.

It looks to me like the issue simply fell through the cracks because Joe
wasn't excited about fixing it.  Now that we have a second complaint,
I think it's worth treating as a bug and back-patching.

(I've not read this particular patch and am not expressing an opinion
whether it's correct.)

regards, tom lane




[HACKERS] partial indexes and bitmap scans

2017-03-09 Thread Stephen Frost
Greetings,

Consider this:

create table t10 (c1 int, c2 int);
create index on t10 (c1) where c2 > 5;

\d t10
  Table "sfrost.t10"
 Column |  Type   | Modifiers 
+-+---
 c1 | integer | 
 c2 | integer | 
Indexes:
"t10_c1_idx" btree (c1) WHERE c2 > 5

insert into t10 select * from generate_series(1,1) a, generate_series(1,10) 
b;
(repeat a bunch of times, if desired)

vacuum analyze t10;
set work_mem = '64kB';
set enable_indexscan = false;
set enable_seqscan = false;

=*> explain analyze select * from t10 where c2 > 6;
QUERY PLAN  
  
--
 Bitmap Heap Scan on t10  (cost=6496.49..15037.50 rows=318653 width=8) (actual 
time=34.682..116.236 rows=32 loops=1)
   Recheck Cond: (c2 > 5)
   Rows Removed by Index Recheck: 327502
   Filter: (c2 > 6)
   Rows Removed by Filter: 8
   Heap Blocks: exact=642 lossy=2898
   ->  Bitmap Index Scan on t10_c1_idx  (cost=0.00..6416.83 rows=400081 
width=0) (actual time=34.601..34.601 rows=40 loops=1)
 Planning time: 0.087 ms
 Execution time: 124.229 ms
(9 rows)

Perhaps I'm missing something obvious, but isn't it a bit redundant to
have both a Recheck condition (which is the predicate of the index) and
a Filter condition (which is the user's predicate) when we've already
decided that the user's predicate must result in a subset of the
index's, as, otherwise, we wouldn't be able to use the index in the
first place?

In other words, it seems like we shouldn't need a Filter in the above
Bitmap Heap Scan, instead we should just make the Recheck be (c2 > 6).

I've not looked into the code side of this at all and there may be
reasons why this is hard to do, but it seems like a worthwhile
improvement to consider doing, though perhaps I'm missing some reason
why we need both the Recheck and the Filter in such cases for
correctness.

Thanks!

Stephen




Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2017-03-09 Thread Jim Nasby

On 3/8/17 9:34 AM, Andreas Karlsson wrote:

Also, if by any chance you think (or use any
software that thinks) that OIDs for system objects are a stable
identifier, this will be the first case where that ceases to be true.
If the system is shut down or crashes or the session is killed, you'll
be left with stray objects with names that you've never typed into the
system.  I'm sure you're going to say "don't worry, none of that is
any big deal" and maybe you're right.


Hm, I cannot think of any real life scenario where this will be an issue
based on my personal experience with PostgreSQL, but if you can think of
one please provide it. I will try to ponder some more on this myself.


The case I currently have is to allow tracking database objects similar
to (but not the same as) how we track the objects that belong to an
extension[1]. That currently depends on event triggers to keep names 
updated if they're changed, as well as making use of the reg* types. If 
an event trigger fired as part of the index rename (essentially treating 
it like an ALTER INDEX) then I should be able to work around that.


The ultimate reason for doing this is to provide something similar to 
extensions (create a bunch of database objects that are all bound 
together), but also similar to classes in OO languages (so you can have 
multiple instances).[2]


Admittedly, this is pretty off the beaten path and I certainly wouldn't 
hold up the patch because of it. I am hoping that it'd be fairly easy to 
fire an event trigger as if someone had just renamed the index.


1: https://github.com/decibel/object_reference
2: https://github.com/decibel/pg_classy
--
Jim Nasby, Chief Data Architect, OpenSCG
http://OpenSCG.com




Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements

2017-03-09 Thread Peter Geoghegan
On Thu, Mar 9, 2017 at 8:17 PM, Bruce Momjian  wrote:
>> In my opinion, we expose query id (and dbid, and userid) as the
>> canonical identifier for each pg_stat_statements entry, and have done
>> so for some time. That's the stable API -- not query text. I'm aware
>> of cases where query text was used as an identifier, but that ended up
>> being hashed anyway.
>
> Speaking of hash values for queries, someone once asked me if we could
> display a hash value for queries displayed in pg_stat_activity and
> pg_stat_statements so they could take a running query and look in
> pg_stat_statements to see how long is usually ran.  It seemed like a
> useful idea to me.

I agree.

> I don't think they can hash the query manually because of the constants
> involved.

It would be a matter of having postgres expose Query.queryId (or the
similar field in PlannedStmt, I suppose). Morally, that field belongs
to pg_stat_statements, since it was written to make the query
normalization stuff work, and because everything would break if
another extension attempted to use it as pg_stat_statements does.
Whether or not that makes it okay to expose the hash value in
pg_stat_activity like that is above my pay grade, as Tom would say.

-- 
Peter Geoghegan




Re: [HACKERS] Foreign Join pushdowns not working properly for outer joins

2017-03-09 Thread Ashutosh Bapat
>>
>> The new name merge_fdw_options() is shorter than the one I chose, but
>> we are not exactly merging options for an upper relation since there
>> isn't the other fpinfo to merge from. But I think we can live with
>> merge_fdw_options().
>
> Perhaps "combine" is a better word? I didn't really see a problem with
> that. After I posted this I wished I'd made the function even more
> generic to accept either side as NULL and make up the new fpinfo from
> the non-NULL one, or Merge if both were non-NULL.  I liked that way
> much better than giving the function too much knowledge of what its
> purpose actually is. It's more likely to get used for something else
> in the future, which means there's less chance that someone else will
> make the same mistake.

It's more like a copy for an upper rel and a merge for a join rel.  Your
point about making it side-agnostic is good, but I doubt we want to
spend more effort there.  If somebody writes code with the input plan
stuffed into the innerrel instead of the outerrel, they will notice it
immediately when testing, as the assertion will fail or there will be a
straight segfault.  We can decide what to do then.

>
>> Once I fixed that, the testcases started showing an assertion failure,
>> since fpinfo of a base relation can not have an outerrel. Fixed the
>> assertion as well. If we are passing fpinfo's of joining relations,
>> there's no need to have outerrel and innerrel in fpinfo of join.
>
> Looks like you forgot to update the comment on the Assert()

Yes, and I also forgot to update the function prologue to refer to
fpinfo_o/i instead of the inner and outer relations.  The attached patch
corrects it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


foreign_outerjoin_pushdown_fix_v4.patch
Description: Binary data



Re: [HACKERS] Parallel Append implementation

2017-03-09 Thread Ashutosh Bapat
On Thu, Mar 9, 2017 at 6:28 PM, Robert Haas  wrote:
> On Thu, Mar 9, 2017 at 7:42 AM, Ashutosh Bapat
>  wrote:
>>>
>>> +if (rel->partial_pathlist != NIL &&
>>> +(Path *) linitial(rel->partial_pathlist) == subpath)
>>> +partial_subplans_set = bms_add_member(partial_subplans_set, i);
>>>
>>> This seems like a scary way to figure this out.  What if we wanted to
>>> build a parallel append subpath with some path other than the
>>> cheapest, for some reason?  I think you ought to record the decision
>>> that set_append_rel_pathlist makes about whether to use a partial path
>>> or a parallel-safe path, and then just copy it over here.
>>
>> I agree that assuming that a subpath is non-partial path if it's not
>> cheapest of the partial paths is risky. In fact, we can not assume
>> that even when it's not one of the partial_paths since it could have
>> been kicked out or was never added to the partial path list like
>> reparameterized path. But if we have to save the information about
>> which of the subpaths are partial paths and which are not in
>> AppendPath, it would take some memory, noticeable for thousands of
>> partitions, which will leak if the path doesn't make into the
>> rel->pathlist.
>
> True, but that's no different from the situation for any other Path
> node that has substructure.  For example, an IndexPath has no fewer
> than 5 list pointers in it.  Generally we assume that the number of
> paths won't be large enough for the memory used to really matter, and
> I think that will also be true here.  And an AppendPath has a list of
> subpaths, and if I'm not mistaken, those list nodes consume more
> memory than the tracking information we're thinking about here will.
>

What I have observed is that we try to keep memory usage to a
minimum, avoiding memory consumption as much as possible.  Most
of that substructure gets absorbed by the planner or is shared across
paths.  Append path lists are an exception to that, but we need
something to hold all the subpaths together, and a list is PostgreSQL's
way of doing that, so it's kind of unavoidable.  And maybe we will find
some reason for almost every substructure in paths.

> I think you're thinking about this issue because you've been working
> on partitionwise join where memory consumption is a big issue, but
> there are a lot of cases where that isn't really a big deal.

:).

>
>> The purpose of that information is to make sure that we
>> allocate only one worker to that plan. I suggested that we use
>> path->parallel_workers for the same, but it seems that's not
>> guaranteed to be reliable. The reasons were discussed upthread. Is
>> there any way to infer whether we can allocate more than one workers
>> to a plan by looking at the corresponding path?
>
> I think it would be smarter to track it some other way.  Either keep
> two lists of paths, one of which is the partial paths and the other of
> which is the parallel-safe paths, or keep a bitmapset indicating which
> paths fall into which category.

I like two lists: it consumes almost no memory (two list headers
instead of one) compared to a non-parallel append when there are
non-partial paths and, what's more, it consumes no extra memory when all
paths are partial.
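
Purely as an illustration of the two-list idea (all field names below are
hypothetical, not from any posted patch):

    typedef struct ParallelAppendPath
    {
        Path        path;
        List       *partial_subpaths;    /* can be shared by many workers */
        List       *nonpartial_subpaths; /* at most one worker each */
    } ParallelAppendPath;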

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Upgrading postmaster's log messages about bind/listen errors

2017-03-09 Thread Tom Lane
Robert Haas  writes:
> On Thu, Mar 9, 2017 at 4:01 PM, Joe Conway  wrote:
>> On 03/09/2017 12:27 PM, Tom Lane wrote:
>>> For good measure I also added a DEBUG1 log message reporting successful
>>> binding to a port.  I'm not sure if there's an argument for putting this
>>> out at LOG level (i.e. by default) --- any thoughts about that?

>> +1 for making it LOG instead of DEBUG1

> I would tend to vote against that, because startup is getting
> gradually chattier and chattier, and I think this isn't likely to be
> of interest to very many people most of the time.

Yeah, my thought was that if we've gotten along without this for 20 years,
it's probably not of interest to most people most of the time.

However, if we're measuring this on a scale of usefulness to the average
DBA, I would argue that it's of more interest than any of these messages
that currently appear by default:

2017-03-09 23:40:12.334 EST [19335] LOG:  MultiXact member wraparound 
protections are now enabled
2017-03-09 23:40:12.335 EST [19339] LOG:  autovacuum launcher started
2017-03-09 23:40:12.336 EST [19341] LOG:  logical replication launcher started

The first of those is surely past its sell-by date.  As for the other two,
we should log *failure* to start, but not the normal case.

regards, tom lane




Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Petr Jelinek
On 09/03/17 18:46, Peter Eisentraut wrote:
> On 3/6/17 05:27, Petr Jelinek wrote:
>> updated and rebased version of the patch attached.
> 
> Some comments on this patch:
> 
> Can we have a better explanation of different snapshot options in
> CREATE_REPLICATION_SLOT.  We use all these variants in different
> places, but it's not fully documented why.  Think about interested
> users reading this.
> 

I am not quite sure what/where you want me to expand the docs; the
protocol doc explains what the options do.  Are you missing the reasoning
for why we use them in the calling code?

> 
> errmsg("subscription table %u in subscription %u does not exist",
> 
> Use names instead of IDs if possible.
> 

Well, this message is more or less equal to a cache lookup failure; the
problem is that neither the relation nor the subscription is guaranteed to
exist if this fails.  I can write code that emits two different messages
depending on whether they do; I'm just not sure it's worth it there.

> 
> +   libpqrcv_table_list,
> +   libpqrcv_table_info,
> +   libpqrcv_table_copy,
> 
> I don't think these functions belong into the WAL receiver API, since
> they are specific to this particular logical replication
> implementation.  I would just make an API function libpqrc_exec_sql()
> that runs a query, and then put the table_*() functions as wrappers
> around that into tablesync.c.
> 

Yeah, I was wondering about that as well.  The reason why it's in
libpqwalreceiver is that if we create some libpqrcv_exec_sql as you say,
we'll need code that converts the PQresult into something consumable by
the backend, which has no access to the libpq API, and that seemed like
quite a bit of boilerplate work.  I really don't want to write another
libpq-like or SPI-like interface for this.  Maybe it would be acceptable
to return the result as a tuplestore?
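
As a sketch of what that could look like (every name below is
hypothetical; nothing like this exists yet):

    /*
     * Hypothetical result type: the walreceiver library materializes the
     * PQresult into a tuplestore, so callers never see libpq types.
     */
    typedef struct WalRcvQueryResult
    {
        bool        ok;             /* did the query succeed? */
        char       *err;            /* error message when !ok */
        TupleDesc   tupledesc;      /* shape of the returned rows */
        Tuplestorestate *tuplestore;    /* the rows themselves */
    } WalRcvQueryResult;

    extern WalRcvQueryResult *libpqrcv_exec_sql(WalReceiverConn *conn,
                                                const char *query);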

> 
> Not sure what the worker init stuff in ApplyLauncherShmemInit() is
> doing.  Could be commented more.
> 

Eh, I copy-pasted the comment there from a different place; will fix.

> 
> max_sync_workers_per_subscription could be PGC_SIGHUP, I think.  And
> the minimum could be 0, to stop any syncing?
> 
> 
> pg_subscription_rel.h: I'm not sure under which circumstances we need
> to use BKI_ROWTYPE_OID.
> 
> 
> Does pg_subscription_rel need an OID column?  The index
> SubscriptionRelOidIndexId is not used anywhere.
>

Ah, leftover from the time it used dependencies for tracking.

> 
> You removed this command from the tests:
> 
> ALTER SUBSCRIPTION testsub SET PUBLICATION testpub, testpub1;
> 
> I suppose because it causes a connection.  We should have an option to
> prevent that (NOCONNECT/NOREFRESH?).

NOREFRESH makes more sense I guess.

> 
> Why was the test 'check replication origin was dropped on subscriber'
> removed?
> 

I don't know what you mean?

> 
> Attached also a small patch with some stylistic changes.
> 

Thanks.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Petr Jelinek
On 09/03/17 18:48, Peter Eisentraut wrote:
> On 3/6/17 05:27, Petr Jelinek wrote:
>> And lastly I changed the automagic around exporting, not exporting and
>> using the snapshot produced by CREATE_REPLICATION_SLOT into explicit
>> parameter for the CREATE_REPLICATION_SLOT command (and added simple
>> framework for adding more of those if needed in the future).
> 
> It might be useful to make this into a separate patch, for clarity.
> 

Yes, I'm already working on that (at least part of it), since it's useful
for other patches in the CF.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] allow referring to functions without arguments when unique

2017-03-09 Thread Peter Eisentraut
On 3/7/17 00:32, Michael Paquier wrote:
>> OK. After a lookup, I am just seeing opfamily, opclass missing, so
>> this patch is doing it as you describe.

I'm not sure what you mean here.

>> @@ -7198,6 +7198,33 @@ function_with_argtypes:
>> n->objargs = extractArgTypes($2);
>> $$ = n;
>> }
>> This may not have arguments listed, so is function_with_argtypes really 
>> adapted?

Well, we could do something like function_with_opt_argtypes?

>> +   /*
>> +* If no arguments were specified, the name must yield a unique 
>> candidate.
>> +*/
>> +   if (nargs == -1 && clist)
>> +   {
>> +   if (clist->next)
>> +   ereport(ERROR,
>> I would have used list_length here for clarity.

This is actually not a "List" node.

>> The comment at the top of LookupFuncName() needs a refresh. The caller
>> can as well just use a function name without arguments.

Fixed.

> =# set search_path to 'public,popo';

I think you mean

set search_path to public,popo;

> =# drop function dup2;
> ERROR:  42883: function dup2() does not exist
> LOCATION:  LookupFuncName, parse_func.c:1944
> In this case I would have expected an error telling that the name is
> ambiguous. FuncnameGetCandidates() returns an empty list.

Your example works correctly if you set the schema path correctly.
However, the error message is misleading with the parentheses.  I have
updated that to create a different error message for this case, and
added a test case.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From ac83b6f71433d2101f67dd148a4af2aac78129ee Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 9 Mar 2017 23:58:48 -0500
Subject: [PATCH v2] Allow referring to functions without arguments when unique

In DDL commands referring to an existing function, allow omitting the
argument list if the function name is unique in its schema, per SQL
standard.

This uses the same logic that the regproc type uses for finding
functions by name only.
---
 doc/src/sgml/ref/alter_extension.sgml   |  2 +-
 doc/src/sgml/ref/alter_function.sgml| 13 +
 doc/src/sgml/ref/alter_opfamily.sgml|  7 +++--
 doc/src/sgml/ref/comment.sgml   |  2 +-
 doc/src/sgml/ref/create_cast.sgml   |  6 ++--
 doc/src/sgml/ref/create_transform.sgml  | 12 +---
 doc/src/sgml/ref/drop_function.sgml | 35 ---
 doc/src/sgml/ref/grant.sgml |  2 +-
 doc/src/sgml/ref/revoke.sgml|  2 +-
 doc/src/sgml/ref/security_label.sgml|  2 +-
 src/backend/nodes/copyfuncs.c   |  1 +
 src/backend/nodes/equalfuncs.c  |  1 +
 src/backend/parser/gram.y   | 27 ++
 src/backend/parser/parse_func.c | 37 +++--
 src/include/nodes/parsenodes.h  |  3 ++
 src/test/regress/expected/create_function_3.out | 11 +++-
 src/test/regress/sql/create_function_3.sql  |  8 ++
 17 files changed, 143 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/ref/alter_extension.sgml b/doc/src/sgml/ref/alter_extension.sgml
index de6d6dca16..a7c0927d1c 100644
--- a/doc/src/sgml/ref/alter_extension.sgml
+++ b/doc/src/sgml/ref/alter_extension.sgml
@@ -39,7 +39,7 @@
   EVENT TRIGGER object_name |
   FOREIGN DATA WRAPPER object_name |
   FOREIGN TABLE object_name |
-  FUNCTION function_name ( [ [ argmode ] [ argname ] argtype [, ...] ] ) |
+  FUNCTION function_name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ] |
   MATERIALIZED VIEW object_name |
   OPERATOR operator_name (left_type, right_type) |
   OPERATOR CLASS object_name USING index_method |
diff --git a/doc/src/sgml/ref/alter_function.sgml b/doc/src/sgml/ref/alter_function.sgml
index 0388d06b95..168eeb7c52 100644
--- a/doc/src/sgml/ref/alter_function.sgml
+++ b/doc/src/sgml/ref/alter_function.sgml
@@ -21,15 +21,15 @@
 
  
 
-ALTER FUNCTION name ( [ [ argmode ] [ argname ] argtype [, ...] ] )
+ALTER FUNCTION name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ]
 action [ ... ] [ RESTRICT ]
-ALTER FUNCTION name ( [ [ argmode ] [ argname ] argtype [, ...] ] )
+ALTER FUNCTION name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ]
 RENAME TO new_name
-ALTER FUNCTION name ( [ [ argmode ] [ argname ] argtype [, ...] ] )
+ALTER FUNCTION name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ]
 OWNER TO { new_owner | CURRENT_USER | SESSION_USER }
-ALTER FUNCTION name ( [ [ argmode ] [ argname ] argtype [, ...] ] )
+ALTER FUNCTION name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ]
 SET SCHEMA new_schema
-ALTER FUNCTION name ( [ [ argmode ] [ argname ] argtype [, ...] ] )
+ALTER FUNCTION name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ]
 DEPENDS ON EXTENSION extension_name
 
 where action is one 

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-09 Thread Kuntal Ghosh
On Fri, Mar 10, 2017 at 3:11 AM, Andres Freund  wrote:
> On 2017-03-09 16:37:29 -0500, Tom Lane wrote:
>> Robert Haas  writes:
>> > On Thu, Mar 9, 2017 at 2:30 PM, Peter Eisentraut
>> >  wrote:
>> >> In practice, I think it's common to do a quick select * from
>> >> pg_stat_activity to determine whether a database instance is in use.
>>
>> > I thought of the same kind of thing, and it was discussed upthread.
>> > There seemed to be more votes for keeping it all in one view, but that
>> > could change if more people vote.
>>
>> I've not been paying much attention to this thread, but it seems like
>> something that would help Peter's use-case and have other uses as well
>> is a new column that distinguishes different process types --- user
>> session, background worker, autovacuum worker, etc.
>
> The patches upthread add precisely such a column.
>
The patch exposes auxiliary processes, autovacuum launcher and
bgworker along with other backends in pg_stat_activity. It also adds
an extra column, named proc_type (suggested by Craig
and Robert), to indicate the type of process in the pg_stat_activity view.
proc_type includes:

* client backend
* autovacuum launcher
* wal sender
* bgworker
* writer
* checkpointer
* wal writer
* wal receiver

Here is the present output with the relevant columns.
postgres=# SELECT wait_event_type, wait_event, state, proc_type FROM
pg_stat_activity;
 wait_event_type |     wait_event      | state  |      proc_type
-----------------+---------------------+--------+---------------------
 Activity        | AutoVacuumMain      | idle   | autovacuum launcher
 Activity        | LogicalLauncherMain | idle   | bgworker
 Activity        | WalSenderMain       | idle   | wal sender
                 |                     | active | client backend
 Activity        | BgWriterMain        | idle   | writer
 Activity        | CheckpointerMain    | idle   | checkpointer
 Activity        | WalWriterMain       | idle   | wal writer
(7 rows)


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements

2017-03-09 Thread Bruce Momjian
On Sat, Mar  4, 2017 at 10:52:44AM -0800, Peter Geoghegan wrote:
> On Sat, Mar 4, 2017 at 8:02 AM, Tom Lane  wrote:
> > Meh ... we've generally regretted it when we "solved" a backwards
> > compatibility problem by introducing a GUC that changes query semantics.
> > I'm inclined to think we should either do it or not.
> 
> In my opinion, we expose query id (and dbid, and userid) as the
> canonical identifier for each pg_stat_statements entry, and have done
> so for some time. That's the stable API -- not query text. I'm aware
> of cases where query text was used as an identifier, but that ended up
> being hashed anyway.

Speaking of hash values for queries, someone once asked me if we could
display a hash value for queries displayed in pg_stat_activity and
pg_stat_statements so they could take a running query and look in
pg_stat_statements to see how long it usually runs.  It seemed like a
useful idea to me.

I don't think they can hash the query manually because of the constants
involved.
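
For illustration, the lookup being asked for might look like this,
assuming (hypothetically) that pg_stat_activity exposed the same queryid
that pg_stat_statements already does:

-- The queryid column in pg_stat_activity is hypothetical here; today
-- only pg_stat_statements exposes it.
SELECT a.pid, a.state, s.calls, s.mean_time
FROM pg_stat_activity a
JOIN pg_stat_statements s USING (queryid);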

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




[HACKERS] Logical replication origin tracking fix

2017-03-09 Thread Petr Jelinek
Hi,

while discussing with Craig issues around restarting the logical
replication stream related to the patch he posted [1], I realized that we
track the wrong origin LSN in the logical replication apply.

We currently track commit_lsn, which is the *start* of the commit record;
what we need to track is end_lsn, which is the *end* of the commit record.
Otherwise we might request a transaction that was already replayed if the
subscriber instance has crashed right after the commit.

Attached patch fixes that.

[1]
https://www.postgresql.org/message-id/CAMsr+YGFvikx-U_mHQ0mAzTarqvCpwzvsPKv=7mfp9scdrm...@mail.gmail.com

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


0001-Fix-remote-position-tracking-in-logical-replication.patch
Description: binary/octet-stream



Re: [HACKERS] [PATCH] Enabling atomics on ARM64

2017-03-09 Thread Tom Lane
Andres Freund  writes:
> FWIW, Unless somebody seriously protests, I'm going to commit this soon-ish.

Sure, if you want to take responsibility for it, push away.

regards, tom lane




Re: [HACKERS] [PATCH] Generic type subscripting

2017-03-09 Thread Peter Eisentraut
On 2/28/17 13:02, Dmitry Dolgov wrote:
> +
> +-- Extract value by key
> +SELECT ('{"a": 1}'::jsonb)['a'];
> +
> +-- Extract nested value by key path
> +SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c'];
> +
> +-- Extract element by index
> +SELECT ('[1, "2", null]'::jsonb)['1'];
> +
> +-- Update value by key
> +UPDATE table_name set jsonb_field['key'] = 1;
> +
> +-- Select records using where clause with subscripting
> +SELECT * from table_name where jsonb_field['key'] = '"value"';
> +

I see a possible problem here:  This design only allows one subscripting
function.  But what you'd really want in this case is at least two: one
taking an integer type for selecting by array index, and one taking text
for selecting by field name.

I suppose that since a given value can only be either an array or an
object, there is no ambiguity, but I think this might also lose some
error checking.  It might also not work the same way for other types.

It looks like your jsonb subscripting function just returns null if it
can't find a field, which is also a bit dubious.
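
To make the concern concrete with a sketch built on the examples above:
a single subscripting function must decide at run time what a subscript
means for each container shape, instead of resolving an int-taking or a
text-taking variant at parse time:

-- Object: a text subscript selects a key.
SELECT ('{"a": 1}'::jsonb)['a'];

-- Array: should ['1'] mean element index 1, or the (absent) key "1"?
-- With only one subscripting function, that choice cannot be driven by
-- the subscript's type.
SELECT ('[1, "2", null]'::jsonb)['1'];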

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-03-09 Thread Tom Lane
Robert Haas  writes:
> On Thu, Mar 9, 2017 at 9:17 PM, Tom Lane  wrote:
>> Buildfarm thinks eight wasn't enough.
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clam&dt=2017-03-10%2002%3A00%3A01

> At first I was confused how you knew that this was the fault of this
> patch, but this seems like a pretty good indicator:
> TRAP: FailedAssertion("!(curval == 0 || (curval == 0x03 && status !=
> 0x00) || curval == status)", File: "clog.c", Line: 574)

Yeah, that's what led me to blame the clog-group-update patch.

> I'm not sure whether it's related to this problem or not, but now that
> I look at it, this (preexisting) comment looks like entirely wishful
> thinking:
>  * If we update more than one xid on this page while it is being written
>  * out, we might find that some of the bits go to disk and others don't.
>  * If we are updating commits on the page with the top-level xid that
>  * could break atomicity, so we subcommit the subxids first before we mark
>  * the top-level commit.

Maybe, but that comment dates to 2008 according to git, and clam has
been, er, happy as a clam up to now.  My money is on a newly-introduced
memory-access-ordering bug.

Also, I see clam reported in green just now, so it's not 100%
reproducible :-(

regards, tom lane




Re: [HACKERS] Parallel Append implementation

2017-03-09 Thread Amit Khandekar
On 10 March 2017 at 10:13, Ashutosh Bapat
 wrote:
> On Thu, Mar 9, 2017 at 6:28 PM, Robert Haas  wrote:
>> On Thu, Mar 9, 2017 at 7:42 AM, Ashutosh Bapat
>>  wrote:

 +if (rel->partial_pathlist != NIL &&
 +(Path *) linitial(rel->partial_pathlist) == subpath)
 +partial_subplans_set = bms_add_member(partial_subplans_set, 
 i);

 This seems like a scary way to figure this out.  What if we wanted to
 build a parallel append subpath with some path other than the
 cheapest, for some reason?

Yes, there was an assumption that an append subpath will be either the
cheapest non-partial path, or the cheapest (i.e. first in the list)
partial path, although in the patch there are no Asserts to make sure
that a common rule has been followed at both these places.

 I think you ought to record the decision
 that set_append_rel_pathlist makes about whether to use a partial path
 or a parallel-safe path, and then just copy it over here.
>>>
>>> I agree that assuming that a subpath is non-partial path if it's not
>>> cheapest of the partial paths is risky. In fact, we can not assume
>>> that even when it's not one of the partial_paths since it could have
>>> been kicked out or was never added to the partial path list like
>>> reparameterized path. But if we have to save the information about
>>> which of the subpaths are partial paths and which are not in
>>> AppendPath, it would take some memory, noticeable for thousands of
>>> partitions, which will leak if the path doesn't make into the
>>> rel->pathlist.
>>
>> True, but that's no different from the situation for any other Path
>> node that has substructure.  For example, an IndexPath has no fewer
>> than 5 list pointers in it.  Generally we assume that the number of
>> paths won't be large enough for the memory used to really matter, and
>> I think that will also be true here.  And an AppendPath has a list of
>> subpaths, and if I'm not mistaken, those list nodes consume more
>> memory than the tracking information we're thinking about here will.
>>
>
> What I have observed is that we try to keep the memory usage to a
> minimum, trying to avoid memory consumption as much as possible. Most
> of that substructure gets absorbed by the planner or is shared across
> paths. Append path lists are an exception to that, but we need
> something to hold all subpaths together and list is PostgreSQL's way
> of doing it. So, that's kind of unavoidable. And maybe we will find
> some reason for almost every substructure in paths.
>
>> I think you're thinking about this issue because you've been working
>> on partitionwise join where memory consumption is a big issue, but
>> there are a lot of cases where that isn't really a big deal.
>
> :).
>
>>
>>> The purpose of that information is to make sure that we
>>> allocate only one worker to that plan. I suggested that we use
>>> path->parallel_workers for the same, but it seems that's not
>>> guaranteed to be reliable. The reasons were discussed upthread. Is
>>> there any way to infer whether we can allocate more than one workers
>>> to a plan by looking at the corresponding path?
>>
>> I think it would be smarter to track it some other way.  Either keep
>> two lists of paths, one of which is the partial paths and the other of
>> which is the parallel-safe paths, or keep a bitmapset indicating which
>> paths fall into which category.
>
> I like two lists: it consumes almost no memory (two list headers
> instead of one) compared to non-parallel-append when there are
> non-partial paths and, what's more, it consumes no extra memory when all
> paths are partial.

I agree that the two-lists approach will consume less memory than a
bitmapset. Keeping two lists effectively adds an extra pointer field to
the AppendPath size, but that size does not grow with the number of
subpaths, whereas the Bitmapset does.

But as far as the code is concerned, I think the two-list approach will
turn out to be less simple if we derive two corresponding arrays in the
AppendState node. Handling two different arrays during execution does not
look clean. Whereas the bitmapset that I have used in Append has turned
out to be very simple. I just had to do the check below (and that is the
only location) to see whether a subplan is partial or non-partial; there
is no special handling for non-partial subpaths anywhere else.

/*
 * Increment worker count for the chosen node, if at all we found one.
 * For non-partial plans, set it to -1 instead, so that no other workers
 * run it.
 */
if (min_whichplan != PA_INVALID_PLAN)
{
    if (bms_is_member(min_whichplan,
                      ((Append *) state->ps.plan)->partial_subplans_set))
        padesc->pa_info[min_whichplan].pa_num_workers++;
    else
        padesc->pa_info[min_whichplan].pa_num_workers = -1;
}

Now, since Bitmapset field is used during execution 

Re: [HACKERS] PATCH: Configurable file mode mask

2017-03-09 Thread Peter Eisentraut
On 2/28/17 20:58, David Steele wrote:
> This patch introduces a new initdb param, -u/--file-mode-mask, and a new
> GUC, file_mode_mask, to allow the default mode of files and directories
> in the $PGDATA directory to be modified.

The postmaster.pid file appears not to observe the configured mask.

There ought to be a test, perhaps under src/bin/initdb/, to check for
that kind of thing.

There is no documentation update for initdb.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] increasing the default WAL segment size

2017-03-09 Thread Robert Haas
On Mon, Feb 27, 2017 at 11:15 PM, Jim Nasby  wrote:
>> * You're considering any WAL file with a power of 2 as valid. Suppose,
>> the correct WAL seg size is 64mb. For some reason, the server
>> generated a 16mb invalid WAL file(maybe it crashed while creating the
>> WAL file). Your code seems to treat this as a valid file which I think
>> is incorrect. Do you agree with that?
>
> Detecting correct WAL size based on the size of a random WAL file seems like
> a really bad idea to me.

It's not the most elegant thing ever, but I'm not sure I really see a
big problem with it.  Today, if the WAL file were the wrong size, we'd
just error out.  With the patch, if the WAL file were the wrong size,
but happened to be a size that we consider legal, pg_waldump would
treat it as a legal file and try to display the WAL records contained
therein.  This doesn't seem like a huge problem from here; what are you
worried about?

I agree that it would be bad if, for example, pg_resetwal saw a broken
WAL file in pg_wal and consequently did the reset incorrectly, because
the whole point of pg_resetwal is to escape situations where the
contents of pg_wal may be bogus.  However, pg_resetwal can read the
value from the control file, so the technique of believing the file
size doesn't need to be used in that case anyway.  The only tools that
need to infer the WAL size from the sizes of the segments actually
present are those that neither have a running cluster (where SHOW can
be used) nor access to the control file.  There aren't many of those,
and pg_waldump, at least, is a debugging tool anyway.  IIUC, the other
case where this comes up is for pg_standby, but if the WAL segments
aren't all the same size that tool is presumably going to croak with
or without these changes, so I'm not really sure there's much of an
issue here.

I might be missing something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 7:59 AM, Michael Paquier
 wrote:
> Yes, I agree with that for MD5, and after looking around I can see
> (like here http://prosody.im/doc/plain_or_hashed) as well that
> SCRAM-hashed is used. Now, there are as well references to the salt,
> like in protocol.sgml:
> "The salt to use when encrypting the password."
> Joe, do you think that in this case using the term "hashing" would be
> more appropriate? I would think so as we use it to hash the password.

I'm not Joe, but I think that would be more appropriate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] partitioned tables and contrib/sepgsql

2017-03-09 Thread Stephen Frost
Greetings,

While going over the contrib modules, I noticed that sepgsql was not
updated for partitioned tables.  What that appears to mean is that it's
not possible to define labels on partitioned tables.  As I recall,
accessing the parent of a table will, similar to the GRANT system, not
perform checks against the child tables, meaning that there's no way to
have SELinux checks properly enforced when partitioned tables are being
used.

This is an issue which should be resolved for PG10, so I'll add it to
the open items list.

Thanks!

Stephen




Re: [HACKERS] [GSoC] Self introduction and question to mentors

2017-03-09 Thread Andrew Borodin
Hi, Kirill!

2017-03-06 12:41 GMT+05:00 Кирилл Бороздин :
> My name is Kirill Borozdin. I am a student of Ural Federal University from
> Yekaterinburg, Russia.
That's fine. I'm an associate professor at Ural Federal University,
though not at your institution.

> I understand that this task will require a big number of experiments because
> we want to achieve
> a speed-up in real life cases.
>
> Can you recommend me some articles or specific source files in PostgreSQL
> codebase (it is so huge!)
> which I can explore to be better prepared (in case I will be lucky enough to
> be selected for GSoC)?

Well, this project is not that specific to PostgreSQL. Everything you
need to compile and use Postgres you can find on the Wiki, or you can
just ask me directly.

For starters, I'd recommend looking up some sorting-algorithm survey
papers on Google Scholar. There is a good paper by Goetz Graefe on
sorting, but it reflects the current situation rather than some future
algorithms :)

Something on cache-oblivious sorting and on processor cache effects:
http://igoro.com/archive/gallery-of-processor-cache-effects/

Finally, papers referenced in Wiki page about GSoC for this project
are a good starting point.

As for tests, TPC is kind of ubiquitous. We could just adjust the
standard pgbench script towards involving more sorting.

Best regards, Andrey Borodin.




Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-03-09 Thread Mike Palmiotto
On Thu, Mar 9, 2017 at 9:47 AM, Stephen Frost  wrote:
> Greetings,
>
> While going over the contrib modules, I noticed that sepgsql was not
> updated for partitioned tables.  What that appears to mean is that it's
> not possible to define labels on partitioned tables.  As I recall,
> accessing the parent of a table will, similar to the GRANT system, not
> perform checks against the child tables, meaning that there's no way to
> have SELinux checks properly enforced when partitioned tables are being
> used.

I'll start taking a look at this. Presumably we'd just extend existing
object_access_hooks to cover partitioned tables?

>
> This is an issue which should be resolved for PG10, so I'll add it to
> the open items list.

I'll grab it. Thanks.

--Mike




Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')

2017-03-09 Thread Joe Conway
On 03/09/2017 06:34 AM, Robert Haas wrote:
> On Thu, Mar 9, 2017 at 7:59 AM, Michael Paquier
>  wrote:
>> Yes, I agree with that for MD5, and after looking around I can see
>> (like here http://prosody.im/doc/plain_or_hashed) as well that
>> SCRAM-hashed is used. Now, there are as well references to the salt,
>> like in protocol.sgml:
>> "The salt to use when encrypting the password."
>> Joe, do you think that in this case using the term "hashing" would be
>> more appropriate? I would think so as we use it to hash the password.
> 
> I'm not Joe, but I think that would be more appropriate.


I am Joe and I agree ;-)


-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Peter Eisentraut
On 3/6/17 05:27, Petr Jelinek wrote:
> And lastly I changed the automagic around exporting, not exporting and
> using the snapshot produced by CREATE_REPLICATION_SLOT into explicit
> parameter for the CREATE_REPLICATION_SLOT command (and added simple
> framework for adding more of those if needed in the future).

It might be useful to make this into a separate patch, for clarity.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-09 Thread Sven R. Kunze

On 09.03.2017 18:58, Robert Haas wrote:

> Also, even if the superset thing were true on a theoretical plane, I'm
> not sure it would do us much good in practice.  If we start using
> YAML-specific constructs, we won't have valid JSON any more.  If we
> use only things that are legal in JSON, YAML's irrelevant.


That's true. I just wanted to share my view of the "date guessing" part
of pgpro's commits.
I don't have a good solution for it either; I can only say that where I
work we have the same issues: either we guess by looking at the string
value or we know that "this particular key" must be a date.
Unsatisfied with either solution, we tend to use YAML for our APIs if
possible.


Regards,
Sven




Re: [HACKERS] Documentation improvements for partitioning

2017-03-09 Thread Robert Haas
I think you might have the titles for 0002 and 0003 backwards.

On Fri, Mar 3, 2017 at 2:51 AM, Amit Langote
 wrote:
> 0002: some cosmetic fixes to create_table.sgml

I think this sentence may be unclear to some readers:

+ One might however want to set it for only some partitions,
+  which is possible by doing SET NOT NULL on individual
+  partitions.

I think you could replace this with something like: Even if there is
no NOT NULL constraint on the parent, such a constraint
can still be added to individual partitions, if desired; that is, the
children can disallow nulls even if the parent allows them, but not
the other way around.
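
A sketch of that behavior, using the PG 10 declarative partitioning
syntax (table names invented):

CREATE TABLE measurement (city_id int, logdate date)
    PARTITION BY RANGE (logdate);

CREATE TABLE measurement_y2017 PARTITION OF measurement
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');

-- The parent allows NULLs in city_id, but an individual partition can
-- still forbid them:
ALTER TABLE measurement_y2017 ALTER COLUMN city_id SET NOT NULL;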

> 0003: add clarification about NOT NULL constraint on partition columns in
>   alter_table.sgml

This is about list-ifying a note, but I think we should try to
de-note-ify it.  It's a giant block of text that is not obviously more
noteworthy than the surrounding text; I think <note> should be saved
for things that particularly need to be called out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: psql show index with type info

2017-03-09 Thread Peter Eisentraut
On 3/8/17 04:11, Ashutosh Bapat wrote:
> The header for this table is "list of relations", so type gets
> associated with relations indicated type of relation. btree: gin as a
> type of relation doesn't sound really great. Instead we might want to
> add another column "access method" and specify the access method used
> for that relation.

I like that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supporting infrastructure

2017-03-09 Thread Robert Haas
On Sun, Feb 19, 2017 at 12:02 PM, David Christensen  wrote:
> Hi Robert, this is part of a larger patch which *does* enable the checksums 
> online; I’ve been extracting the necessary pieces out with the understanding 
> that some people thought the disable code might be useful on its own merit.
> I can add documentation for the various states.  The CHECKSUM_REVALIDATE is 
> the only one I feel is a little wibbly-wobbly; the other states are required 
> because of the fact that enabling checksums requires distinguishing between 
> “in process of enabling” and “everything is enabled”.

Ah, sorry, I had missed that patch.

> My design notes for the patch were submitted to the list with little comment; 
> see: 
> https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com
>
> I have since added the WAL logging of checksum states, however I’d be glad to 
> take feedback on the other proposed approaches (particularly the system 
> catalog changes + the concept of a checksum cycle).]

I think the concept of a checksum cycle might be overkill.  It would
be useful if we ever wanted to change the checksum algorithm, but I
don't see any particular reason why we need to build infrastructure
for that.  If we wanted to change the checksum algorithm, I think it
likely that we'd do that in the course of, say, widening it to 4 bytes
as part of a bigger change in the page format, and this infrastructure
would be too narrow to help with that.

While I'm glad to see work finally going on to allow enabling and
disabling checksums, I think it's too late to put this in v10.  We
have a rough rule that large new patches shouldn't be appearing for
the first time in the last CommitFest, and I think this falls into
that category.  I also think it would be unwise to commit just the
bits of the infrastructure that let us disable checksums without
having time for a thorough review of the whole thing; to me, the
chances that we'll make decisions that we later regret seem fairly
high.  I would rather wait for v11 and have a little more time to try
to get everything right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 10:23 AM, Amit Kapila  wrote:
>> +mode anyway).  It would seem natural to complete the split in VACUUM, but 
>> since
>> +splitting a bucket might require allocating a new page, it might fail if you
>> +run out of disk space.  That would be bad during VACUUM - the reason for
>> +running VACUUM in the first place might be that you run out of disk space,
>> +and now VACUUM won't finish because you're out of disk space.  In contrast,
>> +an insertion can require enlarging the physical file anyway.
>>
>> But VACUUM actually does try to complete the splits, so this is wrong, I 
>> think.
>
> Vacuum just tries to clean the remaining tuples from old bucket.  Only
> insert or split operation tries to complete the split.

Oh, right.  Sorry.

>> +if (!xlrec.is_primary_bucket_page)
>> +XLogRegisterBuffer(0, bucket_buf, REGBUF_STANDARD);
>>
>> So, in hashbucketcleanup, the page is registered and therefore an FPI
>> will be taken if this is the first reference to this page since the
>> checkpoint, but hash_xlog_delete won't restore the FPI.  I think that
>> exposes us to a torn-page hazard.
>> _hash_freeovflpage/hash_xlog_squeeze_page seems to have the same
>> disease, as does _hash_squeezebucket/hash_xlog_move_page_contents.
>
> Right, if we use XLogReadBufferForRedoExtended() instead of
> XLogReadBufferExtended()/LockBufferForCleanup during relay routine,
> then it will restore FPI when required and can serve our purpose of
> acquiring clean up lock on primary bucket page.  Yet another way could
> be to store some information like block number, relfilenode, forknum
> (maybe we can get away with some less info) of primary bucket in the
> fixed part of each of these records and then using that read the page
> and then take a cleanup lock.   Now, as in most cases, this buffer
> needs to be registered only in cases when there are multiple overflow
> pages, I think having fixed information might hurt in some cases.

Just because the WAL record gets larger?  I think you could arrange to
record the data only in the cases where it is needed, but I'm also not
sure it would matter that much anyway.  Off-hand, it seems better than
having to mark the primary bucket page dirty (because you have to set
the LSN) every time any page in the chain is modified.

>> +/*
>> + * Note that we still update the page even if it was restored from a 
>> full
>> + * page image, because the bucket flag is not included in the image.
>> + */
>>
>> Really?  Why not?  (Because it's part of the special space?)
>
> Yes, because it's part of special space.  Do you want that to
> mentioned in comments?

Seems like a good idea.  Maybe just change the end to "because the
special space is not included in the image" to make it slightly more
explicit.

>> Doesn't hash_xlog_split_allocpage need to acquire cleanup locks to
>> guard against concurrent scans?
>
> I don't think so, because regular code takes it on old bucket to
> protect the split from concurrent inserts and cleanup lock on new
> bucket is just for the sake of consistency.

You might be right, but this definitely does create a state on the
standby that can't occur on the master: the bucket being split is
flagged as such, but the bucket being populated doesn't appear to
exist yet.  Maybe that's OK.  It makes me slightly nervous to have the
standby take a weaker lock than the master does, but perhaps it's
harmless.

>> And also hash_xlog_split_complete?
>
> In regular code, we are not taking cleanup lock for this operation.

You're right, sorry.

>> And hash_xlog_delete?  If the regular code path is getting a cleanup
>> lock to protect against concurrent scans, recovery better do it, too.
>>
>
> We are already taking cleanup lock for this, not sure what exactly is
> missed here, can you be more specific?

Never mind, I'm wrong.

>> +/*
>> + * There is no harm in releasing the lock on old bucket before new 
>> bucket
>> + * during replay as no other bucket splits can be happening 
>> concurrently.
>> + */
>> +if (BufferIsValid(oldbuf))
>> +UnlockReleaseBuffer(oldbuf);
>>
>> And I'm not sure this is safe either.  The issue isn't that there
>> might be another bucket split happening concurrently, but that there
>> might be scans that get confused by seeing the bucket split flag set
>> before the new bucket is created or the metapage updated.
>
> During scan we check for bucket being populated, so this should be safe.

Yeah, I guess so.  This business of the standby taking weaker locks
still seems scary to me, as in hash_xlog_split_allocpage above.

>>  Seems like
>> it would be safer to keep all the locks for this one.
>
> If you feel better that way, I can change it and add a comment saying
> something like "we could release lock on bucket being split early as
> well, but doing here to be consistent with normal operation"

I think that might not be a bad idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-09 Thread Sven R. Kunze

On 08.03.2017 20:48, Peter van Hardenberg wrote:

> Small point of order: YAML is not strictly a super-set of JSON.


I haven't read the whole standard, but from what I can see the standard 
considers JSON an official subset of itself: 
http://www.yaml.org/spec/1.2/spec.html


Regards,
Sven




[HACKERS] tzdata2017a breaks timestamptz regression test

2017-03-09 Thread Tom Lane
I typically build with --with-system-tzdata, which means that any time
Red Hat sees fit to push out a new copy of the tzdata package, that's
what I'm testing against.  This morning they updated to tzdata2017a,
and I'm seeing the attached test failures.

The reason for this is that the IANA crew have really moved forward
aggressively on their project to remove invented zone abbreviations.
As stated at
http://mm.icann.org/pipermail/tz-announce/2017-February/45.html
they've removed text abbreviations for pretty much all of South
America, which is what's breaking these cases.

For the cases involving the America/Santiago zone, I'm a bit inclined
to just switch that to America/New_York, which seems much less likely
to get fooled with by IANA.  But I'm wondering if Alvaro had a specific
reason for using the Santiago zone in those test cases.

The diffs involving VET (America/Caracas) are more problematic: we're
intentionally trying to test a time-varying zone abbreviation, so we can't
just switch to a stable zone.  As far as the lines 2395..2437 hunk goes,
we could perhaps make that segment of the test print in ISO datestyle
instead of Postgres datestyle, which would force the zone to be printed in
numeric form not as an abbreviation.  That's slightly annoying, but it
doesn't really compromise what that part of the test is trying to test.
However, the very last hunk indicates that VET no longer functions as a
time-varying zone abbreviation at all, because it doesn't match any
abbreviation recorded for America/Caracas in the IANA database, so we fall
back to treating it as a simple synonym for America/Caracas (cf commit
39b691f25).  So that's now failing altogether to test what it means to.

What I propose we do about that is replace the America/Caracas test cases
with Europe/Moscow tests, moving the dates as needed to match DST
transitions from when Moscow was observing DST (pre 2011).  The comments
in the IANA files indicate that they believe the MSK/MSD abbreviations
have or had real-world usage, so they probably won't replace them with
numeric offsets.  We can hope, anyway.

regards, tom lane

*** /home/postgres/pgsql/src/test/regress/expected/timestamptz.out	Mon Jan 30 17:03:46 2017
--- /home/postgres/pgsql/src/test/regress/results/timestamptz.out	Thu Mar  9 13:35:41 2017
***
*** 1778,1796 
  SELECT make_timestamptz(1973, 07, 15, 08, 15, 55.33);
  make_timestamptz 
  -
!  Sun Jul 15 08:15:55.33 1973 CLT
  (1 row)
  
  SELECT make_timestamptz(1973, 07, 15, 08, 15, 55.33, '+2');
  make_timestamptz 
  -
!  Sun Jul 15 02:15:55.33 1973 CLT
  (1 row)
  
  SELECT make_timestamptz(1973, 07, 15, 08, 15, 55.33, '-2');
  make_timestamptz 
  -
!  Sun Jul 15 06:15:55.33 1973 CLT
  (1 row)
  
  WITH tzs (tz) AS (VALUES
--- 1778,1796 
  SELECT make_timestamptz(1973, 07, 15, 08, 15, 55.33);
  make_timestamptz 
  -
!  Sun Jul 15 08:15:55.33 1973 -04
  (1 row)
  
  SELECT make_timestamptz(1973, 07, 15, 08, 15, 55.33, '+2');
  make_timestamptz 
  -
!  Sun Jul 15 02:15:55.33 1973 -04
  (1 row)
  
  SELECT make_timestamptz(1973, 07, 15, 08, 15, 55.33, '-2');
  make_timestamptz 
  -
!  Sun Jul 15 06:15:55.33 1973 -04
  (1 row)
  
  WITH tzs (tz) AS (VALUES
***
*** 1799,1821 
  ('+10:00:1'), ('+10:00:01'),
  ('+10:00:10'))
   SELECT make_timestamptz(2010, 2, 27, 3, 45, 00, tz), tz FROM tzs;
!make_timestamptz|tz 
! ---+---
!  Fri Feb 26 23:45:00 2010 CLST | +1
!  Fri Feb 26 23:45:00 2010 CLST | +1:
!  Fri Feb 26 23:45:00 2010 CLST | +1:0
!  Fri Feb 26 23:45:00 2010 CLST | +100
!  Fri Feb 26 23:45:00 2010 CLST | +1:00
!  Fri Feb 26 23:45:00 2010 CLST | +01:00
!  Fri Feb 26 14:45:00 2010 CLST | +10
!  Fri Feb 26 14:45:00 2010 CLST | +1000
!  Fri Feb 26 14:45:00 2010 CLST | +10:
!  Fri Feb 26 14:45:00 2010 CLST | +10:0
!  Fri Feb 26 14:45:00 2010 CLST | +10:00
!  Fri Feb 26 14:45:00 2010 CLST | +10:00:
!  Fri Feb 26 14:44:59 2010 CLST | +10:00:1
!  Fri Feb 26 14:44:59 2010 CLST | +10:00:01
!  Fri Feb 26 14:44:50 2010 CLST | +10:00:10
  (15 rows)
  
  -- these should fail
--- 1799,1821 
  ('+10:00:1'), ('+10:00:01'),
  ('+10:00:10'))
   SELECT make_timestamptz(2010, 2, 27, 3, 45, 00, tz), tz FROM tzs;
!make_timestamptz   |tz 
! --+---
!  Fri Feb 26 23:45:00 2010 -03 | +1
!  Fri Feb 26 23:45:00 2010 -03 | +1:
!  Fri Feb 26 23:45:00 2010 -03 | +1:0
!  Fri Feb 26 23:45:00 2010 -03 | +100
!  Fri Feb 26 23:45:00 2010 -03 | +1:00
!  Fri Feb 26 23:45:00 2010 -03 | +01:00
!  Fri Feb 26 14:45:00 2010 -03 | +10
!  Fri Feb 26 14:45:00 2010 -03 | +1000
!  Fri 

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-09 Thread Peter Eisentraut
Perhaps I'm confused by the title of this thread/CF entry, but
background workers already do show up in pg_stat_activity.  (You can
verify that by testing the background sessions patch.)  So which
additional things are we aiming to see with this?

In practice, I think it's common to do a quick select * from
pg_stat_activity to determine whether a database instance is in use.
(You always see your own session, but that's easy to eyeball.)  If we
add all the various background processes by default, that will make
things harder, especially if there is no straightforward way to filter
them out.
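
If everything stays in one view, the quick check could still be kept
cheap by filtering on the proposed type column (a sketch, assuming the
proc_type column from the patches upthread):

SELECT pid, state, query
FROM pg_stat_activity
WHERE proc_type = 'client backend';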

Perhaps a pg_stat_user_* and pg_stat_system_* split like we have for
some of the statistics tables would be useful?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Documentation improvements for partitioning

2017-03-09 Thread Tom Lane
Robert Haas  writes:
> This is about list-ifying a note, but I think we should try to
> de-note-ify it.  It's a giant block of text that is not obviously more
> noteworthy than the surrounding text; I think <note> should be saved
> for things that particularly need to be called out.

Yeah.  A big problem with the markup we use, imo, is that <note> segments
are displayed in a way that makes them more prominent than the surrounding
text, not less so.  That doesn't really square with my intuitive view of
what a <note> ought to be used for; it forces it to be considered as
something only slightly less dangerous than a <warning> or <caution>, not
as a parenthetical remark.  But that's what we have to deal with, so
we should mark up our text accordingly.

regards, tom lane




Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-03-09 Thread Robert Haas
On Mon, Jan 30, 2017 at 6:54 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> Wonder if we there's an argument to be made for implementing this
>> roughly similarly to split_pathtarget_at_srf - instead of injecting a
>> ProjectSet node we'd add a FunctionScan node below a Result node.
>
> Yeah, possibly.  That would have the advantage of avoiding an ExecProject
> step when the SRFs aren't buried, which would certainly be the expected
> case.
>
> If you don't want to make ExecInitExpr responsible, then the planner would
> have to do something like split_pathtarget_at_srf anyway to decompose the
> expressions, no matter which executor representation we use.

Did we do anything about this?  Are we going to?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: psql show index with type info

2017-03-09 Thread Peter Eisentraut
On 3/8/17 08:30, Stephen Frost wrote:
> Right, I don't think having an extra column which is going to be NULL a
> large amount of the time is good.

Note that \di already has a column "Table" that is null for something
that is not an index.  So I don't think this argument is very strong.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-09 Thread Peter van Hardenberg
Anecdotally, we just stored dates as strings and used a convention (key
ends in "_at", I believe) to interpret them. The lack of support for dates
in JSON is well-known, universally decried... and not a problem the
PostgreSQL community can fix.
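
In SQL terms, that convention amounts to casting at read time; a minimal
sketch, with the table name and jsonb column invented:

-- Keys ending in "_at" hold timestamps stored as plain JSON strings:
SELECT (doc ->> 'created_at')::timestamptz AS created_at
FROM events
WHERE doc ? 'created_at';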

On Thu, Mar 9, 2017 at 10:24 AM, Sven R. Kunze  wrote:

> On 09.03.2017 18:58, Robert Haas wrote:
>
>> Also, even if the superset thing were true on a theoretical plane, I'm
>> not sure it would do us much good in practice.  If we start using
>> YAML-specific constructs, we won't have valid JSON any more.  If we
>> use only things that are legal in JSON, YAML's irrelevant.
>>
>
> That's true. I just wanted to share my view of the "date guessing" part of
> pgpro's commits.
> I don't have a good solution for it either; I can only say that where I
> work we have the same issues: either we guess by looking at the string value
> or we know that "this particular key" must be a date.
> Unsatisfied with either solution, we tend to use YAML for our APIs if
> possible.
>
>
> Regards,
> Sven
>
>
>



-- 
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut


Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-09 Thread Kuntal Ghosh
Hello Amit,

On Tue, Mar 7, 2017 at 4:24 PM, Amit Langote
 wrote:
> Hi Kuntal,
>
> Patches apply and compile fine.  Works as advertised.
>
> Some minor comments on the patches themselves.
>
Thanks for the review.

> In 0001:
>
> - * pgstat_bestart() -
> + * pgstat_procstart() -
> + *
> + *  Initialize this process's entry in the PgBackendStatus array.
> + *  Called from InitPostgres and AuxiliaryProcessMain.
>
> Not being called from AuxiliaryProcessMain().  Maybe leftover comment from
> a previous version.  Actually I see that in patch 0002, Main() functions
> of various auxiliary processes call pgstat_procstart, not
> AuxiliaryProcessMain.
>
Fixed.

> + * user-defined functions which expects ids of backends starting from
> 1 to
>
> s/expects/expect/g
>
Fixed.

> +/*
> + * AuxiliaryPidGetProc -- get PGPROC for an auxiliary process
> + * given its PID
> + *
> + * Returns NULL if not found.
> + */
> +PGPROC *
> +AuxiliaryPidGetProc(int pid)
> +{
> +PGPROC *result;
>
> Initialize to NULL so that the comment above is true. :)
>
Fixed.

> In 0002:
>
> @@ -248,6 +248,9 @@ BackgroundWriterMain(void)
>   */
>  prev_hibernate = false;
>
> +/* report walwriter process in the PgBackendStatus array */
> +pgstat_procstart();
> +
>
> s/walwriter/writer/g
Fixed.

> Patch 0004 should update monitoring.sgml.
Added.

I've attached the updated patches. PFA.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Infra-to-expose-non-backend-processes-in-pg_stat_get.patch
Description: binary/octet-stream


0002-Expose-stats-for-auxiliary-processes-in-pg_stat_get_.patch
Description: binary/octet-stream


0003-Expose-stats-for-autovacuum-launcher-and-bgworker.patch
Description: binary/octet-stream


0004-Add-proc_type-column-in-pg_stat_get_activity.patch
Description: binary/octet-stream



Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-09 Thread Peter Eisentraut
This is looking pretty neat.  I played around with it a bit.  There are
a couple of edge cases that you need to address, I think.

- Does not support \x

- When \pset format is rst, then \pset linestyle also shows up as
  "rst".  That is wrong.  Same for markdown.

- Broken output in tuples_only (\t) mode. (rst and markdown)

- rst: Do something about \pset title; the way it currently shows up
  appears to be invalid; could use ".. table:: title" directive

- markdown: Extra blank line between table and footer.

- markdown: We should document or comment somewhere exactly which of the
  various markdown table formats this is supposed to produce.  (Pandoc
  pipe_tables?)

- markdown: Table title needs to be after the table, like

Table: title

- markdown: Needs to escape | characters in cell contents.  (Not
  needed for rst.)  More escaping might be needed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] use SQL standard error code for nextval

2017-03-09 Thread Peter Eisentraut
On 3/9/17 12:27, Mark Dilger wrote:
> Perhaps you should add something to the release notes.  Somebody could be
> testing for the old error code.

The release notes will be written when the release is prepared.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] partial indexes and bitmap scans

2017-03-09 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > We have already figured out that the user's predicate results in a
> > subset of the index's or we wouldn't be able to use that index though,
> > right?  Do we really need to spend cycles re-discovering that?  Are
> > there cases where we actually need the index's predicate to ever be
> > included for correctness..?
> 
> In a bitmap indexscan, yes, because the entire point of the recheck
> condition is that we're going to scan a whole page and possibly see
> tuples that don't satisfy the index predicate at all.

I understand the point of the recheck condition being that we might see
tuples that don't satisfy either the index predicate or the user's
predicate, but that isn't the point: to even consider using that index
we must have already realized that the user's predicate will only be
satisfied in a subset of the cases in which the index predicate will
be satisfied, making any check of the index predicate necessarily always
true.

> Another point
> that's mentioned in the comments in createplan.c is that if you're
> considering the result of a BitmapOr or BitmapAnd, there's not necessarily
> a single index involved so it's much harder to reason about which part
> of the qual is an index predicate.

We must be pulling the index's predicate to be able to put it into the
Recheck condition in the first place.  What I'm arguing is that once
we've decided that we're able to use an index X, because the values
which the user's predicate will satisfy is a subset of those which
the index's predicate will satisfy, then we can entirely forget about
the index's predicate as being redundant to the user's.
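
A concrete sketch of the situation (names invented):

CREATE TABLE t (a int, b int);
CREATE INDEX t_b_partial ON t (a) WHERE b > 100;

-- The planner can use t_b_partial only because it has already proven
-- that "b > 200" implies "b > 100".  Given that proof, rechecking
-- "b > 100" on heap tuples is redundant: checking the user's "b > 200"
-- alone suffices.
EXPLAIN SELECT * FROM t WHERE a = 1 AND b > 200;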

I don't see anything about a BitmapAnd or BitmapOr being relevant to
that.  We will always need to check the user's predicate against the
tuples being returned from the Bitmap Heap Scan, unless by chance it
matches the index's predicate exactly *and* we have an entirely exact
match bitmap without any lossy parts.

> I do notice that createplan.c makes some effort to get rid of filter
> conditions that are provably true when the index conditions are.
> Doesn't look like it considers the reverse direction.  Not sure if
> that's missing a bet.

That actually strikes me as a less likely condition to have in the
general case, isn't it?  Wouldn't that only happen if the index
predicate and the user predicate are identical, because otherwise we
either can't use the index or we must keep the user's predicate because
it will only be satisfied in a subset of cases?

Thanks!

Stephen




Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Peter Eisentraut
On 3/6/17 05:27, Petr Jelinek wrote:
> updated and rebased version of the patch attached.

Some comments on this patch:

Can we have a better explanation of different snapshot options in
CREATE_REPLICATION_SLOT.  We use all these variants in different
places, but it's not fully documented why.  Think about interested
users reading this.


errmsg("subscription table %u in subscription %u does not exist",

Use names instead of IDs if possible.


+   libpqrcv_table_list,
+   libpqrcv_table_info,
+   libpqrcv_table_copy,

I don't think these functions belong in the WAL receiver API, since
they are specific to this particular logical replication
implementation.  I would just make an API function libpqrcv_exec_sql()
that runs a query, and then put the table_*() functions as wrappers
around that into tablesync.c.


Not sure what the worker init stuff in ApplyLauncherShmemInit() is
doing.  Could be commented more.


There are a lot of places that do one of

MyLogicalRepWorker->relid == InvalidOid
OidIsValid(MyLogicalRepWorker->relid)

to check whether the current worker is a sync worker.  Wrap that into
a function.


Not a fan of adding CommandCounterIncrement() calls at the end of
functions.  In some cases, they are not necessary at all.  In cases
where they are, the CCI call should be at a higher level between two
function calls with a comment for why the call below needs to see the
changes from the call above.


The index name pg_subscription_rel_map_index/SubscriptionRelMapIndexId
doesn't seem to match existing style, since there is no "map" column.
How about pg_subscription_rel_rel_sub_index?  I see we have a
similarly named index for publications.


max_sync_workers_per_subscription could be PGC_SIGHUP, I think.  And
the minimum could be 0, to stop any syncing?


pg_subscription_rel.h: I'm not sure under which circumstances we need
to use BKI_ROWTYPE_OID.


Does pg_subscription_rel need an OID column?  The index
SubscriptionRelOidIndexId is not used anywhere.


You removed this command from the tests:

ALTER SUBSCRIPTION testsub SET PUBLICATION testpub, testpub1;

I suppose because it causes a connection.  We should have an option to
prevent that (NOCONNECT/NOREFRESH?).


Why was the test 'check replication origin was dropped on subscriber'
removed?


Attached also a small patch with some stylistic changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From c6ba1eaa3c5a44e4a9f6d072cb95fcf7e68ba3d6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 9 Mar 2017 08:19:06 -0500
Subject: [PATCH] fixup! Logical replication support for initial data copy

---
 doc/src/sgml/catalogs.sgml  |  9 +
 doc/src/sgml/logical-replication.sgml   | 18 +-
 doc/src/sgml/monitoring.sgml|  4 ++--
 doc/src/sgml/protocol.sgml  | 14 +++---
 doc/src/sgml/ref/alter_subscription.sgml|  8 
 doc/src/sgml/ref/create_subscription.sgml   | 13 ++---
 src/backend/catalog/pg_subscription.c   | 13 -
 src/backend/replication/logical/launcher.c  | 18 +++---
 src/backend/replication/logical/snapbuild.c |  6 +++---
 src/backend/replication/logical/worker.c|  4 ++--
 src/backend/replication/repl_gram.y |  1 +
 src/backend/replication/walsender.c |  2 +-
 src/backend/tcop/postgres.c |  8 +---
 13 files changed, 56 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f587a08b6a..ab78585035 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -302,7 +302,7 @@ System Catalogs
 
  
   pg_subscription_rel
-  relation state mapping for subscriptions
+  relation state for subscriptions
  
 
  
@@ -6429,14 +6429,14 @@ pg_subscription_rel
 
   
The catalog pg_subscription_rel contains the
-   status for each replicated relation in each subscription.  This is a
+   state for each replicated relation in each subscription.  This is a
many-to-many mapping.
   
 
   
-   This catalog only contains tables known to subscription after running
+   This catalog only contains tables known to the subscription after running
either CREATE SUBSCRIPTION or
-   ALTER SUBSCRIPTION ... REFRESH commands.
+   ALTER SUBSCRIPTION ... REFRESH.
   
 
   
@@ -6472,6 +6472,7 @@ pg_subscription_rel Columns
   char
   
   
+   State code:
i = initialize,
d = data is being copied,
s = synchronized,
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index f75304cd91..4ec6bb49b7 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -24,8 +24,8 @@ Logical Replication
  
 
  
-  Logical replication typically starts with a snapshot of the data on
-  the publisher 

Re: [HACKERS] partial indexes and bitmap scans

2017-03-09 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I do notice that createplan.c makes some effort to get rid of filter
>> conditions that are provably true when the index conditions are.
>> Doesn't look like it considers the reverse direction.  Not sure if
>> that's missing a bet.

> That actually strikes me as a less likely condition to have in the
> general case, isn't it?  Wouldn't that only happen if the index
> predicate and the user predicate are identical, because otherwise we
> either can't use the index or we must keep the user's predicate because
> it will only be satisfied in a subset of cases?

Well, remember that the planner's idea of an ideal situation is to not
have any filter conditions, not to not have any index (a/k/a recheck)
conditions.  It's going to try to put as much load as it can on the index
condition side of things, and that gives rise to the need for rechecks.

It seems like there might be some mileage to be gained by reversing the
proof direction here, and having it get rid of recheck conditions that are
implied by filter conditions rather than vice versa.  I'm not quite
convinced though, and I'm also not sure how hard it would be to mechanize.
A lot of that code is shared between the bitmap and plain-indexscan cases,
which could make it tricky to not break the plain-indexscan case.
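
As a concrete sketch of that existing direction: given a partial index and a
query such as

    CREATE INDEX t_a_idx ON t (a) WHERE b > 10;

    SELECT * FROM t WHERE a = 1 AND b > 10;

the filter condition b > 10 is provably true whenever the index predicate
holds, so it can be dropped from the plan's filter; the reversed direction
would instead prove recheck conditions redundant from the filter side.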

regards, tom lane




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-09 Thread Sven R. Kunze

On 08.03.2017 20:52, Magnus Hagander wrote:
On Wed, Mar 8, 2017 at 11:48 AM, Peter van Hardenberg wrote:


Small point of order: YAML is not strictly a super-set of JSON.

Editorializing slightly, I have not seen much interest in the
world for YAML support though I'd be interested in evidence to the
contrary.


The world of configuration management seems to for some reason run off
YAML, but those are the only places I've seen it recently (ansible,
puppet etc).


SaltStack uses YAML for their tools, too. I personally can empathize
with them (as a user of configuration management) about this, as writing
JSON would be a nightmare with all the quoting, commas, curly braces etc.
But that's my own preference, maybe.


(Btw. does "run off" mean like or avoid? At least my dictionaries tend 
to the latter.)


That said, if we're introducing something new, it's usually better to
copy from another format than to invent your own.


From my day-to-day work I can tell that the date(time) type is the only
missing piece of JSON to make it perfect for business applications
(besides, maybe, a "currency" type).


Regards,
Sven


Re: [HACKERS] PATCH: psql show index with type info

2017-03-09 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 3/8/17 04:11, Ashutosh Bapat wrote:
> > The header for this table is "list of relations", so type gets
> > associated with relations indicated type of relation. btree: gin as a
> > type of relation doesn't sound really great. Instead we might want to
> > add another column "access method" and specify the access method used
> > for that relation.
> 
> I like that.

When it will be NULL for all of the regular tables..?

I'd rather have the one Type column that just included the access method
when there is one to include.

Thanks!

Stephen




Re: [HACKERS] use SQL standard error code for nextval

2017-03-09 Thread Tom Lane
Mark Dilger  writes:
>> On Mar 9, 2017, at 7:59 AM, Peter Eisentraut wrote:
>> On 2/28/17 22:15, Peter Eisentraut wrote:
>>> The SQL standard defines a separate error code for nextval exhausting
>>> the sequence space.  I haven't found any discussion of this in the
>>> archives, so it seems this was just not considered or not yet in
>>> existence when the error codes were introduced.  Here is a patch to
>>> correct it.

> Perhaps you should add something to the release notes.  Somebody could be
> testing for the old error code.

The release notes for v10 aren't going to be drafted for months yet.
When they are, hopefully the writer will notice that this should be
listed as an incompatible change.  That's not the responsibility
of this commit, although it would've been better if the commit log
entry explicitly pointed out that it's an incompatible change.

regards, tom lane




Re: [HACKERS] foreign partition DDL regression tests

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 1:19 AM, Ashutosh Bapat
 wrote:
>>> At least we need to update the documentation.
>>
>> Got a proposal?
>
> How about something like attached?

Committed with some revisions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Couple of issues with prepared FETCH commands

2017-03-09 Thread Robert Haas
On Wed, Jan 11, 2017 at 11:28 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Jan 10, 2017 at 5:38 PM, Andrew Gierth
>>  wrote:
>>> But the problem that actually came up is this: if you do the PQprepare
>>> before the named cursor has actually been opened, then everything works
>>> _up until_ the first event, such as a change to search_path, that forces
>>> a revalidation; and at that point it fails with the "must not change
>>> result type" error _even if_ the cursor always has exactly the same
>>> result type.  This happens because the initial prepare actually stored
>>> NULL for plansource->resultDesc, since the cursor name wasn't found,
>>> while on the revalidate, when the cursor obviously does exist, it gets
>>> the actual result type.
>>>
>>> It seems a bit of a "gotcha" to have it fail in this case when the
>>> result type isn't actually being checked in other cases.
>
>> To me, that sounds like a bug.
>
> Yeah --- specifically, I wonder why we allow the reference to an
> unrecognized cursor name to succeed.  Or were you defining the bug
> differently?

I'm not sure whether that's a bug or not.  What I was defining as a
bug is calling a change from "we don't know what the result type will
be" to "we know that the result type will be X" as a change in the
result type.  That's really totally inaccurate.

I've never really understood errors about changing the result type.
As a user, I assumed those were unavoidable implementation artifacts,
on the theory that they were annoying and therefore the developers
would have eliminated such messages had it been practical.  As a
developer, I've never gotten around to understanding whether that
theory was correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 12:48 PM, Sven R. Kunze  wrote:
> On 08.03.2017 20:48, Peter van Hardenberg wrote:
>>
>> Small point of order: YAML is not strictly a super-set of JSON.
>
> I haven't read the whole standard, but from what I can see the standard
> considers JSON an official subset of itself:
> http://www.yaml.org/spec/1.2/spec.html

But there's apparent sophistry, like this, in that spec:

JSON's RFC4627 requires that mappings keys merely “SHOULD” be unique,
while YAML insists they “MUST” be. Technically, YAML therefore
complies with the JSON spec, choosing to treat duplicates as an error.
In practice, since JSON is silent on the semantics of such duplicates,
the only portable JSON files are those with unique keys, which are
therefore valid YAML files.

I don't see how YAML can impose a stronger requirement than JSON and
yet claim to be a superset; a JSON document that doesn't meet that
requirement will be legal (if stupid) as JSON but illegal as YAML.
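
For instance, this document is legal (if stupid) JSON but is rejected by a
strict YAML parser:

    {"k": 1, "k": 2}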

Also, even if the superset thing were true on a theoretical plane, I'm
not sure it would do us much good in practice.  If we start using
YAML-specific constructs, we won't have valid JSON any more.  If we
use only things that are legal in JSON, YAML's irrelevant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] updating parallel.sgml

2017-03-09 Thread Robert Haas
On Wed, Mar 8, 2017 at 12:46 PM, Robert Haas  wrote:
> Here's a patch which updates parallel.sgml for the new parallel scan
> types which have recently been committed, and also expands the
> explanation of parallel joins slightly.  I hope everyone will find
> this useful in understanding what new capabilities we now have, and
> what remains to be done in the future.
>
> Barring objections, I plan to commit this tomorrow.

Committed with a fix for a typo pointed out to me off-list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: psql show index with type info

2017-03-09 Thread Stephen Frost
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 3/8/17 08:30, Stephen Frost wrote:
> > Right, I don't think having an extra column which is going to be NULL a
> > large amount of the time is good.
> 
> Note that \di already has a column "Table" that is null for something
> that is not an index.  So I don't think this argument is very strong.

That's an interesting point.

I think what I find most odd about all of this is that \dti and \dit
work at all, and give a different set of columns from \dt.  We don't
document that combining those works in \?, as far as I can see, and
other combinations don't work, just this.

In any case, I won't push very hard on this, it's useful information to
include and we should do so.

Thanks!

Stephen




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-03-09 Thread Robert Haas
On Wed, Jan 25, 2017 at 12:44 AM, Craig Ringer  wrote:
>> The way that SetTransactionIdLimit() now works looks a bit dangerous.
>> xidWrapLimit, xidStopLimit, and xidWarnLimit are computed based on the
>> passed-in oldestXid value and written straight into shared memory.
>> But the shared memory copy of oldestXid could have a different value.
>> I'm not sure if that breaks anything, but it certainly weakens any
>> confidence callers might have had that all those values are consistent
>> with each other.
>
> This was my main hesitation with the whole thing too.
>
> It's necessary to advance oldestXmin before we xlog the advance and
> truncate clog, and necessary to advance the vacuum limits only
> afterwards.

Well, that's why I tried to advocate that there should be two separate
XID limits in shared memory: leave what's there now the way it is, and
then add a new limit for "don't try to look up XIDs before this point:
splat".  I still think that'd be less risky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Performance issue after upgrading from 9.4 to 9.6

2017-03-09 Thread Andres Freund
Hi,

On 2017-03-09 13:47:35 +0100, Naytro Naytro wrote:
> We are having some performance issues after we upgraded to newest
> version of PostgreSQL, before it everything was fast and smooth.
> 
> Upgrade was done by pg_upgrade from 9.4 directly to 9.6.1. Now we
> upgraded to 9.6.2 with no improvement.
> 
> Some information about our setup: Freebsd, Solaris (SmartOS), simple
> master-slave using streaming replication.

Which node is on which of those, and where is the high load?


> Problem:
> Very high system CPU when master is streaming replication data, CPU
> goes up to 77%. Only one process is generating this load, it's a
> postgresql startup process. When I attached a truss to this process I
> saw a lot o read calls with almost the same number of errors (EAGAIN).

Hm. Just to clarify: The load is on the *receiving* side, in the startup
process?  Because the load doesn't quite look that way...


> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> 
> Descriptor 6 is a pipe

That's presumably the latch code's internal pipe.  Could you redo that
truss/strace with timestamps attached?  Does truss show signals
received? The above profile would e.g. make a lot more sense if not.  Is
the wal receiver sending signals?


> The read call tries to read one byte over and over. I looked at the source
> code and I think this file is responsible for this behavior:
> src/backend/storage/ipc/latch.c. There was no such file in 9.4.

It was "just" moved (and expanded), used to be at
src/backend/port/unix_latch.c.

There normally shouldn't be that much "latch traffic" in the startup
process, we'd expect to block from within WaitForWALToBecomeAvailable().
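
(In isolation, the EAGAIN reads are expected: the latch self-pipe is
non-blocking and gets drained until read() reports EAGAIN.  A simplified
sketch of that drain loop -- not the exact latch.c code; selfpipe_readfd is
the pipe's non-blocking read end:

    for (;;)
    {
        char    buf[16];

        /* stop once the pipe is drained (EAGAIN), closed, or errors out */
        if (read(selfpipe_readfd, buf, sizeof(buf)) <= 0)
            break;
    }

What is surprising in the report is how often this runs, not that it
happens at all.)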

Hm.  Any chance you've configured a recovery_min_apply_delay?  Although
I'd expect more timestamp calls in that case.


Greetings,

Andres Freund




Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-03-09 Thread Stephen Frost
Tom, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > On Mon, Jan 30, 2017 at 6:54 PM, Tom Lane  wrote:
> >> If you don't want to make ExecInitExpr responsible, then the planner would
> >> have to do something like split_pathtarget_at_srf anyway to decompose the
> >> expressions, no matter which executor representation we use.
> 
> > Did we do anything about this?  Are we going to?
> 
> No, and I think we should.  Is it on the v10 open items list?

Wasn't, I've added it now:

https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items

Thanks!

Stephen




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-09 Thread Sven R. Kunze

On 09.03.2017 19:50, Peter van Hardenberg wrote:
Anecdotally, we just stored dates as strings and used a convention 
(key ends in "_at", I believe) to interpret them. The lack of support 
for dates in JSON is well-known, universally decried... and not a 
problem the PostgreSQL community can fix.


I completely agree here.

Regards,
Sven




Re: [HACKERS] bytea_output vs make installcheck

2017-03-09 Thread Peter Eisentraut
On 2/14/17 16:50, Jeff Janes wrote:
> make installcheck currently fails against a server running
> with bytea_output = escape.
> 
> Making it succeed is fairly easy, and I think it is worth doing.
> 
> Attached are two options for doing that.  One overrides bytea_output
> locally where-ever needed, and the other overrides it for the entire
> 'regression' database.

I would use option 2 here (ALTER DATABASE) and be done with it.  Some
people didn't like using ALTER DATABASE, but it's consistent with
existing use.  If someone wants to change that, that can be independent
of this issue.
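
For reference, option 2 boils down to something like:

    ALTER DATABASE regression SET bytea_output = 'hex';

executed once against the regression database.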

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PATCH] Enabling atomics on ARM64

2017-03-09 Thread Robert Haas
On Wed, Mar 8, 2017 at 10:18 PM, Roman Shaposhnik  wrote:
> I'd like to offer a forward port from a change I'm contributing
> the Greenplum code base over here:
> https://github.com/greenplum-db/gpdb/pull/1983

Thanks.   Please add this to https://commitfest.postgresql.org/14/ so
it doesn't get forgotten.  I'm afraid your patch is probably too late
for v10 (the deadline for patches to be considered for v10 was March
1st, though somebody might propose to make an exception), but v11 will
be here soon enough.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] New CORRESPONDING clause design

2017-03-09 Thread Pavel Stehule
hi

2017-03-09 17:19 GMT+01:00 Pavel Stehule :

>
>
> 2017-03-09 13:18 GMT+01:00 Surafel Temesgen :
>
>> Hi ,
>>
>> Here is a patch corrected per your feedback, except for the missing test
>> cases, because the CORRESPONDING BY clause is implemented on top of set
>> operations and you can't do that to a set operation without a
>> CORRESPONDING BY clause either
>>
>
> I don't understand.
>
> The following statement should work
>
> postgres=# select 10 as a, 20 as b union corresponding select 20 as a, 30
> as b, 40 as c;
>
> ERROR:  each UNION query must have the same number of columns
> LINE 1: ...elect 10 as a, 20 as b union corresponding select 20 as a, 3...
>
>
> The CORRESPONDING clause should work like a projection filter.
>
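
In other words, the expectation is that each branch is projected down to the
common column list, i.e. (a sketch of the intended semantics, not of current
behavior):

    SELECT 10 AS a, 20 AS b UNION CORRESPONDING SELECT 20 AS a, 30 AS b, 40 AS c;

should behave as if written:

    SELECT 10 AS a, 20 AS b UNION SELECT 20 AS a, 30 AS b;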

I found a link to a postgresql mailing list thread about an older attempt
to implement this feature


https://www.postgresql.org/message-id/cajzswkx7c6wmfo9py4baf8vhz_ofko3afsosjpsb17rgmgb...@mail.gmail.com

> Regards
>
> Pavel
>
>
>>
>> Eg
>>
>>
>> postgres=# SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;
>>
>> ERROR:  each UNION query must have the same number of columns
>>
>> LINE 1: SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;
>>
>>   ^
>>
>> postgres=# create table t1(a int, b int, c int);
>>
>> CREATE TABLE
>>
>> postgres=# create table t2(a int, b int);
>>
>> CREATE TABLE
>>
>>
>>
>> postgres=# select * from t1 union select * from t2;
>>
>> ERROR:  each UNION query must have the same number of columns
>>
>> LINE 1: select * from t1 union select * from t2;
>>
>>
>>
>> Thanks
>>
>> Surafel
>>
>> On Tue, Mar 7, 2017 at 10:26 PM, Pavel Stehule 
>> wrote:
>>
>>> Hi
>>>
>>> I am sending a review of this interesting feature.
>>>
>>> I found following issues, questions:
>>>
>>> 1. unclosed tags  in documentation
>>> 2. bad name "changeTargetEntry" - should be makeTargetEntry?
>>> 3. Why did you remove a lot of asserts in prepunion.c? These asserts
>>> should still be valid
>>> 4. make_coresponding_target has wrong formatting
>>> 5. error "%s queries with a CORRESPONDING clause must have at least one
>>> column with the same name" has wrong formatting, you can show position
>>> 6. previous issue is repeated - look on formatting ereport function,
>>> please, you can use DETAIL and HINT fields
>>> 7. the corresponding clause should contain a column list (I am looking at
>>> ANSI SQL:1999) - you are using expr_list, which does not make sense and
>>> probably has an impact on the whole implementation.
>>> 8. typo orderCorrespondingLsit(List *targetlist)
>>> 9. I miss more tests for CORRESPONDING BY
>>> 10. if I understand to this feature, this query should to work
>>>
>>> postgres=# SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 
>>> b, 6 c, 8 d;
>>> ERROR:  each UNION query must have the same number of columns
>>> LINE 1: ...1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, ...
>>>
>>> postgres=# create table t1(a int, b int, c int);
>>> CREATE TABLE
>>> Time: 63,260 ms
>>> postgres=# create table t2(a int, b int);
>>> CREATE TABLE
>>> Time: 57,120 ms
>>> postgres=# select * from t1 union corresponding select * from t2;
>>> ERROR:  each UNION query must have the same number of columns
>>> LINE 1: select * from t1 union corresponding select * from t2;
>>>
>>> If it is your first patch to Postgres, then it is perfect work!
>>>
>>> The @7 is probably the most significant - I dislike an expression list
>>> there; name_list would be better.
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>
>>
>


Re: [HACKERS] on_dsm_detach() callback and parallel tuplesort BufFile resource management

2017-03-09 Thread Peter Geoghegan
On Wed, Mar 8, 2017 at 5:23 PM, Robert Haas  wrote:
> I think something like 0007-hj-shared-buf-file-v6.patch from
> https://www.postgresql.org/message-id/CAEepm=3g=dMG+84083fkFzLvgMJ7HdhbGB=aezabnukbzm3...@mail.gmail.com
> is probably a good approach to this problem.  In essence, it dodges
> the problem of trying to transfer ownership by making ownership be
> common from the beginning.  That's what I've been recommending, or
> trying to, anyway.

I think that I'm already doing that, to at least the same extent that
0007-hj-shared-buf-file-v6.patch is.

That patch seems to be solving the problem by completely taking over
management of temp files from fd.c. That is, these temp files are not
marked as temp files in the way ordinary temp BufFiles are, with
explicit buy-in from resowner.c about their temp-ness. It adds
"#include "catalog/pg_tablespace.h" to buffile.c, even though that
kind of thing generally lives within fd.c. It does use
PG_TEMP_FILE_PREFIX, but never manages to use temp_tablespaces if
that's set by user. It also doesn't do anything about temp_file_limit
enforcement.

Quite a lot of thought seems to have gone into making the
fd.c/resowner mechanism leak-proof in the face of errors. So, quite
apart from what that approaches misses out on, I really don't want to
change resource management too much. As I went into in my "recap", it
seems useful to change as little as possible about temp files.
Besides, because parallel hash join cannot know the size of the
BufFiles from workers in advance, and thus cannot replicate fd.c fully
by having state for each segment, it ends up actually *relying on*
ENOENT-on-unlink() as a condition that terminates execution:

> +bool
> +BufFileDeleteShared(Oid tablespace, pid_t pid, int set, int partition,
> +   int participant)
> +{
> +   chartempdirpath[MAXPGPATH];
> +   chartempfilepath[MAXPGPATH];
> +   int segment = 0;
> +   boolfound = false;
> +
> +   /*
> +* We don't know if the BufFile really exists, because SharedBufFile
> +* tracks only the range of file numbers.  If it does exist, we don't
> +* know how many 1GB segments it has, so we'll delete until we hit ENOENT or
> +* an IO error.
> +*/
> +   for (;;)
> +   {
> +   make_shareable_path(tempdirpath, tempfilepath,
> +   tablespace, pid, set, partition,
> +   participant, segment);
> +   if (!PathNameDelete(tempfilepath))
> +   break;
> +   found = true;
> +   ++segment;
> +   }
> +
> +   return found;
> +}

However, the whole point of my fortifying the refcount mechanism for
V9 of parallel tuplesort is to not have to ignore a ENOENT like this,
on the grounds that that's ugly/weird (I pointed this out when I
posted my V8, actually). Obviously I could very easily teach fd.c not
to LOG a complaint about an ENOENT on unlink() for the relevant
parallel case, on the assumption that it's only down to an error
during the brief period of co-ownership of a BufFile at the start of
leader unification. Is that acceptable?

Anyway, the only advantage I immediately see with the approach
0007-hj-shared-buf-file-v6.patch takes (over what I've provisionally
written as my V9) is that by putting everything in shared memory all
along, there is no weirdness with tying local memory clean-up to a
shared memory on_dsm_detach() callback. As I said, stashing a local
pointer for the leader in shared memory is weird, even if it
accomplishes the stated goals for V9.

-- 
Peter Geoghegan




Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-03-09 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jan 30, 2017 at 6:54 PM, Tom Lane  wrote:
>> If you don't want to make ExecInitExpr responsible, then the planner would
>> have to do something like split_pathtarget_at_srf anyway to decompose the
>> expressions, no matter which executor representation we use.

> Did we do anything about this?  Are we going to?

No, and I think we should.  Is it on the v10 open items list?

regards, tom lane




Re: [HACKERS] cast result of copyNode()

2017-03-09 Thread Peter Eisentraut
On 3/7/17 18:27, Mark Dilger wrote:
> You appear to be using a #define macro to wrap a function of the same name
> with the code:
> 
> #define copyObject(obj) ((typeof(obj)) copyObject(obj))

Yeah, that's a bit silly.  Here is an updated version that changes that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 696e605c358e7ca5786a13c82504e9d7dbed5eb4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 9 Mar 2017 15:18:59 -0500
Subject: [PATCH v2] Cast result of copyObject() to correct type

copyObject() is declared to return void *, which allows easily assigning
the result independent of the input, but it loses all type checking.

If the compiler supports typeof or something similar, cast the result to
the input type.  This creates a greater amount of type safety.  In some
cases, where the result is assigned to a generic type such as Node * or
Expr *, new casts are now necessary, but in general casts are now
unnecessary in the normal case and indicate that something unusual is
happening.
---
 config/c-compiler.m4  | 27 +
 configure | 40 +++
 configure.in  |  1 +
 src/backend/bootstrap/bootstrap.c |  4 ++--
 src/backend/commands/copy.c   |  2 +-
 src/backend/commands/createas.c   |  2 +-
 src/backend/commands/event_trigger.c  |  8 +++
 src/backend/commands/prepare.c|  4 ++--
 src/backend/commands/view.c   |  2 +-
 src/backend/nodes/copyfuncs.c |  8 +++
 src/backend/optimizer/path/indxpath.c |  4 ++--
 src/backend/optimizer/plan/createplan.c   |  6 ++---
 src/backend/optimizer/plan/initsplan.c|  8 +++
 src/backend/optimizer/plan/planagg.c  |  4 ++--
 src/backend/optimizer/plan/planner.c  |  4 ++--
 src/backend/optimizer/plan/setrefs.c  | 26 ++--
 src/backend/optimizer/plan/subselect.c| 14 +--
 src/backend/optimizer/prep/prepjointree.c |  4 ++--
 src/backend/optimizer/prep/preptlist.c|  2 +-
 src/backend/optimizer/prep/prepunion.c|  4 ++--
 src/backend/optimizer/util/tlist.c| 12 +-
 src/backend/parser/analyze.c  |  2 +-
 src/backend/parser/gram.y |  2 +-
 src/backend/parser/parse_clause.c |  2 +-
 src/backend/parser/parse_expr.c   |  2 +-
 src/backend/parser/parse_relation.c   |  2 +-
 src/backend/parser/parse_utilcmd.c|  6 ++---
 src/backend/rewrite/rewriteHandler.c  |  8 +++
 src/backend/rewrite/rewriteManip.c|  8 +++
 src/backend/tcop/postgres.c   |  6 ++---
 src/backend/utils/cache/plancache.c   | 14 +--
 src/backend/utils/cache/relcache.c|  8 +++
 src/include/nodes/nodes.h |  8 ++-
 src/include/optimizer/tlist.h |  4 ++--
 src/include/pg_config.h.in|  6 +
 src/include/pg_config.h.win32 |  6 +
 36 files changed, 178 insertions(+), 92 deletions(-)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 7d901e1f1a..7afaec5f85 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -178,6 +178,33 @@ fi])# PGAC_C_STATIC_ASSERT
 
 
 
+# PGAC_C_TYPEOF
+# -
+# Check if the C compiler understands typeof or a variant.  Define
+# HAVE_TYPEOF if so, and define 'typeof' to the actual key word.
+#
+AC_DEFUN([PGAC_C_TYPEOF],
+[AC_CACHE_CHECK(for typeof, pgac_cv_c_typeof,
+[pgac_cv_c_typeof=no
+for pgac_kw in typeof __typeof__ decltype; do
+  AC_COMPILE_IFELSE([AC_LANG_PROGRAM([],
+[int x = 0;
+$pgac_kw(x) y;
+y = x;
+return y;])],
+[pgac_cv_c_typeof=$pgac_kw])
+  test "$pgac_cv_c_typeof" != no && break
+done])
+if test "$pgac_cv_c_typeof" != no; then
+  AC_DEFINE(HAVE_TYPEOF, 1,
+[Define to 1 if your compiler understands `typeof' or something similar.])
+  if test "$pgac_cv_c_typeof" != typeof; then
+AC_DEFINE(typeof, $pgac_cv_c_typeof, [Define to how the compiler spells `typeof'.])
+  fi
+fi])# PGAC_C_TYPEOF
+
+
+
 # PGAC_C_TYPES_COMPATIBLE
 # ---
 # Check if the C compiler understands __builtin_types_compatible_p,
diff --git a/configure b/configure
index b5cdebb510..47a1a59e8e 100755
--- a/configure
+++ b/configure
@@ -11399,6 +11399,46 @@ if test x"$pgac_cv__static_assert" = xyes ; then
 $as_echo "#define HAVE__STATIC_ASSERT 1" >>confdefs.h
 
 fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for typeof" >&5
+$as_echo_n "checking for typeof... " >&6; }
+if ${pgac_cv_c_typeof+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_cv_c_typeof=no
+for pgac_kw in typeof __typeof__ decltype; do
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+int x = 0;
+$pgac_kw(x) y;
+y = x;
+return y;
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile 

Re: [HACKERS] Cost model for parallel CREATE INDEX

2017-03-09 Thread Peter Geoghegan
On Wed, Mar 8, 2017 at 5:55 PM, Robert Haas  wrote:
> I like to err on the side of the approach that requires fewer changes.
> That is, if the question is "does pg_restore need to treat this issue
> specially?" and the answer is unclear, I like to assume it probably
> doesn't until some contrary evidence emerges.
>
> I mean, sometimes it is clear that you are going to need special
> handling someplace, and then you have to do it.  But I don't see that
> this is one of those cases, necessarily.

That's what I'll do, then.


-- 
Peter Geoghegan




Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-09 Thread Tom Lane
Robert Haas  writes:
> On Thu, Mar 9, 2017 at 2:30 PM, Peter Eisentraut
>  wrote:
>> In practice, I think it's common to do a quick select * from
>> pg_stat_activity to determine whether a database instance is in use.

> I thought of the same kind of thing, and it was discussed upthread.
> There seemed to be more votes for keeping it all in one view, but that
> could change if more people vote.

I've not been paying much attention to this thread, but it seems like
something that would help Peter's use-case and have other uses as well
is a new column that distinguishes different process types --- user
session, background worker, autovacuum worker, etc.
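
With such a column -- backend_type is a hypothetical name here -- the quick
is-anything-running check would also become easy to filter, e.g.:

    SELECT * FROM pg_stat_activity WHERE backend_type = 'client backend';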

regards, tom lane




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-09 Thread Daniel Verite
Vaishnavi Prabakaran wrote:

>if (PQbatchStatus(st->con) != PQBATCH_MODE_ON)
> 
>But, it is better to use if (PQbatchStatus(st->con) ==
> PQBATCH_MODE_OFF) for this verification. Reason is there is one more state
> in batch mode - PQBATCH_MODE_ABORTED. So, if the batch mode is in aborted
> status, this check will still assume that the connection is not in batch
> mode.
> In a same way, "if(PQbatchStatus(st->con) == PQBATCH_MODE_ON)" check is
> better to be modified as "PQbatchStatus(st->con) != PQBATCH_MODE_OFF".

Agreed, these two tests need to be changed to account for the
PQBATCH_MODE_ABORTED case. Fixed in new patch.

> 2.  @@ -2207,7 +2227,47 @@ doCustom(TState *thread, CState *st, StatsData
> *agg)
> + if (PQbatchStatus(st->con) != PQBATCH_MODE_OFF)
> + {
> + commandFailed(st, "already in batch mode");
> + break;
> + }
> 
> This is not required as below PQbatchBegin() internally does this
> verification.
> 
> + PQbatchBegin(st->con);

The point of this test is to specifically forbid a sequence like this
\beginbatch
...(no \endbatch)
\beginbatch
...
and according to the doc for PQbatchBegin, it looks like the return
code won't tell us:
   "Causes a connection to enter batch mode if it is currently idle or
   already in batch mode.
int PQbatchBegin(PGconn *conn);
   Returns 1 for success"

My understanding is that "already in batch mode" is not an error.

   "Returns 0 and has no effect if the connection is not currently
   idle, i.e. it has a result ready, is waiting for more input from
   the server, etc. This function does not actually send anything to
   the server, it just changes the libpq connection state"

> 3. It is better to check the return value of PQbatchBegin() rather than
> assuming success. E.g: PQbatchBegin() will return false if the connection
> is in copy in/out/both states.

Given point #2 above, I think both tests are needed:
   if (PQbatchStatus(st->con) != PQBATCH_MODE_OFF)
{
   commandFailed(st, "already in batch mode");
   break;
}
if (PQbatchBegin(st->con) == 0)
{
   commandFailed(st, "failed to start a batch");
   break;
}

> > 3. In relation to #2, PQsendQuery() is not forbidden in batch mode
> > although I don't think it can work with it, as it's based on the old
> > protocol.
> >
> It is actually forbidden and you can see the error message "cannot
> PQsendQuery in batch mode, use PQsendQueryParams" when you use
> PQsendQuery().

Sorry for blaming the innocent piece of code, looking closer
it's pgbench that does not display the message.
With the simple query protocol, sendCommand() does essentially:

  r = PQsendQuery(st->con, sql);

... other stuff...

  if (r == 0)
  {
if (debug)
  fprintf(stderr, "client %d could not send %s\n",
  st->id, command->argv[0]);
st->ecnt++;
return false;
  }

So only in debug mode does it inform the user that it failed, and
even then, it does not display PQerrorMessage(st->con).

In non-debug mode it's opaque to the user. If it always fails, it appears
to loop forever. The caller has this comment that is relevant to the problem:

if (!sendCommand(st, command))
  {
/*
 * Failed. Stay in CSTATE_START_COMMAND state, to
 * retry. ??? What the point or retrying? Should
 * rather abort?
 */
return;
  }

I think it hasn't bothered anyone up to now because the immediate
failure modes of PQsendQuery() are resource allocation or protocol
failures that tend to never happen.

Anyway currently \beginbatch avoids the problem by checking
whether querymode == QUERY_SIMPLE, to fail gracefully
instead of letting the endless loop happen.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index f6cb5d4..f93673e 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -269,7 +269,8 @@ typedef enum
 *
 * CSTATE_START_COMMAND starts the execution of a command.  On a SQL
 * command, the command is sent to the server, and we move to
-* CSTATE_WAIT_RESULT state.  On a \sleep meta-command, the timer is set,
+* CSTATE_WAIT_RESULT state unless in batch mode.
+* On a \sleep meta-command, the timer is set,
 * and we enter the CSTATE_SLEEP state to wait for it to expire. Other
 * meta-commands are executed immediately.
 *
@@ -1882,11 +1883,24 @@ sendCommand(CState *st, Command *command)
if (commands[j]->type != SQL_COMMAND)
continue;
preparedStatementName(name, st->use_file, j);
-   res = PQprepare(st->con, name,
- commands[j]->argv[0], commands[j]->argc - 1, NULL);
-   if 

Re: [HACKERS] on_dsm_detach() callback and parallel tuplesort BufFile resource management

2017-03-09 Thread Peter Geoghegan
On Thu, Mar 9, 2017 at 11:19 AM, Peter Geoghegan  wrote:
> That patch seems to be solving the problem by completely taking over
> management of temp files from fd.c. That is, these temp files are not
> marked as temp files in the way ordinary temp BufFiles are, with
> explicit buy-in from resowner.c about their temp-ness. It adds
> "#include "catalog/pg_tablespace.h" to buffile.c, even though that
> kind of thing generally lives within fd.c. It does use
> PG_TEMP_FILE_PREFIX, but never manages to use temp_tablespaces if
> that's set by user. It also doesn't do anything about temp_file_limit
> enforcement.

Actually, it's a bigger departure than I suggest here, even.
Currently, literally no BufFile caller ever gets anything other than
fd.c-wise temp files (those marked FD_TEMPORARY). This is the closest
that buffile.c comes to allowing this:

#ifdef NOT_USED
/*
 * Create a BufFile and attach it to an already-opened virtual File.
 *
 * This is comparable to fdopen() in stdio.  This is the only way at present
 * to attach a BufFile to a non-temporary file.  Note that BufFiles created
 * in this way CANNOT be expanded into multiple files.
 */
BufFile *
BufFileCreate(File file)
{
return makeBufFile(file);
}
#endif

IOW, I cannot see why 0007-hj-shared-buf-file-v6.patch leaves parallel
hash join with BufFiles that are even capable of being expanded past
1GiB (1 segment), which is surely not acceptable -- it's a bug. Have I
just missed something?

(This is not to be confused with BufFiles that are interXact -- those
are allowed, but won't work here either. This is why Thomas changed
PHJ to not do it that way some months back.)

At first I thought that it was okay because BufFiles are
immutable/"readonly" once handed over from workers, but of course the
PHJ SharedBufFile mechanism is up-front, and does all resource
management in shared memory. Thus, it must even create BufFiles in
workers that have this only-one-segment restriction as a consequence
of not being temp files (in the conventional sense: FD_TEMPORARY fd.c
segments). I suggest that the MAX_PHYSICAL_FILESIZE stress-test I
sometimes perform [1] be incorporated in Thomas' testing.

(Looks more carefully...)

I now see that PathNameCreateFile() is being called, a new function
which is largely just a new interface to the existing
OpenTemporaryFileInTablespace() temp files. So, if you look carefully,
you notice that you do in fact end up with FD_TEMPORARY fd.c segments
here...so maybe temp_file_limit isn't broken, because, it turns out,
0007-hj-shared-buf-file-v6.patch is still *halfway* buying into the
conventional idea of file temp-ness. But buffile.c if left behind
("file->isTemp == false"), so AFAICT it must still be broken simply
because new segments cannot be written, per BufFileCreate() comments
quoted above.

This has become more of a PHJ review, which is off-topic for this
thread. But my observations illustrate the difficulty with tying
resource manager resources in local memory (temp segments, and
associated BufFile(s)) with clean-up at shared memory segment
detachment. It's possible, but ticklish enough that you'll end up with
some wart or other when you're done. The question, then, is what wart
is the most acceptable for parallel tuplesort? I see two plausible
paths forward right now:

1. Selectively suppress spurious ENOENT-on-unlink() LOG message in the
brief window where it might be spurious. This is the easiest option.
(There is less code this way.)

2. Do more or less what I've already drafted as my V9, which is
described in opening mail to this thread, while accepting the ugliness
that that approach brings with it.

[1] 
postgr.es/m/cam3swzrwdntkhig0gyix_1muaypik3dv6-6542pye2iel-f...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')

2017-03-09 Thread Michael Paquier
On Fri, Mar 10, 2017 at 12:00 AM, Joe Conway  wrote:
> On 03/09/2017 06:34 AM, Robert Haas wrote:
>> On Thu, Mar 9, 2017 at 7:59 AM, Michael Paquier
>>  wrote:
>>> Yes, I agree with that for MD5, and after looking around I can see
>>> (like here http://prosody.im/doc/plain_or_hashed) as well that
>>> SCRAM-hashed is used. Now, there are as well references to the salt,
>>> like in protocol.sgml:
>>> "The salt to use when encrypting the password."
>>> Joe, do you think that in this case using the term "hashing" would be
>>> more appropriate? I would think so as we use it to hash the password.
>>
>> I'm not Joe, but I think that would be more appropriate.
>
> I am Joe and I agree ;-)

OK I'll spawn a new thread on the matter.
-- 
Michael




Re: [HACKERS] Enabling replication connections by default in pg_hba.conf

2017-03-09 Thread Michael Paquier
On Thu, Mar 9, 2017 at 10:43 PM, Peter Eisentraut
 wrote:
> On 3/8/17 02:34, Michael Paquier wrote:
>> This patch looks good to me.
>
> committed

Thanks.
-- 
Michael




Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 2:30 PM, Peter Eisentraut
 wrote:
> Perhaps I'm confused by the title of this thread/CF entry, but
> background workers already do show up in pg_stat_activity.  (You can
> verify that by testing the background sessions patch.)  So which
> additional things are we aiming to see with this?

All the processes that don't normally show up in pg_stat_activity,
such as auxiliary processes.

> In practice, I think it's common to do a quick select * from
> pg_stat_activity to determine whether a database instance is in use.
> (You always see your own session, but that's easy to eyeball.)  If we
> add all the various background processes by default, that will make
> things harder, especially if there is no straightforward way to filter
> them out.
>
> Perhaps a pg_stat_user_* and pg_stat_system_* split like we have for
> some of the statistics tables would be useful?

I thought of the same kind of thing, and it was discussed upthread.
There seemed to be more votes for keeping it all in one view, but that
could change if more people vote.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] compiler warning in set_tablefunc_size_estimates

2017-03-09 Thread Robert Haas
I tried a non-cassert compile on a machine that has a pickier compiler
than my laptop, and got:

costsize.c: In function ‘set_tablefunc_size_estimates’:
costsize.c:4574:17: error: variable ‘rte’ set but not used
[-Werror=unused-but-set-variable]

That appears to be a legitimate gripe.  Perhaps:

diff --git a/src/backend/optimizer/path/costsize.c
b/src/backend/optimizer/path/costsize.c
index e78f3a8..c23f681 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -4571,12 +4571,9 @@ set_function_size_estimates(PlannerInfo *root,
RelOptInfo *rel)
 void
 set_tablefunc_size_estimates(PlannerInfo *root, RelOptInfo *rel)
 {
-   RangeTblEntry *rte;
-
/* Should only be applied to base relations that are functions */
Assert(rel->relid > 0);
-   rte = planner_rt_fetch(rel->relid, root);
-   Assert(rte->rtekind == RTE_TABLEFUNC);
+   Assert(planner_rt_fetch(rel->relid, root)->rtekind == RTE_TABLEFUNC);

rel->tuples = 100;

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] alter enum add value regression

2017-03-09 Thread Andrew Dunstan

It's just been brought to my attention that the following, which worked
in current releases, is broken in master:


  BEGIN;
  CREATE TYPE test_enum AS ENUM ('v1','v2');
  ALTER TYPE test_enum OWNER TO postgres;
  CREATE TABLE test_table (test_col test_enum DEFAULT 'v1'::test_enum);
  COMMIT;


The reason is shown I think in this comment in enum.c:


* We also insist that the type's pg_type row not be HEAP_UPDATED.  If it
* is, we can't tell whether the row was created or only modified in the
* apparent originating xact, so it might be older than that xact.  (We do
* not worry whether the enum value is HEAP_UPDATED; if it is, we might
* think it's too new and throw an unnecessary error, but we won't allow
* an unsafe case.)


Clearly this isn't just academic since someone has tripped over it. This
example is based on code in an existing extension install script. Now
the extension is probably somewhat broken because it assumes the
existence of a "postgres" user, but that's beside the point.


It's not clear to me that we can do anything about this, and the changes
that we made address numerous complaints that we have had about not
being able to add values to an enum in a transaction, so I'm very far
from keen to revert it. But this is sure a pity. At the very least it
deserves a release note.


cheers


andrew








Re: [HACKERS] adding an immutable variant of to_date

2017-03-09 Thread Sven R. Kunze

Thanks!

On 08.03.2017 17:37, Andreas Karlsson wrote:
The current code for to_char will on the first call to to_char build 
arrays with the localized names of the week days and the months. I 
suspect that you may need to build something similar but a set of 
arrays per locale.


Not sure what the most C-like way of solving this is. Coming from a 
Python background, I would tend to use a dict. The keys are the locales, 
the values are the arrays.


I would use something like what is described here:
http://stackoverflow.com/questions/3269881/how-to-represent-a-python-like-dictionary-in-c
-- preferably something libc-related, like hcreate and hsearch.
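
A minimal sketch of that approach with POSIX hsearch (illustrative only;
error handling omitted):

    #include <search.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct LocaleTimeNames
    {
        char   *abbrev_days[7];
        char   *full_days[7];
        char   *abbrev_months[12];
        char   *full_months[12];
    } LocaleTimeNames;

    static void
    cache_store(const char *locale, LocaleTimeNames *names)
    {
        ENTRY       item;

        item.key = strdup(locale);
        item.data = names;
        hsearch(item, ENTER);       /* requires a prior hcreate(nel) */
    }

    static LocaleTimeNames *
    cache_lookup(const char *locale)
    {
        ENTRY       item,
                   *found;

        item.key = (char *) locale; /* FIND does not modify the key */
        found = hsearch(item, FIND);
        return found ? (LocaleTimeNames *) found->data : NULL;
    }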



See the DCH_to_char function and its call to cache_locale_time.


Alright. Found the cache here:

 src/backend/utils/adt/pg_locale.c
/* lc_time localization cache */
char   *localized_abbrev_days[7];
char   *localized_full_days[7];
char   *localized_abbrev_months[12];
char   *localized_full_months[12];

/*  */

void
cache_locale_time(void)
{
char   *save_lc_time;
time_t  timenow;
struct tm  *timeinfo;
int i;

#ifdef WIN32
char   *save_lc_ctype;
#endif

/* did we do this already? */
if (CurrentLCTimeValid)
return;

I would replace the invalidation check with a hsearch lookup.



Grepping for the cache variables, I found:

~/src/postgres$ grep -r localized_abbrev_days *
src/backend/utils/adt/pg_locale.c:char *localized_abbrev_days[7];
src/backend/utils/adt/pg_locale.c:	cache_single_time(&localized_abbrev_days[i], "%a", timeinfo);
src/backend/utils/adt/formatting.c:char *str = str_toupper_z(localized_abbrev_days[tm->tm_wday], collid);
src/backend/utils/adt/formatting.c:char *str = str_initcap_z(localized_abbrev_days[tm->tm_wday], collid);
src/backend/utils/adt/formatting.c:char *str = str_tolower_z(localized_abbrev_days[tm->tm_wday], collid);
src/include/utils/pg_locale.h:extern char *localized_abbrev_days[];

~/src/postgres$ grep -r localized_full_days *
src/backend/utils/adt/pg_locale.c:char *localized_full_days[7];
src/backend/utils/adt/pg_locale.c:	cache_single_time(&localized_full_days[i], "%A", timeinfo);
src/backend/utils/adt/formatting.c:char *str = str_toupper_z(localized_full_days[tm->tm_wday], collid);
src/backend/utils/adt/formatting.c:char *str = str_initcap_z(localized_full_days[tm->tm_wday], collid);
src/backend/utils/adt/formatting.c:char *str = str_tolower_z(localized_full_days[tm->tm_wday], collid);
src/include/utils/pg_locale.h:extern char *localized_full_days[];

~/src/postgres$ grep -r localized_abbrev_months *
src/backend/utils/adt/pg_locale.c:char *localized_abbrev_months[12];
src/backend/utils/adt/pg_locale.c:	cache_single_time(&localized_abbrev_months[i], "%b", timeinfo);
src/backend/utils/adt/formatting.c:char *str = str_toupper_z(localized_abbrev_months[tm->tm_mon - 1], collid);
src/backend/utils/adt/formatting.c:char *str = str_initcap_z(localized_abbrev_months[tm->tm_mon - 1], collid);
src/backend/utils/adt/formatting.c:char *str = str_tolower_z(localized_abbrev_months[tm->tm_mon - 1], collid);
src/include/utils/pg_locale.h:extern char *localized_abbrev_months[];

srkunze@mexico:~/src/postgres$ grep -r localized_full_months *
src/backend/utils/adt/pg_locale.c:char *localized_full_months[12];
src/backend/utils/adt/pg_locale.c:	cache_single_time(&localized_full_months[i], "%B", timeinfo);
src/backend/utils/adt/formatting.c:char *str = str_toupper_z(localized_full_months[tm->tm_mon - 1], collid);
src/backend/utils/adt/formatting.c:char *str = str_initcap_z(localized_full_months[tm->tm_mon - 1], collid);
src/backend/utils/adt/formatting.c:char *str = str_tolower_z(localized_full_months[tm->tm_mon - 1], collid);
src/include/utils/pg_locale.h:extern char *localized_full_months[];


It seems to me that I would need to make them point to the correct
array at the beginning of those functions by doing an hsearch lookup.


That step is basically independent of the actual immutable to_date idea.


What do you think?

Best,
Sven





Re: [HACKERS] adding an immutable variant of to_date

2017-03-09 Thread Sven R. Kunze

On 08.03.2017 03:37, Robert Haas wrote:

The commit message for 64353640e87b54250f1b8a1d2a708d270dff4bfd has
some interesting perspective on this.


"""
Also, mark to_date() and to_char(interval) as stable; although these appear
not to depend on any GUC variables as of CVS HEAD, that seems a property
unlikely to survive future improvements.  It seems best to mark all the
formatting functions stable and be done with it.
"""

My take away from this commit is the following:

Historically, the immutability of "to_date(text, text)" was an emergent
property. Once it was shown to be potentially mutable, the parsing
functionality took priority, so the immutability marking had to be removed.


The proposed variant, on the other hand, would be immutable by design
before anything else, so future improvements could not violate that. This
might restrict the possible parsing functionality, but that shouldn't be
a problem in practice.


Regards,
Sven




Re: [HACKERS] compiler warning in set_tablefunc_size_estimates

2017-03-09 Thread Tom Lane
Robert Haas  writes:
> I tried a non-cassert compile on a machine that has a pickier compiler
> than my laptop, and got:

> costsize.c: In function ‘set_tablefunc_size_estimates’:
> costsize.c:4574:17: error: variable ‘rte’ set but not used
> [-Werror=unused-but-set-variable]

> That appears to be a legitimate gripe.  Perhaps:

I think PG_USED_FOR_ASSERTS_ONLY would be a better solution.  It's
only happenstance that the function currently uses the RTE just
for this; if it grows another use, your approach would be harder
to clean up.
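
For illustration, that suggestion would look roughly like:

    RangeTblEntry *rte PG_USED_FOR_ASSERTS_ONLY;

    /* Should only be applied to base relations that are functions */
    Assert(rel->relid > 0);
    rte = planner_rt_fetch(rel->relid, root);
    Assert(rte->rtekind == RTE_TABLEFUNC);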

regards, tom lane




Re: [HACKERS] contrib modules and relkind check

2017-03-09 Thread Stephen Frost
Amit, Michael,

* Amit Langote (langote_amit...@lab.ntt.co.jp) wrote:
> On 2017/03/09 11:51, Michael Paquier wrote:
> > OK, I am marking that as ready for committer.
> 
> Thanks.

Thanks for this, I've pushed this now.  I do have a few notes about
changes that I made from your patch;

- Generally speaking, the user-facing functions come first in these .c
  files, with a prototype at the top for the static functions defined
  later on in the file.  I went ahead and did that for the functions you
  added too.

- I added more comments to the regression tests, in particular, we
  usually comment when tests are expected to fail.

- I added some additional regression tests to cover more cases,
  particularly ones for things that weren't being tested at all.

- Not the fault of your patch, but there were cases where elog() was
  being used when it really should have been ereport(), so I changed
  those cases to all be, hopefully, consistent throughout.

Thanks again!

Stephen




Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

2017-03-09 Thread Michael Paquier
On Thu, Mar 9, 2017 at 10:54 PM, Peter Eisentraut
 wrote:
> On 3/7/17 11:16, Robert Haas wrote:
>> Well, if the problem you're trying to solve is "retain WAL for as long
>> as possible without running out of disk space and having everything go
>> kablooey", then it would solve that problem, and I think that's a very
>> reasonable problem to want to solve.
>
> Could be.  I'm not sure what that means for the presented patch, though.
>  Or whether it addresses Michael's original use case at all.

The patch adding an end-segment command does address my problem,
because I just want to be sure that there is enough space left on disk
for one complete segment. And that's fine to check for that when the
last segment is complete. This needs some additional effort but that's
no big deal either.

Having something like --limit-retained-segments partially addresses
it, as long as there is a way to define an automatic mode, based on
statvfs() obviously.
-- 
Michael




Re: [HACKERS] compiler warning in set_tablefunc_size_estimates

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 4:39 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> I tried a non-cassert compile on a machine that has a pickier compiler
>> than my laptop, and got:
>
>> costsize.c: In function ‘set_tablefunc_size_estimates’:
>> costsize.c:4574:17: error: variable ‘rte’ set but not used
>> [-Werror=unused-but-set-variable]
>
>> That appears to be a legitimate gripe.  Perhaps:
>
> I think PG_USED_FOR_ASSERTS_ONLY would be a better solution.  It's
> only happenstance that the function currently uses the RTE just
> for this; if it grows another use, your approach would be harder
> to clean up.

Yeah, we might have to revert the entire -4/+1 line patch.

Actually, the thing I don't like about that is that that then we're
still emitting code for the planner_rt_fetch.  That probably doesn't
cost much, but why do it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Performance issue after upgrading from 9.4 to 9.6

2017-03-09 Thread Naytro Naytro
2017-03-09 18:28 GMT+01:00 Robert Haas :

> On Thu, Mar 9, 2017 at 7:47 AM, Naytro Naytro 
> wrote:
> > We are having some performance issues after we upgraded to newest
> > version of PostgreSQL, before it everything was fast and smooth.
> >
> > Upgrade was done by pg_upgrade from 9.4 directly to 9.6.1. Now we
> > upgraded to 9.6.2 with no improvement.
> >
> > Some information about our setup: Freebsd, Solaris (SmartOS), simple
> > master-slave using streaming replication.
> >
> > Problem:
> > Very high system CPU when master is streaming replication data, CPU
> > goes up to 77%. Only one process is generating this load, it's a
> > postgresql startup process. When I attached a truss to this process I
> > saw a lot o read calls with almost the same number of errors (EAGAIN).
> > read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> >
> > Descriptor 6 is a pipe
> >
> > The read call tries to read one byte over and over. I looked at the source
> > code and I think this file is responsible for this behavior:
> > src/backend/storage/ipc/latch.c. There was no such file in 9.4.
>
> Our latch implementation did get overhauled pretty thoroughly in 9.6;
> see primarily commit 98a64d0bd713cb89e61bef6432befc4b7b5da59e.  But I
> can't guess what is going wrong here based on this information.  It
> might help if you can pull some stack backtraces from the startup
> process.


Dtrace from solaris: http://pastebin.com/u03uVKbr


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread David Rowley
On 10 March 2017 at 06:17, Robert Haas  wrote:

> On Thu, Mar 9, 2017 at 11:50 AM, Dilip Kumar 
> wrote:
> > On Thu, Mar 9, 2017 at 10:02 PM, Dilip Kumar 
> wrote:
> >> I slightly modified your query to reproduce this issue.
> >>
> >> explain analyze select * from r1 where value<555;
> >>
> >> Patch is attached to fix the problem.
> >
> > I forgot to mention the cause of the problem.
> >
> > if (istate->schunkptr < istate->nchunks)
> > {
> > PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
> > PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
> > BlockNumber chunk_blockno;
> >
> > In the above if condition we have only checked istate->schunkptr <
> > istate->nchunks, which means we have some chunk left, so we are safe to
> > access idxchunks.  But just after that we are accessing
> > ptbase[idxpages[istate->spageptr]] without checking whether accessing
> > idxpages is safe.
> >
> > tbm_iterator already handles this case; I broke it in
> > tbm_shared_iterator.
>
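
For illustration, the guard described above amounts to fetching the page
entry only when one is actually left -- a sketch of the idea, not of the
committed fix:

    PagetableEntry *page = NULL;

    if (istate->spageptr < istate->npages)
        page = &ptbase[idxpages[istate->spageptr]];
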
> I don't know if this is the only problem -- it would be good if David
> could retest -- but it's certainly *a* problem, so committed.
>

Thanks for committing, and generally parallelising more stuff.

I confirm that my test case is now working again.

I'll be in this general area today, so will mention if I stumble over
anything that looks broken.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

