Re: [HACKERS] brin index vacuum versus transaction snapshots

2015-07-22 Thread Kevin Grittner
Jeff Janes jeff.ja...@gmail.com wrote:
 On Tue, Jul 21, 2015 at 3:10 PM, Kevin Grittner kgri...@ymail.com wrote:
 Kevin Grittner kgri...@ymail.com wrote:

 If you run `make installcheck` against a cluster with
 default_transaction_isolation = 'repeatable read' you get one
 failure:

 + ERROR: brin_summarize_new_values() cannot run in a transaction that has 
 already obtained a snapshot

 This is using source at commit 9faa6ae14f6098e4b55f0131f7ec2694a381fb87.

 Ah, the very next commit fixed this while I was running the tests.

 Still looks broken from here.

You are right; I rushed my test to see whether Tom's patch had any
impact, and neglected to set default_transaction_isolation after
installing the new build.  :-(  The problem still reliably occurs,
and the error message still references a function name that was not
part of the problem.

I have added this to the PostgreSQL 9.5 Open Items Wiki page.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Jim Nasby

On 7/22/15 6:58 AM, Amit Langote wrote:

On Wed, Jul 22, 2015 at 8:19 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:


Not sure what Jim meant.  Maybe he meant to be aware of when spilling to
disk happens?  Obviously, things become slower, so maybe you need to
consider it for progress reporting purposes.



Perhaps the m_w_m determines how many dead tuples lazy_scan_heap() can
keep track of before doing a lazy_vacuum_indexes() +
lazy_vacuum_heap() round. The smaller the m_w_m, the more index-scan
rounds, and the slower the progress?


Yes. Any percent completion calculation will have to account for the 
case of needing multiple passes through all the indexes.


Each dead tuple requires 6 bytes (IIRC) of maintenance work mem. So if 
you're deleting 5M rows with m_w_m=1MB you should be getting many passes 
through the indexes. Studying the output of VACUUM VERBOSE will confirm 
that (or just throw a temporary WARNING in the path where we start the 
scan).
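
To put rough numbers on that: with the 6-bytes-per-dead-tuple figure, 1MB of 
maintenance_work_mem holds about 1,048,576 / 6 = ~174,000 dead tuple pointers, 
so deleting 5M rows would need on the order of 5,000,000 / 174,000 = ~29 
index-vacuuming rounds.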

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-22 Thread Jeff Janes
On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes jeff.ja...@gmail.com wrote:
  Attached is a patch that implements the vm scan for truncation.  It
  introduces a variable to hold the last blkno which was skipped during the
  forward portion.  Any blocks after both this blkno and after the last
  inspected nonempty page (which the code is already tracking) must have been
  observed to be empty by the current vacuum.  Any other process rendering the
  page nonempty is required to clear the vm bit, and no other process can set
  the bit again during the vacuum's lifetime.  So if the bit is still set, the
  page is still empty without needing to inspect it.

 Urgh.  So if we do this, that forever precludes having HOT pruning set
 the all-visible bit.


I wouldn't say forever, as it would be easy to revert the change if
something more important came along that conflicted with it.  I don't think
this change would grow tentacles across the code that make it hard to
revert, you would just have to take the performance hit (and by that time,
maybe HDD will truly be dead anyway and so we don't care anymore). But yes,
that is definitely a downside.  HOT pruning is one example, but also one
could envision having someone (bgwriter?) set vm bits on unindexed tables.
Or if we invent some efficient way to know that no expiring tids for a
certain block range are stored in indexes, other jobs could also set the vm
bit on indexed tables.  Or parallel vacuums in the same table, not that I
really see a reason to have those.


 At the least we'd better document that carefully
 so that nobody breaks it later.  But I wonder if there isn't some
 better approach, because I would certainly rather that we didn't
 foreclose the possibility of doing something like that in the future.


But where do we document it (other than in-place)?  README.HOT doesn't seem
sufficient, and there is no README.vm.

I guess add an Assert(InRecovery || running_a_vacuum); to
the visibilitymap_set with a comment there, except that I don't know how to
implement running_a_vacuum so that it covers manual vacs as well as
autovac.  Perhaps assert that we hold a SHARE UPDATE EXCLUSIVE on rel?
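
For illustration only, the lock-based variant might look something like the
sketch below; the lock-check helper is hypothetical and would have to be
written:

    /* hedged sketch -- RelationLockHeldByMe() is a made-up helper */
    Assert(InRecovery ||
           RelationLockHeldByMe(rel, ShareUpdateExclusiveLock));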

The advantage of the other approach, just forcing kernel read-ahead to work
for us, is that it doesn't impose any of these restrictions on future
development.  The disadvantage is that I don't know how to auto-tune it, or
auto-disable it for SSD, and it will never be quite as efficient.
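
A rough sketch of that read-ahead idea (illustrative only, not an actual
patch; the vacrelstats field names loosely follow vacuumlazy.c):

    #define PREFETCH_WINDOW 32      /* illustrative window size */

    blkno = vacrelstats->rel_pages;
    while (blkno > vacrelstats->nonempty_pages)
    {
        /* on a window boundary, hint the blocks we are about to read */
        if ((blkno % PREFETCH_WINDOW) == 0)
        {
            BlockNumber start = (blkno >= PREFETCH_WINDOW) ?
                blkno - PREFETCH_WINDOW : 0;
            BlockNumber p;

            for (p = start; p < blkno; p++)
                PrefetchBuffer(onerel, MAIN_FORKNUM, p);
        }

        blkno--;
        /* ... then read the page and test whether it is empty, as
         * count_nondeletable_pages() already does ... */
    }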

Cheers,

Jeff


Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-22 Thread Heikki Linnakangas

On 07/22/2015 11:18 AM, Simon Riggs wrote:

On 10 July 2015 at 00:06, Tom Lane t...@sss.pgh.pa.us wrote:


Andres Freund and...@anarazel.de writes:

On 2015-07-06 11:49:54 -0400, Tom Lane wrote:

Rather than reverting cab9a0656c36739f, which would re-introduce a
different performance problem, perhaps we could have COPY create a new
relfilenode when it does this.  That should be safe if the table was
previously empty.



I'm not convinced that cab9a0656c36739f needs to survive in that
form. To me only allowing one COPY to benefit from the wal_level =
minimal optimization has a significantly higher cost than
cab9a0656c36739f.


What evidence have you got to base that value judgement on?

cab9a0656c36739f was based on an actual user complaint, so we have good
evidence that there are people out there who care about the cost of
truncating a table many times in one transaction.  On the other hand,
I know of no evidence that anyone's depending on multiple sequential
COPYs, nor intermixed COPY and INSERT, to be fast.  The original argument
for having this COPY optimization at all was to make restoring pg_dump
scripts in a single transaction fast; and that use-case doesn't care
about anything but a single COPY into a virgin table.



We have to backpatch this fix, so it must be both simple and effective.

Heikki's suggestions may be best, maybe not, but they don't seem
backpatchable.

Tom's suggestion looks good. So does Andres' suggestion. I have coded both.


Thanks. For comparison, I wrote a patch to implement what I had in mind.

When a WAL-skipping COPY begins, we add an entry for that relation in a 
pending-fsyncs hash table. Whenever we perform any action on a heap 
that would normally be WAL-logged, we check if the relation is in the 
hash table, and skip WAL-logging if so.


That was a simplified explanation. In reality, when WAL-skipping COPY 
begins, we also memorize the current size of the relation. Any actions 
on blocks greater than the old size are not WAL-logged, and any actions 
on smaller-numbered blocks are. This ensures that if you did any INSERTs 
on the table before the COPY, any new actions on the blocks that were 
already WAL-logged by the INSERT are also WAL-logged. And likewise if 
you perform any INSERTs after (or during, by trigger) the COPY, and they 
modify the new pages, those actions are not WAL-logged. So starting a 
WAL-skipping COPY splits the relation into two parts: the first part 
that is WAL-logged as usual, and the later part that is not WAL-logged. 
(there is one loose end marked with XXX in the patch on this, when one 
of the pages involved in a cold UPDATE is before the watermark and the 
other is after)
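
To illustrate the decision (a minimal sketch only; the entry type and helper
names are made up, not the actual patch):

    /* decide whether a heap action on 'blkno' of 'rel' can skip WAL */
    static bool
    skip_wal_for_block(Relation rel, BlockNumber blkno)
    {
        PendingSyncEntry *entry = lookup_pending_sync(rel->rd_node);

        if (entry == NULL)
            return false;    /* no WAL-skipping COPY pending on this rel */
        if (entry->wal_logged_truncation)
            return false;    /* a WAL-logged TRUNCATE forces normal logging */
        return blkno >= entry->pre_copy_nblocks;   /* beyond the pre-COPY size? */
    }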


The actual fsync() has been moved to the end of transaction, as we are 
now skipping WAL-logging of any actions after the COPY as well.


And truncations complicate things further. If we emit a truncation WAL 
record in the transaction, we also make an entry in the hash table to 
record that. All operations on a relation that has been truncated must 
be WAL-logged as usual, because replaying the truncate record will 
destroy all data even if we fsync later. But we still optimize for 
BEGIN; CREATE; COPY; TRUNCATE; COPY; style patterns, because if we 
truncate a relation that has already been marked for fsync-at-COMMIT, we 
don't need to WAL-log the truncation either.



This is more invasive than I'd like to backpatch, but I think it's the 
simplest approach that works, and doesn't disable any of the important 
optimizations we have.



And what reason is there to think that this would fix all the problems?


I don't think either suggested fix could be claimed to be a great solution,
since there is little principle here, only heuristic. Heikki's solution
would be the only safe way, but is not backpatchable.


I can't get too excited about a half-fix that leaves you with data 
corruption in some scenarios.


I wrote a little test script to test all these different scenarios 
(attached). Both of your patches fail with the script.


- Heikki

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 2e3b9d2..9ef688b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2021,12 +2021,6 @@ FreeBulkInsertState(BulkInsertState bistate)
  * The new tuple is stamped with current transaction ID and the specified
  * command ID.
  *
- * If the HEAP_INSERT_SKIP_WAL option is specified, the new tuple is not
- * logged in WAL, even for a non-temp relation.  Safe usage of this behavior
- * requires that we arrange that all new tuples go into new pages not
- * containing any tuples from other transactions, and that the relation gets
- * fsync'd before commit.  (See also heap_sync() comments)
- *
  * The HEAP_INSERT_SKIP_FSM option is passed directly to
  * RelationGetBufferForTuple, which see for more info.
  *
@@ -2115,7 +2109,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 	MarkBufferDirty(buffer);
 
 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Jim Nasby

On 7/22/15 9:15 AM, Robert Haas wrote:

I'm not proposing this feature, I'm merely asking for it to be defined in a
way that makes it work for more than just VACUUM. Once we have a way of
reporting useful information, other processes can be made to follow that
mechanism, like REINDEX, ALTER TABLE etc.. I believe those things are
important, even if we never get such information for user queries. But I
hope we do.

I won't get in the way of your search for detailed information in more
complex forms. Both things are needed.

OK.

One idea I have is to create a system where we expose a command tag
(i.e. VACUUM) plus a series of generic fields whose specific meanings
are dependent on the command tag.  Say, 6 bigint counters, 6 float8
counters, and 3 strings up to 80 characters each.  So we have a
fixed-size chunk of shared memory per backend, and each backend that
wants to expose progress information can fill in those fields however
it likes, and we expose the results.

This would be sorta like the way pg_statistic works: the same columns
can be used for different purposes depending on what estimator will be
used to access them.
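
For concreteness, a fixed-size per-backend slot along those lines might look
something like this sketch (type and field names are illustrative only):

    #define PROGRESS_NUM_INT    6
    #define PROGRESS_NUM_FLOAT  6
    #define PROGRESS_NUM_STR    3
    #define PROGRESS_STR_LEN    80

    typedef struct BackendProgressSlot
    {
        char    command_tag[NAMEDATALEN];           /* e.g. "VACUUM" */
        int64   int_counters[PROGRESS_NUM_INT];     /* meaning depends on tag */
        float8  float_counters[PROGRESS_NUM_FLOAT];
        char    strings[PROGRESS_NUM_STR][PROGRESS_STR_LEN + 1];
    } BackendProgressSlot;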


If we want to expose that level of detail, I think either JSON or arrays 
would make more sense, so we're not stuck with a limited amount of info. 
Perhaps DDL would be OK with the numbers you suggested, but 
https://www.pgcon.org/2013/schedule/events/576.en.html would not, and I 
think wanting query progress is much more common.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-07-22 Thread Robert Haas
On Mon, Jul 6, 2015 at 8:49 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:
 I ran some performance tests on a 16 core machine with large shared
 buffers, so there is no IO involved.
 With the default value of cpu_tuple_comm_cost, parallel plan is not
 getting generated even if we are selecting 100K records from 40
 million records. So I changed the value to '0' and collected the
 performance readings.

 Here are the performance numbers:

 selectivity(millions)  Seq scan(ms)            Parallel scan(ms)
                                        2 workers    4 workers    8 workers
   0.1                     11498.93      4821.40      3305.84      3291.90
   0.4                     10942.98      4967.46      3338.58      3374.00
   0.8                     11619.44      5189.61      3543.86      3534.40
   1.5                     12585.51      5718.07      4162.71      2994.90
   2.7                     14725.66      8346.96     10429.05      8049.11
   5.4                     18719.00     20212.33     21815.19     19026.99
   7.2                     21955.79     28570.74     28217.60     27042.27

 The average table row size is around 500 bytes and the query selection
 column width is around 36 bytes.
 When the query selectivity goes above roughly 10% of the total table
 records, the parallel scan performance drops off.

Thanks for doing this testing.  I think that is quite valuable.  I am
not too concerned about the fact that queries where more than 10% of
records are selected do not speed up.  Obviously, it would be nice to
improve that, but I think that can be left as an area for future
improvement.

One thing I noticed that is a bit dismaying is that we don't get a lot
of benefit from having more workers.  Look at the 0.1 data.  At 2
workers, if we scaled perfectly, we would be 3x faster (since the
master can do work too), but we are actually 2.4x faster.  Each
process is on the average 80% efficient.  That's respectable.  At 4
workers, we would be 5x faster with perfect scaling; here we are 3.5x
faster.   So the third and fourth worker were about 50% efficient.
Hmm, not as good.  But then going up to 8 workers bought us basically
nothing.
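
(Spelling that out: going from 2 workers to 4 raised the speedup from roughly
2.4x to roughly 3.5x, so the two extra processes together added only about 1.1
process-equivalents, i.e. roughly 55% efficiency each; and 8 workers at
3291.90 ms versus 3305.84 ms for 4 workers is essentially flat.)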

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Queries runs slow on GPU with PG-Strom

2015-07-22 Thread YANG

Hello,

I've performed some tests on pg_strom according to the wiki, but it seems that
queries run slower on GPU than on CPU. Can someone shed some light on what's
wrong with my settings? My setup was Quadro K620 + CUDA 7.0 (for Ubuntu 14.10)
+ Ubuntu 15.04, and the results were:

with pg_strom
=

explain SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;


  QUERY PLAN
---
 Aggregate  (cost=190993.70..190993.71 rows=1 width=0) (actual 
time=18792.236..18792.236 rows=1 loops=1)
   ->  Custom Scan (GpuPreAgg)  (cost=7933.07..184161.18 rows=86 width=108) 
(actual time=4249.656..18792.074 rows=77 loops=1)
 Bulkload: On (density: 100.00%)
 Reduction: NoGroup
 Device Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double 
precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < 
'10'::double precision)
 ->  Custom Scan (BulkScan) on t0  (cost=6933.07..182660.32 
rows=1060 width=0) (actual time=139.399..18499.246 rows=1000 loops=1)
 Planning time: 0.262 ms
 Execution time: 19268.650 ms
(8 rows)



explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;

QUERY PLAN
--
 HashAggregate  (cost=298541.48..298541.81 rows=26 width=12) (actual 
time=11311.568..11311.572 rows=26 loops=1)
   Group Key: t0.cat
   ->  Custom Scan (GpuPreAgg)  (cost=5178.82..250302.07 rows=1088 width=52) 
(actual time=3304.727..11310.021 rows=2307 loops=1)
 Bulkload: On (density: 100.00%)
 Reduction: Local + Global
 ->  Custom Scan (GpuJoin)  (cost=4178.82..248541.18 rows=1060 
width=12) (actual time=923.417..2661.113 rows=1000 loops=1)
   Bulkload: On (density: 100.00%)
   Depth 1: Logic: GpuHashJoin, HashKeys: (aid), JoinQual: (aid = 
aid), nrows_ratio: 1.
   ->  Custom Scan (BulkScan) on t0  (cost=0.00..242858.60 
rows=1060 width=16) (actual time=6.980..871.431 rows=1000 loops=1)
   ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=4) 
(actual time=0.204..7.309 rows=4 loops=1)
 Planning time: 47.834 ms
 Execution time: 11355.103 ms
(12 rows)


without pg_strom


test=# explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + 
(y-12.8)^2) < 10;
   
QUERY PLAN

 Aggregate  (cost=426193.03..426193.04 rows=1 width=0) (actual 
time=3880.379..3880.379 rows=1 loops=1)
   ->  Seq Scan on t0  (cost=0.00..417859.65 rows=353 width=0) (actual 
time=0.075..3859.200 rows=314063 loops=1)
 Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double 
precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < 
'10'::double precision)
 Rows Removed by Filter: 9685937
 Planning time: 0.411 ms
 Execution time: 3880.445 ms
(6 rows)

t=# explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;
  QUERY PLAN
--
 HashAggregate  (cost=431593.73..431594.05 rows=26 width=12) (actual 
time=4960.810..4960.812 rows=26 loops=1)
   Group Key: t0.cat
   ->  Hash Join  (cost=1234.00..381593.43 rows=1060 width=12) (actual 
time=20.859..3367.510 rows=1000 loops=1)
 Hash Cond: (t0.aid = t1.aid)
 ->  Seq Scan on t0  (cost=0.00..242858.60 rows=1060 width=16) 
(actual time=0.021..895.908 rows=1000 loops=1)
 ->  Hash  (cost=734.00..734.00 rows=4 width=4) (actual 
time=20.567..20.567 rows=4 loops=1)
   Buckets: 65536  Batches: 1  Memory Usage: 1919kB
   ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=4) 
(actual time=0.017..11.013 rows=4 loops=1)
 Planning time: 0.567 ms
 Execution time: 4961.029 ms
(10 rows)



Here is the details how I installed pg_strom,

1. download postgresql 9.5alpha1 and compile it with

,
| ./configure --prefix=/export/pg-9.5 --enable-debug --enable-cassert
| make -j8 all
| make install
`

2. install cuda-7.0 (ubuntu 14.10 package from nvidia website)

3. download and compile pg_strom with pg_config in /export/pg-9.5/bin

,
| make
| make install
`


4. create a db with 

Re: [HACKERS] A huge debt of gratitude - Michael Stonebraker

2015-07-22 Thread Robert Haas
On Tue, Jul 21, 2015 at 6:42 PM, Jolly Chen jo...@chenfamily.com wrote:
 You have probably heard that Mike Stonebraker recently won the Turing award.  
 A recording of his award lecture is available at:
 https://www.youtube.com/watch?v=BbGeKi6T6QI

 It is an entertaining talk overall. If you fast forward to about the 1:07 
 mark, he makes some comments about postgres.

 Here’s my rough transcription:

 The abstract data type system in postgres has been added to a lot of 
 relational database systems. It's kind of de facto table stakes for 
 relational databases these days, essentially intact.  That idea was really a 
 good one. It was mentioned in the citation for my Turing award winning.  
 However, serendipity played a huge role, which is, the biggest impact of 
 postgres by far came from two Berkeley students that I'll affectionately call 
 Grumpy and Sleepy.  They converted the academic postgres prototype from QUEL 
 to SQL in 1995. This was in parallel to the commercial activity. And then a 
 pick-up team of volunteers, none of whom have anything to do with me or 
 Berkeley, have been shepherding that open source system ever since 1995. The 
 system that you get off the web for postgres comes from this pick-up team.  
 It is open source at its best and I want to just mention that I have nothing 
 to do with that and that collection of folks we all owe a huge debt of 
 gratitude to, because they have robustize that code line and made it so it 
 really works.”

 Thank you all so much for your hard work over the last twenty years!!

Wow, thanks for reaching out.  Here is a quote from the current
version of src/test/regress/input/misc.source:

--
-- BTREE shutting out non-functional updates
--
-- the following two tests seem to take a long time on some
-- systems.   This non-func update stuff needs to be examined
-- more closely.   - jolly (2/22/96)
--

That comment might be obsolete, but we still have it, and a few other
references.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Queries runs slow on GPU with PG-Strom

2015-07-22 Thread Josh Berkus
On 07/22/2015 08:16 AM, YANG wrote:
 
 Hello,
 
 I've performed some tests on pg_strom according to the wiki, but it seems that
 queries run slower on GPU than on CPU. Can someone shed some light on what's
 wrong with my settings? My setup was Quadro K620 + CUDA 7.0 (for Ubuntu 14.10)
 + Ubuntu 15.04, and the results were:

I believe that pgStrom has its own mailing list.  KaiGai?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Volatility of pg_xact_commit_timestamp() and pg_last_committed_xact()

2015-07-22 Thread Robert Haas
On Thu, Jul 16, 2015 at 9:49 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Volatilities of pg_xact_commit_timestamp() and pg_last_committed_xact()
 are now STABLE. But ISTM that those functions can return different results
 even within a single statement. So we should change the volatilities of them
 to VOLATILE?

Sounds right to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] A huge debt of gratitude - Michael Stonebraker

2015-07-22 Thread Jolly Chen
Hey everyone,

You have probably heard that Mike Stonebraker recently won the Turing award.  A 
recording of his award lecture is available at:
https://www.youtube.com/watch?v=BbGeKi6T6QI

It is an entertaining talk overall. If you fast forward to about the 1:07 mark, 
he makes some comments about postgres.

Here’s my rough transcription:

The abstract data type system in postgres has been added to a lot of 
relational database systems. It's kind of de facto table stakes for relational 
databases these days, essentially intact.  That idea was really a good one. It 
was mentioned in the citation for my Turing award winning.  However, 
serendipity played a huge role, which is, the biggest impact of postgres by far 
came from two Berkeley students that I'll affectionately call Grumpy and 
Sleepy.  They converted the academic postgres prototype from QUEL to SQL in 
1995. This was in parallel to the commercial activity. And then a pick-up team 
of volunteers, none of whom have anything to do with me or Berkeley, have been 
shepherding that open source system ever since 1995. The system that you get 
off the web for postgres comes from this pick-up team.  It is open source at 
its best and I want to just mention that I have nothing to do with that and 
that collection of folks we all owe a huge debt of gratitude to, because they 
have robustize that code line and made it so it really works.”

Thank you all so much for your hard work over the last twenty years!!

Affectionately,

Grumpy



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alpha2/Beta1

2015-07-22 Thread Josh Berkus
All:

Sounds like the overwhelming consensus is Alpha2 then.  Will run with it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-22 Thread Robert Haas
On Tue, Jul 21, 2015 at 2:27 PM, Andres Freund and...@anarazel.de wrote:
 But I'm not going to complain too loudly if we don't do invalidation.

Not doing invalidation seems silly to me.  But I don't want to bend
Paul too far around the axle, either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench stats per script & other stuff

2015-07-22 Thread Josh Berkus
On 07/21/2015 10:25 PM, Fabien COELHO wrote:
 
 Hello Josh,
 
 Maybe -f file.sql:weight (yuk from my point of view, but it can be done
 easily).

 Maybe it's past time for pgbench to have a config file?
 
 That is an idea.  For simple usage, for backward compatibility and for
 people like me who like them, ISTM that options are fine too:-)
 
 Also this may mean adding a dependency to some YAML library, configure
 issues (I'm not sure whether pg currently uses YAML, and JSON is quite
 verbose), maybe conditionals around the feature to compile without the
 dependency, more documentation...
 
 I'm not sure all that is desirable just for weighting scripts.

Maybe not.

If so, I would vote for:

-f script1.bench:3 -f script2.bench:1

over:

-f script1.bench -w 3 -f script2.bench -w 1

Making command-line options order-dependent breaks a lot of system call
libraries in various languages, as well as being easy to mess up.

 Given that we want to define some per-workload options, the config file
 would probably need to be YAML or JSON, e.g.:

 [...]

 script1:
 file: script1.bench
 weight: 3
 script2:
 file: script2.bench
 weight: 1

 the above would execute a pgbench with 16 clients, 4 threads, script1
 three times as often as script2, and report stats at the script (rather
 than SQL statement) level.
 
 Yep. Probably numbering within field names should be avoided, so a list
 of records that could look like:

Oh, you misunderstand.  script1 and script2 are meant to be
user-supplied names which then get reported in things like response time
output.  They're labels. Better example:

deposit:
file: deposit.bench
weight: 3
withdrawal:
file: withdrawal.bench
weight: 3
reporting:
file: summary_report.bench
weight: 1

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Eliminating CREATE INDEX comparator TID tie-breaker overhead

2015-07-22 Thread Robert Haas
On Tue, Jul 21, 2015 at 4:06 PM, Peter Geoghegan p...@heroku.com wrote:
 Design considerations and consequences
 

Good write-up.

 I'm not concerned about synchronized scans breaking my assumption of a
 physical ordering of heaptuples being fed to tuplesort.c. I think that
 it is unlikely to ever be worth seriously considering this case.

Why not?

 I have a hard time imagining anything (beyond synchronous scans)
 breaking my assumption that index tuplesorts receive tuples in heap
 physical order. If anything was to break that in the future (e.g.
 parallelizing the heap scan for index builds), then IMV the onus
 should be on that new case to take appropriate precautions against
 breaking my assumption.

I'm very dubious about that.  There are lots of reasons why we might
want to read tuples out of order; for example, suppose we want a
parallel sequential scan to feed the sort.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin index vacuum versus transaction snapshots

2015-07-22 Thread Alvaro Herrera
Kevin Grittner wrote:
 If you run `make installcheck` against a cluster with
 default_transaction_isolation = 'repeatable read' you get one
 failure:

So there's some code that's specifically intended to handle this case:

    /*
     * If using transaction-snapshot mode, it would be possible for
     * another transaction to insert a tuple that's not visible to our
     * snapshot if we have already acquired one, when in
     * snapshot-isolation mode; therefore, disallow this from running
     * in such a transaction unless a snapshot hasn't been acquired
     * yet.
     *
     * This code is called by VACUUM and brin_summarize_new_values.
     * Have the error message mention the latter because VACUUM cannot
     * run in a transaction and thus cannot cause this issue.
     */
    if (IsolationUsesXactSnapshot() && FirstSnapshotSet)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                 errmsg("brin_summarize_new_values() cannot run in a transaction that has already obtained a snapshot")));

However, this comment is full of it because a snapshot is obtained even
for VACUUM.  So the fact that brin_summarize_new_values() is mentioned
as being in trouble is just a very minor issue: fixing that would just
be a matter of passing a flag down from the caller into
summarize_range() to indicate whether it's vacuum or the function.  The
real problem is that VACUUM should be allowed to run without error.
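
(For what it's worth, the minor message fix described above could look
roughly like the sketch below; the extra parameter is illustrative and the
rest of the function is elided:)

    static void
    summarize_range(IndexInfo *indexInfo, BrinBuildState *state,
                    Relation heapRel, BlockNumber heapBlk,
                    bool isvacuum)   /* illustrative flag passed by the caller */
    {
        ...
        if (IsolationUsesXactSnapshot() && FirstSnapshotSet)
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
                     errmsg("%s cannot run in a transaction that has already obtained a snapshot",
                            isvacuum ? "VACUUM" : "brin_summarize_new_values()")));
        ...
    }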

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Gsets: ROW expression semantic broken between 9.4 and 9.5

2015-07-22 Thread Jeevan Chalke
Hi

It looks like we have broken the ROW expression without explicit
ROW keyword in GROUP BY.
I mean, after Grouping sets merge, if we have (c1, c2) in group by,
we are treating it as ROW expression for grouping, but at the same
time we are allowing individual column in the target list.
However this was not true with PG9.4 where we error out saying
column c1 must appear in the GROUP BY clause...

But if I use explicit ROW keyword, like ROW(c1, c2), then on PG95
it error outs for individual column reference in select list.

Example may clear more:

ON PG 9.5 (after grouping sets implementation)

postgres=# create view gstest1(a,b,v)
postgres-#   as values (1,1,10),(1,1,11),(1,2,12),(1,2,13),(1,3,14),
postgres-# (2,3,15),
postgres-# (3,3,16),(3,4,17),
postgres-# (4,1,18),(4,1,19);
CREATE VIEW

postgres=#
postgres=# SELECT a, b, max(v) FROM gstest1 GROUP BY (a, b)
ORDER BY 1, 2, 3 DESC;
 a | b | max
---+---+-
 1 | 1 |  11
 1 | 2 |  13
 1 | 3 |  14
 2 | 3 |  15
 3 | 3 |  16
 3 | 4 |  17
 4 | 1 |  19
(7 rows)

postgres=# SELECT a, b, max(v) FROM gstest1 GROUP BY ROW(a, b)
ORDER BY 1, 2, 3 DESC;
ERROR:  column gstest1.a must appear in the GROUP BY clause or be used in
an aggregate function
LINE 1: SELECT a, b, max(v) FROM gstest1 GROUP BY ROW(a, b) ORDER BY...
   ^

In the above example, you can see that when we have only (a, b), it works
fine, but when we have ROW(a, b), it throws an error.
On PG 9.4 both cases fail. Here it is:

postgres=# SELECT a, b, max(v) FROM gstest1 GROUP BY (a, b)
ORDER BY 1, 2, 3 DESC;
ERROR:  column gstest1.a must appear in the GROUP BY clause or be used in
an aggregate function
LINE 1: SELECT a, b, max(v) FROM gstest1 GROUP BY (a, b) ORDER BY 1,...
   ^
postgres=# SELECT a, b, max(v) FROM gstest1 GROUP BY ROW(a, b)
ORDER BY 1, 2, 3 DESC;
ERROR:  column gstest1.a must appear in the GROUP BY clause or be used in
an aggregate function
LINE 1: SELECT a, b, max(v) FROM gstest1 GROUP BY ROW(a, b) ORDER BY...
   ^

Did we break ROW expression semantics in the grouping sets implementation?

Any idea why this is happening?

Thanks
-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] psql :: support for \ev viewname and \sv viewname

2015-07-22 Thread Dean Rasheed
On 3 July 2015 at 20:50, Tom Lane t...@sss.pgh.pa.us wrote:
 Petr Korobeinikov pkorobeini...@gmail.com writes:
 Fixed. Now both \ev and \sv number lines starting with 1. New version
 attached.

 Applied with a fair amount of mostly-cosmetic adjustment.

 As I've already noticed, pg_get_viewdef() does not support the full syntax
 of creating or replacing views.

 Oh?  If that were true, pg_dump wouldn't work on such views.  It is kind
 of a PITA for this purpose that it doesn't include the CREATE text for
 you, but we're surely not changing that behavior now.


This appears to be missing support for view options (WITH CHECK OPTION
and security_barrier), so editing a view with either of those options
will cause them to be stripped off.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql :: support for \ev viewname and \sv viewname

2015-07-22 Thread Tom Lane
Dean Rasheed dean.a.rash...@gmail.com writes:
 On 3 July 2015 at 20:50, Tom Lane t...@sss.pgh.pa.us wrote:
 Oh?  If that were true, pg_dump wouldn't work on such views.  It is kind
 of a PITA for this purpose that it doesn't include the CREATE text for
 you, but we're surely not changing that behavior now.

 This appears to be missing support for view options (WITH CHECK OPTION
 and security_barrier), so editing a view with either of those options
 will cause them to be stripped off.

Hm.  Why exactly were those not implemented as part of pg_get_viewdef?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench stats per script & other stuff

2015-07-22 Thread Fabien COELHO



If so, I would vote for:
  -f script1.bench:3 -f script2.bench:1
over:
  -f script1.bench -w 3 -f script2.bench -w 1


Ok, I'll take that into consideration. Any other opinion out there? The 
current v3 version is:


  -w 3 -f script1.bench -w 1 -f script2.bench

With provision to generate errors if a -w is set but not used,
in two cases:

 - in the middle ... -w 4 no script option... -w 1 ...
 - in the end ... -w 1 no script option...

I can provide -f x:weight easily, but this means that there will be no way
to associate a weight with internal scripts. Not orthogonal, not very
elegant, but no big deal.


Oh, you misunderstand.  script1 and script2 are meant to be 
user-supplied names which then get reported in things like response time 
output.  They're labels.


Ok, that is much better. This means that users should not choose label names
which may interact with other commands, so maybe a list would have been
nice as well. Anyway, I do not think it is the way to go just for this
feature.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Eliminating CREATE INDEX comparator TID tie-breaker overhead

2015-07-22 Thread Peter Geoghegan
On Wed, Jul 22, 2015 at 11:03 AM, Robert Haas robertmh...@gmail.com wrote:
 I'm not concerned about synchronized scans breaking my assumption of a
 physical ordering of heaptuples being fed to tuplesort.c. I think that
 it is unlikely to ever be worth seriously considering this case.

 Why not?

The case for doing this tie-breaker is theoretical. The original
rationale for adding it is now obsolete. On the other hand, the cost
of doing it is most certainly not theoretical.

If it's absolutely necessary to preserve a guarantee that equal tuples
are in TID order (rather than at most two staggered sequential
groupings per set of equal tuples -- possible with synchronized
scans), then I suggest we detect synchronized scans and disable this
optimization. They're not especially common, so it would still be
worthwhile, in my estimation.

 I have a hard time imagining anything (beyond synchronous scans)
 breaking my assumption that index tuplesorts receive tuples in heap
 physical order. If anything was to break that in the future (e.g.
 parallelizing the heap scan for index builds), then IMV the onus
 should be on that new case to take appropriate precautions against
 breaking my assumption.

 I'm very dubious about that.  There are lots of reasons why we might
 want to read tuples out of order; for example, suppose we want a
 parallel sequential scan to feed the sort.

I agree that there are many reasons to want to do that. If we ever get
parallel sorts, then having a bunch of memtuple arrays, each fed by a
worker participating in a parallel scan makes sense. Those runs could
still be sorted in physical order. Only the merge phase would have to
do pointer chasing to tie-break on the real TID, and maybe not even
then (because run number can also be a proxy for physical order when
merging, assuming some new parallel internal sort algorithm).
Actually, for tape sort/replacement selection sort, the initial heap
build (where run number 0 is assigned currently) could probably reuse
this trick. I just haven't done that because it would be significantly
more invasive and less helpful.

If it's just a matter of wanting to parallelize the heap scan for its
own sake, then I don't think that's likely to be a terribly effective
optimization for B-Tree index builds; most of the cost is always going
to be in the sort proper, which I'm targeting here. In any case, I
think that the very common case where an int4 PK index is built using
quicksort should not have to suffer to avoid a minor inconveniencing
of (say) parallel sorts.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] security labels on databases are bad for dump & restore

2015-07-22 Thread Adam Brightwell
 I don't think there's any line near pg_dumpall.  That tool seems to
 have grown out of desperation without much actual design.  I think it
 makes more sense to plan around that's the best pg_dump behavior for the
 various use cases.

Ok.

 I like Noah's proposal of having pg_dump --create reproduce all
 database-level state.

Should it be enabled by default?  If so, then wouldn't it make more
sense to call it --no-create and do the opposite?  So, --no-create
would exclude rather than include database-level information?  Would
enabling it by default cause issues with the current expected use of
the tool by end users?

How would this handle related global objects? It seems like this part
could get a little tricky.

Taking it one step further, would a --all option that dumps all
databases make sense as well?  Of course I know that's probably a
considerable undertaking and certainly beyond the current scope.
Though, I thought I'd throw it out there.

Also, I think this would potentially conflict with what Fabrízio is
doing with CURRENT_DATABASE on COMMENT, though I think it might be a
preferable solution.

https://commitfest.postgresql.org/5/229/

-Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-22 Thread Andres Freund
On 2015-07-22 14:55:15 -0400, Tom Lane wrote:
 Just to be clear here: the case we are concerned about is, given that
 we have determined that function X is or is not a member of one of the
 extensions marked shippable for a given connection, is it likely that
 that status will change (while the function continues to exist with
 the same OID) during the lifespan of the connection?  If it did change,
 how critical would it be for us to honor the new shippability criterion
 on that connection?  My answer to both is not very.  So I'm not excited
 about expending lots of code or cycles to check for such changes.

It doesn't seem that unlikely that somebody does an ALTER SERVER OPTIONS
SET .. to add an extension to be shippable while connections are already
using the fdw. It'll be confusing if some clients are fast and some
others are really slow.

I think we should at least add invalidations for changing server
options. I think adding support for extension upgrades wouldn't hurt
either. Not particularly excited about ALTER EXTENSION ADD, but we could
add it easily enough.
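
A sketch of the sort of hook that would cover the ALTER SERVER case (the
callback and cache names are illustrative, not existing postgres_fdw code):

    /* throw away the per-connection shippability cache on any
     * pg_foreign_server change */
    static void
    InvalidateShippableCacheCallback(Datum arg, int cacheid, uint32 hashvalue)
    {
        if (ShippableCacheHash != NULL)
            hash_destroy(ShippableCacheHash);
        ShippableCacheHash = NULL;
    }

    /* registered once, at cache-initialization time */
    CacheRegisterSyscacheCallback(FOREIGNSERVEROID,
                                  InvalidateShippableCacheCallback,
                                  (Datum) 0);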

 If we were trying to cache things across more than a connection lifespan,
 the answer might be different.

I think connection poolers defeat that logic more widely than we
sometimes assume.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench stats per script & other stuff

2015-07-22 Thread Andres Freund
On 2015-07-22 10:54:14 -0700, Josh Berkus wrote:
 Making command-line options order-dependant breaks a lot of system call
 libraries in various languages, as well as being easy to mess up.

What?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jul 21, 2015 at 2:27 PM, Andres Freund and...@anarazel.de wrote:
 But I'm not going to complain too loudly if we don't do invalidation.

 Not doing invalidation seems silly to me.  But I don't want to bend
 Paul too far around the axle, either.

Just to be clear here: the case we are concerned about is, given that
we have determined that function X is or is not a member of one of the
extensions marked shippable for a given connection, is it likely that
that status will change (while the function continues to exist with
the same OID) during the lifespan of the connection?  If it did change,
how critical would it be for us to honor the new shippability criterion
on that connection?  My answer to both is not very.  So I'm not excited
about expending lots of code or cycles to check for such changes.

If we were trying to cache things across more than a connection lifespan,
the answer might be different.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-22 Thread Paul Ramsey
On July 22, 2015 at 12:15:14 PM, Andres Freund (and...@anarazel.de) wrote:
It doesn't seem that unlikely that somebody does an ALTER SERVER OPTIONS 
SET .. to add an extension to be shippable while connections are already 
using the fdw. It'll be confusing if some clients are fast and some 
others are really slow. 
This seems more likely than anyone mucking around with extension stuff (adding 
new functions while working with the FDW in production at the same time?) or 
adding/dropping whole extensions (you'll have more problems than a stale cache; 
whole columns will disappear if you DROP EXTENSION postgis CASCADE), and it has 
the added benefit of not needing me to muck into core stuff for my silly 
feature.

I’ll have a look at doing invalidation for the case of changes to the FDW 
wrappers and servers.

P. 



[HACKERS] MultiXact member wraparound protections are now enabled

2015-07-22 Thread Peter Eisentraut
Why is this message logged by default in a fresh installation?  The
technicality of that message doesn't seem to match the kinds of messages
that we normally print at startup.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] security labels on databases are bad for dump & restore

2015-07-22 Thread Fabrízio de Royes Mello
On Wed, Jul 22, 2015 at 4:42 PM, Adam Brightwell 
adam.brightw...@crunchydatasolutions.com wrote:

 [...]

 Also, I think this would potentially conflict with what Fabrízio is
 doing with CURRENT_DATABASE on COMMENT, though, I think it might be a
 preferable solution.

 https://commitfest.postgresql.org/5/229/


Unfortunately this code is a bit weird, and it will be better to move it to
the next commitfest (I haven't had time to improve it yet), so we can join
efforts, implement all the ideas, and make the reviewers' lives easier with a
more consistent patch.

Seems reasonable?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] Minor code improvements to create_foreignscan_plan/ExecInitForeignScan

2015-07-22 Thread Etsuro Fujita

On 2015/07/10 21:59, David Rowley wrote:

On 10 July 2015 at 21:40, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
mailto:fujita.ets...@lab.ntt.co.jp wrote:

To save cycles, I modified create_foreignscan_plan so that it detects
whether any system columns are requested if scanning a base relation.
Also, I revised other code there a little bit.

For ExecInitForeignScan, I simplified the code there to determine the
scan tuple type, which seems to me complex beyond necessity.  Maybe
that might be nitpicking, though.


For the latter, I misunderstood that the latest foreign-join pushdown 
patch allows fdw_scan_tlist to be set to a targetlist even for simple 
foreign table scans.  Sorry for that, but I have a concern about that. 
I'd like to mention it in a new thread later.



I just glanced at this and noticed that the method for determining if
there are any system columns could be made a bit nicer.

/* Now, are any system columns requested from rel? */
scan_plan->fsSystemCol = false;
for (i = FirstLowInvalidHeapAttributeNumber + 1; i < 0; i++)
{
    if (bms_is_member(i - FirstLowInvalidHeapAttributeNumber, attrs_used))
    {
        scan_plan->fsSystemCol = true;
        break;
    }
}

I think this could just be written as:

/* Now, are any system columns requested from rel? */
if (!bms_is_empty(attrs_used) &&
    bms_next_member(attrs_used, -1) < -FirstLowInvalidHeapAttributeNumber)
    scan_plan->fsSystemCol = true;
else
    scan_plan->fsSystemCol = false;

I know you didn't change this, but just thought I'd mention it while
there's an opportunity to fix it.


Good catch!

Will update the patch (and drop the ExecInitForeignScan part from the 
patch).


Sorry for the delay.

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-22 Thread Andres Freund
On 2015-07-21 21:37:41 +0200, Martijn van Oosterhout wrote:
 On Tue, Jul 21, 2015 at 02:24:47PM -0400, Todd A. Cook wrote:
  Hi,
  
  This thread seemed to trail off without a resolution.  Was anything done?
 
 Not that I can tell.

Heikki and I had some in-person conversation about it at a conference,
but we didn't really find anything we both liked...

 I was the original poster of this thread. We've
 worked around the issue by placing a CHECKPOINT command at the end of
 the migration script.  For us it's not a performance issue, more a
 correctness one, tables were empty when they shouldn't have been.

If it's just correctness, you could just use wal_level = archive.

 I'm hoping a fix will appear in the 9.5 release, since we're intending
 to release with that version.  A forced checkpoint every now and them
 probably won't be a serious problem though.

We're imo going to have to fix this in the back branches.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?

2015-07-22 Thread Etsuro Fujita
Hi,

The latest foreign-join pushdown patch allows fdw_scan_tlist to be set
to a targetlist even for simple foreign table scans.  However, since I
think we assume that the test tuple of a foreign table for EPQ testing,
whether it is copied from the whole-row var or returned by the
RefetchForeignRow routine, has the rowtype declared for the foreign
table, ISTM that EPQ testing doesn't work properly in such a case,
because the targetlist and quals are adjusted to reference
fdw_scan_tlist.  Maybe I'm missing something, though.

I don't understand custom scans/joins exactly, but I have a similar
concern for the simple-custom-scan case too.

Best regards,
Etsuro Fujita


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Solaris testers wanted for strxfrm() behavior

2015-07-22 Thread Bjorn Munch
On 22/07 02.29, Noah Misch wrote:
  I ran this program on Solaris 9 U5 (September 2006) on Sparc and got:
 
 I appreciate your testing.  A few sources give December 2003 as the month for
 Solaris 9 Update 5; would you verify the vintage you used?

Sorry I was mis-parsing the /etc/release. 9/05 is the month. I should
know that. :-/ This was Solaris 9 U9 from September 2005.

From the output it looks like the bug was present here?

I did a quick search for bugs in libc with strxfrm in the title and
got a few hits but none that seemed to be this one.

- Bjorn


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Simon Riggs
On 21 July 2015 at 21:24, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Jun 30, 2015 at 4:32 AM, Simon Riggs si...@2ndquadrant.com
 wrote:
  Yes, I suggest just a single column on pg_stat_activity called
 pct_complete
 
  trace_completion_interval = 5s (default)
 
  Every interval, we report the current % complete for any operation that
  supports it. We just show NULL if the current operation has not reported
  anything or never will.

 I am deeply skeptical about the usefulness of a progress-reporting
 system that can only report one number.  I think that, in many cases,
 it won't be possible to compute an accurate completion percentage, and
 even if it is, people may want more data than that for various reasons
 anyway.


The goal here was to have a common metric for all tasks. Multiple numbers
are OK for me, but not extended detail (yet!)

As I said later:

Simon Riggs si...@2ndquadrant.com wrote:
 I'm interested in seeing a report that relates to actual progress made.
 Predicted total work required is also interesting, but is much less
trustworthy figure.

I'm aware of the difficulties for VACUUM in particular and agree with your
scenario/details.

That does not deter me from wishing to see high level information, even it
varies or is inaccurate. The arrival time on my Sat Nav is still useful,
even if it changes because of traffic jams that develop while my journey is
in progress. If the value bothers me, I look at the detail. So both summary
and detail information are useful, but summary is more important even
though it is less accurate.


Instead of getting told we're X% done (according to some arcane
 formula), it's quite reasonable to think that people will want to get
 a bunch of values, e.g.:

 1. For what percentage of heap pages have we completed phase one (HOT
 prune + mark all visible if appropriate + freeze + remember dead
 tuples)?
 2. For what percentage of heap pages have we completed phase two (mark
 line pointers unused)?
 3. What percentage of maintenance_work_mem are we currently using to
 remember tuples?

 For a query, the information we want back is likely to be even more
 complicated; e.g. EXPLAIN output with row counts and perhaps timings
 to date for each plan node.  We can insist that all status reporting
 get boiled down to one number, but I suspect we would be better off
 asking ourselves how we could let commands return a richer set of
 data.


I agree that it is desirable to have a more detailed breakdown of what is
happening. As soon as we do that we hit the need for very action-specific
information reporting, which renders the topic much harder and much more
specific.

For me, the user workflow looks like these

Worried: Task X is taking ages? When is it expected to finish?
Ops: 13:50
sometime later, about 14:00
Worried: Task X is still running? But I thought its ETA was 13:50?
Ops: Now says 14:30
Worried: Is it stuck, or is it making progress?
Ops: Looks like its making progress
Worried: Can we have a look at it and find out what its doing?

Worried: When will Task Y finish?
Ops: Monday at 11am
Worried: Bad news! We should cancel it on Sunday evening.

The point is that nobody looks at the detailed info until we have looked at
the summary. So the summary of progress/completion time is important, even
if it is possibly wrong. The detail is also useful. I think we should have
both, but I'd like to see the summary info first, because it is the most
useful, best leading indicator of problems.

In terms of VACUUM specifically: VACUUM should be able to assess beforehand
whether it will scan the indexes, or it can just assume that it will need
to scan the indexes. Perhaps VACUUM can pre-scan the VM to decide how big a
task it has before it starts.
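
A rough sketch of that pre-scan, assuming the existing VM counting helper
(exact signatures vary across versions):

    /* size up the job from the visibility map before starting */
    BlockNumber total_pages   = RelationGetNumberOfBlocks(onerel);
    BlockNumber allvisible    = visibilitymap_count(onerel);  /* all-visible pages */
    BlockNumber pages_to_scan = total_pages - allvisible;     /* rough bound on the heap-scan work */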

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?

2015-07-22 Thread Kouhei Kaigai
Fujita-san,

Sorry for my late response.

 The latest foreign-join pushdown patch allows fdw_scan_tlist to be set
 to a targetlist even for simple foreign table scans.  However, since I
 think we assume that the test tuple of a foreign table for an EPQ
 testing, whether it may be copied from the whole-row var or returned by
 the RefetchForeignRow routine, has the rowtype declared for the foreign
 table, ISTM that EPQ testing doesn't work properly in such a case since
 that the targetlist and qual are adjusted to reference fdw_scan_tlist in
 such a case.  Maybe I'm missing something though.

Let me confirm step-by-step.
For EPQ testing, the whole-row reference or RefetchForeignRow pulls a record
with a rowtype compatible with the base foreign table. Then this record
is stored in es_epqTuple[], indexed by the base relation.

According to the previous discussion, I expect these tuples to be re-checked
by a built-in execution plan that is equivalent to the sub-plan entirely
pushed out to the remote side.
Do we share the same assumption?

If so, the next step is to enhance ExecScanFetch() to run the alternative
built-in plans against the es_epqTuple[] records when scanrelid == 0.
In that case, the expression nodes adjusted to fdw_scan_tlist are never
called, so it should not lead to any problems...?

 I don't understand custom scans/joins exactly, but I have a similar
 concern for the simple-custom-scan case too.

In the case of a custom scan/join, it fetches a record identified by ctid
using heap_fetch(), and saves it to es_epqTuple[].
Then EvalPlanQual() walks down the plan tree. Once it reaches a custom-join
node (scanrelid == 0), it calls the equivalent alternative if possible, or
else calls ExecProcNode() on the underlying nodes and re-constructs the
result according to the custom_scan_tlist definition.

It does not look problematic to me.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging problem in 9.4.3?

2015-07-22 Thread Simon Riggs
On 10 July 2015 at 00:06, Tom Lane t...@sss.pgh.pa.us wrote:

 Andres Freund and...@anarazel.de writes:
  On 2015-07-06 11:49:54 -0400, Tom Lane wrote:
  Rather than reverting cab9a0656c36739f, which would re-introduce a
  different performance problem, perhaps we could have COPY create a new
  relfilenode when it does this.  That should be safe if the table was
  previously empty.

  I'm not convinced that cab9a0656c36739f needs to survive in that
  form. To me only allowing one COPY to benefit from the wal_level =
  minimal optimization has a significantly higher cost than
  cab9a0656c36739f.

 What evidence have you got to base that value judgement on?

 cab9a0656c36739f was based on an actual user complaint, so we have good
 evidence that there are people out there who care about the cost of
 truncating a table many times in one transaction.  On the other hand,
 I know of no evidence that anyone's depending on multiple sequential
 COPYs, nor intermixed COPY and INSERT, to be fast.  The original argument
 for having this COPY optimization at all was to make restoring pg_dump
 scripts in a single transaction fast; and that use-case doesn't care
 about anything but a single COPY into a virgin table.


We have to backpatch this fix, so it must be both simple and effective.

Heikki's suggestions may be best, maybe not, but they don't seem
backpatchable.

Tom's suggestion looks good. So does Andres' suggestion. I have coded both.

And what reason is there to think that this would fix all the problems?


I don't think either suggested fix could be claimed to be a great solution,
since there is little principle here, only heuristic. Heikki's solution
would be the only safe way, but is not backpatchable.

Forcing SKIP_FSM to always extend has no negative side effects in other
code paths, AFAICS.

Patches attached. Martijn, please verify.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


fix_wal_logging_copy_truncate.v1.patch
Description: Binary data


fix_copy_zero_blocks.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Asynchronous execution on FDW

2015-07-22 Thread Kouhei Kaigai
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kyotaro HORIGUCHI
 Sent: Wednesday, July 22, 2015 4:10 PM
 To: robertmh...@gmail.com
 Cc: hlinn...@iki.fi; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Asynchronous execution on FDW
 
 Hello, thank you for the comment.
 
 At Fri, 17 Jul 2015 14:34:53 -0400, Robert Haas robertmh...@gmail.com wrote
 in ca+tgmoaijk1svzw_gkfu+zssxcijkfelqu2aomvuphpsfw4...@mail.gmail.com
  On Fri, Jul 3, 2015 at 4:41 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
   At a quick glance, I think this has all the same problems as starting the
   execution at ExecInit phase. The correct way to do this is to kick off the
   queries in the first IterateForeignScan() call. You said that ExecProc
   phase does not fit - why not?
 
  What exactly are those problems?
 
  I can think of these:
 
  1. If the scan is parametrized, we probably can't do it for lack of
  knowledge of what they will be.  This seems easy; just don't do it in
  that case.
 
 We can put an early kick to foreign scans only for the first shot
 if we do it outside (before) ExecProc phase.
 
 Nestloop
 - SeqScan
 - Append
- Foreign (Index) Scan
- Foreign (Index) Scan
..
 
 This plan premises precise (even to some extent) estimate for
 remote query but async execution within ExecProc phase would be
 in effect for this case.
 
 
  2. It's possible that we're down inside some subtree of the plan that
  won't actually get executed.  This is trickier.
 
 As for current postgres_fdw, it is done simply abandoning queued
 result then close the cursor.
 
  Consider this:
 
  Append
  - Foreign Scan
  - Foreign Scan
  - Foreign Scan
  repeat 17 more times
 
  If we don't start each foreign scan until the first tuple is fetched,
  we will not get any benefit here, because we won't fetch the first
  tuple from query #2 until we finish reading the results of query #1.
  If the result of the Append node will be needed in its entirety, we
  really, really want to launch of those queries as early as possible.
  OTOH, if there's a Limit node with a small limit on top of the Append
  node, that could be quite wasteful.
 
 It's the nature of speculative execution, but the Limit will be
 pushed down onto every Foreign Scans near future.
 
  We could decide not to care: after all, if our limit is
  satisfied, we can just bang the remote connections shut, and if
  they wasted some CPU, well, tough luck for them.  But it would
  be nice to be smarter.  I'm not sure how, though.
 
 Appropriate fetch size will cap the harm and the case will be
 handled as I mentioned above as for postgres_fdw.

Horiguchi-san,

Let me ask an elemental question.

If we have ParallelAppend node that kicks a background worker process for
each underlying child node in parallel, does ForeignScan need to do something
special?

Expected waste of CPU or I/O is common problem to be solved, however, it does
not need to add a special case handling to ForeignScan, I think.
How about your opinion?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Typo in comment in setrefs.c

2015-07-22 Thread Etsuro Fujita

On 2015/07/21 1:38, Alvaro Herrera wrote:

Etsuro Fujita wrote:

I ran into a typo in a comment in setrefs.c.  Patch attached.


Fixed by Heikki in 7845db2aa.


Thank you for letting me know about that, Alvaro!  And thanks Heikki for 
picking this up!


Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use pg_rewind when target timeline was switched

2015-07-22 Thread Alexander Korotkov
On Wed, Jul 22, 2015 at 8:48 AM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Mon, Jul 20, 2015 at 9:18 PM, Alexander Korotkov
 a.korot...@postgrespro.ru wrote:
  attached patch allows pg_rewind to work when target timeline was
 switched.
  Actually, this patch fixes TODO from pg_rewind comments.
 
/*
 * Trace the history backwards, until we hit the target timeline.
 *
 * TODO: This assumes that there are no timeline switches on the target
 * cluster after the fork.
 */
 
  This patch allows pg_rewind to handle data directory synchronization is
 much
  more general way. For instance, user can return promoted standby to old
  master.

 Yes. That would be useful.

  In this patch target timeline history is exposed as global variable.
 Index
  in target timeline history is used in function interfaces instead of
  specifying TLI directly. Thus, SimpleXLogPageRead() can easily start
 reading
  XLOGs from next timeline when current timeline ends.

 The implementation looks rather neat by having a first look at it, but
 the attached patch fails to compile:
 pg_rewind.c:488:4: error: use of undeclared identifier 'targetEntry'
 targetEntry = targetHistory[i];
 ^
 Nice to see as well that this has been added to the CF of September.


Uh, sorry. Fixed version is attached.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pg-rewind-target-switch-2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] polymorphic types - enforce casting to most common type automatically

2015-07-22 Thread Heikki Linnakangas

On 07/11/2015 12:19 AM, Pavel Stehule wrote:

2015-07-10 18:43 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:


An example of what would presumably happen if we adopted this sort of rule
(I've not checked whether the patch as written does this, but it would
logically follow) is that appending a float to an integer array would
cause the whole array to be silently promoted to float, with attendant
possible loss of precision for existing array elements.


it is based on select_common_type() - so it uses only the available implicit
casts.


Without patch:

postgres=# select pg_typeof(array_append('{1,2,3}'::int[], 1.2::float));
ERROR:  function array_append(integer[], double precision) does not exist

With patch:

postgres=# select pg_typeof(array_append('{1,2,3}'::int[], 1.2::float));
 pg_typeof

 double precision[]
(1 row)


Yeah, I agree with Tom that we don't want that change in behaviour. I'll 
mark this as rejected.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Solaris testers wanted for strxfrm() behavior

2015-07-22 Thread Noah Misch
On Tue, Jul 21, 2015 at 04:45:21PM +0200, Bjorn Munch wrote:
 On 27/06 12.51, Noah Misch wrote:
  
  PostgreSQL 9.5 adds a strxfrm() call in bttext_abbrev_convert(), which does
  not account for the Solaris bug.  I wish to determine whether that bug is
  still relevant today.  If you have access to Solaris with the 
  is_IS.ISO8859-1
  locale installed (or root access to install it), please compile and run the
  attached test program on each Solaris version you have available.  Reply 
  here
  with the program's output.  I especially need a report from Solaris 10, but
  reports from older and newer versions are valuable.  Thanks.
 
 I ran this program on Solaris 9 U5 (September 2006) on Sparc and got:

I appreciate your testing.  A few sources give December 2003 as the month for
Solaris 9 Update 5; would you verify the vintage you used?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-22 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 21 Jul 2015 14:28:25 +0300, Ildus Kurbangaliev 
i.kurbangal...@postgrespro.ru wrote in 55ae2cd9.4050...@postgrespro.ru
 On 07/21/2015 01:18 PM, Andres Freund wrote:
  On 2015-07-21 13:11:36 +0300, Ildus Kurbangaliev wrote:
  /*
 * Top-level transactions are identified by VirtualTransactionIDs
 * comprising
  diff --git a/src/include/storage/lwlock.h
  b/src/include/storage/lwlock.h
  index cff3b99..55b0687 100644
  --- a/src/include/storage/lwlock.h
  +++ b/src/include/storage/lwlock.h
  @@ -58,6 +58,9 @@ typedef struct LWLock
#ifdef LOCK_DEBUG
 struct PGPROC *owner;   /* last exlusive owner of the lock */
#endif
  +
  + /* LWLock group, initialized as -1, calculated in first acquire */
  +  int group;
} LWLock;
  I'd very much like to avoid increasing the size of struct LWLock. We
  have a lot of those and I'd still like to inline them with the buffer
  descriptors. Why do we need a separate group and can't reuse the
  tranche?  That might require creating a few more tranches, but ...?
 
  Andres
 Do you mean moving LWLocks defined by offsets and with dynamic sizes
 to separate tranches?

I think it is too much for the purpose. Only two new tranches and
maybe one or some new members (maybe representing the group) of
tranches will do, I suppose.

 It sounds like a good option, but it will require refactoring of
 the current tranches. In the current implementation
 I tried not to change current things.

Now one of the most controversial points of this patch is the
details of the implementation, which largely affects performance drag,
maybe.


From the viewpoint of performance, I have some comments on the feature:

 - LWLockReportStat runs a linear search loop which I suppose
   should be avoided even if the loop count is rather small for
   LWLocks, as Fujii-san said upthread or anywhere.

 - Currently pg_stat_activity view is the only API, which would
   be a bit heavy for sampling use. It'd be appreciated to have a
   far lighter means to know the same information.

 A simple solution would be using uint8 for the tranche (because we have
 only 3 of them now, and it will take much time to get to 255) and uint8
 for the group. In this case the size will not change.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alpha2/Beta1

2015-07-22 Thread Simon Riggs
On 21 July 2015 at 22:01, Josh Berkus j...@agliodbs.com wrote:


 My question for Hackers is: should this be Alpha2 or Beta 1?


Alpha2

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


[HACKERS] We need to support ForeignRecheck for late row locking, don't we?

2015-07-22 Thread Etsuro Fujita
While working on the issue Foreign join pushdown vs EvalPlanQual, I
happened to notice odd behaviors of late row locking in FDWs.  An
example will be shown below using the attached postgres_fdw patch, which
is basically the same as [1], but I changed its isolation level to READ
COMMITTED and modified it so that the output parameter updated of
RefetchForeignRow is updated accordingly.

[Create an environment]

mydatabase=# create table mytable (a int);
mydatabase=# insert into mytable values (1);
postgres=# create extension postgres_fdw;
postgres=# create server myserver foreign data wrapper postgres_fdw
options (dbname 'mydatabase');
postgres=# create user mapping for current_user server myserver;
postgres=# create foreign table ftable (a int) server myserver options
(table_name 'mytable');

[Run concurrent transactions that behave oddly]

mydatabase=# begin;
mydatabase=# update mytable set a = a + 1;
postgres=# select * from ftable where a = 1 for update;
mydatabase=# commit;
(After the commit, the following result will be shown in the terminal
for the postgres database.  Note that the result doesn't satisfy a = 1.)
 a
---
 2
(1 row)

I think the reason for that is that we don't check pushed-down quals
during an EPQ recheck even if what was fetched by RefetchForeignRow was
an updated version of the tuple rather than the same version previously
obtained.  So, to fix this, I'd like to propose that pushed-down quals
be checked in ForeignRecheck.

Comments welcome!

Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/16016.1431455...@sss.pgh.pa.us
*** a/contrib/postgres_fdw/connection.c
--- b/contrib/postgres_fdw/connection.c
***
*** 384,390  begin_remote_xact(ConnCacheEntry *entry)
  		if (IsolationIsSerializable())
  			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
  		else
! 			sql = "START TRANSACTION ISOLATION LEVEL REPEATABLE READ";
  		do_sql_command(entry->conn, sql);
  		entry->xact_depth = 1;
  	}
--- 384,390 ----
  		if (IsolationIsSerializable())
  			sql = "START TRANSACTION ISOLATION LEVEL SERIALIZABLE";
  		else
! 			sql = "START TRANSACTION ISOLATION LEVEL READ COMMITTED";
  		do_sql_command(entry->conn, sql);
  		entry->xact_depth = 1;
  	}
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 89,94  typedef struct PgFdwRelationInfo
--- 89,96 
   *
   * 1) SELECT statement text to be sent to the remote server
   * 2) Integer list of attribute numbers retrieved by the SELECT
+  * 3) SELECT statement text to be sent to the remote server
+  * 4) Integer list of attribute numbers retrieved by the SELECT
   *
   * These items are indexed with the enum FdwScanPrivateIndex, so an item
   * can be fetched with list_nth().  For example, to get the SELECT statement:
***
*** 99,105  enum FdwScanPrivateIndex
  	/* SQL statement to execute remotely (as a String node) */
  	FdwScanPrivateSelectSql,
  	/* Integer list of attribute numbers retrieved by the SELECT */
! 	FdwScanPrivateRetrievedAttrs
  };
  
  /*
--- 101,111 
  	/* SQL statement to execute remotely (as a String node) */
  	FdwScanPrivateSelectSql,
  	/* Integer list of attribute numbers retrieved by the SELECT */
! 	FdwScanPrivateRetrievedAttrs,
! 	/* SQL statement to execute remotely (as a String node) */
! 	FdwScanPrivateSelectSql2,
! 	/* Integer list of attribute numbers retrieved by SELECT */
! 	FdwScanPrivateRetrievedAttrs2
  };
  
  /*
***
*** 187,192  typedef struct PgFdwModifyState
--- 193,222 
  } PgFdwModifyState;
  
  /*
+  * Execution state for fetching/locking foreign rows.
+  */
+ typedef struct PgFdwFetchState
+ {
+ 	Relation	rel;			/* relcache entry for the foreign table */
+ 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
+ 
+ 	/* for remote query execution */
+ 	PGconn	   *conn;			/* connection for the fetch */
+ 	char	   *p_name;			/* name of prepared statement, if created */
+ 
+ 	/* extracted fdw_private data */
+ 	char	   *query;			/* text of SELECT command */
+ 	List	   *retrieved_attrs;	/* attr numbers retrieved by SELECT */
+ 
+ 	/* info about parameters for prepared statement */
+ 	int			p_nums;			/* number of parameters to transmit */
+ 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
+ 
+ 	/* working memory context */
+ 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+ } PgFdwFetchState;
+ 
+ /*
   * Workspace for analyzing a foreign table.
   */
  typedef struct PgFdwAnalyzeState
***
*** 277,282  static TupleTableSlot *postgresExecForeignDelete(EState *estate,
--- 307,318 
  static void postgresEndForeignModify(EState *estate,
  		 ResultRelInfo *resultRelInfo);
  static int	postgresIsForeignRelUpdatable(Relation rel);
+ static RowMarkType postgresGetForeignRowMarkType(RangeTblEntry *rte,
+ 			  LockClauseStrength strength);
+ static HeapTuple postgresRefetchForeignRow(EState *estate,
+ 		  

Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?

2015-07-22 Thread Etsuro Fujita
Hi KaiGai-san,

On 2015/07/22 16:44, Kouhei Kaigai wrote:
 The latest foreign-join pushdown patch allows fdw_scan_tlist to be set
 to a targetlist even for simple foreign table scans.  However, since I
 think we assume that the test tuple of a foreign table for an EPQ
 testing, whether it may be copied from the whole-row var or returned by
 the RefetchForeignRow routine, has the rowtype declared for the foreign
 table, ISTM that EPQ testing doesn't work properly in such a case since
 that the targetlist and qual are adjusted to reference fdw_scan_tlist in
 such a case.  Maybe I'm missing something though.

 Let me confirm step-by-step.
 For EPQ testing, whole-row-reference or RefetchForeignRow pulls a record
 with row-type compatible to the base foreign table. Then, this record
 is stored in the es_epqTuple[] indexed by the base relation.
 
 According to the previous discussion, I expect these tuples are re-checked
 by built-in execution plan, but equivalent to the sub-plan entirely pushed
 out to the remote side.
 Do we see the same assumption?

No, what I'm concerned about is the case when scanrelid > 0.

 If so, next step is enhancement of ExecScanFetch() to run the alternative
 built-in plans towards each es_epqTuple[] records, if given scanrelid==0.
 In this case, expression nodes adjusted to fdw_scan_tlist never called,
 so it should not lead any problems...?

When scanrelid = 0, I think we should run the alternative plans in
ExecScanFetch or somewhere, as you mentioned.

 I don't understand custom scans/joins exactly, but I have a similar
 concern for the simple-custom-scan case too.

 In case of custom scan/join, it fetches a record using heap_fetch()
 identified by ctid, and saved to es_epqTuple[].
 Then, EvalPlanQual() walks down the plan-tree. Once it appears a node
 of custom-join (scanrelid==0), it shall call the equivalent alternatives
 if possible, or calls ExecProcNode() towards the underlying nodes then
 re-construct its result according to the custom_scan_tlist definition.
 
 It does not look to me problematic.

Sorry, I don't understand what you mean.  Maybe I have to learn more
about custom scans/joins, but thanks for the explanation!

Best regards,
Etsuro Fujita


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Thakur, Sameer
Hello,
I think it'd be better to combine both numbers into one report:
It'd also be good to standardize on where the * 100 is happening.
Done
can be replaced by
(itemptr-ipblkid != vacrelstats-last_scanned_page)
Get compiler error : invalid operands to binary != (have ‘BlockIdData’ and 
‘BlockIdData’)
vacrelstats-current_index_scanned_page_count++;
Done
Please find v3 attached.

I am struggling to create  maintenance work memory exhaustion.  Did the 
following
maintenance_work_mem=1MB.
Inserted 10 million records in tbl1 with 3 indexes. Deleted 5 million and 
vacuumed. So far no error. I could keep bumping up the records to say 100 
million and try to get this error.
This seems a tedious manner to simulate maintenance work memory exhaustion. Is 
there a better way?
To insert I am using COPY (from a csv which has 10 million records) and 
building indexes after insert is complete.
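The sequence I am running is roughly the following (sizes are illustrative,
not the exact script):

    SET maintenance_work_mem = '1MB';
    CREATE TABLE tbl1 (a int, b int, c int);
    INSERT INTO tbl1 SELECT i, i, i FROM generate_series(1, 10000000) i;
    CREATE INDEX ON tbl1 (a);
    CREATE INDEX ON tbl1 (b);
    CREATE INDEX ON tbl1 (c);
    DELETE FROM tbl1 WHERE a % 2 = 0;   -- roughly 5 million dead tuples
    VACUUM VERBOSE tbl1;                -- check for more than one round of index cleanup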
Thank you
Sameer

__
Disclaimer: This email and any attachments are sent in strictest confidence
for the sole use of the addressee and may contain legally privileged,
confidential, and proprietary data. If you are not the intended recipient,
please advise the sender by replying promptly to this email and then delete
and destroy this email and any attachments without any further use, copying
or forwarding.


IndexScanProgress_v3.patch
Description: IndexScanProgress_v3.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Alvaro Herrera
Thakur, Sameer wrote:
 Hello,
 I think it'd be better to combine both numbers into one report:
 It'd also be good to standardize on where the * 100 is happening.
 Done
 can be replaced by
 (itemptr-ipblkid != vacrelstats-last_scanned_page)
 Get compiler error : invalid operands to binary != (have ‘BlockIdData’ and 
 ‘BlockIdData’)
 vacrelstats-current_index_scanned_page_count++;
 Done
 Please find v3 attached.
 
 I am struggling to create  maintenance work memory exhaustion.  Did the 
 following
 maintenance_work_mem=1MB.
 Inserted 10 million records in tbl1 with 3 indexes. Deleted 5 million and 
 vacuumed. So far no error. I could keep bumping up the records to say 100 
 million and try to get this error.
 This seems a tedious manner to simulate maintenance work memory exhaustion. 
 Is there a better way?
 To insert I am using COPY (from a csv which has 10 million records) and 
 building indexes after insert is complete.

I don't think there's any maintenance work memory exhaustion that results in an
error.  The system is designed to use all the memory it is allowed to,
and to have other strategies when it's not sufficient to do the whole
sort.

Not sure what Jim meant.  Maybe he meant to be aware of when spilling to
disk happens?  Obviously, things become slower, so maybe you need to
consider it for progress reporting purposes.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Amit Langote
On Wed, Jul 22, 2015 at 8:19 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

 Not sure what Jim meant.  Maybe he meant to be aware of when spilling to
 disk happens?  Obviously, things become slower, so maybe you need to
 consider it for progress reporting purposes.


Perhaps the m_w_m determines how many dead tuples lazy_scan_heap() can
keep track of before doing a lazy_vacuum_indexes() +
lazy_vacuum_heap() round. Smaller the m_w_m, more the number of index
scans, slower the progress?

Thanks,
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?

2015-07-22 Thread Kouhei Kaigai
 -Original Message-
 From: Etsuro Fujita [mailto:fujita.ets...@lab.ntt.co.jp]
 Sent: Wednesday, July 22, 2015 7:05 PM
 To: Kaigai Kouhei(海外 浩平)
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ 
 testing,
 doesn't it?
 
 Hi KaiGai-san,
 
 On 2015/07/22 16:44, Kouhei Kaigai wrote:
  The latest foreign-join pushdown patch allows fdw_scan_tlist to be set
  to a targetlist even for simple foreign table scans.  However, since I
  think we assume that the test tuple of a foreign table for an EPQ
  testing, whether it may be copied from the whole-row var or returned by
  the RefetchForeignRow routine, has the rowtype declared for the foreign
  table, ISTM that EPQ testing doesn't work properly in such a case since
  that the targetlist and qual are adjusted to reference fdw_scan_tlist in
  such a case.  Maybe I'm missing something though.
 
  Let me confirm step-by-step.
  For EPQ testing, whole-row-reference or RefetchForeignRow pulls a record
  with row-type compatible to the base foreign table. Then, this record
  is stored in the es_epqTuple[] indexed by the base relation.
 
  According to the previous discussion, I expect these tuples are re-checked
  by built-in execution plan, but equivalent to the sub-plan entirely pushed
  out to the remote side.
  Do we see the same assumption?
 
 No, what I'm concerned about is the case when scanrelid > 0.

Hmm. If scanrelid > 0, then fdw_scan_tlist should be NIL.
I want to put Assert(scanrelid == 0 || fdw_scan_tlist == NIL) just after
the GetForeignPlan() call in createplan.c.

I'm curious why you tried to put a valid fdw_scan_tlist for scanrelid > 0.
It's unusual.

  If so, next step is enhancement of ExecScanFetch() to run the alternative
  built-in plans towards each es_epqTuple[] records, if given scanrelid==0.
  In this case, expression nodes adjusted to fdw_scan_tlist never called,
  so it should not lead any problems...?
 
 When scanrelid = 0, I think we should run the alternative plans in
 ExecScanFetch or somewhere, as you mentioned.

OK,

  I don't understand custom scans/joins exactly, but I have a similar
  concern for the simple-custom-scan case too.
 
  In case of custom scan/join, it fetches a record using heap_fetch()
  identified by ctid, and saved to es_epqTuple[].
  Then, EvalPlanQual() walks down the plan-tree. Once it appears a node
  of custom-join (scanrelid==0), it shall call the equivalent alternatives
  if possible, or calls ExecProcNode() towards the underlying nodes then
  re-construct its result according to the custom_scan_tlist definition.
 
  It does not look to me problematic.
 
 Sorry, I don't understand what you mean.  Maybe I have to learn more
 about custom scans/joins, but thanks for the explanation!

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] stringify MAKE_SQLSTATE()

2015-07-22 Thread Teodor Sigaev

Hi!

Following discussion at https://commitfest.postgresql.org/5/190/ patch, I found 
(at seems to me) a way to stringify MAKE_SQLSTATE(), the idea is to use char 
array as string:


#include <stdio.h>

#define MAKE_SQLSTATE(ch1,ch2,ch3,ch4,ch5) \

((char[]){(char)(ch1),(char)(ch2),(char)(ch3),(char)(ch4),(char)(ch5),(char)'\0'})

#define ERRCODE_WARNING_DEPRECATED_FEATURE MAKE_SQLSTATE('0','1','P','0','1')

int
main(int argn, char* argv[])
{
	char *x = ERRCODE_WARNING_DEPRECATED_FEATURE;

	printf("%s\n", x);

	return 0;
}

That works for clang and gcc 4.8 with flags -Wall -Wextra -ansi (or -std=c90)
without warnings, but doesn't work with -pedantic. Is that enough for it to work
with other compilers?


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing vacuum truncation scans

2015-07-22 Thread Robert Haas
On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 Attached is a patch that implements the vm scan for truncation.  It
 introduces a variable to hold the last blkno which was skipped during the
 forward portion.  Any blocks after both this blkno and after the last
 inspected nonempty page (which the code is already tracking) must have been
 observed to be empty by the current vacuum.  Any other process rendering the
 page nonempty are required to clear the vm bit, and no other process can set
 the bit again during the vacuum's lifetime.  So if the bit is still set, the
 page is still empty without needing to inspect it.

Urgh.  So if we do this, that forever precludes having HOT pruning set
the all-visible bit.  At the least we'd better document that carefully
so that nobody breaks it later.  But I wonder if there isn't some
better approach, because I would certainly rather that we didn't
foreclose the possibility of doing something like that in the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stringify MAKE_SQLSTATE()

2015-07-22 Thread Tom Lane
Teodor Sigaev teo...@sigaev.ru writes:
 Following discussion at https://commitfest.postgresql.org/5/190/ patch, I 
 found 
 (at seems to me) a way to stringify MAKE_SQLSTATE(), the idea is to use char 
 array as string:

 #define MAKE_SQLSTATE(ch1,ch2,ch3,ch4,ch5) \
 ((char[]){(char)(ch1),(char)(ch2),(char)(ch3),(char)(ch4),(char)(ch5),(char)'\0'})

I'm pretty sure that's a gcc-ism, not standard C.

As I mentioned in the other thread, if we wanted to provide string
versions of the ERRCODE macros, the safest way to do it would be
to create a separate header file with defines like

#define ERRCODE_WARNING_DEPRECATED_FEATURE "01P01"

Now that we generate errcodes.h from errcodes.txt anyway, it would
certainly be a trivial extension of the build infrastructure to make
a second header like this.

The real question is do we want to.  What's your use-case for this?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace

2015-07-22 Thread Robert Haas
On Mon, Jul 20, 2015 at 8:20 AM, Michael Paquier
michael.paqu...@gmail.com wrote:
 After a certain amount of time without anything happening, I would
 recommend just adding it to a CF to have it get attention. I imagine
 that it is one of the reasons why there is as well a category Bug
 Fix.

+1.  I would recommend adding it to the CF *immediately* to have it
get attention.   The CF app is basically our patch tracker.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alpha2/Beta1

2015-07-22 Thread Etsuro Fujita

On 2015/07/22 12:40, Noah Misch wrote:

I vote for alpha2.  Comparing the Open Issues and resolved after 9.5alpha1
sections of https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items, we've
not transitioned to a qualitatively different level of API stability.


+1 for alpha2.

I'd like to propose a patch for the issue Foreign join pushdown vs 
EvalPlanQual, but I need more time ...


Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Asynchronous execution on FDW

2015-07-22 Thread Kyotaro HORIGUCHI
Hello, thank you for the comment.

At Fri, 17 Jul 2015 14:34:53 -0400, Robert Haas robertmh...@gmail.com wrote 
in ca+tgmoaijk1svzw_gkfu+zssxcijkfelqu2aomvuphpsfw4...@mail.gmail.com
 On Fri, Jul 3, 2015 at 4:41 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
  At a quick glance, I think this has all the same problems as starting the
  execution at ExecInit phase. The correct way to do this is to kick off the
  queries in the first IterateForeignScan() call. You said that ExecProc
  phase does not fit - why not?
 
 What exactly are those problems?
 
 I can think of these:
 
 1. If the scan is parametrized, we probably can't do it for lack of
 knowledge of what they will be.  This seems easy; just don't do it in
 that case.

We can put an early kick to foreign scans only for the first shot
if we do it outside (before) ExecProc phase.

Nestloop
- SeqScan
- Append
   - Foreign (Index) Scan
   - Foreign (Index) Scan
   ..

This plan premises precise (even to some extent) estimate for
remote query but async execution within ExecProc phase would be
in effect for this case.


 2. It's possible that we're down inside some subtree of the plan that
 won't actually get executed.  This is trickier.

As for current postgres_fdw, it is done simply abandoning queued
result then close the cursor.

 Consider this:
 
 Append
 - Foreign Scan
 - Foreign Scan
 - Foreign Scan
 repeat 17 more times
 
 If we don't start each foreign scan until the first tuple is fetched,
 we will not get any benefit here, because we won't fetch the first
 tuple from query #2 until we finish reading the results of query #1.
 If the result of the Append node will be needed in its entirety, we
 really, really want to launch of those queries as early as possible.
 OTOH, if there's a Limit node with a small limit on top of the Append
 node, that could be quite wasteful.

It's the nature of speculative execution, but the Limit will be
pushed down onto every Foreign Scans near future.

 We could decide not to care: after all, if our limit is
 satisfied, we can just bang the remote connections shut, and if
 they wasted some CPU, well, tough luck for them.  But it would
 be nice to be smarter.  I'm not sure how, though.

Appropriate fetch size will cap the harm and the case will be
handled as I mentioned above as for postgres_fdw.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 3:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 For me, the user workflow looks like these

 Worried: Task X is taking ages? When is it expected to finish?
 Ops: 13:50
 sometime later, about 14:00
 Worried: Task X is still running? But I thought its ETA was 13:50?
 Ops: Now says 14:30
 Worried: Is it stuck, or is it making progress?
 Ops: Looks like its making progress
 Worried: Can we have a look at it and find out what its doing?

How does Ops know that it is making progress?  Just because the
completion percentage is changing?

 In terms of VACUUM specifically: VACUUM should be able to assess beforehand
 whether it will scan the indexes, or it can just assume that it will need to
 scan the indexes. Perhaps VACUUM can pre-scan the VM to decide how big a
 task it has before it starts.

Well, we can assume that it will scan the indexes exactly once, but
the actual number may be more or less; and the cost of rescanning the
heap in phase 2 is also hard to estimate.

Maybe I'm worrying over nothing, but I have a feeling that if we try
to do what you're proposing here, we're gonna end up with this:

https://xkcd.com/612/

Most of the progress estimators I have seen over the ~30 years that
I've been playing with computers have been unreliable, and many of
those have been unreliable to the point of being annoying.  I think
that's likely to happen with what you are proposing too, though of
course like all predictions of the future it could turn out to be
wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Simon Riggs
On 22 July 2015 at 13:00, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Jul 22, 2015 at 3:02 AM, Simon Riggs si...@2ndquadrant.com
 wrote:
  For me, the user workflow looks like these
 
  Worried: Task X is taking ages? When is it expected to finish?
  Ops: 13:50
  sometime later, about 14:00
  Worried: Task X is still running? But I thought its ETA was 13:50?
  Ops: Now says 14:30
  Worried: Is it stuck, or is it making progress?
  Ops: Looks like its making progress
  Worried: Can we have a look at it and find out what its doing?

 How does Ops know that it is making progress?  Just because the
 completion percentage is changing?


You could, but that is not the way I suggested.

We need
* Some measure of actual progress (the definition of which may vary from
action to action, e.g. blocks scanned)
* Some estimate of the total work required
* An estimate of the time of completion - I liked your view that
this prediction may be costly to request
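For illustration only, the kind of report I have in mind might look like this
(hypothetical view and column names; nothing like this exists today):

    SELECT pid, phase, blocks_done, blocks_total,
           round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS pct_done
    FROM pg_stat_vacuum_progress;   -- hypothetical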


  In terms of VACUUM specifically: VACUUM should be able to assess
 beforehand
  whether it will scan the indexes, or it can just assume that it will
 need to
  scan the indexes. Perhaps VACUUM can pre-scan the VM to decide how big a
  task it has before it starts.

 Well, we can assume that it will scan the indexes exactly once, but
 the actual number may be more or less; and the cost of rescanning the
 heap in phase 2 is also hard to estimate.

 Maybe I'm worrying over nothing, but I have a feeling that if we try
 to do what you're proposing here, we're gonna end up with this:

 https://xkcd.com/612/

 Most of the progress estimators I have seen over the ~30 years that
 I've been playing with computers have been unreliable, and many of
 those have been unreliable to the point of being annoying.  I think
 that's likely to happen with what you are proposing too, though of
 course like all predictions of the future it could turn out to be
 wrong.


Almost like an Optimizer then. Important, often annoyingly wrong, needs
more work.

I'm not proposing this feature, I'm merely asking for it to be defined in a
way that makes it work for more than just VACUUM. Once we have a way of
reporting useful information, other processes can be made to follow that
mechanism, like REINDEX, ALTER TABLE etc.. I believe those things are
important, even if we never get such information for user queries. But I
hope we do.

I won't get in the way of your search for detailed information in more
complex forms. Both things are needed.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] could not truncate directory pg_subtrans: apparent wraparound

2015-07-22 Thread Heikki Linnakangas

On 06/13/2015 05:02 AM, Thomas Munro wrote:

Since the multixact equivalent of this problem[1] fell through the
cracks on the multixact mega-thread, here is an updated patch that
addresses this problem for both pg_subtrans and pg_multixact/offsets
using the same approach: always step back one multixact/xid (rather
than doing so only if oldest == next, which seemed like an unnecessary
complication, and a bit futile since the result of such a test is only
an instantaneous snapshot).  I've added this to the commitfest[2].  I
am also attaching a new set of repro scripts including a pair to test
the case where next multixact/xid == first valid ID (the scripts with
'wraparound' in the name, which use dirty pg_resetxlog tricks to get
into that situation).  In my previous patch I naively subtracted one,
which didn't work for those (even rarer!) cases.  The new patch steps
over the special ID values.


Thanks, great repro scripts!


I also took a look at the pg_clog and pg_commit_ts truncation
functions.  You could argue that they have the same problem in theory
(they pass a page number derived from the oldest xid to
SimpleLruTruncate, and maybe there is a way for that to be an xid that
hasn't been issued yet), but in practice I don't think it's a
reachable condition.  They use the frozen xid that is updated by
vacuuming, but vacuum itself advances the next xid counter in the
process.  Is there a path though the vacuum code that ever exposes
frozen xid == next xid?  In contrast, for pg_subtrans we use
GetOldestXmin(), which is equal to the next xid if there are no
running transactions, and for pg_multixact we use the oldest
multixact, which can be equal to the next multixact ID after a
wraparound vacuum because vacuum itself doesn't always consume
multixacts.


Yeah, I think pg_clog and pg_commit_ts are safe, for now. It's rather 
accidental, though.


There is one more instance of this bug though: in 9.2 and below, 
pg_multixact members are truncated similarly. Attached are corresponding 
repro-scripts for that case.


Committed with the additional fix for that.

- Heikki



repro-bogus-multixact-members-error.sh
Description: application/shellscript


repro-bogus-multixact-members-error-wraparound.sh
Description: application/shellscript

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-22 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 22 Jul 2015 17:50:35 +0300, Ildus Kurbangaliev 
i.kurbangal...@postgrespro.ru wrote in 55afadbb.9090...@postgrespro.ru
 On 07/22/2015 09:10 AM, Kyotaro HORIGUCHI wrote:
  Hello,
 
  At Tue, 21 Jul 2015 14:28:25 +0300, Ildus Kurbangaliev
  i.kurbangal...@postgrespro.ru wrote in
  55ae2cd9.4050...@postgrespro.ru
  On 07/21/2015 01:18 PM, Andres Freund wrote:
  I'd very much like to avoid increasing the size of struct LWLock. We
  have a lot of those and I'd still like to inline them with the buffer
  descriptors. Why do we need a separate group and can't reuse the
  tranche?  That might require creating a few more tranches, but ...?
 
  Andres
  Do you mean moving LWLocks defined by offsets and with dynamic sizes
  to separate tranches?
  I think it is too much for the purpose. Only two new tranches and
  maybe one or some new members (maybe representing the group) of
  tranches will do, I suppose.
 
 Can you explain why only two new tranches?
 There are 13 types of lwlocks (besides the individual ones), and we need
 to separate them somehow.

Sorry, I misunderstood about tranches.

Currently tranches other than main are used by WALInsertLocks and
ReplicationOrigins. Other dynamic locks are defined as parts of the
main LWLocks since they have the same shape as the individual
lwlocks. Leaving the individual locks aside, every lock group could have
its own tranche if we allow lwlocks to have their own tranche even if
they are in MainLWLockArray. New 13-16 tranches will be added but no
need to register their names in LWLOCK_GROUPS[]. After all, this
array would be renamed to something like IndividualLWLockNames and the
name lookup can be done by the following simple steps.

- If the the lock is in main tranche, lookup the individual name
  array for its name.

- Elsewise, use the name of its tranche.

Does this make sense?

  It sounds like a good option, but it will require refactoring of
  the current tranches. In the current implementation
  I tried not to change current things.
  Now one of the most controversial points of this patch is the
  details of the implementation, which largely affects performance drag,
  maybe.
 
 
   From the viewpoint of performance, I have some comments on the
   feature:
 
- LWLockReportStat runs a linear search loop which I suppose
  should be avoided even if the loop count is rather small for
  LWLocks, as Fujii-san said upthread or anywhere.
 It runs only one time in first acquirement. In previous patch
 it was much heavier. Anyway this code will be removed if we
 split main tranche to smaller tranches.

Ah, this should be the same with what I wrote above, isn't it?

- Currently pg_stat_activity view is the only API, which would
  be a bit heavy for sampling use. It'd be appreciated to have a
  far lighter means to know the same information.
 Yes, pg_stat_activity is just information about current wait,
 and it's too heavy for sampling.  Main goal of this patch was
 creating base structures that can be used later.

Ok, I see it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gsets: ROW expression semantic broken between 9.4 and 9.5

2015-07-22 Thread Andres Freund
Hi,

On 2015-07-22 23:53:26 +0530, Jeevan Chalke wrote:
 It looks like we have broken the ROW expression without explicit
 ROW keyword in GROUP BY.

That was intentional, and is actually standards required
behaviour. GROUP BY (a, b) is the same as GROUP BY a,b. It'd otherwise
be pretty confusing because parens in GROUPING SETS would mean something
different than in GROUP BY.
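For example (table and column names made up):

    -- these two are equivalent per the standard:
    SELECT a, b, sum(c) FROM t GROUP BY (a, b);
    SELECT a, b, sum(c) FROM t GROUP BY a, b;

    -- whereas inside GROUPING SETS the parentheses delimit one grouping set:
    SELECT a, b, sum(c) FROM t GROUP BY GROUPING SETS ((a, b), (a), ());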

We should probably add a release note entry about it...

Thanks for testing gsets!

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Asynchronous execution on FDW

2015-07-22 Thread Kyotaro HORIGUCHI
Hello,

 Let me ask an elemental question.
 
 If we have ParallelAppend node that kicks a background worker process for
 each underlying child node in parallel, does ForeignScan need to do something
 special?

Although I don't see the point of the background worker in your
story, at least for ParallelMergeAppend the scan would frequently
be cut short by an upper Limit, so one more state, say "setup"
- which means a worker is allocated but not yet started - would be
useful, and the driver node might need to manage the number of
async executions. Or the driven nodes might do so inversely.

As for ForeignScan, it is merely an API for FDWs and does nothing
substantial, so it would have nothing special to do. As for
postgres_fdw, the current patch itself restricts execution to one
per foreign server at a time. We would have to provide another
execution-management scheme if we want to have two or more
simultaneous scans per foreign server at once.

Sorry for the unfocused discussion, but does this answer some of
your questions?

 Expected waste of CPU or I/O is common problem to be solved, however, it does
 not need to add a special case handling to ForeignScan, I think.
 How about your opinion?

I agree with you that ForeignScan, as the wrapper for FDWs, doesn't
need anything special for the case. I suppose for now that
avoiding the penalty from abandoning too many speculatively
executed scans (or other work on bg workers, like sorts) would be
the business of the node above the FDWs, or somewhere else.

However, I haven't dismissed the possibility that some common
work related to resource management could be integrated into the
executor (or even into the planner), but I see none for now.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-22 Thread dinesh kumar
Hi All,

On Mon, Jul 13, 2015 at 2:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Jim Nasby jim.na...@bluetreble.com writes:
  On 7/13/15 3:39 PM, dinesh kumar wrote:
  Ah. It's' my bad interpretation. Let me work on it, and will send a new
  patch as a wrapper sql function for ereport.

  You might want to present a plan for that; it's not as trivial as it
  sounds due to how ereport works. In particular, I'd want to see (at
  minimum) the same functionality that plpgsql's RAISE command now
  provides (errdetail, errhint, etc).


Jim,

For now, I worked on (ERROR Level, ERROR Message, HIDE ERROR Stmt). In our
to-do item description, I found this wrapper needs to return Anyelement.
But I believe returning VOID is enough for this function. Let me know if I
erred here.

In design phase,

1. I took a CustomDataType with the elevel code and elevel text.

2. Populated this CDT with all existing pre-processor levels, except {FATAL,
PANIC}, since we don't expose these to the client.

3. By matching the user's elevel text, the report-log function processes the message.

Find the attached patch with implementation.



 The real question is why the existing functionality in plpgsql isn't
 sufficient.  Somebody who wants a log from SQL function can easily
 write a simple plpgsql function that does exactly what they want,
 with no more or fewer bells-n-whistles than they need.  If we try
 to create a SQL function that does all that, it's likely to be a mess
 to use, even with named arguments.

 I'm not necessarily against the basic idea, but I think inventing
 something that actually offers an increment in usability compared
 to the existing alternative is going to be harder than it sounds.


Tom,

I agree with your inputs. We can build a pl/pgsql function as an alternative
for this.
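For instance, something along these lines already covers the basic case (a
rough sketch; the function name and accepted levels are just illustrative):

    CREATE OR REPLACE FUNCTION report_log(elevel text, message text)
    RETURNS void LANGUAGE plpgsql AS
    $$
    BEGIN
        CASE upper(elevel)
            WHEN 'DEBUG'   THEN RAISE DEBUG '%', message;
            WHEN 'LOG'     THEN RAISE LOG '%', message;
            WHEN 'INFO'    THEN RAISE INFO '%', message;
            WHEN 'NOTICE'  THEN RAISE NOTICE '%', message;
            WHEN 'WARNING' THEN RAISE WARNING '%', message;
            ELSE RAISE EXCEPTION 'unsupported level: %', elevel;
        END CASE;
    END;
    $$;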

My initial proposal, and implementation, was logging messages to the log file
irrespective of our log settings. I was not sure we could do this with
pl/perlu. And then I started working on our to-do item,
"ereport, wrapper callable from SQL", and found it can be useful to have a
direct function call with the required log level.

Thanks.

Regards,
Dinesh
manojadinesh.blogspot.com

regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 76f77cb..43dbaec 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17850,6 +17850,15 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
         Return information about a file.
        </entry>
       </row>
+      <row>
+       <entry>
+        <literal><function>pg_report_log(<parameter>elevel</> <type>text</>, <parameter>message</> <type>anyelement</type>, <parameter>ishidestmt</> <type>boolean</>)</function></literal>
+       </entry>
+       <entry><type>void</type></entry>
+       <entry>
+        Write message into log file as per log level.
+       </entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -17918,6 +17927,24 @@ SELECT (pg_stat_file('filename')).modification;
 </programlisting>
    </para>
 
+   <indexterm>
+    <primary>pg_report_log</primary>
+   </indexterm>
+   <para>
+    <function>pg_report_log</> is useful to write custom messages
+    into current log destination and returns <type>void</type>.
+    This function don't support the PANIC, FATAL log levels due to their unique internal DB usage, which may cause the database instability. Using <parameter>ishidestmt</>, function can write or ignore the current SQL statement into the log file.
+    Typical usages include:
+<programlisting>
+postgres=# SELECT pg_report_log('NOTICE', 'Custom Message', true);
+NOTICE:  Custom Message
+ pg_report_log 
+---------------
+ 
+(1 row)
+</programlisting>
+   </para>
+
   </sect2>
 
   <sect2 id="functions-advisory-locks">
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
index c0495d9..1c7c263 100644
--- a/src/backend/utils/adt/misc.c
+++ b/src/backend/utils/adt/misc.c
@@ -76,6 +76,91 @@ current_query(PG_FUNCTION_ARGS)
 }
 
 /*
+ * pg_report_log()
+ *
+ * Printing custom log messages in log file.
+ */
+
+typedef struct
+{
+   int ecode;
+   char *level;
+} errorlevels;
+
+Datum
+pg_report_log(PG_FUNCTION_ARGS)
+{
+
+   /*
+* Do not add FATAL, PANIC log levels to the below list.
+*/
+	errorlevels elevels[]={
+		{DEBUG5, "DEBUG5"}, {DEBUG4, "DEBUG4"}, {DEBUG3, "DEBUG3"},
+		{DEBUG2, "DEBUG2"}, {DEBUG1, "DEBUG1"}, {LOG, "LOG"},
+		{COMMERROR, "COMMERROR"}, {INFO, "INFO"}, {NOTICE, "NOTICE"},
+		{WARNING, "WARNING"}, {ERROR, "ERROR"}
+		/*
+		 * Adding PGERROR to elevels if WIN32
+		 */
+		#ifdef WIN32
+		,{PGERROR, "PGERROR"}
+		#endif
+	};
+
+   int itr = 0;
+   bool ishidestmt = false;
+   int noelevel = (int) sizeof(elevels)/sizeof(*elevels);
+   char *level;
+
+   level = text_to_cstring(PG_GETARG_TEXT_P(0));
+   ishidestmt = PG_GETARG_BOOL(2);
+
+  

Re: [HACKERS] Queries runs slow on GPU with PG-Strom

2015-07-22 Thread Kouhei Kaigai
Hi Yang,

 I've performed some tests on pg_strom according to the wiki. But it seems that
 queries run slower on GPU than CPU. Can someone shed a light on what's wrong
 with my settings. My setup was Quadro K620 + CUDA 7.0 (For Ubuntu 14.10) +
 Ubuntu 15.04. And the results was

  :
 ,
 | LOG:  CUDA Runtime version: 7.0.0
 | LOG:  NVIDIA driver version: 346.59
 | LOG:  GPU0 Quadro K620 (384 CUDA cores, 1124MHz), L2 2048KB, RAM 
 2047MB
 (128bits, 900KHz), capability 5.0
 | LOG:  NVRTC - CUDA Runtime Compilation vertion 7.0
 | LOG:  redirecting log output to logging collector process
 | HINT:  Future log output will appear in directory pg_log.
 `

It looks to me like your GPU has poor memory access capability;
thus, the preprocessing of aggregation (which heavily uses atomic operations
on global memory) consumes the majority of the processing time.
Please try the query with:
  SET pg_strom.enable_gpupreagg = off;
GpuJoin uses less atomic operation, so it has an advantage.

A GPU's two major advantages are a massive number of cores and higher
memory bandwidth than a CPU, so, fundamentally, I'd like to recommend
using a better GPU board...
According to NVIDIA, the K620 uses DDR3 DRAM, thus there is no advantage
in memory access speed. How about a GTX750Ti (no external power is
needed, like the K620) or AWS's g2.2xlarge instance type?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of YANG
 Sent: Thursday, July 23, 2015 12:16 AM
 To: pgsql-hackers@postgresql.org
 Subject: [HACKERS] Queries runs slow on GPU with PG-Strom
 
 
 Hello,
 
 I've performed some tests on pg_strom according to the wiki. But it seems that
 queries run slower on GPU than CPU. Can someone shed a light on what's wrong
 with my settings. My setup was Quadro K620 + CUDA 7.0 (For Ubuntu 14.10) +
 Ubuntu 15.04. And the results was
 
 with pg_strom
 =
 
explain SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;
 
 
 QUERY PLAN
 
 
 ---
  Aggregate  (cost=190993.70..190993.71 rows=1 width=0) (actual
 time=18792.236..18792.236 rows=1 loops=1)
-  Custom Scan (GpuPreAgg)  (cost=7933.07..184161.18 rows=86 width=108)
 (actual time=4249.656..18792.074 rows=77 loops=1)
  Bulkload: On (density: 100.00%)
  Reduction: NoGroup
  Device Filter: (sqrtx - '25.6'::double precision) ^ '2'::double
 precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) 
 '10'::double precision)
  -  Custom Scan (BulkScan) on t0  (cost=6933.07..182660.32
 rows=1060 width=0) (actual time=139.399..18499.246 rows=1000 loops=1)
  Planning time: 0.262 ms
  Execution time: 19268.650 ms
 (8 rows)
 
 
 
 explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;
 
 QUERY
 PLAN
 
 --
  HashAggregate  (cost=298541.48..298541.81 rows=26 width=12) (actual
 time=11311.568..11311.572 rows=26 loops=1)
Group Key: t0.cat
-  Custom Scan (GpuPreAgg)  (cost=5178.82..250302.07 rows=1088 width=52)
 (actual time=3304.727..11310.021 rows=2307 loops=1)
  Bulkload: On (density: 100.00%)
  Reduction: Local + Global
  -  Custom Scan (GpuJoin)  (cost=4178.82..248541.18 rows=1060
 width=12) (actual time=923.417..2661.113 rows=1000 loops=1)
Bulkload: On (density: 100.00%)
Depth 1: Logic: GpuHashJoin, HashKeys: (aid), JoinQual: (aid =
 aid), nrows_ratio: 1.
-  Custom Scan (BulkScan) on t0  (cost=0.00..242858.60
 rows=1060 width=16) (actual time=6.980..871.431 rows=1000 loops=1)
-  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=4)
 (actual time=0.204..7.309 rows=4 loops=1)
  Planning time: 47.834 ms
  Execution time: 11355.103 ms
 (12 rows)
 
 
 without pg_strom
 
 
 test=# explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 +
 (y-12.8)^2) < 10;
 
 QUERY PLAN
 
 
 
  Aggregate  (cost=426193.03..426193.04 rows=1 width=0) (actual
 time=3880.379..3880.379 rows=1 loops=1)
-  Seq Scan on t0  (cost=0.00..417859.65 rows=353 width=0) (actual
 time=0.075..3859.200 rows=314063 loops=1)
  Filter: (sqrtx - '25.6'::double precision) ^ '2'::double 
 precision)
 + ((y - '12.8'::double 

[HACKERS] Autonomous Transaction is back

2015-07-22 Thread Rajeev rastogi
After a few failed attempts to propose Autonomous Transactions earlier, I,
along with Simon Riggs, would like to propose it again, but with a completely
different approach.

We also discussed this feature at the PGCon 2015 Unconference Day; 
those who missed that discussion, please refer to

https://wiki.postgresql.org/wiki/AutonomousTransactionsUnconference2015


Before jumping into the design and code proposal for this feature, Simon Riggs and I 
wanted to propose its behavior and usage first, to keep everyone on 
the same page.
So we have summarized the behavior and usage of the Autonomous Transaction 
based on the discussion with community members at the PGCon 2015 Unconference 
Day:

Behavior of Autonomous Transactions:
1. An autonomous transaction is treated as a completely separate 
transaction from the master transaction.
2. It should be allowed to deadlock with the master transaction. We need 
to work out a solution to avoid such deadlocks.
3. It can support multiple levels of nesting based on the 
configuration (maybe a maximum of 70).
4. The outer (i.e. main or upper autonomous) transaction is suspended 
while the inner autonomous transaction is running.
5. The outer transaction should not see data of the inner one until the inner is 
committed (a serializable upper transaction should not see it even after the inner 
transaction commits).

How to Use Autonomous Transactions:
1. We can issue an explicit command to start an autonomous transaction, as below
(don't worry about the exact keywords at this point; see the dblink sketch after this
list for how this is typically emulated today):
        BEGIN AUTONOMOUS TRANSACTION
        Do your work.
        COMMIT/ROLLBACK   (Will commit/rollback the autonomous 
transaction and return to the main transaction or upper autonomous 
transaction.)

2. The above commands can be issued either inside a procedure, to put a few 
statements of the procedure inside an autonomous transaction, or even in stand-alone 
query execution.
3. We can make the whole procedure itself autonomous, which is similar to 
starting an autonomous transaction at the beginning of the procedure and 
committing/rolling back at the end of the procedure.
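
For readers unfamiliar with the feature, the closest emulation available today is to push
the autonomous work through a second connection, for example with the dblink extension.
The following is only a rough sketch of that workaround (it assumes dblink is installed
and an audit_log table exists), not the proposed implementation:

    -- Emulating an autonomous transaction today with dblink (sketch only).
    -- Assumes: CREATE EXTENSION dblink;  and a table audit_log(msg text).
    BEGIN;
    INSERT INTO audit_log(msg) VALUES ('work done in the main transaction');

    -- Runs on a second connection, so it commits independently of this transaction.
    SELECT dblink_exec('dbname=' || current_database(),
                       $q$INSERT INTO audit_log(msg) VALUES ('autonomous-style log entry')$q$);

    ROLLBACK;  -- the main insert is gone, the dblink'ed insert survives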

There was another discussion on the Unconference Day about whether to implement 
COMMIT/ROLLBACK inside procedures instead of autonomous transactions. Our opinion 
about this is that
COMMIT/ROLLBACK inside a procedure is somewhat different 
from an autonomous transaction: in the first case, once we commit inside the 
procedure,
it commits everything done before the call of the procedure as well. This is the behavior of 
Oracle.
So in this case the user is required to be very careful not to do any operation before 
the call of the procedure which is not yet intended to be committed inside the procedure.

So we prefer to implement Autonomous Transactions, which will not only be 
compatible with Oracle but also give a really strong, much-requested feature.

I have not put the use cases here, as we already agree that it has strong 
use cases.

Requesting everyone's opinion on this, based on which we can proceed 
to enhance/tune/rewrite our design.

Thanks and Regards,
Kumar Rajeev Rastogi



Re: [HACKERS] A little RLS oversight?

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 5:17 PM, Dean Rasheed dean.a.rash...@gmail.com wrote:
 There's another issue here though -- just adding filters to the
 pg_stats view won't prevent a determined user from seeing the contents
 of the underlying table. For that, the view needs to have the
 security_barrier property. Arguably the fact that pg_stats isn't a
 security barrier view is a long-standing information leak allowing
 users to see values from tables for which they don't have any
 permissions. Is anyone concerned about that?

Hrm.  There's no help for that in the back-branches, but we should
probably change it in 9.5+.
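
For concreteness, the change under discussion amounts to marking the view with the
security_barrier option. A minimal sketch of flipping it on is below; the real fix would of
course go into the pg_stats definition in src/backend/catalog/system_views.sql, and altering
a system view by hand may additionally require allow_system_table_mods:

    -- Sketch only: make pg_stats a security barrier view, so user-supplied functions
    -- in WHERE clauses cannot be used to leak rows the view filters out.
    ALTER VIEW pg_stats SET (security_barrier = true);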

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Reload SSL certificates on SIGHUP

2015-07-22 Thread Michael Paquier
On Wed, Jul 22, 2015 at 9:52 AM, Andreas Karlsson andr...@proxel.se wrote:
 On 07/02/2015 06:13 PM, Peter Eisentraut wrote:

 I think this would be a useful feature, and the implementation looks
 sound.  But I don't like how the reload is organized.  Reinitializing
 the context in the sighup handler, aside from questions about how much
 work one should do in a signal handler, would cause SSL reinitialization
 for unrelated reloads.  We have the GUC assign hook mechanism for
 handling this sort of thing.  The trick would be that when multiple
 SSL-related settings change, you only want to do one reinitialization.
 You could either have the different assign hooks communicate with each
 other somehow, or have them set a need SSL init flag that is checked
 somewhere else.


 It is not enough to just add a hook to the GUCs since I would guess most
 users would expect the certificate to be reloaded if just the file has been
 replaced and no GUC was changed.

Why? It seems to me that the assign hook gets called once per process
at reload for a SIGHUP parameter even if its value is not changed, no?

 To support this we would need to also check
 the mtimes of the SSL files, would that complexity really be worth it?

Or we could reload the SSL context unconditionally once per reload
loop. I am wondering how costly that may prove to be though.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump quietly ignore missing tables - is it bug?

2015-07-22 Thread Kyotaro HORIGUCHI
Hello,

  2015-07-19 20:54 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:
  I am sending an updated version. It implements a new long option,
  --strict-names. If this option is used, then for any entered name
  (without any wildcard char) at least one object must be found. This option has
  no impact on patterns (which have wildcard chars).

I share the same opinion as Tom that it should behave in the same
way regardless of the appearance of wildcards.

You replied to Tom with this:

 the behavior is the same - only one real identifier is allowed

But the description of this patch above apparently says that
they are handled differently. And
expand_(schema|table)_name_patterns does something different again
from both Tom's suggestion and what was mentioned in your
reply to it. I will mention this topic again later in this mail.

# Does only one real identifier is allowed mean at most one
# match, rather than exactly one match?

They behave like this when --strict-names is given:

  - Disallow zero matches for non-wildcarded names.

  - Allow zero matches for any wildcarded name spec, ultimately
    allowing *all* of them to match nothing.

This looks quite confusing to me.

   When this option is not used,
  then the behavior is 100% the same as before (with the same number of SQL queries for the -t
  option). It is based on Oleksandr's documentation (and lightly modified
  philosophy), and cleans up my previous patch. A test on wildcard existence,
  strcspn(cell->val, "?*"), cannot be used, because it doesn't account for
  quotes (but the replacement is only a few lines).

The new name processSQLName looks a bit
bogus. processSQLNameInternal would be a name commonly seen in
such cases.

  There is a possibility to remove the wildcard char test and require at least
  one entry for patterns too. But I am thinking, when somebody explicitly uses
  any wildcard, then he reckons with the possibility of an empty result.

Why do you think so? Wildcards are usually used to glob multiple
names at once. One or more matches are expected in many or
perhaps most cases, I think. That being so, if someone anticipates that
some of his patterns have no match, I think he shouldn't specify the
--strict-names option at all.

Furthermore, I don't think anyone wants to use both wildcarded
and non-wildcarded name specs at once, but this is a little out of
scope.

I'd like to have opinions from others about this point.

  another variant is making the --strict-names behavior the default (and implementing
  a negative option like --disable-strict-names or something similar).

This contradicts Josh's request. (which I'm not totally agree:p)

 Note: originally I thought we had to fix it and change the default behavior.
 But with a special option, we don't need to. This option in the help output is a signal
 for the user that something is risky.

regards,


-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-22 Thread Kyotaro HORIGUCHI
Hi, I forgot to mention a significant point.

 At Wed, 22 Jul 2015 17:50:35 +0300, Ildus Kurbangaliev 
 i.kurbangal...@postgrespro.ru wrote in 55afadbb.9090...@postgrespro.ru
  On 07/22/2015 09:10 AM, Kyotaro HORIGUCHI wrote:
   Hello,
  
   At Tue, 21 Jul 2015 14:28:25 +0300, Ildus Kurbangaliev
   i.kurbangal...@postgrespro.ru wrote in
   55ae2cd9.4050...@postgrespro.ru
   On 07/21/2015 01:18 PM, Andres Freund wrote:
   I'd very much like to avoid increasing the size of struct LWLock. We
   have a lot of those and I'd still like to inline them with the buffer
   descriptors. Why do we need a separate group and can't reuse the
   tranche?  That might require creating a few more tranches, but ...?
  
   Andres
   Do you mean moving LWLocks defined by offsets and with dynamic sizes
   to separate tranches?
   I think it is too much for the purpose. Only two new tranches and
   maybe one or some new members (maybe representing the group) of
   tranches will do, I suppose.
  
  Can you explain why only two new tranches?
  There are 13 types of lwlocks (besides the individual ones), and we need
  to separate them somehow.
 
  Sorry, I misunderstood about tranche.
 
  Currently tranches other than main are used by WALInsertLocks and
  ReplicationOrigins. Other dynamic locks are defined as parts of
  the main LWLocks since they have the same shape as the individual
  lwlocks. Leaving the individual locks aside, every lock group may have
  its own tranche if we allow lwlocks to have their own tranche even if
  they are in MainLWLockArray. New 13-16 tranches would be added, but there is no
  need to register their names in LWLOCK_GROUPS[]. After all, this
  array would be renamed to something like IndividualLWLockNames and the
  name lookup can be done by the following simple steps.
 
  - If the lock is in the main tranche, look up the individual name
    array for its name.

  This lookup is doable by calculation, with no need to scan.

  - Otherwise, use the name of its tranche.
 
 Does this make sense?
 
    It sounds like a good option, but it will require refactoring of
    the current tranches. In the current implementation
    I tried not to change current things.
    Now one of the most controversial points of this patch is the
    details of the implementation, which largely affects the performance drag,
    maybe.
  
  
From the viewpoint of performance, I have some comments on the
feature:
  
 - LWLockReportStat runs a linear search loop which I suppose
   should be avoided even if the loop count is rather small for
   LWLocks, as Fujii-san said upthread or anywhere.
   It runs only once, at the first acquisition. In the previous patch
   it was much heavier. Anyway, this code will be removed if we
   split the main tranche into smaller tranches.
 
 Ah, this should be the same with what I wrote above, isn't it?
 
 - Currently pg_stat_activity view is the only API, which would
   be a bit heavy for sampling use. It'd be appreciated to have a
   far lighter means to know the same information.
  Yes, pg_stat_activity is just information about current wait,
  and it's too heavy for sampling.  Main goal of this patch was
  creating base structures that can be used later.
 
 Ok, I see it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-22 Thread Michael Paquier
On Thu, Jul 23, 2015 at 10:56 AM, dinesh kumar dineshkuma...@gmail.com wrote:
 Hi All,

 On Mon, Jul 13, 2015 at 2:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Jim Nasby jim.na...@bluetreble.com writes:
  On 7/13/15 3:39 PM, dinesh kumar wrote:
   Ah. It's my bad interpretation. Let me work on it, and will send a new
  patch as a wrapper sql function for ereport.

  You might want to present a plan for that; it's not as trivial as it
  sounds due to how ereport works. In particular, I'd want to see (at
  minimum) the same functionality that plpgsql's RAISE command now
  provides (errdetail, errhint, etc).


 Jim,

 For now, I  worked on (ERROR Level, ERROR Message, HIDE ERROR Stmt). In our
 to do item description, I found this wrapper needs to return Anyelement.
 But, I believe, return VOID is enough for this function. Let me know if I
 erred here.

 In design phase,

 1. I took a CustomDataType with the elevel code, elevel text

 2. Populated this CDT with all existing pre-processors, except {FATAL,
 PANIC}. Since, we don't expose these to client.

 3. By matching the user elevel text, processing the report log function.

 Find the attached patch with implementation.
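
For readers following along, here is a minimal plpgsql sketch of the kind of wrapper being
discussed; it is purely illustrative, with made-up names, and is not the patch attached above:

    -- Illustrative wrapper only: maps a level name to a plpgsql RAISE.
    CREATE OR REPLACE FUNCTION report_log(elevel text, msg text)
    RETURNS void LANGUAGE plpgsql AS
    $$
    BEGIN
        CASE upper(elevel)
            WHEN 'LOG'     THEN RAISE LOG     '%', msg;
            WHEN 'NOTICE'  THEN RAISE NOTICE  '%', msg;
            WHEN 'WARNING' THEN RAISE WARNING '%', msg;
            ELSE RAISE EXCEPTION 'unrecognized log level: %', elevel;
        END CASE;
    END;
    $$;

    -- Example: SELECT report_log('warning', 'something looks off');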

Btw, if you want to get more attention for your patch as well as
reviews, you should consider registering to the next commit fest of
September:
https://commitfest.postgresql.org/6/
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Proposing COPY .. WITH PERMISSIVE

2015-07-22 Thread dinesh kumar
Hi All,

I would like to propose a PERMISSIVE mode for the files created by COPY.
At the moment, if I run COPY TO a file as the postgres instance owner, I am
able to read that file as a non-instance user as well, and I would like to be able
to restrict the file to
the instance owner with a PERMISSIVE option.
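
A hypothetical invocation, just to make the proposal concrete (the option name and
placement are exactly what is being asked about here; nothing like this exists yet):

    -- Hypothetical syntax: restrict the created file's permissions to the instance owner.
    COPY big_table TO '/var/lib/postgresql/exports/big_table.csv'
        WITH (FORMAT csv, PERMISSIVE false);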

Let me know your thoughts on this.

Regards,
Dinesh
manojadinesh.blogspot.com


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-22 Thread David Rowley
On 23 June 2015 at 05:37, Robert Haas robertmh...@gmail.com wrote:

 When a PostgreSQL system wedges, or when it becomes dreadfully slow
 for some reason, I often find myself relying on tools like strace,
 gdb, or perf to figure out what is happening.  This doesn't tend to
 instill customers with confidence; they would like (quite
 understandably) a process that doesn't require installing developer
 tools on their production systems, and doesn't require a developer to
 interpret the results, and perhaps even something that they could
 connect up to PEM or Nagios or whatever alerting system they are
 using.

 There are obviously many ways that we might think about improving
 things here, but what I'd like to do is try to put some better
 information in pg_stat_activity, so that when a process is not
 running, users can get some better information about *why* it's not
 running.  The basic idea is that pg_stat_activity.waiting would be
 replaced by a new column pg_stat_activity.wait_event, which would
 display the reason why that backend is waiting.  This wouldn't be a
 free-form text field, because that would be too expensive to populate.


I've not looked into the feasibility of it, but if it were also possible to
have a waiting_for column which would store the process ID of the process
that's holding a lock that this process is waiting on, then it would be
possible for some smart guy to write some code which draws beautiful
graphs, perhaps in pgAdmin 4, of which processes are blocking other
processes. I imagine this as a chart with an icon for each process.
Processes waiting on locks being released would have an arrow pointing to
their blocking process; if we clicked on that blocking process we could see
the query that it's running and various other properties that are existing
columns in pg_stat_activity.

Obviously this is blue-skies stuff, but if we had a few columns to provide that
information it would be a great step forward towards that.
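
For what it's worth, a rough approximation of that waiting_for information can already be
scraped out of pg_locks with a self-join; the sketch below is deliberately simplified (a robust
version must match every lock field), but it shows the shape of the data such a graph could be
built from:

    -- Simplified sketch: who is waiting on whom, according to pg_locks.
    SELECT w.pid   AS waiting_pid,
           h.pid   AS blocking_pid,
           a.query AS blocking_query
    FROM pg_locks w
    JOIN pg_locks h
      ON  h.granted
      AND NOT w.granted
      AND h.pid <> w.pid
      AND h.locktype = w.locktype
      AND h.relation      IS NOT DISTINCT FROM w.relation
      AND h.transactionid IS NOT DISTINCT FROM w.transactionid
    JOIN pg_stat_activity a ON a.pid = h.pid;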

Regards

David Rowley


--
 David Rowley   http://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


Re: [HACKERS] [PATCH] SQL function to report log message

2015-07-22 Thread dinesh kumar
On Wed, Jul 22, 2015 at 8:56 PM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Thu, Jul 23, 2015 at 10:56 AM, dinesh kumar dineshkuma...@gmail.com
 wrote:
  Hi All,
 
  On Mon, Jul 13, 2015 at 2:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
  Jim Nasby jim.na...@bluetreble.com writes:
   On 7/13/15 3:39 PM, dinesh kumar wrote:
   Ah. It's' my bad interpretation. Let me work on it, and will send a
 new
   patch as a wrapper sql function for ereport.
 
   You might want to present a plan for that; it's not as trivial as it
   sounds due to how ereport works. In particular, I'd want to see (at
   minimum) the same functionality that plpgsql's RAISE command now
   provides (errdetail, errhint, etc).
 
 
  Jim,
 
  For now, I  worked on (ERROR Level, ERROR Message, HIDE ERROR Stmt). In
 our
  to do item description, I found this wrapper needs to return
 Anyelement.
  But, I believe, return VOID is enough for this function. Let me know
 if I
  erred here.
 
  In design phase,
 
  1. I took a CustomDataType with the elevel code, elevel text
 
  2. Populated this CDT with all existing pre-processors, except {FATAL,
  PANIC}. Since, we don't expose these to client.
 
  3. By matching the user elevel text, processing the report log function.
 
  Find the attached patch with implementation.


Thanks Michael.

Uploaded my patch there.


Regards,
Dinesh
manojadinesh.blogspot.com


 Btw, if you want to get more attention for your patch as well as
 reviews, you should consider registering to the next commit fest of
 September:
 https://commitfest.postgresql.org/6/
 Regards,
 --
 Michael



Re: [HACKERS] Queries runs slow on GPU with PG-Strom

2015-07-22 Thread Kouhei Kaigai
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Josh Berkus
 Sent: Thursday, July 23, 2015 2:49 AM
 To: YANG; pgsql-hackers@postgresql.org; KaiGai Kohei
 Subject: Re: [HACKERS] Queries runs slow on GPU with PG-Strom
 
 On 07/22/2015 08:16 AM, YANG wrote:
 
  Hello,
 
  I've performed some tests on pg_strom according to the wiki. But it seems 
  that
  queries run slower on GPU than CPU. Can someone shed a light on what's wrong
  with my settings. My setup was Quadro K620 + CUDA 7.0 (For Ubuntu 14.10) +
  Ubuntu 15.04. And the results was
 
 I believe that pgStrom has its own mailing list.  KaiGai?

Sorry, I replied to this earlier.

We have no mailing list of our own, but the issue tracker on GitHub is the proper
way to report problems to the developers.

  https://github.com/pg-strom/devel/issues

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add CINE for ALTER TABLE ... ADD COLUMN

2015-07-22 Thread Michael Paquier
On Thu, Jul 23, 2015 at 9:55 AM, Fabrízio de Royes Mello wrote:
 Thank you for the review.

+   /* skipp if the name already exists and if_not_exists is true */
s/skipp/skip.

Other than that, this looks in good shape to me (see attached for a version
fixing the typo), so I have switched it to Ready for Committer.
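
For reference, the end-user syntax the patch below adds looks like this (a usage sketch
with made-up table and column names):

    ALTER TABLE measurements ADD COLUMN IF NOT EXISTS recorded_at timestamptz;
    -- Re-running it is no longer an error when the column is already there:
    ALTER TABLE measurements ADD COLUMN IF NOT EXISTS recorded_at timestamptz;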
-- 
Michael
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 207fec1..5a71c3c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -36,7 +36,7 @@ ALTER TABLE ALL IN TABLESPACE replaceable class=PARAMETERname/replaceable
 
 phrasewhere replaceable class=PARAMETERaction/replaceable is one of:/phrase
 
-ADD [ COLUMN ] replaceable class=PARAMETERcolumn_name/replaceable replaceable class=PARAMETERdata_type/replaceable [ COLLATE replaceable class=PARAMETERcollation/replaceable ] [ replaceable class=PARAMETERcolumn_constraint/replaceable [ ... ] ]
+ADD [ COLUMN ] [ IF NOT EXISTS ] replaceable class=PARAMETERcolumn_name/replaceable replaceable class=PARAMETERdata_type/replaceable [ COLLATE replaceable class=PARAMETERcollation/replaceable ] [ replaceable class=PARAMETERcolumn_constraint/replaceable [ ... ] ]
 DROP [ COLUMN ] [ IF EXISTS ] replaceable class=PARAMETERcolumn_name/replaceable [ RESTRICT | CASCADE ]
 ALTER [ COLUMN ] replaceable class=PARAMETERcolumn_name/replaceable [ SET DATA ] TYPE replaceable class=PARAMETERdata_type/replaceable [ COLLATE replaceable class=PARAMETERcollation/replaceable ] [ USING replaceable class=PARAMETERexpression/replaceable ]
 ALTER [ COLUMN ] replaceable class=PARAMETERcolumn_name/replaceable SET DEFAULT replaceable class=PARAMETERexpression/replaceable
@@ -96,11 +96,12 @@ ALTER TABLE ALL IN TABLESPACE replaceable class=PARAMETERname/replaceable
 
   variablelist
varlistentry
-termliteralADD COLUMN/literal/term
+termliteralADD COLUMN [ IF NOT EXISTS ]/literal/term
 listitem
  para
   This form adds a new column to the table, using the same syntax as
-  xref linkend=SQL-CREATETABLE.
+  xref linkend=SQL-CREATETABLE. If literalIF NOT EXISTS/literal
+  is specified and the column already exists, no error is thrown.
  /para
 /listitem
/varlistentry
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1c7eded..92b2662 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -328,8 +328,9 @@ static void ATPrepAddColumn(List **wqueue, Relation rel, bool recurse, bool recu
 bool is_view, AlterTableCmd *cmd, LOCKMODE lockmode);
 static ObjectAddress ATExecAddColumn(List **wqueue, AlteredTableInfo *tab,
 Relation rel, ColumnDef *colDef, bool isOid,
-bool recurse, bool recursing, LOCKMODE lockmode);
-static void check_for_column_name_collision(Relation rel, const char *colname);
+bool recurse, bool recursing, bool if_not_exists, LOCKMODE lockmode);
+static bool check_for_column_name_collision(Relation rel, const char *colname,
+bool if_not_exists);
 static void add_column_datatype_dependency(Oid relid, int32 attnum, Oid typid);
 static void add_column_collation_dependency(Oid relid, int32 attnum, Oid collid);
 static void ATPrepAddOids(List **wqueue, Relation rel, bool recurse,
@@ -2304,7 +2305,7 @@ renameatt_internal(Oid myrelid,
 		oldattname)));
 
 	/* new name should not already exist */
-	check_for_column_name_collision(targetrelation, newattname);
+	(void) check_for_column_name_collision(targetrelation, newattname, false);
 
 	/* apply the update */
 	namestrcpy((attform-attname), newattname);
@@ -3455,11 +3456,11 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		case AT_AddColumnToView:		/* add column via CREATE OR REPLACE
 		 * VIEW */
 			address = ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd-def,
-	  false, false, false, lockmode);
+	  false, false, false, false, lockmode);
 			break;
 		case AT_AddColumnRecurse:
 			address = ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd-def,
-	  false, true, false, lockmode);
+	  false, true, false, cmd-missing_ok, lockmode);
 			break;
 		case AT_ColumnDefault:	/* ALTER COLUMN DEFAULT */
 			address = ATExecColumnDefault(rel, cmd-name, cmd-def, lockmode);
@@ -3572,14 +3573,14 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 			if (cmd-def != NULL)
 address =
 	ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd-def,
-	true, false, false, lockmode);
+	true, false, false, cmd-missing_ok, lockmode);
 			break;
 		case AT_AddOidsRecurse:	/* SET WITH OIDS */
 			/* Use the ADD COLUMN code, unless prep decided to do nothing */
 			if (cmd-def != NULL)
 address =
 	ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd-def,
-	true, true, false, lockmode);
+	true, true, false, cmd-missing_ok, lockmode);
 			break;
 		case AT_DropOids:		/* SET WITHOUT OIDS */
 
@@ -4677,7 +4678,7 @@ 

Re: [HACKERS] pg_dump quietly ignore missing tables - is it bug?

2015-07-22 Thread Kyotaro HORIGUCHI
Sorry for the bogus on bogus.

At Thu, 23 Jul 2015 14:22:59 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
horiguchi.kyot...@lab.ntt.co.jp wrote in 
20150723.142259.200902861.horiguchi.kyot...@lab.ntt.co.jp
 Hello,
 
   2015-07-19 20:54 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:
   I am sending updated version. It implements new long option
   --strict-names. If this option is used, then for any entered name
   (without any wildcard char) must be found least one object. This option 
   has
   not impact on patters (has wildcards chars).
 
 I share the same option with Tom that it should behave in same
 way regardless of the appearance of wildcards.
 
 You replied Tom as this
 
  the behave is same - only one real identifier is allowed
 
 But the description above about this patch apparently says that
 they are differently handled. And
 expand_(schema|table)_name_patterns does the further differently
 thing from both of Tom's suggestion and what mentioned in your
 reply to that. I will mention for this topic again in this mail.
 
 # Might only one real ident is allowed mean at most one
 # match, but not exactly one match?
 
 They do like this when strict-names.
 
   - Not allow no match for non-wildcarded names.
 
   - Allow no match for any wildcarded name spec and finally
 allowing *all* of them don't match anyting.
 
 This looks to me quite confusing.
 
When this option is not used,
   then behave is 100% same as before (with same numbers of SQL queries for 
   -t
   option). It is based on Oleksandr's documentation (and lightly modified
   philosophy), and cleaned my previous patch. A test on wildchard existency
   strcspn(cell-val, ?*) cannot be used, because it doesn't calculate
   quotes (but a replace has few lines only).
 
 The new name processSQLName looks a bit
 bogus. processSQLNameIntenral would be a name commonly seen in
 such cases.

The correct name is not that but processSQLNamePatternInternal.

   There is a possibility to remove a wildcard char test and require least
   one entry for patters too. But I am thinking, when somebody explicitly 
   uses
   any wildcard, then he calculate with a possibility of empty result.
 
 Why do you think so? Wild cards are usually used to glob multiple
 names at once. One or more matches are expected for many or
 perhaps most cases, I think. Since so, if someone anticipate that
 some of his patterns have no match, I think he shouldn't specify
 --strict-names option at all.
 
 Furthermore, I don't think no one wants to use both wildcarded
 and non-wildcarded name specs at once but this is a little out of
 scope.
 
 I'd like to have opinions from others about this point.
 
   other variant is using --strict-names behave as default (and implement
   negative option like --disable-strict-names or some similar).
 
 This contradicts Josh's request. (which I'm not totally agree:p)
 
  Note: originally I though, we have to fix it and change the default behave.
  But with special option, we don't need it. This option in help is signal
  for user, so some is risky.
 
 regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] postgres_fdw extension support

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 2:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jul 21, 2015 at 2:27 PM, Andres Freund and...@anarazel.de wrote:
 But I'm not going to complain too loudly if we don't do invalidation.

 Not doing invalidation seems silly to me.  But I don't want to bend
 Paul too far around the axle, either.

 Just to be clear here: the case we are concerned about is, given that
 we have determined that function X is or is not a member of one of the
 extensions marked shippable for a given connection, is it likely that
 that status will change (while the function continues to exist with
 the same OID) during the lifespan of the connection?  If it did change,
 how critical would it be for us to honor the new shippability criterion
 on that connection?  My answer to both is not very.  So I'm not excited
 about expending lots of code or cycles to check for such changes.

 If we were trying to cache things across more than a connection lifespan,
 the answer might be different.

We've had a few cases like this at EnterpriseDB where we've not done
invalidations in situations like this, and customers have reported
them as bugs.  We've also had cases where PostgreSQL doesn't do this
that have been reported to EnterpriseDB as bugs:

http://www.postgresql.org/message-id/ca+tgmoydf7dkxhktk7u_ynafst47sgk5j8gd8c1jfsiouu1...@mail.gmail.com

If you know what's happening, these kinds of problems are often no big
deal: you just reconnect and it starts working.  The problem is that
customers often DON'T know what is happening, leading to a lot of
frustration and head-banging.  Oh, let me see if reconnecting fixes
it is just not something that tends to occur to people, at least IME.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MultiXact member wraparound protections are now enabled

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 4:11 PM, Peter Eisentraut pete...@gmx.net wrote:
 Why is this message logged by default in a fresh installation?  The
 technicality of that message doesn't seem to match the kinds of messages
 that we normally print at startup.

It seems nobody likes that message.

I did it that way because I wanted to provide an easy way for users to
know whether they had those protections enabled.  If you don't display
the message when things are already OK at startup, users have to make
a negative inference, like this: let's see, I'm on a version that is
new enough that it would have printed a message if the protections had
not been enabled, so the absence of the message must mean things are
OK.

But it seemed to me that this could be rather confusing.  I thought it
would be better to be explicit about whether the protections are
enabled in all cases.  That way, (1) if you see the message saying
they are enabled, they are enabled; (2) if you see the message saying
they are disabled, they are disabled; and (3) if you see neither
message, your version does not have those protections.

You are not the first person to dislike this, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard

2015-07-22 Thread Gurjeet Singh
On Wed, Jul 22, 2015 at 7:34 AM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Jul 21, 2015 at 8:34 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  On Wed, Jul 22, 2015 at 1:23 AM, Robert Haas robertmh...@gmail.com
 wrote:
  Notice that the collation specifier is gone.  Oops.
 
  As it is not possible to specify directly a constraint for a PRIMARY
  KEY expression, what about switching dumpConstraint to have it use
  first a CREATE INDEX query with the collation and then use ALTER TABLE
  to attach the constraint to it? I am noticing that we already fetch
  the index definition in indxinfo via pg_get_indexdef. Thoughts?

 I guess the questions on my mind is Why is it that you can't do this
 directly when creating a primary key, but you can do it when turning
 an existing index into a primary key?

 If there's a good reason for prohibiting doing this when creating a
 primary key, then presumably it shouldn't be allowed when turning an
 index into a primary key either.  If there's not, then why not extend
 the PRIMARY KEY syntax to allow it?


+1. I think in the short term we can treat this as a bug, and fix the bug
by disallowing an index with attributes that cannot be present in an index
created by a PRIMARY KEY constraint. The collation attribute on one of the
keys may be just one of many such attributes.

In the long term, we may want to allow collations in primary keys, but that
will be a feature ideally suited for a major version release.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/


Re: [HACKERS] ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard

2015-07-22 Thread Gurjeet Singh
On Tue, Jul 21, 2015 at 9:23 AM, Robert Haas robertmh...@gmail.com wrote:

  rhaas=# create unique index on foo (a collate "C");
 CREATE INDEX
 rhaas=# alter table foo add primary key using index foo_a_idx;
 ALTER TABLE



 Now dump and restore this database.  Then:



 Notice that the collation specifier is gone.  Oops.


The intent of the 'USING INDEX' feature was to work around the problem that
PRIMARY KEY indexes cannot be reindexed concurrently to reduce bloat.

Considering that the feature doesn't work [1] (at least as shown in the example
in the docs [2]) if there are any foreign keys referencing the table,
there's scope for improvement.

There was a proposal to support reindexing the primary key index, but I am
not sure where that stands. If that's already in, or soon to be in core, we
may put a deprecation notice around USING INDEX. OTOH, if reindexing PKeys
is too difficult, we may want to support index replacement via a top-level
ALTER TABLE subcommand that works in CONCURRENT mode, and not recommend the
method shown in the docs (i.e. deprecate the USING INDEX syntax).

[1]: The DROP CONSTRAINT part of the command fails if there are any FKeys
pointing to this table.
[2]: http://www.postgresql.org/docs/9.4/static/sql-altertable.html
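
For reference, the index-swap recipe from the docs referenced in [2] is roughly the
following (object names as in that example):

    CREATE UNIQUE INDEX CONCURRENTLY dist_id_temp_idx ON distributors (dist_id);
    ALTER TABLE distributors
        DROP CONSTRAINT distributors_pkey,
        ADD CONSTRAINT distributors_pkey PRIMARY KEY USING INDEX dist_id_temp_idx;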

-- 
Gurjeet Singh http://gurjeet.singh.im/


Re: [HACKERS] TABLESAMPLE patch is really in pretty sad shape

2015-07-22 Thread Tom Lane
I wrote:
 We could alternatively provide two scan-initialization callbacks,
 one that analyzes the parameters before we do heap_beginscan,
 and another that can do additional setup afterwards.  Actually,
 that second call would really not be meaningfully different from
 the ReScan call, so another solution would be to just automatically
 invoke ReScan after we've created the HeapScanDesc.  TSMs could work
 around not having this by complicating their NextBlock function a bit
 (so that it would do some initialization on first call) but perhaps
 it would be easier to have the additional init call.

Actually, there's another reason why this has to be redesigned: evaluating
the parameter expressions of TABLESAMPLE during ExecInitSampleScan is
not OK.  Consider this:

regression=# select * from (values (0::float4),(100)) v(pct), lateral (select 
count(*) from tenk1 tablesample bernoulli (pct)) ss;
 pct | count 
-+---
   0 | 0
 100 | 0
(2 rows)

Surely the second execution of the subquery should've given me 1.
It doesn't because the TABLESAMPLE parameters aren't recomputed during a
rescan.  Even if you're minded to claim that that's all right, this is
definitely not acceptable:

regression=# select * from (values (0::float4),(100)) v(pct), lateral (select * 
from tenk1 tablesample bernoulli (pct)) ss;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

That one's crashing because pct is a Var that refers to an outer scan
tuple that doesn't exist yet.  I'm not real sure how come the case with
an aggregate manages not to crash, actually.

This needs to work more like LIMIT, which doesn't try to compute the
limit parameters until the first fetch.  So what we need is an Init
function that does very darn little indeed (maybe we don't even need
it at all), and then a ParamInspect function that is called at first fetch
or during a ReScan, and that one is the one that gets to look at the
evaluated parameter values.

If we wanted to let the method inspect the HeapScanDesc during setup
it would need still a third callback.  I'm not exactly sure if that's
worth the trouble.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add CINE for ALTER TABLE ... ADD COLUMN

2015-07-22 Thread Fabrízio de Royes Mello
On Thu, Jul 16, 2015 at 10:36 PM, Michael Paquier michael.paqu...@gmail.com
wrote:

 I had a look at this patch, and here are some minor comments:
 1) In alter_table.sgml, you need a space here:
 [ IF NOT EXISTS ]replaceable

Fixed.


 2)
 +   check_for_column_name_collision(targetrelation, newattname,
false);
 (void) needs to be added in front of check_for_column_name_collision
 where its return value is not checked or static code analyzers are
 surely going to complain.

Fixed.


 3) Something minor, some lines of codes exceed 80 characters, see
 declaration of check_for_column_name_collision for example...

Fixed.


 4) This comment needs more precisions?
 /* new name should not already exist */
 -   check_for_column_name_collision(rel, colDef-colname);
 +   if (!check_for_column_name_collision(rel, colDef-colname,
 if_not_exists))
 The new name can actually exist if if_not_exists is true.


Improved the comment.


 Except that the implementation looks sane to me.


Thank you for the review.

Att,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 207fec1..5a71c3c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -36,7 +36,7 @@ ALTER TABLE ALL IN TABLESPACE replaceable class=PARAMETERname/replaceable
 
 phrasewhere replaceable class=PARAMETERaction/replaceable is one of:/phrase
 
-ADD [ COLUMN ] replaceable class=PARAMETERcolumn_name/replaceable replaceable class=PARAMETERdata_type/replaceable [ COLLATE replaceable class=PARAMETERcollation/replaceable ] [ replaceable class=PARAMETERcolumn_constraint/replaceable [ ... ] ]
+ADD [ COLUMN ] [ IF NOT EXISTS ] replaceable class=PARAMETERcolumn_name/replaceable replaceable class=PARAMETERdata_type/replaceable [ COLLATE replaceable class=PARAMETERcollation/replaceable ] [ replaceable class=PARAMETERcolumn_constraint/replaceable [ ... ] ]
 DROP [ COLUMN ] [ IF EXISTS ] replaceable class=PARAMETERcolumn_name/replaceable [ RESTRICT | CASCADE ]
 ALTER [ COLUMN ] replaceable class=PARAMETERcolumn_name/replaceable [ SET DATA ] TYPE replaceable class=PARAMETERdata_type/replaceable [ COLLATE replaceable class=PARAMETERcollation/replaceable ] [ USING replaceable class=PARAMETERexpression/replaceable ]
 ALTER [ COLUMN ] replaceable class=PARAMETERcolumn_name/replaceable SET DEFAULT replaceable class=PARAMETERexpression/replaceable
@@ -96,11 +96,12 @@ ALTER TABLE ALL IN TABLESPACE replaceable class=PARAMETERname/replaceable
 
   variablelist
varlistentry
-termliteralADD COLUMN/literal/term
+termliteralADD COLUMN [ IF NOT EXISTS ]/literal/term
 listitem
  para
   This form adds a new column to the table, using the same syntax as
-  xref linkend=SQL-CREATETABLE.
+  xref linkend=SQL-CREATETABLE. If literalIF NOT EXISTS/literal
+  is specified and the column already exists, no error is thrown.
  /para
 /listitem
/varlistentry
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1c7eded..05fbe51 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -328,8 +328,9 @@ static void ATPrepAddColumn(List **wqueue, Relation rel, bool recurse, bool recu
 bool is_view, AlterTableCmd *cmd, LOCKMODE lockmode);
 static ObjectAddress ATExecAddColumn(List **wqueue, AlteredTableInfo *tab,
 Relation rel, ColumnDef *colDef, bool isOid,
-bool recurse, bool recursing, LOCKMODE lockmode);
-static void check_for_column_name_collision(Relation rel, const char *colname);
+bool recurse, bool recursing, bool if_not_exists, LOCKMODE lockmode);
+static bool check_for_column_name_collision(Relation rel, const char *colname,
+bool if_not_exists);
 static void add_column_datatype_dependency(Oid relid, int32 attnum, Oid typid);
 static void add_column_collation_dependency(Oid relid, int32 attnum, Oid collid);
 static void ATPrepAddOids(List **wqueue, Relation rel, bool recurse,
@@ -2304,7 +2305,7 @@ renameatt_internal(Oid myrelid,
 		oldattname)));
 
 	/* new name should not already exist */
-	check_for_column_name_collision(targetrelation, newattname);
+	(void) check_for_column_name_collision(targetrelation, newattname, false);
 
 	/* apply the update */
 	namestrcpy((attform-attname), newattname);
@@ -3455,11 +3456,11 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		case AT_AddColumnToView:		/* add column via CREATE OR REPLACE
 		 * VIEW */
 			address = ATExecAddColumn(wqueue, tab, rel, (ColumnDef *) cmd-def,
-	  false, false, false, lockmode);
+	  false, false, false, false, lockmode);
 			break;
 		case AT_AddColumnRecurse:
 			address = 

Re: [HACKERS] ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard

2015-07-22 Thread Michael Paquier
On Thu, Jul 23, 2015 at 5:47 AM, Gurjeet Singh gurj...@singh.im wrote:
 On Wed, Jul 22, 2015 at 7:34 AM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Jul 21, 2015 at 8:34 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  On Wed, Jul 22, 2015 at 1:23 AM, Robert Haas robertmh...@gmail.com
  wrote:
  Notice that the collation specifier is gone.  Oops.
 
  As it is not possible to specify directly a constraint for a PRIMARY
  KEY expression, what about switching dumpConstraint to have it use
  first a CREATE INDEX query with the collation and then use ALTER TABLE
  to attach the constraint to it? I am noticing that we already fetch
  the index definition in indxinfo via pg_get_indexdef. Thoughts?

 I guess the questions on my mind is Why is it that you can't do this
 directly when creating a primary key, but you can do it when turning
 an existing index into a primary key?

Sure. This does not seem to be complicated.

 If there's a good reason for prohibiting doing this when creating a
 primary key, then presumably it shouldn't be allowed when turning an
 index into a primary key either.  If there's not, then why not extend
 the PRIMARY KEY syntax to allow it?

Yeah, I think we should be able to define a collation in this case.
For example, it is also possible to pass a WITH clause with storage
parameters, though we do not document it in
table_constraint_using_index
(http://www.postgresql.org/docs/devel/static/sql-altertable.html).

 +1. I think in the short term we can treat this as a bug, and fix the bug
 by disallowing an index with attributes that cannot be present in an index
 created by PRIMARY KEY constraint. The collation attribute on one of the
 keys may be just one of many such attributes.

Er, pushing such a patch to the back-branches may break applications that
do exactly what Robert did in his test case (using a unique index as a
primary key with a collation), no? I am not sure that this is
acceptable. Taking the problem at its root by switching pg_dump to
define an index and then use it looks like a more solid approach for the back
branches.

 In the long term, we may want to allow collation in primary key, but that
 will be a feature ideally suited for a major version release.

Yep. Definitely.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?

2015-07-22 Thread Kouhei Kaigai
 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
 Sent: Wednesday, July 22, 2015 11:19 PM
 To: Kaigai Kouhei(海外 浩平)
 Cc: Etsuro Fujita; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ 
 testing,
 doesn't it?
 
 On Wed, Jul 22, 2015 at 8:13 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
  No, what I'm concerned about is the case when scanrelid > 0.
 
  Hmm. if scanrelid > 0, then fdw_scan_tlist should be NIL.
  I want to put Assert(scanrelid==0 || fdw_scan_tlist == NIL) just after
  the GetForeignPlan() in createplan.c.
 
  I'm curious why you tried to put valid fdw_scan_tlist for scanrelid > 0.
  It's unusual.
 
 Allowing that was part of the point of Tom Lane's commit
 1a8a4e5cde2b7755e11bde2ea7897bd650622d3e.  See the second bullet
 point, after the comma.

Indeed, this commit allows ForeignScan to have fdw_scan_tlist even if
scanrelid > 0; however, I'm uncertain about its reason/intention.
Is it a preparation for the upcoming target-list pushdown?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2015-07-22 Thread Ildus Kurbangaliev

On 07/22/2015 09:10 AM, Kyotaro HORIGUCHI wrote:

Hello,

At Tue, 21 Jul 2015 14:28:25 +0300, Ildus Kurbangaliev 
i.kurbangal...@postgrespro.ru wrote in 55ae2cd9.4050...@postgrespro.ru

On 07/21/2015 01:18 PM, Andres Freund wrote:

On 2015-07-21 13:11:36 +0300, Ildus Kurbangaliev wrote:

 /*
* Top-level transactions are identified by VirtualTransactionIDs
* comprising
diff --git a/src/include/storage/lwlock.h
b/src/include/storage/lwlock.h
index cff3b99..55b0687 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -58,6 +58,9 @@ typedef struct LWLock
   #ifdef LOCK_DEBUG
struct PGPROC *owner;   /* last exlusive owner of the lock */
   #endif
+
+ /* LWLock group, initialized as -1, calculated in first acquire */
+   int group;
   } LWLock;

I'd very much like to avoid increasing the size of struct LWLock. We
have a lot of those and I'd still like to inline them with the buffer
descriptors. Why do we need a separate group and can't reuse the
tranche?  That might require creating a few more tranches, but ...?

Andres

Do you mean moving LWLocks defined by offsets and with dynamic sizes
to separate tranches?

I think it is too much for the purpose. Only two new tranches and
maybe one or some new members (maybe representing the group) of
tranches will do, I suppose.


Can you explain why only two new tranches?
There are 13 types of lwlocks (besides the individual ones), and we need to separate 
them somehow.

It sounds like a good option, but it will require refactoring of
the current tranches. In the current implementation
I tried not to change current things.

Now one of the most controversial points of this patch is the
details of the implementation, which largely affects the performance drag,
maybe.


 From the viewpoint of performance, I have some comments on the feature:

  - LWLockReportStat runs a linear search loop which I suppose
should be avoided even if the loop count is rather small for
LWLocks, as Fujii-san said upthread or anywhere.

It runs only once, at the first acquisition. In the previous patch
it was much heavier. Anyway, this code will be removed if we
split the main tranche into smaller tranches.

  - Currently pg_stat_activity view is the only API, which would
be a bit heavy for sampling use. It'd be appreciated to have a
far lighter means to know the same information.

Yes, pg_stat_activity is just information about current wait,
and it's too heavy for sampling.  Main goal of this patch was
creating base structures that can be used later.

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 8:24 AM, Simon Riggs si...@2ndquadrant.com wrote:
 * An estimate of the estimated time of completion - I liked your view that
 this prediction may be costly to request

I'm saying it may be massively unreliable, not that it may be costly.
(Someone else may have said that it would be costly, but I don't think
it was me.)

 Most of the progress estimators I have seen over the ~30 years that
 I've been playing with computers have been unreliable, and many of
 those have been unreliable to the point of being annoying.  I think
 that's likely to happen with what you are proposing too, though of
 course like all predictions of the future it could turn out to be
 wrong.

 Almost like an Optimizer then. Important, often annoyingly wrong, needs more
 work.

Yes, but with an important difference.  If the optimizer mis-estimates
the row count by 3x or 10x or 1000x, but the plan is OK anyway, it's
often the case that no one cares.  Except when the plan is bad, people
don't really care about the method used to derive it.  The same is not
true here: people will rely on the progress estimates directly, and
they will really care if they are not right.

 I'm not proposing this feature, I'm merely asking for it to be defined in a
 way that makes it work for more than just VACUUM. Once we have a way of
 reporting useful information, other processes can be made to follow that
 mechanism, like REINDEX, ALTER TABLE etc.. I believe those things are
 important, even if we never get such information for user queries. But I
 hope we do.

 I won't get in the way of your search for detailed information in more
 complex forms. Both things are needed.

OK.

One idea I have is to create a system where we expose a command tag
(i.e. VACUUM) plus a series of generic fields whose specific meanings
are dependent on the command tag.  Say, 6 bigint counters, 6 float8
counters, and 3 strings up to 80 characters each.  So we have a
fixed-size chunk of shared memory per backend, and each backend that
wants to expose progress information can fill in those fields however
it likes, and we expose the results.

This would be sorta like the way pg_statistic works: the same columns
can be used for different purposes depending on what estimator will be
used to access them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 8:13 AM, Kouhei Kaigai kai...@ak.jp.nec.com wrote:
 No, what I'm concerned about is the case when scanrelid > 0.

 Hmm. if scanrelid > 0, then fdw_scan_tlist should be NIL.
 I want to put Assert(scanrelid==0 || fdw_scan_tlist == NIL) just after
 the GetForeignPlan() in createplan.c.

 I'm curious why you tried to put valid fdw_scan_tlist for scanrelid > 0.
 It's unusual.

Allowing that was part of the point of Tom Lane's commit
1a8a4e5cde2b7755e11bde2ea7897bd650622d3e.  See the second bullet
point, after the comma.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pg_upgrade fails when postgres/template1 isn't in default tablespace

2015-07-22 Thread Marti Raudsepp
On Wed, Jul 22, 2015 at 5:00 PM, Robert Haas robertmh...@gmail.com wrote:
 +1.  I would recommend adding it to the CF *immediately* to have it
 get attention.   The CF app is basically our patch tracker.

Thanks, I have done so now: https://commitfest.postgresql.org/6/313/

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implementation of global temporary tables?

2015-07-22 Thread Robert Haas
On Wed, Jul 15, 2015 at 11:52 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 15 July 2015 at 16:44, Andres Freund and...@anarazel.de wrote:
 On 2015-07-15 16:36:12 +0100, Simon Riggs wrote:
  On 15 July 2015 at 16:28, Andres Freund and...@anarazel.de wrote:
   I think that's generally a fair point. But here we're discussing to
   add
   a fair amount of wrinkles with the copy approach. The fact alone that
   the oid is different will have some ugly consequences.
  
 
  Why? We are creating a local temp table LIKE the global temp table. That
  is
  already a supported operation. So there is no different oid.

 Then your locking against ALTER, DROP etc. isn't going to work.

 There would be two objects, both locked. The temp table is just nice and
 simple. No problem.

 Your optimization may work; I hope it does. My approach definitely will. So
 we could choose either.

It's not really an optimization; it's a whole different approach.  I
looked at the create-a-temp-table-on-the-fly idea back when I
implemented unlogged tables and concluded it was an unworkable mess.
Deep down in the guts of name resolution code is not the place where
you want to suddenly decide that you need to run some DDL.  So I
believe in what Andres is proposing.  I'm not necessarily going to
shout it down if somebody finds a way to make the
temp-table-on-the-fly approach work, but my view is that making that
work, although it may look superficially appealing, will eventually
make whoever has to do it hate their life; and that even if they get
it to where it sorta works, it's going to have ugly corner cases that
are almost impossible to file down.

Another advantage of Andres's approach, BTW, is that it could
potentially eventually be extended to work on Hot Standby machines.
For that to work, we'd need a separate XID space for temporary tables,
but Noah proposed that before, and I don't think it's a completely
crazy idea (just mostly crazy).  Now, maybe nobody's going to care
about that any more in 5 years if we have full-blown logical
replication deeply integrated into core, but there's a lot to like
about a design that keeps our options in that area open.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stringify MAKE_SQLSTATE()

2015-07-22 Thread Teodor Sigaev

#define MAKE_SQLSTATE(ch1,ch2,ch3,ch4,ch5) \
((char[]){(char)(ch1),(char)(ch2),(char)(ch3),(char)(ch4),(char)(ch5),(char)'\0'})


I'm pretty sure that's a gcc-ism, not standard C.
Hmm, after some digging: the feature is called compound literals and it was 
introduced in C99, although gcc also supports it in C90 mode. To disable this support 
you need the flag -pedantic. Sorry.



The real question is do we want to.  What's your use-case for this?
Just to avoid hardcoded values. It is too easy to make a mistake in a 
hardcoded value.





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Alpha2/Beta1

2015-07-22 Thread Robert Haas
On Wed, Jul 22, 2015 at 2:50 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 On 2015/07/22 12:40, Noah Misch wrote:
 I vote for alpha2.  Comparing the Open Issues and resolved after
 9.5alpha1
 sections of https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items,
 we've
 not transitioned to a qualitatively different level of API stability.

 +1 for alpha2.

+1 for alpha2 from me as well.  We have made some progress but there
is an awful lot of stuff left to get fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE .. ADD PRIMARY KEY .. USING INDEX has dump-restore hazard

2015-07-22 Thread Robert Haas
On Tue, Jul 21, 2015 at 8:34 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Wed, Jul 22, 2015 at 1:23 AM, Robert Haas robertmh...@gmail.com wrote:
 Notice that the collation specifier is gone.  Oops.

 As it is not possible to specify directly a constraint for a PRIMARY
 KEY expression, what about switching dumpConstraint to have it use
 first a CREATE INDEX query with the collation and then use ALTER TABLE
 to attach the constraint to it? I am noticing that we already fetch
 the index definition in indxinfo via pg_get_indexdef. Thoughts?

I guess the questions on my mind is Why is it that you can't do this
directly when creating a primary key, but you can do it when turning
an existing index into a primary key?

If there's a good reason for prohibiting doing this when creating a
primary key, then presumably it shouldn't be allowed when turning an
index into a primary key either.  If there's not, then why not extend
the PRIMARY KEY syntax to allow it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers