Re: [HACKERS] Combining hash values

2016-08-01 Thread Greg Stark
Surely combining multiple hashes is the same problem as hashing a block of
memory? Shouldn't we just use the same algorithm as hash_any()?

I was originally going to suggest using a CRC to combine, but IIRC we
changed hash_any() a while back and decided against using CRC. I don't know
if that was wise, but I wouldn't want to suggest relitigating that.
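
(For concreteness, a hedged sketch of what "use the same algorithm as
hash_any()" could mean in practice: feed the individual hash values to
hash_any() as one block of memory. combine_hashes is a hypothetical helper,
not an existing function.)

#include "postgres.h"
#include "access/hash.h"

/*
 * Illustrative only: combine N 32-bit hash values by hashing them as a
 * contiguous block with the existing hash_any() routine, instead of an
 * ad-hoc rotate/xor formula.
 */
static uint32
combine_hashes(const uint32 *hashes, int nhashes)
{
	return DatumGetUInt32(hash_any((const unsigned char *) hashes,
								   nhashes * (int) sizeof(uint32)));
}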


Re: [HACKERS] Slowness of extended protocol

2016-08-01 Thread Shay Rojansky
Greg wrote:
> I think you're looking at this the wrong way around. 30% of what?
> You're doing these simple read-only selects on a database that
> obviously is entirely in RAM. If you do the math on the numbers you
> gave above the simple protocol took 678 microseconds per transaction
> and the extended protocol took 876 microseconds. The difference is 198
> microseconds. I'm not sure exactly where those 200us are going and
> perhaps it could be lower but in what real-world query is it going to
> have a measurable impact on the total time?

That's a valid question, but as Andres said, while it may not matter for most
workloads, I think such a significant difference would matter to some...
Keep in mind that I'm writing from the point of view of a driver developer,
and not of a specific app - I know there are users out there executing against a
local database and trying to get extremely high throughput, for RAM-resident
queries or otherwise.

> The other danger in unrealistic test cases is that you're probably
> measuring work that doesn't scale and in fact optimizing based on it
> could impose a cost that *does* scale. For example if 150us of that
> time is being spent in the prepare and we cut that down by a factor of
> 10 to 15us then it would be only a 10% penalty over the simple
> protocol in your test. But if that optimization added any overhead to
> the execute stage then when it's executed thousands of times that
> could add milliseconds to the total runtime.

I think it's a bit too early to say that... We're not discussing any
proposed optimizations yet, just discussing what may or may not be a
problem... Of course any proposed optimization would have to be carefully
studied to make sure it doesn't cause performance degradation elsewhere.

Andres wrote:
> Shay, are you using unnamed or named portals? There's already a shortcut
> path for the former in some places.

The benchmarks I posted are simply pgbench doing SELECT 1 with extended vs.
simple, so I'm assuming unnamed portals. Npgsql itself only uses the
unnamed portal; I think it's documented somewhere that this is better for
performance.

Tom wrote:
> In hindsight it seems clear that what a lot of apps want out of extended
> protocol is only the ability to send parameter values out-of-line instead
> of having to quote/escape them into SQL literals.  Maybe an idea for the
> fabled V4 protocol update is some compromise query type that corresponds
> precisely to PQexecParams's feature set: you can send parameter values
> out-of-line, and you can specify text or binary results, but there's no
> notion of any persistent state being created and no feedback about
> parameter data types.

That seems like a good way forward. It may be possible to generalize this
into a more "pay-per-play" protocol. You currently have a binary choice
between a simple but fast protocol supporting very little, and an extended
but slow protocol supporting everything. Making things more "pick and
choose" could help here: if you want to actually use plan reuse, you pay
for that. If you actually send parameters, you pay for that. It would be a
pretty significant protocol change but it would make things more modular
that way.
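
(For reference, a minimal libpq sketch of the PQexecParams feature set Tom
describes - parameter values sent out-of-line, text or binary results -
assuming an already-established connection; the query and parameter value are
placeholders.)

#include <stdio.h>
#include <libpq-fe.h>

/* One parameter sent out-of-line as text; results requested in text format. */
static void
run_example(PGconn *conn)
{
	const char *paramValues[1] = {"42"};
	PGresult   *res;

	res = PQexecParams(conn,
					   "SELECT $1::int + 1",
					   1,			/* one parameter */
					   NULL,		/* let the server infer parameter types */
					   paramValues,
					   NULL,		/* text-format params need no lengths */
					   NULL,		/* all params in text format */
					   0);			/* 0 = text results, 1 = binary */

	if (PQresultStatus(res) == PGRES_TUPLES_OK)
		printf("%s\n", PQgetvalue(res, 0, 0));
	PQclear(res);
}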

>  I think the tie-in to the plan cache is a
> significant part of the added overhead, and so is the fact that we have to
> iterate the per-message loop in PostgresMain five times not once, with
> overheads like updating the process title incurred several times in that.

I was thinking that something like that may be the cause. Is it worth
looking into the loop and trying to optimize? For example, updating the
process title doesn't seem to make sense for every single extended
message...


Re: [HACKERS] Changed SRF in targetlist handling

2016-08-01 Thread Andres Freund
On 2016-05-25 16:55:23 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-05-25 15:20:03 -0400, Tom Lane wrote:
> >> We could certainly make a variant behavior in nodeFunctionscan.c that
> >> emulates that, if we feel that being exactly bug-compatible on the point
> >> is actually what we want.  I'm dubious about that though, not least
> >> because I don't think *anyone* actually believes that that behavior isn't
> >> broken.  Did you read my upthread message suggesting assorted compromise
> >> choices?
>
> > You mean 
> > https://www.postgresql.org/message-id/21076.1464034...@sss.pgh.pa.us ?
> > If so, yes.
>
> > If we go with rewriting this into LATERAL, I'd vote for 2.5 (trailed by
> > option 1), that'd keep most of the functionality, and would break
> > visibly rather than invisibly in the cases where not.
>
> 2.5 would be okay with me.
>
> > I guess you're not planning to work on this?
>
> Well, not right now, as it's clearly too late for 9.6.  I might hack on
> it later if nobody beats me to it.

FWIW, as it's blocking my plans for executor related rework (expression
evaluation, batch processing) I started to hack on this.

I've an implementation that

1) turns all targetlist SRFs (tSRFs from now on) into ROWS FROM
   expressions. If there are tSRFs in the arguments of a tSRF, those become
   a separate, lateral, ROWS FROM expression.

2) If grouping/window functions are present, the entire query is wrapped
   in a subquery RTE, except for the set-returning function. All
   referenced Var|Aggref|GroupingFunc|WindowFunc|Param nodes in the
   original targetlist are made to reference that subquery, which gets a
   TargetEntry for them.

3) If sortClause does *not* reference any tSRFs, the sorting is evaluated
   in a subquery, to preserve the output ordering of SRFs in queries
   like
   SELECT id, generate_series(1,3) FROM (VALUES(1),(2)) d(id) ORDER BY id DESC;
   If, in contrast, sortClause does reference the tSRF output, it's
   evaluated in the outer query.

This seems to generally work, and allows us to remove considerable amounts
of code.

So far I have one problem without an easy solution: Historically queries
like
=# SELECT id, generate_series(1,2) FROM (VALUES(1),(2)) few(id);
┌────┬─────────────────┐
│ id │ generate_series │
├────┼─────────────────┤
│  1 │               1 │
│  1 │               2 │
│  2 │               1 │
│  2 │               2 │
└────┴─────────────────┘
have preserved the SRF output ordering. But by turning the SRF into a
ROWS FROM, there's no guarantee that the cross join between "few" and
generate_series(1,2) above is implemented in that order. I.e. we can get
something like
┌────┬─────────────────┐
│ id │ generate_series │
├────┼─────────────────┤
│  1 │               1 │
│  2 │               1 │
│  1 │               2 │
│  2 │               2 │
└────┴─────────────────┘
because it's implemented as
┌───────────────────────────────────────────────────────────────────────────────┐
│                                   QUERY PLAN                                    │
├───────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop  (cost=0.00..35.03 rows=2000 width=8)                               │
│   ->  Function Scan on generate_series  (cost=0.00..10.00 rows=1000 width=4)    │
│   ->  Materialize  (cost=0.00..0.04 rows=2 width=4)                             │
│         ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)         │
└───────────────────────────────────────────────────────────────────────────────┘

Right now I see no easy and nice-ish way to constrain that.


Besides that, I'm structurally wondering whether turning the original
query into a subquery is the right thing to do. It requires some kind of
ugly munging of Query->*, and has the above problem. One alternative
would be to instead perform the necessary magic in grouping_planner(),
by "manually" adding nestloop joins before/after create_ordered_paths()
(depending on SRFs being referenced in the sort clause).  That'd create
plans we'd not have created so far, by layering NestLoop and
FunctionScan nodes above the normal query - that'd allow us to easily
force the ordering of SRF evaluation.


If we go the subquery route, I'm wondering about where to tackle the
restructuring. So far I'm doing it very early in subquery_planner() -
that way the aggregation/sorting/... behaviour is easier to handle.
Perhaps doing it in standard_planner() itself would be better though.
An alternative approach would be to do this during parse-analysis, but I
think that might end up being confusing, because stored rules would
suddenly have a noticeably different structure, and it'd tie us a lot
more to the details of that transformation than I'd like.


Besides the removal of the least-common-multiple behaviour of tSRF queries,
there are some other consequences of using function scans:
Previously, if a tSRF was never evaluated, it didn't cause the number of
rows to be increased. E.g.
SELECT id, C

Re: [HACKERS] pg_basebackup wish list

2016-08-01 Thread Fujii Masao
On Fri, Jul 29, 2016 at 11:01 PM, Amit Kapila  wrote:
> On Thu, Jul 28, 2016 at 7:34 PM, Fujii Masao  wrote:
>> On Thu, Jul 28, 2016 at 10:16 PM, Amit Kapila  
>> wrote:
>>
>>> I think there is some value in providing
>>> .tar for -Z 0,
>>
>> I was thinking that "-Ft -Z0" is something like an alias of "-Ft".
>> That is, the backup is taken in uncompressed tar format.
>>
>>> however in that case how should we define usage of -F p
>>> -Z 0?  Shall we say with plain format -Z 0 gets ignored or throw error
>>> or do something else?  If first, then I think it is better to mention
>>> the same in docs.
>>
>> ISTM that it's better to ignore that case, like pg_dump -Ft -Z0
>> doesn't throw an error.
>>
>
> Okay, then you can go ahead with your patch.

Thanks for the comment! I pushed the patch.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-01 Thread Aleksander Alekseev
Hello.

Thanks everyone for great comments!

> Well, jokes aside, that's a pretty lousy excuse for not writing any 
> docs

I think maybe I put it the wrong way. There are currently a lot of
comments in the code, more than enough to understand how this feature
works. What I meant is that this is not the final version of the patch and
a few paragraphs are yet to be written. At least that's how I see it. If
you believe that some parts of the code are currently hard to understand
and some comments could be improved, please point them out and I will be happy
to fix them.

> IMHO if this patch gets in, we should use it as the only temp table 
> implementation (Or can you think of cases where keeping rows in
> pg_class has advantages?). That'd also eliminate the need for FAST
> keyword in the CREATE TABLE command.

> Probably has zero value to have slow and fast temp tables (from
> catalogue cost perspective). So the FAST implementation should be used
> everywhere.

If there are no objections I see no reason not to do it in the next
version of the patch.

> I'm getting failures in the regression suite

I've run the regression suite some 10 times in a row in different
environments with different build flags but didn't manage to reproduce
it. Also, our DBAs have been testing this feature for weeks now on real-world
applications and they haven't reported anything like this. Could you
please describe how to reproduce this issue?

> This should to work on slaves - it is one of ToDo

Glad you noticed! In fact I'm currently researching the possibility of
using the same approach for creating writable temporary tables on
replicas.

> The key reason why I don't think that's negotiable is that if there
> aren't (apparently) catalog entries corresponding to the temp tables,
> that will almost certainly break many things in the backend and
> third-party extensions, not only user code patterns like this one.

> In short, I think that the way to make something like this work is to
> figure out how to have "virtual" catalog rows describing a temp table.

I'm afraid once again I put it the wrong way. What I meant by
"Information about these tables is stored not in a catalog but in the
backend's memory" is in fact that _records_ of pg_class, pg_type and
other catalog relations are stored in-memory. Naturally these records
are visible to the user (otherwise \d or \d+ would not work) and you
can do queries like ` select * from pg_class where relname = 'tt1' `.
In other words, part of the catalog is indeed "virtual".

> I didn't see support for memory store for column's statistics. Some
> separate questions is about production statistics -
> pg_stat_user_table, ..

> That seems to work (both analyze and pg_stat_user_tables). Not sure 
> where it's in the code, and I'm not willing to reverse engineer it.

Right, `ANALYZE temp_table;` and everything else works. Besides
pg_class, pg_type, pg_attribute and other relations, pg_statistic
records for temp tables are stored in-memory as well. IIRC a lot of
pg_stat* relations are in fact views and thus don't require any special
support. If you see that some statistics are broken, please don't
hesitate to report it and I will fix it.

Hope I answered all the questions so far. I look forward to receiving more
comments and questions regarding this patch!

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-01 Thread Etsuro Fujita

On 2016/07/29 13:05, Etsuro Fujita wrote:

In a foreign-join case,
however, we can't see such relations from the EXPLAIN printed *by core*.
 postgres_fdw avoids this issue by adding such relations to the EXPLAIN
using ExplainForeignScan as shown in the below example, but since such
relations are essential, I think that information should be shown by
core itself.

postgres=# explain select * from (select ft1.a from ft1 left join ft2 on
ft1.a = ft2.a where ft1.b = 1) ss1(a) full join (select ft3.a from ft3
left join ft4 on ft3.a = ft4.a where ft3.b = 1) ss2(a) on ss1.a = ss2.a;
                              QUERY PLAN
------------------------------------------------------------------
 Hash Full Join  (cost=202.06..204.12 rows=1 width=8)
   Hash Cond: (ft1.a = ft3.a)
   ->  Foreign Scan  (cost=100.00..102.05 rows=1 width=4)
 Relations: (public.ft1) LEFT JOIN (public.ft2)
   ->  Hash  (cost=102.05..102.05 rows=1 width=4)
 ->  Foreign Scan  (cost=100.00..102.05 rows=1 width=4)
   Relations: (public.ft3) LEFT JOIN (public.ft4)
(7 rows)

From the Relations line shown by postgres_fdw, we can see which foreign
join joins which foreign tables, but without such lines, we couldn't.


I thought about the Relations line a bit more and noticed that there are  
cases where the table reference names for joining relations in the  
Relations line are printed incorrectly.  Here is an example:


postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1,  
ft2 t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2  
where t1.a = t2.a) as t(t1a, t2a);

                                   QUERY PLAN
------------------------------------------------------------------------------
 Unique  (cost=204.12..204.13 rows=2 width=8)
   Output: t1.a, t2.a
   ->  Sort  (cost=204.12..204.12 rows=2 width=8)
 Output: t1.a, t2.a
 Sort Key: t1.a, t2.a
 ->  Append  (cost=100.00..204.11 rows=2 width=8)
   ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
 Output: t1.a, t2.a
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1  
INNER JOIN public.t2 r2 ON (((r1.a = r2.a

   ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
 Output: t1_1.a, t2_1.a
 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
 Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1  
INNER JOIN public.t2 r2 ON (((r1.a = r2.a

(14 rows)

The table reference names for ft1 and ft2 in the Relations line for the  
second Foreign Scan should be t1_1 and t2_1 respectively.


Another concern about the Relations line is that it represents just an
internal representation of a pushed-down join, so it would not
necessarily match the deparsed query shown in the Remote SQL line.  Here
is an example, which I found when working on supporting pushing down
full outer joins a lot more, by improving the deparsing logic so that
postgres_fdw can build a remote query that involves subqueries [1],
which I'll work on for 10.0:


+ -- full outer join with restrictions on the joining relations
+ EXPLAIN (COSTS false, VERBOSE)
+ SELECT t1.c1, t2.c1 FROM (SELECT c1 FROM ft4 WHERE c1 BETWEEN 50 AND  
60) t1 FULL JOIN (SELECT c1 FROM ft5 WHERE c1 BETWEEN 50 AND 60) t2 ON  
(t1.c1 = t2.c1) ORDER BY t1.c1, t2.c1;
+                                   QUERY PLAN
+ -----------------------------------------------------------------------------
+  Foreign Scan
+Output: ft4.c1, ft5.c1
+Relations: (public.ft4) FULL JOIN (public.ft5)
+Remote SQL: SELECT ss1.c1, ss2.c1 FROM ((SELECT c1 FROM "S 1"."T 3"  
WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss1(c1) FULL JOIN (SELECT c1 FROM  
"S 1"."T 4" WHERE ((c1 >= 50)) AND ((c1 <= 60))) ss2(c1) ON (((ss1.c1 =  
ss2.c1 ORDER BY ss1.c1 ASC NULLS LAST, ss2.c1 ASC NULLS LAST

+ (4 rows)

"(public.ft4) FULL JOIN (public.ft5)" in the Relations line does not  
exactly match the deparsed query in the Remote SQL line, which I think  
would be rather confusing for users.  (We may be able to print more  
exact information in the Relations line so as to match the depaserd  
query, but I think that that would make the Relations line redundant.)


Would we really need the Relations line?  If joining relations are  
printed by core like "Foreign Join on public.ft1 t1_1, public.ft2 t2_1"  
as proposed upthread, we can see those relations from that, not the  
Relations line.  Also we can see the join tree structure from the  
deparsed query in the Remote SQL line.  The Relations line seems to be  
not that useful.

Re: [HACKERS] Combining hash values

2016-08-01 Thread Dean Rasheed
On 1 August 2016 at 08:19, Greg Stark  wrote:
> Surely combining multiple hashes is the same problem as hashing a block of
> memory? Shouldn't we just use the same algorithm as hash_any()?
>

Yes, I imagine that should work well. I suspect that Thomas's proposal
would also probably work well, as would a number of other hashing
algorithms with reasonable pedigree, such as the one used for array
hashing. I don't have any particular preference, but I do know that
what usually turns out to be disastrous is an arbitrary made-up
formula like rotating and xor'ing. The last thing we should attempt to
do is invent our own hashing algorithms.

On that subject, while looking at hashfunc.c, I spotted that
hashint8() has a very obvious deficiency, which causes disastrous
performance with certain inputs:

create table foo (a bigint);
insert into foo select (x*2^32)+x from generate_series(1,1000000) g(x);
analyse foo;
select * from foo f1, foo f2 where f1.a=f2.a;

With random values that query (using a hash join) takes around 2
seconds on my machine, but with the values above I estimate it will
take 20 hours (although I didn't wait that long!). The reason is
pretty obvious -- all the bigint values above hash to the same value.
I'd suggest using hash_uint32() for values that fit in a 32-bit
integer and hash_any() otherwise.
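
(A hedged sketch, for illustration only, of what that suggestion might look
like in hashfunc.c, assuming the existing hash_uint32() and hash_any() entry
points; this is not a committed or proposed patch.)

#include "postgres.h"
#include "fmgr.h"
#include "access/hash.h"

Datum
hashint8(PG_FUNCTION_ARGS)
{
	int64		val = PG_GETARG_INT64(0);

	/*
	 * Sketch only: values that fit in int32 keep the hash_uint32() result
	 * (and thus stay compatible with hashint2/hashint4 for logically equal
	 * inputs); wider values hash all 64 bits so the high half contributes.
	 */
	if (val >= PG_INT32_MIN && val <= PG_INT32_MAX)
		return hash_uint32((uint32) val);

	return hash_any((unsigned char *) &val, sizeof(val));
}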

Regards,
Dean




Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-01 Thread Etsuro Fujita

On 2016/07/29 13:28, Ashutosh Bapat wrote:

I wrote:

Probably something like this:

   Foreign Processing
 Remote Operations: ...

In the Remote Operations line, the FDW/extension could print
any info
about remote operations, eg, "Scan/Join + Aggregate".


I wrote:

I intentionally chose that word and thought we could leave detailed
descriptions about remote operations to the FDW/extension; a broader
word like "Processing" seems to work well because we allow various
kinds of operations to the remote side, in addition to scans/joins,
to be performed in that one Foreign Scan node indicated by "Foreign
Processing", such as aggregation, window functions, distinct, order
by, row locking, table modification, or combinations of them.



"Scan" is a better word than "Processing". From plan's perspective it's
ultimately a Scan (on the data produced by the foreign server) and not
processing.


Exactly, but one thing I'm concerned about is the table modification 
case; the EXPLAIN output would be something like this:


  Foreign Scan
Remote SQL: INSERT INTO remote_table VALUES ...

That would be strange, so I think a broader word is better.  I 
don't think "Foreign Processing" is best.  "Foreign Upper" might be much 
better because the corresponding path is created by calling 
GetForeignUpperPaths.


Also for a Foreign Scan representing a foreign join, I think "Foreign 
Join" is better than "Foreign Scan".  Here is an example:


  Foreign Join on foreign_table1, foreign_table2
Remote SQL: SELECT ... FROM remote_table1 INNER JOIN remote_table2 
WHERE ...


I think "Foreign Join" better indicates that foreign tables listed after 
that (ie, foreign_table1 and foreign_table2 in the example) are joining 
(or component) tables of this join, than "Foreign Scan".


Best regards,
Etsuro Fujita






Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-01 Thread Ashutosh Bapat
> I thought about the Relations line a bit more and noticed that there are
> cases where the table reference names for joining relations in the
> Relations line are printed incorrectly.  Here is an example:
>
> postgres=# explain verbose select * from (select t1.a, t2.a from ft1 t1,
> ft2 t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1, ft2 t2 where
> t1.a = t2.a) as t(t1a, t2a);
>  QUERY PLAN
>
> 
>  Unique  (cost=204.12..204.13 rows=2 width=8)
>Output: t1.a, t2.a
>->  Sort  (cost=204.12..204.12 rows=2 width=8)
>  Output: t1.a, t2.a
>  Sort Key: t1.a, t2.a
>  ->  Append  (cost=100.00..204.11 rows=2 width=8)
>->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
>  Output: t1.a, t2.a
>  Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
>  Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1
> INNER JOIN public.t2 r2 ON (((r1.a = r2.a
>->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
>  Output: t1_1.a, t2_1.a
>  Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
>  Remote SQL: SELECT r1.a, r2.a FROM (public.t1 r1
> INNER JOIN public.t2 r2 ON (((r1.a = r2.a
> (14 rows)
>
> The table reference names for ft1 and ft2 in the Relations line for the
> second Foreign Scan should be t1_1 and t2_1 respectively.
>

The Relations line prints the names of the foreign tables that are being joined
and the type of join. I find t1_1 and t2_1 more confusing, since the query that
the user has provided does not mention t1_1 and t2_1.

>
>
> Would we really need the Relations line?  If joining relations are printed
> by core like "Foreign Join on public.ft1 t1_1, public.ft2 t2_1" as proposed
> upthread, we can see those relations from that, not the Relations line.


The join type is missing in that description.


> Also we can see the join tree structure from the deparsed query in the
> Remote SQL line.


The remote SQL has the names of the tables on the foreign server. It does
not help to identify the local names.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


[HACKERS] pg_size_pretty, SHOW, and spaces

2016-08-01 Thread Christoph Berg
Re: Bruce Momjian 2016-07-30 <20160730181643.gd22...@momjian.us>
> I also just applied a doc patch that increases case and spacing
> consistency in the use of kB/MB/GB/TB.

Hi,

PostgreSQL uses spaces inconsistently, though. pg_size_pretty uses spaces:

# select pg_size_pretty((2^20)::bigint);
 pg_size_pretty
────────────────
 1024 kB

SHOW does not:

# show work_mem;
 work_mem
──
 1MB

The SHOW output is formatted by _ShowOption() using 'INT64_FORMAT "%s"',
via convert_from_base_unit(). The latter has a comment attached...
/*
 * Convert a value in some base unit to a human-friendly unit.  The output
 * unit is chosen so that it's the greatest unit that can represent the value
 * without loss.  For example, if the base unit is GUC_UNIT_KB, 1024 is
 * converted to 1 MB, but 1025 is represented as 1025 kB.
 */
... where the spaces are present again.
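
(To illustrate the rule quoted above, here is a hedged, self-contained sketch
of the "greatest unit that can represent the value without loss" idea - not
the actual guc.c code - which also shows where the space would go:)

#include <stdio.h>

int
main(void)
{
	long		value = 1025;	/* base unit: kB, as in GUC_UNIT_KB */
	const char *units[] = {"kB", "MB", "GB", "TB"};
	int			u = 0;

	/* Climb to larger units only while no precision is lost. */
	while (u < 3 && value % 1024 == 0)
	{
		value /= 1024;
		u++;
	}
	/* Prints "1025 kB"; a starting value of 1024 would print "1 MB". */
	printf("%ld %s\n", value, units[u]);
	return 0;
}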

General typesetting standard seems to be "1 MB", i.e. to include a
space between value and unit. (This would also be my preference.)

Opinions? (I'd opt to insert spaces in the docs now, and then see if
inserting a space in the SHOW output is acceptable for 10.0.)

Christoph




Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-01 Thread Etsuro Fujita

On 2016/08/01 20:31, Ashutosh Bapat wrote:

I thought about the Relations line a bit more and noticed that there
are cases where the table reference names for joining relations in
the Relations line are printed incorrectly.  Here is an example:

postgres=# explain verbose select * from (select t1.a, t2.a from ft1
t1, ft2 t2 where t1.a = t2.a union select t1.a, t2.a from ft1 t1,
ft2 t2 where t1.a = t2.a) as t(t1a, t2a);
 QUERY PLAN


 Unique  (cost=204.12..204.13 rows=2 width=8)
   Output: t1.a, t2.a
   ->  Sort  (cost=204.12..204.12 rows=2 width=8)
 Output: t1.a, t2.a
 Sort Key: t1.a, t2.a
 ->  Append  (cost=100.00..204.11 rows=2 width=8)
   ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
 Output: t1.a, t2.a
 Relations: (public.ft1 t1) INNER JOIN
(public.ft2 t2)
 Remote SQL: SELECT r1.a, r2.a FROM (public.t1
r1 INNER JOIN public.t2 r2 ON (((r1.a = r2.a
   ->  Foreign Scan  (cost=100.00..102.04 rows=1 width=8)
 Output: t1_1.a, t2_1.a
 Relations: (public.ft1 t1) INNER JOIN
(public.ft2 t2)
 Remote SQL: SELECT r1.a, r2.a FROM (public.t1
r1 INNER JOIN public.t2 r2 ON (((r1.a = r2.a
(14 rows)

The table reference names for ft1 and ft2 in the Relations line for
the second Foreign Scan should be t1_1 and t2_1 respectively.



Relations line prints the names of foreign tables that are being joined
and the type of join. I find t1_1 and t2_1 more confusing since the
query that user has provided does not mention t1_1 and t2_1.


Please look at the Output line for the second Foreign Scan in the 
EXPLAIN.  (The reason for that is that postgres_fdw gets table 
reference names directly from rte->eref->aliasname, while EXPLAIN gets 
those through select_rtable_names_for_explain.)



Would we really need the Relations line?  If joining relations are
printed by core like "Foreign Join on public.ft1 t1_1, public.ft2
t2_1" as proposed upthread, we can see those relations from that,
not the Relations line.



The join type is missing in that description.



Also we can see the join tree structure from the deparsed query in
the Remote SQL line.



The remote SQL has the names of the table on the foreign server. It does
not help to identify the local names.


We can see the remote names of the foreign tables from the catalogs, so 
we could easily identify the local names of the foreign tables in the remote 
SQL, and thus the join type (or join tree structure), from the SQL.


Best regards,
Etsuro Fujita






[HACKERS] Confusing docs about GetForeignUpperPaths in fdwhandler.sgml

2016-08-01 Thread Etsuro Fujita

Hi,

I noticed that the following note about direct modification via  
GetForeignUpperPaths in fdwhandler.sgml is a bit confusing.  We have  
another approach using PlanDirectModify, so that should be reflected in  
the note as well.  Please find attached a patch.


 PlanForeignModify and the other callbacks described in
  are designed around the  
assumption

 that the foreign relation will be scanned in the usual way and then
 individual row updates will be driven by a local  
ModifyTable

 plan node.  This approach is necessary for the general case where an
 update requires reading local tables as well as foreign tables.
 However, if the operation could be executed entirely by the foreign
 server, the FDW could generate a path representing that and insert it
 into the UPPERREL_FINAL upper relation, where it would
 compete against the ModifyTable approach.

Best regards,
Etsuro Fujita


fdw-query-planning.patch



Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-01 Thread Tomas Vondra

On 08/01/2016 11:45 AM, Aleksander Alekseev wrote:

Hello.

Thanks everyone for great comments!


Well, jokes aside, that's a pretty lousy excuse for not writing any
docs


I think maybe I put it in a wrong way. There are currently a lot of
comments in a code, more then enough to understand how this feature
works. What I meant is that this is not a final version of a patch and
a few paragraphs are yet to be written. At least it's how I see it. If
you believe that some parts of the code are currently hard to understand
and some comments could be improved, please name it and I will be happy
to fix it.


I don't think there's "a lot of comments in the code", not even 
remotely. At least not in the files I looked into - heapam, indexam, 
xact etc. There are a few comments in general, and most of them only 
comment obvious facts, like "ignore in-memory tuples" right before a 
trivial if statement.


What is needed is an overview of the approach, so that the reviewers can 
read that first, instead of assembling the knowledge from pieces 
scattered across comments in many places. But I see that fasttab.c contains 
this:


/* TODO TODO comment the general idea - in-memory tuples and indexes, 
hooks principle, FasttabSnapshots, etc */


The other thing that needs to happen is you need to modify comments in 
front of some of the modified methods - e.g. the comments may need a 
paragraph "But when the table is fast temporary, what happens is ..."





IMHO if this patch gets in, we should use it as the only temp table
implementation (Or can you think of cases where keeping rows in
pg_class has advantages?). That'd also eliminate the need for FAST
keyword in the CREATE TABLE command.



Probably has zero value to have slow and fast temp tables (from
catalogue cost perspective). So the FAST implementation should be used
everywhere.


If there are no objections I see no reason no to do it in a next
version of a patch.


I believe there will be a lot of discussion about this.




I'm getting failures in the regression suite


I've run regression suite like 10 times in a row in different
environments with different build flags but didn't manage to reproduce
it. Also our DBAs are testing this feature for weeks now on real-world
applications and they didn't report anything like this. Could you
please describe how to reproduce this issue?



Nothing special:

$ ./configure --prefix=/home/user/pg-temporary --enable-debug \
  --enable-cassert

$ make -s clean && make -s -j4 install

$ export PATH=/home/user/pg-temporary/bin:$PATH

$ pg_ctl -D ~/tmp/data-temporary init

$ pg_ctl -D ~/tmp/data-temporary -l ~/temporary.log start

$ make installcheck

I get the failures every time - regression diff attached. The first 
failure in "rolenames" is expected, because of a clash with an existing user 
name. The remaining two failures are not.


I only get the failure for "installcheck" but not "check" for some reason.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
*** /home/user/work/postgres/src/test/regress/expected/rolenames.out
2016-07-18 20:48:39.421444507 +0200
--- /home/user/work/postgres/src/test/regress/results/rolenames.out 
2016-08-01 15:07:04.59700 +0200
***
*** 42,47 
--- 42,48 
  CREATE ROLE "current_user";
  CREATE ROLE "session_user";
  CREATE ROLE "user";
+ ERROR:  role "user" already exists
  CREATE ROLE current_user; -- error
  ERROR:  CURRENT_USER cannot be used as a role name here
  LINE 1: CREATE ROLE current_user;
***
*** 956,958 
--- 957,960 
  DROP OWNED BY regress_testrol0, "Public", "current_user", regress_testrol1, 
regress_testrol2, regress_testrolx CASCADE;
  DROP ROLE regress_testrol0, regress_testrol1, regress_testrol2, 
regress_testrolx;
  DROP ROLE "Public", "None", "current_user", "session_user", "user";
+ ERROR:  current user cannot be dropped

==




Re: [HACKERS] Why we lost Uber as a user

2016-08-01 Thread Hannu Krosing
On 07/27/2016 12:07 AM, Tom Lane wrote:
>
>> 4. Now, update that small table 500 times per second.
>> That's a recipe for runaway table bloat; VACUUM can't do much because
>> there's always some minutes-old transaction hanging around (and SNAPSHOT
>> TOO OLD doesn't really help, we're talking about minutes here), and
>> because of all of the indexes HOT isn't effective.
> Hm, I'm not following why this is a disaster.  OK, you have circa 100%
> turnover of the table in the lifespan of the slower transactions, but I'd
> still expect vacuuming to be able to hold the bloat to some small integer
> multiple of the minimum possible table size.  (And if the table is small,
> that's still small.)  I suppose really long transactions (pg_dump?) could
> be pretty disastrous, but there are ways around that, like doing pg_dump
> on a slave.
Is there any theoretical obstacle which would make it impossible to
teach VACUUM not to hold back the whole vacuum horizon, but just
to leave a single transaction alone in case of a long-running
REPEATABLE READ transaction ?

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic Ltd





Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-01 Thread Kouhei Kaigai
> -Original Message-
> From: Etsuro Fujita [mailto:fujita.ets...@lab.ntt.co.jp]
> Sent: Monday, August 01, 2016 8:26 PM
> To: Ashutosh Bapat
> Cc: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown 
> plans
> 
> On 2016/07/29 13:28, Ashutosh Bapat wrote:
> 
> I wrote:
> > Probably something like this:
> >
> >Foreign Processing
> >  Remote Operations: ...
> >
> > In the Remote Operations line, the FDW/extension could print
> > any info
> > about remote operations, eg, "Scan/Join + Aggregate".
> 
> I wrote:
> > I intentionally chose that word and thought we could leave detailed
> > descriptions about remote operations to the FDW/extension; a broader
> > word like "Processing" seems to work well because we allow various
> > kinds of operations to the remote side, in addition to scans/joins,
> > to be performed in that one Foreign Scan node indicated by "Foreign
> > Processing", such as aggregation, window functions, distinct, order
> > by, row locking, table modification, or combinations of them.
> 
> > "Scan" is a better word than "Processing". From plan's perspective it's
> > ultimately a Scan (on the data produced by the foreign server) and not
> > processing.
> 
> Exactly, but one thing I'm concerned about is the table modification
> case; the EXPLAIN output would be something like this:
> 
>Foreign Scan
>  Remote SQL: INSERT INTO remote_table VALUES ...
> 
> That would be strange, so I think a more broader word is better.  I
> don't think "Foreign Processing" is best.  "Foreign Upper" might be much
> better because the corresponding path is created by calling
> GetForeignUpperPaths.
>
Be "Foreign %s", and gives "Insert" label by postgres_fdw; which knows
the ForeignScan node actually insert tuples.
From the standpoint of users, it looks "Foreign Insert".

> Also for a Foreign Scan representing a foreign join, I think "Foreign
> Join" is better than "Foreign Scan".  Here is an example:
> 
>Foreign Join on foreign_table1, foreign_table2
>  Remote SQL: SELECT ... FROM remote_table1 INNER JOIN remote_table2
> WHERE ...
> 
> I think "Foreign Join" better indicates that foreign tables listed after
> that (ie, foreign_table1 and foreign_table2 in the example) are joining
> (or component) tables of this join, than "Foreign Scan".
>
Postgres_fdw knows it is a remote join. It is quite easy to tell the core that
this ForeignScan node is a "Join". Also, it knows which tables are involved.
We can add hint information to control the relation names to be printed.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] Why we lost Uber as a user

2016-08-01 Thread Stephen Frost
* Hannu Krosing (hkros...@gmail.com) wrote:
> On 07/27/2016 12:07 AM, Tom Lane wrote:
> >
> >> 4. Now, update that small table 500 times per second.
> >> That's a recipe for runaway table bloat; VACUUM can't do much because
> >> there's always some minutes-old transaction hanging around (and SNAPSHOT
> >> TOO OLD doesn't really help, we're talking about minutes here), and
> >> because of all of the indexes HOT isn't effective.
> > Hm, I'm not following why this is a disaster.  OK, you have circa 100%
> > turnover of the table in the lifespan of the slower transactions, but I'd
> > still expect vacuuming to be able to hold the bloat to some small integer
> > multiple of the minimum possible table size.  (And if the table is small,
> > that's still small.)  I suppose really long transactions (pg_dump?) could
> > be pretty disastrous, but there are ways around that, like doing pg_dump
> > on a slave.
> Is there any theoretical obstacle which would make it impossible to
> teach VACUUM not to hold back the whole vacuum horizon, but just
> to leave a single transaction alone in case of a long-running
> REPEATABLE READ transaction ?

I've looked into this a couple of times and I believe it's possible to
calculate what records have to remain available for the long-running
transaction, but it's far from trivial.

I do think that's a direction which we really need to go in, however.
Having a single horizon which is dictated by the oldest running
transaction isn't a tenable solution in environments with a lot of
churn.

Thanks!

Stephen




Re: [HACKERS] Why we lost Uber as a user

2016-08-01 Thread Tom Lane
Stephen Frost  writes:
> * Hannu Krosing (hkros...@gmail.com) wrote:
>> Is there any theoretical obstacle which would make it impossible to
>> teach VACUUM not to hold back the whole vacuum horizon, but just
>> to leave a single transaction alone in case of a long-running
>> REPEATABLE READ transaction ?

> I've looked into this a couple of times and I believe it's possible to
> calculate what records have to remain available for the long-running
> transaction, but it's far from trivial.

I think it'd become a lot easier if we went over to representing snapshots
as LSN positions (and, concomitantly, had an inexpensive way to translate
XIDs to their commit LSNs).  That would mean that

(1) a backend's snapshot state could be fully exposed in PGPROC, at least
up to some small number of active snapshots;

(2) it'd be fairly cheap for VACUUM to detect that a dead tuple's XMIN and
XMAX are either both before or both after each live snapshot.
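
(A hedged illustration of point (2), under the assumptions above - snapshots
represented as LSNs, with commit LSNs cheaply available for XMIN and XMAX; the
function name and exact form of the test are hypothetical:)

#include "postgres.h"
#include "access/xlogdefs.h"

static bool
dead_tuple_invisible_to_snapshot(XLogRecPtr xmin_commit_lsn,
								 XLogRecPtr xmax_commit_lsn,
								 XLogRecPtr snapshot_lsn)
{
	/*
	 * Mirrors the wording above: the dead tuple can be ignored for this
	 * snapshot if its creation and deletion both committed before the
	 * snapshot was taken, or both after it.
	 */
	return (xmin_commit_lsn <= snapshot_lsn && xmax_commit_lsn <= snapshot_lsn) ||
		   (xmin_commit_lsn > snapshot_lsn && xmax_commit_lsn > snapshot_lsn);
}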

Someone (Heikki, I think) has been working on this but I've not seen
any patch yet.

regards, tom lane




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-01 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Yeah.  I'm not very impressed by the underlying assumption that it's
> >> okay for client-side code to hard-wire knowledge about what indoption
> >> bits mean, but not okay for it to hard-wire knowledge about which index
> >> AMs use which indoption bits.  There's something fundamentally wrong
> >> in that.  We don't let psql or pg_dump look directly at indoption, so
> >> why would we think that third-party client-side code should do so?
> 
> > For my 2c, I'd like to see pg_dump able to use the catalog tables to
> > derive the index definition, just as they manage to figure out table
> > definitions without (for the most part) using functions.  More
> > generally, I believe we should be working to reach a point where we can
> > reconstruct all objects in the database using just the catalog, without
> > any SQL bits being provided from special functions which access
> > information that isn't available at the SQL level.
> 
> No, I reject that entirely.  It would be insane for example to expect that
> random client-side code should be able to interpret the node trees stored
> in places like pg_index.indexprs.  It's barely possible that we could
> maintain such logic in pg_dump, though having to maintain a different
> version for each supported server branch would be a giant PITA.  But do
> you also want to maintain translated-into-Java copies of each of those
> libraries for the benefit of JDBC?  Or any other language that client
> code might be written in?

Honestly, I anticipated the focus on the pg_get_expr() and should have
explicitly commented on it.  I agree that we shouldn't look to have
pg_dump or client utilities be able to understand node trees and that,
instead, we should continue to provide a way for those to be
reconstructed into SQL expressions.

> Now, obviously knowing which bit in pg_index.indoption does what would be
> a few orders of magnitude less of a maintenance hazard than knowing what
> expression node trees contain.  But that doesn't make it a good
> future-proof thing for clients to be doing.  If the answer to the question
> "why do you need access to pg_am.amcanorder?" is "so I can interpret the
> bits in pg_index.indoption", I think it's clear that we've got an
> abstraction failure that is not going to be fixed by just exposing
> something equivalent to the old pg_am definition.

I agree- asking clients to interpret the bits in pg_index.indoption
isn't the right answer either.

> Building on the has-property approach Andrew suggested, I wonder if
> we need something like pg_index_column_has_property(indexoid, colno,
> propertyname) with properties like "sortable", "desc", "nulls first".

Right, this makes sense to me.  The point which I was trying to get at
above is that we should be able to replace most of what is provided in
pg_get_indexdef() by using this function to rebuild the CREATE INDEX
command- again, similar to how we build a CREATE TABLE command rather
than simply provide a 'pg_get_tabledef()'.

Thanks!

Stephen




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-01 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Building on the has-property approach Andrew suggested, I wonder if
>> we need something like pg_index_column_has_property(indexoid, colno,
>> propertyname) with properties like "sortable", "desc", "nulls first".

> Right, this makes sense to me.  The point which I was trying to get at
> above is that we should be able to replace most of what is provided in
> pg_get_indexdef() by using this function to rebuild the CREATE INDEX
> command- again, similar to how we build a CREATE TABLE command rather
> than simply provide a 'pg_get_tabledef()'.

Uh, what would be the point?  You're just replacing a call to one backend
function with calls to N backend functions, and creating version
dependencies and opportunities for errors of omission on the client side.
(That is, there's exactly no chance that the set of functions you'd need
to call to construct a CREATE INDEX command would stay static forever.
We keep adding new index features.)

As far as I understood Andrew's use case, he was specifically *not*
interested in a complete representation of an index definition, but
rather about whether it had certain properties that would be of
interest to query-constructing applications.

regards, tom lane




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-01 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Building on the has-property approach Andrew suggested, I wonder if
> >> we need something like pg_index_column_has_property(indexoid, colno,
> >> propertyname) with properties like "sortable", "desc", "nulls first".
> 
> > Right, this makes sense to me.  The point which I was trying to get at
> > above is that we should be able to replace most of what is provided in
> > pg_get_indexdef() by using this function to rebuild the CREATE INDEX
> > command- again, similar to how we build a CREATE TABLE command rather
> > than simply provide a 'pg_get_tabledef()'.
> 
> Uh, what would be the point?  You're just replacing a call to one backend
> function with calls to N backend functions, and creating version
> dependencies and opportunities for errors of omission on the client side.
> (That is, there's exactly no chance that the set of functions you'd need
> to call to construct a CREATE INDEX command would stay static forever.
> We keep adding new index features.)

We also keep adding table-level options, and that is hardly a
reason to argue that we shouldn't provide the information through the
catalog for a client-side application to rebuild a table, as pg_dump
does.

> As far as I understood Andrew's use case, he was specifically *not*
> interested in a complete representation of an index definition, but
> rather about whether it had certain properties that would be of
> interest to query-constructing applications.

I'm not convinced that the two are actually different.  As we add new
index features, query-constructing applications may be interested in
those new features and therefore we should be exposing that information.
If we were using a capabilities function to build up the CREATE INDEX
command in pg_dump, we never would have ended up in the situation which
we find ourselves now- specifically, that we've removed information that
applications were using.

Consider the RLS case.  If we were using some hypothetical
pg_get_tabledef() in pg_dump, and that function handled everything
about building the table definition, we might not have considered how to
expose the policy information for RLS and could have stored things like
"what command is this policy for?" as an opaque column that clients
wouldn't easily understand.  That would have been unfortunate, as there
are clients which are definitely interested in the policies that have
been defined on tables, for auditing purposes.

In other words, for my 2c, pg_dump provides a great definition of what
we should provide in the way of database introspection and we should try
to minimize the cases where we're providing special server-side
functions that pg_dump needs to perform its role.  That this information
is needed by client applications and we don't provide an easy way to
programmatically access it demonstrates how pg_get_indexdef() really went
too far in the direction of the server handing opaque SQL commands for
the client to run to recreate the object, without giving the client any
understanding of the definition of the object.

Thanks!

Stephen




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-08-01 Thread David G. Johnston
On Mon, Aug 1, 2016 at 10:19 AM, Tom Lane  wrote:

> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Building on the has-property approach Andrew suggested, I wonder if
> >> we need something like pg_index_column_has_property(indexoid, colno,
> >> propertyname) with properties like "sortable", "desc", "nulls first".
>
> > Right, this makes sense to me.  The point which I was trying to get at
> > above is that we should be able to replace most of what is provided in
> > pg_get_indexdef() by using this function to rebuild the CREATE INDEX
> > command- again, similar to how we build a CREATE TABLE command rather
> > than simply provide a 'pg_get_tabledef()'.
>
> As far as I understood Andrew's use case, he was specifically *not*
> interested in a complete representation of an index definition, but
> rather about whether it had certain properties that would be of
> interest to query-constructing applications.
>
>
+1

I guess it might matter whether the query being constructed is CREATE INDEX or
SELECT.

The latter seems to be more useful.  The former should probably be something
the server provides as a whole when requested.

David J.


Re: [HACKERS] [PATCH v12] GSSAPI encryption support

2016-08-01 Thread Stephen Frost
Robbie, all,

* Robbie Harwood (rharw...@redhat.com) wrote:
> Michael Paquier  writes:
> > On Wed, Jul 27, 2016 at 12:22 AM, Robbie Harwood  
> > wrote:
> >> Michael Paquier  writes:
> >>
> >> So there's a connection setting `sslmode` that we'll want something
> >> similar to here (`gssapimode` or so).  `sslmode` has six settings, but I
> >> think we only need three for GSSAPI: "disable", "allow", and "prefer"
> >> (which presumably would be the default).
> >
> > Seeing the debate regarding sslmode these days, I would not say that
> > "prefer" would be the default, but that's an implementation detail.
> >
> >> Lets suppose we're working with "prefer".  GSSAPI will itself check two
> >> places for credentials: client keytab and ccache.  But if we don't find
> >> credentials there, we as the client have two options on how to proceed.
> >>
> >> - First, we could prompt for a password (and then call
> >>   gss_acquire_cred_with_password() to get credentials), presumably with
> >>   an empty password meaning to skip GSSAPI.  My memory is that the
> >>   current behavior for GSSAPI auth-only is to prompt for password if we
> >>   don't find credentials (and if it isn't, there's no reason not to
> >>   unless we're opposed to handling the password).
> >>
> >> - Second, we could skip GSSAPI and proceed with the next connection
> >>   method.  This might be confusing if the user is then prompted for a
> >>   password and expects it to be for GSSAPI, but we could probably make
> >>   it sensible.  I think I prefer the first option.
> >
> > Ah, right. I completely forgot that GSSAPI had its own handling of
> > passwords for users registered to it...
> >
> > Isn't this distinction a good point for not implementing "prefer",
> > "allow" or any equivalents? By that I mean that we should not have any
> > GSS connection mode that fallbacks to something else if the first one
> > fails. So we would live with the two following modes:
> > - "disable", to only try a non-GSS connection
> > - "enable", or "require", to only try a GSS connection.
> > That seems quite acceptable to me as a first implementation to just
> > have that.
> 
> If it is the password management that is scary here, we could have a
> prefer-type mode which does not prompt, but only uses existing
> credentials.  Or we could opt to never prompt, which is totally valid.

For my 2c, at least, I'd like the "prefer" option when it comes to
encryption, where we try to use encryption if we're doing GSSAPI
authentication.  I'm not a big fan of having the GSSAPI layer doing
password prompts, but as long as the *only* thing that does is go
through the Kerberos library to acquire the tickets, and it still completely
talks GSSAPI with the server, it seems reasonable.

If we end up having to fall back to a non-encrypted GSSAPI
authenticated session then we should make noise about that.  If the
authentication is handled through GSSAPI but the connection is not
encrypted, but the user is told that immediately (eg: in the psql
startup), it seems like there's relatively little sensitive information
which has been exposed at that point and the user could destroy the
session then, if they're concerned about it.

Of course, that only works for user sessions, such as with psql, and
doesn't help with application connections, but hopefully application
authors who are using GSSAPI will read the docs sufficiently to know
that they should require an encrypted connection, if their environment
demands it.

Thanks!

Stephen




Re: [HACKERS] Broken order-of-operations in parallel query latch manipulation

2016-08-01 Thread Tom Lane
Amit Kapila  writes:
> On Mon, Aug 1, 2016 at 1:58 AM, Tom Lane  wrote:
>> I believe this is wrong and the CHECK_FOR_INTERRUPTS needs to be before
>> or after the two latch operations.  As-is, if the reason somebody set
>> our latch was to get us to notice that a CHECK_FOR_INTERRUPTS condition
>> happened, there's a race condition where we'd fail to realize that.

> I could see that in nodeGather.c, it might fail to notice the SetLatch
> done by worker process or spuriously woken up due to SetLatch for some
> unrelated reason.  However, I don't see what problem it can cause
> apart from one extra loop cycle where it will try to process the tuple
> when actually there is no tuple in the queue.

Consider the following sequence of events:

1. gather_readnext reaches the WaitLatch, and is allowed to pass through
it for some unrelated reason, perhaps some long-since-handled SIGUSR1
from a worker process.

2. gather_readnext does CHECK_FOR_INTERRUPTS(), and sees nothing pending.

3. A SIGINT is received.  StatementCancelHandler sets QueryCancelPending
and does SetLatch(MyLatch).

4. gather_readnext does ResetLatch(MyLatch).

5. gather_readnext runs through its loop again, finds nothing to do, and
reaches the WaitLatch.

6. The process is now sleeping on its latch, and might sit there a long
time before noticing the pending query cancel.

Obviously the window for this race condition is pretty tight --- there's
not many instructions between steps 2 and 4.  But it can happen.  If
memory serves, we've had actual field reports for race condition bugs
where the window that was being hit wasn't much more than a single
instruction.

Also, it's entirely possible that the bug could be masked, if there was
another CHECK_FOR_INTERRUPTS lurking anywhere in the code called within
the loop.  That doesn't excuse this coding practice, though.

BTW, now that I look at it, CHECK_FOR_INTERRUPTS subsumes
HandleParallelMessages(), which means the direct call to the latter
at the top of gather_readnext's loop is pretty bogus.  I now think
the right fix in gather_readnext is to move the CHECK_FOR_INTERRUPTS
macro to the top of the loop, replacing that call.  The places in
shm_mq.c that have this issue should probably look like
ProcWaitForSignal, though.
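
(For illustration, a hedged sketch of the safe ordering in a latch wait loop;
the work-available test is hypothetical, and the WaitLatch() call is shown in
its 9.6-era three-argument form with the relevant headers assumed.)

	for (;;)
	{
		/*
		 * Check for interrupts first.  An interrupt arriving after this
		 * point also sets our latch, and the latch is not reset again until
		 * after WaitLatch() returns, so it cannot be missed: either the next
		 * check sees it, or WaitLatch() returns immediately.
		 */
		CHECK_FOR_INTERRUPTS();

		if (tuple_available())	/* hypothetical per-caller work check */
			break;

		/* Nothing to do, so wait for the latch, then clear it and retry. */
		WaitLatch(MyLatch, WL_LATCH_SET, 0);
		ResetLatch(MyLatch);
	}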

regards, tom lane




Re: [HACKERS] Combining hash values

2016-08-01 Thread Tom Lane
Greg Stark  writes:
> I was originally going to suggest using a crc  to combine but iirc we
> changed hash_any() a while back and decided against using crc. I don't know
> if that was wise but wouldn't want to suggest relitigating that.

Nah, CRCs are designed to solve a different problem, ie detecting
single-bit and burst errors with high probability.  In particular, I don't
think they make any particular promises with respect to spreading changes
into all bits of the result.  That's important for our hash functions
because we usually take just the lowest N bits of the result as being
adequately randomized.
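
(A small illustrative snippet of what "take just the lowest N bits" means in
practice - hash table code typically masks the hash value down to a
power-of-two bucket count, so those are exactly the bits a combining scheme
must randomize well; the variable names are placeholders:)

	uint32		bucket;

	/* nbuckets is assumed to be a power of two */
	bucket = hashvalue & (nbuckets - 1);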

regards, tom lane




[HACKERS] PostgreSQL 10 kick-off

2016-08-01 Thread Peter Eisentraut
A few notes on the next PostgreSQL development cycle (the one after 9.6):

- After several rounds of consultations, it has been decided to change
the version numbering scheme.  Instead of X.Y.Z, future releases will
have a two-part version number M.N, where M is the major version number,
which changes with a major release, every year or so, and N is the minor
version number, which changes with a minor release, every few months or
so.  This does not change in any way development practices or approaches
to backward compatibility.  It merely makes the version numbering scheme
match existing development practices better.

- The next major release of PostgreSQL (after 9.6) will be known as
PostgreSQL 10.  (The actual version number of the first production
release will as before have a minor version number of .0, so the actual
release number will be 10.0.)  Again, this jump does not change any
policies or conventions on backward compatibility.

- Per [1], the first commit fest of the version 10 development cycle
starts on September 1.  In spite of 124 patches already showing at
, it seems like we have had a
good couple of months focusing on release work.  If you have been
hesitating because of that, now is surely the time to start submitting
any patches you want for consideration in September.

- We need a commit fest manager.  Volunteers, step forward!

- The branching of the Git tree is tentatively scheduled for August 15.
(There will be minor releases and a beta release on August 11.)


[1]:
https://wiki.postgresql.org/wiki/PgCon_2016_Developer_Meeting#9:55_-_10:05_.09Next_Release_Schedule_.09All

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Combining hash values

2016-08-01 Thread Tom Lane
Dean Rasheed  writes:
> On that subject, while looking at hashfunc.c, I spotted that
> hashint8() has a very obvious deficiency, which causes disastrous
> performance with certain inputs:

Well, if you're trying to squeeze 64 bits into a 32-bit result, there
are always going to be collisions somewhere.

> I'd suggest using hash_uint32() for values that fit in a 32-bit
> integer and hash_any() otherwise.

Perhaps, but this'd break existing hash indexes.  That might not be
a fatal objection, but if we're going to put users through that
I'd like to think a little bigger in terms of the benefits we get.
I've thought for some time that we needed to move to 64-bit hash function
results, because the size of problem that's reasonable to use a hash join
or hash aggregation for keeps increasing.  Maybe we should do that and fix
hashint8 as a side effect.
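
For concreteness, Dean's suggestion would look roughly like this (sketch only,
not the current hashfunc.c code; both hash_uint32() and hash_any() already
return a Datum):

    Datum
    hashint8(PG_FUNCTION_ARGS)
    {
        int64       val = PG_GETARG_INT64(0);

        /* values that fit in 32 bits hash the same way int4 does */
        if (val == (int64) (int32) val)
            return hash_uint32((uint32) val);

        /* otherwise hash all 8 bytes (endianness-dependent, as sketched) */
        return hash_any((unsigned char *) &val, sizeof(val));
    }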

regards, tom lane




[HACKERS] New version numbering practices

2016-08-01 Thread Tom Lane
As Peter mentioned in
https://www.postgresql.org/message-id/ba76aeb0-2f84-d180-268f-ea0f5ace4...@2ndquadrant.com
the decision has been taken to simplify our user-facing version numbering
system to be a two-component number.  Since there have been questions
about the details of that, I wanted to emphasize that we are not breaking
compatibility with code-facing version numbering.  In particular,
PG_VERSION_NUM and related representations will look like 1000xx, 1100xx,
etc in future branches, as though the second component were zero in an
old-style version number.

Somebody needs to come up with a patch implementing this changeover.
I will work on it if no one else feels motivated to (but I'd be just as
happy to let someone else do it).  If we do not have such a patch ready
to go when the 9.6 branch is made on Aug 15, I will probably transiently
stamp HEAD as 9.7 rather than have a situation where "version 10" appears
in a three-part version number.  (External code will need some cue as
to how to format displays from PG_VERSION_NUM, so we should have a hard
and fast rule that major >= 10 means new style.)
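
For example, external code following that rule could format a display version
along these lines (an illustrative sketch; the helper name is made up):

    #include <stdio.h>

    static void
    format_pg_version(int version_num, char *buf, size_t buflen)
    {
        if (version_num >= 100000)      /* new style: 10.x, 11.x, ... */
            snprintf(buf, buflen, "%d.%d",
                     version_num / 10000, version_num % 10000);
        else                            /* old style: e.g. 9.6.x */
            snprintf(buf, buflen, "%d.%d.%d",
                     version_num / 10000, (version_num / 100) % 100,
                     version_num % 100);
    }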

Also, it strikes me that we need a new convention for how we talk about
release branches informally.  Up to now, mentioning say "9.5" without
any further qualification in a PG-list message was usually sufficient
to indicate a branch number, but I do not think that will work so well
if one just writes "10".  I'm tempted to start writing branch numbers
as something like "PG10" or "v10".  Thoughts?

regards, tom lane




Re: [HACKERS] PostgreSQL 10 kick-off

2016-08-01 Thread Simon Riggs
On 1 August 2016 at 16:25, Peter Eisentraut
 wrote:

> - The next major release of PostgreSQL (after 9.6) will be known as
> PostgreSQL 10.  (The actual version number of the first production
> release will as before have a minor version number of .0, so the actual
> release number will be 10.0.)  Again, this jump does not change any
> policies or conventions on backward compatibility.

My understanding is that for 10.0 we will have this define

#define PG_VERSION_NUM 100000

and for 10.1, which is the first maintenance release, this will change to

#define PG_VERSION_NUM 100001

effectively allowing many future versions.

Can we confirm/refute these details now to make sure we are all in tune?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PostgreSQL 10 kick-off

2016-08-01 Thread Simon Riggs
On 1 August 2016 at 17:04, Tom Lane  wrote:
> Simon Riggs  writes:
>> Can we confirm/refute these details now to make sure we are all in tune?
>
> See the other thread I started about that; please reserve this thread
> for discussions of general actions around starting the new development
> cycle.

Sure, no problem, our posts crossed. I see your other post confirms this anyway.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] pg_size_pretty, SHOW, and spaces

2016-08-01 Thread Bruce Momjian
On Mon, Aug  1, 2016 at 01:35:53PM +0200, Christoph Berg wrote:
> Re: Bruce Momjian 2016-07-30 <20160730181643.gd22...@momjian.us>
> > I also just applied a doc patch that increases case and spacing
> > consistency in the use of kB/MB/GB/TB.
> 
> Hi,
> 
> PostgreSQL uses the spaces inconsistently, though. pg_size_pretty uses spaces:
> 
> # select pg_size_pretty((2^20)::bigint);
>  pg_size_pretty
> 
>  1024 kB
> 
> SHOW does not:
> 
> # show work_mem;
>  work_mem
> ──
>  1MB

Yes, that is inconsistent.  I have updated my attached patch to remove
spaces between the number and the units --- see below.

> The SHOW output is formatted by _ShowOption() using 'INT64_FORMAT "%s"',
> via convert_from_base_unit(). The latter has a comment attached...
> /*
>  * Convert a value in some base unit to a human-friendly unit.  The output
>  * unit is chosen so that it's the greatest unit that can represent the value
>  * without loss.  For example, if the base unit is GUC_UNIT_KB, 1024 is
>  * converted to 1 MB, but 1025 is represented as 1025 kB.
>  */
> ... where the spaces are present again.
> 
> General typesetting standard seems to be "1 MB", i.e. to include a
> space between value and unit. (This would also be my preference.)
> 
> Opinions? (I'd opt to insert spaces in the docs now, and then see if
> inserting a space in the SHOW output is acceptable for 10.0.)

I went through the docs a few days ago and committed a change to remove
spaces between the number and units in the few cases that had them ---
the majority didn't have spaces.

Looking at the Wikipedia article I posted earlier, that also doesn't use
spaces:

https://en.wikipedia.org/wiki/Binary_prefix

I think the only argument _for_ spaces is the output of pg_size_pretty()
now looks odd, e.g.:

                10 | 10 bytes    | -10 bytes
              1000 | 1000 bytes  | -1000 bytes
           1000000 | 977KB       | -977KB
        1000000000 | 954MB       | -954MB
     1000000000000 | 931GB       | -931GB
  1000000000000000 | 909TB       | -909TB
                        ^             ^

The issue is that we output "10 bytes", not "10bytes", but for units we
use "977KB".  That seems inconsistent, but it is the convention people
normally use.  I think this is because "977KB" is really "977K bytes", and
we just append the "B" after the "K" for brevity.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
diff --git a/configure b/configure
new file mode 100755
index b49cc11..8466e5a
*** a/configure
--- b/configure
*** Optional Packages:
*** 1502,1511 
--with-libs=DIRSalternative spelling of --with-libraries
--with-pgport=PORTNUM   set default port number [5432]
--with-blocksize=BLOCKSIZE
!   set table block size in kB [8]
--with-segsize=SEGSIZE  set table segment size in GB [1]
--with-wal-blocksize=BLOCKSIZE
!   set WAL block size in kB [8]
--with-wal-segsize=SEGSIZE
set WAL segment size in MB [16]
--with-CC=CMD   set compiler (deprecated)
--- 1502,1511 
--with-libs=DIRSalternative spelling of --with-libraries
--with-pgport=PORTNUM   set default port number [5432]
--with-blocksize=BLOCKSIZE
!   set table block size in KB [8]
--with-segsize=SEGSIZE  set table segment size in GB [1]
--with-wal-blocksize=BLOCKSIZE
!   set WAL block size in KB [8]
--with-wal-segsize=SEGSIZE
set WAL segment size in MB [16]
--with-CC=CMD   set compiler (deprecated)
*** case ${blocksize} in
*** 3550,3557 
   32) BLCKSZ=32768;;
*) as_fn_error $? "Invalid block size. Allowed values are 1,2,4,8,16,32." "$LINENO" 5
  esac
! { $as_echo "$as_me:${as_lineno-$LINENO}: result: ${blocksize}kB" >&5
! $as_echo "${blocksize}kB" >&6; }
  
  
  cat >>confdefs.h <<_ACEOF
--- 3550,3557 
   32) BLCKSZ=32768;;
*) as_fn_error $? "Invalid block size. Allowed values are 1,2,4,8,16,32." "$LINENO" 5
  esac
! { $as_echo "$as_me:${as_lineno-$LINENO}: result: ${blocksize}KB" >&5
! $as_echo "${blocksize}KB" >&6; }
  
  
  cat >>confdefs.h <<_ACEOF
*** case ${wal_blocksize} in
*** 3638,3645 
   64) XLOG_BLCKSZ=65536;;
*) as_fn_error $? "Invalid WAL block size. Allowed values are 1,2,4,8,16,32,64." "$LINENO" 5
  esac
! { $as_echo "$as_me:${as_lineno-$LINENO}: result: ${wal_blocksize}kB" >&5
! $as_echo "${wal_blocksize}kB" >&6; }
  
  
  cat >>confdefs.h <<_ACEOF
--- 3638,3645 
   64) XLOG_BLCKSZ=65536;;
*) as_fn_error $? "Invalid WAL block size. Allowed values are 1,2,4,8,16,32,64." "$LINENO" 5
  esac
! { $as_echo "$as_me:${as_lineno-$LINENO}: result: ${wal_blocksize}KB" >&5
! $as_echo "${

Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2016-08-01 Thread Bruce Momjian
On Thu, Jul 21, 2016 at 09:49:33AM -0400, Tom Lane wrote:
> David Fetter  writes:
> > Please find attached a patch which makes it possible to disallow
> > UPDATEs and DELETEs which lack a WHERE clause.  As this changes query
> > behavior, I've made the new GUCs PGC_SUSET.
> 
> > What say?
> 
> -1.  This is an express violation of the SQL standard, and at least the
> UPDATE case has reasonable use-cases.  Moreover, if your desire is to have
> training wheels for SQL, there are any number of other well-known gotchas
> that are just as dangerous, for example ye olde unintentionally-correlated
> subselect:
> https://www.postgresql.org/message-id/20160714135233.1410.92538%40wrigleys.postgresql.org

I am hoping for a "novice" mode that issues warnings about possible
bugs, e.g. unintentionally-correlated subselect, and this could be part
of that.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] pg_replication_origin_xact_reset() and its argument variables

2016-08-01 Thread Robert Haas
On Thu, Jul 28, 2016 at 3:44 AM, Fujii Masao  wrote:
 Sure looks that way from here.  Copy-and-paste from the previous
 line in pg_proc.h, perhaps?
>>
>>> Yes, that's clearly wrong.
>
> Attached patch (pg_replication_origin_xact_reset_9.6.patch) fixes this.
> We need to apply this at least before RC1 of PostgreSQL 9.6 is released,
> because the patch requires a catalog version change.
>
>>> Damn.  Can't fix that for 9.5 anymore. The
>>> function isn't all that important (especially not from SQL), but still,
>>> that's annoying.  I'm inclined to just remove the args in 9.6. We could
>>> also add a note to the 9.5 docs, adding that the arguments are there by
>>> error?
>
> What about the attached patch (pg_replication_origin_xact_reset_9.5.patch)?

I think you should apply these ASAP.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PostgreSQL 10 kick-off

2016-08-01 Thread Fabrízio de Royes Mello
On Mon, Aug 1, 2016 at 12:25 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:
>
> - We need a commit fest manager.  Volunteers, step forward!
>

What knowledge is expected of a CFM? I would really like to help and
also learn more about our development.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] MSVC pl-perl error message is not verbose enough

2016-08-01 Thread Robert Haas
On Tue, Jul 26, 2016 at 9:44 PM, Michael Paquier
 wrote:
> On Wed, Jul 27, 2016 at 12:41 AM, John Harvey
>  wrote:
>> Because of this, I've submitted a small patch which fixes the verbosity of
>> the error message to actually explain what's missing.  I hope that this
>> patch will be considered for the community, and it would be nice if it was
>> back-patched.
>
> Improving this error message a bit looks like a good idea to me.
> Looking at the code to figure out what build.pl is looking for is a
> bit a pain for users just willing to compile the code, so if we can
> avoid such lookups with a cheap way, let's do it.
>
> Instead of your patch, I'd suggest saving $solution->{options}->{perl}
> . '\lib\CORE\perl*.lib' in a variable and then raise an error based on
> that. See attached.

Did you add this to the next CommitFest?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Constraint merge and not valid status

2016-08-01 Thread Robert Haas
On Fri, Jul 22, 2016 at 1:10 AM, Amit Langote
 wrote:
> On 2016/07/22 0:38, Robert Haas wrote:
>> On Wed, Jul 13, 2016 at 5:22 AM, Amit Langote
>>  wrote:
>>> Consider a scenario where one adds a *valid* constraint on an inheritance
>>> parent which is then merged with a child table's *not valid* constraint
>>> during inheritance recursion.  If merged, the constraint is not checked
>>> for the child data even though it may have some.  Is that an oversight?
>>
>> Seems like it.  I'd recommend we just error out in that case and tell
>> the user that they should validate the child's constraint first.
>
> Agreed.
>
> Patch attached.  In addition to the recursion from parent case, this seems
> to be broken for the alter table child inherit parent case as well. So,
> fixed both MergeWithExistingConstraint (called from
> AddRelationNewConstraints) and MergeConstraintsIntoExisting (called from
> ATExecAddInherit). I had to add a new argument is_not_valid to the former
> to signal whether the constraint being propagated itself is declared NOT
> VALID, in which we can proceed with merging.  Also added some tests for
> both cases.

Well, on second thought, I'm no longer sure that this approach makes
sense.  I mean, it's obviously wrong for constraint-merging to change
the validity marking on a constraint, but that does not necessarily
imply that we shouldn't merge the constraints, does it?  I see the
downthread discussion saying that it's a problem if the parent's
constraint is marked valid while the child's constraint isn't, but I
don't quite understand why that situation would cause trouble.  In
other words, I see that the current situation is not good, but I'm not
sure I understand what's going on here well enough to be confident
that any of the proposed fixes are correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] New version numbering practices

2016-08-01 Thread David Fetter
On Mon, Aug 01, 2016 at 11:49:41AM -0400, Tom Lane wrote:
> As Peter mentioned in
> https://www.postgresql.org/message-id/ba76aeb0-2f84-d180-268f-ea0f5ace4...@2ndquadrant.com
> the decision has been taken to simplify our user-facing version numbering
> system to be a two-component number.  Since there have been questions
> about the details of that, I wanted to emphasize that we are not breaking
> compatibility with code-facing version numbering.  In particular,
> PG_VERSION_NUM and related representations will look like 1000xx, 1100xx,
> etc in future branches, as though the second component were zero in an
> old-style version number.
> 
> Somebody needs to come up with a patch implementing this changeover.
> I will work on it if no one else feels motivated to (but I'd be just as
> happy to let someone else do it).  If we do not have such a patch ready
> to go when the 9.6 branch is made on Aug 15, I will probably transiently
> stamp HEAD as 9.7 rather than have a situation where "version 10" appears
> in a three-part version number.  (External code will need some cue as
> to how to format displays from PG_VERSION_NUM, so we should have a hard
> and fast rule that major >= 10 means new style.)
> 
> Also, it strikes me that we need a new convention for how we talk about
> release branches informally.  Up to now, mentioning say "9.5" without
> any further qualification in a PG-list message was usually sufficient
> to indicate a branch number, but I do not think that will work so well
> if one just writes "10".  I'm tempted to start writing branch numbers
> as something like "PG10" or "v10".  Thoughts?

I don't see 10 as ambiguous.  It's clear what's being talked about,
now that the decision has been made.

Best,
David.

"This one goes up to 11."
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




[HACKERS] TODO item: Implement Boyer-Moore searching in LIKE queries

2016-08-01 Thread Karan Sikka
Hi pgsql-hackers,

Following the patch to implement strpos with Boyer-Moore-Horspool,
it was proposed we bring BMH to LIKE as well.

Original thread:
https://www.postgresql.org/message-id/flat/27645.1220635769%40sss.pgh.pa.us#27645.1220635...@sss.pgh.pa.us

I'm a first time hacker and I found this task on the TODO list. It seemed
interesting and isolated enough. But after looking at the code in
like_match.c, it's clearly a non-trivial task.

How desirable is this feature? To begin answering that question,
I did some initial benchmarking with an English text corpus to find how much
faster BMH (Boyer-Moore-Horspool) would be compared to LIKE, and the results
were as follows: BMH can be up to 20% faster than LIKE, but for short
substrings, it's roughly comparable or slower.

Here are the results visualized:

http://ctrl-c.club/~ksikka/pg/like-strpos-short-1469975400.png
http://ctrl-c.club/~ksikka/pg/like-strpos-long-1469975400.png

Data attached, and description of the benchmark below.

I'd love to hear your thoughts:
- is the benchmark valid? am I missing anything?
- what conclusions can we draw from the results?
- what action should we take from here?
- is this the right mailing list for this discussion?

Thank you!
- Karan, pg community newbie


--- Benchmark Details ---

The easiest way to approximate the potential speedup from BMH,
is to compare the performance of the following queries:

1. SELECT count(*) FROM test_table WHERE text LIKE '%substring%';
2. SELECT count(*) FROM test_table WHERE strpos(text, 'substring') > 0;

We expect the strpos query to outperform the LIKE query since strpos is
implemented with BMH.

I loaded up the database with chunks of English text from the bible, a
commonly used corpus for simple text analysis. The exact procedure is
described in more detail at the end of the email. I ran a few queries,
using short substrings and long substrings, the choice of which is
discussed in more detail below.

## Choice of test data

BMH is known to be particularly fast on English text, so I loaded the test
table with 5k-character chunks of text from the bible. BMH is expected to
be slower for small substrings, where the overhead of creating a skip table
may not be justified. For larger substrings, BMH may outperform LIKE thanks
to the skip table. In the best case, if a text character does not exist in
the substring, the search can jump ahead by length-of-substring characters,
skipping what would be a lot of work.
I chose small (< 15 character) substrings to be the most popular bigrams and
trigrams in the text.  I chose long (10-250 character) substrings at
random, and
took varied-length prefixes and suffixes to see how it affected algorithm
performance. The reason that prefixes were not sufficient is that BMH
compares
from the right end of the substring, unlike the LIKE algorithm.  The full
strings are included in the attached excel file.

## Database setup

Download the corpus (englishTexts) from
http://www.dmi.unict.it/~faro/smart/corpus.php

Create the table:

CREATE TABLE test_table (text text);

Generate the rows for insertion (generates chunks of 5k characters):

wget http://ctrl-c.club/~ksikka/pg/gen_rows.py
python gen_rows.py path/to/englishTexts/bible.txt > path/to/output/file

Load the table:

COPY test_table (text) from '/absolute/path/to/previous/output/file'
WITH ENCODING 'UTF-8';

I ran the COPY command 21 times to amplify the performance differences.

Queries were timed with psql's \timing feature. The mean of five queries was
reported.

## Hardware

Apple Macbook Air (Mid 2012)
CPU: 1.8 GHz Intel Core i5
RAM: 4 GB 1600 MHz DDR3
import io
import re
import sys
import random

## Configuration ##
CHUNK_LEN = 5000

# *fix, ie prefix, suffix.
starfix_lengths = [
    10,
    50,
    90,
    130,
    170,
    210,
    250
]

def copyEscape(ustring):
    """
    From the COPY docs:
    Backslash characters (\) can be used in the COPY data to
    quote data characters that might otherwise be taken as
    row or column delimiters.
    In particular, the following characters must be preceded
    by a backslash if they appear as part of a column value:
    backslash itself, newline, carriage return, and the
    current delimiter character.

    :LINK: https://www.postgresql.org/docs/9.1/static/sql-copy.html
    """
    return re.sub(r'([\\\r\n\t])', r'\\\1', ustring, flags=re.UNICODE)

if __name__ == '__main__':
    input_filename = sys.argv[1]
    input_stream = io.open(input_filename, mode='r', encoding='Latin-1')

    prefix_mode = False
    suffix_mode = False
    if sys.argv[2] == '--prefix':
        prefix_mode = True
    if sys.argv[2] == '--suffix':
        suffix_mode = True

    chunks = []

    chunk = None
    while chunk is None or len(chunk) == CHUNK_LEN:
        chunk = input_stream.read(CHUNK_LEN)

        if chunk == '':
            break

        chunks.append(chunk)

    if prefix_mode or suffix_mode:

Re: [HACKERS] PostgreSQL 10 kick-off

2016-08-01 Thread Tom Lane
Simon Riggs  writes:
> Can we confirm/refute these details now to make sure we are all in tune?

See the other thread I started about that; please reserve this thread
for discussions of general actions around starting the new development
cycle.

regards, tom lane




Re: [HACKERS] New version numbering practices

2016-08-01 Thread Alvaro Herrera
Tom Lane wrote:

> Also, it strikes me that we need a new convention for how we talk about
> release branches informally.  Up to now, mentioning say "9.5" without
> any further qualification in a PG-list message was usually sufficient
> to indicate a branch number, but I do not think that will work so well
> if one just writes "10".  I'm tempted to start writing branch numbers
> as something like "PG10" or "v10".  Thoughts?

I agree that writing just "10" might be confusing in some places, though
I also agree with dfetter that it might be obvious in other contexts.
Either "pg10" or "v10" look good to me.  Capitalizing it as PG10 is okay
though I'm not sure that most would bother (I probably wouldn't).

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Any need of GRANT/REVOKE CREATE TABLE | POLICY | ETC

2016-08-01 Thread Robert Haas
On Tue, Jul 19, 2016 at 2:59 AM, Haribabu Kommi
 wrote:
> During the discussion of supporting multi tenancy with the help of
> row level security, because of some problems of executing any
> policy that was created by an unprivileged user [1].
>
> To avoid that problem, If we have some kind of new mechanism to
> GRANT/REVOKE only CREATE POLICY from all the unprivileged
> users by providing other CREATE access to them.
>
> I just want to know is there any other such requirements that if such
> option is available, it would be good or not? I don't know whether
> this option is possible or not? If any such requirements are present
> other than POLICY, i would like to analyze and propose a patch for
> the same.

Well, there are lots of things that users can do that might cause
security exposures for other users.  I think it's fair to say that
PostgreSQL - like many other systems, really - is going to work best
when either all of the users trust each other or when they are
well-isolated from each other (i.e. in separate databases).  If one
user can get another to execute code, then that second user can usurp
the privileges of the first user.  CREATE POLICY provides one way of
accomplishing that, but it's not necessarily qualitatively different
from any other mechanisms that we've had for a long time: CREATE
TRIGGER, CREATE FUNCTION, CREATE AGGREGATE, CREATE OPERATOR, and
CREATE VIEW are all plausible ways of inducing another user to run
your evil, malicious code.

So I don't think that the right solution is to restrict CREATE POLICY
in particular.  It might be useful to have some kind of general system
for limiting the ability of certain users to run certain commands, but
I'm not sure how much demand there really is for such a thing.  I
think it's already possible using ProcessUtilityHook, if you're
willing to write a small amount of C code.  Do our users want more?
Very possibly.  Do I know exactly what they want?  No, I don't.
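
As a sketch of that ProcessUtilityHook approach (illustrative only, with the
usual extension boilerplate such as PG_MODULE_MAGIC and a Makefile omitted),
blocking CREATE POLICY for non-superusers could look roughly like this:

    #include "postgres.h"
    #include "miscadmin.h"
    #include "nodes/parsenodes.h"
    #include "tcop/utility.h"

    static ProcessUtility_hook_type prev_ProcessUtility = NULL;

    static void
    no_policy_ProcessUtility(Node *parsetree, const char *queryString,
                             ProcessUtilityContext context,
                             ParamListInfo params,
                             DestReceiver *dest, char *completionTag)
    {
        /* Reject CREATE POLICY unless run by a superuser. */
        if (IsA(parsetree, CreatePolicyStmt) && !superuser())
            ereport(ERROR,
                    (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                     errmsg("only superusers may create policies")));

        /* Otherwise, hand off to the next hook or the standard handler. */
        if (prev_ProcessUtility)
            prev_ProcessUtility(parsetree, queryString, context,
                                params, dest, completionTag);
        else
            standard_ProcessUtility(parsetree, queryString, context,
                                    params, dest, completionTag);
    }

    void
    _PG_init(void)
    {
        prev_ProcessUtility = ProcessUtility_hook;
        ProcessUtility_hook = no_policy_ProcessUtility;
    }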

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-01 Thread Tom Lane
Tomas Vondra  writes:
> What is needed is an overview of the approach, so that the reviewers can 
> read that first, instead of assembling the knowledge from pieces 
> scattered over comments in many pieces. But I see the fasttab.c contains 
> this:

> /* TODO TODO comment the general idea - in-memory tuples and indexes, 
> hooks principle, FasttabSnapshots, etc */

A fairly common answer when some feature needs an implementation overview
is to create a README file for it, or add a new section in an existing
README file.

regards, tom lane




Re: [HACKERS] Constraint merge and not valid status

2016-08-01 Thread Tom Lane
Robert Haas  writes:
> Well, on second thought, I'm no longer sure that this approach makes
> sense.  I mean, it's obviously wrong for constraint-merging to change
> the validity marking on a constraint, but that does not necessarily
> imply that we shouldn't merge the constraints, does it?  I see the
> downthread discussion saying that it's a problem if the parent's
> constraint is marked valid while the child's constraint isn't, but I
> don't quite understand why that situation would cause trouble.  In
> other words, I see that the current situation is not good, but I'm not
> sure I understand what's going on here well enough to be confident
> that any of the proposed fixes are correct.

The point I think is that a valid CHECK constraint on a parent table
should imply that all rows fetched by "SELECT * FROM table" will pass
the check.  Therefore, a situation with valid parent constraint and
not-valid child constraint is bad because it might allow some rows
fetched by an inheritance scan to not pass the check.  Quite aside from
any user-level expectations, this could break planner optimizations.

I'd be satisfied with the upthread proposal "throw error if the child has
a matching not-valid constraint".  Allowing the merge if both child
and new parent constraint are not-valid is all right as an extension,
but it seems like a feature with a mighty narrow use case, and I would
not go far out of our way to support it.  Causing the command to not
merge but instead create a new duplicate child constraint seems like a
seriously bad idea (though I'm not sure that anyone was advocating for
that).

regards, tom lane




Re: [HACKERS] New version numbering practices

2016-08-01 Thread Tom Lane
David Fetter  writes:
> On Mon, Aug 01, 2016 at 11:49:41AM -0400, Tom Lane wrote:
>> Also, it strikes me that we need a new convention for how we talk about
>> release branches informally.  Up to now, mentioning say "9.5" without
>> any further qualification in a PG-list message was usually sufficient
>> to indicate a branch number, but I do not think that will work so well
>> if one just writes "10".  I'm tempted to start writing branch numbers
>> as something like "PG10" or "v10".  Thoughts?

> I don't see 10 as ambiguous.  It's clear what's being talked about,
> now that the decision has been made.

It's clear what's being talked about as long as you already know that
it is a version number.  But it seems to me that we have often relied
on the "x.y" notation itself to indicate that a version number is meant.
Consider someone writing "I'm doing that in 10."  Did he mean he's
writing a patch for version 10, or he's going to do that 10 minutes from
now, or what?  Over the past couple of months I have already found myself
writing "10.0" or "9.7^H^H^H10" to make it clear that I meant the next
release version, because just "10" seemed too ambiguous.  Maybe I'm
worried about nothing and the ambiguity mostly came from our not having
settled the two-or-three-part-version-number question, but I'm not sure.

regards, tom lane




Re: [HACKERS] pg_replication_origin_xact_reset() and its argument variables

2016-08-01 Thread Andres Freund
On 2016-07-28 16:44:37 +0900, Fujii Masao wrote:
> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> index d6ed0ce..0a3d3de 100644
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -17519,7 +17519,7 @@ postgres=# SELECT * FROM 
> pg_xlogfile_name_offset(pg_stop_backup());
>  
>   pg_replication_origin_xact_reset
>  
> -
> pg_replication_origin_xact_reset()
> +
> pg_replication_origin_xact_reset(origin_lsn
>  pg_lsn, origin_timestamp 
> timestamptz)
> 
> 
>  void
> @@ -17527,6 +17527,12 @@ postgres=# SELECT * FROM 
> pg_xlogfile_name_offset(pg_stop_backup());
> 
>  Cancel the effects of
>  pg_replication_origin_xact_setup().
> +Note that two arguments were introduced by mistake
> +during the PostgreSQL 9.5 development cycle while
> +pg_replication_origin_xact_reset() actually
> +doesn't use them at all. Therefore, any dummy values like
> +NULL can be safely specified as the arguments.
> +This mistake will be fixed in a future release.
> 
>

I don't think NULL works, the function is marked as strict. Otherwise
that looks right.

Thanks,

Andres




Re: [HACKERS] pg_replication_origin_xact_reset() and its argument variables

2016-08-01 Thread Andres Freund
Hi Fujii,

On 2016-07-28 16:44:37 +0900, Fujii Masao wrote:
> On Sat, Jul 2, 2016 at 7:01 AM, Tom Lane  wrote:
> > Andres Freund  writes:
> >> On 2016-06-30 10:14:04 -0400, Tom Lane wrote:
> >>> Fujii Masao  writes:
>  As far as I read the code of the function, those arguments don't seem to
>  be necessary. So I'm afraid that the pg_proc entry for the function might
>  be incorrect and those two arguments should be removed from the 
>  definition.
> >
> >>> Sure looks that way from here.  Copy-and-paste from the previous
> >>> line in pg_proc.h, perhaps?
> >
> >> Yes, that's clearly wrong.
> 
> Attached patch (pg_replication_origin_xact_reset_9.6.patch) fixes this.
> We need to apply this at least before RC1 of PostgreSQL 9.6 is released,
> because the patch requires a catalog version change.
> 
> >> Damn.  Can't fix that for 9.5 anymore. The
> >> function isn't all that important (especially not from SQL), but still,
> >> that's annoying.  I'm inclined to just remove the args in 9.6. We could
> >> also add a note to the 9.5 docs, adding that the arguments are there by
> >> error?
> 
> What about the attached patch (pg_replication_origin_xact_reset_9.5.patch)?

except for the strictness remark in the other email, these look sane to
me.  Do you want to push them?  I'll do so by Wednesday otherwise, to
leave some room before the next RC.

Greetings,

Andres Freund




Re: [HACKERS] pg_replication_origin_xact_reset() and its argument variables

2016-08-01 Thread Fujii Masao
On Tue, Aug 2, 2016 at 2:48 AM, Andres Freund  wrote:
> Hi Fujii,
>
> On 2016-07-28 16:44:37 +0900, Fujii Masao wrote:
>> On Sat, Jul 2, 2016 at 7:01 AM, Tom Lane  wrote:
>> > Andres Freund  writes:
>> >> On 2016-06-30 10:14:04 -0400, Tom Lane wrote:
>> >>> Fujii Masao  writes:
>>  As far as I read the code of the function, those arguments don't seem to
>>  be necessary. So I'm afraid that the pg_proc entry for the function 
>>  might
>>  be incorrect and those two arguments should be removed from the 
>>  definition.
>> >
>> >>> Sure looks that way from here.  Copy-and-paste from the previous
>> >>> line in pg_proc.h, perhaps?
>> >
>> >> Yes, that's clearly wrong.
>>
>> Attached patch (pg_replication_origin_xact_reset_9.6.patch) fixes this.
>> We need to apply this at least before RC1 of PostgreSQL 9.6 is released,
>> because the patch requires a catalog version change.
>>
>> >> Damn.  Can't fix that for 9.5 anymore. The
>> >> function isn't all that important (especially not from SQL), but still,
>> >> that's annoying.  I'm inclined to just remove the args in 9.6. We could
>> >> also add a note to the 9.5 docs, adding that the arguments are there by
>> >> error?
>>
>> What about the attached patch (pg_replication_origin_xact_reset_9.5.patch)?
>
> except for the strictness remark in the other email,

Yes, you're right. My careless mistake... :(

> these look sane to
> me.  Do you want to push them?  I'll do so by Wednesday otherwise, to
> leave some room before the next RC.

Could you do that if possible?

Regards,

-- 
Fujii Masao




Re: [HACKERS] New version numbering practices

2016-08-01 Thread David G. Johnston
On Mon, Aug 1, 2016 at 1:41 PM, Tom Lane  wrote:

> Over the past couple of months I have already found myself
> writing "10.0" or "9.7^H^H^H10" to make it clear that I meant the next
> release version, because just "10" seemed too ambiguous.


I thought that was just (and maybe some instances were) humor regarding
the general indecisiveness on the issue.


>   Maybe I'm
> worried about nothing and the ambiguity mostly came from our not having
> settled the two-or-three-part-version-number question, but I'm not sure.
>

I think this dynamic will sort itself out.

I suspect I'll end up using 10.x somewhat frequently though I'm mostly on
the lists.  I suspect the choice will be dependent on context and channel.

David J.


Re: [HACKERS] New version numbering practices

2016-08-01 Thread Tom Lane
"David G. Johnston"  writes:
> I suspect I'll end up using 10.x somewhat frequently though I'm mostly on
> the lists.  I suspect the choice will be dependent on context and channel.

Hmm, that seems like a workable answer as well, and one that's traceable
to our past habits.

regards, tom lane




[HACKERS] HandleParallelMessages contains CHECK_FOR_INTERRUPTS?

2016-08-01 Thread Tom Lane
$SUBJECT seems like a pretty bad idea, because it implies a recursive
entry to ProcessInterrupts and thence to HandleParallelMessages itself.
By what reasoning is that call necessary where it's placed?

regards, tom lane




Re: [HACKERS] pg_basebackup wish list

2016-08-01 Thread Jeff Janes
On Thu, Jul 28, 2016 at 4:44 AM, Amit Kapila  wrote:
> On Tue, Jul 26, 2016 at 11:58 AM, Fujii Masao  wrote:
>> On Wed, Jul 13, 2016 at 3:06 AM, Jeff Janes  wrote:
>>> On Tue, Jul 12, 2016 at 10:48 AM, Peter Eisentraut
>>>  wrote:
 On 7/12/16 12:53 PM, Jeff Janes wrote:
> The --help message for pg_basebackup says:
>
> -Z, --compress=0-9 compress tar output with given compression level
>
> But -Z0 is then rejected as 'invalid compression level "0"'.  The real
> docs do say 1-9, only the --help message has this bug.  Trivial patch
> attached.

 pg_dump --help and man page say it supports 0..9.  Maybe we should make
 that more consistent.
>>>
>>> pg_dump actually does support -Z0, though.  Well, sort of.  It outputs
>>> plain text.  Rather than plain text wrapped in some kind of dummy gzip
>>> header, which is what I had naively expected.
>>>
>>> Is that what -Z0 in pg_basebackup should do as well, just output
>>> uncompressed tar data, and not add the ".gz" to the "base.tar" file
>>> name?
>>
>> Yes, I think. What about the attached patch?
>>
>
> What if user tries to use -Z 0 with format as tar, won't it generate
> base.tar without any compression?  I am not sure if that is what Jeff
> intends to say in his proposal.

My initial proposal was just to change the "usage" message to match reality.

I think the current proposal is to make -Z0 be identical to having no
-Z specified at all, in other words produce a .tar file, not a .tar.gz
file.

I had thought we could make a .gz file which didn't actually use
compression, just packaged up the data behind a gzip header, but after
looking at it I don't think libz actually supports that.  Plus, it
would be pretty silly to have uncompressed data that would then have
to be "uncompressed" merely to unwrap it.  It could be useful for
testing where you don't want to write for special cases in your shell
script (which is where I discovered this, I wanted to test all values
between 0 and 9 and see which was fastest given my combination of CPU,
network, data and disks), but not useful for practical use.

I am happy with the code as currently committed.

Cheers,

Jeff




Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-01 Thread Peter Eisentraut
On 7/30/16 2:16 PM, Bruce Momjian wrote:
> The second patch does what Tom suggests above by outputting only "KB",
> and it supports "kB" for backward compatibility.  What it doesn't do is
> to allow arbitrary case, which I think would be a step backward.  The
> second patch actually does match the JEDEC standard, except for allowing
> "kB".

If we're going to make changes, why not bite the bullet and output KiB?

I have never heard of JEDEC, so I'm less inclined to accept their
"standard" at this point.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-01 Thread Peter Eisentraut
On 7/30/16 1:18 AM, Pavel Stehule wrote:
> We talked about this issue, when I wrote function pg_size_bytes. It is
> hard to fix these functions after years of usage. The new set of
> functions can be better
> 
> pg_iso_size_pretty();
> pg_iso_size_bytes();

One thing that would actually be nice for other reasons as well is a
version of pg_size_pretty() that lets you specify the output unit, say,
as a second argument.  Because sometimes you want to compare two tables
or something, and it tells you one is 3GB and the other is 783MB, which
doesn't really help.  If I tell it to use 'MB' as the output unit, I
could get comparable output.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] New version numbering practices

2016-08-01 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> "David G. Johnston"  writes:
> > I suspect I'll end up using 10.x somewhat frequently though I'm mostly on
> > the lists.  I suspect the choice will be dependent on context and channel.
> 
> Hmm, that seems like a workable answer as well, and one that's traceable
> to our past habits.

For my 2c, I'd kind of prefer v10, but I could live with 10.x.

Not sure that I have any real reason for that preference other than
'v10' is slightly shorter and seems more 'right', to me.  Perhaps
because '10.x' implies a *released* version to me (10.1, 10.2, 10.3),
whereas you asked about a *branch*, which would generally include some
patches past the latest point release.

In other words, "are you going to back-patch this to 10.x?" doesn't seem
quite right, whereas "are you going to back-patch this to v10?" lines up
correctly in my head, but I don't hold that distinction very closely and
either would work.

Thanks!

Stephen




Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-01 Thread Bruce Momjian
On Mon, Aug  1, 2016 at 02:48:55PM -0400, Peter Eisentraut wrote:
> On 7/30/16 2:16 PM, Bruce Momjian wrote:
> > The second patch does what Tom suggests above by outputting only "KB",
> > and it supports "kB" for backward compatibility.  What it doesn't do is
> > to allow arbitrary case, which I think would be a step backward.  The
> > second patch actually does match the JEDEC standard, except for allowing
> > "kB".
> 
> If we're going to make changes, why not bite the bullet and output KiB?
> 
> I have never heard of JEDEC, so I'm less inclined to accept their
> "standard" at this point.

I already addressed this.  While I have never heard of JEDEC either, I
have seen KB, and have never seen KiB, hence my argument that KiB would
lead to more confusion than we have now.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] pg_size_pretty, SHOW, and spaces

2016-08-01 Thread Peter Eisentraut
On 8/1/16 7:35 AM, Christoph Berg wrote:
> PostgreSQL uses the spaces inconsistently, though. pg_size_pretty uses spaces:
> 
> # select pg_size_pretty((2^20)::bigint);
>  pg_size_pretty
> 
>  1024 kB

because it's "pretty" :)

> SHOW does not:
> 
> # show work_mem;
>  work_mem
> ──
>  1MB

The original idea might have been to allow that value to be passed back
into the settings system, without having to quote the space.  I'm not
sure, but I think changing that might cause some annoyance.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] New version numbering practices

2016-08-01 Thread David Fetter
On Mon, Aug 01, 2016 at 02:52:04PM -0400, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > "David G. Johnston"  writes:
> > > I suspect I'll end up using 10.x somewhat frequently though I'm mostly on
> > > the lists.  I suspect the choice will be dependent on context and channel.
> > 
> > Hmm, that seems like a workable answer as well, and one that's traceable
> > to our past habits.
> 
> For my 2c, I'd kind of prefer v10, but I could live with 10.x.
> 
> Not sure that I have any real reason for that preference other than
> 'v10' is slightly shorter and seems more 'right', to me.

10 is even shorter, and when we get to 15, it seems like it'll be
pretty silly still to be referring to the 9.x series.

> In other words, "are you going to back-patch this to 10.x?" doesn't
> seem quite right, whereas "are you going to back-patch this to v10?"
> lines up correctly in my head, but I don't hold that distinction
> very closely and either would work.

What's wrong with, "Are you going to back-patch this to 10?"

Bear in mind that this sentence first makes sense once we've got a new
branch for 11, gets more likely as we have 12 and 13, then drops,
after that, all the way to 0 when we hit 16, which by my calculation
should be in the 2020s.  Some of the people who will be our major
contributors then are in high school now, and will just be puzzled and
vaguely annoyed by references to the old system.

Now, when we're changing the visible version number, seems like the
time to break fully with the idea that our major version numbers have
two parts.  We'll still be referring, with decreasing frequency, to
9.6, 9.5, 9.4, etc., but there's good reason not to carry that idea
forward now that we're no longer doing it.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] New version numbering practices

2016-08-01 Thread Stephen Frost
David,

* David Fetter (da...@fetter.org) wrote:
> On Mon, Aug 01, 2016 at 02:52:04PM -0400, Stephen Frost wrote:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > "David G. Johnston"  writes:
> > > > I suspect I'll end up using 10.x somewhat frequently though I'm mostly 
> > > > on
> > > > the lists.  I suspect the choice will be dependent on context and 
> > > > channel.
> > > 
> > > Hmm, that seems like a workable answer as well, and one that's traceable
> > > to our past habits.
> > 
> > For my 2c, I'd kind of prefer v10, but I could live with 10.x.
> > 
> > Not sure that I have any real reason for that preference other than
> > 'v10' is slightly shorter and seems more 'right', to me.
> 
> 10 is even shorter, and when we get to 15, it seems like it'll be
> pretty silly still to be referring to the 9.x series.
> 
> > In other words, "are you going to back-patch this to 10.x?" doesn't
> > seem quite right, whereas "are you going to back-patch this to v10?"
> > lines up correctly in my head, but I don't hold that distinction
> > very closely and either would work.
> 
> What's wrong with, "Are you going to back-patch this to 10?"

It can end up being ambiguous, as Tom already pointed out.

> Bear in mind that this sentence first makes sense once we've got a new
> branch for 11, gets more likely as we have 12 and 13, then drops,
> after that, all the way to 0 when we hit 16, which by my calculation
> should be in the 2020s.  Some of the people who will be our major
> contributors then are in high school now, and will just be puzzled and
> vaguely annoyed by references to the old system.

I don't see referring to a single-digit version number as 'v11' or 'v15'
instead of '15' to be some kind of reference to the "old system" but
rather a way of distinguishing a version or branch identifier from being
some other value.  This discussion about "v10" vs. "10.x" hasn't
actually got anything to do with the prior three-digit "9.4.x" or "9.4"
system but has everything to do with what we're going to say going
forward.

> Now, when we're changing the visible version number, seems like the
> time to break fully with the idea that our major version numbers have
> two parts.  We'll still be referring, with decreasing frequency, to
> 9.6, 9.5, 9.4, etc., but there's good reason not to carry that idea
> forward now that we're no longer doing it.

The notion of "10.x" doesn't refer to a two-digit major version, it
refers to a single-digit major version with multiple minor releases,
which we will certainly have, so I don't understand where you're coming
from here.

Thanks!

Stephen




Re: [HACKERS] New version numbering practices

2016-08-01 Thread Andrew Dunstan



On 08/01/2016 11:49 AM, Tom Lane wrote:


Also, it strikes me that we need a new convention for how we talk about
release branches informally.  Up to now, mentioning say "9.5" without
any further qualification in a PG-list message was usually sufficient
to indicate a branch number, but I do not think that will work so well
if one just writes "10".  I'm tempted to start writing branch numbers
as something like "PG10" or "v10".  Thoughts?





Somewhat related is how we name the git branches. It would help me from
a buildfarm POV if we kept them lexically sortable, which could be done
at least for the next 90 major releases :-) by adding an underscore
after the REL piece, thus: REL_10_STABLE. I realise that's a way off,
but it's worth bringing up while we're discussing the topic.


cheers

andrew




[HACKERS] PostmasterContext survives into parallel workers!?

2016-08-01 Thread Tom Lane
I noticed $subject while fooling around with the tqueue.c memory leak
issues.  This does not seem like a good idea to me.  At the very least,
it's a waste of space that could be used for something else, and at the
worst, it might be a security issue because it leaves security-sensitive
pg_hba and pg_ident information laying about in places where it might be
recoverable (if only through memory-disclosure bugs, which we've had
before and no doubt will have again).

The reason is that the parallel worker launch path contains no equivalent
of PostgresMain's stanza

if (PostmasterContext)
{
MemoryContextDelete(PostmasterContext);
PostmasterContext = NULL;
}

Now, I'm undecided whether to flush that context only in parallel workers,
or to try to make it go away for all bgworkers of any stripe.  The latter
seems a little better from a security standpoint, but I wonder if anyone
has a use-case where that'd be a bad idea?

regards, tom lane




Re: [HACKERS] HandleParallelMessages contains CHECK_FOR_INTERRUPTS?

2016-08-01 Thread Alvaro Herrera
Tom Lane wrote:
> $SUBJECT seems like a pretty bad idea, because it implies a recursive
> entry to ProcessInterrupts and thence to HandleParallelMessages itself.
> By what reasoning is that call necessary where it's placed?

I notice you just removed the CHECK_FOR_INTERRUPTS in
HandleParallelMessages().  Did you notice that HandleParallelMessages
calls shm_mq_receive(), which calls shm_mq_receive_bytes(), which
contains a CHECK_FOR_INTERRUPTS() call?

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] New version numbering practices

2016-08-01 Thread Tom Lane
Andrew Dunstan  writes:
> Somewhat related is how we name the git branches. It would help me from 
> a buildfarm POV if we kept lexically them sortable, which could be done 
> at least for the next 90 major releases :-) by adding an underscore 
> after the REL piece, thus: REL_10_STABLE. I realise that's a way off, 
> but it's worth bringing up while we're discussing the topic.

Hmm, sounds a bit C-locale-centric, but I have no objection to inserting
an underscore there if it seems helpful.

What I thought would be worth discussing is whether to continue using the
"_STABLE" suffix.  It seems rather like a noise word for our purposes.
OTOH, dropping it might be a headache for scripts that deal with branch
names --- any thoughts?

regards, tom lane




Re: [HACKERS] HandleParallelMessages contains CHECK_FOR_INTERRUPTS?

2016-08-01 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> $SUBJECT seems like a pretty bad idea, because it implies a recursive
>> entry to ProcessInterrupts and thence to HandleParallelMessages itself.
>> By what reasoning is that call necessary where it's placed?

> I notice you just removed the CHECK_FOR_INTERRUPTS in
> HandleParallelMessages().  Did you notice that HandleParallelMessages
> calls shm_mq_receive(), which calls shm_mq_receive_bytes(), which
> contains a CHECK_FOR_INTERRUPTS() call?

I figured there were probably other cases of recursion; it's unlikely
we could prevent them altogether.  But a check in a function that's
called *only* from ProcessInterrupts seems pretty dubious.

I wonder whether we should make use of HOLD_INTERRUPTS/RESUME_INTERRUPTS
to avoid the recursion scenario here.  It's not really clear to me whether
this stack of function would behave sanely if it's invoked recursively
when the outer level is partway through reading a message.  At the very
least, it seems like there might be a risk for out-of-order message
processing (inner recursion reads and processes next message while outer
level has read but not yet processed previous message).
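(For illustration, the HOLD_INTERRUPTS/RESUME_INTERRUPTS idea would amount
to something like the sketch below; the function body is elided and the
exact placement is precisely what's being questioned here:)

static void
HandleParallelMessages(void)
{
	/*
	 * Hold off interrupts so that a CHECK_FOR_INTERRUPTS() reached via
	 * shm_mq_receive() cannot re-enter this function while we are partway
	 * through reading a message.
	 */
	HOLD_INTERRUPTS();

	/* ... drain each worker's error queue with shm_mq_receive() ... */

	RESUME_INTERRUPTS();
}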

regards, tom lane




Re: [HACKERS] PostmasterContext survives into parallel workers!?

2016-08-01 Thread Robert Haas
On Mon, Aug 1, 2016 at 4:18 PM, Tom Lane  wrote:
> I noticed $subject while fooling around with the tqueue.c memory leak
> issues.  This does not seem like a good idea to me.  At the very least,
> it's a waste of space that could be used for something else, and at the
> worst, it might be a security issue because it leaves security-sensitive
> pg_hba and pg_ident information laying about in places where it might be
> recoverable (if only through memory-disclosure bugs, which we've had
> before and no doubt will have again).
>
> The reason is that the parallel worker launch path contains no equivalent
> of PostgresMain's stanza
>
> if (PostmasterContext)
> {
> MemoryContextDelete(PostmasterContext);
> PostmasterContext = NULL;
> }
>
> Now, I'm undecided whether to flush that context only in parallel workers,
> or to try to make it go away for all bgworkers of any stripe.  The latter
> seems a little better from a security standpoint, but I wonder if anyone
> has a use-case where that'd be a bad idea?

I think it would be better to get rid of it in all bgworkers.
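(Concretely, that would mean running the same stanza quoted above in the
generic bgworker startup path; putting it in StartBackgroundWorker() is an
assumption on my part, and it would have to come after anything the worker
still needs to copy out of postmaster-private state:)

	/* Discard postmaster-private data; the worker must not rely on it. */
	if (PostmasterContext)
	{
		MemoryContextDelete(PostmasterContext);
		PostmasterContext = NULL;
	}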

(Also vaguely on the list of things to clean up: can't we make it so
that bgworkers aren't launched from inside a signal handler?  Blech.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] TODO item: Implement Boyer-Moore searching in LIKE queries

2016-08-01 Thread Robert Haas
On Mon, Aug 1, 2016 at 1:19 PM, Karan Sikka  wrote:
> Following the patch to implement strpos with Boyer-Moore-Horspool,
> it was proposed we bring BMH to LIKE as well.
>
> Original thread:
> https://www.postgresql.org/message-id/flat/27645.1220635769%40sss.pgh.pa.us#27645.1220635...@sss.pgh.pa.us
>
> I'm a first time hacker and I found this task on the TODO list. It seemed
> interesting and isolated enough. But after looking at the code in
> like_match.c, it's clearly a non-trivial task.
>
> How desirable is this feature? To begin answering that question,
> I did some initial benchmarking with an English text corpus to find how much
> faster BMH (Boyer-Moore-Horspool) would be compared to LIKE, and the results
> were as follows: BMH can be up to 20% faster than LIKE, but for short
> substrings, it's roughly comparable or slower.
>
> Here are the results visualized:
>
> http://ctrl-c.club/~ksikka/pg/like-strpos-short-1469975400.png
> http://ctrl-c.club/~ksikka/pg/like-strpos-long-1469975400.png

Based on these results, this looks to me like a pretty unexciting
thing upon which to spend time.
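(For readers who have not seen it, a minimal sketch of the
Boyer-Moore-Horspool idea that was benchmarked above; this is illustrative
only and ignores everything that makes the like_match.c case hard, such as
wildcards and multibyte encodings:)

#include <stddef.h>
#include <string.h>

/* Return a pointer to the first occurrence of needle in haystack, or NULL. */
static const char *
bmh_search(const char *haystack, size_t hlen, const char *needle, size_t nlen)
{
	size_t		skip[256];
	size_t		i;

	if (nlen == 0)
		return haystack;
	if (nlen > hlen)
		return NULL;

	/* Default shift is the whole pattern length ... */
	for (i = 0; i < 256; i++)
		skip[i] = nlen;
	/* ... but characters occurring in the pattern allow a smaller shift. */
	for (i = 0; i < nlen - 1; i++)
		skip[(unsigned char) needle[i]] = nlen - 1 - i;

	for (i = 0; i + nlen <= hlen;
		 i += skip[(unsigned char) haystack[i + nlen - 1]])
	{
		if (memcmp(haystack + i, needle, nlen) == 0)
			return haystack + i;
	}
	return NULL;
}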

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PostmasterContext survives into parallel workers!?

2016-08-01 Thread Andres Freund
On 2016-08-01 18:09:03 -0400, Robert Haas wrote:
> (Also vaguely on the list of things to clean up: can't we make it so
> that bgworkers aren't launched from inside a signal handler?  Blech.)

Isn't pretty much everything on-demand below postmaster started from a
signal handler?




Re: [HACKERS] PostmasterContext survives into parallel workers!?

2016-08-01 Thread Tom Lane
Robert Haas  writes:
> (Also vaguely on the list of things to clean up: can't we make it so
> that bgworkers aren't launched from inside a signal handler?  Blech.)

So are other postmaster children, I believe.  We could probably try
to rewrite the postmaster to not do useful work in signal handlers,
but rely on a lot of volatile flags set by the handlers.  Not convinced
this would be anything but a cosmetic improvement, though.  And it
could create new portability problems to replace any that it removed;
we'd have to be *absolutely* certain that the main select() call would
return with EINTR rather than resuming after any interrupt.

regards, tom lane




[HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-08-01 Thread Peter Geoghegan
As some of you know, I've been working on parallel sort. I think I've
gone as long as I can without feedback on the design (and I see that
we're accepting stuff for September CF now), so I'd like to share what
I came up with. This project is something that I've worked on
inconsistently since late last year. It can be thought of as the
Postgres 10 follow-up to the 9.6 work on external sorting.

Attached WIP patch series:

* Adds a parallel sorting capability to tuplesort.c.

* Adds a new client of this capability: btbuild()/nbtsort.c can now
create B-Trees in parallel.

Most of the complexity here relates to the first item; the tuplesort
module has been extended to support sorting in parallel. This is
usable in principle by every existing tuplesort caller, without any
restriction imposed by the newly expanded tuplesort.h interface. So,
for example, randomAccess MinimalTuple support has been added,
although it goes unused for now.

I went with CREATE INDEX as the first client of parallel sort in part
because the cost model and so on can be relatively straightforward.
Even CLUSTER uses the optimizer to determine if a sort strategy is
appropriate, and that would need to be taught about parallelism if its
tuplesort is to be parallelized. I suppose that I'll probably try to
get CLUSTER (with a tuplesort) done in the Postgres 10 development
cycle too, but not just yet.

For now, I would prefer to focus discussion on tuplesort itself. If
you can only look at one part of this patch, please look at the
high-level description of the interface/caller contract that was added
to tuplesort.h.

Performance
===

Without further ado, I'll demonstrate how the patch series improves
performance in one case. This benchmark was run on an AWS server with
many disks. A d2.4xlarge instance was used, with 16 vCPUs, 122 GiB
RAM, 12 x 2 TB HDDs, running Amazon Linux. Apparently, this AWS
instance type can sustain 1,750 MB/second of I/O, which I was able to
verify during testing (when a parallel sequential scan ran, iotop
reported read throughput slightly above that for multi-second bursts).
Disks were configured in software RAID0. These instances have disks
that are optimized for sequential performance, which suits the patch
quite well. I don't usually trust AWS EC2 for performance testing, but
it seemed to work well here (results were pretty consistent).

Setup:

CREATE TABLE parallel_sort_test AS
SELECT hashint8(i) randint,
md5(i::text) collate "C" padding1,
md5(i::text || '2') collate "C" padding2
FROM generate_series(0, 1e9::bigint) i;

CHECKPOINT;

This leaves us with a parallel_sort_test table that is 94 GB in size.

SET maintenance_work_mem = '8GB';

-- Serial case (external sort, should closely match master branch):
CREATE INDEX serial_idx ON parallel_sort_test (randint) WITH
(parallel_workers = 0);

Total time: 00:15:42.15

-- Patch with 8 tuplesort "sort-and-scan" workers (leader process
participates as a worker here):
CREATE INDEX patch_8_idx ON parallel_sort_test (randint) WITH
(parallel_workers = 7);

Total time: 00:06:03.86

As you can see, the parallel case is 2.58x faster (while using more
memory, though it's not the case that a higher maintenance_work_mem
setting speeds up the serial/baseline index build). 8 workers are a
bit faster than 4, but not by much (not shown). 16 are a bit slower,
but not by much (not shown).

trace_sort output for "serial_idx" case:
"""
begin index sort: unique = f, workMem = 8388608, randomAccess = f
switching to external sort with 501 tapes: CPU 7.81s/25.54u sec
elapsed 33.95 sec
*** SNIP ***
performsort done (except 7-way final merge): CPU 53.52s/666.89u sec
elapsed 731.67 sec
external sort ended, 2443786 disk blocks used: CPU 74.40s/854.52u sec
elapsed 942.15 sec
"""

trace_sort output for "patch_8_idx" case:
"""
begin index sort: unique = f, workMem = 8388608, randomAccess = f
*** SNIP ***
sized memtuples 1.62x from worker's 130254158 (3052832 KB) to
210895910 (4942873 KB) for leader merge (0 KB batch memory conserved)
*** SNIP ***
tape -1/7 initially used 411907 KB of 430693 KB batch (0.956) and
26361986 out of 26361987 slots (1.000)
performsort done (except 8-way final merge): CPU 12.28s/101.76u sec
elapsed 129.01 sec
parallel external sort ended, 2443805 disk blocks used: CPU
30.08s/318.15u sec elapsed 363.86 sec
"""

This is roughly the degree of improvement that I expected when I first
undertook this project late last year. As I go into in more detail
below, I believe that we haven't exhausted all avenues to make
parallel CREATE INDEX faster still, but I do think what's left on the
table is not enormous.

There is less benefit when sorting on a C locale text attribute,
because the overhead of merging dominates parallel sorts, and that's
even more pronounced with text. So, many text cases tend to work out
at about only 2x - 2.2x faster. We could work on this indirectly.

I've seen cases where a CREATE INDEX ended up more than 3x faster,
though. I benchmarked 

Re: [HACKERS] Combining hash values

2016-08-01 Thread Robert Haas
On Mon, Aug 1, 2016 at 11:27 AM, Tom Lane  wrote:
> Dean Rasheed  writes:
>> On that subject, while looking at hashfunc.c, I spotted that
>> hashint8() has a very obvious deficiency, which causes disastrous
>> performance with certain inputs:
>
> Well, if you're trying to squeeze 64 bits into a 32-bit result, there
> are always going to be collisions somewhere.
>
>> I'd suggest using hash_uint32() for values that fit in a 32-bit
>> integer and hash_any() otherwise.
>
> Perhaps, but this'd break existing hash indexes.  That might not be
> a fatal objection, but if we're going to put users through that
> I'd like to think a little bigger in terms of the benefits we get.
> I've thought for some time that we needed to move to 64-bit hash function
> results, because the size of problem that's reasonable to use a hash join
> or hash aggregation for keeps increasing.  Maybe we should do that and fix
> hashint8 as a side effect.

Well, considering that Amit is working on making hash indexes
WAL-logged in v10[1], this seems like an awfully good time to get any
breakage we want to do out of the way.
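(For reference, a sketch of the change Dean is suggesting, with the fmgr
boilerplate reduced to the essentials; hash_uint32() and hash_any() are the
existing hashfunc.c entry points, and shipping anything like this is exactly
the kind of breakage being talked about, since it changes hashint8's output:)

Datum
hashint8(PG_FUNCTION_ARGS)
{
	int64		val = PG_GETARG_INT64(0);

	/* Use the cheap 32-bit path when the value fits in an int32 ... */
	if (val == (int64) (int32) val)
		return hash_uint32((uint32) val);

	/* ... and hash all 64 bits otherwise. */
	return hash_any((unsigned char *) &val, sizeof(val));
}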

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] 
https://www.postgresql.org/message-id/caa4ek1lfzczyxloxs874ad0+s-zm60u9bwcyiuzx9mhz-kc...@mail.gmail.com




Re: [HACKERS] Combining hash values

2016-08-01 Thread Robert Haas
On Mon, Aug 1, 2016 at 7:24 AM, Dean Rasheed  wrote:
> On 1 August 2016 at 08:19, Greg Stark  wrote:
>> Surely combining multiple hashes is the same problem as hashing a block of
>> memory? Shouldn't we just use the same algorithm as hash_any()?
>
> Yes, I imagine that should work well. I suspect that Thomas's proposal
> would also probably work well, as would a number of other hashing
> algorithms with reasonable pedigree, such as the one used for array
> hashing. I don't have any particular preference, but I do know that
> what usually turns out to be disastrous is an arbitrary made-up
> formula like rotating and xor'ing. The last thing we should attempt to
> do is invent our own hashing algorithms.

+1.  (x << 1) | y isn't the stupidest way of combining hash values
anybody's ever invented, but there are surely others that are better.
I don't much care whether we adopt Thomas's proposal or Greg's or
something else, but I can't see why we'd stick with (x << 1) | y when
better approaches are known.
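(For reference, a combiner of the kind being discussed; this is the widely
used Boost hash_combine recipe, which is, as far as I can tell, what
Thomas's proposal amounts to — whether to adopt this form or to feed the
values through hash_any(), as Greg suggests, is the open question:)

static inline uint32
hash_combine(uint32 a, uint32 b)
{
	/* 0x9e3779b9 is 2^32 divided by the golden ratio. */
	a ^= b + 0x9e3779b9 + (a << 6) + (a >> 2);
	return a;
}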

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PostmasterContext survives into parallel workers!?

2016-08-01 Thread Tom Lane
Andres Freund  writes:
> On 2016-08-01 18:09:03 -0400, Robert Haas wrote:
>> (Also vaguely on the list of things to clean up: can't we make it so
>> that bgworkers aren't launched from inside a signal handler?  Blech.)

> Isn't pretty much everything on-demand below postmaster started from a
> signal handler?

I think it depends.  As an example, maybe_start_bgworker is called
from PostmasterMain, *and* from ServerLoop, *and* from reaper,
*and* from sigusr1_handler.  That's likely excessive, but it's what
we've got at the moment.

I'm pretty sure regular backends are only launched from ServerLoop,
but for anything else it's uncertain.

regards, tom lane




Re: [HACKERS] Combining hash values

2016-08-01 Thread Tom Lane
Robert Haas  writes:
> On Mon, Aug 1, 2016 at 11:27 AM, Tom Lane  wrote:
>> Perhaps, but this'd break existing hash indexes.  That might not be
>> a fatal objection, but if we're going to put users through that
>> I'd like to think a little bigger in terms of the benefits we get.
>> I've thought for some time that we needed to move to 64-bit hash function
>> results, because the size of problem that's reasonable to use a hash join
>> or hash aggregation for keeps increasing.  Maybe we should do that and fix
>> hashint8 as a side effect.

> Well, considering that Amit is working on making hash indexes
> WAL-logged in v10[1], this seems like an awfully good time to get any
> breakage we want to do out of the way.

Yeah, the compatibility stakes would go up quite a bit as soon as that
happens, because then users might start using hash indexes without the
expectation of having to rebuild them at random.

(BTW, this line of reasoning also says that if Amit finds he needs to
twiddle the on-disk format a bit to make WAL logging work, it would not
be a problem.)

regards, tom lane




Re: [HACKERS] PostmasterContext survives into parallel workers!?

2016-08-01 Thread Alvaro Herrera
Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-08-01 18:09:03 -0400, Robert Haas wrote:
> >> (Also vaguely on the list of things to clean up: can't we make it so
> >> that bgworkers aren't launched from inside a signal handler?  Blech.)
> 
> > Isn't pretty much everything on-demand below postmaster started from a
> > signal handler?
> 
> I think it depends.  As an example, maybe_start_bgworker is called
> from PostmasterMain, *and* from ServerLoop, *and* from reaper,
> *and* from sigusr1_handler.  That's likely excessive, but it's what
> we've got at the moment.

As I recall, each of those calls correspond to some particular need;
keep in mind that bgworkers can request to be started at a few different
points: either right at postmaster start, or when consistent state is
reached on a standby, or when recovery is finished.
maybe_start_bgworker only starts one bgworker, and it also sets a flag
so that ServerLoop will call it another time if there are pending
workers for the same timing event.  That explains the four calls to
maybe_start_bgworker, and as I recall removing any of these would break
some possible usage.  And yes, I did put two of these in signal handlers
precisely because that is postmaster's longstanding practice.

(It's perhaps possible to remove the call from ServerLoop if you make
maybe_start_bgworker process all of them at once instead, but as I
recall we decided to do only one at a time so that ServerLoop had a
chance to run other control logic in between, just in case.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PostmasterContext survives into parallel workers!?

2016-08-01 Thread Andres Freund
On 2016-08-01 18:28:44 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2016-08-01 18:09:03 -0400, Robert Haas wrote:
> >> (Also vaguely on the list of things to clean up: can't we make it so
> >> that bgworkers aren't launched from inside a signal handler?  Blech.)
> 
> > Isn't pretty much everything on-demand below postmaster started from a
> > signal handler?
> 
> I think it depends.  As an example, maybe_start_bgworker is called
> from PostmasterMain, *and* from ServerLoop, *and* from reaper,
> *and* from sigusr1_handler.  That's likely excessive, but it's what
> we've got at the moment.

Personally I think the whole logic should be reworked so we do most of
that only from one place. Especially the signal handler stuff
should imo just be replaced by setting a latch, which then does the work
inside the normal main loop.
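(A minimal sketch of that pattern, using the existing latch API; the flag
name is invented for illustration, and the real postmaster loop obviously
has far more to juggle:)

static volatile sig_atomic_t start_bgworker_pending = false;

static void
sigusr1_handler(SIGNAL_ARGS)
{
	int			save_errno = errno;

	/* Do no real work here; just remember the request and wake the loop. */
	start_bgworker_pending = true;
	SetLatch(MyLatch);

	errno = save_errno;
}

static void
ServerLoop(void)
{
	for (;;)
	{
		WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT, 60 * 1000L);
		ResetLatch(MyLatch);

		if (start_bgworker_pending)
		{
			start_bgworker_pending = false;
			maybe_start_bgworker();
		}
	}
}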

Andres




Re: [HACKERS] PostmasterContext survives into parallel workers!?

2016-08-01 Thread Tom Lane
Andres Freund  writes:
> Personally I think the whole logic should be reworked so we do most of
> that only from one place.

Alvaro already mentioned a couple of reasons why that might not be so
easy.

> Especially the signal handler stuff
> should imo just be replaced by setting a latch, which then does the work
> inside the normal main loop.

Of course, then we're utterly dependent on the latch logic to be
zero-failure, as any bug in it takes out the postmaster.  That idea
would've been rejected out of hand earlier in the development of
the latch code.  Maybe it's safe enough today, but I still wonder what
it is that we're buying after we do such a massive rewrite of the
postmaster.

regards, tom lane




Re: [HACKERS] PostmasterContext survives into parallel workers!?

2016-08-01 Thread Thomas Munro
On Tue, Aug 2, 2016 at 10:28 AM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2016-08-01 18:09:03 -0400, Robert Haas wrote:
>>> (Also vaguely on the list of things to clean up: can't we make it so
>>> that bgworkers aren't launched from inside a signal handler?  Blech.)
>
>> Isn't pretty much everything on-demand below postmaster started from a
>> signal handler?
>
> I think it depends.  As an example, maybe_start_bgworker is called
> from PostmasterMain, *and* from ServerLoop, *and* from reaper,
> *and* from sigusr1_handler.  That's likely excessive, but it's what
> we've got at the moment.

I found this apparently unresolved bug report about glibc fork()
inside a signal handler deadlocking:

https://sourceware.org/bugzilla/show_bug.cgi?id=4737

I wonder if that could bite postmaster.  It's interesting because
comments 16 and 19 and 22 suggest that it may not be fixed.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] New version numbering practices

2016-08-01 Thread Michael Paquier
On Tue, Aug 2, 2016 at 5:25 AM, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> Somewhat related is how we name the git branches. It would help me from
>> a buildfarm POV if we kept them lexically sortable, which could be done
>> at least for the next 90 major releases :-) by adding an underscore
>> after the REL piece, thus: REL_10_STABLE. I realise that's a way off,
>> but it's worth bringing up while we're discussing the topic.
>
> Hmm, sounds a bit C-locale-centric, but I have no objection to inserting
> an underscore there if it seems helpful.
>
> What I thought would be worth discussing is whether to continue using the
> "_STABLE" suffix.  It seems rather like a noise word for our purposes.
> OTOH, dropping it might be a headache for scripts that deal with branch
> names --- any thoughts?

I would have thought that REL10_STABLE is the best balance between
what we have now and the future numbering system.
-- 
Michael




Re: [HACKERS] PostmasterContext survives into parallel workers!?

2016-08-01 Thread Andres Freund
On 2016-08-02 11:27:25 +1200, Thomas Munro wrote:
> On Tue, Aug 2, 2016 at 10:28 AM, Tom Lane  wrote:
> > Andres Freund  writes:
> >> On 2016-08-01 18:09:03 -0400, Robert Haas wrote:
> >>> (Also vaguely on the list of things to clean up: can't we make it so
> >>> that bgworkers aren't launched from inside a signal handler?  Blech.)
> >
> >> Isn't pretty much everything on-demand below postmaster started from a
> >> signal handler?
> >
> > I think it depends.  As an example, maybe_start_bgworker is called
> > from PostmasterMain, *and* from ServerLoop, *and* from reaper,
> > *and* from sigusr1_handler.  That's likely excessive, but it's what
> > we've got at the moment.
> 
> I found this apparently unresolved bug report about glibc fork()
> inside a signal handler deadlocking:
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=4737
> 
> I wonder if that could bite postmaster.  It's interesting because
> comments 16 and 19 and 22 suggest that it may not be fixed.

Moreover the solution appears to be to define the problem away:
http://www.opengroup.org/austin/docs/austin_445.txt
https://www.opengroup.org/austin/docs/austin_446.txt




Re: [HACKERS] PostmasterContext survives into parallel workers!?

2016-08-01 Thread Tom Lane
Thomas Munro  writes:
> I found this apparently unresolved bug report about glibc fork()
> inside a signal handler deadlocking:
> https://sourceware.org/bugzilla/show_bug.cgi?id=4737

> I wonder if that could bite postmaster.

I seriously doubt it.  The key thing about the postmaster is that
it runs with signals blocked practically everywhere.  So we're not
taking risks of a signal handler interrupting, say, malloc()
(which seemed to be the core of at least the first example given
in that report).  This is what makes me dubious that getting rid
of doing work in the postmaster's signal handlers is really going
to add any noticeable increment of safety.  It might make the
code look cleaner, but I'm afraid it's going to be a lot of churn
for not much gain.

regards, tom lane




Re: [HACKERS] TODO item: Implement Boyer-Moore searching in LIKE queries

2016-08-01 Thread Karan Sikka
I agree, should we remove it from the TODO list?

On Mon, Aug 1, 2016 at 6:13 PM, Robert Haas  wrote:

> On Mon, Aug 1, 2016 at 1:19 PM, Karan Sikka  wrote:
> > Following the patch to implement strpos with Boyer-Moore-Horspool,
> > it was proposed we bring BMH to LIKE as well.
> >
> > Original thread:
> >
> https://www.postgresql.org/message-id/flat/27645.1220635769%40sss.pgh.pa.us#27645.1220635...@sss.pgh.pa.us
> >
> > I'm a first time hacker and I found this task on the TODO list. It seemed
> > interesting and isolated enough. But after looking at the code in
> > like_match.c, it's clearly a non-trivial task.
> >
> > How desirable is this feature? To begin answering that question,
> > I did some initial benchmarking with an English text corpus to find how
> much
> > faster BMH (Boyer-Moore-Horspool) would be compared to LIKE, and the
> results
> > were as follows: BMH can be up to 20% faster than LIKE, but for short
> > substrings, it's roughly comparable or slower.
> >
> > Here are the results visualized:
> >
> > http://ctrl-c.club/~ksikka/pg/like-strpos-short-1469975400.png
> > http://ctrl-c.club/~ksikka/pg/like-strpos-long-1469975400.png
>
> Based on these results, this looks to me like a pretty unexciting
> thing upon which to spend time.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] New version numbering practices

2016-08-01 Thread Alvaro Herrera
Michael Paquier wrote:
> On Tue, Aug 2, 2016 at 5:25 AM, Tom Lane  wrote:
> > Andrew Dunstan  writes:
> >> Somewhat related is how we name the git branches. It would help me from
> >> a buildfarm POV if we kept them lexically sortable, which could be done
> >> at least for the next 90 major releases :-) by adding an underscore
> >> after the REL piece, thus: REL_10_STABLE. I realise that's a way off,
> >> but it's worth bringing up while we're discussing the topic.
> >
> > Hmm, sounds a bit C-locale-centric, but I have no objection to inserting
> > an underscore there if it seems helpful.
> >
> > What I thought would be worth discussing is whether to continue using the
> > "_STABLE" suffix.  It seems rather like a noise word for our purposes.
> > OTOH, dropping it might be a headache for scripts that deal with branch
> > names --- any thoughts?
> 
> I would have thought that REL10_STABLE is the best balance between
> what we have now and the future numbering system.

If we drop the STABLE then it's fairly easy to distinguish names from
the two-part majors era, and the one-part majors era (just check for
presence of the _STABLE suffix).  I don't see any value to the _STABLE
suffix, given the way we treat branches.

That said, I'm not opposed to REL_10 and so on.  In 89 years there will
be a problem with sorting REL_100 but I'm sure they can find a solution
then, if computers still need humans to write programs for them.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Increasing timeout of poll_query_until for TAP tests

2016-08-01 Thread Michael Paquier
On Wed, Jul 27, 2016 at 10:00 AM, Michael Paquier
 wrote:
> On Mon, Jul 25, 2016 at 10:05 PM, Michael Paquier
>  wrote:
>> On Mon, Jul 25, 2016 at 2:52 PM, Michael Paquier
>>  wrote:
>>> Ah, yes, and that's a stupid mistake. We had better use
>>> replay_location instead of write_location. There is a risk that
>>> records have not been replayed yet even if they have been written on
>>> the standby, so it is possible that the query looking at tab_int may
>>> not see this relation.
>>
>> Or in short, the attached fixes 2) and will help providing input for 1)..
>
> Increasing the timeout had zero effect for test 003:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamster&dt=2016-07-26%2016%3A00%3A07
> So we're facing something else. I'll dig into that deeper using
> manually hamster.

And so I have finally been able to reproduce this one, and it is a
timing issue in the test.

First see the failure itself:
LOG:  recovery stopping after commit of transaction 548, time
2016-08-01 21:14:37.647104+09
LOG:  recovery has paused
HINT:  Execute pg_xlog_replay_resume() to continue.
LOG:  statement: SELECT '0/30122D0'::pg_lsn <= pg_last_xlog_replay_location()
[keeps waiting for this LSN to be replayed]

But by looking at the WAL records of this failed test I could see the following
rmgr: Transaction len (rec/tot):  8/34, tx:547, lsn:
0/03012248, prev 0/03012208, desc: COMMIT 2016-08-01 21:14:37.514805
JST
rmgr: Transaction len (rec/tot):  8/34, tx:548, lsn:
0/03012270, prev 0/03012248, desc: COMMIT 2016-08-01 21:14:37.647104
JST
rmgr: Standby len (rec/tot): 24/50, tx:  0, lsn:
0/03012298, prev 0/03012270, desc: RUNNING_XACTS nextXid 549
latestCompletedXid 548 oldestRunningXid 549
rmgr: Heaplen (rec/tot):  3/59, tx:549, lsn:
0/030122D0, prev 0/03012298, desc: INSERT off 193, blkref #0: rel
1663/12404/16384 blk 8
It is no surprise that this keeps waiting forever: recovery is paused
before the wanted LSN, so that LSN can never be replayed.

Here using pg_xlog_replay_resume() is not the correct solution, because
this would cause the node to finish recovery before we want it to, and
neither is recovery_target_action = 'promote'. If we look at the test, it
does the following when getting the TXID that is used as the recovery
target:
$node_master->safe_psql('postgres',
"INSERT INTO tab_int VALUES (generate_series(1001,2000))");
my $recovery_txid =
  $node_master->safe_psql('postgres', "SELECT txid_current()");
my $lsn2 =
  $node_master->safe_psql('postgres', "SELECT pg_current_xlog_location();");
What I think we had better do is reverse the calls to
pg_current_xlog_location() and txid_current(), so that we are sure that
the LSN we track for replay is lower than the real target LSN. The
same problem exists when defining the timestamp target.
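(Concretely, the reordering looks like this in the test, reusing the
variable names from the snippet above:)

my $lsn2 =
  $node_master->safe_psql('postgres', "SELECT pg_current_xlog_location();");
my $recovery_txid =
  $node_master->safe_psql('postgres', "SELECT txid_current()");
# $lsn2 is now guaranteed to be no later than the commit record of the
# target transaction, so waiting for it to be replayed cannot overshoot
# the point where recovery pauses.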

The attached patch does that, and it also fixes the issue in test
001 with write_location, which should not be used. With this patch
I have let test 003 run for two hours in a row and it did not
complain. Previously I would see a failure by the 3rd or 4th
attempt, within 30 minutes.

There is still an issue with pg_basebackup when testing stream mode
and replication slots. I am digging into this one now..
-- 
Michael


recovery-test-fixes.patch
Description: application/download



Re: [HACKERS] MSVC pl-perl error message is not verbose enough

2016-08-01 Thread Michael Paquier
On Tue, Aug 2, 2016 at 2:08 AM, Robert Haas  wrote:
> Did you add this to the next CommitFest?

I have added it here:
https://commitfest.postgresql.org/10/691/
John, it would be good if you could get a community account and add
your name to this patch as its author. I could not find you.
-- 
Michael




Re: [HACKERS] PostmasterContext survives into parallel workers!?

2016-08-01 Thread Robert Haas
On Mon, Aug 1, 2016 at 7:42 PM, Tom Lane  wrote:
> Thomas Munro  writes:
>> I found this apparently unresolved bug report about glibc fork()
>> inside a signal handler deadlocking:
>> https://sourceware.org/bugzilla/show_bug.cgi?id=4737
>
>> I wonder if that could bite postmaster.
>
> I seriously doubt it.  The key thing about the postmaster is that
> it runs with signals blocked practically everywhere.  So we're not
> taking risks of a signal handler interrupting, say, malloc()
> (which seemed to be the core of at least the first example given
> in that report).  This is what makes me dubious that getting rid
> of doing work in the postmaster's signal handlers is really going
> to add any noticeable increment of safety.  It might make the
> code look cleaner, but I'm afraid it's going to be a lot of churn
> for not much gain.

It's not just a cosmetic issue.

See 
https://www.postgresql.org/message-id/CA+Tgmobr+Q2WgWeasdbDNefVwJkAGALxA=-vtegntqgl1v2...@mail.gmail.com
and d0410d66037c2f3f9bee45e0a2db9e47eeba2bb4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] TODO item: Implement Boyer-Moore searching in LIKE queries

2016-08-01 Thread Robert Haas
On Mon, Aug 1, 2016 at 7:47 PM, Karan Sikka  wrote:
> I agree, should we remove it from the TODO list?

If nobody objects, sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] old_snapshot_threshold allows heap:toast disagreement

2016-08-01 Thread Robert Haas
On Sat, Jul 30, 2016 at 8:17 AM, Amit Kapila  wrote:
> On Fri, Jul 29, 2016 at 1:10 AM, Robert Haas  wrote:
>> On Wed, Jul 27, 2016 at 7:26 PM, Andres Freund  wrote:
>>
>> New version attached.
>
> +static inline void
> +InitToastSnapshot(Snapshot snapshot, XLogRecPtr lsn)
> +{
> + snapshot->satisfies = HeapTupleSatisfiesToast;
> + snapshot->lsn = lsn;
> +}
>
> Here, don't you need to initialize whenTaken as that is also used in
> TestForOldSnapshot_impl() to report error "snapshot too old".

Hmm, yeah.  This is actually a bit confusing.  We want the "oldest"
snapshot, but there are three different notions of "oldest":

1. Smallest LSN.
2. Smallest whenTaken.
3. Smallest xmin.

Which one do we use?
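(To make Amit's point concrete, the initializer would need to carry that
field too, along these lines; which of the three notions above supplies the
values is the open question:)

static inline void
InitToastSnapshot(Snapshot snapshot, XLogRecPtr lsn, TimestampTz whenTaken)
{
	snapshot->satisfies = HeapTupleSatisfiesToast;
	snapshot->lsn = lsn;
	/* needed so TestForOldSnapshot_impl() can report "snapshot too old" */
	snapshot->whenTaken = whenTaken;
}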

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-01 Thread Etsuro Fujita

On 2016/08/01 22:25, Kouhei Kaigai wrote:

I wrote:
>>> a broader
>>> word like "Processing" seems to work well because we allow various
>>> kinds of operations to the remote side, in addition to scans/joins,
>>> to be performed in that one Foreign Scan node indicated by "Foreign
>>> Processing", such as aggregation, window functions, distinct, order
>>> by, row locking, table modification, or combinations of them.

>> On 2016/07/29 13:28, Ashutosh Bapat wrote:
>>> "Scan" is a better word than "Processing". From plan's perspective it's
>>> ultimately a Scan (on the data produced by the foreign server) and not
>>> processing.

I wrote:
>> Exactly, but one thing I'm concerned about is the table modification
>> case; the EXPLAIN output would be something like this:
>>
>>    Foreign Scan
>>  Remote SQL: INSERT INTO remote_table VALUES ...
>>
>> That would be strange, so I think a more broader word is better.  I
>> don't think "Foreign Processing" is best.  "Foreign Upper" might be much
>> better because the corresponding path is created by calling
>> GetForeignUpperPaths.

> Be "Foreign %s", and gives "Insert" label by postgres_fdw; which knows
> the ForeignScan node actually insert tuples.
> From the standpoint of users, it looks "Foreign Insert".

My concern here is EXPLAIN for foreign joins.  I think it's another
problem how we handle Foreign Scan plan nodes representing
post-scan/join operations in EXPLAIN, so I'd like to leave that for
another patch.

I wrote:
>> Also for a Foreign Scan representing a foreign join, I think "Foreign
>> Join" is better than "Foreign Scan".  Here is an example:
>>
>>    Foreign Join on foreign_table1, foreign_table2
>>  Remote SQL: SELECT ... FROM remote_table1 INNER JOIN remote_table2
>> WHERE ...
>>
>> I think "Foreign Join" better indicates that foreign tables listed after
>> that (ie, foreign_table1 and foreign_table2 in the example) are joining
>> (or component) tables of this join, than "Foreign Scan".

> Postgres_fdw knows it is remote join. It is quite easy to tell the core
> this ForeignScan node is "Join". Also, it knows which tables are involved in.

Yeah, we can do that.

> We can add hint information to control relations name to be printed.

For foreign joins, it would make sense to add such a functionality when
necessary, for example when we extend the pushdown feature so that we
can do what you proposed upthread.


Best regards,
Etsuro Fujita






Re: [HACKERS] Wrong defeinition of pq_putmessage_noblock since 9.5

2016-08-01 Thread Kyotaro HORIGUCHI
At Fri, 29 Jul 2016 13:00:50 -0400, Tom Lane  wrote in 
<29430.1469811...@sss.pgh.pa.us>
> I wrote:
> > Kyotaro HORIGUCHI  writes:
> >> Any work in this area is likely 10.0 material at this point.
> 
> > I'm not really happy with that, since refactoring it again will create
> > back-patch hazards.  But I see that a lot of the mess here was created
> > in 9.5, which means we're probably stuck with back-patch hazards anyway.
> 
> I've pushed Kyotaro-san's original patch, which is clearly a bug fix.
> I think the additional changes discussed later in this thread are
> cosmetic, and probably should wait for a more general review of the
> layering decisions in pqcomm.c.

Agreed. It's not an abstraction or API, but it actually works
with no problem, and changing it is obviously an issue for later
discussion.

Anyway thank you for committing this.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-01 Thread Kouhei Kaigai
> On 2016/08/01 22:25, Kouhei Kaigai wrote:
> 
> I wrote:
> >>> a broader
> >>> word like "Processing" seems to work well because we allow various
> >>> kinds of operations to the remote side, in addition to scans/joins,
> >>> to be performed in that one Foreign Scan node indicated by "Foreign
> >>> Processing", such as aggregation, window functions, distinct, order
> >>> by, row locking, table modification, or combinations of them.
> 
>  >> On 2016/07/29 13:28, Ashutosh Bapat wrote:
> >>> "Scan" is a better word than "Processing". From plan's perspective it's
> >>> ultimately a Scan (on the data produced by the foreign server) and not
> >>> processing.
> 
> I wrote:
> >> Exactly, but one thing I'm concerned about is the table modification
> >> case; the EXPLAIN output would be something like this:
> >>
> >>Foreign Scan
> >>  Remote SQL: INSERT INTO remote_table VALUES ...
> >>
> >> That would be strange, so I think a more broader word is better.  I
> >> don't think "Foreign Processing" is best.  "Foreign Upper" might be much
> >> better because the corresponding path is created by calling
> >> GetForeignUpperPaths.
> 
> > Be "Foreign %s", and gives "Insert" label by postgres_fdw; which knows
> > the ForeignScan node actually insert tuples.
> > From the standpoint of users, it looks "Foreign Insert".
> 
> My concern here is EXPLAIN for foreign joins.  I think it's another
> problem how we handle Foreign Scan plan nodes representing
> post-scan/join operations in EXPLAIN, so I'd like to leave that for
> another patch.
>
What is the post-scan/join operations? Are you saying EPQ recheck and
alternative local join plan?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




[HACKERS] Comment typo in tuplesort.c

2016-08-01 Thread Amit Langote
Attached patch fixes a minor typo in tuplesort.c

s/child content/child context/g

Thanks,
Amit
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 4c502bb..6756f26 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -661,7 +661,7 @@ tuplesort_begin_common(int workMem, bool randomAccess)
 	/*
 	 * Caller tuple (e.g. IndexTuple) memory context.
 	 *
-	 * A dedicated child content used exclusively for caller passed tuples
+	 * A dedicated child context used exclusively for caller passed tuples
 	 * eases memory management.  Resetting at key points reduces
 	 * fragmentation. Note that the memtuples array of SortTuples is allocated
 	 * in the parent context, not this context, because there is no need to



Re: [HACKERS] Wrong defeinition of pq_putmessage_noblock since 9.5

2016-08-01 Thread Kyotaro HORIGUCHI
At Fri, 29 Jul 2016 11:58:04 -0400, Tom Lane  wrote in 
<14846.1469807...@sss.pgh.pa.us>
> Kyotaro HORIGUCHI  writes:
> > Many of the socket_* functions are required to be replaced
> > with mq_* functions for background workers. So reverting the names
> > of socket_* functions should be accompanied by the need to, for
> > example, changing their callers to use them explicitly via
> > PqCommMethods, not hiding with macros, or renaming current pq_*
> > macros to, say, pqi_. (it stands for PQ-Indirect as a tentative)
> > I'm not confident on the new prefix since I'm not sure that what
> > the 'pq' stands for. (Postgres Query?)
> 
> Hmm.  Of course the problem with this approach is that we end up touching
> a whole bunch of call sites, which sort of negates the point of having
> installed a compatibility macro layer.
> 
> [ spends awhile longer looking around at this code ]
> 
> Maybe the real problem here is that the abstraction layer is so incomplete
> and apparently haphazard, so that it's not very obvious where you should
> use a pq_xxx name and where you should refer to socket_xxx.

Yes, 'haphazard' is the word I was seeking for.

> For some of
> the functions in pqcomm.c, such as internal_flush, it's impossible to tell
> which side of the abstraction boundary they're supposed to be on.
> (I suspect that that particular function has simply been ignored on the
> undocumented assumption that a bgworker could never call it, but that's
> the kind of half-baked work that I don't find acceptable.)
> 
> I think the core work that needs to be done here is to clearly identify
> each function in pqcomm.c as either above or below the PqCommMethods
> abstraction boundary.  Maybe we should split one set or the other out
> to a new source file.  In any case, the naming convention ought to
> reflect that hierarchy.

Agreed. I'm not sure if there's a clean boundary, though.

> I withdraw the idea that we should try to revert all these functions back
> to their old names, since that would obscure the layering distinction
> between socket-specific and general-usage functions.  I now think that
> the problem is that that distinction hasn't been drawn sharply enough.
> 
> > The set of functions in PQcommMethods doesn't seem clean. They
> > are chosen arbitrarily just so that other pq_* functions used in
> > parallel workers will work as expected. I suppose that it needs
> > some refactoring.
> 
> Yeah, exactly.
> 
> > Any work in this area is likely 10.0 material at this point.
> 
> I'm not really happy with that, since refactoring it again will create
> back-patch hazards.  But I see that a lot of the mess here was created
> in 9.5, which means we're probably stuck with back-patch hazards anyway.

I think we have the alternative of not back-patching that
refactoring. 9.5 works happily with it and won't get further
improvement in this area.

>   regards, tom lane
> 
> PS: "PQ" comes from PostQUEL, I believe.

Thanks, I've learned something new:)

https://en.wikipedia.org/wiki/QUEL_query_languages

> range of E is EMPLOYEE
> retrieve into W
> (COMP = E.Salary / (E.Age - 18))
> where E.Name = "Jones"

It looks more mathematical or programming-language-like to me
than SQL.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Reviewing freeze map code

2016-08-01 Thread Noah Misch
On Sat, Jul 23, 2016 at 01:25:55PM +0530, Amit Kapila wrote:
> On Mon, Jul 18, 2016 at 2:03 PM, Andres Freund  wrote:
> > On 2016-07-18 10:02:52 +0530, Amit Kapila wrote:
> >> Consider the below scenario.
> >>
> >> Vacuum
> >> a. acquires a cleanup lock for page - 10
> >> b. busy in checking visibility of tuples
> >> --assume, here it takes some time and in the meantime Session-1
> >> performs step (a) and (b) and start waiting in step- (c)
> >> c. marks the page as all-visible (PageSetAllVisible)
> >> d. unlockandrelease the buffer
> >>
> >> Session-1
> >> a. In heap_lock_tuple(), readbuffer for page-10
> >> b. check PageIsAllVisible(), found page is not all-visible, so didn't
> >> acquire the visbilitymap_pin
> >> c. LockBuffer in ExlusiveMode  - here it will wait for vacuum to
> >> release the lock
> >> d. Got the lock, but now the page is marked as all-visible, so ideally
> >> need to recheck the page and acquire the visibilitymap_pin
> >
> > So, I've tried pretty hard to reproduce that. While the theory above is
> > sound, I believe the relevant code-path is essentially dead for SQL
> > callable code, because we'll always hold a buffer pin before even
> > entering heap_update/heap_lock_tuple.
> >
> 
> It is possible that we don't hold any buffer pin before entering
> heap_update() and or heap_lock_tuple().  For heap_update(), it is
> possible when it enters via simple_heap_update() path.  For
> heap_lock_tuple(), it is possible for ON CONFLICT DO Update statement
> and may be others as well.

This is currently listed as a 9.6 open item.  Is it indeed a regression in
9.6, or do released versions have the same defect?  If it is a 9.6 regression,
do you happen to know which commit, or at least which feature, caused it?

Thanks,
nm




Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-01 Thread Etsuro Fujita

On 2016/08/02 13:32, Kouhei Kaigai wrote:

I wrote:
>> My concern here is EXPLAIN for foreign joins.  I think it's another
>> problem how we handle Foreign Scan plan nodes representing
>> post-scan/join operations in EXPLAIN, so I'd like to leave that for
>> another patch.

> What is the post-scan/join operations? Are you saying EPQ recheck and
> alternative local join plan?

No.  I mean e.g., aggregation, window functions, sorting, or table
modification.  In other words, Foreign Scan plan nodes created from
ForeignPath paths created from GetForeignUpperPaths.


Best regards,
Etsuro Fujita






Re: [HACKERS] [BUGS] BUG #14244: wrong suffix for pg_size_pretty()

2016-08-01 Thread Pavel Stehule
2016-08-01 20:51 GMT+02:00 Peter Eisentraut <
peter.eisentr...@2ndquadrant.com>:

> On 7/30/16 1:18 AM, Pavel Stehule wrote:
> > We talked about this issue, when I wrote function pg_size_bytes. It is
> > hard to fix these functions after years of usage. The new set of
> > functions can be better
> >
> > pg_iso_size_pretty();
> > pg_iso_size_bytes();
>
> One thing that would actually be nice for other reasons as well is a
> version of pg_size_pretty() that lets you specify the output unit, say,
> as a second argument.  Because sometimes you want to compare two tables
> or something, and tells you one is 3GB and the other is 783MB, which
> doesn't really help.  If I tell it to use 'MB' as the output unit, I
> could get comparable output.
>

It looks like some conversion function:

pg_size_to(size, unit, [others ... rounding, truncating]) returns numeric

select pg_size_to(1024*1024, 'KB')
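A rough SQL-level sketch of such a function, just to make the idea concrete
(the name, signature, unit handling and rounding are all up for discussion):

CREATE FUNCTION pg_size_to(size bigint, unit text)
RETURNS numeric LANGUAGE sql AS
$$
  SELECT round(size::numeric /
               CASE upper(unit)
                 WHEN 'KB' THEN 1024::numeric
                 WHEN 'MB' THEN 1024::numeric ^ 2
                 WHEN 'GB' THEN 1024::numeric ^ 3
                 WHEN 'TB' THEN 1024::numeric ^ 4
                 ELSE 1::numeric
               END, 2)
$$;

SELECT pg_size_to(1024 * 1024, 'KB');   -- 1024.00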

Regards

Pavel


> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Increasing timeout of poll_query_until for TAP tests

2016-08-01 Thread Michael Paquier
On Tue, Aug 2, 2016 at 10:28 AM, Michael Paquier
 wrote:
> There is still an issue with pg_basebackup when testing stream mode
> and replication slots. I am digging into this one now..

After 5 hours of running this test in a row, and 30 attempts at torturing
hamster with a script running make check in an infinite loop with set
-e, I am giving up on that for the time being... I have added the patch
that makes the tests more stable to the next CF so that it is not forgotten:
https://commitfest.postgresql.org/10/693/
-- 
Michael




Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-08-01 Thread Kouhei Kaigai
> -Original Message-
> From: Etsuro Fujita [mailto:fujita.ets...@lab.ntt.co.jp]
> Sent: Tuesday, August 02, 2016 2:45 PM
> To: Kaigai Kouhei(海外 浩平); Ashutosh Bapat
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown 
> plans
> 
> On 2016/08/02 13:32, Kouhei Kaigai wrote:
> 
> I wrote:
> >> My concern here is EXPLAIN for foreign joins.  I think it's another
> >> problem how we handle Foreign Scan plan nodes representing
> >> post-scan/join operations in EXPLAIN, so I'd like to leave that for
> >> another patch.
> 
> > What is the post-scan/join operations? Are you saying EPQ recheck and
> > alternative local join plan?
> 
> No.  I mean e.g., aggregation, window functions, sorting, or table
> modification.  In other words, Foreign Scan plan nodes created from
> ForeignPath paths created from GetForeignUpperPaths.
>
Why do you need to handle these upper paths individually?
The FDW driver knows the remote query contains aggregation, window functions,
sorting, or table modification. It can give a proper label.

If the remote query contains partial aggregation and a join of relations, for
example, "Partial Agg + Scan" will be a proper label to put into
the "Foreign %s".

All you need to do are the two enhancements:
- Add "const char *explain_label" on the ForeignScanState, or somewhere
  the extension can set. It gives a label to be printed.
  If a NULL string is set, EXPLAIN shows "Foreign Scan" as a default.
- Add "Bitmapset *explain_rels" on the ForeignScanState, or somewhere
  the extension can set. It indicates the relations to be printed after
  the "Foreign %s" token. If you want to print all the relation names
  underlying this ForeignScan node, just copy the scanrelids bitmap.
  If a NULL bitmap is set, EXPLAIN shows nothing as a default.

Please note that the default does not change the existing behavior.
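(A rough sketch of those two additions; only the field names come from the
proposal above, everything else is illustrative:)

/* proposed additions to ForeignScanState (sketch) */
const char *explain_label;	/* e.g. "Insert", "Join", "Partial Agg + Scan";
							 * NULL keeps the current "Foreign Scan" output */
Bitmapset  *explain_rels;	/* relations to list after "Foreign %s";
							 * NULL prints nothing; an FDW can copy
							 * scanrelids to keep today's behavior */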

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 

