[GENERAL] Replication stops under certain circumstances

2017-10-20 Thread Kim Rose Carlsen
Hi

I have some strange issues with a postgresql read replica that seems to stop 
replicating under certain circumstances.

Whenever we have changes to our views we have script that drops all views and 
reload them from scratch with the new definitions. The reloading of the views 
happens in a transaction to avoid confusion for everyone using the database. 
When this update gets to the slave it seems there is a chance for a deadlock to 
occur that doesn't get detected. 

As I was trying to reproduce this behavior, I ran into another weird situation 
that I don't entirely understand. The symptom is the same that replication 
stops, but it looks quite different. This example won't reproduce the issue 
reliably, but after a few hours I get a slave that won't continue to replicate 
until I restart it. The queries in the example won't make much sense, and I 
don't know if they can be simplified further and still cause the "desired" 
effect.

Setup:
Launch a new RDS psql instance (9.6.2) on AWS (will be referred to as 
db-master) and create a read replica (will be referred to as db-slave). The 
following options are changed from AWS default:
  
max_standby_streaming_delay=-1
hot_standby_feedback=1


On the master create 2 dummy tables:
create table a  (id serial primary key);
create table b  (id serial primary key);

Setup thread 1 to do work on master:

while true; do psql -h db-master -U postgres db -c 'begin; drop view if exists 
view_a cascade; drop view if exists view_b; drop view if exists view_c; create 
view view_a as select * from a; create view view_b as select * from b; create 
view view_c as select * from view_a join view_b using (id); insert into a 
values (default); insert into b values (default); commit;'; done

Setup thread 2 to do work on Slave:
while true; do psql -h  db-slave -U postgres db -c 'begin; select * from view_c 
order by random() limit 10; select * from view_a order by random() limit 10;'; 
done

Setup thread 3 to do more work on slave:
while true; do psql -h  db-slave -U postgres db -c 'begin; select * from view_b 
order by random() limit 10; select * from view_a order by random() limit 10;'; 
done

Every now and then a deadlock is detected and one connection is aborted, this 
works as expected. But After a while(serveral hours) it becomes impossible to 
connect to db on db-slave and thread 2 and 3 stops producing output. When 
trying to connect the psql client just hangs. However it is possible connect to 
template1 database to get a look on what is going on.



template1=> select * from pg_stat_activity;
-[ RECORD 1 ]+
datid    | 16384
datname  | rdsadmin
pid  | 7891
usesysid | 10
usename  | rdsadmin
application_name | 
client_addr  | 
client_hostname  | 
client_port  | 
backend_start    | 
xact_start   | 
query_start  | 
state_change | 
wait_event_type  | 
wait_event   | 
state    | 
backend_xid  | 
backend_xmin | 
query    | 
-[ RECORD 2 ]+
datid    | 1
datname  | template1
pid  | 11949
usesysid | 16388
usename  | hiper
application_name | psql
client_addr  | 192.168.10.166
client_hostname  | 
client_port  | 41002
backend_start    | 2017-10-20 16:30:26.032745+02
xact_start   | 2017-10-20 16:30:34.306418+02
query_start  | 2017-10-20 16:30:34.306418+02
state_change | 2017-10-20 16:30:34.306421+02
wait_event_type  | 
wait_event   | 
state    | active
backend_xid  | 
backend_xmin | 26891
query    | select * from pg_stat_activity;





There are no active connection except rdsadmin from aws.



template1=> select * from pg_locks;
  locktype  | database | relation | page | tuple | virtualxid | transactionid | 
classid | objid | objsubid | virtualtransaction |  pid  |    mode | 
granted | fastpath 
+--+--+--+---++---+-+---+--++---+-+-+--
 virtualxid |  |  |  |   | 3/929  |   | 
    |   |  | 3/929  |  9640 | ExclusiveLock   | 
t   | t
 relation   |    16390 | 2659 |  |   |    |   | 
    |   |  | 4/829  |  9639 | AccessShareLock | 
t   | t
 relation   |    16390 | 1249 |  |   |    |   | 
    |   |  | 4/829  |  9639 | AccessShareLock | 
t   | t
 virtualxid |  |  |  |   | 4/829  |   | 
    |   |  | 4/829  |  9639 | ExclusiveLock   | 
t   | t
 relation   |    1 |    11695 |  |   |    |   | 
    |   |  | 5/148  | 11949 | AccessShareLock | 
t 

Re: [GENERAL] Equivalence Classes when using IN

2017-10-11 Thread Kim Rose Carlsen

> On 11 Oct 2017, at 21.46, David Rowley <david.row...@2ndquadrant.com> wrote:
> 
>> On 12 October 2017 at 08:37, Kim Rose Carlsen <k...@hiper.dk> wrote:
>> 
>>> Yeah.  The ORDER BY creates a partial optimization fence, preventing
>>> any such plan from being considered.
>>>> 
>> 
>> I can see in the general case it semanticly means different things If you 
>> allow the WHERE to pass through ORDER BY.
>> 
>> A special case can be allowed for WHERE to pass the ORDER BY if the column 
>> is part of DISTINCT ON.
> 
> Yeah, we do allow predicates to be pushed down in that case.
> 

Let's ignore that it's not a very useful query I have written. 

Why don't I see that predicate (customer_id) pushed into the outer nested loop 
so we don't have to sort the whole table on each loop. 

(See original post and follow up for definitions) 
QUERY PLAN  

-
Nested Loop Left Join  (cost=139.00..10392.96 rows=668 width=16) (actual 
time=0.528..35.120 rows=200 loops=1)
  Join Filter: (c.customer_id = product.customer_id)
  Rows Removed by Join Filter: 199900
  ->  Nested Loop  (cost=0.28..199.21 rows=334 width=12) (actual 
time=0.075..1.146 rows=100 loops=1)
->  Seq Scan on customer  (cost=0.00..21.51 rows=334 width=8) (actual 
time=0.067..0.282 rows=100 loops=1)
  Filter: (age < 20)
  Rows Removed by Filter: 901
->  Index Only Scan using customer_pkey on customer c  (cost=0.28..0.53 
rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=100)
  Index Cond: (customer_id = customer.customer_id)
  Heap Fetches: 100
  ->  Materialize  (cost=138.73..173.75 rows=2001 width=8) (actual 
time=0.005..0.130 rows=2001 loops=100)
->  Sort  (cost=138.73..143.73 rows=2001 width=8) (actual 
time=0.448..0.588 rows=2001 loops=1)
  Sort Key: product.customer_id, product.product_id
  Sort Method: quicksort  Memory: 142kB
  ->  Seq Scan on product  (cost=0.00..29.01 rows=2001 width=8) 
(actual time=0.006..0.215 rows=2001 loops=1)
Planning time: 0.214 ms
Execution time: 35.284 ms

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Equivalence Classes when using IN

2017-10-11 Thread Kim Rose Carlsen

> Yeah.  The ORDER BY creates a partial optimization fence, preventing
> any such plan from being considered.
>> 

I can see in the general case it semanticly means different things If you allow 
the WHERE to pass through ORDER BY. 

A special case can be allowed for WHERE to pass the ORDER BY if the column is 
part of DISTINCT ON.




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Equivalence Classes when using IN

2017-10-10 Thread Kim Rose Carlsen
> If the only reason that is_simple_subquery() rejects subqueries with
> ORDER BY is due to wanting to keep the order by of a view, then
> couldn't we make is_simple_subquery() a bit smarter and have it check
> if the subquery is going to be joined to something else, which likely
> would destroy the order, or at least it would remove any guarantees of
> it.
>
> Something like the attached?

I dont know if it makes any difference that the ORDER BY is used in a
DISTINCT ON clause. In this case the ORDER BY is important.


- Kim


Re: [GENERAL] Equivalence Classes when using IN

2017-10-10 Thread Kim Rose Carlsen
> You would benefit from adding the age column to view_customer, or at
> least consider having some view which contains all the columns you'll
> ever need from those tables and if you need special views with only a
> subset of columns due to some software doing "select * from
> viewname;", then you could just create some. Joining to the same table
> again seems like a bit of a waste of effort for the planner and
> executor.

I would argue that the anti pattern would be the software that
insist on using "select * from viewname;" from a view that has
calculated columns that you do not care for. I recommend
introducing both lightweight views and heavyweight views, so you
can join up probably for what you need.

My example is fabricated trying to simplify things, but I seem to
create more confusion than clarity in my example. My point was
only to see if anything could be added to the fabricated
execution path. I agree that the listed example does not make
sense. So I will try and give some more context to real use
cases.

Imagine an invoice entity where you have one relation for invoice
base data and a relation for invoice_line. The invoice has some
invoice_id, customer_id, due_date, paid_date and invoice_line
contains each line with a invoice_id, display_name, amount. A
view (view_invoice_with_amount) where you calculate the total.

so a query could be
SELECT c.customer_id,
   i.invoice_amount_total
  FROM view_customer c
  JOIN view_invoice_with_amount i
ON c.customer_id = i.customer_id
 WHERE c.first_name = 'John';

If you ever need to filter by invoice_amount_total, it might be
necesary denormalize the relations and cache the amount in the
invoice table.

> I'd assume customer_id is the PRIMARY KEY of customer and
> is unique.

This is a continuation of the previous example, maybe I should have
included it all to make it more clear. But customer_id is declared
as a primary key.

> It's not all that clear what your view is doing here. Confusingly
> there's a Sort in the plan, yet nothing in the query asked for that,
> so I guess that the view must have an ORDER BY. If you get rid of that
> the planner would likely use an index on product (customer_id) to
> parameterise the nested loop, at least, it likely would, if you have
> one.

The view is defined in the original post. What I was trying to illustrate
was a DISTINCT ON clause to prioritize multiple products pr customer
to a somewhat "main" product for the customer. The ORDER BY on product_id
would in this case then map the first product a customer gets to its
"main" product. It could also be the most valuable product or newest ordered
active product etc. It is just some way of mapping one to many relation to a
one to one. Again the example is simplified and fabricated and maybe looses
its power to explain its intents.

> It's pretty bad practice to have ORDER BY in views. I kinda wish we
> didn't even allow it, but that ship sailed many years ago...

It is required by DISTINCT ON and as soon as you go into
reporting, datawarehouse then it gets difficult to avoid these
along with group by. Instead of writing each query from the
ground up you get a huge benefit by factorizing each query into
meaningful entities that can stand alone and make sense by
themself, and from these build up the query to answer your
questions. That way you gain lots of re-use of code and
definition doesn't change between queries. The down side is it
leaves alot of work to the planner. It's a trade off between
optimization, readability and simplicity.

I hope I make more sense now.

- Kim


Re: [GENERAL] Equivalence Classes when using IN

2017-10-09 Thread Kim Rose Carlsen

>On 9 October 2017 at 08:01, Kim Rose Carlsen <k...@hiper.dk> wrote:


>> Is this because postgres never consider IN clause when building equivalence
>> class's?
>
>Only btree equality operators are considered at the moment.

After good night sleep and reading the previous discussion, I am no longer sure 
I have reduced my original problem to the right example. If we continue from 
previous setup and add the following:

   ALTER TABLE customer ADD COLUMN age INTEGER;
   UPDATE customer SET age = customer_id / 5;

CREATE INDEX ON customer (age);
CREATE INDEX ON product (customer_id);

   SET enable_hashjoin = false;
   SET enable_mergejoin = false;
    
    EXPLAIN ANALYZE
 SELECT * 
   FROM customer 
   JOIN view_customer 
 ON customer.customer_id = view_customer.customer_id 
  WHERE age < 20;

 QUERY PLAN 
 
-
 Nested Loop Left Join  (cost=139.00..10392.96 rows=668 width=16) (actual 
time=0.528..35.120 rows=200 loops=1)
   Join Filter: (c.customer_id = product.customer_id)
   Rows Removed by Join Filter: 199900
   ->  Nested Loop  (cost=0.28..199.21 rows=334 width=12) (actual 
time=0.075..1.146 rows=100 loops=1)
 ->  Seq Scan on customer  (cost=0.00..21.51 rows=334 width=8) (actual 
time=0.067..0.282 rows=100 loops=1)
   Filter: (age < 20)
   Rows Removed by Filter: 901
 ->  Index Only Scan using customer_pkey on customer c  
(cost=0.28..0.53 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=100)
   Index Cond: (customer_id = customer.customer_id)
   Heap Fetches: 100
   ->  Materialize  (cost=138.73..173.75 rows=2001 width=8) (actual 
time=0.005..0.130 rows=2001 loops=100)
 ->  Sort  (cost=138.73..143.73 rows=2001 width=8) (actual 
time=0.448..0.588 rows=2001 loops=1)
   Sort Key: product.customer_id, product.product_id
   Sort Method: quicksort  Memory: 142kB
   ->  Seq Scan on product  (cost=0.00..29.01 rows=2001 width=8) 
(actual time=0.006..0.215 rows=2001 loops=1)
 Planning time: 0.214 ms
 Execution time: 35.284 ms


The planner prefer to use hash and merge joins which is ok, when many rows are 
to be joined, I don't think any condition can be merged to make these case 
faster. I have disabled merge and hash joins to get to a nested loop join 
instead, in this case it would be much better if customer_id can be pulled 
inside the loop, so it can look at only the relevant rows and not all rows for 
each loop. I somehow inferred that this would be the same as selecting from the 
view using IN clause, now I'm not so sure anymore.

I can see there is a trade off between planner time and how exotic the case is. 
If you want to be able to hide abstraction through views I guess the nature 
becomes more OLAP oriented than OLTP. 

Best Regards
Kim Carlsen

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Equivalence Classes when using IN

2017-10-08 Thread Kim Rose Carlsen
Hi


I have this query where I think it's strange that the join doesn't pull the 
where condition in since RHS is equal to LHS. It might be easier to expain with 
an example

Setup
CREATE TABLE customer (
  customer_id INTEGER PRIMARY KEY
);

CREATE TABLE product (
  product_id  INTEGER PRIMARY KEY,
  customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
);

INSERT INTO customer (SELECT generate_series FROM 
generate_series(0, 100));

INSERT INTO product (product_id, customer_id) (SELECT 
generate_series, generate_series / 2 FROM generate_series(0, 2000));


Query

   EXPLAIN ANALYSE
SELECT *
  FROM customer c
  JOIN (SELECT DISTINCT ON (customer_id) * FROM product ORDER BY 
customer_id, product_id) p
ON c.customer_id = p.customer_id
 WHERE c.customer_id IN (500, 501);

QUERY PLAN
--
 Merge Join  (cost=172.43..186.25 rows=1 width=12) (actual time=1.350..1.353 
rows=2 loops=1)
   Merge Cond: (c.customer_id = product.customer_id)
   ->  Sort  (cost=13.93..13.93 rows=2 width=4) (actual time=0.036..0.036 
rows=2 loops=1)
 Sort Key: c.customer_id
 Sort Method: quicksort  Memory: 25kB
 ->  Bitmap Heap Scan on customer c  (cost=8.58..13.92 rows=2 width=4) 
(actual time=0.026..0.027 rows=2 loops=1)
   Recheck Cond: (customer_id = ANY ('{500,501}'::integer[]))
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on customer_pkey  (cost=0.00..8.58 rows=2 
width=0) (actual time=0.018..0.018 rows=2 loops=1)
 Index Cond: (customer_id = ANY ('{500,501}'::integer[]))
   ->  Unique  (cost=158.51..169.81 rows=200 width=8) (actual time=0.783..1.221 
rows=502 loops=1)
 ->  Sort  (cost=158.51..164.16 rows=2260 width=8) (actual 
time=0.782..0.929 rows=1003 loops=1)
   Sort Key: product.customer_id, product.product_id
   Sort Method: quicksort  Memory: 142kB
   ->  Seq Scan on product  (cost=0.00..32.60 rows=2260 width=8) 
(actual time=0.015..0.366 rows=2001 loops=1)
 Planning time: 0.281 ms
 Execution time: 1.432 ms


I would expect that since c.customer_id = p.customer_id then p.customer_id IN 
(500, 501). If I apply this rule myself, I get a much nicer plan (and it could 
be even better with an index on product_id).


QUERY PLAN
--
 Merge Join  (cost=52.70..53.11 rows=1 width=12) (actual time=0.686..0.693 
rows=2 loops=1)
   Merge Cond: (product.customer_id = c.customer_id)
   ->  Unique  (cost=38.77..38.89 rows=22 width=8) (actual time=0.647..0.651 
rows=2 loops=1)
 ->  Sort  (cost=38.77..38.83 rows=23 width=8) (actual 
time=0.646..0.647 rows=4 loops=1)
   Sort Key: product.customer_id, product.product_id
   Sort Method: quicksort  Memory: 25kB
   ->  Seq Scan on product  (cost=0.00..38.25 rows=23 width=8) 
(actual time=0.331..0.632 rows=4 loops=1)
 Filter: (customer_id = ANY ('{500,501}'::integer[]))
 Rows Removed by Filter: 1997
   ->  Sort  (cost=13.93..13.93 rows=2 width=4) (actual time=0.033..0.033 
rows=2 loops=1)
 Sort Key: c.customer_id
 Sort Method: quicksort  Memory: 25kB
 ->  Bitmap Heap Scan on customer c  (cost=8.58..13.92 rows=2 width=4) 
(actual time=0.025..0.026 rows=2 loops=1)
   Recheck Cond: (customer_id = ANY ('{500,501}'::integer[]))
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on customer_pkey  (cost=0.00..8.58 rows=2 
width=0) (actual time=0.018..0.018 rows=2 loops=1)
 Index Cond: (customer_id = ANY ('{500,501}'::integer[]))
 Planning time: 0.386 ms
 Execution time: 0.774 ms
(19 rows)

Is this because postgres never consider IN clause when building equivalence 
class's?

Are there any interests in adding such rule?


My idea is to wrap this in a view

  CREATE VIEW view_customer AS
SELECT c.customer_id,
   p.product_id
  FROM customer c
 LEFT JOIN (SELECT DISTINCT ON (customer_id) * FROM product ORDER BY 
customer_id, product_id) p
ON c.customer_id = p.customer_id


Where the LEFT JOIN can be pruned if there is no explicit need for product_id. 
Here I loose the power to express that both c.customer_id and p.customer_id is 
the same.


Best regards

Kim Carlsen


[GENERAL] Remove useless joins (VARCHAR vs TEXT)

2017-09-16 Thread Kim Rose Carlsen
Hi


It seems there are some difference in VARCHAR vs TEXT when postgres tries to 
decide if a LEFT JOIN is useful or not. I can't figure out if this is 
intentional because there are some difference between TEXT and VARCHAR that I 
dont know about or if it's a bug.


I would expect both examples to produce same query plan


a)

create table a (id varchar primary key);
create table b (id varchar primary key);

explain   select a.*
 from a
left join (select distinct id from b) as b
   on a.id = b.id;


QUERY PLAN
--
 Hash Right Join  (cost=67.60..113.50 rows=1360 width=32)
   Hash Cond: ((b.id)::text = (a.id)::text)
   ->  HashAggregate  (cost=27.00..40.60 rows=1360 width=32)
 Group Key: b.id
 ->  Seq Scan on b  (cost=0.00..23.60 rows=1360 width=32)
   ->  Hash  (cost=23.60..23.60 rows=1360 width=32)
 ->  Seq Scan on a  (cost=0.00..23.60 rows=1360 width=32)
(7 rows)


b)

create table a (id text primary key);

create table b (id text primary key);

explain   select a.*
 from a
left join (select distinct id from b) as b
   on a.id = b.id;

  QUERY PLAN
--
 Seq Scan on a  (cost=0.00..23.60 rows=1360 width=32)


- Kim Carlsen


Re: [GENERAL] Strict min and max aggregate functions

2016-11-20 Thread Kim Rose Carlsen
> CREATE OR REPLACE FUNCTION strict_min_agg (anyarray,anyelement )

> RETURNS anyarray LANGUAGE sql IMMUTABLE AS $$
> SELECT CASE
> WHEN $1 IS NULL THEN ARRAY[$2]
> WHEN $1[1] IS NULL THEN $1
> WHEN $2 IS NULL THEN ARRAY[$2] -- use $2 not NULL to preserve  
> type
> ELSE ARRAY[least($1[1],$2)] END ;
> $$;
>
>
> CREATE OR REPLACE FUNCTION strict_min_final (anyarray)
> RETURNS anyelement LANGUAGE sql IMMUTABLE AS $$
> SELECT CASE WHEN $1 IS NULL THEN NULL ELSE $1[1] END ;
> $$;
>
> CREATE AGGREGATE strict_min (x anyelement) (
> sfunc = strict_min_agg,
> stype = anyarray,
> finalfunc = strict_min_final
> );
>

It seems like this should be possible to do in something more close to O(log 
n). But I'm not sure how to fix the semantics with aggregates.

SELECT max() FROM ;
SELECT true FROM  WHERE  IS NULL LIMIT 1;

Both these queries can be resolved with a index lookup (if one is available).




Re: [GENERAL] Partial update on an postgres upsert violates constraint

2016-11-19 Thread Kim Rose Carlsen
> AFAIK, EXCLUDED is only available in a trigger function:

>
> https://www.postgresql.org/docs/9.5/static/trigger-definition.html
>
> You are using EXCLUDED in a regular function so it would not be found.
>
> Can you also show the failure for your alternate method?

>From the manual
https://www.postgresql.org/docs/9.5/static/sql-insert.html

"
conflict_action
conflict_action specifies an alternative ON CONFLICT action. It can be either 
DO NOTHING, or a DO UPDATE clause specifying the exact details of the UPDATE 
action to be performed in case of a conflict. The SET and WHERE clauses in ON 
CONFLICT DO UPDATE have access to the existing row using the table's name (or 
an alias), and to rows proposed for insertion using the special excluded table. 
SELECT privilege is required on any column in the target table where 
corresponding excluded columns are read.
"



Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-11-10 Thread Kim Rose Carlsen

> Hang on -- upthread the context was inner join, and the gripe was join

> fast with '=', slow with INDF.  When he said the nulls were
> 'generated', I didn't follow that they were part of the original
> query.  If the nulls are generated along with the query, sure, an
> index won't help.
>
> I maintain my earlier point; with respect to the original query, to
> get from performance of INDF to =, you have three options:
> a) expr index the nulls  (assuming they are physically stored)
> b) convert to ((a = b) or a is null and b is null) which can help with
> a bitmap or plan
> c) covert to union all equivalent of "b"
>
> merlin

a) and b) would be workaround that would run an order of magnitude slower. The 
query
starts with a full table scan of a large table. If the planner had started 
elsewhere it could
have reduced the result to 1-2 rows from the start. It won't choose this plan 
without the help
from =.

c) could be a acceptable workaround, but it would clutter up if you would want 
more
than one column to be IS NOT DISTINCT FROM. You end up with 2^n unions to 
simulate
IS NOT DISTINCT FROM.

Without knowing the work required, I will still argue that having IS NOT 
DISTINCT FROM
use the same transitive rules as equality,  would be a better approach.

With fear of talking about things I know little(nothing) of, I think the 
description of EquivalenceClasses
in postgres/src/backend/optimizer/README, should be extended to also include 
EquivalenceClasses
of IS NOT DISTINCT FROM.


Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-11-04 Thread Kim Rose Carlsen
>> It might raise another problem, that the nulls are generated through LEFT

>> JOINS where no rows are defined. Then the 0 or -1 value need to be
>> a computed value. Won't this throw off index lookups? (I might be
>> more confused in this area).
>
>Not following this.

The nulls are generated by something like this
SELECT c.circuit_id,
   cc.customer_id
   FROM circuit AS c
LEFT JOIN circuit_customer AS cc
 ON c.circuit_id = cc.circuit_id

To make a magic '0' customer we would be required to use
  COALESCE(cc.customer_id, '0')
I dont think the optimizer will do anything clever with the '0' we have
computed from null.

I could ofc. by default assign all unassigned circuits to '0' in
circuit_customer. I'm not a fan though.

>BTW, if you want a fast plan over the current
>data without consideration of aesthetics, try this:
>
>CREATE VIEW view_circuit_with_status AS (
>SELECT r.*,
>  s.circuit_status,
>  s.customer_id AS s_customer_id,
>  p.line_speed,
>  p.customer_id AS p_customer_id
> FROM view_circuit r
> JOIN view_circuit_product_main s
>  ON r.circuit_id = s.circuit_id
>  AND r.customer_id, s.customer_id
> JOIN view_circuit_product p
>   ON r.circuit_id = p.circuit_id
>  AND r.customer_id, s.customer_id
>  UNION ALL SELECT r.*,
>  s.circuit_status,
>  s.customer_id AS s_customer_id,
>  p.line_speed,
>  p.customer_id AS p_customer_id
> FROM view_circuit r
> JOIN view_circuit_product_main s
>   ON r.circuit_id = s.circuit_id
>  AND r.customer_id IS NULL
>  AND  s.customer_id IS NULL
> JOIN view_circuit_product p
>   ON r.circuit_id = p.circuit_id>

I will have to figure something out, but this specific case is still problematic
since we would like to filter this view using different criteria's, like 
circuit_no,
products or customers.

But with all these detours, I assume that a change to IS NOT DISTINCT FROM,
is difficult or not wanted?


Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-10-31 Thread Kim Rose Carlsen
On Sat, Oct 29, 2016 at 8:27 AM, Kim Rose Carlsen <k...@hiper.dk> wrote:

> > I have tried creating a function called
> > zero_if_null(int) : int that just select COALESCE($1, 0)
> > and adding a index on (zero_if_null(customer_id)) on table that contains
> > customer_id. The only thing I get from is the planner now only knows how to
> > compare customer_id, but it still doesn't know that they are of same value,
> > only I know that and I want to declare it for the planner.


> Well, the *behavior* is mandated by the sql standard.  Our
> implementation is slow however.

Sorry I'm not following, what behavior is mandated by the sql standard?

> I'm surprised the attached function
> didn't help, it can be inlined and I was able to get bitmap or which
> is pretty good.  As I said upthread I think INDF could theoretically
> run as fast as equality -- it just doesn't today.

It might be harsh to say that it doesn't help at all. I does half the running 
time,
but I need it to run an order of magnitude faster. Here is the plan with the
empty_if_null (customer_id is actually varchar)

https://explain.depesz.com/s/M1LV with empty_if_null + functional index
https://explain.depesz.com/s/eOL with indf

> As your joins are written you could probably convert this by reserving
> a customer_id to the work that you're trying to do with null, say, 0,
> or -1.  This is a somewhat dubious practice but seems a better fit for
> your use case.  I don't think INDF is good in this usage.
>
> merlin

This will work well, I think.

But I'm not sure I can mentally accept an unfilled value should not be
null (eg. 0, '', '-01-01'). But I can see how the equals operator will
work well with this.

It might raise another problem, that the nulls are generated through LEFT
JOINS where now rows are defined. Then the 0 or -1 value need to be
a computed value. Won't this throw of index lookups? (I might be
more confused in this area).



Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-10-29 Thread Kim Rose Carlsen
>> This doesn't do much good. This doesn't tell the planner that the 3

>> customer_ids are actually of same value, and it therefore can't filter them
>> as it sees fit.

> You do know you can index on a function, and the planner then keeps
> stats on it when you run analyze right?

Yes, but I don't think it will make any difference. I don't think I can solve 
this with
an index lookup. I think my savior is the inference that the 2 columns are of
same value and the planner are free to choose which order to do the filter and 
join
with this extra information.

I have tried creating a function called
zero_if_null(int) : int that just select COALESCE($1, 0)
and adding a index on (zero_if_null(customer_id)) on table that contains 
customer_id. The only thing I get from is the planner now only knows how to 
compare customer_id, but it still doesn't know that they are of same value, 
only I know that and I want to declare it for the planner.

I could probably rewrite the whole view in one query, and then fix it with a 
proper index. But I think I will loose alot of readability.

I could also change the structure to save an explicit state, instead of a 
calculated state. But then I get some redundancy I need to make sure always 
stays the same.

In the end one of these will probably be the solution.

I guess the question is more or less,

why doesn't IS NOT DISTINCT FROM behave the same way as = operator, are there 
any alternatives? And a plausible use case for when it would be useful.


Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-10-29 Thread Kim Rose Carlsen
> try this :-D

> create or replace function indf(anyelement, anyelement) returns anyelement as
> $$
>   select $1 = $2 or ($1 is null and $2 is null);
> $$ language sql;
>
> CREATE VIEW view_circuit_with_status AS (
>SELECT r.*,
>   s.circuit_status,
>   s.customer_id AS s_customer_id,
>   p.line_speed,
>   p.customer_id AS p_customer_id
>  FROM view_circuit r
>  JOIN view_circuit_product_main s
>ON r.circuit_id = s.circuit_id
>   AND indf(r.customer_id, s.customer_id)
>  JOIN view_circuit_product p
>ON r.circuit_id = p.circuit_id
>   AND indf(r.customer_id, s.customer_id)
>
> merlin

This doesn't do much good. This doesn't tell the planner that the 3 
customer_ids are actually of same value, and it therefore can't filter them as 
it sees fit.


Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-10-28 Thread Kim Rose Carlsen
> > On Fri, Oct 28, 2016 at 10:29 AM, Kim Rose Carlsen <k...@hiper.dk> wrote:

> > Hi
> >
> > I was wondering if there is a way to hint that two columns in two different
> > tables IS NOT DISTINCT FROM each other. So that the optimizer may assume if
> > table_a.key = 'test' THEN table_b.key = 'test' .
> >
> > The equals operator already does this but it does not handle NULLS very well
> > (or not at all). And it seems IS NOT DISTINCT FROM is not indexable and
> > doesn't establish the same inference rules as equals.
>
> The whole idea behing Postgres' query planner is that you don't have
> to use any hints. Late model versions of postgres handle nulls fine,
> but nulls are never "equal" to anything else. I.e. where xxx is null
> works with indexes. Where x=y does not, since null <> null.
>
> Suggestion for getting help, put a large-ish aka production sized
> amount of data into your db, run your queries with explain analyze and
> feed them to https://explain.depesz.com/ and post the links here along
> with the slow queries. A lot of times the fix is non-obvious if you're
> coming from another db with a different set of troubleshooting skills
> for slow queries.

The problem is how to reduce the problem into its core, without introducing
all the unnecessary.

Maybe simplifying the problem, also makes it impossible to say where I go
wrong. It might be that I try to push too much logic into the SQL layer
and Im adding too many layers of abstraction to accomplish what I want.
So let me try and elaborate a little more.

I have couple a tables describing resources (circuits) and allocation
of resources to customers and products.

First layer is a view called view_circuit. This view (left) join any table
the circuit table reference through a foreign key (it gives exactly the same
rows and columns as circuit table + some extra information like customer_id).

Second layer is 2 views
1) a view describing if the circuit is active or inactive, lets call it
   view_circuit_product_main
2) a view describing line_speed about the circuit, lets call it
   view_circuit_product

These views use aggregations (both GROUP BY and SELECT DISTINCT ON (...))
if this has any relevance.

Third layer
Next step is to add a view that tells both (joins the two views together
on circuit_id). lets call the new view view_circuit_with_status

This view is defined as

CREATE VIEW view_circuit_with_status AS (
   SELECT r.*,
  s.circuit_status,
  s.customer_id AS s_customer_id,
  p.line_speed,
  p.customer_id AS p_customer_id
 FROM view_circuit r
 JOIN view_circuit_product_main s
   ON r.circuit_id = s.circuit_id
  AND r.customer_id IS NOT DISTINCT FROM s.customer_id
 JOIN view_circuit_product p
   ON r.circuit_id = p.circuit_id
  AND r.customer_id IS NOT DISTINCT FROM s.customer_id
);

SELECT * FROM view_circuit_with_status WHERE customer_id = 1;

Since customer_id is exposed through view_circuit the planner assumes 
view_circuit.customer_id = 1 and from there attempts to join
view_circuit_product_main and view_circuit_product using circuit_id.
This is not running optimal.

However if we change our query to allow the inference rule to take place, the 
query is executed very fast.

SELECT * FROM view_circuit_with_status WHERE customer_id = 1 AND s_customer_id 
= 1 AND p_customer_id = 1;

If a circuit is not assigned to any customers customer_id is set to NULL. This 
is the reason I can't use = operator. If I do use = then I can't find circuit 
which are unassigned, but the query do run effective.

I can see this still ends up being quite abstract, but the point is it would be 
quite beneficial if IS NOT DISTINCT used the same rules as = operator.

I have attached the 2 query plans

Bad plan: https://explain.depesz.com/s/SZN
Good plan: https://explain.depesz.com/s/61Ro

-
Kim Carlsen
Do you use potatoes for long posts here?









Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-10-28 Thread Kim Rose Carlsen
>>JOIN a AS table_b
>> ON table_a.id = table_b.id
>> AND table_a.key = table_b.key

> Anyways, to use an index for that join, you'd need a composite index on id 
> *AND* key, not two separate indexes.
Its not as much as for using the index, but to be able to push the where clause 
inside both JOINED tables.

The = operator already does this. It gives the planner the option to join the 
table in using either id = id or key = key. It can deduce that if I have a 
WHERE condition with table_a.key = 'Something', then table_b.key must also be 
'Something'. It can then decide to filter table_b on key. When using IS NOT 
DISTINCT FROM, the planner is not considering the same options. Now its like it 
doesn't know table_a.key is same the same as table_b.key.

I would somehow expect the IS NOT DISTINCT FROM operator to do the same. As it 
establish the same rules.. If a = 'test' and a IS NOT DISTINCT FROM b then b = 
'test' also

-
Kim Carlsen



Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-10-28 Thread Kim Rose Carlsen
> The bigger picture here is that if you've designed a data representation
> that requires that a null be considered "equal to" another null, you're
> really going to be fighting against the basic semantics of SQL.  You'd
> be best off to rethink the representation.  We've not seen enough info
> about your requirements to suggest just how, though.

Sometimes I do wake up in the night scared and afraid that I have used NULLs 
the wrong way
my whole life. I usually use NULLs to denote a value has not been provided. In 
my dreams I fear
I should have used empty string instead but if that is true, then I no longer 
know what is right and
wrong :).

In this specific case, its a resource management table. If a resource is not 
allocated to any
customers, then customer_id is set to null else the customer_id is set.


From: Tom Lane <t...@sss.pgh.pa.us>
Sent: Friday, October 28, 2016 8:17:01 PM
To: Scott Marlowe
Cc: Kim Rose Carlsen; pgsql-general@postgresql.org
Subject: Re: [GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

Scott Marlowe <scott.marl...@gmail.com> writes:
> On Fri, Oct 28, 2016 at 10:29 AM, Kim Rose Carlsen <k...@hiper.dk> wrote:
>> I was wondering if there is a way to hint that two columns in two different
>> tables IS NOT DISTINCT FROM each other. So that the optimizer may assume if
>> table_a.key = 'test' THEN table_b.key = 'test' .
>>
>> The equals operator already does this but it does not handle NULLS very well
>> (or not at all). And it seems IS NOT DISTINCT FROM is not indexable and
>> doesn't establish the same inference rules as equals.

> The whole idea behing Postgres' query planner is that you don't have
> to use any hints. Late model versions of postgres handle nulls fine,
> but nulls are never "equal" to anything else. I.e. where xxx is null
> works with indexes. Where x=y does not, since null <> null.

The bigger picture here is that if you've designed a data representation
that requires that a null be considered "equal to" another null, you're
really going to be fighting against the basic semantics of SQL.  You'd
be best off to rethink the representation.  We've not seen enough info
about your requirements to suggest just how, though.

regards, tom lane


Re: [GENERAL] How to hint two columns IS NOT DISTINCT FROM each other in a join

2016-10-28 Thread Kim Rose Carlsen
Sorry for double post, just ignore this post..



From: pgsql-general-ow...@postgresql.org <pgsql-general-ow...@postgresql.org> 
on behalf of Kim Rose Carlsen <k...@hiper.dk>
Sent: Thursday, October 27, 2016 6:34:58 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] How to hint two columns IS NOT DISTINCT FROM each other in a 
join

This sender failed our fraud detection checks and may not be who they 
appear to be. Learn about spoofing<http://aka.ms/LearnAboutSpoofing>  
Feedback<http://aka.ms/SafetyTipsFeedback>

Hi


I was wondering if there is a way to hint that two columns in two different 
tables IS NOT DISTINCT FROM each other. So that the optimizer may assume if 
table_a.key = 'test' THEN table_b.key = 'test' .


The equals operator already does this but it does not handle NULLS very well 
(or not at all). And it seems IS NOT DISTINCT FROM is not indexable and doesn't 
establish the same inference rules as equals.


Example:


CREATE TABLE a (

  id INTEGER PRIMARY KEY,

  key VARCHAR,

  value VARCHAR

);


CREATE INDEX ON a (key);


INSERT INTO a

  VALUES (1, 'test', 'test'), (2, 'foo', 'bar'), (3, null, null), (4, 'baz', 
'qoz');


CREATE VIEW view_a AS (

  SELECT table_a.id,

 table_a.key,

 table_a.value,

 table_b.key as b_key

FROM a AS table_a

JOIN a AS table_b

  ON table_a.id = table_b.id

);


CREATE VIEW view_a_eq AS (

  SELECT table_a.id,

 table_a.key,

 table_a.value,

 table_b.key as b_key

FROM a AS table_a

JOIN a AS table_b

  ON table_a.id = table_b.id

 AND table_a.key = table_b.key

);


CREATE VIEW view_a_distinct AS (

  SELECT table_a.id,

 table_a.key,

 table_a.value,

 table_b.key as b_key

FROM a AS table_a

JOIN a AS table_b

  ON table_a.id = table_b.id

 AND table_a.key IS NOT DISTINCT FROM table_b.key

);

EXPLAIN SELECT * FROM view_a WHERE key = 'test';

 QUERY PLAN

 Hash Join  (cost=12.69..34.42 rows=4 width=100)
   Hash Cond: (table_b.id = table_a.id)
   ->  Seq Scan on a table_b  (cost=0.00..18.50 rows=850 width=36)
   ->  Hash  (cost=12.64..12.64 rows=4 width=68)
 ->  Bitmap Heap Scan on a table_a  (cost=4.18..12.64 rows=4 width=68)
   Recheck Cond: ((key)::text = 'test'::text)
   ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 
width=0)
 Index Cond: ((key)::text = 'test'::text)


We only get index scan on table_a


EXPLAIN SELECT * FROM view_a_eq WHERE key = 'test';

 QUERY PLAN

 Nested Loop  (cost=8.36..25.53 rows=1 width=100)
   Join Filter: (table_a.id = table_b.id)
   ->  Bitmap Heap Scan on a table_a  (cost=4.18..12.64 rows=4 width=68)
 Recheck Cond: ((key)::text = 'test'::text)
 ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 width=0)
   Index Cond: ((key)::text = 'test'::text)
   ->  Materialize  (cost=4.18..12.66 rows=4 width=36)
 ->  Bitmap Heap Scan on a table_b  (cost=4.18..12.64 rows=4 width=36)
   Recheck Cond: ((key)::text = 'test'::text)
   ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 
width=0)
 Index Cond: ((key)::text = 'test'::text)
We get index scan on both tables and the where clause is pushed all the way down


EXPLAIN SELECT * FROM view_a_distinct WHERE key = 'test';

 QUERY PLAN

 Hash Join  (cost=12.69..34.43 rows=1 width=100)
   Hash Cond: (table_b.id = table_a.id)
   Join Filter: (NOT ((table_a.key)::text IS DISTINCT FROM (table_b.key)::text))
   ->  Seq Scan on a table_b  (cost=0.00..18.50 rows=850 width=36)
   ->  Hash  (cost=12.64..12.64 rows=4 width=68)
 ->  Bitmap Heap Scan on a table_a  (cost=4.18..12.64 rows=4 width=68)
   Recheck Cond: ((key)::text = 'test'::text)
   ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 
width=0)
 Index Cond: ((key)::text = 'test'::text)
Same as the first example

In these examples it really doesn't matter which plan is used. But for larger 
view it might be a great hint for the optimizer to know that since we are 
joining on a primary key we could hint that all other columns for the table is 
actually NOT DISTINCT from each other. This will result in the planner being 
able to push the WHERE condition down into the other joined tables.

It works well for the = operator, but it is easy to see the it fails as soon as 
we asks about nulls.

SELECT * FROM view_a_eq WHERE key IS NULL;
 id |

[GENERAL] How to hint 2 coulms IS NOT DISTINCT FROM each other

2016-10-28 Thread Kim Rose Carlsen
Hi

I was wondering if there is a way to hint that two columns in two different 
tables IS NOT DISTINCT FROM each other. So that the optimizer may assume if 
table_a.key = 'test' THEN table_b.key = 'test' .

The equals operator already does this but it does not handle NULLS very well 
(or not at all). And it seems IS NOT DISTINCT FROM is not indexable and doesn't 
establish the same inference rules as equals.

Example:

CREATE TABLE a (
  id INTEGER PRIMARY KEY,
  key VARCHAR,
  value VARCHAR

);

CREATE INDEX ON a (key);

INSERT INTO a
  VALUES (1, 'test', 'test'), (2, 'foo', 'bar'), (3, null, null), (4, 'baz', 
'qoz');

CREATE VIEW view_a AS (
  SELECT table_a.id,
 table_a.key,
 table_a.value,
 table_b.key as b_key
FROM a AS table_a
JOIN a AS table_b
  ON table_a.id = table_b.id
);


CREATE VIEW view_a_eq AS (
  SELECT table_a.id,
 table_a.key,
 table_a.value,
 table_b.key as b_key
FROM a AS table_a
JOIN a AS table_b
  ON table_a.id = table_b.id
 AND table_a.key = table_b.key
);

CREATE VIEW view_a_distinct AS (
  SELECT table_a.id,
 table_a.key,
 table_a.value,
 table_b.key as b_key
FROM a AS table_a
JOIN a AS table_b
  ON table_a.id = table_b.id
 AND table_a.key IS NOT DISTINCT FROM table_b.key
);

EXPLAIN SELECT * FROM view_a WHERE key = 'test';


 QUERY PLAN

 Hash Join  (cost=12.69..34.42 rows=4 width=100)
   Hash Cond: (table_b.id = table_a.id)
   ->  Seq Scan on a table_b  (cost=0.00..18.50 rows=850 width=36)
   ->  Hash  (cost=12.64..12.64 rows=4 width=68)
 ->  Bitmap Heap Scan on a table_a  (cost=4.18..12.64 rows=4 width=68)
   Recheck Cond: ((key)::text = 'test'::text)
   ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 
width=0)
 Index Cond: ((key)::text = 'test'::text)


We only get index scan on table_a

EXPLAIN SELECT * FROM view_a_eq WHERE key = 'test';

 QUERY PLAN

 Nested Loop  (cost=8.36..25.53 rows=1 width=100)
   Join Filter: (table_a.id = table_b.id)
   ->  Bitmap Heap Scan on a table_a  (cost=4.18..12.64 rows=4 width=68)
 Recheck Cond: ((key)::text = 'test'::text)
 ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 width=0)
   Index Cond: ((key)::text = 'test'::text)
   ->  Materialize  (cost=4.18..12.66 rows=4 width=36)
 ->  Bitmap Heap Scan on a table_b  (cost=4.18..12.64 rows=4 width=36)
   Recheck Cond: ((key)::text = 'test'::text)
   ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 
width=0)
 Index Cond: ((key)::text = 'test'::text)
We get index scan on both tables and the where clause is pushed all the way down


EXPLAIN SELECT * FROM view_a_distinct WHERE key = 'test';

 QUERY PLAN

 Hash Join  (cost=12.69..34.43 rows=1 width=100)
   Hash Cond: (table_b.id = table_a.id)
   Join Filter: (NOT ((table_a.key)::text IS DISTINCT FROM (table_b.key)::text))
   ->  Seq Scan on a table_b  (cost=0.00..18.50 rows=850 width=36)
   ->  Hash  (cost=12.64..12.64 rows=4 width=68)
 ->  Bitmap Heap Scan on a table_a  (cost=4.18..12.64 rows=4 width=68)
   Recheck Cond: ((key)::text = 'test'::text)
   ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 
width=0)
 Index Cond: ((key)::text = 'test'::text)
Same as the first example

In these examples it really doesn't matter which plan is used. But for larger 
view it might be a great hint for the optimizer to know that since we are 
joining on a primary key we could hint that all other columns for the table is 
actually NOT DISTINCT from each other. This will result in the planner being 
able to push the WHERE condition down into the other joined tables.

It works well for the = operator, but it is easy to see the it fails as soon as 
we asks about nulls.

SELECT * FROM view_a_eq WHERE key IS NULL;
 id | key | value | b_key
+-+---+---
(0 rows)

Maybe my examples are too simple, but I hope you can verify the same will hold 
for a much larger table with 2 different views on top of them.



[GENERAL] How to hint two columns IS NOT DISTINCT FROM each other in a join

2016-10-27 Thread Kim Rose Carlsen
Hi


I was wondering if there is a way to hint that two columns in two different 
tables IS NOT DISTINCT FROM each other. So that the optimizer may assume if 
table_a.key = 'test' THEN table_b.key = 'test' .


The equals operator already does this but it does not handle NULLS very well 
(or not at all). And it seems IS NOT DISTINCT FROM is not indexable and doesn't 
establish the same inference rules as equals.


Example:


CREATE TABLE a (

  id INTEGER PRIMARY KEY,

  key VARCHAR,

  value VARCHAR

);


CREATE INDEX ON a (key);


INSERT INTO a

  VALUES (1, 'test', 'test'), (2, 'foo', 'bar'), (3, null, null), (4, 'baz', 
'qoz');


CREATE VIEW view_a AS (

  SELECT table_a.id,

 table_a.key,

 table_a.value,

 table_b.key as b_key

FROM a AS table_a

JOIN a AS table_b

  ON table_a.id = table_b.id

);


CREATE VIEW view_a_eq AS (

  SELECT table_a.id,

 table_a.key,

 table_a.value,

 table_b.key as b_key

FROM a AS table_a

JOIN a AS table_b

  ON table_a.id = table_b.id

 AND table_a.key = table_b.key

);


CREATE VIEW view_a_distinct AS (

  SELECT table_a.id,

 table_a.key,

 table_a.value,

 table_b.key as b_key

FROM a AS table_a

JOIN a AS table_b

  ON table_a.id = table_b.id

 AND table_a.key IS NOT DISTINCT FROM table_b.key

);

EXPLAIN SELECT * FROM view_a WHERE key = 'test';

 QUERY PLAN

 Hash Join  (cost=12.69..34.42 rows=4 width=100)
   Hash Cond: (table_b.id = table_a.id)
   ->  Seq Scan on a table_b  (cost=0.00..18.50 rows=850 width=36)
   ->  Hash  (cost=12.64..12.64 rows=4 width=68)
 ->  Bitmap Heap Scan on a table_a  (cost=4.18..12.64 rows=4 width=68)
   Recheck Cond: ((key)::text = 'test'::text)
   ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 
width=0)
 Index Cond: ((key)::text = 'test'::text)


We only get index scan on table_a


EXPLAIN SELECT * FROM view_a_eq WHERE key = 'test';

 QUERY PLAN

 Nested Loop  (cost=8.36..25.53 rows=1 width=100)
   Join Filter: (table_a.id = table_b.id)
   ->  Bitmap Heap Scan on a table_a  (cost=4.18..12.64 rows=4 width=68)
 Recheck Cond: ((key)::text = 'test'::text)
 ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 width=0)
   Index Cond: ((key)::text = 'test'::text)
   ->  Materialize  (cost=4.18..12.66 rows=4 width=36)
 ->  Bitmap Heap Scan on a table_b  (cost=4.18..12.64 rows=4 width=36)
   Recheck Cond: ((key)::text = 'test'::text)
   ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 
width=0)
 Index Cond: ((key)::text = 'test'::text)
We get index scan on both tables and the where clause is pushed all the way down


EXPLAIN SELECT * FROM view_a_distinct WHERE key = 'test';

 QUERY PLAN

 Hash Join  (cost=12.69..34.43 rows=1 width=100)
   Hash Cond: (table_b.id = table_a.id)
   Join Filter: (NOT ((table_a.key)::text IS DISTINCT FROM (table_b.key)::text))
   ->  Seq Scan on a table_b  (cost=0.00..18.50 rows=850 width=36)
   ->  Hash  (cost=12.64..12.64 rows=4 width=68)
 ->  Bitmap Heap Scan on a table_a  (cost=4.18..12.64 rows=4 width=68)
   Recheck Cond: ((key)::text = 'test'::text)
   ->  Bitmap Index Scan on a_key_idx  (cost=0.00..4.18 rows=4 
width=0)
 Index Cond: ((key)::text = 'test'::text)
Same as the first example

In these examples it really doesn't matter which plan is used. But for larger 
view it might be a great hint for the optimizer to know that since we are 
joining on a primary key we could hint that all other columns for the table is 
actually NOT DISTINCT from each other. This will result in the planner being 
able to push the WHERE condition down into the other joined tables.

It works well for the = operator, but it is easy to see the it fails as soon as 
we asks about nulls.

SELECT * FROM view_a_eq WHERE key IS NULL;
 id | key | value | b_key
+-+---+---
(0 rows)

Maybe my examples are too simple, but I hope you can verify the same will hold 
for a much larger table with 2 different views on top of them.