Re: [HACKERS] Varying results when using merge joins over postgres_fdw vs hash joins

2017-09-20 Thread David Kohn
Yeah. Definitely went down a fun rabbit hole on that separate code paths
issue.
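
For anyone else following along: the split Tom describes below is between the
database default collation, which goes through strcoll(), and explicitly
specified collations, which go through strcoll_l() where the platform has it.
At the SQL level these two comparisons take the two different paths (this
assumes an "en_US" collation exists on the system):

SELECT 'cote' < 'côte';                  -- default collation: strcoll() path
SELECT 'cote' < 'côte' COLLATE "en_US";  -- explicit collation: strcoll_l() path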

> Perhaps, rather than trying to fix this automatically, we should
> leave it to the user.  We could invent another import option that
> says what to translate "default" to, with the default being,
> uh, "default".

I like this idea. One suggestion might be to check the remote
LC_COLLATE against the local LC_COLLATE at the beginning of an IMPORT FOREIGN
SCHEMA call and at least raise a warning if the default collations do not
match. That wouldn't break anything, but it would at least notify the user
that something bad could be happening and point them to that variable.
Actually, instead of an import option, this might make more sense as an
option on the foreign server: add a default_collation_mapping option for
the foreign server and raise a warning either at foreign server creation
time or at IMPORT FOREIGN SCHEMA time (probably the latter, as I don't think
we actually connect to the remote when we create the foreign server).
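
A rough sketch of what that could look like; to be clear, the
default_collation_mapping option is just the proposal above (nothing like it
exists in postgres_fdw today), and the server details are made up:

CREATE SERVER remote_pg
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'remote.example.com', dbname 'app',
           -- hypothetical option proposed above, not a real postgres_fdw option
           default_collation_mapping 'en_US.UTF-8');

-- proposed behavior: if the remote default collation doesn't match the
-- local one and no mapping is given, raise a warning here
IMPORT FOREIGN SCHEMA public FROM SERVER remote_pg INTO local_public;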

D



On Wed, Sep 20, 2017 at 12:28 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Corey Huinker <corey.huin...@gmail.com> writes:
> > We had difficulty finding the place in the code where LC_COLLATE gets
> > recombobulated into a recognized collation.
>
> That's because it isn't.  The DB's default collation boils down to
> "call strcoll(), having set LC_COLLATE to whatever pg_database says".
> Non-default collations go through strcoll_l(), which might not even
> exist on a given platform.  So they're entirely separate code paths.
>
> regards, tom lane
>



-- 
David Kohn | Data Engineer | MOAT
63 Madison Ave, 15th Floor, NYC


Re: [HACKERS] Function Volatility and Views Unexpected Behavior

2017-07-13 Thread David Kohn
Thanks for the reminder about explain verbose, that's helpful.
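
(For anyone trying to reproduce this: the schema wasn't posted upthread, so
this is a guessed-at minimal reconstruction matching the names in the plans
below, with something_silly() made deliberately slow to stand in for a
function with real side effects.)

CREATE TABLE table1 (id integer PRIMARY KEY, data1 integer);
INSERT INTO table1 SELECT i, i FROM generate_series(1, 100000) AS i;

-- deliberately slow and VOLATILE, standing in for a function with side effects
CREATE FUNCTION something_silly(i integer) RETURNS integer AS $$
BEGIN
  PERFORM pg_sleep(0.002);
  RETURN i;
END;
$$ LANGUAGE plpgsql VOLATILE;

CREATE VIEW table1_silly_view AS
  SELECT id, data1, something_silly(id) AS silly FROM table1;

ANALYZE table1;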

But optimization does massively change the number of calls of a volatile
function from what would occur in a naive evaluation of the query:

`explain analyze verbose select data1 from table1_silly_view where id >= 10
and id <= 100;`

does an index scan and only runs the volatile function for the rows in the
view where id >= 10 and id <= 100:

Subquery Scan on table1_silly_view  (cost=0.29..33.77 rows=91 width=8) (actual time=2.552..206.563 rows=91 loops=1)
  Output: table1_silly_view.data1
  ->  Index Scan using table1_pkey on public.table1  (cost=0.29..32.86 rows=91 width=20) (actual time=2.550..206.425 rows=91 loops=1)
        Output: NULL::integer, table1.data1, something_silly(table1.id)
        Index Cond: ((table1.id >= 10) AND (table1.id <= 100))
Planning time: 0.526 ms
Execution time: 206.724 ms


whereas

`explain analyze verbose select data1 from table1_silly_view where id in
(select id from table1 where id >= 10 and id <= 100);`

does a full sequential scan over the view, producing whatever side effects
the volatile function has for every row in the view, even though the two
queries produce the same output and have what should be equivalent quals:

Hash Semi Join  (cost=11.24..2793.50 rows=91 width=8) (actual time=23.603..22759.297 rows=91 loops=1)
  Output: table1_1.data1
  Hash Cond: (table1_1.id = table1.id)
  ->  Seq Scan on public.table1 table1_1  (cost=0.00..2655.00 rows=1 width=20) (actual time=2.468..22720.942 rows=1 loops=1)
        Output: table1_1.id, table1_1.data1, something_silly(table1_1.id)
  ->  Hash  (cost=10.11..10.11 rows=91 width=4) (actual time=0.484..0.484 rows=91 loops=1)
        Output: table1.id
        Buckets: 1024  Batches: 1  Memory Usage: 12kB
        ->  Index Only Scan using table1_pkey on public.table1  (cost=0.29..10.11 rows=91 width=4) (actual time=0.383..0.430 rows=91 loops=1)
              Output: table1.id
              Index Cond: ((table1.id >= 10) AND (table1.id <= 100))
              Heap Fetches: 91
Planning time: 0.877 ms
Execution time: 22759.448 ms


I recognize that putting a volatile function call in a view is an
anti-pattern, and I don't know of a better way of dealing with it (refusing
to use indexes at all on a view that contains a volatile function call seems
like a very bad choice), but I still think this might be something to
document better.
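
(One possible workaround, sticking with the reconstructed schema sketched
above: keep the volatile column out of a separate "plain" view, so queries
that don't need it never pay for it:)

-- a cheap view without the volatile column
CREATE VIEW table1_plain_view AS
  SELECT id, data1 FROM table1;

-- selecting data1 this way never calls something_silly(), regardless of
-- how the quals are written
SELECT data1 FROM table1_plain_view
WHERE id IN (SELECT id FROM table1 WHERE id >= 10 AND id <= 100);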

-David



On Wed, Jul 12, 2017 at 3:23 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> David Kohn <djk...@gmail.com> writes:
> > I encountered some unexpected behavior when debugging a query that was
> > taking longer than expected: a volatile function that makes a column in
> > a view is called even when that column is not selected in the query, so
> > the function is called for every row in the view. I'm not sure that
> > would necessarily be the expected behavior, as it was my understanding
> > that columns that are not selected are not evaluated; for instance, if
> > a join in a view produced some columns and those columns were not
> > selected, I would expect it to be optimized away.
>
> No, this is the expected behavior; we don't like optimization to change
> the number of calls of a volatile function from what would occur in naive
> evaluation of the query.  If that prospect doesn't bother you, it's
> likely because your function isn't really volatile ...
>
> > The other problem is that the function call does not appear in the query
> > plan.
>
> I think "explain verbose" will fix that for you.
>
> regards, tom lane
>


[HACKERS] Function Volatility and Views Unexpected Behavior

2017-07-12 Thread David Kohn
...volatile, I can imagine a newer user being truly
perplexed as to what was going on. Does anyone have thoughts as to where
documentation on that could be added to provide users some guidance? CREATE
FUNCTION? CREATE VIEW? I'm happy to write up a note on that behavior if
people think that would be useful.

Best,
David Kohn