Re: Rangejoin rebased

2018-03-28 Thread David Steele
On 3/28/18 2:23 PM, Andres Freund wrote:
> On 2018-03-28 14:18:42 -0400, David Steele wrote:
>> It seems that a new patch is needed here but none has been presented.
>> I've marked this Waiting on Author for the moment, but I really think it
>> should be marked Returned with Feedback and submitted to the next CF
>> when a new patch is ready.
> 
> I'd just do so now. There's not been any progress for months, and
> there's been an update request weeks ago...
Done.

-- 
-David
da...@pgmasters.net



Re: Rangejoin rebased

2018-03-28 Thread Andres Freund
Hi,

On 2018-03-28 14:18:42 -0400, David Steele wrote:
> It seems that a new patch is needed here but none has been presented.
> I've marked this Waiting on Author for the moment, but I really think it
> should be marked Returned with Feedback and submitted to the next CF
> when a new patch is ready.

I'd just do so now. There's not been any progress for months, and
there's been an update request weeks ago...

Greetings,

Andres Freund



Re: Rangejoin rebased

2018-03-28 Thread David Steele
On 3/2/18 11:44 AM, Robert Haas wrote:
> On Fri, Mar 2, 2018 at 11:12 AM, Alexander Kuzmenkov wrote:
>> On 16.01.2018 10:49, Jeff Davis wrote:
>>> My proposed fix is to make an internal opfamily identical to the
>>> external one, such that it's not recognized as part of the same EC,
>>> and the planner won't try to eliminate it. It loses out on potential
>>> optimizations, but those are mostly theoretical since the btree
>>> opclass ordering for ranges is not very interesting to a user.
>>
>> I think I figured out what to do with missing sort directions. We can change
>> select_outer_pathkeys_for_merge() to generate the pathkeys we need. Also,
>> find_mergeclauses_for_outer_pathkeys() has to be changed too, so that it
>> knows which pathkeys are compatible with which range join clauses.
>>
>> About the patch, do I understand it right that you are working on the next
>> version now?
> 
> I think we are quite clearly past the deadline to submit a new patch
> for inclusion in v11 at this point.

It seems that a new patch is needed here but none has been presented.
I've marked this Waiting on Author for the moment, but I really think it
should be marked Returned with Feedback and submitted to the next CF
when a new patch is ready.

Regards,
-- 
-David
da...@pgmasters.net



Re: Rangejoin rebased

2018-03-02 Thread Robert Haas
On Fri, Mar 2, 2018 at 11:12 AM, Alexander Kuzmenkov wrote:
> On 16.01.2018 10:49, Jeff Davis wrote:
>> My proposed fix is to make an internal opfamily identical to the
>> external one, such that it's not recognized as part of the same EC,
>> and the planner won't try to eliminate it. It loses out on potential
>> optimizations, but those are mostly theoretical since the btree
>> opclass ordering for ranges is not very interesting to a user.
>
> I think I figured out what to do with missing sort directions. We can change
> select_outer_pathkeys_for_merge() to generate the pathkeys we need. Also,
> find_mergeclauses_for_outer_pathkeys() has to be changed too, so that it
> knows which pathkeys are compatible with which range join clauses.
>
> About the patch, do I understand it right that you are working on the next
> version now?

I think we are quite clearly past the deadline to submit a new patch
for inclusion in v11 at this point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Rangejoin rebased

2018-03-02 Thread Alexander Kuzmenkov

On 16.01.2018 10:49, Jeff Davis wrote:

> My proposed fix is to make an internal opfamily identical to the
> external one, such that it's not recognized as part of the same EC,
> and the planner won't try to eliminate it. It loses out on potential
> optimizations, but those are mostly theoretical since the btree
> opclass ordering for ranges is not very interesting to a user.

I think I figured out what to do with missing sort directions. We can 
change select_outer_pathkeys_for_merge() to generate the pathkeys we 
need. Also, find_mergeclauses_for_outer_pathkeys() has to be changed 
too, so that it knows which pathkeys are compatible with which range join 
clauses.


About the patch, do I understand it right that you are working on the 
next version now?


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Rangejoin rebased

2018-01-24 Thread Jeff Davis
On Tue, Jan 23, 2018 at 2:04 AM, Simon Riggs  wrote:
> Perhaps we are misunderstanding each other
>
> TIMESTAMP <@ RANGE1 doesn't match if RANGE1 is empty
> and that is the most important case

When <@ is supported, that case should be fine if range1 is on the
outer. The case I was concerned about is with a R1 <@ R2 join where R1
is on the inner side and could have empty ranges.

One option would be to try to force R2 to be on the inner. But that
doesn't quite solve other related issues, like if R2 has a few large
ranges that contain almost everything.

Right now I'm considering an approach where we use some counters to
determine that a few ranges are preventing us from moving the mark
forward, and then move those few ranges into a separate tuplestore so
that we can move the mark forward.
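
To make the idea concrete, here is a rough sketch in Python rather than executor code. It assumes an overlap (&&) join over non-empty half-open ranges, with both inputs sorted by lower bound. The names overlap_join, mark and held are hypothetical, and the plain list "held" only stands in for a separate tuplestore. It also sets aside every still-live range at the mark (a plain sweep-line simplification) instead of using counters to pick out only the few blocking ones.

```python
def overlap_join(outer, inner):
    """Join two lists of non-empty half-open ranges (lo, hi) on overlap.

    Illustrative only: 'held' plays the role of a side store for inner
    ranges that would otherwise keep the mark from advancing.
    """
    outer = sorted(outer)
    inner = sorted(inner)
    mark = 0      # index of the first inner range the scan has not passed
    held = []     # inner ranges set aside so the mark can keep moving
    result = []
    for o_lo, o_hi in outer:
        # Held ranges that end at or before this outer range starts
        # cannot match it or any later outer range, so drop them.
        held = [r for r in held if r[1] > o_lo]
        # Move the mark past inner ranges starting at or before o_lo,
        # setting aside the ones that are still alive.
        while mark < len(inner) and inner[mark][0] <= o_lo:
            if inner[mark][1] > o_lo:
                held.append(inner[mark])
            mark += 1
        # Every held range overlaps the current outer range.
        for r in held:
            result.append(((o_lo, o_hi), r))
        # Inner ranges starting inside the outer range also overlap it;
        # scan them without moving the mark, since later outer ranges
        # may need them too.
        i = mark
        while i < len(inner) and inner[i][0] < o_hi:
            result.append(((o_lo, o_hi), inner[i]))
            i += 1
    return result
```

With an inner input such as [(0, 100), (2, 3), (5, 7)], the long range (0, 100) goes into held on the first outer tuple, so the mark is free to advance past the short ranges behind it.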

> RANGE OP RANGE is important also. It would be useful for OP to be more
> than just &&

I agree that contains/contained-by are useful; do you have other
operators in mind as well?

>
> It's certainly weird that R1 @> EMPTY is true, but R1 && EMPTY is not.

This was discussed back in 9.2, and there were no obviously better
semantics available. I chose to follow set semantics: X contains Y
means that Y is a subset of X; X overlaps Y means that the
intersection of X and Y is nonempty.

I understand it can be surprising, but other definitions can be surprising, too.
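
A small Python model of those two definitions, for illustration only. It assumes half-open ranges written as (lo, hi) tuples; EMPTY, make_range, contains and overlaps are hypothetical names, not PostgreSQL functions, and treating lo >= hi as empty is a simplification of this model.

```python
EMPTY = None  # stand-in for the empty range in this model


def make_range(lo, hi):
    """Build a half-open range [lo, hi); this model treats lo >= hi as empty."""
    return EMPTY if lo >= hi else (lo, hi)


def contains(x, y):
    """Model of x @> y: y is a subset of x (vacuously true for empty y)."""
    if y is EMPTY:
        return True
    if x is EMPTY:
        return False
    return x[0] <= y[0] and y[1] <= x[1]


def overlaps(x, y):
    """Model of x && y: the intersection of x and y is non-empty."""
    if x is EMPTY or y is EMPTY:
        return False
    return x[0] < y[1] and y[0] < x[1]


r1 = make_range(1, 10)
print(contains(r1, EMPTY))   # containment of the empty range
print(overlaps(r1, EMPTY))   # overlap with the empty range
```

Under this model contains(r1, EMPTY) holds while overlaps(r1, EMPTY) does not, which is the asymmetry being discussed.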

Regards,
 Jeff Davis



Re: Rangejoin rebased

2018-01-23 Thread Simon Riggs
On 23 January 2018 at 05:08, Jeff Davis  wrote:
> On Fri, Jan 19, 2018 at 2:07 AM, Simon Riggs  wrote:
>> err... that isn't correct. An empty range matches nothing, so can be
>> ignored in joins.
>>
>> So probably best to explain some more, please.
>
> The semantics of R1 @> R2 will return true if R1 is a non-NULL range
> and R2 is empty.
>
> It's set semantics, and all sets contain the empty set.

Understood

> But I understand @> is an important case so I am looking into it.

Perhaps we are misunderstanding each other

TIMESTAMP <@ RANGE1 doesn't match if RANGE1 is empty
and that is the most important case

RANGE OP RANGE is important also. It would be useful for OP to be more
than just &&

It's certainly weird that R1 @> EMPTY is true, but R1 && EMPTY is not.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Rangejoin rebased

2018-01-22 Thread Jeff Davis
On Fri, Jan 19, 2018 at 2:07 AM, Simon Riggs  wrote:
> err... that isn't correct. An empty range matches nothing, so can be
> ignored in joins.
>
> So probably best to explain some more, please.

The semantics of R1 @> R2 will return true if R1 is a non-NULL range
and R2 is empty.

It's set semantics, and all sets contain the empty set.

But I understand @> is an important case so I am looking into it.

Regards,
Jeff Davis



Re: Rangejoin rebased

2018-01-19 Thread Simon Riggs
On 19 January 2018 at 08:25, Simon Riggs  wrote:
> On 17 January 2018 at 05:49, Jeff Davis  wrote:
>> On Wed, Jan 10, 2018 at 7:49 AM, Simon Riggs  wrote:
>>> Do we optimize for TIMESTAMP <@ RANGE as well?
>>
>> Not currently. It requires a little extra complexity because empty
>> ranges will match anything and need special handling.

err... that isn't correct. An empty range matches nothing, so can be
ignored in joins.

So probably best to explain some more, please.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Rangejoin rebased

2018-01-19 Thread Simon Riggs
On 17 January 2018 at 05:49, Jeff Davis  wrote:
> On Wed, Jan 10, 2018 at 7:49 AM, Simon Riggs  wrote:
>> Do we optimize for TIMESTAMP <@ RANGE as well?
>
> Not currently. It requires a little extra complexity because empty
> ranges will match anything and need special handling.

TIMESTAMP <@ RANGE is arguably more important than RANGE && RANGE

Trying to cast timestamp to range to make that work is a bit hokey

If the problem is just empty ranges, it seems like we should do that here also.

I'd be happy with the optimization only working if ranges are provably
non-empty, e.g. CHECK (NOT isempty(col))
Or perhaps we need non-empty types: e.g. tsrangene

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Rangejoin rebased

2018-01-15 Thread Jeff Davis
On Fri, Jan 12, 2018 at 11:02 AM, Alexander Kuzmenkov wrote:
> The sort order isn't right for the join, it seems. I remember having similar
> troubles with my full merge join implementation. I tried filtering
> unsuitable paths in try_mergejoin_path, but that was not quite enough. The
> planner tries to use only one sort direction to limit the number of paths, so
> the path we need might not be there at all. This optimization was added in
> commit 834ddc62, see right_merge_direction(). Sadly, I have no idea how to
> solve this.

Interesting problem, thank you. My first reaction was to hack
right_merge_direction() to recognize the range opfamily in the ASC
direction as always potentially useful. But what's happening is that,
when the inputs are already forced to be sorted DESC, as in your
example, there is still no appropriate pathkey. So the problem is
reproducible with no ORDER BY clause at all.

My proposed fix is to make an internal opfamily identical to the
external one, such that it's not recognized as part of the same EC,
and the planner won't try to eliminate it. It loses out on potential
optimizations, but those are mostly theoretical since the btree
opclass ordering for ranges is not very interesting to a user.

I notice that window functions seem to handle these cases better,
maybe that approach would work for your full join patch? I haven't
investigated that yet.

 Regards,
   Jeff Davis



Re: Rangejoin rebased

2018-01-15 Thread Robert Haas
On Tue, Jan 9, 2018 at 11:24 PM, Jeff Davis  wrote:
>> Just to emphasise why we want this, it might be better for the EXPLAIN
>> to say "Time Range Join" when the ranges being joined are Time Ranges,
>> and for other cases to just say "Range Join". The use of the word
>> Merge doesn't help much there.
>
> I don't care for special-casing the word "time" in there, because no
> other type would benefit. It seems a little too magical. I also do
> like leaving "merge" in there because it helps the user understand why
> their inputs are being sorted.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Rangejoin rebased

2018-01-12 Thread Alexander Kuzmenkov

Hi Jeff,

Just a quick comment -- I ran a slightly modified version of a query 
from the regression tests, and got an assertion failure:


select i1, ir1, i2, ir2
  from (select * from rangejoin_left order by ir1 desc) as a1
  inner join (select * from rangejoin_right order by ir2 desc) as a2
    on (i1 = i2 and ir1 && ir2)
  order by ir1 desc, i1;

TRAP: FailedAssertion("!(!ssup->ssup_reverse)", File: "/home/akuzmenkov/pgp-old/build/../postgrespro/src/backend/executor/nodeMergejoin.c", Line: 492)


The sort order isn't right for the join, it seems. I remember having 
similar troubles with my full merge join implementation. I tried 
filtering unsuitable paths in try_mergejoin_path, but that was not quite 
enough. The planner tries to use only one sort direction to limit the 
number of paths, so the path we need might not be there at all. This 
optimization was added in commit 834ddc62, see right_merge_direction(). 
Sadly, I have no idea how to solve this.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Rangejoin rebased

2018-01-10 Thread Simon Riggs
On 10 January 2018 at 04:24, Jeff Davis  wrote:

> Done.

I think you need to make changes to other parts of the docs also, so
that it is clear what will now be possible

https://www.postgresql.org/docs/devel/static/using-explain.html
https://www.postgresql.org/docs/devel/static/xoper-optimization.html#id-1.8.3.17.9
https://www.postgresql.org/docs/devel/static/planner-optimizer.html

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Rangejoin rebased

2018-01-10 Thread Simon Riggs
On 10 January 2018 at 04:24, Jeff Davis  wrote:
> On Sat, Jan 6, 2018 at 10:38 AM, Simon Riggs  wrote:
>> For this to be useful, it needs to include some details of how to use
>> it when people have NOT used range datatypes in their tables.
>
> Good idea. I added an example that doesn't have range types in the base table.

Cool, thanks

...


It would be useful to consider any related use cases.
Are there applications for range operators other than &&?

Do we optimize for TIMESTAMP <@ RANGE as well?

Does this link in nicely with partition-aware joins?
Does it allow partition exclusion if you join a daterange to a time
range partitioned table?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services