Re: Contribution of IN operator handling

2018-11-20 Thread Julian Hyde
PS Here’s why we should focus on the SQL-to-JDBC example: it is concrete. If 
get it right, I’m sure the other adapters will benefit. We need to be sure not 
to over-fit the internal representation to SQL semantics (e.g. SQL’s IN allows 
composite values, e.g. “(x, y) IN ((1, 2), (3, 4))”, has a particular null 
semantics, allows expressions and column references on both LHS and RHS, e.g. 
“(x, 1) IN (1, y - 5), (2, y + 1)”). If we stick to RexNode with OR, AND, =, 
then the semantics are clear, and other adapters can implement those semantics 
using their particular language constructs.

> On Nov 20, 2018, at 1:21 PM, Julian Hyde  wrote:
> 
> Yes, the other adapters need to write their own converters, specific to their 
> target language. DB languages tend to be divergent in how they represent 
> “column in (constant1, …, constantN)”[1][2], so this makes sense. Of course 
> they can borrow/share code with the JDBC adapter.
> 
> Julian
> 
> [1] 
> https://docs.mongodb.com/manual/tutorial/query-documents/#specify-conditions-using-query-operators
>  
> 
> 
> [2] http://druid.io/docs/latest/querying/filters.html#in-filter 
> 
> 
> 
> 
>> On Nov 20, 2018, at 12:14 PM, Andrei Sereda > > wrote:
>> 
>> Hi Julian,
>> 
>> Thanks for your reply. I have a question
>> 
>> … where t is a table based on a JDBC data source …
>> Do we agree on that goal?
>> 
>> Does that mean that each of adapters (Elastic, Geode, Mongo etc.) have to
>> implement their own version of converters (similar to RexToSqlConverter) ?
>> If it is a flat RexCall of ORs this should be doable, I guess.
>> 
>> On Tue, Nov 20, 2018 at 2:14 PM Julian Hyde > > wrote:
>> 
>>> Thanks Andrei, that’s the discussion I was thinking of.
>>> 
>>> In Mykola’s case, I think it would be useful to solve the end-to-end
>>> problem. Given a Calcite query “select … from t where x in (c1, c2, …,
>>> cN)”, where t is a table based on a JDBC data source, ci are constants and
>>> N is large, we want Calcite’s JDBC adapter to send a query similar to
>>> “select … from t where x in (c1, c2, …, cN)” to the JDBC source.
>>> 
>>> Do we agree on that goal?
>>> 
>>> If we agree on the goal, the next question is how that query should be
>>> represented in the RelNode/RexNode intermediate representation. The choice
>>> of that representation has implications: performance (e.g. whether we hit a
>>> stack-overflow exception or quadratic algorithm), quality (whether we are
>>> forging a new code path that is untested), surface area (are we going to
>>> need to write a lot of new code, for example new planner rules, in order to
>>> achieve parity with existing cases).
>>> 
>>> The approach I advocate in
>>> https://issues.apache.org/jira/browse/CALCITE-2630 
>>>  <
>>> https://issues.apache.org/jira/browse/CALCITE-2630 
>>> > - representing the IN
>>> clause as a large, flat OR RexCall “x = c1 or x = c2 … or x = cN”, and
>>> having RexToSqlConverter translate that OR into an IN SqlNode - meets those
>>> criteria. (We may need to fix some bugs relating to quadratic performance
>>> or stack depth, but those are worth doing anyway.)
>>> 
>>> Are there other approaches that meet the same criteria? The original
>>> proposal - adding IN as a Rex operator - is a significant increase in
>>> surface area, so we would either lose functionality (e.g. not be able to
>>> push filters into the IN list) or find ourselves having to write a lot of
>>> new code and have to fix a lot of new bugs.
>>> 
>>> Julian
>>> 
>>> 
 On Nov 20, 2018, at 10:06 AM, Andrei Sereda >>> > wrote:
 
 Convert SqlInOperator to In-Expression :
 https://issues.apache.org/jira/browse/CALCITE-2630 
 
 
 Related. full table scans and subQueryThreshold.
 
>>> https://lists.apache.org/thread.html/1a25c956262633f8ef0d224ed76400761f6797c494a21796579eb4f2@%3Cdev.calcite.apache.org%3E
>>>  
>>> 
 
 
 
 On Tue, Nov 20, 2018 at 12:08 PM Julian Hyde  wrote:
 
> I recall contributing to some other conversations about large IN lists
> over the past few months. Before we jump into a discussion, can you
>>> locate
> those threads? Also, if there is not a JIRA case, can you please create
>>> one?
> 
> Julian
> 
>> On Nov 20, 2018, at 8:23 AM, Mykola Zerniuk >> .INVALID>
> wrote:
>> 
>> Dear Calcite Administrators,
>> 
>> my name is Mykola, software engineer from Ukraine.
>> 
>> I had an issue with Calcite IN operator handling.
>>>

Re: Contribution of IN operator handling

2018-11-20 Thread Julian Hyde
Yes, the other adapters need to write their own converters, specific to their 
target language. DB languages tend to be divergent in how they represent 
“column in (constant1, …, constantN)”[1][2], so this makes sense. Of course 
they can borrow/share code with the JDBC adapter.

Julian

[1] 
https://docs.mongodb.com/manual/tutorial/query-documents/#specify-conditions-using-query-operators
 


[2] http://druid.io/docs/latest/querying/filters.html#in-filter 




> On Nov 20, 2018, at 12:14 PM, Andrei Sereda  wrote:
> 
> Hi Julian,
> 
> Thanks for your reply. I have a question
> 
> … where t is a table based on a JDBC data source …
> Do we agree on that goal?
> 
> Does that mean that each of adapters (Elastic, Geode, Mongo etc.) have to
> implement their own version of converters (similar to RexToSqlConverter) ?
> If it is a flat RexCall of ORs this should be doable, I guess.
> 
> On Tue, Nov 20, 2018 at 2:14 PM Julian Hyde  > wrote:
> 
>> Thanks Andrei, that’s the discussion I was thinking of.
>> 
>> In Mykola’s case, I think it would be useful to solve the end-to-end
>> problem. Given a Calcite query “select … from t where x in (c1, c2, …,
>> cN)”, where t is a table based on a JDBC data source, ci are constants and
>> N is large, we want Calcite’s JDBC adapter to send a query similar to
>> “select … from t where x in (c1, c2, …, cN)” to the JDBC source.
>> 
>> Do we agree on that goal?
>> 
>> If we agree on the goal, the next question is how that query should be
>> represented in the RelNode/RexNode intermediate representation. The choice
>> of that representation has implications: performance (e.g. whether we hit a
>> stack-overflow exception or quadratic algorithm), quality (whether we are
>> forging a new code path that is untested), surface area (are we going to
>> need to write a lot of new code, for example new planner rules, in order to
>> achieve parity with existing cases).
>> 
>> The approach I advocate in
>> https://issues.apache.org/jira/browse/CALCITE-2630 <
>> https://issues.apache.org/jira/browse/CALCITE-2630 
>> > - representing the IN
>> clause as a large, flat OR RexCall “x = c1 or x = c2 … or x = cN”, and
>> having RexToSqlConverter translate that OR into an IN SqlNode - meets those
>> criteria. (We may need to fix some bugs relating to quadratic performance
>> or stack depth, but those are worth doing anyway.)
>> 
>> Are there other approaches that meet the same criteria? The original
>> proposal - adding IN as a Rex operator - is a significant increase in
>> surface area, so we would either lose functionality (e.g. not be able to
>> push filters into the IN list) or find ourselves having to write a lot of
>> new code and have to fix a lot of new bugs.
>> 
>> Julian
>> 
>> 
>>> On Nov 20, 2018, at 10:06 AM, Andrei Sereda  wrote:
>>> 
>>> Convert SqlInOperator to In-Expression :
>>> https://issues.apache.org/jira/browse/CALCITE-2630
>>> 
>>> Related. full table scans and subQueryThreshold.
>>> 
>> https://lists.apache.org/thread.html/1a25c956262633f8ef0d224ed76400761f6797c494a21796579eb4f2@%3Cdev.calcite.apache.org%3E
>>> 
>>> 
>>> 
>>> On Tue, Nov 20, 2018 at 12:08 PM Julian Hyde  wrote:
>>> 
 I recall contributing to some other conversations about large IN lists
 over the past few months. Before we jump into a discussion, can you
>> locate
 those threads? Also, if there is not a JIRA case, can you please create
>> one?
 
 Julian
 
> On Nov 20, 2018, at 8:23 AM, Mykola Zerniuk > .INVALID>
 wrote:
> 
> Dear Calcite Administrators,
> 
> my name is Mykola, software engineer from Ukraine.
> 
> I had an issue with Calcite IN operator handling.
> 
> Here is my previous email to you:
> 
 
>> https://mail-archives.apache.org/mod_mbox/calcite-dev/201810.mbox/%3CCAL4PLbiBh1HoP0w_5ScJ1Nnxq%2BNYGP2LO2usxg_17Gs1mYgttA%40mail.gmail.com%3E
> 
> It is really important to us to have an option to left IN operator "as
> is" and do not do any conversions. I implemented it a while ago at my
> local, and it successfully works in our project.
> 
> Our team would be happy to have your review and contribute it to
>> Calcite.
> 
> If you have no objections may i create a work item in Jira? I am
> following these steps:
> https://calcite.apache.org/develop/#contributing
> 
> Thanks a lot,
> Mykola



Re: Contribution of IN operator handling

2018-11-20 Thread Andrei Sereda
Hi Julian,

Thanks for your reply. I have a question

… where t is a table based on a JDBC data source …
Do we agree on that goal?

Does that mean that each of adapters (Elastic, Geode, Mongo etc.) have to
implement their own version of converters (similar to RexToSqlConverter) ?
If it is a flat RexCall of ORs this should be doable, I guess.

On Tue, Nov 20, 2018 at 2:14 PM Julian Hyde  wrote:

> Thanks Andrei, that’s the discussion I was thinking of.
>
> In Mykola’s case, I think it would be useful to solve the end-to-end
> problem. Given a Calcite query “select … from t where x in (c1, c2, …,
> cN)”, where t is a table based on a JDBC data source, ci are constants and
> N is large, we want Calcite’s JDBC adapter to send a query similar to
> “select … from t where x in (c1, c2, …, cN)” to the JDBC source.
>
> Do we agree on that goal?
>
> If we agree on the goal, the next question is how that query should be
> represented in the RelNode/RexNode intermediate representation. The choice
> of that representation has implications: performance (e.g. whether we hit a
> stack-overflow exception or quadratic algorithm), quality (whether we are
> forging a new code path that is untested), surface area (are we going to
> need to write a lot of new code, for example new planner rules, in order to
> achieve parity with existing cases).
>
> The approach I advocate in
> https://issues.apache.org/jira/browse/CALCITE-2630 <
> https://issues.apache.org/jira/browse/CALCITE-2630> - representing the IN
> clause as a large, flat OR RexCall “x = c1 or x = c2 … or x = cN”, and
> having RexToSqlConverter translate that OR into an IN SqlNode - meets those
> criteria. (We may need to fix some bugs relating to quadratic performance
> or stack depth, but those are worth doing anyway.)
>
> Are there other approaches that meet the same criteria? The original
> proposal - adding IN as a Rex operator - is a significant increase in
> surface area, so we would either lose functionality (e.g. not be able to
> push filters into the IN list) or find ourselves having to write a lot of
> new code and have to fix a lot of new bugs.
>
> Julian
>
>
> > On Nov 20, 2018, at 10:06 AM, Andrei Sereda  wrote:
> >
> > Convert SqlInOperator to In-Expression :
> > https://issues.apache.org/jira/browse/CALCITE-2630
> >
> > Related. full table scans and subQueryThreshold.
> >
> https://lists.apache.org/thread.html/1a25c956262633f8ef0d224ed76400761f6797c494a21796579eb4f2@%3Cdev.calcite.apache.org%3E
> >
> >
> >
> > On Tue, Nov 20, 2018 at 12:08 PM Julian Hyde  wrote:
> >
> >> I recall contributing to some other conversations about large IN lists
> >> over the past few months. Before we jump into a discussion, can you
> locate
> >> those threads? Also, if there is not a JIRA case, can you please create
> one?
> >>
> >> Julian
> >>
> >>> On Nov 20, 2018, at 8:23 AM, Mykola Zerniuk  .INVALID>
> >> wrote:
> >>>
> >>> Dear Calcite Administrators,
> >>>
> >>> my name is Mykola, software engineer from Ukraine.
> >>>
> >>> I had an issue with Calcite IN operator handling.
> >>>
> >>> Here is my previous email to you:
> >>>
> >>
> https://mail-archives.apache.org/mod_mbox/calcite-dev/201810.mbox/%3CCAL4PLbiBh1HoP0w_5ScJ1Nnxq%2BNYGP2LO2usxg_17Gs1mYgttA%40mail.gmail.com%3E
> >>>
> >>> It is really important to us to have an option to left IN operator "as
> >>> is" and do not do any conversions. I implemented it a while ago at my
> >>> local, and it successfully works in our project.
> >>>
> >>> Our team would be happy to have your review and contribute it to
> Calcite.
> >>>
> >>> If you have no objections may i create a work item in Jira? I am
> >>> following these steps:
> >>> https://calcite.apache.org/develop/#contributing
> >>>
> >>> Thanks a lot,
> >>> Mykola
> >>
> >>
>
>


Re: Contribution of IN operator handling

2018-11-20 Thread Julian Hyde
Thanks Andrei, that’s the discussion I was thinking of.

In Mykola’s case, I think it would be useful to solve the end-to-end problem. 
Given a Calcite query “select … from t where x in (c1, c2, …, cN)”, where t is 
a table based on a JDBC data source, ci are constants and N is large, we want 
Calcite’s JDBC adapter to send a query similar to “select … from t where x in 
(c1, c2, …, cN)” to the JDBC source.

Do we agree on that goal?

If we agree on the goal, the next question is how that query should be 
represented in the RelNode/RexNode intermediate representation. The choice of 
that representation has implications: performance (e.g. whether we hit a 
stack-overflow exception or quadratic algorithm), quality (whether we are 
forging a new code path that is untested), surface area (are we going to need 
to write a lot of new code, for example new planner rules, in order to achieve 
parity with existing cases).

The approach I advocate in https://issues.apache.org/jira/browse/CALCITE-2630 
 - representing the IN 
clause as a large, flat OR RexCall “x = c1 or x = c2 … or x = cN”, and having 
RexToSqlConverter translate that OR into an IN SqlNode - meets those criteria. 
(We may need to fix some bugs relating to quadratic performance or stack depth, 
but those are worth doing anyway.)

Are there other approaches that meet the same criteria? The original proposal - 
adding IN as a Rex operator - is a significant increase in surface area, so we 
would either lose functionality (e.g. not be able to push filters into the IN 
list) or find ourselves having to write a lot of new code and have to fix a lot 
of new bugs.

Julian


> On Nov 20, 2018, at 10:06 AM, Andrei Sereda  wrote:
> 
> Convert SqlInOperator to In-Expression :
> https://issues.apache.org/jira/browse/CALCITE-2630
> 
> Related. full table scans and subQueryThreshold.
> https://lists.apache.org/thread.html/1a25c956262633f8ef0d224ed76400761f6797c494a21796579eb4f2@%3Cdev.calcite.apache.org%3E
> 
> 
> 
> On Tue, Nov 20, 2018 at 12:08 PM Julian Hyde  wrote:
> 
>> I recall contributing to some other conversations about large IN lists
>> over the past few months. Before we jump into a discussion, can you locate
>> those threads? Also, if there is not a JIRA case, can you please create one?
>> 
>> Julian
>> 
>>> On Nov 20, 2018, at 8:23 AM, Mykola Zerniuk 
>>> 
>> wrote:
>>> 
>>> Dear Calcite Administrators,
>>> 
>>> my name is Mykola, software engineer from Ukraine.
>>> 
>>> I had an issue with Calcite IN operator handling.
>>> 
>>> Here is my previous email to you:
>>> 
>> https://mail-archives.apache.org/mod_mbox/calcite-dev/201810.mbox/%3CCAL4PLbiBh1HoP0w_5ScJ1Nnxq%2BNYGP2LO2usxg_17Gs1mYgttA%40mail.gmail.com%3E
>>> 
>>> It is really important to us to have an option to left IN operator "as
>>> is" and do not do any conversions. I implemented it a while ago at my
>>> local, and it successfully works in our project.
>>> 
>>> Our team would be happy to have your review and contribute it to Calcite.
>>> 
>>> If you have no objections may i create a work item in Jira? I am
>>> following these steps:
>>> https://calcite.apache.org/develop/#contributing
>>> 
>>> Thanks a lot,
>>> Mykola
>> 
>> 



Re: Contribution of IN operator handling

2018-11-20 Thread Andrei Sereda
Convert SqlInOperator to In-Expression :
https://issues.apache.org/jira/browse/CALCITE-2630

Related. full table scans and subQueryThreshold.
https://lists.apache.org/thread.html/1a25c956262633f8ef0d224ed76400761f6797c494a21796579eb4f2@%3Cdev.calcite.apache.org%3E



On Tue, Nov 20, 2018 at 12:08 PM Julian Hyde  wrote:

> I recall contributing to some other conversations about large IN lists
> over the past few months. Before we jump into a discussion, can you locate
> those threads? Also, if there is not a JIRA case, can you please create one?
>
> Julian
>
> > On Nov 20, 2018, at 8:23 AM, Mykola Zerniuk 
> > 
> wrote:
> >
> > Dear Calcite Administrators,
> >
> > my name is Mykola, software engineer from Ukraine.
> >
> > I had an issue with Calcite IN operator handling.
> >
> > Here is my previous email to you:
> >
> https://mail-archives.apache.org/mod_mbox/calcite-dev/201810.mbox/%3CCAL4PLbiBh1HoP0w_5ScJ1Nnxq%2BNYGP2LO2usxg_17Gs1mYgttA%40mail.gmail.com%3E
> >
> > It is really important to us to have an option to left IN operator "as
> > is" and do not do any conversions. I implemented it a while ago at my
> > local, and it successfully works in our project.
> >
> > Our team would be happy to have your review and contribute it to Calcite.
> >
> > If you have no objections may i create a work item in Jira? I am
> > following these steps:
> > https://calcite.apache.org/develop/#contributing
> >
> > Thanks a lot,
> > Mykola
>
>


Re: Contribution of IN operator handling

2018-11-20 Thread Julian Hyde
I recall contributing to some other conversations about large IN lists over the 
past few months. Before we jump into a discussion, can you locate those 
threads? Also, if there is not a JIRA case, can you please create one?

Julian

> On Nov 20, 2018, at 8:23 AM, Mykola Zerniuk  
> wrote:
> 
> Dear Calcite Administrators,
> 
> my name is Mykola, software engineer from Ukraine.
> 
> I had an issue with Calcite IN operator handling.
> 
> Here is my previous email to you:
> https://mail-archives.apache.org/mod_mbox/calcite-dev/201810.mbox/%3CCAL4PLbiBh1HoP0w_5ScJ1Nnxq%2BNYGP2LO2usxg_17Gs1mYgttA%40mail.gmail.com%3E
> 
> It is really important to us to have an option to left IN operator "as
> is" and do not do any conversions. I implemented it a while ago at my
> local, and it successfully works in our project.
> 
> Our team would be happy to have your review and contribute it to Calcite.
> 
> If you have no objections may i create a work item in Jira? I am
> following these steps:
> https://calcite.apache.org/develop/#contributing
> 
> Thanks a lot,
> Mykola



Contribution of IN operator handling

2018-11-20 Thread Mykola Zerniuk
Dear Calcite Administrators,

my name is Mykola, software engineer from Ukraine.

I had an issue with Calcite IN operator handling.

Here is my previous email to you:
https://mail-archives.apache.org/mod_mbox/calcite-dev/201810.mbox/%3CCAL4PLbiBh1HoP0w_5ScJ1Nnxq%2BNYGP2LO2usxg_17Gs1mYgttA%40mail.gmail.com%3E

It is really important to us to have an option to left IN operator "as
is" and do not do any conversions. I implemented it a while ago at my
local, and it successfully works in our project.

Our team would be happy to have your review and contribute it to Calcite.

If you have no objections may i create a work item in Jira? I am
following these steps:
https://calcite.apache.org/develop/#contributing

Thanks a lot,
Mykola