How to give type in case of type mismatch while doing union

2020-04-27 Thread Anjali Shrishrimal
Hi,

While doing a union of two RelNodes with different types, I am getting an NPE. (I am
using Calcite 1.21.0)
java.lang.NullPointerException: at index 0
at 
com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:225)
at 
com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:215)
at 
com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:209)
at 
com.google.common.collect.ImmutableList.construct(ImmutableList.java:346)
at 
com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:258)
at 
org.apache.calcite.rel.type.RelDataTypeFactoryImpl.canonize(RelDataTypeFactoryImpl.java:373)
at 
org.apache.calcite.rel.type.RelDataTypeFactoryImpl.createStructType(RelDataTypeFactoryImpl.java:155)
at 
org.apache.calcite.rel.type.RelDataTypeFactoryImpl.createStructType(RelDataTypeFactoryImpl.java:146)
at 
org.apache.calcite.rel.type.RelDataTypeFactory$Builder.build(RelDataTypeFactory.java:569)
at 
org.apache.calcite.rel.type.RelDataTypeFactoryImpl.leastRestrictiveStructuredType(RelDataTypeFactoryImpl.java:257)
at 
org.apache.calcite.sql.type.SqlTypeFactoryImpl.leastRestrictiveSqlType(SqlTypeFactoryImpl.java:285)
at 
org.apache.calcite.sql.type.SqlTypeFactoryImpl.leastRestrictive(SqlTypeFactoryImpl.java:156)
at 
org.apache.calcite.rel.core.SetOp.deriveRowType(SetOp.java:107)

If the column types (type families) are different, the derived type is
currently null. Is there any way to control that?
Where can I define the resulting type in case of a mismatch?


Thank you,
Anjali Shrishrimal


[Tests Failing] Master Travis test fails continuously for JDK14

2020-04-27 Thread Danny Chan
Here is an example https://travis-ci.org/github/apache/calcite/jobs/679970288

Can someone help with that? Thanks ~

Best,
Danny Chan


Re: [Tests Failing] Master Travis test fails continuously for JDK14

2020-04-27 Thread Vladimir Sitnikov
I guess the solution is to ask INFRA to reset all the Travis caches for
Calcite.

Vladimir


Re: [Tests Failing] Master Travis test fails continuously for JDK14

2020-04-27 Thread Michael Mior
Yup! Just open an issue on the INFRA JIRA and they should be able to
take care of it.
--
Michael Mior
mm...@apache.org

Le lun. 27 avr. 2020 à 08:51, Danny Chan  a écrit :
>
> Thanks, how could I ask them to do that ? Log an issue there ?
>
> Best,
> Danny Chan
> 在 2020年4月27日 +0800 PM5:38,Vladimir Sitnikov ,写道:
> > I guess the solution is to ask INFRA to reset all the Travis caches for
> > Calcite.
> >
> > Vladimir


Re: How to give type in case of type mismatch while doing union

2020-04-27 Thread XING JIN
Hi, Anjali ~
Are you doing the UNION via SQL? If so, can you share the SQL statement?
Or are you doing the UNION on RelNodes? If so, you need to add a type CAST.

Jin

Anjali Shrishrimal  于2020年4月27日周一
下午4:25写道:

> Hi,
>
> While doing union of 2 RelNodes with different types, I am getting NPE. (I
> am using calcite 1.21.0)
> java.lang.NullPointerException: at index 0
> at
> com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:225)
> at
> com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:215)
> at
> com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:209)
> at
> com.google.common.collect.ImmutableList.construct(ImmutableList.java:346)
> at
> com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:258)
> at
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.canonize(RelDataTypeFactoryImpl.java:373)
> at
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.createStructType(RelDataTypeFactoryImpl.java:155)
> at
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.createStructType(RelDataTypeFactoryImpl.java:146)
> at
> org.apache.calcite.rel.type.RelDataTypeFactory$Builder.build(RelDataTypeFactory.java:569)
> at
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.leastRestrictiveStructuredType(RelDataTypeFactoryImpl.java:257)
> at
> org.apache.calcite.sql.type.SqlTypeFactoryImpl.leastRestrictiveSqlType(SqlTypeFactoryImpl.java:285)
> at
> org.apache.calcite.sql.type.SqlTypeFactoryImpl.leastRestrictive(SqlTypeFactoryImpl.java:156)
> at
> org.apache.calcite.rel.core.SetOp.deriveRowType(SetOp.java:107)
>
> If the column types (family types) are different, currently the derived
> type is null. Is there any way to control that?
> Where can I define the type in case of mismatch ?
>
>
> Thank you,
> Anjali Shrishrimal
>


Re: [Tests Failing] Master Travis test fails continuously for JDK14

2020-04-27 Thread Vladimir Sitnikov
> Log an issue there ?

An issue for INFRA project would probably do:
https://issues.apache.org/jira/projects/INFRA

Vladimir


Re: Building a Calcite Adapter

2020-04-27 Thread Jon Pither
Hi Stamatis & Calcite team,

Thanks for your response. We've made some good progress since - following
JdbcConvention as you suggest - and now we've got the Crux adapter handling
joins, sorts and more. We're in a good place I feel, and it's exciting to
see Calcite providing a SQL layer on top of our Datalog. Thanks again :-)

One Q: is it possible to extend the Calcite parser to do the following:
`VALIDTIME AS OF date('2010...') SELECT * FROM FOO`. So far I've played
with extending the parser using fmpp & javacc and it certainly feels
doable, but I can't quite grok what the extension point would be in Calcite
to add this - for example you can hang off arbitrary extensions from
subtrees such as CREATE and DROP (by extending SqlCreate and SqlDrop
respectively)... where might an arbitrary precursor command such as
`VALIDTIME AS OF date()` fit in?

Regards,

Jon.


On Tue, 21 Apr 2020 at 22:43, Stamatis Zampetakis  wrote:

> Hi Jon,
>
> Thanks for your kind words. I'm sure people working on the project are very
> happy to receive some positive feedback for their work from time to time :)
>
> I had a quick look on your project and definitely looks interesting.
>
> If your engine (Crux) uses better join algorithms than the ones provided by
> Calcite and if you have an optimizer that can apply join re-ordering and
> other optimization techniques efficiently then I guess going further and
> pushing joins and other things to Crux is a good idea.
>
> Having said that, I am not sure if the TranslatableTable approach will get
> you much further to this direction.
> I would suggest to have a look in JdbcConvention [1] and see how the notion
> of Convention along with the respective rules and relational expressions
> help to push operations into traditional RDBMs. The Cassandra, Mongo, and
> Elastic adapters are not a very good example since the underlying engines
> do not support joins.
>
> I am not aware if there are people offering consulting services for Calcite
> but I guess if there are you will know already.
> Apart from that the project has many volunteers willing to help so if you
> have more questions don't hesitate to send them to this list.
>
> Best,
> Stamatis
>
> [1]
>
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/jdbc/JdbcConvention.java
>
>
> On Tue, Apr 7, 2020, 12:22 PM Jon Pither  wrote:
>
> > Hi Calcite Devs,
> >
> > Firstly, thank you to all of you for building this fantastic tool.
> >
> > I'm currently experimenting with using Calcite on top of our document
> > database Crux (opencrux.com) offering bitemporal features using a
> Datalog
> > query language. You can see our efforts here, written in Clojure!
> >
> >
> >
> https://github.com/juxt/crux/blob/jp/calcite/crux-calcite/src/crux/calcite.clj
> >
> >
> https://github.com/juxt/crux/blob/jp/calcite/crux-test/test/crux/calcite_test.clj
> >
> > So far we've been impressed at the power Calcite gives, with such little
> > amount of integration code needed.
> >
> > We now have an initial MVP working using the ProjectableFilterableTable
> > route. The adapter is basically constructing a Datalog query that we then
> > execute against our DB.
> >
> > So far so good, and now I have some initial questions:
> >
> > Firstly, in this code we're making use of ProjectableFilterableTable to
> get
> > us up and running. I've looked at the Mongo and Elastic adapters in the
> > Calcite source, and they opt for TranslatableTable which is a deeper
> > integration. From what I can see, the immediate disadvantage of
> > ProjectableFilterableTable is that it's a query per table, meaning that
> we
> > can't efficiently delegate joins to our DB.
> >
> > Moving to TranslatableTable would be a significant investment for us. My
> > first question is: would you encourage us to make this investment, given
> > we've got something up and running using ProjectableFilterableTable, with
> > Calcite doing the heavy lifting? Please could you also advise on
> soliciting
> > mentoring / consulting to help guide us, for which we can compensate.
> >
> > Our next question is around temporality. I can see in the Calcite code
> that
> > there is a concept of a TemporalTable, supporting "FOR SYSTEM_TIME AS OF
> X".
> > It looks like we wouldn't be able to make use of this using
> > ProjectableFilterableTable, at least this is my experience thus far. In
> > Crux we also expose VALID_TIME to our users to be able to query for,
> > whereby users can query against VALID_TIME and/or SYSTEM_TIME. How might
> > you recommend we achieve this using Calcite?
> >
> > Thanks & Regards,
> >
> > Jon
> >
>


Advise on ClassCastException in Linq4j$EnumeratorIterator

2020-04-27 Thread Ayelet Morris
Hi Calcite Developers,
Hope you are all doing well at this crazy time.

I have a question regarding Linq4j$EnumeratorIterator
throwing ClassCastException.
*Background*: I'm running a Tableau testing framework called TDVT (it's in
beta) in order to test our product's JDBC connector.  Our product uses
Calcite 1.21.0 and Avatica 1.15.0.

I encountered this ClassCastException when running timestamp-related SQL
functions (about 127 different SQL statements fail with the same
exception).
*My SQL**:* select DAYOFMONTH(datetime0) from Calcs
datetime0 is a java.sql.Timestamp field in my POJO, the data in the CSV I'm
using to fill in the POJO is written like this: '2004-07-23 21:13:37'
*The exception:*
ClassCastException: java.sql.Timestamp cannot be cast to java.lang.Long
at Baz$1$1.current(Unknown Source)
at org.apache.calcite.linq4j.Linq4j$EnumeratorIterator.next(Linq4j.java:683)
at
org.apache.calcite.avatica.util.IteratorCursor.next(IteratorCursor.java:46)
at
org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:217)
at com.gigaspaces.jdbc.TimestampTest.test(TimestampTest.java:64)

This exception is thrown when calling "next()".
I cannot debug the generated code; I just see the exception being thrown there.

I looked into the generated code for performing this (and similar) sql
function and couldn't find a specific time format that is required, but in
the documentation I saw the use of java.sql.Date as the function type, I
tried to run my test with Date field I have in my POJO and it did return a
correct result.
I suspect it might not be compatible with Timestamp data type, though I saw
in your tests you do test with timestamp '2008-1-23 12:12:12', I didn't try
to run the test in the DruidAdapter suite but I saw that you are using a
timestamp based column there.

Do you know of a reason for this exception at this location? Did you see
this exception before at this location? Do you have any idea where I should
look for further information/clues?

Thanks,
Ayelet


Re: [Tests Failing] Master Travis test fails continuously for JDK14

2020-04-27 Thread Danny Chan
Thanks, how could I ask them to do that ? Log an issue there ?

Best,
Danny Chan
在 2020年4月27日 +0800 PM5:38,Vladimir Sitnikov ,写道:
> I guess the solution is to ask INFRA to reset all the Travis caches for
> Calcite.
>
> Vladimir


Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Roman Kondakov
Hi all,

Stamatis, Haisheng thank you very much for your feedback! I really
appreciate it.

> If in the new planner we end up copy-pasting code then I guess it will be a 
> bad idea.

Yes, there is some code duplication between the Volcano planner and the
Cascades planner. I think I'll move it to a common superclass: either
the existing AbstractRelOptPlanner or a new AbstractCostBasedPlanner.

> Like that it will be easier for everybody to test and see if the changes make 
> this better or worse. From a backward compatibility perspective it seems 
> feasible to keep the new features configurable for a certain amount of time.

I was thinking about backward compatibility in this way: what if we could
switch the planner in tests via some flag (a system property?) and check
how the new planner behaves when it replaces Volcano. If it turns out that
the new planner passes all of Volcano's tests while working more
efficiently, we can safely replace the Volcano planner with the Cascades
planner.

> A design doc would definitely help, especially if it has a few end-to-end 
> (from logical to physical plan) examples showing how the optimizer works
at each step before/after the changes. This is actually what is usually
missing in research papers that makes them hard to understand.
> I am thinking some similar to the examples that Haisheng send in the first  
> email but possibly a bit more detailed.

I agree; I'll add some examples to the design doc very soon.

> I looked very briefly in the PR by Roman but I think I didn't see tests where 
> the final plan contains operators from multiple conventions. Multiple 
> conventions is among the choices that complicate certain parts of the existing 
> planner so we should make sure that we take this into account.

It's on my radar. I'm going to add these tests.

I would like to ask the community about the next steps for moving the
Cascades planner from prototype status to becoming a part of the project.
I see the steps like this:

1. Create a jira ticket.
2. Update design document with examples.
3. Research backward compatibility with the Volcano planner, so that it
can be replaced by the Cascades planner in tests, and understand the
current problems with the planner.
4. Solve known problems
  - materialized views
  - hints
  - multiple conventions
  - listener hooks
  - problems from p.3.
5. New PR, review, and merge.
6. Replace the Volcano planner with Cascades after several releases.

What do you think about this roadmap?


-- 
Kind Regards
Roman Kondakov


On 27.04.2020 01:55, Stamatis Zampetakis wrote:
> Hi all,
> 
> I am very excited about the ideas discussed so far and especially by the
> enthusiasm of many people that are ready to help for pulling this out.
> I wouldn't expect that we could have a prototype so quickly.
> Thanks a lot everyone!
> 
> In the debate between creating new planner or patching the existing one, I
> don't have a clear preference.
> I think the answer depends on how many things can we reuse.
> If in the new planner we end up copy-pasting code then I guess it will be a
> bad idea.
> On the other hand, if the new and old planner do not have many things in
> common then I guess the answer is obvious.
> 
> From Haisheng's description, I was thinking that many of the proposed
> changes could go in the existing planner.
> Like that it will be easier for everybody to test and see if the changes
> make this better or worse.
> From a backward compatibility perspective it seems feasible to keep the new
> features configurable for a certain amount of time.
> 
> From the wish-list, I think we should focus initially on points:
> 1. Top-down trait request
> 2. Convert traits without Abstract converters
> 4. Bottom-up trait derivation
> 
> I know that 3, and 5, are also important but I have the feeling they can
> wait a bit longer.
> 
> A design doc would definitely help, especially if it has a few end-to-end
> (from logical to physical plan) examples showing how the optimizer works at
> each step before/after the changes.
> This is actually what is usually missing in research papers that makes them
> hard to understand.
> I am thinking some similar to the examples that Haisheng send in the first
> email but possibly a bit more detailed.
> 
> I looked very briefly in the PR by Roman but I think I didn't see tests
> where the final plan contains operators from multiple conventions.
> Multiple conventions is among the choices that complicate certain parts of
> the existing planner so we should make sure that we take this into account.
> 
> Hoping to find some time to think over all this more quietly. Very
> interesting stuff :)
> 
> Best,
> Stamatis
> 
> On Sun, Apr 26, 2020 at 11:14 PM Haisheng Yuan  wrote:
> 
>> Hi Roman,
>>
>> Excellent! This is definitely a helpful contribution to the Calcite
>> community.
>> Thank you for your endeavors.
>>
>> Haisheng
>>
>> On 2020/04/26 19:25:00, Roman 

Re: [ANNOUNCE] New committer: Vineet Garg

2020-04-27 Thread Rui Wang
Congrats Vineet!



-Rui

On Mon, Apr 27, 2020 at 11:16 AM Julian Hyde  wrote:

> Welcome Vineet! Thanks for your contributions so far.
>
> > On Apr 26, 2020, at 2:38 PM, Vineet G  wrote:
> >
> > Thanks a lot guys!
> >
> > Just to briefly introduce myself - I work with Cloudera (Hortonworks
> before) on Hive and I am a Hive PMC member. As Stamatis noted I have been
> involved in calcite since 2017. It is great honor to be part of this
> community. I am very excited to become committer and I look forward to
> contributing more.
> >
> > Regards,
> > Vineet Garg
> >
> >> On Apr 26, 2020, at 2:26 PM, Jesus Camacho Rodriguez <
> jcama...@apache.org> wrote:
> >>
> >> Congrats Vineet, well deserved!
> >>
> >> -Jesús
> >>
> >> On Sun, Apr 26, 2020 at 3:09 AM Leonard Xu  wrote:
> >>
> >>> Congratulations, Vineet!
> >>>
> >>> Best,
> >>> Leonard Xu
>  在 2020年4月26日,18:07,xu  写道:
> 
>  Congrats, Vineet!
> 
>  Danny Chan  于2020年4月26日周日 下午4:52写道:
> 
> > Congrats, Vineet!
> >
> > Best,
> > Danny Chan
> > 在 2020年4月26日 +0800 PM1:55,dev@calcite.apache.org,写道:
> >>
> >> Congrats, Vineet!
> >
> 
> 
>  --
> 
>  Best regards,
> 
>  Xu
> >>>
> >>>
> >
>
>


Re: Building a Calcite Adapter

2020-04-27 Thread Jon Pither
Hi,

Another route we're looking at is to use `ALTER SESSION SET VALID_TIME =
date('2010')`. When we experiment with this - hoping to trigger
`SqlSetOption` - we get a java.lang.UnsupportedOperationException:

    org.apache.calcite.prepare.CalcitePrepareImpl/executeDdl (CalcitePrepareImpl.java:369)

How could we make use of SqlSetOption? Do we need to extend the parser or
is there a simpler way?

Regards,

Jon.


On Mon, 27 Apr 2020 at 13:30, Jon Pither  wrote:

> Hi Stamatis & Calcite team,
>
> Thanks for your response. We've made some good progress since - following
> JdbcConvention as you suggest - and now we've got the Crux adapter handling
> joins, sorts and more. We're in a good place I feel, and it's exciting to
> see Calcite providing a SQL layer on top of our Datalog. Thanks again :-)
>
> One Q: is it possible to extend the Calcite parser to do the following:
> `VALIDTIME AS OF date('2010...') SELECT * FROM FOO`. So far I've played
> with extending the parser using fmpp & javacc and it certainly feels
> doable, but I can't quite grok what the extension point would be in Calcite
> to add this - for example you can hang off arbitrary extensions from
> subtrees such as CREATE and DROP (by extending SqlCreate and SqlDrop
> respectively)... where might an arbitrary precursor command such as
> `VALIDTIME AS OF date()` fit in?
>
> Regards,
>
> Jon.
>
>
> On Tue, 21 Apr 2020 at 22:43, Stamatis Zampetakis 
> wrote:
>
>> Hi Jon,
>>
>> Thanks for your kind words. I'm sure people working on the project are
>> very
>> happy to receive some positive feedback for their work from time to time
>> :)
>>
>> I had a quick look on your project and definitely looks interesting.
>>
>> If your engine (Crux) uses better join algorithms than the ones provided
>> by
>> Calcite and if you have an optimizer that can apply join re-ordering and
>> other optimization techniques efficiently then I guess going further and
>> pushing joins and other things to Crux is a good idea.
>>
>> Having said that, I am not sure if the TranslatableTable approach will get
>> you much further to this direction.
>> I would suggest to have a look in JdbcConvention [1] and see how the
>> notion
>> of Convention along with the respective rules and relational expressions
>> help to push operations into traditional RDBMs. The Cassandra, Mongo, and
>> Elastic adapters are not a very good example since the underlying engines
>> do not support joins.
>>
>> I am not aware if there are people offering consulting services for
>> Calcite
>> but I guess if there are you will know already.
>> Apart from that the project has many volunteers willing to help so if you
>> have more questions don't hesitate to send them to this list.
>>
>> Best,
>> Stamatis
>>
>> [1]
>>
>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/jdbc/JdbcConvention.java
>>
>>
>> On Tue, Apr 7, 2020, 12:22 PM Jon Pither  wrote:
>>
>> > Hi Calcite Devs,
>> >
>> > Firstly, thank you to all of you for building this fantastic tool.
>> >
>> > I'm currently experimenting with using Calcite on top of our document
>> > database Crux (opencrux.com) offering bitemporal features using a
>> Datalog
>> > query language. You can see our efforts here, written in Clojure!
>> >
>> >
>> >
>> https://github.com/juxt/crux/blob/jp/calcite/crux-calcite/src/crux/calcite.clj
>> >
>> >
>> https://github.com/juxt/crux/blob/jp/calcite/crux-test/test/crux/calcite_test.clj
>> >
>> > So far we've been impressed at the power Calcite gives, with such little
>> > amount of integration code needed.
>> >
>> > We now have an initial MVP working using the ProjectableFilterableTable
>> > route. The adapter is basically constructing a Datalog query that we
>> then
>> > execute against our DB.
>> >
>> > So far so good, and now I have some initial questions:
>> >
>> > Firstly, in this code we're making use of ProjectableFilterableTable to
>> get
>> > us up and running. I've looked at the Mongo and Elastic adapters in the
>> > Calcite source, and they opt for TranslatableTable which is a deeper
>> > integration. From what I can see, the immediate disadvantage of
>> > ProjectableFilterableTable is that it's a query per table, meaning that
>> we
>> > can't efficiently delegate joins to our DB.
>> >
>> > Moving to TranslatableTable would be a significant investment for us. My
>> > first question is: would you encourage us to make this investment, given
>> > we've got something up and running using ProjectableFilterableTable,
>> with
>> > Calcite doing the heavy lifting? Please could you also advise on
>> soliciting
>> > mentoring / consulting to help guide us, for which we can compensate.
>> >
>> > Our next question is around temporality. I can see in the Calcite code
>> that
>> > there is a concept of a TemporalTable, supporting "FOR SYSTEM_TIME AS OF
>> X".
>> > It looks like we wouldn't be able to make use of this using
>> > ProjectableFilterableTable, at least this is my experience thus 

Re: Advise on ClassCastException in Linq4j$EnumeratorIterator

2020-04-27 Thread Julian Hyde
In the Enumerable convention, values of SQL datatype TIMESTAMP are represented 
using Java values of type java.lang.Long.

Not sure exactly what you’re doing wrong, but your query (or other code) that 
gets the values into Java should be getting them as Long, not as 
java.sql.Timestamp values. This issue has occurred before, and a search of JIRA 
or the dev@ archive might yield results.

Julian




> On Apr 27, 2020, at 6:49 AM, Ayelet Morris  
> wrote:
> 
> Hi Calcite Developers,
> Hope you are all doing well at this crazy time.
> 
> I have a question regarding Linq4j$EnumeratorIterator
> throwing ClassCastException.
> *Background*: I'm running a Tableau testing framework called TDVT (it's in
> beta) in order to test our product's JDBC connector.  Our product uses
> Calcite 1.21.0 and Avatics 1.15.0.
> 
> I encountered this class cast exception when running timestamp related sql
> functions (I have about 127 different SQLs failing with the same class cast
> exception).
> *My SQL**:* select DAYOFMONTH(datetime0) from Calcs
> datetime0 is a java.sql.Timestamp field in my POJO, the data in the CSV I'm
> using to fill in the POJO is written like this: '2004-07-23 21:13:37'
> *The exception:*
> ClassCastException: java.sql.Timestamp cannot be cast to java.lang.Long
> at Baz$1$1.current(Unknown Source)
> at org.apache.calcite.linq4j.Linq4j$EnumeratorIterator.next(Linq4j.java:683)
> at
> org.apache.calcite.avatica.util.IteratorCursor.next(IteratorCursor.java:46)
> at
> org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:217)
> at com.gigaspaces.jdbc.TimestampTest.test(TimestampTest.java:64)
> 
> This exception is thrown when trying to perform "next()"
> I cannot debug the generated code... I just see it being thrown there.
> 
> I looked into the generated code for performing this (and similar) sql
> function and couldn't find a specific time format that is required, but in
> the documentation I saw the use of java.sql.Date as the function type, I
> tried to run my test with Date field I have in my POJO and it did return a
> correct result.
> I suspect it might not be compatible with Timestamp data type, though I saw
> in your tests you do test with timestamp '2008-1-23 12:12:12', I didn't try
> to run the test in the DruidAdapter suite but I saw that you are using a
> timestamp based column there.
> 
> Do you know of a reason for this exception at this location? Did you see
> this exception before at this location? Do you have any idea where I should
> look for further information/clues?
> 
> Thanks,
> Ayelet
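
To make Julian's point concrete, here is a JDK-only sketch of the representation
issue. This is not Calcite's actual generated code; the helper names are
illustrative, and the exact timezone/calendar handling in a real connection is a
simplification here. It only shows that the Enumerable convention moves TIMESTAMP
values around as epoch milliseconds (a Long), so handing it a java.sql.Timestamp
object produces exactly the reported ClassCastException.

```java
import java.sql.Timestamp;

public class TimestampAsLong {
    // In Calcite's Enumerable convention, a SQL TIMESTAMP travels through the
    // generated code as a java.lang.Long holding epoch milliseconds.
    static long toInternal(Timestamp ts) {
        return ts.getTime(); // the long value the generated code expects
    }

    static Timestamp fromInternal(long millis) {
        return new Timestamp(millis);
    }

    public static void main(String[] args) {
        Timestamp ts = Timestamp.valueOf("2004-07-23 21:13:37");
        long internal = toInternal(ts);
        // Feeding the generated code the Timestamp object itself, instead of
        // this long, is what triggers
        // "java.sql.Timestamp cannot be cast to java.lang.Long".
        System.out.println(internal);
        System.out.println(fromInternal(internal)); // round-trips to the same instant
    }
}
```

So the fix is usually in the adapter or POJO binding layer: expose the column to
Calcite as a long (millis), not as a Timestamp object.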



Re: [ANNOUNCE] New committer: Vineet Garg

2020-04-27 Thread Julian Hyde
Welcome Vineet! Thanks for your contributions so far.

> On Apr 26, 2020, at 2:38 PM, Vineet G  wrote:
> 
> Thanks a lot guys!
> 
> Just to briefly introduce myself - I work with Cloudera (Hortonworks before) 
> on Hive and I am a Hive PMC member. As Stamatis noted I have been involved in 
> calcite since 2017. It is great honor to be part of this community. I am very 
> excited to become committer and I look forward to contributing more.
> 
> Regards,
> Vineet Garg
> 
>> On Apr 26, 2020, at 2:26 PM, Jesus Camacho Rodriguez  
>> wrote:
>> 
>> Congrats Vineet, well deserved!
>> 
>> -Jesús
>> 
>> On Sun, Apr 26, 2020 at 3:09 AM Leonard Xu  wrote:
>> 
>>> Congratulations, Vineet!
>>> 
>>> Best,
>>> Leonard Xu
 在 2020年4月26日,18:07,xu  写道:
 
 Congrats, Vineet!
 
 Danny Chan  于2020年4月26日周日 下午4:52写道:
 
> Congrats, Vineet!
> 
> Best,
> Danny Chan
> 在 2020年4月26日 +0800 PM1:55,dev@calcite.apache.org,写道:
>> 
>> Congrats, Vineet!
> 
 
 
 --
 
 Best regards,
 
 Xu
>>> 
>>> 
> 



Re: [Tests Failing] Master Travis test fails continuously for JDK14

2020-04-27 Thread Kevin Risden
We should be able to clear the Travis cache by ourselves.

If you have your github and asf accounts linked:

https://travis-ci.org/github/apache/calcite/caches

That page is linked under the more options in the top right of the Calcite
page. You should be able to clear all or some subset of caches.

Kevin Risden


On Mon, Apr 27, 2020 at 8:53 AM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> > Log an issue there ?
>
> An issue for INFRA project would probably do:
> https://issues.apache.org/jira/projects/INFRA
>
> Vladimir
>


Re: [DISCUSS] Deprecate grouped window functions

2020-04-27 Thread Julian Hyde
Changing my +1 to +0. We have to make reasonable accommodations for our users. 
Glad we had this discussion.

> On Apr 24, 2020, at 11:10 AM, Rui Wang  wrote:
> 
> Hi Timo,
> 
> My intention is to fully drop concepts such as SqlGroupedWindowFunction and
> auxiliary group functions, which include relevant code in parser/syntax,
> operator, planner, etc.
> 
> Since you mentioned the need for more time to migrate. How many Calcite
> releases that you think can probably leave enough buffer time? (Calcite
> schedules 4 releases a year. So say 2 releases will give 6 months)
> 
> 
> -Rui
> 
> On Fri, Apr 24, 2020 at 1:50 AM Timo Walther  wrote:
> 
>> Hi everyone,
>> 
>> so far Apache Flink depends on this feature. We are fine with improving
>> the SQL compliance and eventually dropping GROUP BY TUMBLE/HOP/SESSION
>> in the future. However, we would like to give our users some time to
>> migrate their existing pipelines.
>> 
>> What does dropping mean for Calcite? Will users of Calcite be able to
>> still support this syntax? In particular, are you intending to also drop
>> concepts such as SqlGroupedWindowFunction and auxiliary group functions?
>> Or are you intending to just remove entries from Calcite's default
>> operator table?
>> 
>> Regards,
>> Timo
>> 
>> 
>> On 24.04.20 10:30, Julian Hyde wrote:
>>> +1
>>> 
>>> Let’s remove TUMBLE etc from the GROUP BY clause. Since this is a SQL
>> change, not an API change, I don’t we need to give notice. Let’s just do it.
>>> 
>>> Julian
>>> 
 On Apr 22, 2020, at 4:05 PM, Rui Wang  wrote:
 
 Made a mistake on the example above, and update it as follows:
 
 // Table function windowing syntax.
 SELECT
product_id, count(*), window_start
 FROM TABLE(TUMBLE(order, DESCRIPTOR(rowtime), INTERVAL '1' hour))
 GROUP BY product_id, window_start
 
> On Wed, Apr 22, 2020 at 2:31 PM Rui Wang  wrote:
> 
> Hi community,
> 
> I want to kick off a discussion about deprecating grouped window
>> functions
> (GROUP BY TUMBLE/HOP/SESSION) as the table function windowing support
> becomes a thing [1] (FROM TABLE(TUMBLE/HOP/SESSION)). The current
>> stage of
> table function windowing is TUMBLE support is checked in. HOP and
>> SESSION
> support is likely to be merged in 1.23.0.
> 
> A briefly example of two different windowing syntax:
> 
> // Grouped window functions.
> SELECT
>   product_id, count(*), TUMBLE_START() as window_start
> FROM order
> GROUP BY product_id, TUMBLE(rowtime, INTERVAL '1' hour); // an hour
>> long
> fixed window size.
> 
> // Table function windowing syntax.
> SELECT
>product_id, count(*), window_start
> FROM TABLE(TUMBLE(order, DESCRIPTOR(.rowtime), INTERVAL '1' hour)
> GROUP BY product_id
> 
> I am giving a short, selective comparison as the following:
> 
> The places that table function windowing behaves better
> 1) no GROUPING/GROUP BY enforced. It becomes a problem in streaming
>> JOIN.
> For example, one use case is for each hour, apply a JOIN on two
>> streams. In
> this case, no GROUP BY is needed.
> 2) grouped window functions allow multiple calls in GROUP BY. For
>> example,
> from SQL syntax perspective, GROUP BY TUMBLE(...), HOP(...),
>> SESSION(...)
> is not wrong, but it is an illegal query.
> 3) Calcite includes an Enumerable implementation of table function
> windowing, while grouped window functions do not have that.
> 
> 
> The places that table function windowing behaves worse
> 1) table function windowing adds "window_start", "window_end" into
>> table
> directly, which increases the volume of data (number of rows *
> sizeof(timestamp) * 2).
> 
> 
> I want to focus on discussing two questions in this thread:
> 1) Do people support deprecating grouped window functions?
> 2) By which version people prefer to make grouped window functions
> completely removed?(if 1) is yes).
> 
> 
> 
> [1]: https://jira.apache.org/jira/browse/CALCITE-3271
> 
> 
> -Rui
> 
>> 
>> 
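
Whichever syntax survives, the tumbling-window assignment both forms compute is
plain arithmetic on the row timestamp. A JDK-only illustration (not Calcite's
implementation; the method name is hypothetical):

```java
public class Tumble {
    // Maps a row timestamp (epoch millis) to the start of its fixed-size
    // tumbling window, e.g. INTERVAL '1' HOUR => windowSizeMillis = 3_600_000.
    static long windowStart(long rowTimeMillis, long windowSizeMillis) {
        return rowTimeMillis - Math.floorMod(rowTimeMillis, windowSizeMillis);
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        // Two rows ten minutes apart land in the same hour-long window.
        long t1 = 7_500_000L;            // 02:05 UTC on day 0
        long t2 = t1 + 10 * 60_000L;     // 02:15 UTC
        System.out.println(windowStart(t1, hour)); // both start at 02:00
        System.out.println(windowStart(t2, hour));
    }
}
```

The table-function form simply materializes this value as the extra
"window_start"/"window_end" columns, which is the data-volume cost Rui mentions.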



Re: How to give type in case of type mismatch while doing union

2020-04-27 Thread Julian Hyde
Anjali,

If you’re using RelBuilder to create the union, or creating the union manually, 
it is your responsibility to make sure that the input RelNodes have compatible 
types.

RelDataTypeFactory.leastRestrictive(List) may be useful.

Julian


> On Apr 27, 2020, at 1:32 PM, Rui Wang  wrote:
> 
> Did a quick test by running a SQL query that has a UNION on two different
> types. The validator gave a correct error message (not an NPE) flagging the
> type mismatch.
> 
> Agreed with Jin; could you provide more context/an example of how you reach
> the NPE? (It could be better if you can file a Jira with your context.)
> 
> 
> -Rui
> 
> On Mon, Apr 27, 2020 at 4:35 AM XING JIN  wrote:
> 
>> Hi, Anjali ~
>> Are you doing the UNION via SQL? If so, can you give the SQL content?
>> Are you doing the UNION on RelNodes? If so, you need to do a type CAST.
>> 
>> Jin
>> 
>> Anjali Shrishrimal wrote on Monday, April 27, 2020 at 4:25 PM:
>> 
>>> Hi,
>>> 
>>> While doing union of 2 RelNodes with different types, I am getting NPE.
>> (I
>>> am using calcite 1.21.0)
>>> java.lang.NullPointerException: at index 0
>>>at
>>> 
>> com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:225)
>>>at
>>> 
>> com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:215)
>>>at
>>> 
>> com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:209)
>>>at
>>> com.google.common.collect.ImmutableList.construct(ImmutableList.java:346)
>>>at
>>> com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:258)
>>>at
>>> 
>> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.canonize(RelDataTypeFactoryImpl.java:373)
>>>at
>>> 
>> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.createStructType(RelDataTypeFactoryImpl.java:155)
>>>at
>>> 
>> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.createStructType(RelDataTypeFactoryImpl.java:146)
>>>at
>>> 
>> org.apache.calcite.rel.type.RelDataTypeFactory$Builder.build(RelDataTypeFactory.java:569)
>>>at
>>> 
>> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.leastRestrictiveStructuredType(RelDataTypeFactoryImpl.java:257)
>>>at
>>> 
>> org.apache.calcite.sql.type.SqlTypeFactoryImpl.leastRestrictiveSqlType(SqlTypeFactoryImpl.java:285)
>>>at
>>> 
>> org.apache.calcite.sql.type.SqlTypeFactoryImpl.leastRestrictive(SqlTypeFactoryImpl.java:156)
>>>at
>>> org.apache.calcite.rel.core.SetOp.deriveRowType(SetOp.java:107)
>>> 
>>> If the column types (family types) are different, currently the derived
>>> type is null. Is there any way to control that?
>>> Where can I define the type in case of mismatch ?
>>> 
>>> 
>>> Thank you,
>>> Anjali Shrishrimal
>>> 
>> 



Re: How to give type in case of type mismatch while doing union

2020-04-27 Thread Rui Wang
Did a quick test by running a SQL query that has a UNION on two different
types. The validator gave a correct error message (not an NPE) flagging the
type mismatch.

Agreed with Jin; could you provide more context/an example of how you reach
the NPE? (It would be better if you can file a Jira with your context.)


-Rui
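
The validator behaviour can be reproduced with a self-contained query along
these lines (the VALUES-derived tables and the INTEGER/DATE mismatch are
illustrative):

```sql
-- The two branches have incompatible row types (different type families),
-- so validation rejects the UNION with a type-mismatch error, not an NPE.
SELECT x FROM (VALUES (1), (2)) AS t1 (x)
UNION
SELECT y FROM (VALUES (DATE '2020-04-27')) AS t2 (y);
```

The NPE in the original report can only occur when the RelNodes are
constructed directly, bypassing SQL validation.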

On Mon, Apr 27, 2020 at 4:35 AM XING JIN  wrote:

> Hi, Anjali ~
> Are you doing the UNION via SQL? If so, can you give the SQL content?
> Are you doing the UNION on RelNodes? If so, you need to do a type CAST.
>
> Jin
>
> Anjali Shrishrimal wrote on Monday, April 27, 2020 at 4:25 PM:
>
> > Hi,
> >
> > While doing union of 2 RelNodes with different types, I am getting NPE.
> (I
> > am using calcite 1.21.0)
> > java.lang.NullPointerException: at index 0
> > at
> >
> com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:225)
> > at
> >
> com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:215)
> > at
> >
> com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:209)
> > at
> > com.google.common.collect.ImmutableList.construct(ImmutableList.java:346)
> > at
> > com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:258)
> > at
> >
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.canonize(RelDataTypeFactoryImpl.java:373)
> > at
> >
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.createStructType(RelDataTypeFactoryImpl.java:155)
> > at
> >
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.createStructType(RelDataTypeFactoryImpl.java:146)
> > at
> >
> org.apache.calcite.rel.type.RelDataTypeFactory$Builder.build(RelDataTypeFactory.java:569)
> > at
> >
> org.apache.calcite.rel.type.RelDataTypeFactoryImpl.leastRestrictiveStructuredType(RelDataTypeFactoryImpl.java:257)
> > at
> >
> org.apache.calcite.sql.type.SqlTypeFactoryImpl.leastRestrictiveSqlType(SqlTypeFactoryImpl.java:285)
> > at
> >
> org.apache.calcite.sql.type.SqlTypeFactoryImpl.leastRestrictive(SqlTypeFactoryImpl.java:156)
> > at
> > org.apache.calcite.rel.core.SetOp.deriveRowType(SetOp.java:107)
> >
> > If the column types (family types) are different, currently the derived
> > type is null. Is there any way to control that?
> > Where can I define the type in case of mismatch ?
> >
> >
> > Thank you,
> > Anjali Shrishrimal
> >
>


Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Julian Hyde
This thread has almost gotten too long to respond to. I confess I’ve not read 
much of it. I’m going to reply anyway. Sorry.

I support making Calcite’s optimizer support “Cascades”.

We should keep the existing VolcanoPlanner working during the transition, and 
perhaps longer. (I acknowledge that this will not be easy. But it will not be 
impossible, because we already have 2 planner engines, and going from 2 to 3 is 
much harder than going from 1 to 2.)

I perhaps have a different philosophy than Haisheng + Cascades on logical vs 
physical. I do believe that logical & physical rels and rules are more similar 
than they are different, therefore they should have the same classes, etc. 
Pragmatically, it makes a lot of sense to create rule sets and planner phases 
that deal with just logical rels, or just physical rels. But that’s a best 
practice, not a rule.

This philosophy manifested in a discussion a couple of weeks ago about whether 
RelBuilder should be able to create physical rels. I still believe that it 
should.

Julian


> On Apr 27, 2020, at 11:29 AM, Roman Kondakov  
> wrote:
> 
> Hi all,
> 
> Stamatis, Haisheng thank you very much for your feedback! I really
> appreciate it.
> 
>> If in the new planner we end up copy-pasting code then I guess it will be a 
>> bad idea.
> 
> Yes, there is some code duplication between Volcano planner and
> Cascades planner. I think I'll move it to some common superclass: either
> to existing AbstractRelOptPlanner or a new AbstractCostBasedPlanner.
> 
>> Like that it will be easier for everybody to test and see if the changes 
>> make this better or worse. From a backward compatibility perspective it 
>> seems feasible to keep the new features configurable for a certain amount of 
>> time.
> 
> I was thinking about backward compatibility in this way: what if we can
> switch planner in tests by setting some flag (system property?)
> somewhere and check what happens with new planner if we replace Volcano
> with it. And if ever it turns out that the new planner passes all
> Volcano's tests and at the same time it works more efficiently, we can
> safely replace Volcano planner with Cascades planner.
> 
>> A design doc would definitely help, especially if it has a few end-to-end 
>> (from logical to physical plan) examples showing how the optimizer works
> at each step before/after the changes. This is actually what is usually
> missing in research papers that makes them hard to understand.
>> I am thinking of something similar to the examples that Haisheng sent in the first 
>> email but possibly a bit more detailed.
> 
> I agree, I'll add some examples to the design doc very soon.
> 
>> I looked very briefly in the PR by Roman but I think I didn't see tests 
>> where the final plan contains operators from multiple conventions. Multiple 
>> conventions is among the choices that complicate certain parts of the 
>> existing planner so we should make sure that we take this into account.
> 
> It's on my radar. I'm going to add these tests.
> 
> I would like to ask the community about my next steps for moving the
> Cascades planner from prototype status to becoming a part of the project.
> I see these steps like this:
> 
> 1. Create a jira ticket.
> 2. Update design document with examples.
> 3. Do some research on backward compatibility with the Volcano planner, to
> be able to replace Volcano planner with Cascades planner in tests and to
> understand the current problems with the planner.
> 4. Solve known problems
>  - materialized views
>  - hints
>  - multiple conventions
>  - listener hooks
>  - problems from p.3.
> 5. new PR, review and merge.
> 6. Replace Volcano planner with Cascades after several releases.
> 
> What do you think about this roadmap?
> 
> 
> -- 
> Kind Regards
> Roman Kondakov
> 
> 
> On 27.04.2020 01:55, Stamatis Zampetakis wrote:
>> Hi all,
>> 
>> I am very excited about the ideas discussed so far and especially by the
>> enthusiasm of many people that are ready to help for pulling this out.
>> I wouldn't expect that we could have a prototype so quickly.
>> Thanks a lot everyone!
>> 
>> In the debate between creating new planner or patching the existing one, I
>> don't have a clear preference.
>> I think the answer depends on how many things can we reuse.
>> If in the new planner we end up copy-pasting code then I guess it will be a
>> bad idea.
>> On the other hand, if the new and old planner do not have many things in
>> common then I guess the answer is obvious.
>> 
>> From Haisheng's description, I was thinking that many of the proposed
>> changes could go in the existing planner.
>> Like that it will be easier for everybody to test and see if the changes
>> make this better or worse.
>> From a backward compatibility perspective it seems feasible to keep the new
>> features configurable for a certain amount of time.
>> 
>> From the wish-list, I think we should focus initially on points:
>> 1. Top-down trait request
>> 2. 

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Julian Hyde
Re 1. By all means have multiple instances of a rule (e.g. one instance that 
matches LogicalFilter and another that matches FooFilter) and enable different 
instances during different phases. (We have been slow to create all of these 
variant instances, in part because of the difficulty of changing the 
constructors of many existing rules. My proposed change 
https://issues.apache.org/jira/browse/CALCITE-3923 
 would make it easier to 
create new instances that are more precisely targeted.)

Re 2. Sure, there are bugs.

Re 3. That’s a good idea. The planner could have a "single-convention mode", 
which might be faster, but would throw if it encountered a rel in a different 
convention.

I think we’re in agreement - separating rules is a best practice. But we 
shouldn’t force that best practice on everyone. The multi-convention case is 
still crucial for planning hybrid queries (e.g. joining MySQL to MongoDB).

Julian


> On Apr 27, 2020, at 4:28 PM, Xiening Dai  wrote:
> 
> Hi Julian,
> 
> In my view, separating logical and physical rules has a number of benefits -
> 
> 1. With current design, a rule can match both physical and logical nodes. 
> This behavior could cause duplication of rule firings and explosion of memo 
> and search space. There was a long discussion regarding this (CALCITE-2223). 
> Although the indefinite rule-matching problem is fixed by a separate 
> change, the duplicate rule firing is not resolved. There's a patch trying to 
> address it (https://github.com/apache/calcite/pull/1543 
> ), but it still fell short due 
> to current design limitation. 
> 
> 2. We have a few meta inconsistency issues today which are due to the reality 
> that we don’t clearly define a transformation phase. For example, we don’t have 
> a solution for CALCITE-2166 as long as transformation can still apply to a 
> RelSet after it’s been implemented, which means the group logical properties 
> (such as row count) can still change and invalidate all the previous best 
> cost calculation, so best cost can increase (oops!).
> 
> 3. The other benefit is that the planner can choose to shortcut a number of expensive 
> operations (such as RelSubSet registration, cost calculation and propagation, 
> etc) during transformation phase, if it was clearly defined and enforced by 
> framework.
> 
> 
>> On Apr 27, 2020, at 11:59 AM, Julian Hyde  wrote:
>> 
>> This thread has almost gotten too long to respond to. I confess I’ve not 
>> read much of it. I’m going to reply anyway. Sorry.
>> 
>> I support making Calcite’s optimizer support “Cascades”.
>> 
>> We should keep the existing VolcanoPlanner working during the transition, 
>> and perhaps longer. (I acknowledge that this will not be easy. But it will 
>> not be impossible, because we already have 2 planner engines, and going from 
>> 2 to 3 is much harder than going from 1 to 2.)
>> 
>> I perhaps have a different philosophy than Haisheng + Cascades on logical vs 
>> physical. I do believe that logical & physical rels and rules are more 
>> similar than they are different, therefore they should have the same 
>> classes, etc. Pragmatically, it makes a lot of sense to create rule sets and 
>> planner phases that deal with just logical rels, or just physical rels. But 
>> that’s a best practice, not a rule.
>> 
>> This philosophy manifested in a discussion a couple of weeks ago about 
>> whether RelBuilder should be able to create physical rels. I still believe 
>> that it should.
>> 
>> Julian
>> 
>> 
>>> On Apr 27, 2020, at 11:29 AM, Roman Kondakov  
>>> wrote:
>>> 
>>> Hi all,
>>> 
>>> Stamatis, Haisheng thank you very much for your feedback! I really
>>> appreciate it.
>>> 
 If in the new planner we end up copy-pasting code then I guess it will be 
 a bad idea.
>>> 
>>> Yes, there is some code duplication between Volcano planner and
>>> Cascades planner. I think I'll move it to some common superclass: either
>>> to existing AbstractRelOptPlanner or a new AbstractCostBasedPlanner.
>>> 
 Like that it will be easier for everybody to test and see if the changes 
 make this better or worse. From a backward compatibility perspective it 
 seems feasible to keep the new features configurable for a certain amount 
 of time.
>>> 
>>> I was thinking about backward compatibility in this way: what if we can
>>> switch planner in tests by setting some flag (system property?)
>>> somewhere and check what happens with new planner if we replace Volcano
>>> with it. And if ever it turns out that the new planner passes all
>>> Volcano's tests and at the same time it works more efficiently, we can
>>> safely replace Volcano planner with Cascades planner.
>>> 
 A design doc would definitely help, especially if it has a few end-to-end 
 (from logical to physical plan) examples showing how the optimizer works
>>> at each step before/after the changes. This is actually 

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Xiening Dai
Hi Roman,

First, thank you for sharing your design and prototype code. I took a quick 
look at your design and have some high level feedback -

1. Do we really need RelGroup and RelSubGroup? I believe the memo structure 
would be largely the same even if we move towards a Cascade planner. I think we 
can reuse most of the RelSet/RelSubset today. RelSubset is a tricky one. 
There’s no corresponding concept in the Cascade framework. But we do need a way 
to track the RelNode physical traits anyway. A RelSubset is more like a logical 
grouping of the RelNodes with same trait set, which I think is still valuable.

2. Keeping backward compatibility is a key goal. We have to think through this 
at the very beginning of the design. The community has invested a lot with 
Volcano planner over the years. We cannot ask people to rewrite their rules 
just to make them work with the new planner. We should expect current rules 
would just work, or would work with “minimal” changes.

3. There are some building blocks that are currently missing which will prevent 
us from using the Cascade top down search strategy. For example, the way how 
rule is matched, the lack of transformation phase, the lack of the ability to 
create physical nodes by the framework, etc. If we don’t care about current 
rules, and start from scratch, we probably don’t need to fix these issues. But 
with #2 in mind, we have to work out solutions for these so we can carry over 
existing rules.

I think your execution plan looks good overall. We can iterate on the design 
while working out a document.

For the purpose of transparency, I’ve been working with Haisheng in the last 
few months regarding this topic. I agree with him on most parts of his 
roadmap, which I think is a tangible plan to evolve the current Volcano planner.


> On Apr 27, 2020, at 11:29 AM, Roman Kondakov  
> wrote:
> 
> Hi all,
> 
> Stamatis, Haisheng thank you very much for your feedback! I really
> appreciate it.
> 
>> If in the new planner we end up copy-pasting code then I guess it will be a 
>> bad idea.
> 
> Yes, there is some code duplication between Volcano planner and
> Cascades planner. I think I'll move it to some common superclass: either
> to existing AbstractRelOptPlanner or a new AbstractCostBasedPlanner.
> 
>> Like that it will be easier for everybody to test and see if the changes 
>> make this better or worse. From a backward compatibility perspective it 
>> seems feasible to keep the new features configurable for a certain amount of 
>> time.
> 
> I was thinking about backward compatibility in this way: what if we can
> switch planner in tests by setting some flag (system property?)
> somewhere and check what happens with new planner if we replace Volcano
> with it. And if ever it turns out that the new planner passes all
> Volcano's tests and at the same time it works more efficiently, we can
> safely replace Volcano planner with Cascades planner.
> 
>> A design doc would definitely help, especially if it has a few end-to-end 
>> (from logical to physical plan) examples showing how the optimizer works
> at each step before/after the changes. This is actually what is usually
> missing in research papers that makes them hard to understand.
>> I am thinking of something similar to the examples that Haisheng sent in the first 
>> email but possibly a bit more detailed.
> 
> I agree, I'll add some examples to the design doc very soon.
> 
>> I looked very briefly in the PR by Roman but I think I didn't see tests 
>> where the final plan contains operators from multiple conventions. Multiple 
>> conventions is among the choices that complicate certain parts of the 
>> existing planner so we should make sure that we take this into account.
> 
> It's on my radar. I'm going to add these tests.
> 
> I would like to ask the community about my next steps for moving the
> Cascades planner from prototype status to becoming a part of the project.
> I see these steps like this:
> 
> 1. Create a jira ticket.
> 2. Update design document with examples.
> 3. Do some research on backward compatibility with the Volcano planner, to
> be able to replace Volcano planner with Cascades planner in tests and to
> understand the current problems with the planner.
> 4. Solve known problems
>  - materialized views
>  - hints
>  - multiple conventions
>  - listener hooks
>  - problems from p.3.
> 5. new PR, review and merge.
> 6. Replace Volcano planner with Cascades after several releases.
> 
> What do you think about this roadmap?
> 
> 
> -- 
> Kind Regards
> Roman Kondakov
> 
> 
> On 27.04.2020 01:55, Stamatis Zampetakis wrote:
>> Hi all,
>> 
>> I am very excited about the ideas discussed so far and especially by the
>> enthusiasm of many people that are ready to help for pulling this out.
>> I wouldn't expect that we could have a prototype so quickly.
>> Thanks a lot everyone!
>> 
>> In the debate between creating new planner or patching the existing one, I
>> don't have a 

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Xiening Dai
Hi Julian,

In my view, separating logical and physical rules has a number of benefits -

1. With current design, a rule can match both physical and logical nodes. This 
behavior could cause duplication of rule firings and explosion of memo and 
search space. There was a long discussion regarding this (CALCITE-2223). 
Although the indefinite rule-matching problem is fixed by a separate change, 
the duplicate rule firing is not resolved. There's a patch trying to address it 
(https://github.com/apache/calcite/pull/1543 
), but it still fell short due to 
current design limitation. 

2. We have a few meta inconsistency issues today which are due to the reality 
that we don’t clearly define a transformation phase. For example, we don’t have a 
solution for CALCITE-2166 as long as transformation can still apply to a RelSet 
after it’s been implemented, which means the group logical properties (such as 
row count) can still change and invalidate all the previous best cost 
calculation, so best cost can increase (oops!).

3. The other benefit is that the planner can choose to shortcut a number of expensive 
operations (such as RelSubSet registration, cost calculation and propagation, 
etc) during transformation phase, if it was clearly defined and enforced by 
framework.


> On Apr 27, 2020, at 11:59 AM, Julian Hyde  wrote:
> 
> This thread has almost gotten too long to respond to. I confess I’ve not read 
> much of it. I’m going to reply anyway. Sorry.
> 
> I support making Calcite’s optimizer support “Cascades”.
> 
> We should keep the existing VolcanoPlanner working during the transition, and 
> perhaps longer. (I acknowledge that this will not be easy. But it will not be 
> impossible, because we already have 2 planner engines, and going from 2 to 3 
> is much harder than going from 1 to 2.)
> 
> I perhaps have a different philosophy than Haisheng + Cascades on logical vs 
> physical. I do believe that logical & physical rels and rules are more 
> similar than they are different, therefore they should have the same classes, 
> etc. Pragmatically, it makes a lot of sense to create rule sets and planner 
> phases that deal with just logical rels, or just physical rels. But that’s a 
> best practice, not a rule.
> 
> This philosophy manifested in a discussion a couple of weeks ago about 
> whether RelBuilder should be able to create physical rels. I still believe 
> that it should.
> 
> Julian
> 
> 
>> On Apr 27, 2020, at 11:29 AM, Roman Kondakov  
>> wrote:
>> 
>> Hi all,
>> 
>> Stamatis, Haisheng thank you very much for your feedback! I really
>> appreciate it.
>> 
>>> If in the new planner we end up copy-pasting code then I guess it will be a 
>>> bad idea.
>> 
>> Yes, there is some code duplication between Volcano planner and
>> Cascades planner. I think I'll move it to some common superclass: either
>> to existing AbstractRelOptPlanner or a new AbstractCostBasedPlanner.
>> 
>>> Like that it will be easier for everybody to test and see if the changes 
>>> make this better or worse. From a backward compatibility perspective it 
>>> seems feasible to keep the new features configurable for a certain amount 
>>> time.
>> 
>> I was thinking about backward compatibility in this way: what if we can
>> switch planner in tests by setting some flag (system property?)
>> somewhere and check what happens with new planner if we replace Volcano
>> with it. And if ever it turns out that the new planner passes all
>> Volcano's tests and at the same time it works more efficiently, we can
>> safely replace Volcano planner with Cascades planner.
>> 
>>> A design doc would definitely help, especially if it has a few end-to-end 
>>> (from logical to physical plan) examples showing how the optimizer works
>> at each step before/after the changes. This is actually what is usually
>> missing in research papers that makes them hard to understand.
>>> I am thinking of something similar to the examples that Haisheng sent in the first 
>>> email but possibly a bit more detailed.
>> 
>> I agree, I'll add some examples to the design doc very soon.
>> 
>>> I looked very briefly in the PR by Roman but I think I didn't see tests 
>>> where the final plan contains operators from multiple conventions. Multiple 
>>> conventions is among the choices that complicate certain parts of the 
>>> existing planner so we should make sure that we take this into account.
>> 
>> It's on my radar. I'm going to add these tests.
>> 
>> I would like to ask the community about my next steps for moving the
>> Cascades planner from prototype status to becoming a part of the project.
>> I see these steps like this:
>> 
>> 1. Create a jira ticket.
>> 2. Update design document with examples.
>> 3. Do some research on backward compatibility with the Volcano planner, to
>> be able to replace Volcano planner with Cascades planner in tests and to
>> understand the current problems with the planner.
>> 4. Solve 

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Julian Hyde
PS Someone mentioned that logical properties do not propagate across RelSubsets 
in the same RelSet. That is a bug, and we should fix it.

For example, if subset#1 has determined that MinRowCount=1 then subset#2 in the 
same set should also inherit that MinRowCount. The same goes for other logical 
properties, such as RelMdPredicates. The benefits to sharing properties among 
subsets are very significant.

We might need to formalize what we mean by “logical properties”. It would not 
be valid to propagate RelMdCollation between two subsets that have a different 
RelCollation property.

Julian


> On Apr 27, 2020, at 4:59 PM, Julian Hyde  wrote:
> 
> Re 1. By all means have multiple instances of a rule (e.g. one instance that 
> matches LogicalFilter and another that matches FooFilter) and enable 
> different instances during different phases. (We have been slow to create all 
> of these variant instances, in part because of the difficulty of changing the 
> constructors of many existing rules. My proposed change 
> https://issues.apache.org/jira/browse/CALCITE-3923 
>  would make it easier to 
> create new instances that are more precisely targeted.)
> 
> Re 2. Sure, there are bugs.
> 
> Re 3. That’s a good idea. The planner could have a "single-convention mode", 
> which might be faster, but would throw if it encountered a rel in a different 
> convention.
> 
> I think we’re in agreement - separating rules is a best practice. But we 
> shouldn’t force that best practice on everyone. The multi-convention case is 
> still crucial for planning hybrid queries (e.g. joining MySQL to MongoDB).
> 
> Julian
> 
> 
>> On Apr 27, 2020, at 4:28 PM, Xiening Dai > > wrote:
>> 
>> Hi Julian,
>> 
>> In my view, separating logical and physical rules has a number of benefits -
>> 
>> 1. With current design, a rule can match both physical and logical nodes. 
>> This behavior could cause duplication of rule firings and explosion of memo 
>> and search space. There was a long discussion regarding this (CALCITE-2223). 
>> Although the indefinite rule-matching problem is fixed by a separate 
>> change, the duplicate rule firing is not resolved. There's a patch trying to 
>> address it (https://github.com/apache/calcite/pull/1543 
>>  
>> > >), but it still fell short due 
>> to current design limitation. 
>> 
>> 2. We have a few meta inconsistency issues today which are due to the 
>> reality that we don’t clearly define a transformation phase. For example, we 
>> don’t have a solution for CALCITE-2166 as long as transformation can still 
>> apply to a RelSet after it’s been implemented, which means the group logical 
>> properties (such as row count) can still change and invalidate all the 
>> previous best cost calculation, so best cost can increase (oops!).
>> 
>> 3. The other benefit is that the planner can choose to shortcut a number of 
>> expensive operations (such as RelSubSet registration, cost calculation and 
>> propagation, etc) during transformation phase, if it was clearly defined and 
>> enforced by framework.
>> 
>> 
>>> On Apr 27, 2020, at 11:59 AM, Julian Hyde >> > wrote:
>>> 
>>> This thread has almost gotten too long to respond to. I confess I’ve not 
>>> read much of it. I’m going to reply anyway. Sorry.
>>> 
>>> I support making Calcite’s optimizer support “Cascades”.
>>> 
>>> We should keep the existing VolcanoPlanner working during the transition, 
>>> and perhaps longer. (I acknowledge that this will not be easy. But it will 
>>> not be impossible, because we already have 2 planner engines, and going 
>>> from 2 to 3 is much harder than going from 1 to 2.)
>>> 
>>> I perhaps have a different philosophy than Haisheng + Cascades on logical 
>>> vs physical. I do believe that logical & physical rels and rules are more 
>>> similar than they are different, therefore they should have the same 
>>> classes, etc. Pragmatically, it makes a lot of sense to create rule sets 
>>> and planner phases that deal with just logical rels, or just physical rels. 
>>> But that’s a best practice, not a rule.
>>> 
>>> This philosophy manifested in a discussion a couple of weeks ago about 
>>> whether RelBuilder should be able to create physical rels. I still believe 
>>> that it should.
>>> 
>>> Julian
>>> 
>>> 
 On Apr 27, 2020, at 11:29 AM, Roman Kondakov >>> > wrote:
 
 Hi all,
 
 Stamatis, Haisheng thank you very much for your feedback! I really
 appreciate it.
 
> If in the new planner we end up copy-pasting code then I guess it will be 
> a bad idea.
 
 Yes, there are some code duplication between Volcano planner and
 Cascades planner. I think I'll move it to some common superclass: either
 to existing 

Re: [Tests Failing] Master Travis test fails continuously for JDK14

2020-04-27 Thread Michael Mior
I have my accounts linked, but I see " Your permissions are
insufficient to access this content."
--
Michael Mior
mm...@apache.org

On Mon, Apr 27, 2020 at 1:10 PM, Kevin Risden  wrote:
>
> We should be able to clear the Travis cache by ourselves.
>
> If you have your github and asf accounts linked:
>
> https://travis-ci.org/github/apache/calcite/caches
>
> That page is linked under the more options in the top right of the Calcite
> page. You should be able to clear all or some subset of caches.
>
> Kevin Risden
>
>
> On Mon, Apr 27, 2020 at 8:53 AM Vladimir Sitnikov <
> sitnikov.vladi...@gmail.com> wrote:
>
> > > Log an issue there ?
> >
> > An issue for INFRA project would probably do:
> > https://issues.apache.org/jira/projects/INFRA
> >
> > Vladimir
> >


Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Xiening Dai
For #1, aside from that we need to be able to build physical nodes based on a 
convention. For example, if we merge two EnumerableProject, we would want to 
create an EnumerableProject as a result, instead of LogicalProject. The 
RelBuilder change I work on would help this case.

For #2, I don’t think it’s just a bug. If the physical cost cannot be relied on 
before transformation is finished, we should probably delay the physical cost 
calculation, or we risk doing it over again. The other way is to complete 
RelSet transformation before implementing it - which is a common practice in 
industry, including Orca.

The multi-convention is a key scenario, and I agree we should support. My 
thinking is more about separating the logical one (Convention.NONE) from the others.


> On Apr 27, 2020, at 4:59 PM, Julian Hyde  wrote:
> 
> Re 1. By all means have multiple instances of a rule (e.g. one instance that 
> matches LogicalFilter and another that matches FooFilter) and enable 
> different instances during different phases. (We have been slow to create all 
> of these variant instances, in part because of the difficulty of changing the 
> constructors of many existing rules. My proposed change 
> https://issues.apache.org/jira/browse/CALCITE-3923 
>  would make it easier to 
> create new instances that are more precisely targeted.)
> 
> Re 2. Sure, there are bugs.
> 
> Re 3. That’s a good idea. The planner could have a "single-convention mode", 
> which might be faster, but would throw if it encountered a rel in a different 
> convention.
> 
> I think we’re in agreement - separating rules is a best practice. But we 
> shouldn’t force that best practice on everyone. The multi-convention case is 
> still crucial for planning hybrid queries (e.g. joining MySQL to MongoDB).
> 
> Julian
> 
> 
>> On Apr 27, 2020, at 4:28 PM, Xiening Dai  wrote:
>> 
>> Hi Julian,
>> 
>> In my view, separating logic and physical rules have a number of benefits -
>> 
>> 1. With the current design, a rule can match both physical and logical nodes. 
>> This behavior could cause duplication of rule firings and explosion of the memo 
>> and search space. There was a long discussion regarding this (CALCITE-2223). 
>> Although the indefinite rule matching problem is fixed by a separate 
>> change, the duplicate rule firing is not resolved. There's a patch trying to 
>> address it (https://github.com/apache/calcite/pull/1543), but it still fell 
>> short due to current design limitations. 
>> 
>> 2. We have a few metadata inconsistency issues today which are due to the 
>> fact that we don’t clearly define a transformation phase. For example, we 
>> don’t have a solution for CALCITE-2166 as long as transformations can still 
>> apply to a RelSet after it’s been implemented, which means the group's logical 
>> properties (such as row count) can still change and invalidate all the 
>> previous best cost calculations, so the best cost can increase (oops!).
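The failure mode behind CALCITE-2166 - a group's "best" cost becoming stale - can be shown with a toy numeric model (invented class, cost function, and numbers; not Calcite code): the best cost is derived from a logical property, so if a late transformation changes that property, the previously computed best cost no longer matches.

```java
/** Toy illustration of best-cost invalidation when a logical property changes. */
public class CostInvalidation {
    double rowCount;        // logical property of the group
    Double bestCost = null; // physical cost derived from it

    CostInvalidation(double rowCount) {
        this.rowCount = rowCount;
    }

    double computeBestCost() {
        bestCost = rowCount * 2.0; // stand-in cost model: cost scales with rows
        return bestCost;
    }

    /** A transformation applied after implementation changes the derived row count. */
    void lateTransformation(double newRowCount) {
        rowCount = newRowCount;
    }

    boolean bestCostStillValid() {
        return bestCost != null && bestCost == rowCount * 2.0;
    }

    public static void main(String[] args) {
        CostInvalidation group = new CostInvalidation(100);
        group.computeBestCost();        // bestCost = 200
        group.lateTransformation(500);  // row count grows after "implementation"
        System.out.println(group.bestCostStillValid()); // false: the best cost is stale
    }
}
```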
>> 
>> 3. The other benefit is that the planner can choose to shortcut a number of 
>> expensive operations (such as RelSubset registration, cost calculation and 
>> propagation, etc.) during the transformation phase, if that phase were clearly 
>> defined and enforced by the framework.
>> 
>> 
>>> On Apr 27, 2020, at 11:59 AM, Julian Hyde  wrote:
>>> 
>>> This thread has almost gotten too long to respond to. I confess I’ve not 
>>> read much of it. I’m going to reply anyway. Sorry.
>>> 
>>> I support making Calcite’s optimizer support “Cascades”.
>>> 
>>> We should keep the existing VolcanoPlanner working during the transition, 
>>> and perhaps longer. (I acknowledge that this will not be easy. But it will 
>>> not be impossible, because we already have 2 planner engines, and going 
>>> from 2 to 3 is much harder than going from 1 to 2.)
>>> 
>>> I perhaps have a different philosophy than Haisheng + Cascades on logical 
>>> vs physical. I do believe that logical & physical rels and rules are more 
>>> similar than they are different, therefore they should have the same 
>>> classes, etc. Pragmatically, it makes a lot of sense to create rule sets 
>>> and planner phases that deal with just logical rels, or just physical rels. 
>>> But that’s a best practice, not a rule.
>>> 
>>> This philosophy manifested in a discussion a couple of weeks ago about 
>>> whether RelBuilder should be able to create physical rels. I still believe 
>>> that it should.
>>> 
>>> Julian
>>> 
>>> 
 On Apr 27, 2020, at 11:29 AM, Roman Kondakov  
 wrote:
 
 Hi all,
 
 Stamatis, Haisheng thank you very much for your feedback! I really
 appreciate it.
 
> If in the new planner we end up copy-pasting code then I guess it will be 
> a bad idea.
 
 Yes, there is some code duplication between the Volcano planner and the
 Cascades planner. I think I'll move it to some common superclass: either
 to the existing AbstractRelOptPlanner 

Re: [ANNOUNCE] New committer: Vineet Garg

2020-04-27 Thread Zoltán Haindrich
Congratulations Vineet!

On April 27, 2020 8:17:41 PM GMT+02:00, Rui Wang  wrote:
>Congrats Vineet!
>
>
>
>-Rui
>
>On Mon, Apr 27, 2020 at 11:16 AM Julian Hyde  wrote:
>
>> Welcome Vineet! Thanks for your contributions so far.
>>
>> > On Apr 26, 2020, at 2:38 PM, Vineet G 
>wrote:
>> >
>> > Thanks a lot guys!
>> >
>> > Just to briefly introduce myself - I work with Cloudera
>(Hortonworks
>> before) on Hive and I am a Hive PMC member. As Stamatis noted I have
>been
>> involved in Calcite since 2017. It is a great honor to be part of this
>> community. I am very excited to become committer and I look forward
>to
>> contributing more.
>> >
>> > Regards,
>> > Vineet Garg
>> >
>> >> On Apr 26, 2020, at 2:26 PM, Jesus Camacho Rodriguez <
>> jcama...@apache.org> wrote:
>> >>
>> >> Congrats Vineet, well deserved!
>> >>
>> >> -Jesús
>> >>
>> >> On Sun, Apr 26, 2020 at 3:09 AM Leonard Xu 
>wrote:
>> >>
>> >>> Congratulations, Vineet!
>> >>>
>> >>> Best,
>> >>> Leonard Xu
>  On Apr 26, 2020, at 18:07, xu  wrote:
>> 
>>  Congrats, Vineet!
>> 
>  Danny Chan  wrote on Sun, Apr 26, 2020 at 4:52 PM:
>> 
>> > Congrats, Vineet!
>> >
>> > Best,
>> > Danny Chan
>> > On Apr 26, 2020 at 1:55 PM +0800, dev@calcite.apache.org wrote:
>> >>
>> >> Congrats, Vineet!
>> >
>> 
>> 
>>  --
>> 
>>  Best regards,
>> 
>>  Xu
>> >>>
>> >>>
>> >
>>
>>

-- 
Zoltán Haindrich