Re: Remove a specific type coersion rule

2024-02-22 Thread Gonzalo Ortiz Jaureguizar
> When I read CALCITE-6210 it wasn’t clear that you wanted Calcite to give
a validation error when someone tries to cast a VARCHAR to a VARBINARY. I
agree that that is the desirable behavior. (And I believe it is consistent
with the SQL standard.

In Apache Pinot we are still discussing the correct behavior here. We are
migrating from a mostly in-house and limited (no join, no window function,
etc) query engine to one that is in large part based on Apache Calcite and
our plan is to move more and more logic to the Calcite model instead of
ours. We are not very original and they are called V1 and V2 respectively.

In V1 we had our own logic to cast from String to Binary and in fact this
logic is not super clear and dependent on the content. In V2 we are trying
to be saner and we always try to be closer to other databases (usually
Postgres). So one of the ideas is to just embrace CALCITE-6210, but that
would mean that users that migrate from V1 to V2 would have to change their
queries. Otherwise they will receive incorrect values (because the
semantics will change). Therefore there is a safer second option which is
to forbid that cast and force users to either use binary literals
(`x'abcd') or our own function (`hexToBytes('abcd')`). The third option
would be to be able to modify calcite behavior to implement our own casting
logic, but I didn't have the feeling that CALCITE-6210 was going to open
that possibility.

> Can you amend the Jira case so that it clearly the goal?

As said, our goal is not clear yet. IMHO CALCITE-6210 is fine as it is
right now. At the end of the day right now there is a bug during
simplification. Whether that simplification semantics can be plugable or
not is not part of the issue, right?

> Reading the code, I see that
https://issues.apache.org/jira/browse/CALCITE-3550 was trying to provide a
way for people to change the coercion rules. Does that solution meet your
needs?

That is what I've tried, right? What I'm trying to report here is that some
parts of the code are just ignoring SqlTypeAssignmentRule. Specifically
AbstractTypeCoercion.commonTypeForBinaryComparison

 and AbstractTypeCoersion.implicitCast

assume
some castings are always going to be allowed and in fact completely
ignore assignment rules. AbstractTypeCoercion.needToCast

seems
to be correct, but that is not the code that is being called. So removed
type assignment rules may or may not be honored depending on the code path.

> I will amend the Jira case when we decide what the implementation should
do. I can change the corresponding PR to give an error instead, but that is
not a solution for the Pinot implementation either

As said above, I understand CALCITE-6210 as "there is a bug in this
casting, lets fix them". It would be great if we could make casting
semantics plugable, but that is not the reason why CALCITE-6210 was
open, right?

El vie, 23 feb 2024 a las 1:34, Mihai Budiu () escribió:

> When I filed the Jira issue I was planning to give an error, but I
> discovered that several dialects implement this cast, so my PR actually
> implements the behavior from Postgres.
>
> I will amend the Jira case when we decide what the implementation should
> do. I can change the corresponding PR to give an error instead, but that is
> not a solution for the Pinot implementation either. It would be great if
> [CALCITE-3550] provides a solution for them.
>
> Mihai
>
> 
> From: Julian Hyde 
> Sent: Thursday, February 22, 2024 4:16 PM
> To: dev@calcite.apache.org 
> Subject: Re: Remove a specific type coersion rule
>
> When I read CALCITE-6210 it wasn’t clear that you wanted Calcite to give a
> validation error when someone tries to cast a VARCHAR to a VARBINARY. I
> agree that that is the desirable behavior. (And I believe it is consistent
> with the SQL standard.) Can you amend the Jira case so that it clearly the
> goal?
>
> Reading the code, I see that
> https://issues.apache.org/jira/browse/CALCITE-3550 was trying to provide
> a way for people to change the coercion rules. Does that solution meet your
> needs?
>
> Julian
>
>
>
> > On Feb 22, 2024, at 6:59 AM, Gonzalo Ortiz Jaureguizar <
> golthir...@gmail.com> wrote:
> >
> > Hello there,
> >
> > In the context of https://issues.apache.org/jira/browse/CALCITE-6210,
> the
> > Apache Pinot team is thinking about forbidding casting from VARCHAR to
> > VARBINARY.
> >
> > I've been trying to implement that, but I'm not sure if it is possible or
> > not. Following the Javadoc of SqlTypeCoercionRule (which, btw

Re: about optimization rules for over(partition by id, age order by id, age)

2024-02-22 Thread Benchao Li
FWIW, FLINK-12575 is a mechanism that removes redundant shuffle/sort
in physical stage, and it's not a a generic rule, but it's in
AbstractConverter while we try to satisfy Distribution and Collation
trait.

I'm not aware of any existing rules/RelBuilder in Calcite that has
this optimization, it sounds like a good improvement.

key lou  于2024年2月23日周五 11:09写道:
>
> thanks  I  also found a rule in fink.
> https://issues.apache.org/jira/browse/FLINK-12575
>
> Julian Hyde  于2024年2月23日周五 09:04写道:
>
> > I noticed that DuckDB implemented such an optimization recently.
> >
> > Can you please log a jira case to remove redundant ORDER BY keys from
> > OVER? Yes, it would be either a planner rule or a rewrite by RelBuilder.
> >
> > Julian
> >
> >
> > > On Feb 21, 2024, at 11:43 PM, key lou  wrote:
> > >
> > > HI ALL :
> > >   I have the following sql:
> > > select name, row_number() over(partition by id, age order by id, age) rn
> > > from t; In fact, it can be optimized into
> > > select name ,row_number() over(partition by id,age) rn from t;
> > > Remove order by because order by and partition by are repeated. Since
> > they
> > > are grouped according to the partition by field, the sorting is
> > > meaningless. Is there such a solution in calcite? Optimize rules?
> > > thanks.
> >
> >



-- 

Best,
Benchao Li


Re: about optimization rules for over(partition by id, age order by id, age)

2024-02-22 Thread key lou
thanks  I  also found a rule in fink.
https://issues.apache.org/jira/browse/FLINK-12575

Julian Hyde  于2024年2月23日周五 09:04写道:

> I noticed that DuckDB implemented such an optimization recently.
>
> Can you please log a jira case to remove redundant ORDER BY keys from
> OVER? Yes, it would be either a planner rule or a rewrite by RelBuilder.
>
> Julian
>
>
> > On Feb 21, 2024, at 11:43 PM, key lou  wrote:
> >
> > HI ALL :
> >   I have the following sql:
> > select name, row_number() over(partition by id, age order by id, age) rn
> > from t; In fact, it can be optimized into
> > select name ,row_number() over(partition by id,age) rn from t;
> > Remove order by because order by and partition by are repeated. Since
> they
> > are grouped according to the partition by field, the sorting is
> > meaningless. Is there such a solution in calcite? Optimize rules?
> > thanks.
>
>


Re: Some questions about calcite

2024-02-22 Thread Ran Tao
Thanks cancai for opening this discussion.

One additional point, regarding your so-called multi-version implementation
of a certain library, such as Spark.
Personally, I think we can implement it according to the latest release
version.
There will be updates in the future, which can be aligned in calcite.
For example, ArrayInsert had a bug in early versions of Spark[1], and
calcite's implementation based on it was already wrong.

In fact, there are many Library functions in calcite, including
mysql/pg/spark, etc., and their implementations are outdated.
It is true that we have not discovered or planned to fix them.

I bring this up because one of these open source projects is always moving
forward, calcite also needs to synchronize their modifications.

[1] https://issues.apache.org/jira/browse/SPARK-45078

Best Regards,
Ran Tao


Cancai Cai  于2024年2月23日周五 10:10写道:

> Maybe I also overlooked some issues in the whole jira case
>
> Cancai Cai
>
> On 2024/02/23 02:01:53 Cancai Cai wrote:
> > I am still happy to implement them. As far as adapting the log2 function
> is
> > concerned, at present I may just not be able to guarantee that log2(0)
> > returns null, and I have encountered some stuck points here. But you can
> > see that I've raised two other jira issues [CALCITE-6259]
> > , [CALCITE-6277]
> >  about this, and I'm
> > going to want to fix the jira issue that I raised, because that's what I
> > found, and I should do it. This is why I keep saying that I can merge
> this
> > jira first. Regarding extreme scenarios, I will continue to adapt the
> type
> > of discourse in the next jira, because I am not sure whether my current
> > writing method is appropriate in the next jira.
> >
> > Cancai Cai  于2024年2月23日周五 09:50写道:
> >
> > > Hi, everyone
> > > Thank you very much for your answers. In fact, my doubts may not only
> be
> > > about adapting log2 (function), as I have discovered a lot when
> adapting
> > > log2 functions, such as log10(0) about the difference in return values
> ​​of
> > > mysql and postgres, about calcite Regarding the handling of
> infinities, I
> > > don't know why I haven't discovered them before. As clay said, calcite
> may
> > > have had its own set of standards before, but I may not be clear about
> > > this. I hope that calcite’s return value of the math function in
> certain
> > > calculation situations will meet the requirements of mathematics or
> SQL.
> > > rule. If mysql is lazy from the beginning, should we still follow it
> > > instead of just adapting to an extreme situation?
> > >
> > > Best wishes.
> > > Cancai Cai
> > >
> > > On 2024/02/23 00:45:13 Tanner Clary wrote:
> > > > I feel like I'm missing something about this whole issue. We have
> > > > implemented so many functions that there's probably an existing
> pattern
> > > for
> > > > just about any issue with dialect parity we encounter. What's the
> core
> > > > problem? What behavior is so difficult to emulate and why? Caican
> let me
> > > > know if you want to pair I'm happy to manage the edge cases if you
> want.
> > > >
> > > > Tanner
> > > >
> > > > On Thu, Feb 22, 2024 at 4:34 PM Julian Hyde 
> > > wrote:
> > > >
> > > > > I agree. https://issues.apache.org/jira/browse/CALCITE-6224 and
> its
> > > > > accompanying PR muddies the waters because it also mentions Spark,
> > > Postgres
> > > > > and “many databases”. The case should state that the function is
> > > consistent
> > > > > with MySQL and returns NULL if the argument is non-positive.
> > > > >
> > > > > > On Feb 22, 2024, at 4:24 PM, Mihai Budiu 
> wrote:
> > > > > >
> > > > > > In the case of log2 it's simple, because the documentation says
> that
> > > it
> > > > > comes from the MySQL dialect. So there is a spec and a golden
> > > > > implementation to compare against.
> > > > > >
> > > > > > I certainly won't object to implementing a separate log2 function
> > > that
> > > > > is undefined for 0 and negative values (i.e., can return any value
> for
> > > such
> > > > > arguments), let's just not pretend it's the MySQL function.
> > > > > >
> > > > > > Mihai
> > > > > > 
> > > > > > From: Julian Hyde 
> > > > > > Sent: Thursday, February 22, 2024 4:05 PM
> > > > > > To: dev@calcite.apache.org 
> > > > > > Subject: Re: Some questions about calcite
> > > > > >
> > > > > > But what is the spec of the LOG2 function? It’s not in the SQL
> > > standard.
> > > > > So, we need to write our own spec. We can say that LOG2(0) returns
> 42,
> > > if
> > > > > we wish, and go implement our own spec.
> > > > > >
> > > > > > Yes, Calcite is a compiler, but it is also a standard library,
> and
> > > it is
> > > > > also an extended library. LOG2 is in the latter category. If you,
> as a
> > > > > vendor, don’t trust the implementation of LOG2 then you can
> exclude it
> > > from
> > > > > your distribution.
> > > > > >
> > > >

Re: Some questions about calcite

2024-02-22 Thread Cancai Cai
Maybe I also overlooked some issues in the whole jira case

Cancai Cai

On 2024/02/23 02:01:53 Cancai Cai wrote:
> I am still happy to implement them. As far as adapting the log2 function is
> concerned, at present I may just not be able to guarantee that log2(0)
> returns null, and I have encountered some stuck points here. But you can
> see that I've raised two other jira issues [CALCITE-6259]
> , [CALCITE-6277]
>  about this, and I'm
> going to want to fix the jira issue that I raised, because that's what I
> found, and I should do it. This is why I keep saying that I can merge this
> jira first. Regarding extreme scenarios, I will continue to adapt the type
> of discourse in the next jira, because I am not sure whether my current
> writing method is appropriate in the next jira.
> 
> Cancai Cai  于2024年2月23日周五 09:50写道:
> 
> > Hi, everyone
> > Thank you very much for your answers. In fact, my doubts may not only be
> > about adapting log2 (function), as I have discovered a lot when adapting
> > log2 functions, such as log10(0) about the difference in return values ​​of
> > mysql and postgres, about calcite Regarding the handling of infinities, I
> > don't know why I haven't discovered them before. As clay said, calcite may
> > have had its own set of standards before, but I may not be clear about
> > this. I hope that calcite’s return value of the math function in certain
> > calculation situations will meet the requirements of mathematics or SQL.
> > rule. If mysql is lazy from the beginning, should we still follow it
> > instead of just adapting to an extreme situation?
> >
> > Best wishes.
> > Cancai Cai
> >
> > On 2024/02/23 00:45:13 Tanner Clary wrote:
> > > I feel like I'm missing something about this whole issue. We have
> > > implemented so many functions that there's probably an existing pattern
> > for
> > > just about any issue with dialect parity we encounter. What's the core
> > > problem? What behavior is so difficult to emulate and why? Caican let me
> > > know if you want to pair I'm happy to manage the edge cases if you want.
> > >
> > > Tanner
> > >
> > > On Thu, Feb 22, 2024 at 4:34 PM Julian Hyde 
> > wrote:
> > >
> > > > I agree. https://issues.apache.org/jira/browse/CALCITE-6224 and its
> > > > accompanying PR muddies the waters because it also mentions Spark,
> > Postgres
> > > > and “many databases”. The case should state that the function is
> > consistent
> > > > with MySQL and returns NULL if the argument is non-positive.
> > > >
> > > > > On Feb 22, 2024, at 4:24 PM, Mihai Budiu  wrote:
> > > > >
> > > > > In the case of log2 it's simple, because the documentation says that
> > it
> > > > comes from the MySQL dialect. So there is a spec and a golden
> > > > implementation to compare against.
> > > > >
> > > > > I certainly won't object to implementing a separate log2 function
> > that
> > > > is undefined for 0 and negative values (i.e., can return any value for
> > such
> > > > arguments), let's just not pretend it's the MySQL function.
> > > > >
> > > > > Mihai
> > > > > 
> > > > > From: Julian Hyde 
> > > > > Sent: Thursday, February 22, 2024 4:05 PM
> > > > > To: dev@calcite.apache.org 
> > > > > Subject: Re: Some questions about calcite
> > > > >
> > > > > But what is the spec of the LOG2 function? It’s not in the SQL
> > standard.
> > > > So, we need to write our own spec. We can say that LOG2(0) returns 42,
> > if
> > > > we wish, and go implement our own spec.
> > > > >
> > > > > Yes, Calcite is a compiler, but it is also a standard library, and
> > it is
> > > > also an extended library. LOG2 is in the latter category. If you, as a
> > > > vendor, don’t trust the implementation of LOG2 then you can exclude it
> > from
> > > > your distribution.
> > > > >
> > > > > As an open source project we have to BOTH improve the quality of our
> > > > core and lower the barrier to contributions to the non-core code. We
> > have
> > > > to recognize that not everything is the same standard. And I think
> > vendors,
> > > > like your company, who want to deliver a high-quality experience
> > should put
> > > > barriers around what features are trusted.
> > > > >
> > > > > Julian
> > > > >
> > > > >> On Feb 22, 2024, at 3:42 PM, Mihai Budiu  wrote:
> > > > >>
> > > > >> If we can't even implement correctly the log2 function according to
> > its
> > > > spec, there is no hope that we will implement anything correctly.
> > > > >>
> > > > >> I am not a QA person, but I am spending more than 50% of my time
> > > > diagnosing and fixing bugs in Calcite. It's not fun. I would rather
> > > > implement interesting new functionality. But I cannot tell a user of
> > our
> > > > tools "I have no idea whether the results you get using this tool will
> > be
> > > > correct. If you are lucky, they will be, don't worry about corner
> > cases."
> > > > Our goal is 

Re: Some questions about calcite

2024-02-22 Thread Cancai Cai
I am still happy to implement them. As far as adapting the log2 function is
concerned, at present I may just not be able to guarantee that log2(0)
returns null, and I have encountered some stuck points here. But you can
see that I've raised two other jira issues [CALCITE-6259]
, [CALCITE-6277]
 about this, and I'm
going to want to fix the jira issue that I raised, because that's what I
found, and I should do it. This is why I keep saying that I can merge this
jira first. Regarding extreme scenarios, I will continue to adapt the type
of discourse in the next jira, because I am not sure whether my current
writing method is appropriate in the next jira.

Cancai Cai  于2024年2月23日周五 09:50写道:

> Hi, everyone
> Thank you very much for your answers. In fact, my doubts may not only be
> about adapting log2 (function), as I have discovered a lot when adapting
> log2 functions, such as log10(0) about the difference in return values ​​of
> mysql and postgres, about calcite Regarding the handling of infinities, I
> don't know why I haven't discovered them before. As clay said, calcite may
> have had its own set of standards before, but I may not be clear about
> this. I hope that calcite’s return value of the math function in certain
> calculation situations will meet the requirements of mathematics or SQL.
> rule. If mysql is lazy from the beginning, should we still follow it
> instead of just adapting to an extreme situation?
>
> Best wishes.
> Cancai Cai
>
> On 2024/02/23 00:45:13 Tanner Clary wrote:
> > I feel like I'm missing something about this whole issue. We have
> > implemented so many functions that there's probably an existing pattern
> for
> > just about any issue with dialect parity we encounter. What's the core
> > problem? What behavior is so difficult to emulate and why? Caican let me
> > know if you want to pair I'm happy to manage the edge cases if you want.
> >
> > Tanner
> >
> > On Thu, Feb 22, 2024 at 4:34 PM Julian Hyde 
> wrote:
> >
> > > I agree. https://issues.apache.org/jira/browse/CALCITE-6224 and its
> > > accompanying PR muddies the waters because it also mentions Spark,
> Postgres
> > > and “many databases”. The case should state that the function is
> consistent
> > > with MySQL and returns NULL if the argument is non-positive.
> > >
> > > > On Feb 22, 2024, at 4:24 PM, Mihai Budiu  wrote:
> > > >
> > > > In the case of log2 it's simple, because the documentation says that
> it
> > > comes from the MySQL dialect. So there is a spec and a golden
> > > implementation to compare against.
> > > >
> > > > I certainly won't object to implementing a separate log2 function
> that
> > > is undefined for 0 and negative values (i.e., can return any value for
> such
> > > arguments), let's just not pretend it's the MySQL function.
> > > >
> > > > Mihai
> > > > 
> > > > From: Julian Hyde 
> > > > Sent: Thursday, February 22, 2024 4:05 PM
> > > > To: dev@calcite.apache.org 
> > > > Subject: Re: Some questions about calcite
> > > >
> > > > But what is the spec of the LOG2 function? It’s not in the SQL
> standard.
> > > So, we need to write our own spec. We can say that LOG2(0) returns 42,
> if
> > > we wish, and go implement our own spec.
> > > >
> > > > Yes, Calcite is a compiler, but it is also a standard library, and
> it is
> > > also an extended library. LOG2 is in the latter category. If you, as a
> > > vendor, don’t trust the implementation of LOG2 then you can exclude it
> from
> > > your distribution.
> > > >
> > > > As an open source project we have to BOTH improve the quality of our
> > > core and lower the barrier to contributions to the non-core code. We
> have
> > > to recognize that not everything is the same standard. And I think
> vendors,
> > > like your company, who want to deliver a high-quality experience
> should put
> > > barriers around what features are trusted.
> > > >
> > > > Julian
> > > >
> > > >> On Feb 22, 2024, at 3:42 PM, Mihai Budiu  wrote:
> > > >>
> > > >> If we can't even implement correctly the log2 function according to
> its
> > > spec, there is no hope that we will implement anything correctly.
> > > >>
> > > >> I am not a QA person, but I am spending more than 50% of my time
> > > diagnosing and fixing bugs in Calcite. It's not fun. I would rather
> > > implement interesting new functionality. But I cannot tell a user of
> our
> > > tools "I have no idea whether the results you get using this tool will
> be
> > > correct. If you are lucky, they will be, don't worry about corner
> cases."
> > > Our goal is to use Calcite in a production environment. If Calcite is
> > > designed to be just a research tool, maybe we should make that clear.
> > > >>
> > > >> There are fundamental bugs in Calcite which have been there for a
> > > decade. Even basic things like arithmetic casts are still incorrect.
> Which
> > > is proof that once a b

Re: Some questions about calcite

2024-02-22 Thread Cancai Cai
Hi, everyone
Thank you very much for your answers. In fact, my doubts may not only be about 
adapting log2 (function), as I have discovered a lot when adapting log2 
functions, such as log10(0) about the difference in return values ​​of mysql 
and postgres, about calcite Regarding the handling of infinities, I don't know 
why I haven't discovered them before. As clay said, calcite may have had its 
own set of standards before, but I may not be clear about this. I hope that 
calcite’s return value of the math function in certain calculation situations 
will meet the requirements of mathematics or SQL. rule. If mysql is lazy from 
the beginning, should we still follow it instead of just adapting to an extreme 
situation?

Best wishes.
Cancai Cai

On 2024/02/23 00:45:13 Tanner Clary wrote:
> I feel like I'm missing something about this whole issue. We have
> implemented so many functions that there's probably an existing pattern for
> just about any issue with dialect parity we encounter. What's the core
> problem? What behavior is so difficult to emulate and why? Caican let me
> know if you want to pair I'm happy to manage the edge cases if you want.
> 
> Tanner
> 
> On Thu, Feb 22, 2024 at 4:34 PM Julian Hyde  wrote:
> 
> > I agree. https://issues.apache.org/jira/browse/CALCITE-6224 and its
> > accompanying PR muddies the waters because it also mentions Spark, Postgres
> > and “many databases”. The case should state that the function is consistent
> > with MySQL and returns NULL if the argument is non-positive.
> >
> > > On Feb 22, 2024, at 4:24 PM, Mihai Budiu  wrote:
> > >
> > > In the case of log2 it's simple, because the documentation says that it
> > comes from the MySQL dialect. So there is a spec and a golden
> > implementation to compare against.
> > >
> > > I certainly won't object to implementing a separate log2 function that
> > is undefined for 0 and negative values (i.e., can return any value for such
> > arguments), let's just not pretend it's the MySQL function.
> > >
> > > Mihai
> > > 
> > > From: Julian Hyde 
> > > Sent: Thursday, February 22, 2024 4:05 PM
> > > To: dev@calcite.apache.org 
> > > Subject: Re: Some questions about calcite
> > >
> > > But what is the spec of the LOG2 function? It’s not in the SQL standard.
> > So, we need to write our own spec. We can say that LOG2(0) returns 42, if
> > we wish, and go implement our own spec.
> > >
> > > Yes, Calcite is a compiler, but it is also a standard library, and it is
> > also an extended library. LOG2 is in the latter category. If you, as a
> > vendor, don’t trust the implementation of LOG2 then you can exclude it from
> > your distribution.
> > >
> > > As an open source project we have to BOTH improve the quality of our
> > core and lower the barrier to contributions to the non-core code. We have
> > to recognize that not everything is the same standard. And I think vendors,
> > like your company, who want to deliver a high-quality experience should put
> > barriers around what features are trusted.
> > >
> > > Julian
> > >
> > >> On Feb 22, 2024, at 3:42 PM, Mihai Budiu  wrote:
> > >>
> > >> If we can't even implement correctly the log2 function according to its
> > spec, there is no hope that we will implement anything correctly.
> > >>
> > >> I am not a QA person, but I am spending more than 50% of my time
> > diagnosing and fixing bugs in Calcite. It's not fun. I would rather
> > implement interesting new functionality. But I cannot tell a user of our
> > tools "I have no idea whether the results you get using this tool will be
> > correct. If you are lucky, they will be, don't worry about corner cases."
> > Our goal is to use Calcite in a production environment. If Calcite is
> > designed to be just a research tool, maybe we should make that clear.
> > >>
> > >> There are fundamental bugs in Calcite which have been there for a
> > decade. Even basic things like arithmetic casts are still incorrect. Which
> > is proof that once a bug is in, people are not incentivized to fix them. We
> > should not let bugs in deliberately. They may essentially never get fixed.
> > >>
> > >> I don't think a compiler can cut any corners. The compiler is the
> > foundation of an entire software ecosystem. If the foundation is broken,
> > everything crumbles.
> > >>
> > >> Mihai
> > >>
> > >> 
> > >> From: Julian Hyde 
> > >> Sent: Thursday, February 22, 2024 3:28 PM
> > >> To: dev@calcite.apache.org 
> > >> Subject: Re: Some questions about calcite
> > >>
> > >> Don’t let the perfect be the enemy of the good. Or as they say in open
> > source, “Release early and often”.
> > >>
> > >> Just about everyone who wants a LOG2 function is intending to apply it
> > to positive numbers. So they won’t notice, or care, that the function
> > doesn’t do exactly what they expected when you apply it to zero. You should
> > release a LOG2 function that does the right thing for the positive numbers,
> 

Re: about optimization rules for over(partition by id, age order by id, age)

2024-02-22 Thread Julian Hyde
I noticed that DuckDB implemented such an optimization recently.

Can you please log a jira case to remove redundant ORDER BY keys from OVER? 
Yes, it would be either a planner rule or a rewrite by RelBuilder.

Julian


> On Feb 21, 2024, at 11:43 PM, key lou  wrote:
> 
> HI ALL :
>   I have the following sql:
> select name, row_number() over(partition by id, age order by id, age) rn
> from t; In fact, it can be optimized into
> select name ,row_number() over(partition by id,age) rn from t;
> Remove order by because order by and partition by are repeated. Since they
> are grouped according to the partition by field, the sorting is
> meaningless. Is there such a solution in calcite? Optimize rules?
> thanks.



Re: Some questions about calcite

2024-02-22 Thread Tanner Clary
The SAFE_* functions do this.

On Thu, Feb 22, 2024 at 4:58 PM Mihai Budiu  wrote:

> The only complication that I can imagine is with the NullPolicy, but I
> haven't tried to implement it myself.
>
> This function may return NULL when neither of the operands is NULL. This
> is unusual for a function.
>
> Mihai
> 
> From: Tanner Clary 
> Sent: Thursday, February 22, 2024 4:45 PM
> To: dev@calcite.apache.org 
> Subject: Re: Some questions about calcite
>
> I feel like I'm missing something about this whole issue. We have
> implemented so many functions that there's probably an existing pattern for
> just about any issue with dialect parity we encounter. What's the core
> problem? What behavior is so difficult to emulate and why? Caican let me
> know if you want to pair I'm happy to manage the edge cases if you want.
>
> Tanner
>
> On Thu, Feb 22, 2024 at 4:34 PM Julian Hyde 
> wrote:
>
> > I agree. https://issues.apache.org/jira/browse/CALCITE-6224 and its
> > accompanying PR muddies the waters because it also mentions Spark,
> Postgres
> > and “many databases”. The case should state that the function is
> consistent
> > with MySQL and returns NULL if the argument is non-positive.
> >
> > > On Feb 22, 2024, at 4:24 PM, Mihai Budiu  wrote:
> > >
> > > In the case of log2 it's simple, because the documentation says that it
> > comes from the MySQL dialect. So there is a spec and a golden
> > implementation to compare against.
> > >
> > > I certainly won't object to implementing a separate log2 function that
> > is undefined for 0 and negative values (i.e., can return any value for
> such
> > arguments), let's just not pretend it's the MySQL function.
> > >
> > > Mihai
> > > 
> > > From: Julian Hyde 
> > > Sent: Thursday, February 22, 2024 4:05 PM
> > > To: dev@calcite.apache.org 
> > > Subject: Re: Some questions about calcite
> > >
> > > But what is the spec of the LOG2 function? It’s not in the SQL
> standard.
> > So, we need to write our own spec. We can say that LOG2(0) returns 42, if
> > we wish, and go implement our own spec.
> > >
> > > Yes, Calcite is a compiler, but it is also a standard library, and it
> is
> > also an extended library. LOG2 is in the latter category. If you, as a
> > vendor, don’t trust the implementation of LOG2 then you can exclude it
> from
> > your distribution.
> > >
> > > As an open source project we have to BOTH improve the quality of our
> > core and lower the barrier to contributions to the non-core code. We have
> > to recognize that not everything is the same standard. And I think
> vendors,
> > like your company, who want to deliver a high-quality experience should
> put
> > barriers around what features are trusted.
> > >
> > > Julian
> > >
> > >> On Feb 22, 2024, at 3:42 PM, Mihai Budiu  wrote:
> > >>
> > >> If we can't even implement correctly the log2 function according to
> its
> > spec, there is no hope that we will implement anything correctly.
> > >>
> > >> I am not a QA person, but I am spending more than 50% of my time
> > diagnosing and fixing bugs in Calcite. It's not fun. I would rather
> > implement interesting new functionality. But I cannot tell a user of our
> > tools "I have no idea whether the results you get using this tool will be
> > correct. If you are lucky, they will be, don't worry about corner cases."
> > Our goal is to use Calcite in a production environment. If Calcite is
> > designed to be just a research tool, maybe we should make that clear.
> > >>
> > >> There are fundamental bugs in Calcite which have been there for a
> > decade. Even basic things like arithmetic casts are still incorrect.
> Which
> > is proof that once a bug is in, people are not incentivized to fix them.
> We
> > should not let bugs in deliberately. They may essentially never get
> fixed.
> > >>
> > >> I don't think a compiler can cut any corners. The compiler is the
> > foundation of an entire software ecosystem. If the foundation is broken,
> > everything crumbles.
> > >>
> > >> Mihai
> > >>
> > >> 
> > >> From: Julian Hyde 
> > >> Sent: Thursday, February 22, 2024 3:28 PM
> > >> To: dev@calcite.apache.org 
> > >> Subject: Re: Some questions about calcite
> > >>
> > >> Don’t let the perfect be the enemy of the good. Or as they say in open
> > source, “Release early and often”.
> > >>
> > >> Just about everyone who wants a LOG2 function is intending to apply it
> > to positive numbers. So they won’t notice, or care, that the function
> > doesn’t do exactly what they expected when you apply it to zero. You
> should
> > release a LOG2 function that does the right thing for the positive
> numbers,
> > if it’s less effort than handling all non-negative numbers.
> > >>
> > >> Don’t listen too much to the QA folks. Their job is to find the corner
> > cases. But they forget that the corner cases are usually not as important
> > as the core cases. So, let the QA folks log bugs (or

Re: Some questions about calcite

2024-02-22 Thread Mihai Budiu
The only complication that I can imagine is with the NullPolicy, but I haven't 
tried to implement it myself.

This function may return NULL when neither of the operands is NULL. This is 
unusual for a function.

Mihai

From: Tanner Clary 
Sent: Thursday, February 22, 2024 4:45 PM
To: dev@calcite.apache.org 
Subject: Re: Some questions about calcite

I feel like I'm missing something about this whole issue. We have
implemented so many functions that there's probably an existing pattern for
just about any issue with dialect parity we encounter. What's the core
problem? What behavior is so difficult to emulate and why? Caican let me
know if you want to pair I'm happy to manage the edge cases if you want.

Tanner

On Thu, Feb 22, 2024 at 4:34 PM Julian Hyde  wrote:

> I agree. https://issues.apache.org/jira/browse/CALCITE-6224 and its
> accompanying PR muddies the waters because it also mentions Spark, Postgres
> and “many databases”. The case should state that the function is consistent
> with MySQL and returns NULL if the argument is non-positive.
>
> > On Feb 22, 2024, at 4:24 PM, Mihai Budiu  wrote:
> >
> > In the case of log2 it's simple, because the documentation says that it
> comes from the MySQL dialect. So there is a spec and a golden
> implementation to compare against.
> >
> > I certainly won't object to implementing a separate log2 function that
> is undefined for 0 and negative values (i.e., can return any value for such
> arguments), let's just not pretend it's the MySQL function.
> >
> > Mihai
> > 
> > From: Julian Hyde 
> > Sent: Thursday, February 22, 2024 4:05 PM
> > To: dev@calcite.apache.org 
> > Subject: Re: Some questions about calcite
> >
> > But what is the spec of the LOG2 function? It’s not in the SQL standard.
> So, we need to write our own spec. We can say that LOG2(0) returns 42, if
> we wish, and go implement our own spec.
> >
> > Yes, Calcite is a compiler, but it is also a standard library, and it is
> also an extended library. LOG2 is in the latter category. If you, as a
> vendor, don’t trust the implementation of LOG2 then you can exclude it from
> your distribution.
> >
> > As an open source project we have to BOTH improve the quality of our
> core and lower the barrier to contributions to the non-core code. We have
> to recognize that not everything is the same standard. And I think vendors,
> like your company, who want to deliver a high-quality experience should put
> barriers around what features are trusted.
> >
> > Julian
> >
> >> On Feb 22, 2024, at 3:42 PM, Mihai Budiu  wrote:
> >>
> >> If we can't even implement correctly the log2 function according to its
> spec, there is no hope that we will implement anything correctly.
> >>
> >> I am not a QA person, but I am spending more than 50% of my time
> diagnosing and fixing bugs in Calcite. It's not fun. I would rather
> implement interesting new functionality. But I cannot tell a user of our
> tools "I have no idea whether the results you get using this tool will be
> correct. If you are lucky, they will be, don't worry about corner cases."
> Our goal is to use Calcite in a production environment. If Calcite is
> designed to be just a research tool, maybe we should make that clear.
> >>
> >> There are fundamental bugs in Calcite which have been there for a
> decade. Even basic things like arithmetic casts are still incorrect. Which
> is proof that once a bug is in, people are not incentivized to fix them. We
> should not let bugs in deliberately. They may essentially never get fixed.
> >>
> >> I don't think a compiler can cut any corners. The compiler is the
> foundation of an entire software ecosystem. If the foundation is broken,
> everything crumbles.
> >>
> >> Mihai
> >>
> >> 
> >> From: Julian Hyde 
> >> Sent: Thursday, February 22, 2024 3:28 PM
> >> To: dev@calcite.apache.org 
> >> Subject: Re: Some questions about calcite
> >>
> >> Don’t let the perfect be the enemy of the good. Or as they say in open
> source, “Release early and often”.
> >>
> >> Just about everyone who wants a LOG2 function is intending to apply it
> to positive numbers. So they won’t notice, or care, that the function
> doesn’t do exactly what they expected when you apply it to zero. You should
> release a LOG2 function that does the right thing for the positive numbers,
> if it’s less effort than handling all non-negative numbers.
> >>
> >> Don’t listen too much to the QA folks. Their job is to find the corner
> cases. But they forget that the corner cases are usually not as important
> as the core cases. So, let the QA folks log bugs (or you can a log
> yourself, when you submit an imperfect implementation). Just release early
> and often.
> >>
> >> Also, note that the implementation of a function in Java, so that it
> can be executed by Calcite, does not have to be the *only* implementation.
> It is often better to have the JDBC adapter push th

Re: Some questions about calcite

2024-02-22 Thread Tanner Clary
I feel like I'm missing something about this whole issue. We have
implemented so many functions that there's probably an existing pattern for
just about any issue with dialect parity we encounter. What's the core
problem? What behavior is so difficult to emulate and why? Caican let me
know if you want to pair I'm happy to manage the edge cases if you want.

Tanner

On Thu, Feb 22, 2024 at 4:34 PM Julian Hyde  wrote:

> I agree. https://issues.apache.org/jira/browse/CALCITE-6224 and its
> accompanying PR muddies the waters because it also mentions Spark, Postgres
> and “many databases”. The case should state that the function is consistent
> with MySQL and returns NULL if the argument is non-positive.
>
> > On Feb 22, 2024, at 4:24 PM, Mihai Budiu  wrote:
> >
> > In the case of log2 it's simple, because the documentation says that it
> comes from the MySQL dialect. So there is a spec and a golden
> implementation to compare against.
> >
> > I certainly won't object to implementing a separate log2 function that
> is undefined for 0 and negative values (i.e., can return any value for such
> arguments), let's just not pretend it's the MySQL function.
> >
> > Mihai
> > 
> > From: Julian Hyde 
> > Sent: Thursday, February 22, 2024 4:05 PM
> > To: dev@calcite.apache.org 
> > Subject: Re: Some questions about calcite
> >
> > But what is the spec of the LOG2 function? It’s not in the SQL standard.
> So, we need to write our own spec. We can say that LOG2(0) returns 42, if
> we wish, and go implement our own spec.
> >
> > Yes, Calcite is a compiler, but it is also a standard library, and it is
> also an extended library. LOG2 is in the latter category. If you, as a
> vendor, don’t trust the implementation of LOG2 then you can exclude it from
> your distribution.
> >
> > As an open source project we have to BOTH improve the quality of our
> core and lower the barrier to contributions to the non-core code. We have
> to recognize that not everything is the same standard. And I think vendors,
> like your company, who want to deliver a high-quality experience should put
> barriers around what features are trusted.
> >
> > Julian
> >
> >> On Feb 22, 2024, at 3:42 PM, Mihai Budiu  wrote:
> >>
> >> If we can't even implement correctly the log2 function according to its
> spec, there is no hope that we will implement anything correctly.
> >>
> >> I am not a QA person, but I am spending more than 50% of my time
> diagnosing and fixing bugs in Calcite. It's not fun. I would rather
> implement interesting new functionality. But I cannot tell a user of our
> tools "I have no idea whether the results you get using this tool will be
> correct. If you are lucky, they will be, don't worry about corner cases."
> Our goal is to use Calcite in a production environment. If Calcite is
> designed to be just a research tool, maybe we should make that clear.
> >>
> >> There are fundamental bugs in Calcite which have been there for a
> decade. Even basic things like arithmetic casts are still incorrect. Which
> is proof that once a bug is in, people are not incentivized to fix them. We
> should not let bugs in deliberately. They may essentially never get fixed.
> >>
> >> I don't think a compiler can cut any corners. The compiler is the
> foundation of an entire software ecosystem. If the foundation is broken,
> everything crumbles.
> >>
> >> Mihai
> >>
> >> 
> >> From: Julian Hyde 
> >> Sent: Thursday, February 22, 2024 3:28 PM
> >> To: dev@calcite.apache.org 
> >> Subject: Re: Some questions about calcite
> >>
> >> Don’t let the perfect be the enemy of the good. Or as they say in open
> source, “Release early and often”.
> >>
> >> Just about everyone who wants a LOG2 function is intending to apply it
> to positive numbers. So they won’t notice, or care, that the function
> doesn’t do exactly what they expected when you apply it to zero. You should
> release a LOG2 function that does the right thing for the positive numbers,
> if it’s less effort than handling all non-negative numbers.
> >>
> >> Don’t listen too much to the QA folks. Their job is to find the corner
> cases. But they forget that the corner cases are usually not as important
> as the core cases. So, let the QA folks log bugs (or you can a log
> yourself, when you submit an imperfect implementation). Just release early
> and often.
> >>
> >> Also, note that the implementation of a function in Java, so that it
> can be executed by Calcite, does not have to be the *only* implementation.
> It is often better to have the JDBC adapter push the function down. That is
> exactly what Bertil is doing for geospatial functions in
> https://issues.apache.org/jira/browse/CALCITE-6239, and Tanner is looking
> at making a map so that we know which SQL dialects can implement which
> functions.
> >>
> >> Julian
> >>
> >>
> >>
> >>> On Feb 22, 2024, at 6:11 AM, Cancai Cai  wrote:
> >>>
> >>> When I was working CALCITE-6224
> >>> 

Re: Some questions about calcite

2024-02-22 Thread Julian Hyde
I agree. https://issues.apache.org/jira/browse/CALCITE-6224 and its 
accompanying PR muddies the waters because it also mentions Spark, Postgres and 
“many databases”. The case should state that the function is consistent with 
MySQL and returns NULL if the argument is non-positive. 

> On Feb 22, 2024, at 4:24 PM, Mihai Budiu  wrote:
> 
> In the case of log2 it's simple, because the documentation says that it comes 
> from the MySQL dialect. So there is a spec and a golden implementation to 
> compare against.
> 
> I certainly won't object to implementing a separate log2 function that is 
> undefined for 0 and negative values (i.e., can return any value for such 
> arguments), let's just not pretend it's the MySQL function.
> 
> Mihai
> 
> From: Julian Hyde 
> Sent: Thursday, February 22, 2024 4:05 PM
> To: dev@calcite.apache.org 
> Subject: Re: Some questions about calcite
> 
> But what is the spec of the LOG2 function? It’s not in the SQL standard. So, 
> we need to write our own spec. We can say that LOG2(0) returns 42, if we 
> wish, and go implement our own spec.
> 
> Yes, Calcite is a compiler, but it is also a standard library, and it is also 
> an extended library. LOG2 is in the latter category. If you, as a vendor, 
> don’t trust the implementation of LOG2 then you can exclude it from your 
> distribution.
> 
> As an open source project we have to BOTH improve the quality of our core and 
> lower the barrier to contributions to the non-core code. We have to recognize 
> that not everything is the same standard. And I think vendors, like your 
> company, who want to deliver a high-quality experience should put barriers 
> around what features are trusted.
> 
> Julian
> 
>> On Feb 22, 2024, at 3:42 PM, Mihai Budiu  wrote:
>> 
>> If we can't even implement correctly the log2 function according to its 
>> spec, there is no hope that we will implement anything correctly.
>> 
>> I am not a QA person, but I am spending more than 50% of my time diagnosing 
>> and fixing bugs in Calcite. It's not fun. I would rather implement 
>> interesting new functionality. But I cannot tell a user of our tools "I have 
>> no idea whether the results you get using this tool will be correct. If you 
>> are lucky, they will be, don't worry about corner cases." Our goal is to use 
>> Calcite in a production environment. If Calcite is designed to be just a 
>> research tool, maybe we should make that clear.
>> 
>> There are fundamental bugs in Calcite which have been there for a decade. 
>> Even basic things like arithmetic casts are still incorrect. Which is proof 
>> that once a bug is in, people are not incentivized to fix them. We should 
>> not let bugs in deliberately. They may essentially never get fixed.
>> 
>> I don't think a compiler can cut any corners. The compiler is the foundation 
>> of an entire software ecosystem. If the foundation is broken, everything 
>> crumbles.
>> 
>> Mihai
>> 
>> 
>> From: Julian Hyde 
>> Sent: Thursday, February 22, 2024 3:28 PM
>> To: dev@calcite.apache.org 
>> Subject: Re: Some questions about calcite
>> 
>> Don’t let the perfect be the enemy of the good. Or as they say in open 
>> source, “Release early and often”.
>> 
>> Just about everyone who wants a LOG2 function is intending to apply it to 
>> positive numbers. So they won’t notice, or care, that the function doesn’t 
>> do exactly what they expected when you apply it to zero. You should release 
>> a LOG2 function that does the right thing for the positive numbers, if it’s 
>> less effort than handling all non-negative numbers.
>> 
>> Don’t listen too much to the QA folks. Their job is to find the corner 
>> cases. But they forget that the corner cases are usually not as important as 
>> the core cases. So, let the QA folks log bugs (or you can a log yourself, 
>> when you submit an imperfect implementation). Just release early and often.
>> 
>> Also, note that the implementation of a function in Java, so that it can be 
>> executed by Calcite, does not have to be the *only* implementation. It is 
>> often better to have the JDBC adapter push the function down. That is 
>> exactly what Bertil is doing for geospatial functions in 
>> https://issues.apache.org/jira/browse/CALCITE-6239, and Tanner is looking at 
>> making a map so that we know which SQL dialects can implement which 
>> functions.
>> 
>> Julian
>> 
>> 
>> 
>>> On Feb 22, 2024, at 6:11 AM, Cancai Cai  wrote:
>>> 
>>> When I was working CALCITE-6224
>>> , I encountered some
>>> problems and I always had some doubts in my heart.
>>> I thought about it for a long time, maybe I think I already understand the
>>> doubts in my heart.
>>> 
>>> As @mihaibudiu said, Java grammar has its own type rules, and SQL has its
>>> own type rules. What calcite currently does is to use Java syntax to adapt
>>> to the SQL rules of each database to complete execution 

Re: Remove a specific type coersion rule

2024-02-22 Thread Mihai Budiu
When I filed the Jira issue I was planning to give an error, but I discovered 
that several dialects implement this cast, so my PR actually implements the 
behavior from Postgres.

I will amend the Jira case when we decide what the implementation should do. I 
can change the corresponding PR to give an error instead, but that is not a 
solution for the Pinot implementation either. It would be great if 
[CALCITE-3550] provides a solution for them.

Mihai


From: Julian Hyde 
Sent: Thursday, February 22, 2024 4:16 PM
To: dev@calcite.apache.org 
Subject: Re: Remove a specific type coersion rule

When I read CALCITE-6210 it wasn’t clear that you wanted Calcite to give a 
validation error when someone tries to cast a VARCHAR to a VARBINARY. I agree 
that that is the desirable behavior. (And I believe it is consistent with the 
SQL standard.) Can you amend the Jira case so that it clearly the goal?

Reading the code, I see that https://issues.apache.org/jira/browse/CALCITE-3550 
was trying to provide a way for people to change the coercion rules. Does that 
solution meet your needs?

Julian



> On Feb 22, 2024, at 6:59 AM, Gonzalo Ortiz Jaureguizar  
> wrote:
>
> Hello there,
>
> In the context of https://issues.apache.org/jira/browse/CALCITE-6210, the
> Apache Pinot team is thinking about forbidding casting from VARCHAR to
> VARBINARY.
>
> I've been trying to implement that, but I'm not sure if it is possible or
> not. Following the Javadoc of SqlTypeCoercionRule (which, btw, seems a bit
> outdated) I've tried to create my own coercion rule as:
>
> ```
>  private static SqlTypeCoercionRule createPinotCoercionRule() {
>// Initialize a Builder instance with the default mappings.
>Map> pinotTypeMapping = new
> HashMap<>(
>SqlTypeCoercionRule.instance().getTypeMapping()
>);
>pinotTypeMapping.put(SqlTypeName.BINARY,
> ImmutableSet.of(SqlTypeName.VARBINARY));
>pinotTypeMapping.put(SqlTypeName.VARBINARY,
> ImmutableSet.of(SqlTypeName.BINARY));
>
>// Initialize a SqlTypeCoercionRules with the new builder mappings.
>return SqlTypeCoercionRule.instance(pinotTypeMapping);
>  }
> ```
>
> Then I've tried to execute a query like: `select 1 from Table where
> varBinaryField = 'some text'` and even when that SqlTypeCoercionRule is
> used, the Validator turns that into `select 1 from Table where
> varBinaryField = cast('some text'` as VARBINARY)`, which should be illegal.
> That expression is then simplified when transformed into a RelRoot and then
> the error described in https://issues.apache.org/jira/browse/CALCITE-6210
> is thrown. Same cast is added with other queries like `select 1 from Table
> where OCTET_LENGTH('80c062bc98021f94f1404e9bda0f6b0202') > 0`.
>
> It seems that the reason why this CAST is being added is because
> AbstractTypeCoersion.commonTypeForBinaryComparison and
> AbstractTypeCoersion.implicitCast assume some coercions are always valid.
>
> The question then is: Is this working as expected? Should we assume that
> rules can be added to SqlTypeCoercionRule but they cannot be not removed?
> What are the alternatives we have if we want to be more restrictive than
> the castings explained in
> https://docs.google.com/spreadsheets/d/1GhleX5h5W8-kJKh7NMJ4vtoE78pwfaZRJl88ULX_MgU
> ?
>
> It would be fine to me if I could add extra enforcement at validation time.
> Specifically, if that enforcement could be added after Validator modified
> the AST so I can be sure it will catch any possible CAST. I can do that in
> a RelOptRule, but it would be better to enforce the restriction in SqlNode
> in order to be able to include in the error message the position in the
> original expression.
>
> Bests
>
> Gonzalo



Re: Some questions about calcite

2024-02-22 Thread Mihai Budiu
In the case of log2 it's simple, because the documentation says that it comes 
from the MySQL dialect. So there is a spec and a golden implementation to 
compare against.

I certainly won't object to implementing a separate log2 function that is 
undefined for 0 and negative values (i.e., can return any value for such 
arguments), let's just not pretend it's the MySQL function.

Mihai

From: Julian Hyde 
Sent: Thursday, February 22, 2024 4:05 PM
To: dev@calcite.apache.org 
Subject: Re: Some questions about calcite

But what is the spec of the LOG2 function? It’s not in the SQL standard. So, we 
need to write our own spec. We can say that LOG2(0) returns 42, if we wish, and 
go implement our own spec.

Yes, Calcite is a compiler, but it is also a standard library, and it is also 
an extended library. LOG2 is in the latter category. If you, as a vendor, don’t 
trust the implementation of LOG2 then you can exclude it from your distribution.

As an open source project we have to BOTH improve the quality of our core and 
lower the barrier to contributions to the non-core code. We have to recognize 
that not everything is the same standard. And I think vendors, like your 
company, who want to deliver a high-quality experience should put barriers 
around what features are trusted.

Julian

> On Feb 22, 2024, at 3:42 PM, Mihai Budiu  wrote:
>
> If we can't even implement correctly the log2 function according to its spec, 
> there is no hope that we will implement anything correctly.
>
> I am not a QA person, but I am spending more than 50% of my time diagnosing 
> and fixing bugs in Calcite. It's not fun. I would rather implement 
> interesting new functionality. But I cannot tell a user of our tools "I have 
> no idea whether the results you get using this tool will be correct. If you 
> are lucky, they will be, don't worry about corner cases." Our goal is to use 
> Calcite in a production environment. If Calcite is designed to be just a 
> research tool, maybe we should make that clear.
>
> There are fundamental bugs in Calcite which have been there for a decade. 
> Even basic things like arithmetic casts are still incorrect. Which is proof 
> that once a bug is in, people are not incentivized to fix them. We should not 
> let bugs in deliberately. They may essentially never get fixed.
>
> I don't think a compiler can cut any corners. The compiler is the foundation 
> of an entire software ecosystem. If the foundation is broken, everything 
> crumbles.
>
> Mihai
>
> 
> From: Julian Hyde 
> Sent: Thursday, February 22, 2024 3:28 PM
> To: dev@calcite.apache.org 
> Subject: Re: Some questions about calcite
>
> Don’t let the perfect be the enemy of the good. Or as they say in open 
> source, “Release early and often”.
>
> Just about everyone who wants a LOG2 function is intending to apply it to 
> positive numbers. So they won’t notice, or care, that the function doesn’t do 
> exactly what they expected when you apply it to zero. You should release a 
> LOG2 function that does the right thing for the positive numbers, if it’s 
> less effort than handling all non-negative numbers.
>
> Don’t listen too much to the QA folks. Their job is to find the corner cases. 
> But they forget that the corner cases are usually not as important as the 
> core cases. So, let the QA folks log bugs (or you can a log yourself, when 
> you submit an imperfect implementation). Just release early and often.
>
> Also, note that the implementation of a function in Java, so that it can be 
> executed by Calcite, does not have to be the *only* implementation. It is 
> often better to have the JDBC adapter push the function down. That is exactly 
> what Bertil is doing for geospatial functions in 
> https://issues.apache.org/jira/browse/CALCITE-6239, and Tanner is looking at 
> making a map so that we know which SQL dialects can implement which functions.
>
> Julian
>
>
>
>> On Feb 22, 2024, at 6:11 AM, Cancai Cai  wrote:
>>
>> When I was working CALCITE-6224
>> , I encountered some
>> problems and I always had some doubts in my heart.
>> I thought about it for a long time, maybe I think I already understand the
>> doubts in my heart.
>>
>> As @mihaibudiu said, Java grammar has its own type rules, and SQL has its
>> own type rules. What calcite currently does is to use Java syntax to adapt
>> to the SQL rules of each database to complete execution optimization. In
>> some extreme scenarios, the SQL rules of various databases are
>> inconsistent. Calcite
>> needs to be sure to adapt to these extreme situations. But, I mean, if one
>> day, for example, mysql returns the result of log10(0) as an error instead
>> of null, then does calcite need to adapt to the new version of mysql? If it
>> adapts to the new version of mysql, does calcite still need to adapt to the
>> old version of mysql? It seems to me that this may be a paradox. B

Re: Remove a specific type coersion rule

2024-02-22 Thread Julian Hyde
When I read CALCITE-6210 it wasn’t clear that you wanted Calcite to give a 
validation error when someone tries to cast a VARCHAR to a VARBINARY. I agree 
that that is the desirable behavior. (And I believe it is consistent with the 
SQL standard.) Can you amend the Jira case so that it clearly the goal?

Reading the code, I see that https://issues.apache.org/jira/browse/CALCITE-3550 
was trying to provide a way for people to change the coercion rules. Does that 
solution meet your needs?

Julian



> On Feb 22, 2024, at 6:59 AM, Gonzalo Ortiz Jaureguizar  
> wrote:
> 
> Hello there,
> 
> In the context of https://issues.apache.org/jira/browse/CALCITE-6210, the
> Apache Pinot team is thinking about forbidding casting from VARCHAR to
> VARBINARY.
> 
> I've been trying to implement that, but I'm not sure if it is possible or
> not. Following the Javadoc of SqlTypeCoercionRule (which, btw, seems a bit
> outdated) I've tried to create my own coercion rule as:
> 
> ```
>  private static SqlTypeCoercionRule createPinotCoercionRule() {
>// Initialize a Builder instance with the default mappings.
>Map> pinotTypeMapping = new
> HashMap<>(
>SqlTypeCoercionRule.instance().getTypeMapping()
>);
>pinotTypeMapping.put(SqlTypeName.BINARY,
> ImmutableSet.of(SqlTypeName.VARBINARY));
>pinotTypeMapping.put(SqlTypeName.VARBINARY,
> ImmutableSet.of(SqlTypeName.BINARY));
> 
>// Initialize a SqlTypeCoercionRules with the new builder mappings.
>return SqlTypeCoercionRule.instance(pinotTypeMapping);
>  }
> ```
> 
> Then I've tried to execute a query like: `select 1 from Table where
> varBinaryField = 'some text'` and even when that SqlTypeCoercionRule is
> used, the Validator turns that into `select 1 from Table where
> varBinaryField = cast('some text'` as VARBINARY)`, which should be illegal.
> That expression is then simplified when transformed into a RelRoot and then
> the error described in https://issues.apache.org/jira/browse/CALCITE-6210
> is thrown. Same cast is added with other queries like `select 1 from Table
> where OCTET_LENGTH('80c062bc98021f94f1404e9bda0f6b0202') > 0`.
> 
> It seems that the reason why this CAST is being added is because
> AbstractTypeCoersion.commonTypeForBinaryComparison and
> AbstractTypeCoersion.implicitCast assume some coercions are always valid.
> 
> The question then is: Is this working as expected? Should we assume that
> rules can be added to SqlTypeCoercionRule but they cannot be not removed?
> What are the alternatives we have if we want to be more restrictive than
> the castings explained in
> https://docs.google.com/spreadsheets/d/1GhleX5h5W8-kJKh7NMJ4vtoE78pwfaZRJl88ULX_MgU
> ?
> 
> It would be fine to me if I could add extra enforcement at validation time.
> Specifically, if that enforcement could be added after Validator modified
> the AST so I can be sure it will catch any possible CAST. I can do that in
> a RelOptRule, but it would be better to enforce the restriction in SqlNode
> in order to be able to include in the error message the position in the
> original expression.
> 
> Bests
> 
> Gonzalo



Re: Some questions about calcite

2024-02-22 Thread Julian Hyde
But what is the spec of the LOG2 function? It’s not in the SQL standard. So, we 
need to write our own spec. We can say that LOG2(0) returns 42, if we wish, and 
go implement our own spec.

Yes, Calcite is a compiler, but it is also a standard library, and it is also 
an extended library. LOG2 is in the latter category. If you, as a vendor, don’t 
trust the implementation of LOG2 then you can exclude it from your distribution.

As an open source project we have to BOTH improve the quality of our core and 
lower the barrier to contributions to the non-core code. We have to recognize 
that not everything is the same standard. And I think vendors, like your 
company, who want to deliver a high-quality experience should put barriers 
around what features are trusted.

Julian

> On Feb 22, 2024, at 3:42 PM, Mihai Budiu  wrote:
> 
> If we can't even implement correctly the log2 function according to its spec, 
> there is no hope that we will implement anything correctly.
> 
> I am not a QA person, but I am spending more than 50% of my time diagnosing 
> and fixing bugs in Calcite. It's not fun. I would rather implement 
> interesting new functionality. But I cannot tell a user of our tools "I have 
> no idea whether the results you get using this tool will be correct. If you 
> are lucky, they will be, don't worry about corner cases." Our goal is to use 
> Calcite in a production environment. If Calcite is designed to be just a 
> research tool, maybe we should make that clear.
> 
> There are fundamental bugs in Calcite which have been there for a decade. 
> Even basic things like arithmetic casts are still incorrect. Which is proof 
> that once a bug is in, people are not incentivized to fix them. We should not 
> let bugs in deliberately. They may essentially never get fixed.
> 
> I don't think a compiler can cut any corners. The compiler is the foundation 
> of an entire software ecosystem. If the foundation is broken, everything 
> crumbles.
> 
> Mihai
> 
> 
> From: Julian Hyde 
> Sent: Thursday, February 22, 2024 3:28 PM
> To: dev@calcite.apache.org 
> Subject: Re: Some questions about calcite
> 
> Don’t let the perfect be the enemy of the good. Or as they say in open 
> source, “Release early and often”.
> 
> Just about everyone who wants a LOG2 function is intending to apply it to 
> positive numbers. So they won’t notice, or care, that the function doesn’t do 
> exactly what they expected when you apply it to zero. You should release a 
> LOG2 function that does the right thing for the positive numbers, if it’s 
> less effort than handling all non-negative numbers.
> 
> Don’t listen too much to the QA folks. Their job is to find the corner cases. 
> But they forget that the corner cases are usually not as important as the 
> core cases. So, let the QA folks log bugs (or you can a log yourself, when 
> you submit an imperfect implementation). Just release early and often.
> 
> Also, note that the implementation of a function in Java, so that it can be 
> executed by Calcite, does not have to be the *only* implementation. It is 
> often better to have the JDBC adapter push the function down. That is exactly 
> what Bertil is doing for geospatial functions in 
> https://issues.apache.org/jira/browse/CALCITE-6239, and Tanner is looking at 
> making a map so that we know which SQL dialects can implement which functions.
> 
> Julian
> 
> 
> 
>> On Feb 22, 2024, at 6:11 AM, Cancai Cai  wrote:
>> 
>> When I was working CALCITE-6224
>> , I encountered some
>> problems and I always had some doubts in my heart.
>> I thought about it for a long time, maybe I think I already understand the
>> doubts in my heart.
>> 
>> As @mihaibudiu said, Java grammar has its own type rules, and SQL has its
>> own type rules. What calcite currently does is to use Java syntax to adapt
>> to the SQL rules of each database to complete execution optimization. In
>> some extreme scenarios, the SQL rules of various databases are
>> inconsistent. Calcite
>> needs to be sure to adapt to these extreme situations. But, I mean, if one
>> day, for example, mysql returns the result of log10(0) as an error instead
>> of null, then does calcite need to adapt to the new version of mysql? If it
>> adapts to the new version of mysql, does calcite still need to adapt to the
>> old version of mysql? It seems to me that this may be a paradox. Because in
>> my opinion, it is very difficult to 100% adapt to the SQL dialect of all
>> databases, because different dialects of each database need to be
>> considered, and there may even be differences between versions of different
>> versions of databases.
>> 
>> Can anyone explain it to me? I would be very grateful.
> 



Re: Some questions about calcite

2024-02-22 Thread Mihai Budiu
If we can't even implement correctly the log2 function according to its spec, 
there is no hope that we will implement anything correctly.

I am not a QA person, but I am spending more than 50% of my time diagnosing and 
fixing bugs in Calcite. It's not fun. I would rather implement interesting new 
functionality. But I cannot tell a user of our tools "I have no idea whether 
the results you get using this tool will be correct. If you are lucky, they 
will be, don't worry about corner cases." Our goal is to use Calcite in a 
production environment. If Calcite is designed to be just a research tool, 
maybe we should make that clear.

There are fundamental bugs in Calcite which have been there for a decade. Even 
basic things like arithmetic casts are still incorrect. Which is proof that 
once a bug is in, people are not incentivized to fix them. We should not let 
bugs in deliberately. They may essentially never get fixed.

I don't think a compiler can cut any corners. The compiler is the foundation of 
an entire software ecosystem. If the foundation is broken, everything crumbles.

Mihai


From: Julian Hyde 
Sent: Thursday, February 22, 2024 3:28 PM
To: dev@calcite.apache.org 
Subject: Re: Some questions about calcite

Don’t let the perfect be the enemy of the good. Or as they say in open source, 
“Release early and often”.

Just about everyone who wants a LOG2 function is intending to apply it to 
positive numbers. So they won’t notice, or care, that the function doesn’t do 
exactly what they expected when you apply it to zero. You should release a LOG2 
function that does the right thing for the positive numbers, if it’s less 
effort than handling all non-negative numbers.

Don’t listen too much to the QA folks. Their job is to find the corner cases. 
But they forget that the corner cases are usually not as important as the core 
cases. So, let the QA folks log bugs (or you can a log yourself, when you 
submit an imperfect implementation). Just release early and often.

Also, note that the implementation of a function in Java, so that it can be 
executed by Calcite, does not have to be the *only* implementation. It is often 
better to have the JDBC adapter push the function down. That is exactly what 
Bertil is doing for geospatial functions in 
https://issues.apache.org/jira/browse/CALCITE-6239, and Tanner is looking at 
making a map so that we know which SQL dialects can implement which functions.

Julian



> On Feb 22, 2024, at 6:11 AM, Cancai Cai  wrote:
>
> When I was working CALCITE-6224
> , I encountered some
> problems and I always had some doubts in my heart.
> I thought about it for a long time, maybe I think I already understand the
> doubts in my heart.
>
> As @mihaibudiu said, Java grammar has its own type rules, and SQL has its
> own type rules. What calcite currently does is to use Java syntax to adapt
> to the SQL rules of each database to complete execution optimization. In
> some extreme scenarios, the SQL rules of various databases are
> inconsistent. Calcite
> needs to be sure to adapt to these extreme situations. But, I mean, if one
> day, for example, mysql returns the result of log10(0) as an error instead
> of null, then does calcite need to adapt to the new version of mysql? If it
> adapts to the new version of mysql, does calcite still need to adapt to the
> old version of mysql? It seems to me that this may be a paradox. Because in
> my opinion, it is very difficult to 100% adapt to the SQL dialect of all
> databases, because different dialects of each database need to be
> considered, and there may even be differences between versions of different
> versions of databases.
>
> Can anyone explain it to me? I would be very grateful.



Re: Some questions about calcite

2024-02-22 Thread Julian Hyde
Don’t let the perfect be the enemy of the good. Or as they say in open source, 
“Release early and often”.

Just about everyone who wants a LOG2 function is intending to apply it to 
positive numbers. So they won’t notice, or care, that the function doesn’t do 
exactly what they expected when you apply it to zero. You should release a LOG2 
function that does the right thing for the positive numbers, if it’s less 
effort than handling all non-negative numbers.

Don’t listen too much to the QA folks. Their job is to find the corner cases. 
But they forget that the corner cases are usually not as important as the core 
cases. So, let the QA folks log bugs (or you can a log yourself, when you 
submit an imperfect implementation). Just release early and often. 

Also, note that the implementation of a function in Java, so that it can be 
executed by Calcite, does not have to be the *only* implementation. It is often 
better to have the JDBC adapter push the function down. That is exactly what 
Bertil is doing for geospatial functions in 
https://issues.apache.org/jira/browse/CALCITE-6239, and Tanner is looking at 
making a map so that we know which SQL dialects can implement which functions.

Julian



> On Feb 22, 2024, at 6:11 AM, Cancai Cai  wrote:
> 
> When I was working CALCITE-6224
> , I encountered some
> problems and I always had some doubts in my heart.
> I thought about it for a long time, maybe I think I already understand the
> doubts in my heart.
> 
> As @mihaibudiu said, Java grammar has its own type rules, and SQL has its
> own type rules. What calcite currently does is to use Java syntax to adapt
> to the SQL rules of each database to complete execution optimization. In
> some extreme scenarios, the SQL rules of various databases are
> inconsistent. Calcite
> needs to be sure to adapt to these extreme situations. But, I mean, if one
> day, for example, mysql returns the result of log10(0) as an error instead
> of null, then does calcite need to adapt to the new version of mysql? If it
> adapts to the new version of mysql, does calcite still need to adapt to the
> old version of mysql? It seems to me that this may be a paradox. Because in
> my opinion, it is very difficult to 100% adapt to the SQL dialect of all
> databases, because different dialects of each database need to be
> considered, and there may even be differences between versions of different
> versions of databases.
> 
> Can anyone explain it to me? I would be very grateful.



Remove a specific type coersion rule

2024-02-22 Thread Gonzalo Ortiz Jaureguizar
Hello there,

In the context of https://issues.apache.org/jira/browse/CALCITE-6210, the
Apache Pinot team is thinking about forbidding casting from VARCHAR to
VARBINARY.

I've been trying to implement that, but I'm not sure if it is possible or
not. Following the Javadoc of SqlTypeCoercionRule (which, btw, seems a bit
outdated) I've tried to create my own coercion rule as:

```
  private static SqlTypeCoercionRule createPinotCoercionRule() {
// Initialize a Builder instance with the default mappings.
Map> pinotTypeMapping = new
HashMap<>(
SqlTypeCoercionRule.instance().getTypeMapping()
);
pinotTypeMapping.put(SqlTypeName.BINARY,
ImmutableSet.of(SqlTypeName.VARBINARY));
pinotTypeMapping.put(SqlTypeName.VARBINARY,
ImmutableSet.of(SqlTypeName.BINARY));

// Initialize a SqlTypeCoercionRules with the new builder mappings.
return SqlTypeCoercionRule.instance(pinotTypeMapping);
  }
```

Then I've tried to execute a query like: `select 1 from Table where
varBinaryField = 'some text'` and even when that SqlTypeCoercionRule is
used, the Validator turns that into `select 1 from Table where
varBinaryField = cast('some text'` as VARBINARY)`, which should be illegal.
That expression is then simplified when transformed into a RelRoot and then
the error described in https://issues.apache.org/jira/browse/CALCITE-6210
is thrown. Same cast is added with other queries like `select 1 from Table
where OCTET_LENGTH('80c062bc98021f94f1404e9bda0f6b0202') > 0`.

It seems that the reason why this CAST is being added is because
AbstractTypeCoersion.commonTypeForBinaryComparison and
AbstractTypeCoersion.implicitCast assume some coercions are always valid.

The question then is: Is this working as expected? Should we assume that
rules can be added to SqlTypeCoercionRule but they cannot be not removed?
What are the alternatives we have if we want to be more restrictive than
the castings explained in
https://docs.google.com/spreadsheets/d/1GhleX5h5W8-kJKh7NMJ4vtoE78pwfaZRJl88ULX_MgU
?

It would be fine to me if I could add extra enforcement at validation time.
Specifically, if that enforcement could be added after Validator modified
the AST so I can be sure it will catch any possible CAST. I can do that in
a RelOptRule, but it would be better to enforce the restriction in SqlNode
in order to be able to include in the error message the position in the
original expression.

Bests

Gonzalo


Some questions about calcite

2024-02-22 Thread Cancai Cai
When I was working CALCITE-6224
, I encountered some
problems and I always had some doubts in my heart.
I thought about it for a long time, maybe I think I already understand the
doubts in my heart.

As @mihaibudiu said, Java grammar has its own type rules, and SQL has its
own type rules. What calcite currently does is to use Java syntax to adapt
to the SQL rules of each database to complete execution optimization. In
some extreme scenarios, the SQL rules of various databases are
inconsistent. Calcite
needs to be sure to adapt to these extreme situations. But, I mean, if one
day, for example, mysql returns the result of log10(0) as an error instead
of null, then does calcite need to adapt to the new version of mysql? If it
adapts to the new version of mysql, does calcite still need to adapt to the
old version of mysql? It seems to me that this may be a paradox. Because in
my opinion, it is very difficult to 100% adapt to the SQL dialect of all
databases, because different dialects of each database need to be
considered, and there may even be differences between versions of different
versions of databases.

Can anyone explain it to me? I would be very grateful.


[jira] [Created] (CALCITE-6278) Add REGEXP function (enabled in Spark library)

2024-02-22 Thread EveyWu (Jira)
 EveyWu created CALCITE-6278:


 Summary: Add REGEXP function (enabled in Spark library)
 Key: CALCITE-6278
 URL: https://issues.apache.org/jira/browse/CALCITE-6278
 Project: Calcite
  Issue Type: Improvement
Reporter:  EveyWu


Add Spark functions that have been implemented but have different 
OperandTypes/Returns.

Add Function 
[REGEXP|https://spark.apache.org/docs/latest/api/sql/index.html#regexp] 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)