[jira] [Created] (CALCITE-5565) Implement BigQuery LOG function

2023-03-07 Thread Tanner Clary (Jira)
Tanner Clary created CALCITE-5565:
-

 Summary: Implement BigQuery LOG function
 Key: CALCITE-5565
 URL: https://issues.apache.org/jira/browse/CALCITE-5565
 Project: Calcite
  Issue Type: Task
Reporter: Tanner Clary
Assignee: Tanner Clary


Calcite currently supports the {{LN}} and {{LOG10}} function. BigQuery offers 
the {{LOG}} function which allows for the base of the logarithm to be specified 
as a second argument. If the second argument is not provided, the base is 
defaulted to e (making it identical to the {{LN}} function. 

Example: {{LOG(64, 10)}} would return {{2}}.
Example 2: {{LOG(10)}} would return the same as {{LN(10)}}.

[BigQuery 
docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#log]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Future of the Spatial Type Extension

2023-03-07 Thread Julian Hyde



> On Mar 6, 2023, at 3:37 PM, Bertil Chapuis  wrote:
> 
> Thank you for your answers and for the pointers.
> 
>> PS Regarding which specification we choose to implement. The four principles 
>> you outline sound good to me. It’s always better to follow the standard. If 
>> leading implementations (e.g. PostGIS and H2GIS) diverge from the standard, 
>> we can make a note, and possibly support them as secondary implementations.
>> 
>> Sometimes there are compatible extensions. E.g. the OpenGIS function has two 
>> arguments, and the PostGIS version accepts two or three arguments. If 
>> someone calls Calcite's implementation of the function with three arguments, 
>> it is reasonable that they get the PostGIS behavior.
> 
> My feeling is that spatial is a rather small niche and that developers do not 
> necessarily seek compatibility with Postgis or H2GIS. Personally, I almost 
> always have to refer to the documentation of the RDBMS when writing queries 
> that use spatial functions and I have few expectations regarding their 
> behaviour beyond what is stated in the documentation. In this regard, I feel 
> that a good standalone documentation is preferable to a documentation with 
> references to other implementations that may differ in subtle ways.

It is a truism that the majority of SQL is generated by programs, not people. 
Those programs have already been written for industry-standard dialects (e.g. 
Oracle, Postgres) and therefore the best semantics is to be consistent with 
those standards.

For geospatial SQL, I suspect that a somewhat higher percentage of queries are 
written by people. Still, the principle of least surprise dictates that we 
should comply with the de facto standard unless there are compelling reasons 
not to.

If we depart from the standard for particular functions, let’s do it 
consciously, explicitly, and rarely.

Calcite does not currently have detailed documentation. We have benefited 
hugely from products that do, such as Postgres and H2GIS, because our users can 
safely assume that our semantics is consistent with the other products. 
Documenting our functions exhaustively - say, moving from one page for all 
functions to one page per function - would be a huge endeavor and I can’t see 
it happening with the current community. The best we could reasonably achieve 
would be separate pages for a very small number of exceptional functions.

> 
>>> Similar issues have come up with non-GIS functions. For example, the 
>>> DATEDIFF function [1]. Snowflake and MSSQL have ‘DATEDIFF(timeUnit, 
>>> datetime, datetime2)’, whereas MySQL has ‘DATEDIFF(date, date2)’. We 
>>> document which specification we implement, and potentially we could 
>>> implement both specifications, as either functions with different names, or 
>>> by selectively enabling the behavior that the user wants.
>>> 
>>> The most important thing is to document which specification we are 
>>> implementing (when there is ambiguity). Then if someone would prefer the 
>>> other specification, they can contribute an implementation, and we can 
>>> selectively enable it. 
> 
> Do you think we should eventually add a @LibraryOperator for spatial type 
> functions (i.e. OpenGIS, Postgis, H2GIS)? There is still a TODO (prior to the 
> changes I made) in the SpatialTypeFunctions class asking the following 
> question: Should we create aliases for functions in upper-case [1]?  My 
> current understanding of this question is that a possibility may be to have a 
> method named “buffer” in the class and an alias named “ST_Buffer” registered 
> somewhere else. Appart from resolving lots of sonar code smells related to 
> method names, this approach would allow to more easily introduce new methods 
> that don’t have ST_ counterparts in other RDBMS.

It would be good to move all spatial functions out of the “core” module at some 
point. If we did that, it would probably make sense to declare the geospatial 
functions in a class with similar structure to SqlLibraryOperators so that they 
can be selectively enabled.

At this point I don’t see much benefit for adding separate libraries for 
OpenGIS, Postgis, H2GIS. But that might change. As far as I can tell (correct 
me if I am wrong) they are mostly the same functions, with slightly different 
specifications. If so, a ‘compliance’ flag might be more appropriate.

I wouldn’t worry about code smells. Something done for a good reason isn’t a 
code smell. It is a false negative from a robot.

> After merging CALCITE-5367 [2], I think we will have a relatively decent 
> support for spatial types in core. At this stage, I think it would be good to 
> see if the organisation of the code can be improved before investigating 
> support for more advanced spatial functions (e.g. 3D, ST_AsMVT, 
> ST_AsFlatGeobuf, etc.).

Sounds good. Identify problems and propose solutions. Algebraic approaches - 
e.g. decomposing functions into smaller composable units, like how AVG 
decomposes into SUM and 

Re: A new logo for avatica?

2023-03-07 Thread Julian Hyde
Oops, yes.

I want people to see it as a standalone component.

On Tue, Mar 7, 2023 at 8:38 AM Michael Mior  wrote:
>
> > I don't want people to see it as a standalone component.
>
> Can I assume you meant that you *do* want it to be seen as a standalone
> component?
>
> --
> Michael Mior
> mm...@apache.org
>
>
> On Mon, Mar 6, 2023 at 8:50 PM Julian Hyde  wrote:
>
> > A logo would be great.
> >
> > I don't think the logo should be a variation of Calcite's logo.
> > Although Avatica is developed by the Calcite community, I don't want
> > people to see it as a standalone component.
> >
> > By the way, the name Avatica is inspired by the scientific name of the
> > Barn spider, Araneus cavaticus. So, if people are looking for an
> > inspiration for a logo, a spider might be a good place to start.
> >
> > Julian
> >
> > On Mon, Mar 6, 2023 at 12:33 AM Francis Chuang 
> > wrote:
> > >
> > > Calcite had a new logo designed in 2018/2019 [1] [2].
> > >
> > > I was wondering if it would be possible to get one made for Avatica. As
> > > Avatica is a subproject, perhaps its logo should be a variation of the
> > > Calcite logo to indicate that?
> > >
> > > Francis
> > >
> > > [1] https://lists.apache.org/thread/px48o0o16h6t3l22gvxkoxpnozcwocfn
> > > [2] https://lists.apache.org/thread/6pv73tdrox1lpf8ohz46gj78j4t3ko7s
> >


Re: A new logo for avatica?

2023-03-07 Thread Michael Mior
> I don't want people to see it as a standalone component.

Can I assume you meant that you *do* want it to be seen as a standalone
component?

--
Michael Mior
mm...@apache.org


On Mon, Mar 6, 2023 at 8:50 PM Julian Hyde  wrote:

> A logo would be great.
>
> I don't think the logo should be a variation of Calcite's logo.
> Although Avatica is developed by the Calcite community, I don't want
> people to see it as a standalone component.
>
> By the way, the name Avatica is inspired by the scientific name of the
> Barn spider, Araneus cavaticus. So, if people are looking for an
> inspiration for a logo, a spider might be a good place to start.
>
> Julian
>
> On Mon, Mar 6, 2023 at 12:33 AM Francis Chuang 
> wrote:
> >
> > Calcite had a new logo designed in 2018/2019 [1] [2].
> >
> > I was wondering if it would be possible to get one made for Avatica. As
> > Avatica is a subproject, perhaps its logo should be a variation of the
> > Calcite logo to indicate that?
> >
> > Francis
> >
> > [1] https://lists.apache.org/thread/px48o0o16h6t3l22gvxkoxpnozcwocfn
> > [2] https://lists.apache.org/thread/6pv73tdrox1lpf8ohz46gj78j4t3ko7s
>


[jira] [Created] (CALCITE-5564) Add 2-argument support for PERCENTILE_CONT/DISC

2023-03-07 Thread Tanner Clary (Jira)
Tanner Clary created CALCITE-5564:
-

 Summary: Add 2-argument support for PERCENTILE_CONT/DISC
 Key: CALCITE-5564
 URL: https://issues.apache.org/jira/browse/CALCITE-5564
 Project: Calcite
  Issue Type: Improvement
Reporter: Tanner Clary
Assignee: Tanner Clary


Calcite currently has implementations for the {{PERCENTILE_CONT}} and 
{{PERCENTILE_DISC}} functions. Their syntax may be found 
[here|https://learn.microsoft.com/en-us/sql/t-sql/functions/percentile-cont-transact-sql?view=sql-server-ver16].
 

BigQuery offers these functions as well, but the syntax is slightly different, 
and may be found 
[here|https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#percentile_cont].
 The main difference is that instead of using a {{WITHIN GROUP}} clause, the 
array is passed in directly as the first argument to the function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5563) Add a break to the inner loop of RelSubset#getParents and RelSubset#getParentSubsets after we find a matching Relsubset from its parent input

2023-03-07 Thread asdfgh19 (Jira)
asdfgh19 created CALCITE-5563:
-

 Summary: Add a break to the inner loop of RelSubset#getParents and 
RelSubset#getParentSubsets after we find a matching Relsubset from its parent 
input
 Key: CALCITE-5563
 URL: https://issues.apache.org/jira/browse/CALCITE-5563
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: asdfgh19
Assignee: asdfgh19
 Fix For: 1.34.0


 
{code:java}
/**
 * Returns the collection of RelNodes one of whose inputs is in this
 * subset.
 */
Set getParents() {
  final Set list = new LinkedHashSet<>();
  for (RelNode parent : set.getParentRels()) {
for (RelSubset rel : inputSubsets(parent)) {
  // see usage of this method in propagateCostImprovements0()
  if (rel == this) {
list.add(parent);
break;
  }
}
  }
  return list;
}

/**
 * Returns the collection of distinct subsets that contain a RelNode one
 * of whose inputs is in this subset.
 */
Set getParentSubsets(VolcanoPlanner planner) {
  final Set list = new LinkedHashSet<>();
  for (RelNode parent : set.getParentRels()) {
for (RelSubset rel : inputSubsets(parent)) {
  if (rel.set == set && rel.getTraitSet().equals(traitSet)) {
list.add(planner.getSubsetNonNull(parent));
break;
  }
}
  }
  return list;
}{code}
 

Once we have found a matching Relsubset from its parent input, we can 
immediately end the inner loop.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [QUESTION]: Bug Fix Release

2023-03-07 Thread Charles Givre
Will do.  

Sent from my iPhone

> On Mar 6, 2023, at 17:12, Stamatis Zampetakis  wrote:
> 
> Hey Charles,
> 
> Please test Drill with the latest calcite-1.34.0-SNAPSHOT [1] and if all is
> good on your end I will prepare an RC for vote.
> 
> Best,
> Stamatis
> 
> [1]
> https://repository.apache.org/content/groups/snapshots/org/apache/calcite/calcite-core/1.34.0-SNAPSHOT/
> 
>> On Sun, Mar 5, 2023 at 7:16 PM Charles Givre  wrote:
>> 
>> Julian,
>> Now that Drill is on main Calcite instead of the fork, I'll commit that
>> the Drill community will do our best to try Drill with the RC candidates to
>> see if we can catch issues during the release cycle.
>> Thanks,
>> -- C
>> 
>> 
 On Mar 5, 2023, at 12:20 PM, Julian Hyde  wrote:
>>> 
>>> It was indeed a regression, but it didn’t break any of Calcite’s tests
>> and no one spoke up during the release vote. Mistakes are expensive to fix
>> after a release, cheaper during the release vote, and cheapest of all if
>> found by the test suite.
>>> 
 On Mar 5, 2023, at 6:33 AM, Charles Givre  wrote:
 
 That would be great!  Again I’m only asking because this was a
>> regression.   I really do appreciate it.  Thanks!
 
 Sent from my iPhone
 
> On Mar 4, 2023, at 13:59, Stamatis Zampetakis 
>> wrote:
> 
> If we get the 1.34.0 out a bit sooner than usual I guess this will be
>> good
> enough for Drill. If the others agree I can try to prepare an RC during
> next week. WDYT ?
> 
> Best,
> Stamatis
> 
> 
>> On Sat, Mar 4, 2023, 6:13 PM Alessandro Solimando <
>> alessandro.solima...@gmail.com> wrote:
>> 
>> The second option Benchao mentions is what Hive currently does as
>> well.
>> 
>> Best regards,
>> Alessandro
>> 
 On Sat 4 Mar 2023, 13:19 Benchao Li,  wrote:
 
 Hi Charles,
 
 Thank for reaching out!
 
 IIRC, the idea of releasing bugfix version has been brought up in
>> the
>> past,
>>> but I couldn't find the discussion (in Jira and dev ML).
>>> 
>>> I'd like to share my understanding why we chose not to release bug
>> fix
>>> versions, please correct me if I'm wrong,
>>> - Calcite has many bug fixes that span multi versions (even more
>> that 10
>>> versions), then only keeping several (such as 3) bug fix releases
>> does
>> not
>>> solve all these problems.
>>> - Actually we usually do not distinguish too much between "bugfix"
>> and
>> "new
>>> feature", so maintaining bug fix releases is not that easy.
>>> - Calcite lacks reviewers and also release managers, only keeping
>> linear
>>> releasing in rhythm could save us some efforts.
>>> 
>>> For regressions, I agree that this hurts downstream projects. For
>> such
>>> cases, there are two approaches come into my mind:
>>> - We can release a new version quickly than usual.
>>> - The projects that need the fix/feature before our next scheduled
>> release,
>>> they could copy these files into their projects, as we already did in
>>> Flink[1]. They could remove these files once they adopt the new
>> release
>> of
>>> Calcite.
>>> 
>>> I hope this helps.
>>> 
>>> [1]
>>> 
>>> 
>> 
>> https://github.com/apache/flink/tree/master/flink-table/flink-table-planner/src/main/java/org/apache/calcite
>>> 
>>> 
>>> Charles Givre  于2023年3月2日周四 06:22写道:
>>> 
 Hello Calcite Devs,
 I wanted to thank everyone for the recent release of Calcite 1.33.
>> I
>> am
 the PMC Chair for Apache Drill and we just released Drill 1.21[0]
>> which
>>> is
 now using the latest version of Calcite instead of our 2-3 year old
>> fork!
 
 However, we encountered a small issue with Calcite 1.33 that does
>> not
 affect just Drill.  Specifically, there was a regression which was
>> caused
 by CALCITE-5447[1] which effectively broke the DATE_TRUNC function.
>> The
 bugfix has been fixed and merged in CALCITE-5522[2].
 
 In any event, given that this function is fairly important and the
>>> lengthy
 release schedules of both Drill and Calcite, I wanted to ask whether
>> the
 Calcite might consider doing a quick bugfix release with this and
>> any
>>> other
 regressions that may have popped up in 1.33 and have since been
>> fixed.
 
 Thank you very much for all your work!
 Best,
 -- Charles
 
 
 [0]:
 
>>> 
>> 
>> https://github.com/apache/drill-site/blob/master/blog/_posts/en/2023-02-21-drill-1.21.0-released.md
 [1]: https://issues.apache.org/jira/browse/CALCITE-5447
 [2]: https://issues.apache.org/jira/browse/CALCITE-5522
>>> 
>>> 
>>> 
>>> --
>>> 
>>> Best,
>>> Benchao Li
>>> 
>> 
>> 
>> 


Re: Insert data exceeding fixed length column.

2023-03-07 Thread stanilovsky evgeny
I rewrite test with end-to-end usage (hope it`s correct) and put it into  
blank.iq (test is near) and found that it not follow the sql standard.  
Also probably someone can suggest how can i write test with dynamic params  
involved ? I found that Quidem has no appropriate functionality.

Thanks !

create table fix_char_col (i int, c char(5));
(0 rows modified)

!update
insert into fix_char_col values (1, 'a'), (2, 'aa'), (3, '  
aa'), (4, 'aa'), (5, ' aa');

(1 row modified)

!update
SELECT * FROM fix_char_col ORDER BY i;
+---+---+
| I | C |
+---+---+
| 1 | a |
| 2 | a |
| 3 |   |
| 4 | aa|
| 5 |  aa   |
+---+---+
(2 rows)

!ok

and as a result :

expected:
< +---+---+
< | I | C |
< +---+---+
< | 1 | a |
< | 2 | a |
< | 3 |   |
< | 4 | aa|
< | 5 |  aa   |
< +---+---+

calcite output:

+---+-+
| I | C   |
+---+-+
| 1 | a   |
| 2 | aa  |
| 3 |  aa |
| 4 | aa  |
| 5 |  aa |
+---+-+


Ok, thanks i understand what you are talking about, i will move the test  
and fill issue if it necessary.



I agree that that query should throw when executed. If it doesn’t, you  
should log a bug.


However, that test in SqlToRelConverterTest doesn’t prove that Calcite  
doesn’t throw at execute time.


On Feb 8, 2023, at 11:20 PM, stanilovsky evgeny  
 wrote:



Hello, seems sql standard regulate situation when insertion (char,  
varchar) data exceeds fixed len column as :


If the declared type T is fixed length string with length in  
characters L and the

length in characters M of V is larger than L, then
Case:
a) If the right part M–L characters of V are all space`s, then the  
value of T is set to the first

L characters of V.
b) If one or more of the right part M–L characters of V are not  
space`s, then exception is raised.


But fast test [1] in SqlToRelConverterTest highlights that calcite not  
satisfy of such a rule.


I miss something or need to fill the issue ?

Thanks !


[1]
 @Test void testInsert1() {
   String moreThanVarcharLimit = "a".repeat(30);

   final String sql = "insert into emp (empno, ename, job, mgr,  
hiredate,\n"

   + "  sal, comm, deptno, slacker)\n"
   + "values(1, '" + moreThanVarcharLimit + "', 'job', 0,\n"
   + "  timestamp '1970-01-01 00:00:00', 1, 1, 1, false)";
   sql(sql).ok();
 }


[jira] [Created] (CALCITE-5562) The result data is disordered when u tried to get data from an ordered set

2023-03-07 Thread Yuxin Wu (Jira)
Yuxin Wu created CALCITE-5562:
-

 Summary: The result data is disordered when u tried to get data 
from an ordered set
 Key: CALCITE-5562
 URL: https://issues.apache.org/jira/browse/CALCITE-5562
 Project: Calcite
  Issue Type: Bug
Reporter: Yuxin Wu


*It's based on CSV-TEST-MODEL. The same as {{lex=MYSQL}}.*
{quote}!connect jdbc:calcite:model=src/test/resources/mode
{quote}

Step1: You need to get a table named 'emps' with data as below;
0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps;
+---+---+
| NAME  | EMPID |
+---+---+
| Fred  | 30    |
| Eric  | 3     |
| John  | 2     |
| Wilma | 1     |
| Alice | 2     |
+---+---+
Step2: Get an ordered set by using {_}{color:#172b4d}*{{select name, empid from 
emps order by empid}}*{color}{_};
0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps 
order by empid;
+---+---+
| NAME  | EMPID |
+---+---+
| Wilma | 1     |
| John  | 2     |
| Alice | 2     |
| Eric  | 3     |
| Fred  | 30    |
+---+---+
Step3: Using the statement in step2 as a sub query like this:
0: jdbc:calcite:model=src/test/resources/mode> select * from (select name, 
empid from emps order by empid);
+---+---+
| NAME  | EMPID |
+---+---+
| Fred  | 30    |
| Eric  | 3     |
| John  | 2     |
| Wilma | 1     |
| Alice | 2     |
+---+---+
*THE QUESTION IS:* 

*The result data is disordered while it's supposed to be the same as it is in 
step2.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)