[jira] [Created] (CALCITE-6239) Add a postgis dialect that supports ST functions

2024-02-02 Thread Bertil Chapuis (Jira)
Bertil Chapuis created CALCITE-6239:
---

 Summary: Add a postgis dialect that supports ST functions
 Key: CALCITE-6239
 URL: https://issues.apache.org/jira/browse/CALCITE-6239
 Project: Calcite
  Issue Type: New Feature
  Components: core
Reporter: Bertil Chapuis


Calcite implements support for spatial types (geometry, point, etc.) and 
spatial functions (ST_*), and it can connect to PostGIS via a JdbcSchema. 
However, the PostgreSQL dialect does not currently handle spatial types and 
functions. As a result, Calcite tries to execute the spatial functions in 
the JVM instead of pushing them down to PostGIS.

For example, the following query executes, but the type of the geom column 
is incorrect:
SELECT id, geom FROM public.spatial_table

The following query fails with a ClassCastException because Calcite tries to 
use the Java implementation of the ST_SRID function:
SELECT id, ST_SRID(geom) FROM public.spatial_table
java.lang.ClassCastException: class org.postgresql.util.PGobject cannot be cast 
to class org.locationtech.jts.geom.Geometry (org.postgresql.util.PGobject and 
org.locationtech.jts.geom.Geometry are in unnamed module of loader 'app')

In my current understanding, this issue could be addressed with a new 
PostgisSqlDialect that extends PostgresqlSqlDialect and adds support for 
spatial types and functions. Here is a tentative roadmap:
- Add all the spatial functions to the SqlKind class
- Create a PostgisSqlDialect class that extends PostgresqlSqlDialect
- Add support for the spatial types (geometry) by overriding the getCastSpec 
method of the SqlDialect class
- Add support for the spatial functions by overriding the supportsFunction 
method of the SqlDialect class
- Add support for the spatial aggregate functions by overriding the 
supportsAggregateFunction method of the SqlDialect class
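
The dialect-override pattern in the roadmap above can be sketched with a 
simplified, self-contained mock. These are stand-in classes for illustration 
only, not the real Calcite SqlDialect hierarchy (the real getCastSpec and 
supportsFunction signatures take RelDataType/SqlOperator arguments):

```java
// Stand-ins for the Calcite dialect classes (illustration only).
class SqlDialectMock {
    // Base dialect: no spatial support, so ST_* calls stay in the JVM.
    boolean supportsFunction(String functionName) {
        return false;
    }
    String getCastSpec(String typeName) {
        return typeName; // pass the type through unchanged
    }
}

class PostgisSqlDialectMock extends SqlDialectMock {
    @Override
    boolean supportsFunction(String functionName) {
        // Push ST_* functions down to PostGIS instead of evaluating in the JVM.
        return functionName.startsWith("ST_") || super.supportsFunction(functionName);
    }
    @Override
    String getCastSpec(String typeName) {
        // Map the GEOMETRY type to the PostGIS column type.
        return "GEOMETRY".equals(typeName) ? "geometry" : super.getCastSpec(typeName);
    }
}

public class PostgisDialectSketch {
    public static void main(String[] args) {
        SqlDialectMock dialect = new PostgisSqlDialectMock();
        System.out.println(dialect.supportsFunction("ST_SRID")); // true
        System.out.println(dialect.getCastSpec("GEOMETRY"));     // geometry
    }
}
```

The point of the sketch is that everything spatial is an override in the 
subclass, so the existing PostgresqlSqlDialect behaviour is untouched for 
non-spatial queries.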



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-6238) Exception while evaluating ROUND function

2024-02-02 Thread Mihai Budiu (Jira)
Mihai Budiu created CALCITE-6238:


 Summary: Exception while evaluating ROUND function
 Key: CALCITE-6238
 URL: https://issues.apache.org/jira/browse/CALCITE-6238
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.36.0
Reporter: Mihai Budiu


The following test in CalciteSqlOperatorTest:

{code:java}
 @Test void testRoundFunc() {
final SqlOperatorFixture f = fixture();
f.checkScalar("round(42, CAST(2 as BIGINT))", 42, "INTEGER NOT NULL");
  }
{code}

causes an exception; here is the relevant part of the stack trace:

{code}
java.sql.SQLException: Error while executing SQL "values (round(42, CAST(2 as 
BIGINT)))": Unable to implement EnumerableCalc(expr#0=[{inputs}], expr#1=[42], 
expr#2=[2:BIGINT], expr#3=[ROUND($t1, $t2)], EXPR$0=[$t3]): rowcount = 1.0, 
cumulative cost = {2.0 rows, 6.0 cpu, 0.0 io}, id = 20
  EnumerableValues(tuples=[[{ 0 }]]): rowcount = 1.0, cumulative cost = {1.0 
rows, 1.0 cpu, 0.0 io}, id = 13
...
Suppressed: java.lang.RuntimeException: while resolving method 
'sround[int, long]' in class class org.apache.calcite.runtime.SqlFunctions
at 
org.apache.calcite.adapter.enumerable.EnumUtils.call(EnumUtils.java:679)
at 
org.apache.calcite.adapter.enumerable.RexImpTable$MethodImplementor.call(RexImpTable.java:2818)
at 
org.apache.calcite.adapter.enumerable.RexImpTable$MethodImplementor.implementSafe(RexImpTable.java:2799)
at 
org.apache.calcite.adapter.enumerable.RexImpTable$AbstractRexCallImplementor.genValueStatement(RexImpTable.java:3857)
at 
org.apache.calcite.adapter.enumerable.RexImpTable$AbstractRexCallImplementor.implement(RexImpTable.java:3819)
{code}

And indeed, SqlFunctions does not define an sround method with this signature.

There are several possible fixes:
- reject calls to ROUND whose second argument has type BIGINT
- have the validator insert an implicit cast of the second argument to INTEGER
- implement additional overloads of sround in SqlFunctions (probably many 
more would be needed)

Which one of these is the right one? I suspect this problem applies to other 
SQL functions as well.
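
The failure mode can be reproduced in miniature with plain reflection. This 
is a loose analogy to what EnumUtils does when resolving the implementation 
method (the class and method here are hypothetical stand-ins, not the real 
SqlFunctions):

```java
// Mimics the resolution failure: only an (int, int) overload exists,
// but the call site asks for (int, long).
public class RoundResolutionSketch {
    static int sround(int b0, int b1) { return b0; } // only (int, int)

    public static void main(String[] args) {
        try {
            RoundResolutionSketch.class.getDeclaredMethod(
                "sround", int.class, long.class);
            System.out.println("resolved sround(int, long)");
        } catch (NoSuchMethodException e) {
            // Mirrors: while resolving method 'sround[int, long]'
            System.out.println("no sround(int, long) overload");
        }
    }
}
```

Running it prints "no sround(int, long) overload", which is the miniature 
version of the suppressed RuntimeException in the stack trace above.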





Re: Registering rules from multiple RelNodes of identical type

2024-02-02 Thread Julian Hyde
That method is intended for the case where you have a class MyTableScan and a 
generic rule that can handle any instance of MyTableScan. 

If you have (say) five instances of MyTableScan and five rule instances, one 
for each scan, then you should register each rule instance individually in the 
planner. 

There are a couple of pitfalls with your approach. Suppose you have a self join 
query “select * from t as t1, t as t2”, and t is of type MyTable. There is only 
one table but two scans, so you will need to register two rule instances. 

Those rule instances contain state, and therefore hold memory and other 
resources, so you won't want to keep them around for long (e.g. in a cache). 

For these reasons, the recommended policy is to keep the state in the table 
instance (not in the scan, and not in the rule), so that you can write 
generic rules that can be safely cached and reused. 

This approach also makes it possible to write more complex rules, e.g. a rule 
that matches two scans of MyTable under a Join. In your model, if there are 
N scans of MyTable in a query you would have to create and register N^2 rule 
instances. 
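
The state-in-the-table policy can be sketched with illustrative mocks (these 
are not the real Calcite Table/TableScan/RelOptRule classes; all names here 
are hypothetical):

```java
// Per-table state lives on the table, not on the scan or the rule.
class MyTableMock {
    final String ruleState;
    MyTableMock(String ruleState) { this.ruleState = ruleState; }
}

// Each scan just points back at its table.
class MyTableScanMock {
    final MyTableMock table;
    MyTableScanMock(MyTableMock table) { this.table = table; }
}

// Stateless generic rule: everything it needs comes from the matched
// scan's table, so one cached instance serves any number of scans.
class GenericScanRuleMock {
    String apply(MyTableScanMock scan) {
        return "rewrite using " + scan.table.ruleState;
    }
}

public class StateInTableSketch {
    public static void main(String[] args) {
        GenericScanRuleMock rule = new GenericScanRuleMock(); // one instance
        MyTableMock t = new MyTableMock("t-specific state");
        // Self join: two scans of the same table, still one rule instance.
        System.out.println(rule.apply(new MyTableScanMock(t)));
        System.out.println(rule.apply(new MyTableScanMock(t)));
    }
}
```

Because the rule holds no per-scan state, the self-join case (two scans, one 
table) needs no extra registration, and the rule instance is safe to cache.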

Julian 


> On Feb 2, 2024, at 5:45 AM, Austin Richardson 
>  wrote:
> 
> Hello Calcite devs,
> 
> Our team currently has a setup in which we have a single TableScan
> implementation (e.g. MyTableScan) serving multiple Calcite tables. Each
> MyTableScan handles the registration of rules tailored to its respective
> Calcite table. I've included an example of the code structure at the end of
> my email.
> 
> What we’ve run into is the class-based deduplication behaviour here,
> meaning only one instance of MyTableScan can successfully register its
> rules.
> 
> We’re exploring solutions to split these such that each table has a unique
> TableScan implementation, but also wanted to check first: is there any
> alternate way we could keep the single class approach, but still have
> Calcite register each MyTableScan?
> 
> Best,
> Austin
> 
> 
> ---
> 
> 
> Example code structure:
> 
> 
> abstract class MyTable {
>abstract RelOptRule getRule();
> 
>@Override
>public RelNode toRel(...) {
>return new MyTableScan(getRule());
>}
> }
> 
> ...
> 
> class Table1 extends MyTable {
>@Override
>RelOptRule getRule() { ... }
> }
> 
> ...
> 
> class Table2 extends MyTable {
>@Override
>RelOptRule getRule() { ... }
> }
> 
> ...
> 
> class MyTableScan {
>RelOptRule rule;
> 
>// ... constructor where rule is injected ...
> 
>@Override
>public void register(RelOptPlanner planner) {
>planner.addRule(rule);
>}
> }


Re: Joint OGC / ASF / OSGeo codesprint in 3 weeks

2024-02-02 Thread Charles Givre
Hi Bertil, 
Let me explain a bit about Drill and Calcite. Drill uses Calcite for query 
planning. Until fairly recently, Drill had a fork of Calcite with some 
special features that Drill required. However, about one or two versions ago, 
we were able to get Drill off the fork and onto "mainstream" Calcite. We're 
currently using version 1.34 (I think). However, with version 1.35 of Calcite 
there were some breaking changes for Drill, so we haven't upgraded the 
dependency yet. For someone who knows Calcite really well, I don't think that 
would be too difficult, but the breaking issues had to do with the data types 
returned by some of the date functions... Anyway...

With respect to SQL functions, Drill does this on a case-by-case basis. In 
certain situations it relies on Calcite for the functions, and in other cases 
it uses its own logic. I'm not an expert on Drill's query planning, but I've 
tinkered with this a few times. In any event, all of the geo functions in 
Drill are implemented as UDFs and live in the contrib folder [1]. 
Additionally, the ESRI reader is also in the contrib folder of the project 
[2]. I think the rationale for this was that, when the geo functions were 
implemented, Drill was still stuck on the Calcite fork, which did not have 
the geo functions. Another issue we may encounter is that Drill does not have 
a specific spatial data type; it relies on the VARBINARY data type for 
spatial data.

Take all of this with a grain of salt. I'm not an expert on Calcite or GIS.  :-)
Best,
-- C


[1]: https://github.com/apache/drill/tree/master/contrib/udfs
[2]: https://github.com/apache/drill/tree/master/contrib/format-esri


> On Feb 2, 2024, at 01:25, Bertil Chapuis  wrote:
> 
> Hello Jia and Charles,
> 
> I'm really interested in this topic as well. Apache Calcite transitioned 
> from ESRI Geometry to JTS, and many ST functions have been implemented there 
> as well [1, 2, 3]. Sharing experiences and code could benefit all projects.
> 
> I haven’t looked into the details of each project, but from what I 
> understand, Sedona depends on Calcite through Flink, and Drill depends 
> directly on Calcite. Is that correct? Since these functions are available in 
> Calcite’s core, it means they may already be available in the respective 
> class paths.
> 
> Best regards,
> 
> Bertil
> 
> [1] 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/runtime/SpatialTypeFunctions.java
> [2] 
> https://github.com/apache/calcite/blob/main/core/src/test/resources/sql/spatial.iq
> [3] 
> https://calcite.apache.org/docs/reference.html#geometry-conversion-functions-2d
> 
> 
>> On 2 Feb 2024, at 06:47, Jia Yu  wrote:
>> 
>> Hi Charles,
>> 
>> This is Jia Yu from Apache Sedona. I think what you did is fantastic.
>> As a project for this joint codesprint, I am proposing to add a
>> comprehensive set of spatial functions to Apache Drill using Apache
>> Sedona.
>> 
>> Apache Sedona has implemented over 130 ST functions and a
>> high-performance geometry serializer in pure Java. All these functions
>> have been ported to Apache Spark, Apache Flink and Snowflake. They are
>> being downloaded over 1.5 million times per month.
>> 
>> This porting process is fairly simple. Let's take Sedona on Apache
>> Flink as an example:
>> 
>> 1. Call a Sedona java function in a UDF template:
>> https://github.com/apache/sedona/blob/master/flink/src/main/java/org/apache/sedona/flink/expressions/Functions.java
>> 2. Register this function in a catalog file:
>> https://github.com/apache/sedona/blob/master/flink/src/main/java/org/apache/sedona/flink/Catalog.java
>> 
>> What do you think?
>> 
>> Thanks,
>> Jia
>> 
>> On Thu, Feb 1, 2024 at 2:44 PM Charles Givre  wrote:
>>> 
>>> Hi Martin,
>>> Thanks for sending.  I'd love for Drill to be included in this.  I have a 
>>> question for you.  A while ago, I started work on a collection of UDFs for 
>>> interacting with H3 Geo Indexes.  I'm not an expert on this but would this 
>>> be useful?  Here's the repo: https://github.com/datadistillr/drill-h3-udf   
>>> If someone would like to collaborate to complete this and get it 
>>> integrated, I'm all for that.
>>> Best,
>>> -- C
>>> 
>>> 
>>> 
 On Jan 31, 2024, at 10:20, Martin Desruisseaux 
  wrote:
 
 Hello all
 
 The Open Geospatial Consortium (OGC), the Apache Software Foundation (ASF) 
 and the Open Source Geospatial Foundation (OSGeo) are holding a joint code 
 sprint from February 26 to 28 [1]. The main goals are to support the 
 development of open standards for geospatial information and of the free 
 and open source software that implements those standards, as well as to 
 create awareness of the standards and software projects. This is the fourth 
 year that this joint code sprint has been organized, and this year it will 
 be physically located in Évora (Portugal). The event can also be attended 
 on-line. 

Registering rules from multiple RelNodes of identical type

2024-02-02 Thread Austin Richardson
Hello Calcite devs,

Our team currently has a setup in which we have a single TableScan
implementation (e.g. MyTableScan) serving multiple Calcite tables. Each
MyTableScan handles the registration of rules tailored to its respective
Calcite table. I've included an example of the code structure at the end of
my email.

What we’ve run into is the class-based deduplication behaviour here,
meaning only one instance of MyTableScan can successfully register its
rules.

We’re exploring solutions to split these such that each table has a unique
TableScan implementation, but also wanted to check first: is there any
alternate way we could keep the single class approach, but still have
Calcite register each MyTableScan?

Best,
Austin


---


Example code structure:


abstract class MyTable {
abstract RelOptRule getRule();

@Override
public RelNode toRel(...) {
return new MyTableScan(getRule());
}
}

...

class Table1 extends MyTable {
@Override
RelOptRule getRule() { ... }
}

...

class Table2 extends MyTable {
@Override
RelOptRule getRule() { ... }
}

...

class MyTableScan {
RelOptRule rule;

// ... constructor where rule is injected ...

@Override
public void register(RelOptPlanner planner) {
planner.addRule(rule);
}
}


Re: Create adapter for Apache Arrow

2024-02-02 Thread Michael Mior
Thanks for picking this up. One of my students and I wrote a lot of the
original code. For a while this was blocked on necessary changes to
Apache Arrow, but that has since been resolved. I'd love to see this get
merged. It would help to rebase into a single commit. I'd also appreciate
one more set of eyes before this gets wrapped up. Thanks again Hongyu!

--
Michael Mior
mm...@apache.org


On Fri, Feb 2, 2024 at 6:40 AM Hongyu Guo  wrote:

> Hi devs,
>
> Recently I found [CALCITE-2040] "Create adapter for Apache Arrow" [1],
> which is almost finished in PR #2810 [2], but no one has continued to push
> it forward, so I want to take it over. I made some changes to the existing
> PR, including
> - Upgrade Arrow to 15.0
> - Disable the Arrow adapter CI on Windows (for Arrow system compatibility [3])
> - Add JAVA_OPTIONS for the JDK 19 CI: --add-opens=java.base/java.nio=ALL-UNNAMED
> (for Arrow Java compatibility [4])
>
> This is my PR link: https://github.com/apache/calcite/pull/3666
>
> I haven't made any feature changes because I'm not familiar enough with
> Apache Arrow. I found that Jenkins CI will fail due to the JAVA_OPTIONS,
> and I will fix it later.
>
> Best,
> Hongyu
>
> [1] https://issues.apache.org/jira/browse/CALCITE-2040
> [2] https://github.com/apache/calcite/pull/2810/commits
> [3] https://arrow.apache.org/docs/java/install.html#system-compatibility
> [4] https://arrow.apache.org/docs/java/install.html#java-compatibility
>


Create adapter for Apache Arrow

2024-02-02 Thread Hongyu Guo
Hi devs,

Recently I found [CALCITE-2040] "Create adapter for Apache Arrow" [1], which
is almost finished in PR #2810 [2], but no one has continued to push it
forward, so I want to take it over. I made some changes to the existing PR,
including
- Upgrade Arrow to 15.0
- Disable the Arrow adapter CI on Windows (for Arrow system compatibility [3])
- Add JAVA_OPTIONS for the JDK 19 CI: --add-opens=java.base/java.nio=ALL-UNNAMED
(for Arrow Java compatibility [4])

This is my PR link: https://github.com/apache/calcite/pull/3666

I haven't made any feature changes because I'm not familiar enough with
Apache Arrow. I found that Jenkins CI will fail due to the JAVA_OPTIONS,
and I will fix it later.

Best,
Hongyu

[1] https://issues.apache.org/jira/browse/CALCITE-2040
[2] https://github.com/apache/calcite/pull/2810/commits
[3] https://arrow.apache.org/docs/java/install.html#system-compatibility
[4] https://arrow.apache.org/docs/java/install.html#java-compatibility