Re: Joint OGC / ASF / OSGeo codesprint in 3 weeks

2024-02-02 Thread Charles Givre
Hi Bertil, 
Let me explain a bit about Drill and Calcite.   Drill uses Calcite for query 
planning.  Until fairly recently, Drill had a fork of Calcite that had some 
special features which Drill required.  However, about 1-2 versions ago, we 
were able to get Drill off of the fork and onto "mainstream" Calcite.   We're 
currently using version 1.34 (I think).  However, version 1.35 of Calcite 
introduced some breaking changes for Drill, so we haven't upgraded the 
dependency yet.  For someone who knows Calcite really well, I don't think 
fixing that would be too difficult; the breaking issues had to do with the 
data types returned by some of the date functions... Anyway...

With respect to SQL functions, Drill does this on a case-by-case basis.  For 
certain situations, it relies on Calcite for the functions, and in other cases, 
it uses its own logic.  I'm not an expert on Drill's query planning, but I've 
tinkered with this a few times.  In any event, all of the geo functions in 
Drill are implemented as UDFs and live in the contrib folder [1].  The ESRI 
reader is also in the contrib folder of the project [2].  I think the 
rationale for this was that when the geo functions were implemented, Drill was 
still stuck on the Calcite fork, which did not have the geo functions.  Another 
issue we may encounter is that Drill does not have a dedicated spatial 
data type.  It relies on the VARBINARY data type for spatial data.
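To make the VARBINARY point concrete, here is a minimal Java sketch of how a geometry can travel as plain bytes. It assumes the OGC Well-Known Binary (WKB) layout for a 2D point (a 1-byte byte-order flag, a 4-byte geometry type code, then two 8-byte doubles). This illustrates the general idea only and is not Drill's actual serialization code: the class and method names are hypothetical, and whether Drill's geo UDFs emit exactly this layout is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: a 2D point stored as raw bytes, the way a VARBINARY
// column could carry spatial data. Assumes the OGC WKB Point layout.
class WkbPointSketch {

    // WKB geometry type code for a 2D Point.
    static final int WKB_POINT = 1;

    // Encodes (x, y) as a 21-byte little-endian (NDR) WKB Point.
    static byte[] encodePoint(double x, double y) {
        ByteBuffer buf = ByteBuffer.allocate(21).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 1);     // byte-order flag: 1 = little endian, 0 = big endian
        buf.putInt(WKB_POINT); // geometry type
        buf.putDouble(x);
        buf.putDouble(y);
        return buf.array();
    }

    // Decodes a WKB Point, honoring the byte-order flag in the first byte.
    static double[] decodePoint(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        buf.order(wkb[0] == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        buf.get(); // skip the byte-order flag
        int type = buf.getInt();
        if (type != WKB_POINT) {
            throw new IllegalArgumentException("Not a WKB Point: type " + type);
        }
        return new double[] {buf.getDouble(), buf.getDouble()};
    }

    public static void main(String[] args) {
        // Approximate longitude/latitude of Évora, Portugal.
        byte[] bytes = encodePoint(-7.9135, 38.5714);
        double[] xy = decodePoint(bytes);
        System.out.println(bytes.length + " bytes -> POINT (" + xy[0] + " " + xy[1] + ")");
    }
}
```

Because the bytes carry their own type code and byte order, a reader can validate and decode them without any engine-level spatial type, which is the trade-off of storing geometries in a generic binary column.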

Take all of this with a grain of salt. I'm not an expert on Calcite or GIS.  :-)
Best,
-- C


[1]: https://github.com/apache/drill/tree/master/contrib/udfs
[2]: https://github.com/apache/drill/tree/master/contrib/format-esri


> On Feb 2, 2024, at 01:25, Bertil Chapuis  wrote:
> 
> Hello Jia and Charles,
> 
> I'm really interested in this topic as well. Apache Calcite transitioned 
> from ESRI Geometry to JTS, and many ST functions have been implemented there 
> as well [1, 2, 3]. Sharing experiences and code could benefit all projects.
> 
> I haven’t looked into the details of each project, but from what I 
> understand, Sedona depends on Calcite through Flink, and Drill depends 
> directly on Calcite. Is that correct? Since these functions are in 
> Calcite’s core, they may already be available on the respective 
> class paths.
> 
> Best regards,
> 
> Bertil
> 
> [1] 
> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/runtime/SpatialTypeFunctions.java
> [2] 
> https://github.com/apache/calcite/blob/main/core/src/test/resources/sql/spatial.iq
> [3] 
> https://calcite.apache.org/docs/reference.html#geometry-conversion-functions-2d
> 
> 
>> On 2 Feb 2024, at 06:47, Jia Yu  wrote:
>> 
>> Hi Charles,
>> 
>> This is Jia Yu from Apache Sedona. I think what you did is fantastic.
>> As a project for this joint code sprint, I am proposing to implement a
>> comprehensive set of spatial functions in Apache Drill using Apache
>> Sedona.
>> 
>> Apache Sedona has implemented over 130 ST functions and a
>> high-performance geometry serializer in pure Java. All these functions
>> have been ported to Apache Spark, Apache Flink and Snowflake. They are
>> being downloaded over 1.5 million times per month.
>> 
>> This porting process is fairly simple. Let's take Sedona on Apache
>> Flink as an example:
>> 
>> 1. Call a Sedona Java function in a UDF template:
>> https://github.com/apache/sedona/blob/master/flink/src/main/java/org/apache/sedona/flink/expressions/Functions.java
>> 2. Register this function in a catalog file:
>> https://github.com/apache/sedona/blob/master/flink/src/main/java/org/apache/sedona/flink/Catalog.java
>> 
>> What do you think?
>> 
>> Thanks,
>> Jia
>> 
>> On Thu, Feb 1, 2024 at 2:44 PM Charles Givre  wrote:
>>> 
>>> Hi Martin,
>>> Thanks for sending.  I'd love for Drill to be included in this.  I have a 
>>> question for you.  A while ago, I started work on a collection of UDFs for 
>>> interacting with H3 Geo Indexes.  I'm not an expert on this but would this 
>>> be useful?  Here's the repo: https://github.com/datadistillr/drill-h3-udf   
>>> If someone would like to collaborate to complete this and get it 
>>> integrated, I'm all for that.
>>> Best,
>>> -- C
>>> 
>>> 
>>> 
>>>> On Jan 31, 2024, at 10:20, Martin Desruisseaux wrote:
>>>>
>>>> Hello all
>>>>
>>>> The Open Geospatial Consortium (OGC), The Apache Software Foundation (ASF)
>>>> and The Open Source Geospatial Foundation (OSGeo) are holding a joint code
>>>> sprint on February 26 to 28 [1]. The main goals are to support the
>>>> development of open standards for geospatial information and to support the
>>>> development of free and open source software which implements those
>>>> standards, as well as creating awareness about the standards and software
>>>> projects. This is the fourth year that this joint code sprint has been
>>>> organized, and this year it will be physically located in Évora (Portugal).
>>>> The event can also be attended on-line.
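Jia's two-step porting recipe quoted above (call a Sedona Java function from a UDF template, then register that UDF in a catalog) can be sketched in plain Java. This is a hypothetical illustration under stated assumptions: `CoreFunctions`, `ScalarUdf`, `StArea` and `Catalog` are invented names, and the sketch does not reproduce Sedona's, Flink's or Drill's actual classes. A real Drill port would swap the stand-in wrapper for Drill's own UDF API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the "wrap, then register" porting pattern.
// None of these types correspond to real Sedona, Flink or Drill classes.
class PortingPatternSketch {

    // Engine-independent logic, standing in for a shared Sedona function.
    // Computes the area of a simple polygon with the shoelace formula.
    static final class CoreFunctions {
        static double area(double[] xs, double[] ys) {
            double twiceArea = 0.0;
            for (int i = 0; i < xs.length; i++) {
                int j = (i + 1) % xs.length; // wrap around to close the ring
                twiceArea += xs[i] * ys[j] - xs[j] * ys[i];
            }
            return Math.abs(twiceArea) / 2.0;
        }
    }

    // Minimal stand-in for an engine's scalar UDF contract (hypothetical).
    interface ScalarUdf {
        Object eval(Object... args);
    }

    // Step 1: a thin wrapper that adapts engine arguments and delegates.
    static final class StArea implements ScalarUdf {
        @Override
        public Object eval(Object... args) {
            return CoreFunctions.area((double[]) args[0], (double[]) args[1]);
        }
    }

    // Step 2: a catalog that maps SQL function names to UDF instances.
    static final class Catalog {
        private static final Map<String, ScalarUdf> FUNCS = new LinkedHashMap<>();

        static {
            FUNCS.put("ST_Area", new StArea());
        }

        static ScalarUdf lookup(String name) {
            ScalarUdf f = FUNCS.get(name);
            if (f == null) {
                throw new IllegalArgumentException("Unknown function: " + name);
            }
            return f;
        }
    }

    public static void main(String[] args) {
        double[] xs = {0, 4, 4, 0}; // a 4 x 3 rectangle
        double[] ys = {0, 0, 3, 3};
        System.out.println(Catalog.lookup("ST_Area").eval(xs, ys));
    }
}
```

Keeping the geometry logic in one engine-independent function is what keeps each wrapper thin: every engine only needs its own adapter plus a registration step.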

Re: Joint OGC / ASF / OSGeo codesprint in 3 weeks

2024-02-02 Thread Charles Givre
Hi Jia, 
Thanks for your interest!  I'd be happy to work with you to build out a Drill 
<> Sedona collab.  This sounds really interesting and I think would be a great 
addition to both projects.
With that said, I'm totally unfamiliar with Sedona unfortunately, so I'm not 
sure how much help I can be on that side, but I do know something about 
Drill... :-)
Best,
-- C


> On Feb 2, 2024, at 00:47, Jia Yu  wrote:
> 
> Hi Charles,
> 
> This is Jia Yu from Apache Sedona. I think what you did is fantastic.
> As a project for this joint code sprint, I am proposing to implement a
> comprehensive set of spatial functions in Apache Drill using Apache
> Sedona.
> 
> Apache Sedona has implemented over 130 ST functions and a
> high-performance geometry serializer in pure Java. All these functions
> have been ported to Apache Spark, Apache Flink and Snowflake. They are
> being downloaded over 1.5 million times per month.
> 
> This porting process is fairly simple. Let's take Sedona on Apache
> Flink as an example:
> 
> 1. Call a Sedona Java function in a UDF template:
> https://github.com/apache/sedona/blob/master/flink/src/main/java/org/apache/sedona/flink/expressions/Functions.java
> 2. Register this function in a catalog file:
> https://github.com/apache/sedona/blob/master/flink/src/main/java/org/apache/sedona/flink/Catalog.java
> 
> What do you think?
> 
> Thanks,
> Jia
> 
> On Thu, Feb 1, 2024 at 2:44 PM Charles Givre  wrote:
>> 
>> Hi Martin,
>> Thanks for sending.  I'd love for Drill to be included in this.  I have a 
>> question for you.  A while ago, I started work on a collection of UDFs for 
>> interacting with H3 Geo Indexes.  I'm not an expert on this but would this 
>> be useful?  Here's the repo: https://github.com/datadistillr/drill-h3-udf   
>> If someone would like to collaborate to complete this and get it integrated, 
>> I'm all for that.
>> Best,
>> -- C
>> 
>> 
>> 
>>> On Jan 31, 2024, at 10:20, Martin Desruisseaux wrote:
>>> 
>>> Hello all
>>> 
>>> The Open Geospatial Consortium (OGC), The Apache Software Foundation (ASF)
>>> and The Open Source Geospatial Foundation (OSGeo) are holding a joint code
>>> sprint on February 26 to 28 [1]. The main goals are to support the
>>> development of open standards for geospatial information and to support the
>>> development of free and open source software which implements those
>>> standards, as well as creating awareness about the standards and software
>>> projects. This is the fourth year that this joint code sprint has been
>>> organized, and this year it will be physically located in Évora (Portugal).
>>> The event can also be attended on-line. Registration is free [2].
>>> 
>>> The Apache SIS, Sedona, Baremaps, Parquet, Drill and Camel projects have
>>> participated in the past. It would be great if they could participate
>>> this year too. Some ideas could be:
>>> 
>>> * Experiment with the use of Apache SIS in Sedona for referencing and grid
>>>  coverage services (could be a joint effort between Sedona and SIS
>>>  developers).
>>> * Any work related to GeoParquet [3] (an incubating OGC standard based
>>>  on Apache Parquet).
>>> * Any work related to Drill GIS functions [4].
>>> * Any work related to Camel Geocoder [5]. For example, exploring the
>>>  relevance of the ISO 19112 standard (could be a joint effort
>>>  between Camel and GeoAPI developers).
>>> 
>>> If anyone is interested, the wiki page [1] can be edited directly. In
>>> particular, you can add your project in the "Which Apache projects are going
>>> to participate?" section. If an introduction to a project can be presented
>>> as a tutorial, it can also be added in the "Mentor streams" section of [1].
>>> 
>>> Martin
>>>
>>> [1] https://github.com/opengeospatial/developer-events/wiki/2024-Joint-OGC-%E2%80%93-OSGeo-%E2%80%93-ASF-Code-Sprint
>>> [2] https://developer.ogc.org/sprints/23/
>>> [3] https://geoparquet.org/
>>> [4] https://drill.apache.org/docs/gis-functions/
>>> [5] https://camel.apache.org/components/4.0.x/geocoder-component.html
>> 



Re: Joint OGC / ASF / OSGeo codesprint in 3 weeks

2024-02-02 Thread Martin Desruisseaux

Hello Charles

On 2024-02-01 at 23:43, Charles Givre wrote:

> I'd love for Drill to be included in this.

That would be great. You are welcome to add Drill to the wiki page at [1].

> A while ago, I started work on a collection of UDFs for interacting
> with H3 Geo Indexes. I'm not an expert on this but would this be useful?

I'm not an expert on this either, but H3 Geo Indexes seem to me to be 
within the scope of OGC Discrete Global Grid Systems (DGGS) [2], don't they? 
DGGS is indeed something that we will probably need to support sooner or 
later, and it seems to me that H3 could be an implementation of DGGS. On 
the Apache SIS side, we may start some DGGS work later this year (we are not 
sure yet), but it would not be in time for the February code sprint.


    Martin

[1] https://github.com/opengeospatial/developer-events/wiki/2024-Joint-OGC-%E2%80%93-OSGeo-%E2%80%93-ASF-Code-Sprint#which-apache-projects-are-going-to-participate
[2] https://ogcapi.ogc.org/dggs/