Re: [Suggest] Add geo function to core

2023-01-17 Thread Bjørn Jørgensen
Mosaic by Databricks Labs 




Re: [Suggest] Add geo function to core

2023-01-17 Thread Grigory Pomadchin
Hey Mo,

That is awesome, great to hear!

Best,

Grigory


Re: [Suggest] Add geo function to core

2023-01-17 Thread Mo Sarwat
Grigory,

Thanks a lot for chiming in - I really like the PostGIS to PostgreSQL analogy.
That is exactly what Sedona (an Apache project) is to Spark. Spark core should 
remain light / generic enough (similar to PostgreSQL) and all spatial 
functionalities should be pluggable extensions (Sedona). Otherwise, the core 
will be unnecessarily heavy to maintain, release, and integrate. 

Sedona already supports geo-hashing among many other standard geospatial
functions, which work seamlessly with Spark without any issues for the end
user. If there is something missing, I would highly recommend that we bring it
to the Sedona community, and that will directly benefit the Spark users who
are doing geo.

Implementing geospatial functionality in Spark core would replicate work
already done. Databricks, for instance, already uses Sedona internally for
their geospatial capabilities.

Finally, I would like to mention that I am totally willing to be corrected on
that, especially if you have tried Sedona with Spark and found that it does not
serve the purpose at all. But please try it first, and let's come up with a few
capabilities it cannot provide unless they are implemented in Spark core. Then
we can suggest those capabilities to the Spark community.

Thanks,
-Mo
 


Re: [Suggest] Add geo function to core

2023-01-16 Thread Grigory Pomadchin
Hey folks,

Traditionally, GIS functionality is distributed separately from the core
system; PostGIS is a great example. For GIS needs, Sedona / GeoMesa / GeoWave
may indeed work out. I think GeoMesa implements GeoHash (see
https://www.geomesa.org/documentation/stable/user/spark/sparksql_functions.html;
it could be used as inspiration at least).

I'm pretty sure Databricks provides some GIS functions (H3) at this point.
Could that be an argument for having something in the core / officially
supported by the Spark community?

I'd really love to see a relatively lightweight (JTS + Proj4j / SIS) library
in the wild, with basic expressions and optimization rules, that is usable
primarily through the Spark native interfaces, so there is no need to figure
out the API, how to set it up, or how to resolve peculiar dependencies. It
could be a step towards Spark GIS types standardization.

Best,

Grigory



Re: [Suggest] Add geo function to core

2023-01-16 Thread Mo Sarwat
Martin, thanks for chiming in and mentioning Apache SIS. However, Mark was 
asking about Geo in Spark, which Sedona already supports. 

Yet, I like the idea of keeping all dependencies within the Apache family. I
believe a good solution would be for you (or the SIS community at large) to
include Apache SIS in Sedona to replace libs like GeoTools. The Sedona
community would definitely welcome your contribution :)

Regards,
-Mo


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [Suggest] Add geo function to core

2023-01-16 Thread Martin Desruisseaux
Hello Mark

Indeed Sedona is surely a serious candidate. One aspect to take into
consideration, depending on how "core" the geospatial services would be, is
that Sedona depends on an LGPL library (GeoTools, bundled separately) for map
projections, Shapefile and GeoTIFF support. So those features could not be in
core, since category X dependencies must be optional.

Regarding referencing by coordinates (including map projections), I'm aware of 
3 libraries having a license compatible with Apache:

* Apache SIS (Apache License)
* PROJ4J (Apache license)
* PROJ-JNI (MIT license)

PROJ-JNI is a binding to the PROJ native library using the Java Native
Interface (JNI). PROJ is the best-known map projection library, but it is
difficult to bundle native code in a Java application.

I'm not in a neutral position to say that, but I believe that Apache SIS is
the most powerful open source pure-Java referencing library. But it is
relatively big, about 4 MB for the referencing module with its dependencies,
not counting the optional EPSG geodetic dataset (because it is not compatible
with the Apache license). Apache SIS is not the library with the largest number
of map projections (PROJ4J has more), but it handles some difficult problems
and scales well with three- or four-dimensional data (or more).

PROJ4J is a lightweight library which may be sufficient if data are mostly
two-dimensional (limited 3D support also seems possible) and if an uncertainty
of a few metres in coordinate transformations (depending on how datum shifts
are specified) is acceptable.

It is possible to write some code in an implementation-independent way using 
GeoAPI interfaces, which aim to do what JDBC interfaces do for databases. 
Apache SIS and PROJ-JNI are implementations of GeoAPI interfaces, so by using 
those interfaces you can let users choose among those two implementations. I 
think that GeoAPI wrappers could easily be contributed to PROJ4J as well if 
there is a desire for that.
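
The implementation-independent pattern described above can be sketched in
Python as follows. This is only an illustration of the idea: the names
CoordinateTransform, PlateCarree, SphericalMercator and project_all are
hypothetical and are not part of GeoAPI, Apache SIS, PROJ-JNI or PROJ4J.

```python
import math
from typing import Iterable, List, Protocol, Tuple


class CoordinateTransform(Protocol):
    """Hypothetical interface: maps (lon, lat) in degrees to projected (x, y)."""

    def transform(self, lon: float, lat: float) -> Tuple[float, float]:
        ...


class PlateCarree:
    """Trivial backend: uses degrees directly as planar coordinates."""

    def transform(self, lon: float, lat: float) -> Tuple[float, float]:
        return (lon, lat)


class SphericalMercator:
    """Second backend: spherical Mercator projection, output in metres."""

    RADIUS = 6378137.0  # sphere radius in metres (assumption for this sketch)

    def transform(self, lon: float, lat: float) -> Tuple[float, float]:
        x = self.RADIUS * math.radians(lon)
        y = self.RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
        return (x, y)


def project_all(
    points: Iterable[Tuple[float, float]], transform: CoordinateTransform
) -> List[Tuple[float, float]]:
    """Caller code depends only on the interface, not on a concrete backend."""
    return [transform.transform(lon, lat) for lon, lat in points]
```

Code such as project_all never names a concrete backend, so switching from
PlateCarree to SphericalMercator (or to any other class with the same method)
requires no change on the calling side.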

Regarding Geohash, if we are talking about the algorithm described at
https://en.wikipedia.org/wiki/Geohash, then SIS already supports it. SIS
also supports the Military Grid Reference System (MGRS), which can be seen as
another kind of geohash with better characteristics.
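
For readers unfamiliar with the Geohash algorithm referenced above, here is a
minimal pure-Python sketch of the encoding step. It is not the SIS, Sedona or
GeoMesa implementation; the function name geohash_encode and its default
precision are assumptions made for this example.

```python
# Geohash base-32 alphabet: digits plus lowercase letters, without a, i, l, o.
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair (in degrees) as a Geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # bits collected so far for the current character
    bit_count = 0   # how many bits are in `bits`
    use_lon = True  # bits are interleaved, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits form one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

Each extra character halves the cell several more times, so a shorter hash of
the same point is always a prefix of a longer one. For example,
geohash_encode(57.64911, 10.40744, 5) returns "u4pru".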

Regards,

Martin




Re: [Suggest] Add geo function to core

2023-01-16 Thread Mo Sarwat
Mark,

There is already another Apache project (namely Apache Sedona) that provides
comprehensive support for geospatial operations in Spark. Please check it out:

GitHub: https://github.com/apache/sedona
Website: https://sedona.apache.org

Please feel free to contribute more geospatial functions to Sedona too!

Regards,
-Mo

On 2023/01/06 18:03:38 Mark Andreev wrote:
> Hi,
> 
> I suggest adding geographical functions to Spark core, like ClickHouse has (
> https://clickhouse.com/docs/en/sql-reference/functions/geo/).
> 
> - Geographical Coordinates Functions
> - Geohash Functions
> - H3 Indexes
> - S2 Indexes
> 
> What do you think? What is the current policy on core evolution? Should we
> create a separate module (a standalone repository outside of Apache) and,
> once it proves successful, merge it into the main branch?
> 
> --
> Best regards,
> Mark Andreev
> 
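
As an illustration of the "Geographical Coordinates Functions" category in the
quoted list, a great-circle distance can be sketched in plain Python with the
haversine formula. The function name, the argument order and the radius
constant are assumptions made for this example, not taken from Spark, Sedona
or ClickHouse.

```python
import math

# Mean Earth radius in metres; a spherical approximation (assumption).
EARTH_RADIUS_M = 6371008.8


def great_circle_distance(lon1: float, lat1: float,
                          lon2: float, lat2: float) -> float:
    """Haversine distance in metres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))
```

Because the Earth is treated as a sphere here, results can differ from
ellipsoidal calculations by a fraction of a percent.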
