[jira] [Commented] (SEDONA-36) Parquet Support in Sedona
[ https://issues.apache.org/jira/browse/SEDONA-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17348954#comment-17348954 ] Jia Yu commented on SEDONA-36: -- I believe this is the same issue as in SEDONA-14: https://issues.apache.org/jira/browse/SEDONA-14 This is due to a bug in Sedona UDT. It has been solved in 1.0.1. The release of v1.0.1 is in the voting phase of ASF incubator. Will be out in one or two weeks. > Parquet Support in Sedona > - > > Key: SEDONA-36 > URL: https://issues.apache.org/jira/browse/SEDONA-36 > Project: Apache Sedona > Issue Type: New Feature >Reporter: Swaminathan Balachandran >Priority: Normal > > Description: > Implement Parquet Readers & Writers for Spatial RDD & Dataframes in Sedona. > DOD: > Go Live -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (SEDONA-36) Parquet Support in Sedona
Swaminathan Balachandran created SEDONA-36: -- Summary: Parquet Support in Sedona Key: SEDONA-36 URL: https://issues.apache.org/jira/browse/SEDONA-36 Project: Apache Sedona Issue Type: New Feature Reporter: Swaminathan Balachandran Description: Implement Parquet Readers & Writers in Sedona. DOD: Go Live -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [incubator-sedona] yitao-li commented on a change in pull request #521: [SEDONA-31] R interface for Apache Sedona
yitao-li commented on a change in pull request #521: URL: https://github.com/apache/incubator-sedona/pull/521#discussion_r636424058 ## File path: R/sparklyr.sedona/inst/extdata/arealm-tiny.csv ## @@ -0,0 +1,5 @@ +testattribute0,-88.331492,32.324142,testattribute1,testattribute2 Review comment: Sure. Will do. Besides test data, there are also a small number of other data files that will be part of the R package for illustration purposes [*]. I think I can simply replace the content of those with random numbers and strings (i.e., avoid anything with copyright issues). [*] Illustration purposes: because nowadays CRAN requires R packages to have runnable examples, and part of `sparklyr.sedona` involves reading spatial files of various formats, a small number of example data files will need to be packaged into `sparklyr.sedona`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-sedona] yitao-li commented on a change in pull request #521: [SEDONA-31] R interface for Apache Sedona
yitao-li commented on a change in pull request #521: URL: https://github.com/apache/incubator-sedona/pull/521#discussion_r636462051 ## File path: core/src/test/resources/points.json ## @@ -0,0 +1,7 @@ +{"type":"Feature","geometry":{"type":"Point","coordinates":[-88.1234,32.]},"properties":{"FIELD1":"testattribute0","FIELD4":"testattribute1","FIELD5":"testattribute2"}}, Review comment: This file contains some random numbers (so there is no copyright issue here). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [VOTE] Release Apache Sedona 1.0.1-incubating-rc1
1. You probably shouldn't put .md doc on release share - if I understand the suggestion is to include it into the src.tar.gz (and into the source repo) so that they are git tagged and packaged together
2. Do you have java artifacts that should be staged?
3. checklist
   incubating in name
   signature and hash fine
   DISCLAIMER is fine
   LICENSE and NOTICE are fine
   No unexpected binary files
   All source have ASF headers
4. what is /docs/archive for?

On Wed, May 19, 2021 at 2:41 PM Adam Binford wrote:

> +1 (non-binding)
>
> On Wed, May 19, 2021 at 12:58 AM Jia Yu wrote:
>
> > Hi all,
> >
> > This is a call for vote on Apache Sedona 1.0.1-incubating-rc1. Please refer
> > to the changes listed at the bottom of this email.
> >
> > Release notes:
> > https://github.com/apache/incubator-sedona/blob/sedona-1.0.1-incubating-rc1/docs/download/release-notes.md
> >
> > Build instructions:
> > https://github.com/apache/incubator-sedona/blob/sedona-1.0.1-incubating-rc1/docs/download/compile.md
> >
> > GitHub tag:
> > https://github.com/apache/incubator-sedona/releases/tag/sedona-1.0.1-incubating-rc1
> >
> > GPG public key to verify the Release:
> > https://dist.apache.org/repos/dist/dev/incubator/sedona/KEYS
> >
> > Source code and binaries:
> > https://dist.apache.org/repos/dist/dev/incubator/sedona/1.0.1-incubating-rc1/
> >
> > The vote will be open for at least 72 hours or until a majority of at least
> > 3 +1 PMC votes are cast
> >
> > Please vote accordingly:
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove with the reason
> >
> > Checklist for reference (because of DISCLAIMER-WIP, other checklist items
> > are not blockers):
> >
> > [ ] Download links are valid.
> >
> > [ ] Checksums and PGP signatures are valid.
> >
> > [ ] DISCLAIMER is included.
> >
> > [ ] Source code artifacts have correct names matching the current release.
> >
> > For a detailed checklist please refer to:
> > https://cwiki.apache.org/confluence/display/INCUBATOR/Incubator+Release+Checklist
> >
> > Changes according to the comments of Justin Mclean on the 1.0.0-incubating
> > release
> > Original comment URL:
> > https://lists.apache.org/thread.html/r828873cbb2685dcfb0719680f3aac6dbf982720fcd9cd5f69a26ec55%40%3Cgeneral.incubator.apache.org%3E
> >
> > 1. There are some test files I think I like to know where they come from
> > e.g. county_small_wkb.tsv and what license the contents are under.
> >
> > License for test data has been added to Sedona license
> > https://github.com/apache/incubator-sedona/blob/sedona-1.0.1-incubating-rc1/LICENSE
> >
> > 2. The LICENSE here seems odd why does it have "Copyright (c) 2019-2020,
> > Apache Sedona" in it?
> >
> > The issue has been fixed. The license for zeppelin plugin has been added to
> > LICENSE as well.
> > https://github.com/apache/incubator-sedona/blob/sedona-1.0.1-incubating-rc1/LICENSE
> >
> > 3. Please put instructions on how to build in the release, instructions can
> > change over time so pointing to a URL may not be helpful when trying to
> > build older versions.
> >
> > Build instruction has been added to svn/dist/dev
> > https://dist.apache.org/repos/dist/dev/incubator/sedona/1.0.1-incubating-rc1/
>
> --
> Adam Binford
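The "Checksums and PGP signatures are valid" item in the checklist above can be partly scripted. What follows is a minimal sketch in Python of the checksum half only, assuming the release directory ships a `.sha512` file whose first whitespace-separated token is the hex digest (the `<digest>  <file name>` layout). The function names `sha512_of` and `checksum_matches` are hypothetical, and the PGP signature still has to be checked separately against the KEYS file.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-512 digest of the file at `path`."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        # Read in chunks so large release tarballs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: Path, checksum_file: Path) -> bool:
    """Compare an artifact with a checksum file (hypothetical helper).

    Assumption: the expected digest is the first whitespace-separated
    token in the checksum file; other layouts are not handled here.
    """
    tokens = checksum_file.read_text().split()
    if not tokens:
        return False
    return sha512_of(artifact) == tokens[0].lower()
```

A voter would call `checksum_matches(Path("<artifact>"), Path("<artifact>.sha512"))` for each downloaded artifact; a `False` result only says the digest differs or the file uses a layout this sketch does not parse.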
[GitHub] [incubator-sedona] lorenzwalthert commented on pull request #521: [SEDONA-31] R interface for Apache Sedona
lorenzwalthert commented on pull request #521: URL: https://github.com/apache/incubator-sedona/pull/521#issuecomment-845242643 @yitao-li great to see this happening. Maybe a good moment to run {styler} through it before merging this? I only noted a few deviations from the style guide though... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-sedona] yitao-li commented on a change in pull request #521: [SEDONA-31] R interface for Apache Sedona
yitao-li commented on a change in pull request #521: URL: https://github.com/apache/incubator-sedona/pull/521#discussion_r636115842 ## File path: R/sparklyr.sedona/DESCRIPTION ## @@ -0,0 +1,29 @@ +Type: Package +Package: sparklyr.sedona +Title: Sparklyr Extension for Apache Sedona +Version: 0.1.0 +Authors@R: Review comment: If it's a mailing list then that should be fine. I imagine we will vote before starting a release of this R package to CRAN anyways. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-sedona] jiayuasu edited a comment on pull request #521: [SEDONA-31] R interface for Apache Sedona
jiayuasu edited a comment on pull request #521: URL: https://github.com/apache/incubator-sedona/pull/521#issuecomment-844830313 @yitao-li Thank you again for this great PR. Please see my comments above. I believe the two current TO-DOs for this PR are (1) re-use existing test data in core/src/test/resources by referring to their relative paths, and (2) put the R docs in a separate PR, since we may need to have some discussion on them. Moreover, once this R module gets merged into Apache Sedona, I would like to invite you to be a PMC member of Apache Sedona (of course, this needs to pass the community voting phase). You will be in charge of the R module in Sedona. Currently, we don't have any R experts in the Sedona PMC. We already have several great members on board: @Sarwat and I are the initial architects of Sedona. @jinxuan, @zongsizhang and @netanel246 contribute to the Sedona architecture, Shapefile reader and serializer. @Imbruced is in charge of Sedona Python. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-sedona] jiayuasu commented on pull request #521: [SEDONA-31] R interface for Apache Sedona
jiayuasu commented on pull request #521: URL: https://github.com/apache/incubator-sedona/pull/521#issuecomment-844830313 @yitao-li Thank you again for this great PR. Please see my comments above. I believe the two current TO-DOs for this PR are (1) re-use existing test data in core/src/test/resources by referring to their relative paths, and (2) put the R docs in a separate PR, since we may need to have some discussion on them. Moreover, once this R module gets merged into Apache Sedona, I would like to invite you to be a PMC member of Apache Sedona (of course, this needs to pass the community voting phase). We already have several great members on board: @Sarwat and I are the initial architects of Sedona. @jinxuan, @zongsizhang and @netanel246 contribute to the Sedona architecture, Shapefile reader and serializer. @Imbruced is in charge of Sedona Python. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-sedona] jiayuasu commented on a change in pull request #521: [SEDONA-31] R interface for Apache Sedona
jiayuasu commented on a change in pull request #521: URL: https://github.com/apache/incubator-sedona/pull/521#discussion_r635837137 ## File path: .github/workflows/r.yml ## @@ -0,0 +1,129 @@ +name: R build + +on: + push: +branches: + - master + pull_request: +branches: + - '*' + +jobs: + build: + +runs-on: ubuntu-18.04 +strategy: + fail-fast: false + matrix: +include: + - name: 'Spark 3.0.1 (R release)' +r: 'release' +env: + SPARK_VERSION: '3.0.1' Review comment: Could you please upgrade the CI to support Spark 3.1.1? Please see our latest CI example https://github.com/apache/incubator-sedona/blob/master/.github/workflows/java.yml#L43 ## File path: R/sparklyr.sedona/DESCRIPTION ## @@ -0,0 +1,29 @@ +Type: Package +Package: sparklyr.sedona +Title: Sparklyr Extension for Apache Sedona +Version: 0.1.0 +Authors@R: Review comment: Could you change the maintainer name to Apache Sedona? See here: https://github.com/apache/incubator-sedona/blob/master/python/setup.py#L33 ## File path: R/sparklyr.sedona/inst/extdata/arealm-tiny.csv ## @@ -0,0 +1,5 @@ +testattribute0,-88.331492,32.324142,testattribute1,testattribute2 Review comment: Can you re-use the test data provided in https://github.com/apache/incubator-sedona/tree/master/core/src/test/resources, by referring to their relative path? Currently Sedona has too many test data files, which causes both maintenance and copyright issues. We plan to reuse and remove some of them in the future. See [SEDONA-34](https://issues.apache.org/jira/browse/SEDONA-34) ## File path: R/sparklyr.sedona/man/sedona_read_geojson_to_typed_rdd.Rd ## @@ -0,0 +1,64 @@ +% Generated by roxygen2: do not edit by hand Review comment: For the documentation, please do not include generated docs in the PR. Please include the source code of the docs in the PR.
Generated R docs will be manually integrated with the generated html files (based on incubator-sedona/docs/) and uploaded to https://github.com/apache/incubator-sedona-website/tree/asf-site I believe the R docs themselves could be a separate PR, since they require some structural or documentation changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [incubator-sedona] jiayuasu commented on a change in pull request #528: [Sedona-27] Add ST_Subdivide and ST_SubdivideExplode functions.
jiayuasu commented on a change in pull request #528: URL: https://github.com/apache/incubator-sedona/pull/528#discussion_r635828960 ## File path: sql/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/subdivide/implicits.scala ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.spark.sql.sedona_sql.expressions.subdivide + +import org.locationtech.jts.geom.Geometry + Review comment: There is an implicits.scala in the directory that is one level up. It basically provides the same functionalities. Will you be able to merge them together? ## File path: docs/api/sql/Function.md ## @@ -612,3 +612,64 @@ Spark SQL example: ```SQL SELECT ST_MinimumBoundingCircle(ST_GeomFromText('POLYGON((1 1,0 0, -1 1, 1 1))')) ``` + +## ST_SubDivide + +Introduction: Returns list of geometries divided based of given maximum number of vertices. 
+ +Format: `ST_SubDivide(geom: geometry, maxVertices: int)` + +Since: `v1.0.2` + +Spark SQL example: +```SQL +SELECT ST_SubDivide(ST_GeomFromText("POLYGON((35 10, 45 45, 15 40, 10 20, 35 10), (20 30, 35 35, 30 20, 20 30))"), 5) + +``` + +Output: +``` +[ +POLYGON((37.857142857142854 20, 35 10, 10 20, 37.857142857142854 20)), +POLYGON((15 20, 10 20, 15 40, 15 20)), +POLYGON((20 20, 15 20, 15 30, 20 30, 20 20)), +POLYGON((26.428571428571427 20, 20 20, 20 30, 26.4285714 23.5714285, 26.4285714 20)), +POLYGON((15 30, 15 40, 20 40, 20 30, 15 30)), +POLYGON((20 40, 26.4285714 40, 26.4285714 32.1428571, 20 30, 20 40)), +POLYGON((37.8571428 20, 30 20, 34.0476190 32.1428571, 37.8571428 32.1428571, 37.8571428 20)), +POLYGON((34.0476190 34.6825396, 26.4285714 32.1428571, 26.4285714 40, 34.0476190 40, 34.0476190 34.6825396)), +POLYGON((34.0476190 32.1428571, 35 35, 37.8571428 35, 37.8571428 32.1428571, 34.0476190 32.1428571)), +POLYGON((35 35, 34.0476190 34.6825396, 34.0476190 35, 35 35)), +POLYGON((34.0476190 35, 34.0476190 40, 37.8571428 40, 37.8571428 35, 34.0476190 35)), +POLYGON((30 20, 26.4285714 20, 26.4285714 23.5714285, 30 20)), +POLYGON((15 40, 37.8571428 43.8095238, 37.8571428 40, 15 40)), +POLYGON((45 45, 37.8571428 20, 37.8571428 43.8095238, 45 45)) +] +``` + +Spark SQL example: + +```SQL +SELECT ST_SubDivide(ST_GeomFromText("LINESTRING(0 0, 85 85, 100 100, 120 120, 21 21, 10 10, 5 5)"), 5) +``` + +Output: +``` +[ +LINESTRING(0 0, 5 5) +LINESTRING(5 5, 10 10) +LINESTRING(10 10, 21 21) +LINESTRING(21 21, 60 60) +LINESTRING(60 60, 85 85) +LINESTRING(85 85, 100 100) +LINESTRING(100 100, 120 120) +] +``` + +## ST_SubDivideExplode + +Introduction: It works the same as ST_SubDivide but returns new rows with geometries instead of list. + +Format: `ST_SubDivideExplode(geom: geometry, maxVertices: int)` Review comment: Can you add an example usage for this? Probably both in Spark Scala DSL and pure SQL "lateral view"? 
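The difference between the list-returning `ST_SubDivide` and the row-generating `ST_SubDivideExplode` described in the diff above can be pictured without Spark. The sketch below is plain Python and is not Sedona code: `explode`, `Row`, and the sample rows are hypothetical names and data chosen for illustration, and the WKT strings stand in for real geometry objects.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row shape: (id, list of WKT parts), as a list-returning
# subdivide function would produce for each input geometry.
Row = Tuple[int, List[str]]


def explode(rows: Iterable[Row]) -> Iterator[Tuple[int, str]]:
    """Turn each (id, [part, ...]) row into one (id, part) row per part."""
    for row_id, parts in rows:
        for part in parts:
            yield row_id, part


# Illustrative data only; not the output of any Sedona function.
array_style = [
    (1, ["LINESTRING(0 0, 5 5)", "LINESTRING(5 5, 10 10)"]),
    (2, ["LINESTRING(10 10, 21 21)"]),
]
exploded = list(explode(array_style))
```

In this sketch a row whose list is empty produces no output rows, which matches how a lateral view without `OUTER` treats a generator that emits nothing in Spark SQL.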
## File path: sql/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/Functions.scala ## @@ -1122,4 +1123,48 @@ case class ST_FlipCoordinates(inputExpressions: Seq[Expression]) override def dataType: DataType = GeometryUDT override def children: Seq[Expression] = inputExpressions +} + + +case class ST_SubDivide(inputExpressions: Seq[Expression]) + extends Expression with CodegenFallback { + override def nullable: Boolean = true + + override def eval(input: InternalRow): Any = { +inputExpressions.validateLength(2) +val geometryRaw = inputExpressions.head +val maxVerticesRaw = inputExpressions(1) +geometryRaw.toGeometry(input) match { + case geom: Geometry => ArrayData.toArrayData( +GeometrySubDivider.subDivide(geom, maxVerticesRaw.toInt(input)).map(_.toGenericArrayData) + ) + case null => null +} + + } + + override def dataType: DataType = ArrayType(GeometryUDT) + + override def children: Seq[Expression] = inputExpressions +} + +case class ST_SubDivideExplode(children: Seq[Expression]) extends Generator { Review comment: I noticed that in Spark 3.1, the behavior of generator seems to be changed in "Lateral view". Can you add one test