[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22063 **[Test build #95106 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95106/testReport)** for PR 22063 at commit [`e7abb67`](https://github.com/apache/spark/commit/e7abb67bcb66b21a41818e435b8ec848df62edd8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2446/ Test PASSed.
[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Merged build finished. Test PASSed.
[GitHub] spark pull request #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22121#discussion_r211985668 --- Diff: docs/avro-data-source-guide.md --- @@ -0,0 +1,377 @@ +--- +layout: global +title: Apache Avro Data Source Guide +--- + +* This will become a table of contents (this text will be scraped). +{:toc} + +Since Spark 2.4 release, [Spark SQL](https://spark.apache.org/docs/latest/sql-programming-guide.html) provides built-in support for reading and writing Apache Avro data. + +## Deploying +The `spark-avro` module is external and not included in `spark-submit` or `spark-shell` by default. + +As with any Spark applications, `spark-submit` is used to launch your application. `spark-avro_{{site.SCALA_BINARY_VERSION}}` +and its dependencies can be directly added to `spark-submit` using `--packages`, such as, + +./bin/spark-submit --packages org.apache.spark:spark-avro_{{site.SCALA_BINARY_VERSION}}:{{site.SPARK_VERSION_SHORT}} ... + +For experimenting on `spark-shell`, you can also use `--packages` to add `org.apache.spark:spark-avro_{{site.SCALA_BINARY_VERSION}}` and its dependencies directly, + +./bin/spark-shell --packages org.apache.spark:spark-avro_{{site.SCALA_BINARY_VERSION}}:{{site.SPARK_VERSION_SHORT}} ... + +See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Load and Save Functions + +Since `spark-avro` module is external, there is no `.avro` API in +`DataFrameReader` or `DataFrameWriter`. + +To load/save data in Avro format, you need to specify the data source option `format` as `avro`(or `org.apache.spark.sql.avro`). 
+ + +{% highlight scala %} + +val usersDF = spark.read.format("avro").load("examples/src/main/resources/users.avro") +usersDF.select("name", "favorite_color").write.format("avro").save("namesAndFavColors.avro") + +{% endhighlight %} + + +{% highlight java %} + +Dataset<Row> usersDF = spark.read().format("avro").load("examples/src/main/resources/users.avro"); +usersDF.select("name", "favorite_color").write().format("avro").save("namesAndFavColors.avro"); + +{% endhighlight %} + + +{% highlight python %} + +df = spark.read.format("avro").load("examples/src/main/resources/users.avro") +df.select("name", "favorite_color").write.format("avro").save("namesAndFavColors.avro") + +{% endhighlight %} + + +{% highlight r %} + +df <- read.df("examples/src/main/resources/users.avro", "avro") +write.df(select(df, "name", "favorite_color"), "namesAndFavColors.avro", "avro") + +{% endhighlight %} + + + +## to_avro() and from_avro() +Spark SQL provides function `to_avro` to encode a struct as a string and `from_avro()` to retrieve the struct as a complex type. + +Using Avro record as columns are useful when reading from or writing to a streaming source like Kafka. Each +Kafka key-value record will be augmented with some metadata, such as the ingestion timestamp into Kafka, the offset in Kafka, etc. +* If the "value" field that contains your data is in Avro, you could use `from_avro()` to extract your data, enrich it, clean it, and then push it downstream to Kafka again or write it out to a file. +* `to_avro()` can be used to turn structs into Avro records. This method is particularly useful when you would like to re-encode multiple columns into a single one when writing data out to Kafka. + +Both methods are presently only available in Scala and Java. --- End diff -- Do not use `presently`, we should say `As of Spark 2.4, ...`
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211985537 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { --- End diff -- @mgaido91 How can I say "no" to more testing :-)? I will add the tests. Thanks !!
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22140 AFAIC, the fix should forbid passing illegal extra values. If there are fewer values than fields, accessing a missing field should raise an `AttributeError`, as the current implementation does, rather than being banned here. What do you think? :) @HyukjinKwon @BryanCutler Thanks.
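The behavior proposed here can be sketched with a hypothetical `RowFactory` class (plain Python, not PySpark's actual `Row`): extra values are rejected when the row is created, while missing values are tolerated and only raise `AttributeError` when the absent field is read. The class names and error message are assumptions for illustration.

```python
class RowFactory:
    """Hypothetical stand-in for a custom row class such as Row("name", "age")."""

    def __init__(self, *fields):
        self.fields = fields

    def __call__(self, *values):
        # Reject illegal extra values eagerly, at construction time.
        if len(values) > len(self.fields):
            raise ValueError(
                "expected at most %d values but got %d" % (len(self.fields), len(values)))
        # Fewer values are allowed: the missing fields are simply left unset.
        return _Record(dict(zip(self.fields, values)))


class _Record:
    def __init__(self, data):
        self._data = data

    def __getattr__(self, item):
        # Only called when normal attribute lookup fails.
        try:
            return self._data[item]
        except KeyError:
            raise AttributeError(item) from None
```

Keeping the "fewer values" case lazy preserves the existing behavior, so only the newly illegal case changes.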
[GitHub] spark pull request #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22121#discussion_r211985059 --- Diff: docs/avro-data-source-guide.md --- @@ -0,0 +1,377 @@ ... +## to_avro() and from_avro() +Spark SQL provides function `to_avro` to encode a struct as a string and `from_avro()` to retrieve the struct as a complex type. --- End diff -- `encode a struct as a string`, I think it's not "string", but "binary"?
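To illustrate why the encoded value is binary rather than a string, here is a minimal sketch of Avro's binary encoding, following the Avro specification (ints and longs as zig-zag variable-length integers; strings as a length followed by UTF-8 bytes; record fields concatenated in schema order). The record schema with a `string` field and an `int` field is hypothetical, and this is not the `spark-avro` implementation.

```python
def zigzag(n: int) -> int:
    # Map signed 64-bit values to unsigned: 0->0, -1->1, 1->2, -2->3, ...
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    # Avro int/long: zig-zag, then base-128 varint, low 7 bits first.
    z = zigzag(n)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # set the continuation bit
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(name: str, favorite_number: int) -> bytes:
    # Hypothetical record {name: string, favorite_number: int}:
    # fields are written in schema order, with no names or separators.
    return encode_string(name) + encode_long(favorite_number)
```

For example, `encode_user("Ann", 256)` yields the raw bytes `b"\x06Ann\x80\x04"`, which is not a printable string in general.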
[GitHub] spark pull request #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22121#discussion_r211984616 --- Diff: docs/avro-data-source-guide.md --- @@ -0,0 +1,377 @@ ... +## to_avro() and from_avro() +Spark SQL provides function `to_avro` to encode a struct as a string and `from_avro()` to retrieve the struct as a complex type. --- End diff -- not "Spark SQL", it should be "The Avro package"
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22121 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95105/ Test PASSed.
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22121 Merged build finished. Test PASSed.
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22121 **[Test build #95105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95105/testReport)** for PR 22121 at commit [`8da8250`](https://github.com/apache/spark/commit/8da82506e06e36d63bf91fdda194a866f2d977ea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22181: [SPARK-25163][SQL] Fix flaky test: o.a.s.util.collection...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22181 good catch! LGTM
[GitHub] spark issue #22165: [SPARK-25017][Core] Add test suite for BarrierCoordinato...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22165 My pleasure, I just found this while glancing over JIRA in recent days. :)
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211979620 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { --- End diff -- yes, thanks, but that doesn't test when the outer values are null, right? I think it would also be good to have cases with: - more than 2 attributes; - the outer values being null; - complex data types involved (e.g. structs). What do you think?
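When building these extra cases, an independent oracle for SQL `NOT IN` under three-valued logic can help decide the expected results before comparing with other RDBMSs. This is a hedged Python sketch, not Spark code: `None` stands in for SQL NULL, tuples stand in for multi-attribute values, and all function names are hypothetical.

```python
from typing import Iterable, Optional, Sequence, Tuple

# None plays the role of SQL NULL; results are True, False or None (UNKNOWN).
Truth = Optional[bool]


def sql_eq(a: object, b: object) -> Truth:
    # Any comparison involving NULL is UNKNOWN.
    if a is None or b is None:
        return None
    return a == b


def sql_and(xs: Iterable[Truth]) -> Truth:
    xs = list(xs)
    if any(x is False for x in xs):
        return False
    return None if any(x is None for x in xs) else True


def sql_or(xs: Iterable[Truth]) -> Truth:
    xs = list(xs)
    if any(x is True for x in xs):
        return True
    return None if any(x is None for x in xs) else False


def sql_not_in(outer: Tuple[object, ...], rows: Sequence[Tuple[object, ...]]) -> Truth:
    # (a1, ..., an) IN (rows) == OR over rows of AND over columns of (ai = ri).
    in_result = sql_or(sql_and(sql_eq(o, r) for o, r in zip(outer, row)) for row in rows)
    return None if in_result is None else not in_result
```

In a `WHERE` clause only rows whose predicate is TRUE are kept, so both `False` and `None` results filter the outer row out.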
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22112 FYI, I've implemented support for "repeatable" RDD actions in my local branch. It requires adding a new parameter to the public `SparkContext#runJob`, so I'm a little hesitant to push it. Please let me know if you have different ideas. Thanks!
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22121 **[Test build #95105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95105/testReport)** for PR 22121 at commit [`8da8250`](https://github.com/apache/spark/commit/8da82506e06e36d63bf91fdda194a866f2d977ea).
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22121 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2445/ Test PASSed.
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22121 Merged build finished. Test PASSed.
[GitHub] spark issue #22157: [SPARK-25126] Avoid creating Reader for all orc files
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22157 The failure in OrcQuerySuite looks legitimate. The test corrupts the third of three files, then sets the reader to not ignore corrupt files, but with this change the third file is never actually read. I think that might be a good thing. @dongjoon-hyun do you have an opinion?
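The effect described above can be modeled with a small, simplified sketch. It is plain Python and not Spark's ORC code: `read_schema`, the `footers` mapping and both inference functions are hypothetical. Reading every file's footer eagerly surfaces a corrupt third file, whereas stopping at the first readable file never touches it.

```python
from typing import Dict, Iterable, Optional


class CorruptFileError(Exception):
    pass


def read_schema(path: str, footers: Dict[str, Optional[str]]) -> str:
    # Hypothetical: footers maps a path to its schema, or None if the file is corrupt.
    schema = footers[path]
    if schema is None:
        raise CorruptFileError(path)
    return schema


def infer_schema_eager(paths: Iterable[str], footers: Dict[str, Optional[str]]) -> Optional[str]:
    # Opens a reader for every file up front, so any corrupt file fails the query.
    schemas = [read_schema(p, footers) for p in paths]
    return schemas[0] if schemas else None


def infer_schema_lazy(paths: Iterable[str], footers: Dict[str, Optional[str]]) -> Optional[str]:
    # Stops at the first file that yields a schema; later files are never opened.
    return next((read_schema(p, footers) for p in paths), None)
```

Under this model, a test that expects a corrupt-file failure during schema inference only fails if the corrupt file happens to be the first one read.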
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211971929 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { --- End diff -- @mgaido91 >> I don't see any test (please correct me if I am wrong) where multiple attributes are used as output of the subquery. Can we add and compare with other RDBMS? Thanks. Something like [here](https://github.com/apache/spark/blob/844a3ff82a688e7398bb130a44750aec78420698/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/nested-not-in.sql#L113-L134)? Is this what you meant?
[GitHub] spark issue #22179: [SPARK-23131][BUILD] Upgrade Kryo to 4.0.2
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22179 That looks like a major version bump -- the usual question here -- what are the key changes we need, what are possible incompatible changes?
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22112 > how does the user then tell spark that the result stage becomes repeatable because they did the checkpoint? There are 2 concepts here: 1. The random level of the RDD computing function (see my PR description). There are 3 random levels: IDEMPOTENT, RANDOM_ORDER and COMPLETE_RANDOM. E.g. file reading is IDEMPOTENT, shuffle fetching is RANDOM_ORDER, and shuffle fetching + repartition/zip is COMPLETE_RANDOM. Spark only needs to retry the succeeding stages if it retries a stage that is COMPLETE_RANDOM. 2. Whether the result stage is repeatable. E.g. `collect` is repeatable; writing with a Hadoop output committer is not. Concept 1 is a property of the RDD, so users can specify it by implementing a custom RDD, or by marking the RDD map function as order-sensitive (e.g. `zip`). This PR does not design proper public APIs for it. Concept 2 is a property of the RDD action. Users usually don't need to specify it, as we will specify it for each RDD action: e.g. `collect` is repeatable, `saveAsHadoopDataset` is not. Spark only fails the job if the RDD is COMPLETE_RANDOM (shuffle + repartition/zip) and the action is not repeatable. If users checkpoint the RDD before repartition/zip (e.g. shuffle + checkpoint + repartition/zip), then the RDD becomes IDEMPOTENT (see ) and Spark will not fail the job even if the action is not repeatable.
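The decision rule described above fits in a few lines. This is a hedged Python sketch: `RandomLevel` mirrors the three level names from the comment, but the function names and the boolean `action_repeatable` flag are hypothetical, not Spark's scheduler API.

```python
from enum import Enum


class RandomLevel(Enum):
    IDEMPOTENT = "IDEMPOTENT"            # e.g. file reading
    RANDOM_ORDER = "RANDOM_ORDER"        # e.g. shuffle fetching
    COMPLETE_RANDOM = "COMPLETE_RANDOM"  # e.g. shuffle fetching + repartition/zip


def effective_level(level: RandomLevel, checkpointed_before_repartition: bool) -> RandomLevel:
    # Per the comment: checkpointing before repartition/zip makes the RDD IDEMPOTENT.
    return RandomLevel.IDEMPOTENT if checkpointed_before_repartition else level


def must_retry_succeeding_stages(level: RandomLevel) -> bool:
    # Only a COMPLETE_RANDOM stage can hand different data to its children on retry.
    return level is RandomLevel.COMPLETE_RANDOM


def should_fail_job(level: RandomLevel, action_repeatable: bool) -> bool:
    # Fail only if a retry could change the data AND the action cannot be safely re-run.
    return level is RandomLevel.COMPLETE_RANDOM and not action_repeatable
```

For instance, `collect` would correspond to `action_repeatable=True` and `saveAsHadoopDataset` to `action_repeatable=False`.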
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211969009 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { --- End diff -- To be able to see Not(In) before In?
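Why the traversal order matters can be shown on a toy expression tree. This is a Python sketch, not Catalyst: `In`, `Not` and the two `Rewritten*` placeholders are hypothetical classes. With a pre-order (`transform_down`) traversal the rule sees `Not(In(...))` as a whole; with a post-order (`transform_up`) traversal the inner `In` is rewritten first, so the `Not(In(...))` pattern can never match.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class In:                 # stands in for InSubquery
    value: str


@dataclass(frozen=True)
class Not:
    child: object


@dataclass(frozen=True)
class RewrittenIn:        # hypothetical placeholder for the IN rewrite
    value: str


@dataclass(frozen=True)
class RewrittenNotIn:     # hypothetical placeholder for the null-aware NOT IN rewrite
    value: str


def rule(e):
    if isinstance(e, Not) and isinstance(e.child, In):
        return RewrittenNotIn(e.child.value)
    if isinstance(e, In):
        return RewrittenIn(e.value)
    return e


def map_children(e, f: Callable):
    return Not(f(e.child)) if isinstance(e, Not) else e


def transform_down(e, r):
    # Pre-order: apply the rule to the node first, then recurse into the result's children.
    return map_children(r(e), lambda c: transform_down(c, r))


def transform_up(e, r):
    # Post-order: rewrite the children first, then apply the rule to the node.
    return r(map_children(e, lambda c: transform_up(c, r)))
```

Here `transform_up(Not(In("a")), rule)` ends as `Not(RewrittenIn("a"))`, i.e. the negation is left wrapped around the plain IN rewrite.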
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20345 **[Test build #95104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95104/testReport)** for PR 20345 at commit [`39462fb`](https://github.com/apache/spark/commit/39462fbee952ec574b4c04d7718fd73bb5f56d9d).
[GitHub] spark issue #21770: [SPARK-24806][SQL] Brush up generated code so that JDK c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21770 **[Test build #95103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95103/testReport)** for PR 21770 at commit [`5a70a7c`](https://github.com/apache/spark/commit/5a70a7cb33c6fbdf114b39fc8f0196b8d01f8582).
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Merged build finished. Test PASSed.
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211961738 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { --- End diff -- why did you change this?
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20345 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2444/ Test PASSed.
[GitHub] spark issue #21770: [SPARK-24806][SQL] Brush up generated code so that JDK c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21770 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2443/ Test PASSed.
[GitHub] spark issue #21770: [SPARK-24806][SQL] Brush up generated code so that JDK c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21770 Merged build finished. Test PASSed.
[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20345 retest this please
[GitHub] spark issue #21770: [SPARK-24806][SQL] Brush up generated code so that JDK c...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21770 retest this please
[GitHub] spark pull request #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream forma...
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/21546#discussion_r211964996 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -183,34 +178,106 @@ private[sql] object ArrowConverters { } /** - * Convert a byte array to an ArrowRecordBatch. + * Load a serialized ArrowRecordBatch. */ - private[arrow] def byteArrayToBatch( + private[arrow] def loadBatch( batchBytes: Array[Byte], allocator: BufferAllocator): ArrowRecordBatch = { -val in = new ByteArrayReadableSeekableByteChannel(batchBytes) -val reader = new ArrowFileReader(in, allocator) - -// Read a batch from a byte stream, ensure the reader is closed -Utils.tryWithSafeFinally { - val root = reader.getVectorSchemaRoot // throws IOException - val unloader = new VectorUnloader(root) - reader.loadNextBatch() // throws IOException - unloader.getRecordBatch -} { - reader.close() -} +val in = new ByteArrayInputStream(batchBytes) +MessageSerializer.deserializeRecordBatch( + new ReadChannel(Channels.newChannel(in)), allocator) // throws IOException } + /** + * Create a DataFrame from a JavaRDD of serialized ArrowRecordBatches. 
+ */ private[sql] def toDataFrame( - payloadRDD: JavaRDD[Array[Byte]], + arrowBatchRDD: JavaRDD[Array[Byte]], schemaString: String, sqlContext: SQLContext): DataFrame = { -val rdd = payloadRDD.rdd.mapPartitions { iter => +val schema = DataType.fromJson(schemaString).asInstanceOf[StructType] +val timeZoneId = sqlContext.sessionState.conf.sessionLocalTimeZone +val rdd = arrowBatchRDD.rdd.mapPartitions { iter => val context = TaskContext.get() - ArrowConverters.fromPayloadIterator(iter.map(new ArrowPayload(_)), context) + ArrowConverters.fromBatchIterator(iter, schema, timeZoneId, context) } -val schema = DataType.fromJson(schemaString).asInstanceOf[StructType] sqlContext.internalCreateDataFrame(rdd, schema) } + + /** + * Read a file as an Arrow stream and parallelize as an RDD of serialized ArrowRecordBatches. + */ + private[sql] def readArrowStreamFromFile( + sqlContext: SQLContext, + filename: String): JavaRDD[Array[Byte]] = { +val fileStream = new FileInputStream(filename) +try { + // Create array so that we can safely close the file + val batches = getBatchesFromStream(fileStream.getChannel).toArray + // Parallelize the record batches to create an RDD + JavaRDD.fromRDD(sqlContext.sparkContext.parallelize(batches, batches.length)) +} finally { + fileStream.close() +} + } + + /** + * Read an Arrow stream input and return an iterator of serialized ArrowRecordBatches. 
+ */ + private[sql] def getBatchesFromStream(in: SeekableByteChannel): Iterator[Array[Byte]] = { + +// Create an iterator to get each serialized ArrowRecordBatch from a stream +new Iterator[Array[Byte]] { + var batch: Array[Byte] = readNextBatch() + + override def hasNext: Boolean = batch != null + + override def next(): Array[Byte] = { +val prevBatch = batch +batch = readNextBatch() +prevBatch + } + + def readNextBatch(): Array[Byte] = { +val msgMetadata = MessageSerializer.readMessage(new ReadChannel(in)) +if (msgMetadata == null) { + return null +} + +// Get the length of the body, which has not be read at this point +val bodyLength = msgMetadata.getMessageBodyLength.toInt + +// Only care about RecordBatch data, skip Schema and unsupported Dictionary messages +if (msgMetadata.getMessage.headerType() == MessageHeader.RecordBatch) { + + // Create output backed by buffer to hold msg length (int32), msg metadata, msg body + val bbout = new ByteBufferOutputStream(4 + msgMetadata.getMessageLength + bodyLength) --- End diff -- Add a comment that this is the deserialized form of an Arrow Record Batch?
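The structure of `getBatchesFromStream` (read messages until end of stream, skip non-RecordBatch messages, re-frame each batch as a length prefix plus metadata plus body) can be sketched in Python. The framing below is hypothetical and deliberately simplified; it is NOT the real Arrow IPC layout, and `write_message`, `read_message` and the message-type codes are assumptions for illustration only.

```python
import io
import struct
from typing import BinaryIO, Iterator, Optional, Tuple

# Hypothetical, simplified framing used only in this sketch:
#   int32 metadata length | metadata (first byte = message type) | int32 body length | body
SCHEMA, RECORD_BATCH = 1, 2


def write_message(out: BinaryIO, msg_type: int, extra: bytes, body: bytes) -> None:
    metadata = bytes([msg_type]) + extra
    out.write(struct.pack("<i", len(metadata)) + metadata)
    out.write(struct.pack("<i", len(body)) + body)


def read_message(stream: BinaryIO) -> Optional[Tuple[bytes, bytes]]:
    prefix = stream.read(4)
    if len(prefix) < 4:
        return None  # end of stream, playing the role of readMessage returning null
    (meta_len,) = struct.unpack("<i", prefix)
    metadata = stream.read(meta_len)
    (body_len,) = struct.unpack("<i", stream.read(4))
    return metadata, stream.read(body_len)


def batches_from_stream(stream: BinaryIO) -> Iterator[bytes]:
    while True:
        msg = read_message(stream)
        if msg is None:
            return
        metadata, body = msg
        if metadata[0] != RECORD_BATCH:
            continue  # skip schema (and any other) messages
        # Re-frame as length prefix + metadata + body: 4 + len(metadata) + len(body) bytes.
        yield struct.pack("<i", len(metadata)) + metadata + body
```

The Scala iterator prefetches one batch so that `hasNext` can be answered; a Python generator provides the same laziness without an explicit `batch` field.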
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211961021 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case (Not(InSubquery(values, ListQuery(sub, conditions, _, _)))) => + val exists = AttributeReference("exists", BooleanType, nullable = false)() + val inConditions = values.zip(sub.output).map(EqualTo.tupled) + val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c))) --- End diff -- makes sense, thanks for your answer @dilipbiswal
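The per-column condition `Or(c, IsNull(c))` is TRUE exactly when `c` is TRUE or NULL. Assuming these per-column conditions are AND-ed into the existence-join condition and an outer row is kept when `exists` is false (the diff is cut off before that point, so this is an assumption), the rewrite keeps exactly the rows for which `NOT IN` is TRUE. The Python sketch below checks this by brute force over a small domain, with `None` standing in for SQL NULL; all names are hypothetical and this is not Catalyst code.

```python
from itertools import combinations, product
from typing import Optional, Sequence, Tuple

Truth = Optional[bool]  # None stands for SQL NULL / UNKNOWN


def sql_eq(a, b) -> Truth:
    return None if a is None or b is None else a == b


def sql_not_in(outer: Tuple, rows: Sequence[Tuple]) -> Truth:
    # Reference three-valued semantics of (a1, ..., an) NOT IN (rows).
    per_row = []
    for row in rows:
        cmps = [sql_eq(o, r) for o, r in zip(outer, row)]
        if any(c is False for c in cmps):
            per_row.append(False)
        elif any(c is None for c in cmps):
            per_row.append(None)
        else:
            per_row.append(True)
    if any(x is True for x in per_row):
        return False  # some row matches: IN is TRUE, NOT IN is FALSE
    if any(x is None for x in per_row):
        return None   # IN is UNKNOWN, so NOT IN is UNKNOWN
    return True       # no row can match (including an empty subquery)


def kept_by_null_aware_rewrite(outer: Tuple, rows: Sequence[Tuple]) -> bool:
    def join_cond(o, r) -> bool:
        c = sql_eq(o, r)
        return c is True or c is None  # (c OR isnull(c)) is TRUE

    # Assumed: per-column conditions are AND-ed; keep the row when nothing matches.
    exists = any(all(join_cond(o, r) for o, r in zip(outer, row)) for row in rows)
    return not exists


def check_equivalence(domain=(None, 1, 2)) -> bool:
    tuples = list(product(domain, repeat=2))
    subqueries = [[]] + [[t] for t in tuples] + [list(p) for p in combinations(tuples, 2)]
    for outer in tuples:
        for rows in subqueries:
            # WHERE keeps a row only when the predicate evaluates to TRUE.
            if (sql_not_in(outer, rows) is True) != kept_by_null_aware_rewrite(outer, rows):
                return False
    return True
```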
[GitHub] spark pull request #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/22121#discussion_r211959406 --- Diff: docs/avro-data-source-guide.md --- @@ -0,0 +1,377 @@ ... +## Load and Save Functions + +Since `spark-avro` module is external, there is not such API as `.avro` in --- End diff -- there is no '.avro' API in
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22112 > I'm proposing an option 3: > Retry all the tasks of all the succeeding stages if a stage with repartition/zip failed. All RDD actions should tell Spark if it's "repeatable", which becomes a property of the result stage. When we retry a result stage that has several tasks finished, if the result stage is "repeatable" (e.g. collect), retry it. If the result stage is not "repeatable", fail the job with the error message to ask users to checkpoint the RDD before repartition/zip. How does the user then tell Spark that the result stage becomes repeatable because they did the checkpoint? Add an option to the API? Or does Spark automatically try to figure that out? I'm still a bit hesitant about making our long-term solution that these operations aren't resilient, but as long as the user can make them resilient, perhaps it's OK.
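The quoted "option 3" boils down to a small decision rule for a result stage once an upstream repartition/zip stage has to be re-run. The Python sketch below is only an illustration under stated assumptions: `ResultStage`, `repeatable`, `Action` and `on_indeterminate_parent_retry` are hypothetical names, not Spark scheduler APIs, and treating a stage with no finished tasks as safe to retry is an assumption.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    RETRY_STAGE = "retry all tasks of the result stage"
    FAIL_JOB = "fail the job and ask the user to checkpoint before repartition/zip"

@dataclass
class ResultStage:
    # Hypothetical model of a result stage; not Spark's DAGScheduler classes.
    finished_tasks: int
    repeatable: bool  # declared by the action, e.g. True for collect()

def on_indeterminate_parent_retry(stage: ResultStage) -> Action:
    """Sketch of 'option 3' once an upstream repartition/zip stage is re-run."""
    if stage.finished_tasks == 0:
        # Assumption: nothing has been produced yet, so a plain retry is safe.
        return Action.RETRY_STAGE
    if stage.repeatable:
        # The action declared itself safe to re-run, e.g. collect().
        return Action.RETRY_STAGE
    # Assumption: finished tasks may already have written output that a
    # re-run over differently ordered input could contradict.
    return Action.FAIL_JOB
```

The open question in the comment is where the `repeatable` flag would come from: whether the user sets it through an API option or Spark infers it after a checkpoint is not settled in this thread.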
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r211955605 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case Not(InSubquery(values, ListQuery(sub, conditions, _, _))) => + val exists = AttributeReference("exists", BooleanType, nullable = false)() + val inConditions = values.zip(sub.output).map(EqualTo.tupled) + val nullAwareJoinConds = inConditions.map(c => Or(c, IsNull(c))) --- End diff -- @mgaido91 Thanks!! Actually, I have been thinking about it for the last few days :-). We probably need a new optimizer rule that simplifies the join conditions based on the constraints of its children. Then we should be able to simplify
```SQL
select * from t1 join t2 on (t1c1 = t2c1 OR isnull(t1c1 = t2c1)) where t1c1 is not null and t2c1 is not null
```
to
```SQL
select * from t1 join t2 on (t1c1 = t2c1) where t1c1 is not null and t2c1 is not null
```
I wanted to handle it as a follow-up.
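The proposed follow-up rule relies on one fact: `isnull(t1c1 = t2c1)` can only be true when one side is NULL, and the `is not null` filters rule that out. A small Python check of that claim, modelling NULL as `None` (the function names are illustrative only, not Catalyst rules):

```python
from itertools import product
from typing import Iterable, Optional

def sql_eq(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """Equality under SQL three-valued logic; None models NULL."""
    if a is None or b is None:
        return None
    return a == b

def null_aware_cond(a: Optional[int], b: Optional[int]) -> bool:
    """(a = b) OR isnull(a = b): TRUE unless the comparison is definitely FALSE."""
    c = sql_eq(a, b)
    return c is None or c

def simplified_cond(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """The proposed simplified join condition: just (a = b)."""
    return sql_eq(a, b)

def equivalent_on(domain: Iterable[Optional[int]]) -> bool:
    """True if both conditions agree on every (t1c1, t2c1) pair drawn from domain."""
    values = list(domain)
    return all(null_aware_cond(a, b) == simplified_cond(a, b)
               for a, b in product(values, repeat=2))
```

Restricted to non-null keys the two conditions agree everywhere; once NULL is allowed they diverge, which is why the simplification has to be driven by the children's not-null constraints.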
[GitHub] spark pull request #22163: [SPARK-25166][CORE]Reduce the number of write ope...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/22163#discussion_r211954019 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java --- @@ -206,14 +211,21 @@ private void writeSortedFile(boolean isLastFile) { long recordReadPosition = recordOffsetInPage + uaoSize; // skip over record length while (dataRemaining > 0) { final int toTransfer = Math.min(diskWriteBufferSize, dataRemaining); -Platform.copyMemory( - recordPage, recordReadPosition, writeBuffer, Platform.BYTE_ARRAY_OFFSET, toTransfer); -writer.write(writeBuffer, 0, toTransfer); +if (bufferOffset > 0 && bufferOffset + toTransfer > DISK_WRITE_BUFFER_SIZE) { --- End diff -- Not a bad idea, but the code here may not work as you expect. If we get a record of size `X` < `diskWriteBufferSize` (same as `DISK_WRITE_BUFFER_SIZE`), we only call `writer.write()` once. If we get a record of size `Y` >= `diskWriteBufferSize`, we call `writer.write()` (`Y` + `diskWriteBufferSize` - 1) / `diskWriteBufferSize` times. This does not change with the new code.
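The counting argument above can be checked with a small Python model of the per-record copy loop in `writeSortedFile`. Only the loop shape is taken from the quoted diff; the 4096-byte buffer in the checks is an arbitrary illustrative value, not Spark's configured default.

```python
def count_writes(record_size: int, buffer_size: int) -> int:
    """Simulate the copy loop for one record: each pass copies at most
    buffer_size bytes into the write buffer and calls write() once."""
    writes = 0
    data_remaining = record_size
    while data_remaining > 0:
        to_transfer = min(buffer_size, data_remaining)
        writes += 1  # stands in for writer.write(writeBuffer, 0, toTransfer)
        data_remaining -= to_transfer
    return writes

def expected_writes(record_size: int, buffer_size: int) -> int:
    """Ceiling division, as in the review comment: (Y + size - 1) / size."""
    return (record_size + buffer_size - 1) // buffer_size
```

For any record smaller than the buffer both functions give 1, and for larger records the loop count matches the ceiling formula, so batching inside this loop alone cannot reduce the number of `write()` calls per record.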
[GitHub] spark issue #18099: [SPARK-18406][CORE][Backport-2.1] Race between end-of-ta...
Github user appleyuchi commented on the issue: https://github.com/apache/spark/pull/18099 The following occurred when I ran a lab with ALS in Spark:
```
8/08/22 21:24:14 ERROR Utils: Uncaught exception in thread stdout writer for python
java.lang.AssertionError: assertion failed: Block rdd_7_0 is not locked for reading
    at scala.Predef$.assert(Predef.scala:170)
    at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299)
    at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:769)
    at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:540)
    at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:44)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:33)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:213)
    at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:407)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:215)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1991)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:170)
Exception in thread "stdout writer for python" java.lang.AssertionError: assertion failed: Block rdd_7_0 is not locked for reading
    at scala.Predef$.assert(Predef.scala:170)
    at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299)
    at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:769)
    at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:540)
    at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:44)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:33)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:213)
    at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:407)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:215)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1991)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:170)
```
[GitHub] spark issue #18099: [SPARK-18406][CORE][Backport-2.1] Race between end-of-ta...
Github user appleyuchi commented on the issue: https://github.com/apache/spark/pull/18099 Is this fix available in Spark 2.3.1? Thanks.
[GitHub] spark pull request #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22121#discussion_r211940709 --- Diff: docs/avro-data-source-guide.md --- @@ -0,0 +1,377 @@ +--- +layout: global +title: Apache Avro Data Source Guide +--- + +* This will become a table of contents (this text will be scraped). +{:toc} + +Since Spark 2.4 release, [Spark SQL](https://spark.apache.org/docs/latest/sql-programming-guide.html) provides built-in support for reading and writing Apache Avro data. + +## Deploying +The `spark-avro` module is external and not included in `spark-submit` or `spark-shell` by default. + +As with any Spark applications, `spark-submit` is used to launch your application. `spark-avro_{{site.SCALA_BINARY_VERSION}}` +and its dependencies can be directly added to `spark-submit` using `--packages`, such as, + +./bin/spark-submit --packages org.apache.spark:spark-avro_{{site.SCALA_BINARY_VERSION}}:{{site.SPARK_VERSION_SHORT}} ... + +For experimenting on `spark-shell`, you can also use `--packages` to add `org.apache.spark:spark-avro_{{site.SCALA_BINARY_VERSION}}` and its dependencies directly, + +./bin/spark-shell --packages org.apache.spark:spark-avro_{{site.SCALA_BINARY_VERSION}}:{{site.SPARK_VERSION_SHORT}} ... + +See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Load and Save Functions + +Since `spark-avro` module is external, there is not such API as `.avro` in +`DataFrameReader` or `DataFrameWriter`. + +To load/save data in Avro format, you need to specify the data source option `format` as `avro`(or `org.apache.spark.sql.avro`). 
+ + +{% highlight scala %} + +val usersDF = spark.read.format("avro").load("examples/src/main/resources/users.avro") +usersDF.select("name", "favorite_color").write.format("avro").save("namesAndFavColors.avro") + +{% endhighlight %} + + +{% highlight java %} + +Dataset<Row> usersDF = spark.read().format("avro").load("examples/src/main/resources/users.avro"); +usersDF.select("name", "favorite_color").write().format("avro").save("namesAndFavColors.avro"); + +{% endhighlight %} + + +{% highlight python %} + +df = spark.read.format("avro").load("examples/src/main/resources/users.avro") +df.select("name", "favorite_color").write.format("avro").save("namesAndFavColors.avro") + +{% endhighlight %} + + +{% highlight r %} + +df <- read.df("examples/src/main/resources/users.avro", "avro") +write.df(select(df, "name", "favorite_color"), "namesAndFavColors.avro", "avro") + +{% endhighlight %} + + + +## to_avro() and from_avro() +Spark SQL provides function `to_avro` to encode a struct as a string and `from_avro()` to retrieve the struct as a complex type. + +Using Avro record as columns are useful when reading from or writing to a streaming source like Kafka. Each +Kafka key-value record will be augmented with some metadata, such as the ingestion timestamp into Kafka, the offset in Kafka, etc. +* If the "value" field that contains your data is in Avro, you could use `from_avro()` to extract your data, enrich it, clean it, and then push it downstream to Kafka again or write it out to a file. +* `to_avro()` can be used to turn structs into Avro records. This method is particularly useful when you would like to re-encode multiple columns into a single one when writing data out to Kafka. + +Both methods are presently only available in Scala and Java. + + + +{% highlight scala %} +import org.apache.spark.sql.avro._ + +// `from_avro` requires Avro schema in JSON string format.
+val jsonFormatSchema = new String(Files.readAllBytes(Paths.get("./examples/src/main/resources/user.avsc"))) + +val df = spark + .readStream + .format("kafka") + .option("kafka.bootstrap.servers", "host1:port1,host2:port2") + .option("subscribe", "topic1") + .load() + +// 1. Decode the Avro data into a struct; +// 2. Filter by column `favorite_color`; +// 3. Encode the column `name` in Avro format. +val output = df + .select(from_avro('value, jsonFormatSchema) as 'user) + .where("user.favorite_color == \"red\"") + .select(to_avro($"user.name") as 'value) + +val ds = output + .writeStream + .format("kafka") + .option("kafka.bootstrap.servers", "host1:port1,host2:port2") + .option("topic", "topic2") + .start() + +{% endhighlight %} + + +{% highlight java %} +import org.apache.spark.sql.avro.* + +// `from_avro` requires Avro schema in JSON string format. +String jsonFormatSchema = new
[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17400 **[Test build #95102 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95102/testReport)** for PR 17400 at commit [`5482b1b`](https://github.com/apache/spark/commit/5482b1be6308ddf7e77dc25c0bdfca3ede2d61a7).
[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17400 Merged build finished. Test PASSed.
[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17400 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2442/ Test PASSed.
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95096/ Test PASSed.
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Merged build finished. Test PASSed.
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22112 **[Test build #95096 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95096/testReport)** for PR 22112 at commit [`2a88a47`](https://github.com/apache/spark/commit/2a88a473f036c2da3612f3e53e17d1c05dff4458). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22182: [SPARK-25184][SS] Fixed race condition in StreamExecutio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22182 Merged build finished. Test PASSed.
[GitHub] spark issue #22182: [SPARK-25184][SS] Fixed race condition in StreamExecutio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95098/ Test PASSed.
[GitHub] spark issue #22182: [SPARK-25184][SS] Fixed race condition in StreamExecutio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22182 **[Test build #95098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95098/testReport)** for PR 22182 at commit [`319990f`](https://github.com/apache/spark/commit/319990ff60ad7b6fad6fd0cea5cada0b22e3f3c9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22181: [SPARK-25163][SQL] Fix flaky test: o.a.s.util.collection...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22181 cc @zsxwing @cloud-fan
[GitHub] spark issue #22181: [SPARK-25163][SQL] Fix flaky test: o.a.s.util.collection...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22181 **[Test build #95101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95101/testReport)** for PR 22181 at commit [`77e108a`](https://github.com/apache/spark/commit/77e108a18788502d05b1b3dacc21c3e72eac4264).
[GitHub] spark issue #22181: [SPARK-25163][SQL] Fix flaky test: o.a.s.util.collection...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22181 Merged build finished. Test PASSed.
[GitHub] spark issue #22181: [SPARK-25163][SQL] Fix flaky test: o.a.s.util.collection...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22181 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2441/ Test PASSed.
[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16478 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95094/ Test PASSed.
[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16478 **[Test build #95094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95094/testReport)** for PR 16478 at commit [`8b83ec7`](https://github.com/apache/spark/commit/8b83ec7242fe44847485c0591c90bc41dbdfea4a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16478 Merged build finished. Test PASSed.
[GitHub] spark issue #22181: [SPARK-25163][SQL] Fix flaky test: o.a.s.util.collection...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22181 retest this please.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21546 **[Test build #95093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95093/testReport)** for PR 21546 at commit [`89d7836`](https://github.com/apache/spark/commit/89d78364d93490b1b301c5ec766e4390bdc0b8a7). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class BarrierTaskContext(TaskContext):` * `class BarrierTaskInfo(object):` * `case class StateStoreCustomSumMetric(name: String, desc: String) extends StateStoreCustomMetric` * `sealed trait StreamingAggregationStateManager extends Serializable ` * `abstract class StreamingAggregationStateManagerBaseImpl(` * `class StreamingAggregationStateManagerImplV1(` * `class StreamingAggregationStateManagerImplV2(`
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Merged build finished. Test PASSed.
[GitHub] spark issue #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream format for c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95093/ Test PASSed.
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Merged build finished. Test FAILed.
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22163 **[Test build #95095 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95095/testReport)** for PR 22163 at commit [`f91e18c`](https://github.com/apache/spark/commit/f91e18c7d4b8eab53c4983320a0eab0403c37a48). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95095/ Test FAILed.
[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22141 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95091/ Test PASSed.
[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22141 Merged build finished. Test PASSed.
[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22141 **[Test build #95091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95091/testReport)** for PR 22141 at commit [`844a3ff`](https://github.com/apache/spark/commit/844a3ff82a688e7398bb130a44750aec78420698). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22181: [SPARK-25163][SQL] Fix flaky test: o.a.s.util.collection...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22181 Merged build finished. Test FAILed.
[GitHub] spark issue #22181: [SPARK-25163][SQL] Fix flaky test: o.a.s.util.collection...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22181 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95088/ Test FAILed.
[GitHub] spark issue #22181: [SPARK-25163][SQL] Fix flaky test: o.a.s.util.collection...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22181 **[Test build #95088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95088/testReport)** for PR 22181 at commit [`77e108a`](https://github.com/apache/spark/commit/77e108a18788502d05b1b3dacc21c3e72eac4264). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22184: [SPARK-25132][SQL][DOC] Add migration doc for case-insen...
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22184 @gatorsmile Could you kindly help trigger Jenkins and review?
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Merged build finished. Test PASSed.
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22153 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95089/ Test PASSed.
[GitHub] spark issue #22153: [SPARK-23034][SQL] Show RDD/relation names in RDD/In-Mem...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22153 **[Test build #95089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95089/testReport)** for PR 22153 at commit [`da76a1b`](https://github.com/apache/spark/commit/da76a1beb31e972b41b7015e666bc1ee4e18007f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17400 Merged build finished. Test FAILed.
[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17400 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95097/ Test FAILed.
[GitHub] spark issue #17400: [SPARK-19981][SQL] Respect aliases in output partitionin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17400 **[Test build #95097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95097/testReport)** for PR 17400 at commit [`91809e5`](https://github.com/apache/spark/commit/91809e5942e5f90c802234f815593ccec92a0c54). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait AliasAwareOutputPartitioning extends UnaryExecNode `
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22121 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95100/ Test PASSed.
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22121 **[Test build #95100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95100/testReport)** for PR 22121 at commit [`d9c5352`](https://github.com/apache/spark/commit/d9c5352c8ffc70d271a8aa68c3ffec41b4158ece). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22121 Merged build finished. Test PASSed.
[GitHub] spark issue #21931: [SPARK-24978][SQL]Add spark.sql.fast.hash.aggregate.row....
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/21931 cc @cloud-fan @hvanhovell
[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/21860 cc @cloud-fan @hvanhovell
[GitHub] spark issue #22184: [SPARK-25132][SQL][DOC] Add migration doc for case-insen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22184 Can one of the admins verify this patch?
[GitHub] spark issue #22184: [SPARK-25132][SQL][DOC] Add migration doc for case-insen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22184 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22184: [SPARK-25132][SQL][DOC] Add migration doc for cas...
GitHub user seancxmao opened a pull request: https://github.com/apache/spark/pull/22184 [SPARK-25132][SQL][DOC] Add migration doc for case-insensitive field resolution when reading from Parquet ## What changes were proposed in this pull request? #22148 introduces a behavior change. We need to document it in the migration guide. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/seancxmao/spark SPARK-25132-DOC Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22184.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22184 commit eae8a3c98f146765d25bbf529421ce3c7a92639b Author: seancxmao Date: 2018-08-22T09:17:55Z [SPARK-25132][SQL][DOC] Case-insensitive field resolution when reading from Parquet
[GitHub] spark issue #22184: [SPARK-25132][SQL][DOC] Add migration doc for case-insen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22184 Can one of the admins verify this patch?
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22121 **[Test build #95100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95100/testReport)** for PR 22121 at commit [`d9c5352`](https://github.com/apache/spark/commit/d9c5352c8ffc70d271a8aa68c3ffec41b4158ece).
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22121 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2440/ Test PASSed.
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22121 Merged build finished. Test PASSed.
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22121 **[Test build #95099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95099/testReport)** for PR 22121 at commit [`d2681ec`](https://github.com/apache/spark/commit/d2681ec51a7dbc0296800cdbedb3d46827bf2b6f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22121 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95099/ Test PASSed.
[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22171 So this is an issue only related to `Dataset.show`?
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22121 Merged build finished. Test PASSed.
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22121 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2439/ Test PASSed.
[GitHub] spark issue #22121: [SPARK-25133][SQL][Doc]Avro data source guide
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22121 @srowen @tgravescs @gatorsmile @HyukjinKwon @dongjoon-hyun Thanks for the reviews! I have added the sections `to_avro() and from_avro()` and `Compatibility with Databricks spark-avro`. I have also attached an HTML file for preview; please check it in the PR description.
[GitHub] spark issue #22121: [WIP][SPARK-25133][SQL][Doc]Avro data source guide
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22121 **[Test build #95099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95099/testReport)** for PR 22121 at commit [`d2681ec`](https://github.com/apache/spark/commit/d2681ec51a7dbc0296800cdbedb3d46827bf2b6f).
[GitHub] spark pull request #22121: [WIP][SPARK-25133][SQL][Doc]Avro data source guid...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22121#discussion_r211875231 --- Diff: docs/avro-data-source-guide.md --- @@ -0,0 +1,260 @@ +--- +layout: global +title: Apache Avro Data Source Guide +--- + +* This will become a table of contents (this text will be scraped). +{:toc} + +Since Spark 2.4 release, [Spark SQL](https://spark.apache.org/docs/latest/sql-programming-guide.html) provides built-in support for reading and writing Apache Avro data. + --- End diff -- @tgravescs I have added an independent section for it :)
[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...
Github user bavardage commented on the issue: https://github.com/apache/spark/pull/21794 Yep, fair. I think the intent was clarity rather than necessarily performance: it's misleading to have a method named 'nan safe' that has no special handling of NaNs. I'll look at opening a different PR that could improve clarity and may have a minor performance benefit.
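The comment above concerns what a "NaN-safe" double comparison has to handle. The small Java sketch below contrasts three approaches:

- A naive three-way compare built from `<` and `>`.
- A hand-written NaN-aware compare.
- `java.lang.Double.compare`.

All class and method names here are hypothetical illustrations, not Spark's actual code. Only `Double.compare` and `Double.isNaN` are real JDK methods.

```java
// Hypothetical sketch: not Spark's implementation, just the JDK semantics involved.
final class NanSafeCompareSketch {
    // Naive three-way compare with primitive operators: every comparison
    // involving NaN is false, so NaN falls through and looks "equal".
    static int naiveCompare(double x, double y) {
        if (x < y) return -1;
        if (x > y) return 1;
        return 0;
    }

    // Hypothetical hand-written NaN-aware compare: NaN equals NaN and sorts
    // above every other value. Non-NaN values use primitive operators,
    // so 0.0 and -0.0 compare as equal.
    static int handWrittenNanSafe(double x, double y) {
        boolean xNaN = Double.isNaN(x);
        boolean yNaN = Double.isNaN(y);
        if ((xNaN && yNaN) || x == y) return 0;
        if (xNaN) return 1;
        if (yNaN) return -1;
        return x > y ? 1 : -1;
    }

    // java.lang.Double.compare also treats NaN as equal to itself and greater
    // than everything else, but it orders -0.0 strictly below 0.0.
    static int javaCompare(double x, double y) {
        return Double.compare(x, y);
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        System.out.println(naiveCompare(nan, 1.0));        // 0
        System.out.println(handWrittenNanSafe(nan, 1.0));  // 1
        System.out.println(javaCompare(nan, 1.0));         // 1
        System.out.println(handWrittenNanSafe(0.0, -0.0)); // 0
        System.out.println(javaCompare(0.0, -0.0));        // 1
    }
}
```

`Double.compare` handles NaN on its own, so code that delegates to it needs no explicit NaN branches. It is not a drop-in replacement for every hand-written comparator, though, because it distinguishes `0.0` from `-0.0`.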
[GitHub] spark pull request #22171: [SPARK-25177][SQL] When dataframe decimal type co...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/22171#discussion_r211867603 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala --- @@ -197,7 +197,7 @@ final class Decimal extends Ordered[Decimal] with Serializable { } } - override def toString: String = toBigDecimal.toString() + override def toString: String = toBigDecimal.bigDecimal.toPlainString() --- End diff -- I don't recall anything that is relevant:)
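The diff above swaps `toBigDecimal.toString()` for `toBigDecimal.bigDecimal.toPlainString()`. Here `bigDecimal` is the `java.math.BigDecimal` wrapped by Scala's `BigDecimal`, whose `toString` delegates to the Java one.

The JDK's `BigDecimal.toString()` switches to exponential notation when the scale is negative or the adjusted exponent falls below -6. `toPlainString()` never does. The sketch below shows the difference directly on `java.math.BigDecimal`. The class and helper names are hypothetical, and the "scale 8 column" framing is only an illustrative assumption.

```java
import java.math.BigDecimal;

// Hypothetical demo class: shows JDK BigDecimal rendering, not Spark's Decimal.
final class PlainStringDemo {
    // Render unscaledValue * 10^(-scale) with toString(),
    // which may use exponential notation.
    static String scientific(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale).toString();
    }

    // Render the same value with toPlainString(), which never uses an exponent.
    static String plain(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale).toPlainString();
    }

    public static void main(String[] args) {
        // Zero stored with scale 8 (e.g. a value from a decimal column with scale 8).
        System.out.println(scientific(0, 8)); // 0E-8
        System.out.println(plain(0, 8));      // 0.00000000
        // Adjusted exponent -6: toString() still uses plain notation.
        System.out.println(scientific(1, 6)); // 0.000001
        // Adjusted exponent -7: toString() switches to exponential notation.
        System.out.println(scientific(1, 7)); // 1E-7
        System.out.println(plain(1, 7));      // 0.0000001
    }
}
```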