[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99807/testReport)** for PR 22683 at commit [`87a9d5a`](https://github.com/apache/spark/commit/87a9d5ad1ebfbb9b247e95ead3e1a4c34ee08020). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23252 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5843/ Test PASSed.
[GitHub] spark issue #23252: [SPARK-26239] File-based secret key loading for SASL.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23252 **[Test build #99808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99808/testReport)** for PR 23252 at commit [`957cb15`](https://github.com/apache/spark/commit/957cb15a2d48b4cf2b5c7f1a8c124df3a53bf4d9).
[GitHub] spark issue #23104: [SPARK-26138][SQL] Cross join requires push LocalLimit i...
Github user liu-zhaokun commented on the issue: https://github.com/apache/spark/pull/23104 @guoxiaolongzte good job
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23245 Retest this please.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99810/testReport)** for PR 22683 at commit [`6a3c58b`](https://github.com/apache/spark/commit/6a3c58b119ed298e1cab8d9a9b341a667a86c8f0).
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99811/testReport)** for PR 22683 at commit [`22e0589`](https://github.com/apache/spark/commit/22e0589b66b30110f0b579f4829339ee680fc93f).
[GitHub] spark pull request #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc fo...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23238#discussion_r239708569
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -141,6 +141,8 @@ displayTitle: Spark SQL Upgrading Guide
 - In Spark version 2.3 and earlier, HAVING without GROUP BY is treated as WHERE. This means, `SELECT 1 FROM range(10) HAVING true` is executed as `SELECT 1 FROM range(10) WHERE true` and returns 10 rows. This violates SQL standard, and has been fixed in Spark 2.4. Since Spark 2.4, HAVING without GROUP BY is treated as a global aggregate, which means `SELECT 1 FROM range(10) HAVING true` will return only one row. To restore the previous behavior, set `spark.sql.legacy.parser.havingWithoutGroupByAsWhere` to `true`.
+ - In version 2.3 and earlier, when reading from a Parquet data source table, Spark always returns null for any column whose column names in Hive metastore schema and Parquet schema are in different letter cases, no matter whether `spark.sql.caseSensitive` is set to true or false. Since 2.4, when `spark.sql.caseSensitive` is set to false, Spark does case insensitive column name resolution between Hive metastore schema and Parquet schema, so even column names are in different letter cases, Spark returns corresponding column values. An exception is thrown if there is ambiguity, i.e. more than one Parquet column is matched. This change also applies to Parquet Hive tables when `spark.sql.hive.convertMetastoreParquet` is set to true.
--- End diff --
Hi, @seancxmao. Maybe the following?
```
- `spark.sql.caseSensitive` is set to true or false
+ `spark.sql.caseSensitive` is set to `true` or `false`
```
```
- `spark.sql.caseSensitive` is set to false
+ `spark.sql.caseSensitive` is set to `false`
```
```
- `spark.sql.hive.convertMetastoreParquet` is set to true
+ `spark.sql.hive.convertMetastoreParquet` is set to `true`
```
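To illustrate the resolution rule described in the proposed migration note, here is a minimal Python sketch. It is not Spark's implementation: the function name `resolve_parquet_column` and its parameters are hypothetical, and the case-sensitive branch (exact match or nothing) is an assumption added for contrast with the case-insensitive behavior the note describes.

```python
from typing import Optional, Sequence


def resolve_parquet_column(
    metastore_name: str,
    parquet_names: Sequence[str],
    case_sensitive: bool,
) -> Optional[str]:
    """Return the Parquet column matching a metastore column name, or None.

    Hypothetical helper for illustration only; not part of Spark.
    """
    if case_sensitive:
        # Assumption: only an exact match counts; a name that differs in
        # letter case is treated as missing (the column would read as null).
        return metastore_name if metastore_name in parquet_names else None

    wanted = metastore_name.lower()
    matches = [name for name in parquet_names if name.lower() == wanted]
    if len(matches) > 1:
        # The note says an exception is thrown when more than one
        # Parquet column matches.
        raise ValueError(f"Ambiguous match for {metastore_name!r}: {matches}")
    return matches[0] if matches else None
```

With `case_sensitive=False`, a metastore column `id` resolves to a Parquet column `ID`; if the Parquet schema contains both `ID` and `Id`, the sketch raises instead of picking one.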
[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP] Add migration doc for case-...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23238 Thank you for adding this to the migration doc. cc @gatorsmile .
[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` mothod may be ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23251 **[Test build #99802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99802/testReport)** for PR 23251 at commit [`b1e71ee`](https://github.com/apache/spark/commit/b1e71ee7a723d63f1cf3c0754f2372eb185439d3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` mothod may be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23251 Merged build finished. Test PASSed.
[GitHub] spark issue #23251: [SPARK-26300][SS] The `checkForStreaming` mothod may be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23251 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99802/ Test PASSed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23108 **[Test build #99804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99804/testReport)** for PR 23108 at commit [`d851169`](https://github.com/apache/spark/commit/d851169803861e24c3c251dcf936b4bf11a9c964). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23239 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99801/ Test PASSed.
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5845/ Test PASSed.
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Merged build finished. Test FAILed.
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23249 Merged build finished. Test PASSed.
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23245 **[Test build #99809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99809/testReport)** for PR 23245 at commit [`2e9b09c`](https://github.com/apache/spark/commit/2e9b09cc24c5ae877ff3b0fb9a769d24c05462ac). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ArrowCollectSerializer(Serializer):`
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99809/ Test FAILed.
[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23239 **[Test build #99801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99801/testReport)** for PR 23239 at commit [`84e3989`](https://github.com/apache/spark/commit/84e3989329da1e7bb8f26dc2ded7558ce6fd9b23). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23239 Merged build finished. Test PASSed.
[GitHub] spark issue #22707: [SPARK-25717][SQL] Insert overwrite a recreated external...
Github user fjh100456 commented on the issue: https://github.com/apache/spark/pull/22707 Are there any more suggestions? @wangyum @viirya
[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23249#discussion_r239690226
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ---
@@ -22,13 +22,12 @@ import org.apache.spark.sql.types.{DataType, IntegerType}
 /**
 * Specifies how tuples that share common expressions will be distributed when a query is executed
- * in parallel on many machines. Distribution can be used to refer to two distinct physical
- * properties:
- * - Inter-node partitioning of data: In this case the distribution describes how tuples are
- *partitioned across physical machines in a cluster. Knowing this property allows some
- *operators (e.g., Aggregate) to perform partition local operations instead of global ones.
- * - Intra-partition ordering of data: In this case the distribution describes guarantees made
- *about how tuples are distributed within a single partition.
+ * in parallel on many machines.
+ *
+ * Distribution here refers to inter-node partitioning of data:
+ * The distribution describes how tuples are partitioned across physical machines in a cluster.
+ * Knowing this property allows some operators (e.g., Aggregate) to perform partition local
+ * operations instead of global ones.
 */
--- End diff --
For ordering, I think people can look at `OrderedDistribution`?
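The claim in the doc comment above, that knowing the inter-node partitioning lets an operator such as Aggregate work partition-locally, can be sketched in plain Python. This is a toy model under assumptions, not Catalyst code: `hash_partition` and `local_counts` are hypothetical names, and Python's built-in `hash` stands in for whatever hash function a real engine would use.

```python
from collections import defaultdict
from typing import Any, Dict, List


def hash_partition(rows: List[Dict[str, Any]], key: str,
                   num_partitions: int) -> List[List[Dict[str, Any]]]:
    """Place each row in a partition chosen by hashing its grouping key."""
    partitions: List[List[Dict[str, Any]]] = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def local_counts(partition: List[Dict[str, Any]], key: str) -> Dict[Any, int]:
    """Count rows per key using only the rows of a single partition."""
    counts: Dict[Any, int] = defaultdict(int)
    for row in partition:
        counts[row[key]] += 1
    return dict(counts)
```

Because all rows with the same key land in the same partition, the per-partition counts are already final: no key shows up in two partitions, so no global merge step is needed.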
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Merged build finished. Test PASSed.
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99797/ Test PASSed.
[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...
Github user maryannxue commented on a diff in the pull request: https://github.com/apache/spark/pull/23249#discussion_r239693849
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ---
@@ -243,10 +248,19 @@ case class HashPartitioning(expressions: Seq[Expression], numPartitions: Int)
 * Represents a partitioning where rows are split across partitions based on some total ordering of
 * the expressions specified in `ordering`. When data is partitioned in this manner the following
--- End diff --
nit: add "," after "this manner".
[GitHub] spark issue #23250: [SPARK-26298][BUILD] Upgrade Janino to 3.0.11
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23250 Thank you, @HyukjinKwon . Merged to master.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5841/ Test PASSed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Merged build finished. Test PASSed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23108 **[Test build #99804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99804/testReport)** for PR 23108 at commit [`d851169`](https://github.com/apache/spark/commit/d851169803861e24c3c251dcf936b4bf11a9c964).
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23221 **[Test build #99798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99798/testReport)** for PR 23221 at commit [`c9ab9bc`](https://github.com/apache/spark/commit/c9ab9bcc378168ff3430d8885899ccd74afe7b32). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239698500
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -78,6 +80,7 @@ object SQLMetrics {
 private val SUM_METRIC = "sum"
 private val SIZE_METRIC = "size"
 private val TIMING_METRIC = "timing"
+ private val NS_TIMING_METRIC = "nanosecond"
--- End diff --
How about naming it `NORMALIZE_TIMING_METRIC`? It could then be reused later for other timing metrics that need their unit normalized. If you think the name is strange, I'll change it back.
[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23072#discussion_r239701364
--- Diff: R/pkg/tests/fulltests/test_mllib_clustering.R ---
@@ -319,4 +319,18 @@ test_that("spark.posterior and spark.perplexity", {
 expect_equal(length(local.posterior), sum(unlist(local.posterior)))
 })
+test_that("spark.assignClusters", {
+ df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
+ list(1L, 2L, 1.0), list(3L, 4L, 1.0),
+ list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
+ clusters <- spark.assignClusters(df, initMode = "degree", weightCol = "weight")
+ expected_result <- createDataFrame(list(list(4L, 1L),
+ list(0L, 0L),
+ list(1L, 0L),
+ list(3L, 1L),
+ list(2L, 0L)),
+ schema = c("id", "cluster"))
--- End diff --
ditto for style
[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...
Github user wangjiaochun commented on a diff in the pull request: https://github.com/apache/spark/pull/23225#discussion_r239704796
--- Diff: core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java ---
@@ -562,4 +562,18 @@ public void testPeakMemoryUsed() throws Exception {
 }
 }
+ @Test
+ public void writeEmptyIteratorNotCreateEmptySpillFile() throws Exception {
+final UnsafeShuffleWriter writer = createWriter(true);
+writer.write(Iterators.emptyIterator());
+final Option mapStatus = writer.stop(true);
+assertTrue(mapStatus.isDefined());
+assertTrue(mergedOutputFile.exists());
+assertEquals(0, spillFilesCreated.size());
--- End diff --
I mean that the test would fail before the code "if (sortedRecords.hasNext()) { return }" was added. Now, adding assertEquals(0, spillFilesCreated.size()) to writeEmptyIterator seems good.
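The behavior under test, not creating a spill file when there is nothing to write, can be sketched in Python as follows. This is an illustration under assumptions, not the `UnsafeShuffleWriter` code: `write_spill` and the `.spill` suffix are hypothetical names chosen for the example.

```python
import itertools
import os
import tempfile
from typing import Iterable, Optional

_EMPTY = object()  # sentinel that can never be a real record


def write_spill(records: Iterable[bytes], spill_dir: str) -> Optional[str]:
    """Write records to a new spill file, or return None for empty input.

    The file is only created after the first record has been seen, so an
    empty iterator leaves no file behind.
    """
    iterator = iter(records)
    first = next(iterator, _EMPTY)
    if first is _EMPTY:
        return None  # nothing to spill: skip file creation entirely

    fd, path = tempfile.mkstemp(dir=spill_dir, suffix=".spill")
    with os.fdopen(fd, "wb") as out:
        for record in itertools.chain([first], iterator):
            out.write(record)
    return path
```

A check in the spirit of `assertEquals(0, spillFilesCreated.size())` is then to call `write_spill(iter([]), some_dir)` and confirm that `some_dir` is still empty.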
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/23218 Do we need to add a release note about JVM compatibility?
[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23252#discussion_r239706529
--- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala ---
@@ -16,10 +16,13 @@
 */
 package org.apache.spark.deploy.k8s.features
-import scala.collection.JavaConverters._
+import java.io.File
+import java.nio.charset.StandardCharsets
+import java.nio.file.Files
 import io.fabric8.kubernetes.api.model._
 import org.scalatest.BeforeAndAfter
+import scala.collection.JavaConverters._
--- End diff --
? Hi, @mccheah. We import `java.*` and `scala.*` before any others.
[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20146 ping @dbtsai
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99811/testReport)** for PR 22683 at commit [`22e0589`](https://github.com/apache/spark/commit/22e0589b66b30110f0b579f4829339ee680fc93f). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Merged build finished. Test FAILed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Merged build finished. Test PASSed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99804/ Test PASSed.
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23249 **[Test build #99812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99812/testReport)** for PR 23249 at commit [`04be19e`](https://github.com/apache/spark/commit/04be19e62caa8fd0365b4998e22cdcad846be6b8).
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99813/testReport)** for PR 22683 at commit [`bf150fb`](https://github.com/apache/spark/commit/bf150fb4bbc68627d19521a31a0d3a294d079862).
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23249 Merged build finished. Test PASSed.
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5846/ Test PASSed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99815/testReport)** for PR 22683 at commit [`9dff9ee`](https://github.com/apache/spark/commit/9dff9eea09cbf3d5298bd6d261e1595cafaaae69).
[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23239 The change looks fine. Do we already have tests for cases 2 and 4? We know the test for case 3 is [here](https://github.com/apache/spark/pull/23043).
[GitHub] spark pull request #22575: [SPARK-24630][SS] Support SQLStreaming in Spark
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/spark/pull/22575#discussion_r239500890
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -631,6 +631,33 @@ object SQLConf {
 .intConf
 .createWithDefault(200)
+ val SQLSTREAM_WATERMARK_ENABLE = buildConf("spark.sqlstreaming.watermark.enable")
+.doc("Whether use watermark in sqlstreaming.")
+.booleanConf
+.createWithDefault(false)
+
+ val SQLSTREAM_OUTPUTMODE = buildConf("spark.sqlstreaming.outputMode")
+.doc("The output mode used in sqlstreaming")
+.stringConf
+.createWithDefault("append")
+
+ val SQLSTREAM_TRIGGER = buildConf("spark.sqlstreaming.trigger")
--- End diff --
So stream-stream joins are not supported here, right? To elaborate: can I create two stream source tables, join them, and write the result to a sink? If I want to create two streams for two different topics, I may need to provide different configurations for the watermark, window, or trigger interval.
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23241 **[Test build #99774 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99774/testReport)** for PR 23241 at commit [`6dfa27a`](https://github.com/apache/spark/commit/6dfa27ad49fdaa52c8fb83a18238e9f724b9d550).
[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23239#discussion_r239507673
--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java ---
@@ -198,11 +198,45 @@ protected final void writeLong(long offset, long value) {
 Platform.putLong(getBuffer(), offset, value);
 }
+ // We need to take care of NaN and -0.0 in several places:
+ // 1. When compare values, different NaNs should be treated as same, `-0.0` and `0.0` should be
+ // treated as same.
+ // 2. In range partitioner, different NaNs should belong to the same partition, -0.0 and 0.0
--- End diff --
It turns out this is not a problem. The doc of `RangePartitioning` is misleading. I'm updating the doc at https://github.com/apache/spark/pull/23249
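Point 1 of the quoted code comment (different NaNs, and `-0.0` versus `0.0`, should be treated as the same when values are compared) matters because these values have different binary representations. The following Python sketch shows one way to normalize a double before its bits are written or compared. It illustrates the idea only and is not the `UnsafeWriter` code: `double_bits`, `normalize_double`, and the choice of `0x7FF8000000000000` as the canonical NaN pattern are assumptions made for the example.

```python
import math
import struct

# Assumed canonical quiet-NaN bit pattern for this sketch.
CANONICAL_NAN_BITS = 0x7FF8000000000000


def double_bits(value: float) -> int:
    """Raw IEEE 754 bit pattern of a double as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def normalize_double(value: float) -> float:
    """Map every NaN to one bit pattern and -0.0 to 0.0; keep other values."""
    if math.isnan(value):
        return struct.unpack(">d", struct.pack(">Q", CANONICAL_NAN_BITS))[0]
    if value == 0.0:
        # -0.0 == 0.0 is True, so this branch catches both zeros.
        return 0.0
    return value
```

After normalization, a byte-wise comparison of two written values agrees with the intended semantics: any two NaNs compare equal, and so do the two zeros.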
[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239509724
--- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
@@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
 // avoid overhead excessive of JNI call while trying to uncompress small amount of data.
 new BufferedInputStream(new ZstdInputStream(s), bufferSize)
 }
+
+ override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
+new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
--- End diff --
That's what I'm wondering about. Is it actually desirable to not fail on a partial frame? I'm not sure. We *shouldn't* encounter it elsewhere. This changes a developer API, but may not even be a breaking change as there is a default implementation. We can take breaking changes in Spark 3 though. I think I agree with your approach here in the end.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Merged build finished. Test PASSed.
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Merged build finished. Test PASSed.
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5813/ Test PASSed.
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23245 **[Test build #99768 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99768/testReport)** for PR 23245 at commit [`021134c`](https://github.com/apache/spark/commit/021134cd2b6a0a82ef8ef36a5ce122bff397ab32). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99768/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23245: [SPARK-26060][SQL][FOLLOW-UP] Rename the config name.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23245 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23241 **[Test build #99778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99778/testReport)** for PR 23241 at commit [`ff64adc`](https://github.com/apache/spark/commit/ff64adcdff37dee1e4ac14045c2cdb277d4acf4d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239525888 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala --- @@ -118,10 +118,12 @@ private[spark] class ReplayListenerBus extends SparkListenerBus with Logging { case e: HaltReplayException => // Just stop replay. case _: EOFException if maybeTruncated => - case _: IOException if maybeTruncated => -logWarning(s"Failed to read Spark event log: $sourceName") case ioe: IOException => -throw ioe +if (maybeTruncated) { --- End diff -- I think this was already the behavior? If the guarded case doesn't match, it falls through to the next case and throws anyway. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
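The point about guards can be checked with a small sketch. It is not the `ReplayListenerBus` code: the object and method names are hypothetical, and the returned strings stand in for "skip silently", "log a warning" and so on. It compares the guarded form from the removed lines with a nested `if` inside a single `IOException` case.

```scala
import java.io.{EOFException, IOException}

// Hypothetical sketch, not Spark code: two ways to write the same handler.
object ReplayErrorHandling {
  // Guarded form: when a guard is false, matching falls through to the next case.
  def guarded(error: Throwable, maybeTruncated: Boolean): String =
    try throw error catch {
      case _: EOFException if maybeTruncated => "ignored"
      case _: IOException if maybeTruncated => "warned"
      case ioe: IOException => throw ioe
    }

  // Nested form: one IOException case that branches inside its body.
  def nested(error: Throwable, maybeTruncated: Boolean): String =
    try throw error catch {
      case _: EOFException if maybeTruncated => "ignored"
      case ioe: IOException =>
        if (maybeTruncated) "warned" else throw ioe
    }
}
```

For every combination of `IOException`/`EOFException` and `maybeTruncated`, both forms either return the same marker or rethrow, which is consistent with the review comment that the control flow was already equivalent.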
[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/23249 [SPARK-26297][SQL] improve the doc of Distribution/Partitioning ## What changes were proposed in this pull request? Some documents of `Distribution/Partitioning` are stale and misleading. This PR fixes them: 1. `ClusteredDistribution` doesn't have an intra-partition requirement. 2. `OrderedDistribution` does not require tuples that share the same value to be colocated in the same partition. 3. `RangePartitioning` can provide a weaker guarantee for a prefix of its `ordering` expressions. ## How was this patch tested? comment-only PR. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark doc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23249.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23249 commit 24ea28abd5a385351703335df33b26838d203fe3 Author: Wenchen Fan Date: 2018-12-06T15:47:23Z improve the doc of Distribution/Partitioning --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23249 cc @maryannxue @hvanhovell @gatorsmile @viirya --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5811/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23249#discussion_r239508437 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -118,10 +116,13 @@ case class HashClusteredDistribution( /** * Represents data where tuples have been ordered according to the `ordering` - * [[Expression Expressions]]. This is a strictly stronger guarantee than - * [[ClusteredDistribution]] as an ordering will ensure that tuples that share the - * same value for the ordering expressions are contiguous and will never be split across - * partitions. + * [[Expression Expressions]]. + * + * Tuples that share the same values for the ordering expressions must be contiguous within a + * partition. They can also across partitions, but these partitions must be contiguous. For example, + * if value `v` is the biggest values in partition 3, it can also be in partition 4 as the smallest + * value. If all the values in partition 4 are `v`, it can also be in partition 5 as the smallest + * value. */ case class OrderedDistribution(ordering: Seq[SortOrder]) extends Distribution { --- End diff -- This is only used by sort, and sort doesn't require rows of same value to be colocated in the same partition. Actually we already use this knowledge to optimize `RangePartitioning.satisfy` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
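The relaxed requirement in the proposed doc comment can be made concrete with a small checker. This is only an illustration on plain integer sequences, not Catalyst code; `OrderedDistributionCheck` and `isGloballyOrdered` are hypothetical names.

```scala
// Hypothetical sketch: partitions are modeled as Seq[Seq[Int]] in partition order.
object OrderedDistributionCheck {
  // True when every partition is sorted ascending and the last value of each
  // non-empty partition is <= the first value of the next non-empty one.
  // Equal values may therefore span adjacent partitions.
  def isGloballyOrdered(partitions: Seq[Seq[Int]]): Boolean = {
    val nonEmpty = partitions.filter(_.nonEmpty)
    val sortedWithin = nonEmpty.forall(p => p.zip(p.tail).forall { case (a, b) => a <= b })
    val orderedAcross = nonEmpty.zip(nonEmpty.drop(1)).forall { case (l, r) => l.last <= r.head }
    sortedWithin && orderedAcross
  }
}
```

Under this reading, `Seq(Seq(1, 2, 5), Seq(5, 5), Seq(5, 7))` is accepted even though the value 5 appears in three consecutive partitions, while `Seq(Seq(1, 5), Seq(4, 6))` is rejected because 5 precedes 4 across the partition boundary.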
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23249 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23249 **[Test build #99775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99775/testReport)** for PR 23249 at commit [`24ea28a`](https://github.com/apache/spark/commit/24ea28abd5a385351703335df33b26838d203fe3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23249#discussion_r239508488 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -118,10 +116,13 @@ case class HashClusteredDistribution( /** * Represents data where tuples have been ordered according to the `ordering` - * [[Expression Expressions]]. This is a strictly stronger guarantee than - * [[ClusteredDistribution]] as an ordering will ensure that tuples that share the - * same value for the ordering expressions are contiguous and will never be split across - * partitions. + * [[Expression Expressions]]. + * + * Tuples that share the same values for the ordering expressions must be contiguous within a + * partition. They can also across partitions, but these partitions must be contiguous. For example, + * if value `v` is the biggest values in partition 3, it can also be in partition 4 as the smallest + * value. If all the values in partition 4 are `v`, it can also be in partition 5 as the smallest + * value. */ case class OrderedDistribution(ordering: Seq[SortOrder]) extends Distribution { --- End diff -- This is only used by sort, and sort doesn't require rows of same value to be colocated in the same partition. Actually we already use this knowledge to optimize `RangePartitioning.satisfy` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5810/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/23215 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23215 **[Test build #99766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99766/testReport)** for PR 23215 at commit [`25f7039`](https://github.com/apache/spark/commit/25f7039c6b836d40370b615d3d0259c9640dde4c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5812/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239516496 --- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala --- @@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec { // avoid overhead excessive of JNI call while trying to uncompress small amount of data. new BufferedInputStream(new ZstdInputStream(s), bufferSize) } + + override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = { +new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize) --- End diff -- Thanks @srowen. > Is it actually desirable to not fail on a partial frame? I'm not sure. We shouldn't encounter it elsewhere. Yes, ideally it shouldn't fail. Even for EventLoggingListener, the frame is closed once the application finishes (which is why this only applies to running applications). After analyzing the zstd code again, the impact seems small: the choice is to either throw an exception or read the open frame, and the latter seems better. I can update the code. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23241 **[Test build #99777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99777/testReport)** for PR 23241 at commit [`7d6ad51`](https://github.com/apache/spark/commit/7d6ad5187542023943a5790096ff8d8927a06366). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5815/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23202: [SPARK-26248][SQL] Infer date type from CSV
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23202 I'd defer to @HyukjinKwon ; looks OK in broad strokes but he would know much more about the CSV parsing. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23201 @cloud-fan May I ask you to look at this PR, please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99770/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23215 **[Test build #99776 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99776/testReport)** for PR 23215 at commit [`6f4e652`](https://github.com/apache/spark/commit/6f4e652add4157bfcdad4d7a924c74363f2b5cf2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99766/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5814/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23241: [SPARK-26283][CORE] Enable reading from open fram...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/23241#discussion_r239521593 --- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala --- @@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec { // avoid overhead excessive of JNI call while trying to uncompress small amount of data. new BufferedInputStream(new ZstdInputStream(s), bufferSize) } + + override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = { +new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize) --- End diff -- I have updated the code. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23202: [SPARK-26248][SQL] Infer date type from CSV
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23202 @HyukjinKwon @srowen Is there anything which worries you in the PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23249 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23249 **[Test build #99779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99779/testReport)** for PR 23249 at commit [`3df1e44`](https://github.com/apache/spark/commit/3df1e446a8f9c9d04912856e617617c1ef7c8373). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5816/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23228: [MINOR][DOC]The condition description of serialized shuf...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/23228 cc @JoshRosen @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/23201#discussion_r239547742 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala --- @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable { DecimalType(bigDecimal.precision, bigDecimal.scale) } decimalTry.getOrElse(StringType) - case VALUE_STRING => StringType + case VALUE_STRING => +val stringValue = parser.getText --- End diff -- `DateType` is not inferred at all, but there is other type inference code that could be shared between JSON and CSV (maybe placed somewhere else). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
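As a rough illustration of string-based type inference that both a JSON and a CSV reader could call, here is a minimal sketch. It is not the PR's implementation: `StringTypeInference`, its result types, and the use of `java.time` ISO parsing are all assumptions made for the example, and Spark's real inference depends on its configured formats and options.

```scala
import java.time.{LocalDate, LocalDateTime}
import scala.util.Try

// Hypothetical shared helper; names and parsing rules are assumptions.
object StringTypeInference {
  sealed trait InferredType
  case object TimestampLike extends InferredType
  case object DateLike extends InferredType
  case object StringLike extends InferredType

  // Try the most specific interpretation first and fall back to a plain string.
  def infer(value: String): InferredType =
    if (Try(LocalDateTime.parse(value)).isSuccess) TimestampLike
    else if (Try(LocalDate.parse(value)).isSuccess) DateLike
    else StringLike
}
```

With these assumptions, `infer("2018-12-06T15:47:23")` yields `TimestampLike`, `infer("2018-12-06")` yields `DateLike`, and any value that parses as neither stays `StringLike`.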
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23207 **[Test build #99782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99782/testReport)** for PR 23207 at commit [`d5ee249`](https://github.com/apache/spark/commit/d5ee2493478d11ba688172d4b27a15b18beaf559). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239548704 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -95,3 +96,59 @@ private[spark] object SQLShuffleMetricsReporter { FETCH_WAIT_TIME -> SQLMetrics.createTimingMetric(sc, "fetch wait time"), RECORDS_READ -> SQLMetrics.createMetric(sc, "records read")) } + +/** + * A shuffle write metrics reporter for SQL exchange operators. Different with + * [[SQLShuffleReadMetricsReporter]], we need a function of (reporter => reporter) set in + * shuffle dependency, so the local SQLMetric should transient and create on executor. + * @param metrics Shuffle write metrics in current SparkPlan. + * @param metricsReporter Other reporter need to be updated in this SQLShuffleWriteMetricsReporter. + */ +private[spark] case class SQLShuffleWriteMetricsReporter( +metrics: Map[String, SQLMetric])(metricsReporter: ShuffleWriteMetricsReporter) --- End diff -- Reimplemented in a780b70. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
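The "(reporter => reporter)" idea in the doc comment can be sketched as a wrapper that updates its own counters and then forwards to the reporter it wraps. Everything below is hypothetical: `WriteReporter`, `CountingReporter`, `ForwardingReporter` and `wrapWith` are stand-ins invented for the example and are not Spark's `ShuffleWriteMetricsReporter` or `SQLMetric` types.

```scala
// Hypothetical sketch of a forwarding metrics reporter; not Spark's API.
object WriteMetricsSketch {
  trait WriteReporter {
    def incBytesWritten(v: Long): Unit
    def incRecordsWritten(v: Long): Unit
  }

  // Accumulates totals in memory, standing in for a set of metrics.
  final class CountingReporter extends WriteReporter {
    var bytes: Long = 0L
    var records: Long = 0L
    def incBytesWritten(v: Long): Unit = bytes += v
    def incRecordsWritten(v: Long): Unit = records += v
  }

  // Records into `local` first, then forwards the same update downstream.
  final class ForwardingReporter(local: CountingReporter)(downstream: WriteReporter)
      extends WriteReporter {
    def incBytesWritten(v: Long): Unit = {
      local.incBytesWritten(v)
      downstream.incBytesWritten(v)
    }
    def incRecordsWritten(v: Long): Unit = {
      local.incRecordsWritten(v)
      downstream.incRecordsWritten(v)
    }
  }

  // The (reporter => reporter) function that a caller could store and apply later.
  def wrapWith(local: CountingReporter): WriteReporter => WriteReporter =
    downstream => new ForwardingReporter(local)(downstream)
}
```

Applying `wrapWith(sqlSide)` to a task-level reporter gives a single reporter whose updates land in both places, which is the behavior the `metricsReporter` parameter describes.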
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/23207 ``` Can we put the above in a closure and pass it into shuffle dependency? Then in SQL we just put the above in SQL using custom metrics. ``` Yes, commit a780b70 achieves this by adding the `ShuffleWriteProcessor` abstraction. The read metrics rename was reverted in 7d104eb; I will do the rename and the display change in another PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5819/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23159 cc @cloud-fan and @gatorsmile . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22275 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239559037 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java --- @@ -25,7 +25,10 @@ * The base interface for v2 data sources which don't have a real catalog. Implementations must * have a public, 0-arg constructor. * - * The major responsibility of this interface is to return a {@link Table} for read/write. + * The major responsibility of this interface is to return a {@link Table} for read/write. If you + * want to allow end-users to write data to non-existing tables via write APIs in `DataFrameWriter` + * with `SaveMode`, you must return a {@link Table} instance even if the table doesn't exist. The + * table schema can be empty in this case. --- End diff -- What does it mean to write to a non-existing table? If you're writing somewhere, the table must exist. This is for creating a table directly from configuration and an implementation class in the DataFrameWriter API. The target of the write still needs to exist. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send u...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22275 merged to master, thanks @holdenk @viirya and @felixcheung ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org