[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22376

Merged build finished. Test PASSed.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22376 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95971/ Test PASSed.
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22376

**[Test build #95971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95971/testReport)** for PR 22376 at commit [`4a0cffb`](https://github.com/apache/spark/commit/4a0cffb3ce9e1bede43e6a89fdd7a7b912bf93d2).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22390: [SPARK-25402][SQL] Null handling in BooleanSimplificatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22390 **[Test build #95976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95976/testReport)** for PR 22390 at commit [`61b2d55`](https://github.com/apache/spark/commit/61b2d551b19755c741b527a6f3578c3a46c544c3).
[GitHub] spark issue #22390: [SPARK-25402][SQL] Null handling in BooleanSimplificatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22390 Merged build finished. Test PASSed.
[GitHub] spark issue #22390: [SPARK-25402][SQL] Null handling in BooleanSimplificatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22390 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3036/ Test PASSed.
[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22394 Hey @mallman, let's target just fixing the problem in the JIRA, without other refactorings.
[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r216903678

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala ---
@@ -245,28 +249,32 @@ class ParquetSchemaPruningSuite
     checkAnswer(query.orderBy("id"), Row(1) :: Nil)
   }

-  private def testMixedCasePruning(testName: String)(testThunk: => Unit) {
-    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true",
-      SQLConf.CASE_SENSITIVE.key -> "true") {
-      test(s"Spark vectorized reader - case-sensitive parser - mixed-case schema - $testName") {
-        withMixedCaseData(testThunk)
+  private def testExactCasePruning(testName: String)(testThunk: => Unit) {
+    test(s"Spark vectorized reader - case-sensitive parser - mixed-case schema - $testName") {
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true",
+        SQLConf.CASE_SENSITIVE.key -> "true") {
+        withMixedCaseData(testThunk)
       }
     }

-    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false",
-      SQLConf.CASE_SENSITIVE.key -> "false") {
-      test(s"Parquet-mr reader - case-insensitive parser - mixed-case schema - $testName") {
+    test(s"Parquet-mr reader - case-sensitive parser - mixed-case schema - $testName") {
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false",
+        SQLConf.CASE_SENSITIVE.key -> "true") {
         withMixedCaseData(testThunk)
       }
     }

-    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true",
-      SQLConf.CASE_SENSITIVE.key -> "false") {
-      test(s"Spark vectorized reader - case-insensitive parser - mixed-case schema - $testName") {
-        withMixedCaseData(testThunk)
+    testMixedCasePruning(testName)(testThunk)
+  }
+
+  private def testMixedCasePruning(testName: String)(testThunk: => Unit) {
--- End diff --

`testMixedCasePruning` previously looked like it also tested the case-sensitive configuration; now it looks like it does not. Would you mind explaining why?
[GitHub] spark issue #22390: [SPARK-25402][SQL] Null handling in BooleanSimplificatio...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22390 LGTM
[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r216903560

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala ---
@@ -245,28 +249,32 @@ class ParquetSchemaPruningSuite
     checkAnswer(query.orderBy("id"), Row(1) :: Nil)
   }

-  private def testMixedCasePruning(testName: String)(testThunk: => Unit) {
-    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true",
-      SQLConf.CASE_SENSITIVE.key -> "true") {
-      test(s"Spark vectorized reader - case-sensitive parser - mixed-case schema - $testName") {
-        withMixedCaseData(testThunk)
+  private def testExactCasePruning(testName: String)(testThunk: => Unit) {
--- End diff --

This looks like it tests case insensitivity too?
[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22388

An incorrect import may still exist there:

```
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousTextSocketSource.scala:40: object SupportsDeprecatedScanRow is not a member of package org.apache.spark.sql.sources.v2.reader
[error] import org.apache.spark.sql.sources.v2.reader.{InputPartition, InputPartitionReader, SupportsDeprecatedScanRow}
[error]^
```
[GitHub] spark issue #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22400 **[Test build #95975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95975/testReport)** for PR 22400 at commit [`9c9178b`](https://github.com/apache/spark/commit/9c9178b919edc0ebc3d1c68edd8c9be5c4abe27f).
[GitHub] spark issue #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22400 Merged build finished. Test PASSed.
[GitHub] spark issue #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22400 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3035/ Test PASSed.
[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22394 Merged build finished. Test PASSed.
[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22394 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95970/ Test PASSed.
[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22394

**[Test build #95970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95970/testReport)** for PR 22394 at commit [`c759aea`](https://github.com/apache/spark/commit/c759aeabc8b3fb3c426e432bff794deddef3e05e).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #22399: [SPARK-25408] Move to mode ideomatic Java8
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22399#discussion_r216896517

--- Diff: launcher/src/main/java/org/apache/spark/launcher/AbstractAppHandle.java ---
@@ -72,11 +74,7 @@ public void stop() {
   @Override
   public synchronized void disconnect() {
     if (connection != null && connection.isOpen()) {
-      try {
-        connection.close();
-      } catch (IOException ioe) {
-        // no-op.
-      }
+      IOUtils.closeQuietly(connection);
--- End diff --

This library should not have any non-JRE dependencies.
[GitHub] spark issue #22343: [SPARK-25391][SQL] Make behaviors consistent when conver...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22343 Compatibility is not a golden rule if it sacrifices correctness. A fast but **wrong** result doesn't look like a benefit to me. Do you think customers want to get a wrong result, as Hive produces?
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22344 **[Test build #95974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95974/testReport)** for PR 22344 at commit [`534e982`](https://github.com/apache/spark/commit/534e9824f6ecfa8cb04f5eb0757ff45fc448cce1).
[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22357 LGTM from me too.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22344 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3034/ Test PASSed.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22344 Merged build finished. Test PASSed.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22344 retest this please.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22344 Merged build finished. Test FAILed.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22344 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95972/ Test FAILed.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22344

**[Test build #95972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95972/testReport)** for PR 22344 at commit [`534e982`](https://github.com/apache/spark/commit/534e9824f6ecfa8cb04f5eb0757ff45fc448cce1).

 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22400 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3033/ Test PASSed.
[GitHub] spark issue #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22400 Merged build finished. Test PASSed.
[GitHub] spark issue #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22400 Merged build finished. Test FAILed.
[GitHub] spark issue #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22400

**[Test build #95973 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95973/testReport)** for PR 22400 at commit [`fc4b49e`](https://github.com/apache/spark/commit/fc4b49ed784581a85f5742a701046a1ca8e4d32e).

 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22400 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95973/ Test FAILed.
[GitHub] spark issue #22353: [SPARK-25357][SQL] Add metadata to SparkPlanInfo to dump...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22353 ping @cloud-fan
[GitHub] spark pull request #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warni...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22400#discussion_r216886496

--- Diff: python/pyspark/sql/functions.py ---
@@ -283,7 +283,8 @@ def approxCountDistinct(col, rsd=None):
 @since(2.1)
 def approx_count_distinct(col, rsd=None):
-    """Aggregate function: returns a new :class:`Column` for approximate distinct count of column `col`.
+    """Aggregate function: returns a new :class:`Column` for approximate distinct count of
--- End diff --

Line too long.
[GitHub] spark issue #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22400 **[Test build #95973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95973/testReport)** for PR 22400 at commit [`fc4b49e`](https://github.com/apache/spark/commit/fc4b49ed784581a85f5742a701046a1ca8e4d32e).
[GitHub] spark pull request #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warni...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22400#discussion_r216886472

--- Diff: python/pyspark/ml/feature.py ---
@@ -303,7 +303,7 @@ def _create_model(self, java_model):
 class BucketedRandomProjectionLSHModel(LSHModel, JavaMLReadable, JavaMLWritable):
-    """
+    r"""
--- End diff --

A few docstrings have backslashes or backticks in them. This should make sure they don't have surprising effects some day.
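The effect of the `r"""` prefix discussed above can be shown with a small, self-contained sketch. The function below is purely illustrative (not from the PR); it demonstrates that a raw string keeps backslashes literal, which is why raw docstrings are safer for text containing `\w`-style sequences.

```python
# A plain string only preserves "\d" because Python happens not to define
# a \d escape; pycodestyle 2.4 flags such strings as W605. A raw string
# says explicitly: one backslash, then 'd'.
plain = "\d"
raw = r"\d"

assert plain == raw   # equal today, but the plain form is deprecated
assert len(raw) == 2

# The same idea applies to docstrings: an r""" ... """ docstring keeps
# backslashes literal for any tool that re-processes the text.
def example():
    r"""Keeps \w literal instead of treating it as an escape."""

assert "\\w" in example.__doc__
```

Running it produces no output; all three assertions hold, which is the point of the change: the raw form makes the already-intended meaning explicit.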
[GitHub] spark pull request #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warni...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22400#discussion_r216886428

--- Diff: dev/run-tests-jenkins.py ---
@@ -115,7 +115,7 @@ def run_tests(tests_timeout):
     os.path.join(SPARK_HOME, 'dev', 'run-tests')]).wait()
   failure_note_by_errcode = {
-    1: 'executing the `dev/run-tests` script',  # error to denote run-tests script failures
+    1: 'executing the dev/run-tests script',  # error to denote run-tests script failures
--- End diff --

Back-ticks invoke repr or something, I think? That's not the intent here, so I removed them to quiet the warning.
[GitHub] spark pull request #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warni...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/22400

[SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4

(This change is a subset of the changes needed for the JIRA; see https://github.com/apache/spark/pull/22231)

## What changes were proposed in this pull request?

Use raw strings and simpler regex syntax consistently in Python, which also avoids warnings from pycodestyle about accidentally relying on Python's non-escaping of non-reserved chars in normal strings. Also, fix a few long lines.

## How was this patch tested?

Existing tests, and some manual double-checking of the behavior of regexes in Python 2/3 to be sure.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-25238.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22400.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22400

commit fc4b49ed784581a85f5742a701046a1ca8e4d32e
Author: Sean Owen
Date: 2018-09-12T03:19:55Z

Use raw strings and simpler regex syntax consistently in Python, which also avoids warnings from pycodestyle about accidentally relying on Python's non-escaping of non-reserved chars in normal strings
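The raw-string pattern the PR description refers to can be sketched as follows. The patterns and input string here are illustrative, not taken from the PR itself:

```python
import re

# Hypothetical before/after pair showing the W605 fix. The non-raw
# version only works because Python leaves unknown escapes such as \d
# alone; pycodestyle 2.4 warns about it. The raw string does not rely
# on that behavior.
before = "(\d+)/(\w+)"   # pycodestyle 2.4: W605 invalid escape sequence
after = r"(\d+)/(\w+)"   # raw string: no warning, same pattern

assert before == after   # byte-for-byte identical at runtime

match = re.match(after, "95973/SparkPullRequestBuilder")
assert match is not None
assert match.group(1) == "95973"
assert match.group(2) == "SparkPullRequestBuilder"
```

Because the two strings are identical at runtime, the fix changes no behavior; it only makes the escaping explicit so the linter (and future Python versions) stay quiet.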
[GitHub] spark issue #22400: [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22400 CC @cclauss @holdenk
[GitHub] spark issue #22393: [MINOR][DOCS] Axe deprecated doc refs
Github user MichaelChirico commented on the issue: https://github.com/apache/spark/pull/22393 @srowen thanks; done
[GitHub] spark issue #22393: Axe deprecated doc refs
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22393 CC @felixcheung I'd also prefix the title with `[MINOR][DOCS]` to match our conventions
[GitHub] spark issue #22398: [SPARK-23820][CORE] Enable use of long form of callsite ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22398 **[Test build #4336 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4336/testReport)** for PR 22398 at commit [`15edc21`](https://github.com/apache/spark/commit/15edc21325f4d6cd249626032aae621880aaf75a).
[GitHub] spark pull request #22398: [SPARK-23820][CORE] Enable use of long form of ca...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22398#discussion_r216882861

--- Diff: docs/configuration.md ---
@@ -746,6 +746,13 @@ Apart from these, the following properties are also available, and may be useful
     *Warning*: This will increase the size of the event log considerably.

+  spark.eventLog.longForm.enabled
+  false
+  Whether to use the long form of call sites in the event log.
--- End diff --

This seems fine. If you have to change anything, I might say this as "If true, use the long form of call sites in the event log. Otherwise use the short form." Just to clarify what the long form is an alternative to.
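For reference, the flag being documented would sit alongside the existing event-log settings, e.g. in `spark-defaults.conf`. This is only a sketch: the log directory path is a placeholder, and only `spark.eventLog.longForm.enabled` is the property under review.

```
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-logs
spark.eventLog.longForm.enabled  true
```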
[GitHub] spark pull request #22399: [SPARK-25408] Move to mode ideomatic Java8
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22399#discussion_r216882375

--- Diff: launcher/src/main/java/org/apache/spark/launcher/AbstractAppHandle.java ---
@@ -72,11 +74,7 @@ public void stop() {
   @Override
   public synchronized void disconnect() {
     if (connection != null && connection.isOpen()) {
-      try {
-        connection.close();
-      } catch (IOException ioe) {
-        // no-op.
-      }
+      IOUtils.closeQuietly(connection);
--- End diff --

I wouldn't bother with this; we don't really do it consistently, it's not a JDK standard class, and it doesn't really save much, while adding to the dependency on commons-io.
[GitHub] spark pull request #22399: [SPARK-25408] Move to mode ideomatic Java8
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22399#discussion_r216881904

--- Diff: common/network-common/src/test/java/org/apache/spark/network/ChunkFetchIntegrationSuite.java ---
@@ -143,61 +143,62 @@ public void releaseBuffers() {
   }

   private FetchResult fetchChunks(List chunkIndices) throws Exception {
-    TransportClient client = clientFactory.createClient(TestUtils.getLocalHost(), server.getPort());
-    final Semaphore sem = new Semaphore(0);
-    final FetchResult res = new FetchResult();
-    res.successChunks = Collections.synchronizedSet(new HashSet());
-    res.failedChunks = Collections.synchronizedSet(new HashSet());
-    res.buffers = Collections.synchronizedList(new LinkedList());
-    ChunkReceivedCallback callback = new ChunkReceivedCallback() {
-      @Override
-      public void onSuccess(int chunkIndex, ManagedBuffer buffer) {
-        buffer.retain();
-        res.successChunks.add(chunkIndex);
-        res.buffers.add(buffer);
-        sem.release();
-      }
+    try(TransportClient client = clientFactory.createClient(TestUtils.getLocalHost(), server.getPort())) {
--- End diff --

Nit: space after `try`.
[GitHub] spark pull request #22399: [SPARK-25408] Move to mode ideomatic Java8
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22399#discussion_r216882060

--- Diff: common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalShuffleIntegrationSuite.java ---
@@ -133,37 +133,37 @@ private FetchResult fetchBlocks(
     final Semaphore requestsRemaining = new Semaphore(0);

-    ExternalShuffleClient client = new ExternalShuffleClient(clientConf, null, false, 5000);
-    client.init(APP_ID);
-    client.fetchBlocks(TestUtils.getLocalHost(), port, execId, blockIds,
-      new BlockFetchingListener() {
-        @Override
-        public void onBlockFetchSuccess(String blockId, ManagedBuffer data) {
-          synchronized (this) {
-            if (!res.successBlocks.contains(blockId) && !res.failedBlocks.contains(blockId)) {
-              data.retain();
-              res.successBlocks.add(blockId);
-              res.buffers.add(data);
-              requestsRemaining.release();
-            }
-          }
-        }
-
-        @Override
-        public void onBlockFetchFailure(String blockId, Throwable exception) {
-          synchronized (this) {
-            if (!res.successBlocks.contains(blockId) && !res.failedBlocks.contains(blockId)) {
-              res.failedBlocks.add(blockId);
-              requestsRemaining.release();
-            }
-          }
-        }
-      }, null);
-
-    if (!requestsRemaining.tryAcquire(blockIds.length, 5, TimeUnit.SECONDS)) {
-      fail("Timeout getting response from the server");
+    try(ExternalShuffleClient client = new ExternalShuffleClient(clientConf, null, false, 5000)) {
+      client.init(APP_ID);
+      client.fetchBlocks(TestUtils.getLocalHost(), port, execId, blockIds,
+        new BlockFetchingListener() {
--- End diff --

Indent is now too deep here. I have the same general kind of doubt here: it's touching a lot of lines for little actual gain. Still, I'd like to be able to improve code a bit here and there. If this is only going to master and Spark 3, the back-port issue lessens, because it's unlikely we'd backport from 3.x to 2.x.
[GitHub] spark pull request #22399: [SPARK-25408] Move to mode ideomatic Java8
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22399#discussion_r216882288

--- Diff: core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java ---
@@ -18,12 +18,7 @@
 package org.apache.spark.launcher;

 import java.time.Duration;
-import java.util.Arrays;
-import java.util.ArrayList;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
-import java.util.Properties;
+import java.util.*;
--- End diff --

Don't collapse these.
[GitHub] spark pull request #22399: [SPARK-25408] Move to mode ideomatic Java8
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22399#discussion_r216882035

--- Diff: common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolverSuite.java ---
@@ -98,19 +98,19 @@ public void testSortShuffleBlocks() throws IOException {
     resolver.registerExecutor("app0", "exec0", dataContext.createExecutorInfo(SORT_MANAGER));

-    InputStream block0Stream =
-      resolver.getBlockData("app0", "exec0", 0, 0, 0).createInputStream();
-    String block0 = CharStreams.toString(
-      new InputStreamReader(block0Stream, StandardCharsets.UTF_8));
-    block0Stream.close();
-    assertEquals(sortBlock0, block0);
-
-    InputStream block1Stream =
-      resolver.getBlockData("app0", "exec0", 0, 0, 1).createInputStream();
-    String block1 = CharStreams.toString(
-      new InputStreamReader(block1Stream, StandardCharsets.UTF_8));
-    block1Stream.close();
-    assertEquals(sortBlock1, block1);
+    try(InputStream block0Stream =
+      resolver.getBlockData("app0", "exec0", 0, 0, 0).createInputStream()) {
--- End diff --

Same comment about the space above; I'd also indent the continuation 4 spaces for clarity.
[GitHub] spark pull request #22399: [SPARK-25408] Move to mode ideomatic Java8
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22399#discussion_r216881870

--- Diff: common/kvstore/src/test/java/org/apache/spark/util/kvstore/DBIteratorSuite.java ---
@@ -383,7 +383,7 @@ public void testRefWithIntNaturalKey() throws Exception {
     LevelDBSuite.IntKeyType i = new LevelDBSuite.IntKeyType();
     i.key = 1;
     i.id = "1";
-    i.values = Arrays.asList("1");
+    i.values = Collections.singletonList("1");
--- End diff --

I don't think this sort of thing is worth changing. I know IntelliJ suggests it, but unless the mutability is an issue, I'd leave it as the shorter idiom. I wouldn't use static imports here personally. There's also a small cost to changes like this: they create potential merge conflicts for other changes later, so I have a low but finite minimum bar for the value of code scrubbing like this.
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22358#discussion_r216881788

--- Diff: docs/sql-programming-guide.md ---
@@ -965,6 +965,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
 `parquet.compression` is specified in the table-specific options/properties, the precedence would be
 `compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
 none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
+Note that `zstd` needs to install `ZStandardCodec` before Hadoop 2.9.0, `brotli` needs to install
+`brotliCodec`.
--- End diff --

If the link is expected to be fairly permanent, it's fine.
[GitHub] spark issue #22054: [SPARK-24703][SQL]: To add support to multiply CalendarI...
Github user priyankagargnitk commented on the issue: https://github.com/apache/spark/pull/22054 Please review this PR.
[GitHub] spark issue #22343: [SPARK-25391][SQL] Make behaviors consistent when conver...
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22343 Setting spark.sql.hive.convertMetastoreParquet=false keeps Hive compatibility but loses the performance benefit. We can do better by enabling the conversion while still keeping Hive compatibility. Though this makes our implementation more complex, I guess most end users keep `spark.sql.hive.convertMetastoreParquet=true` and `spark.sql.caseSensitive=false`, which are the default values, so this brings benefits to end users.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22344 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3032/ Test PASSed.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22344 Merged build finished. Test PASSed.
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22376 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3031/
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22376 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3031/ Test FAILed.
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22376 Merged build finished. Test FAILed.
[GitHub] spark issue #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new execut...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/22288

As I mentioned at https://github.com/apache/spark/pull/22288#discussion_r216874530, I'm quite worried about this killing behaviour. I think we should kill an executor iff it is idle. Looking through the discussion above, my thoughts are below:

* With dynamic allocation: maybe we can add an `onTaskCompletelyBlacklisted()` method to the DA manager's `Listener` and pass it e.g. a `TaskCompletelyBlacklistedEvent`. The DA manager will then allocate a new executor for us.
* With static allocation: set `spark.scheduler.unschedulableTaskSetTimeout` for a `TaskSet`. If a task is completely blacklisted, kill some executors iff they're idle (maybe, taking executors' allocation time into account here, we should increase the timeout upper bound a little for this `TaskSet`). Then wait until it is scheduled, or time out and abort.
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22376 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3031/
[GitHub] spark issue #22386: [SPARK-25399][SS] Continuous processing state should not...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22386 Thanks a lot for your comment and fix, @mukulmurthy! We'll also port this soon.
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22379 see comment above.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22344 **[Test build #95972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95972/testReport)** for PR 22344 at commit [`534e982`](https://github.com/apache/spark/commit/534e9824f6ecfa8cb04f5eb0757ff45fc448cce1).
[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r216875875

--- Diff: R/pkg/R/functions.R ---
@@ -3720,3 +3720,22 @@ setMethod("current_timestamp",
             jc <- callJStatic("org.apache.spark.sql.functions", "current_timestamp")
             column(jc)
           })
+
+#' @details
+#' \code{from_csv}: Parses a column containing a CSV string into a Column of \code{structType}
+#' with the specified \code{schema}.
+#' If the string is unparseable, the Column will contain the value NA.
+#'
+#' @rdname column_collection_functions
+#' @param schema a DDL-formatted string
+#' @aliases from_csv from_csv,Column,character-method
+#'
+#' @note from_csv since 3.0.0
+setMethod("from_csv", signature(x = "Column", schema = "character"),
+          function(x, schema, ...) {
--- End diff --

here https://github.com/apache/spark/blob/d2bfd9430f05d006accdecb6a62ed659fbd6a2f8/R/pkg/R/functions.R#L199
[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r216875804

--- Diff: R/pkg/R/functions.R ---
@@ -3720,3 +3720,22 @@ setMethod("current_timestamp",
             jc <- callJStatic("org.apache.spark.sql.functions", "current_timestamp")
             column(jc)
           })
+
+#' @details
+#' \code{from_csv}: Parses a column containing a CSV string into a Column of \code{structType}
+#' with the specified \code{schema}.
+#' If the string is unparseable, the Column will contain the value NA.
+#'
+#' @rdname column_collection_functions
+#' @param schema a DDL-formatted string
+#' @aliases from_csv from_csv,Column,character-method
+#'
+#' @note from_csv since 3.0.0
+setMethod("from_csv", signature(x = "Column", schema = "character"),
+          function(x, schema, ...) {
--- End diff --

No no, this will break - I am referring to finding the original doc `@rdname column_collection_functions` that has `...` already documented, and then adding this in.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22344 retest this please.
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22376 **[Test build #95971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95971/testReport)** for PR 22376 at commit [`4a0cffb`](https://github.com/apache/spark/commit/4a0cffb3ce9e1bede43e6a89fdd7a7b912bf93d2).
[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r216875341

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala ---
@@ -245,28 +249,32 @@ class ParquetSchemaPruningSuite
     checkAnswer(query.orderBy("id"), Row(1) :: Nil)
   }
-  private def testMixedCasePruning(testName: String)(testThunk: => Unit) {
-    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true",
-      SQLConf.CASE_SENSITIVE.key -> "true") {
-      test(s"Spark vectorized reader - case-sensitive parser - mixed-case schema - $testName") {
-        withMixedCaseData(testThunk)
+  private def testExactCasePruning(testName: String)(testThunk: => Unit) {
+    test(s"Spark vectorized reader - case-sensitive parser - mixed-case schema - $testName") {
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true",
+        SQLConf.CASE_SENSITIVE.key -> "true") {
+        withMixedCaseData(testThunk)
       }
     }
-    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false",
-      SQLConf.CASE_SENSITIVE.key -> "false") {
-      test(s"Parquet-mr reader - case-insensitive parser - mixed-case schema - $testName") {
+    test(s"Parquet-mr reader - case-sensitive parser - mixed-case schema - $testName") {
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false",
+        SQLConf.CASE_SENSITIVE.key -> "true") {
        withMixedCaseData(testThunk)
      }
    }
-    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true",
-      SQLConf.CASE_SENSITIVE.key -> "false") {
-      test(s"Spark vectorized reader - case-insensitive parser - mixed-case schema - $testName") {
-        withMixedCaseData(testThunk)
+    testMixedCasePruning(testName)(testThunk)
+  }
+
+  private def testMixedCasePruning(testName: String)(testThunk: => Unit) {
--- End diff --

`testCaseInsensitivePruning`?
[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r216875288

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala ---
@@ -245,28 +249,32 @@ class ParquetSchemaPruningSuite
     checkAnswer(query.orderBy("id"), Row(1) :: Nil)
   }
-  private def testMixedCasePruning(testName: String)(testThunk: => Unit) {
-    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true",
-      SQLConf.CASE_SENSITIVE.key -> "true") {
-      test(s"Spark vectorized reader - case-sensitive parser - mixed-case schema - $testName") {
-        withMixedCaseData(testThunk)
+  private def testExactCasePruning(testName: String)(testThunk: => Unit) {
--- End diff --

`testCaseSensitivePruning`?
[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r216875122

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala ---
@@ -245,28 +249,32 @@ class ParquetSchemaPruningSuite
     checkAnswer(query.orderBy("id"), Row(1) :: Nil)
   }
-  private def testMixedCasePruning(testName: String)(testThunk: => Unit) {
-    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true",
-      SQLConf.CASE_SENSITIVE.key -> "true") {
-      test(s"Spark vectorized reader - case-sensitive parser - mixed-case schema - $testName") {
-        withMixedCaseData(testThunk)
+  private def testExactCasePruning(testName: String)(testThunk: => Unit) {
+    test(s"Spark vectorized reader - case-sensitive parser - mixed-case schema - $testName") {
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true",
+        SQLConf.CASE_SENSITIVE.key -> "true") {
+        withMixedCaseData(testThunk)
      }
    }
-    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false",
-      SQLConf.CASE_SENSITIVE.key -> "false") {
-      test(s"Parquet-mr reader - case-insensitive parser - mixed-case schema - $testName") {
+    test(s"Parquet-mr reader - case-sensitive parser - mixed-case schema - $testName") {
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false",
+        SQLConf.CASE_SENSITIVE.key -> "true") {
         withMixedCaseData(testThunk)
      }
    }
-    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true",
-      SQLConf.CASE_SENSITIVE.key -> "false") {
-      test(s"Spark vectorized reader - case-insensitive parser - mixed-case schema - $testName") {
-        withMixedCaseData(testThunk)
+    testMixedCasePruning(testName)(testThunk)
+  }
+
+  private def testMixedCasePruning(testName: String)(testThunk: => Unit) {
+    test(s"Parquet-mr reader - case-insensitive parser - mixed-case schema - $testName") {
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false",
--- End diff --

The vectorized reader cases in both `testSchemaPruning` and `testExactCasePruning` are put ahead of the Parquet-mr reader cases. Shall we follow that in `testMixedCasePruning` too?
[GitHub] spark issue #22376: [SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark....
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22376 Jenkins, retest this please
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22344 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95966/ Test FAILed.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22344 Merged build finished. Test FAILed.
[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22344 **[Test build #95966 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95966/testReport)** for PR 22344 at commit [`534e982`](https://github.com/apache/spark/commit/534e9824f6ecfa8cb04f5eb0757ff45fc448cce1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22288: [SPARK-22148][SPARK-15815][Scheduler] Acquire new...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/22288#discussion_r216874530

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -414,9 +425,48 @@ private[spark] class TaskSchedulerImpl(
           launchedAnyTask |= launchedTaskAtCurrentMaxLocality
         } while (launchedTaskAtCurrentMaxLocality)
       }
+
       if (!launchedAnyTask) {
-        taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
-      }
+        taskSet.getCompletelyBlacklistedTaskIfAny(hostToExecutors) match {
+          case taskIndex: Some[Int] => // Returns the taskIndex which was unschedulable
+
+            // If the taskSet is unschedulable we kill an existing blacklisted executor/s and
+            // kick off an abortTimer which after waiting will abort the taskSet if we were
+            // unable to schedule any task from the taskSet.
+            // Note: We keep a track of schedulability on a per taskSet basis rather than on a
+            // per task basis.
+            val executor = hostToExecutors.valuesIterator.next().iterator.next()
--- End diff --

I'm wondering whether it is worth killing an executor which has tasks running on it? After all, a task blacklisted on all (currently allocated) executors cannot be guaranteed to run on a newly allocated executor.
[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r216874181

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala ---
@@ -156,20 +156,24 @@ class ParquetSchemaPruningSuite
   }
   private def testSchemaPruning(testName: String)(testThunk: => Unit) {
-    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true") {
-      test(s"Spark vectorized reader - without partition data column - $testName") {
+    test(s"Spark vectorized reader - without partition data column - $testName") {
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "true") {
--- End diff --

Yea, the call of `test` only registers the test function; it is not actually invoked within `withSQLConf`. We shouldn't wrap `test` inside `withSQLConf`. Good catch.
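The registration-versus-execution pitfall viirya confirms can be illustrated generically. This is an analogy in plain Java, not ScalaTest: `test`, `withConf`, and `conf` are hypothetical stand-ins for ScalaTest's `test(...)`, `withSQLConf`, and a SQL conf. A config set only while a test body is *registered* is gone by the time the body actually *runs*:

```java
import java.util.ArrayList;
import java.util.List;

public class RegisterVsRunSketch {
    static String conf = "default";
    static final List<Runnable> registered = new ArrayList<>();

    // Analogue of ScalaTest's test(name) { body }: merely records the body
    // for later execution, it does not run it now.
    static void test(Runnable body) {
        registered.add(body);
    }

    // Analogue of withSQLConf: sets a conf only for the duration of the block.
    static void withConf(String value, Runnable block) {
        String saved = conf;
        conf = value;
        try {
            block.run();
        } finally {
            conf = saved;
        }
    }

    public static void main(String[] args) {
        // Buggy pattern from the diff: withConf wraps the *registration*,
        // not the *execution*, of the test body.
        withConf("vectorized=true", () ->
            test(() -> System.out.println("sees conf: " + conf)));

        // The body runs later, outside withConf, so it sees the default value.
        registered.forEach(Runnable::run);
    }
}
```

Reversing the nesting, so the conf-setting block sits inside the registered body, is exactly the fix the PR makes.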
[GitHub] spark pull request #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCode...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22358#discussion_r216873177

--- Diff: docs/sql-programming-guide.md ---
@@ -965,6 +965,8 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
 `parquet.compression` is specified in the table-specific options/properties, the precedence would be
 `compression`, `parquet.compression`, `spark.sql.parquet.compression.codec`. Acceptable values include:
 none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd.
+Note that `zstd` needs to install `ZStandardCodec` before Hadoop 2.9.0, `brotli` needs to install
+`brotliCodec`.
--- End diff --

@HyukjinKwon How about adding a link? Users may not know where to download it.
```
`brotliCodec` -> [`brotli-codec`](https://github.com/rdblue/brotli-codec)
```
[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22394 **[Test build #95970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95970/testReport)** for PR 22394 at commit [`c759aea`](https://github.com/apache/spark/commit/c759aeabc8b3fb3c426e432bff794deddef3e05e).
[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22394 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3030/ Test PASSed.
[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22394 Merged build finished. Test PASSed.
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22379 Merged build finished. Test FAILed.
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22379 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95965/ Test FAILed.
[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22379 **[Test build #95965 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95965/testReport)** for PR 22379 at commit [`2a0b65b`](https://github.com/apache/spark/commit/2a0b65b7774cfcfeab489795f980ed0e38d225ab).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22394 retest this please.
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22358 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95969/ Test PASSed.
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22358 Merged build finished. Test PASSed.
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22358 **[Test build #95969 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95969/testReport)** for PR 22358 at commit [`64aef6b`](https://github.com/apache/spark/commit/64aef6ba6a0829bf490c6014521731b92630d716).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22358 **[Test build #95969 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95969/testReport)** for PR 22358 at commit [`64aef6b`](https://github.com/apache/spark/commit/64aef6ba6a0829bf490c6014521731b92630d716).
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22358 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3029/ Test PASSed.
[GitHub] spark issue #22358: [SPARK-25366][SQL]Zstd and brotli CompressionCodec are n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22358 Merged build finished. Test PASSed.
[GitHub] spark issue #22348: [SPARK-25354][SQL] Reduce unneeded operation in nextKeyV...
Github user SongYadong commented on the issue: https://github.com/apache/spark/pull/22348 Sounds reasonable. I will close this PR. Thank you!
[GitHub] spark pull request #22348: [SPARK-25354][SQL] Reduce unneeded operation in n...
Github user SongYadong closed the pull request at: https://github.com/apache/spark/pull/22348
[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22388 **[Test build #95968 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95968/testReport)** for PR 22388 at commit [`031ad29`](https://github.com/apache/spark/commit/031ad29305326145510b8065f49ae51109e18653).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22388 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95968/ Test FAILed.
[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22388 Merged build finished. Test FAILed.
[GitHub] spark pull request #22375: [WIP][SPARK-25388][Test][SQL] Detect incorrect nu...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r216868343

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
@@ -223,8 +223,8 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa
       }
     } else {
       val lit = InternalRow(expected, expected)
-      val expectedRow =
-        UnsafeProjection.create(Array(expression.dataType, expression.dataType)).apply(lit)
+      val dtAsNullable = expression.dataType.asNullable
--- End diff --

Sure, I will add some comments from the description.
[GitHub] spark pull request #22375: [WIP][SPARK-25388][Test][SQL] Detect incorrect nu...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r216868299

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
@@ -223,8 +223,8 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks with PlanTestBa
       }
     } else {
       val lit = InternalRow(expected, expected)
-      val expectedRow =
-        UnsafeProjection.create(Array(expression.dataType, expression.dataType)).apply(lit)
+      val dtAsNullable = expression.dataType.asNullable
+      val expectedRow = UnsafeProjection.create(Array(dtAsNullable, dtAsNullable)).apply(lit)
--- End diff --

Thank you for your comment. Which kind of test do you think is preferable?

* One that successfully passes (we already have these UTs, since all UTs have passed)
* One that fails expectedly (in other words, adding a function that generates an incorrect result)
[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22388 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95967/ Test FAILed.
[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22388 **[Test build #95967 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95967/testReport)** for PR 22388 at commit [`234b67a`](https://github.com/apache/spark/commit/234b67a99f2575ec14fc395e8b1a44cc018721c4).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22388 Merged build finished. Test FAILed.
[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22388 **[Test build #95968 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95968/testReport)** for PR 22388 at commit [`031ad29`](https://github.com/apache/spark/commit/031ad29305326145510b8065f49ae51109e18653).