[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/22514

[SPARK-25271][SQL] Hive ctas commands should use data source if it is convertible

## What changes were proposed in this pull request?

There has been a [regression](https://github.com/apache/spark/pull/20521/files#r217254430) since 2.3.1: the Hive CTAS command only uses Hive SerDe to write data. Previously, the Hive CTAS command used the Parquet/ORC data source to write data when the table was convertible. As a result, the regression reported in this JIRA is that writing an empty map into Hive with CTAS hits a known Hive issue and throws an exception.

## How was this patch tested?

Added test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-25271-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22514.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #22514

commit 5debc6096ae6e505d3386fd7eb569d154f158d55
Author: Liang-Chi Hsieh
Date: 2018-09-12T10:33:53Z

Hive ctas commands should use data source format if it is convertible.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18544 Can you explain how we fix the problem?
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96389/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22163 **[Test build #96389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96389/testReport)** for PR 22163 at commit [`2dc94a2`](https://github.com/apache/spark/commit/2dc94a24ab06141768413dc2bf6f9c5e29ce7249). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22497: [SPARK-25487][SQL][TEST] Refactor PrimitiveArrayBenchmar...
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/22497 @kiszk @wangyum Thank you! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy.compar...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22508 LGTM, merging to master/2.4! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22316 **[Test build #96404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96404/testReport)** for PR 22316 at commit [`382640b`](https://github.com/apache/spark/commit/382640be9bb9739929daea0bceb3093836d7f78d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22471: [SPARK-25469][SQL][Performance] Eval methods of Concat, ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22471 Away from keyboard now, so I will do it when I'm back. Thanks!
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96401/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22458: [SPARK-25459] Add viewOriginalText back to CatalogTable
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22458 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96399/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22511 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96394/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22467 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96400/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsingNonempt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22515 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96402/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96397/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22467 **[Test build #96400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96400/testReport)** for PR 22467 at commit [`813d19c`](https://github.com/apache/spark/commit/813d19c63477b82a76bdd0d1da73cf3cb1d38564). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96403/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22316 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96404/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22458: [SPARK-25459] Add viewOriginalText back to CatalogTable
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22458 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22467 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22511 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22511 **[Test build #96394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96394/testReport)** for PR 22511 at commit [`aee82ab`](https://github.com/apache/spark/commit/aee82abe4cd9fbefa14fb280644276fe491bcf9a). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22513 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3332/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22494 **[Test build #96406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96406/testReport)** for PR 22494 at commit [`1ee9f02`](https://github.com/apache/spark/commit/1ee9f0208a3cb6de373e05366c19bf69967eecd8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22512 **[Test build #96405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96405/testReport)** for PR 22512 at commit [`39c5e92`](https://github.com/apache/spark/commit/39c5e92713b86f342e756591235f9cbe25126f90). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified// Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22512 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22455#discussion_r219402870

--- Diff: R/pkg/R/DataFrame.R ---
@@ -244,11 +245,15 @@ setMethod("showDF",
 #' @note show(SparkDataFrame) since 1.4.0
 setMethod("show", "SparkDataFrame",
           function(object) {
-            cols <- lapply(dtypes(object), function(l) {
-              paste(l, collapse = ":")
-            })
-            s <- paste(cols, collapse = ", ")
-            cat(paste(class(object), "[", s, "]\n", sep = ""))
+            if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) {
--- End diff --

Respecting `spark.sql.repl.eagerEval.maxNumRows` makes some sense, but instead of changing `showDF`, which has other uses beyond eagerEval, we could change the place where `show` calls `showDF` and pass the max rows there. See here: https://github.com/apache/spark/pull/22455/files#diff-508641a8bd6c6b59f3e77c80cdcfa6a9R249
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22514 retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22455: [SPARK-24572][SPARKR] "eager execution" for R shell, IDE
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22455 Let's also update the doc of `REPL_EAGER_EVAL_ENABLED` in `SQLConf`. After this patch, eager evaluation is no longer supported only in PySpark.
[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22305 @gatorsmile @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18544: [SPARK-21318][SQL]Improve exception message throw...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18544#discussion_r219413664

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/UDFSuite.scala ---
@@ -193,4 +193,29 @@ class UDFSuite
       }
     }
   }
+
+  test("SPARK-21318: The correct exception message should be thrown " +
+    "if a UDF/UDAF has already been registered") {
+    val UDAFName = "empty"
+    val UDAFClassName = classOf[org.apache.spark.sql.hive.execution.UDAFEmpty].getCanonicalName
+
+    withTempDatabase { dbName =>
--- End diff --

Why do we have to test it inside a database? Can't the default database work?
[GitHub] spark issue #22471: [SPARK-25469][SQL][Performance] Eval methods of Concat, ...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22471 @maropu Do you want to merge this as your first work as a committer? I think this can be merged into master/2.4 because this is a performance regression fix. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22467 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3328/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22467: [SPARK-25465][TEST] Refactor Parquet test suites in proj...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22467 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22494 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96392/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableS...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22509 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsingNonempt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22515 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3330/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsingNonempt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22515 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsingNonempt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22515 **[Test build #96402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96402/testReport)** for PR 22515 at commit [`0f32b01`](https://github.com/apache/spark/commit/0f32b0170fe6295bfef604b5a679f9391b5ec78f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22508 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3331/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22513 **[Test build #96403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96403/testReport)** for PR 22513 at commit [`1c3c0f6`](https://github.com/apache/spark/commit/1c3c0f692d38b361f35017df3e999f7838e28e48). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22497: [SPARK-25487][SQL][TEST] Refactor PrimitiveArrayBenchmar...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22497 I see. I will wait in other PRs. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...
Github user stanzhai commented on the issue:

https://github.com/apache/spark/pull/18544

@cloud-fan A user's Hive UDFs are registered in `externalCatalog`, not in `functionRegistry`. As a result, when an exception is encountered while loading a Hive UDF, a `NoSuchFunctionException` is thrown instead of the original exception. We should throw the original exception, so I fixed the issue by changing:

```
if (functionRegistry.functionExists(funcName)) {
  throw error
} else {
  ...
}
```

to:

```
if (super.functionExists(name)) {
  throw error
} else {
  ...
}
```

The following is the implementation of `super.functionExists`:

```
def functionExists(name: FunctionIdentifier): Boolean = {
  val db = formatDatabaseName(name.database.getOrElse(getCurrentDatabase))
  requireDbExists(db)
  functionRegistry.functionExists(name) || externalCatalog.functionExists(db, name.funcName)
}
```
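To illustrate the difference between the two existence checks described above, here is a minimal, dependency-free sketch. It does not use Spark's classes: `Catalogs`, `existsInRegistryOnly`, `existsAnywhere`, and `errorToThrow` are hypothetical names, two plain `Set[String]` values stand in for `functionRegistry` and `externalCatalog`, and `NoSuchElementException` stands in for Spark's `NoSuchFunctionException`. The assumption that the elided `else` branch ends in a "no such function" error is taken from the description in the comment, not from the actual source.

```scala
object FunctionLookupSketch {
  // Hypothetical stand-ins: `registry` models functionRegistry,
  // `external` models externalCatalog. Neither is Spark's real API.
  final case class Catalogs(registry: Set[String], external: Set[String]) {
    // Old check: consults only the in-memory registry.
    def existsInRegistryOnly(name: String): Boolean =
      registry.contains(name)

    // New check: also consults the external catalog, as super.functionExists does.
    def existsAnywhere(name: String): Boolean =
      registry.contains(name) || external.contains(name)
  }

  // Picks the error to surface after loading `name` failed with `cause`.
  // If the function is known to exist, the original cause is kept;
  // otherwise a "no such function" style error is reported (assumed behavior).
  def errorToThrow(exists: Boolean, name: String, cause: Throwable): Throwable =
    if (exists) cause
    else new NoSuchElementException(s"Undefined function: $name")
}
```

With a Hive UDF that lives only in the external set, the registry-only check reports the function as missing and the load failure is masked, while the combined check preserves the original cause.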
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22513 **[Test build #96403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96403/testReport)** for PR 22513 at commit [`1c3c0f6`](https://github.com/apache/spark/commit/1c3c0f692d38b361f35017df3e999f7838e28e48). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22513 **[Test build #96397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96397/testReport)** for PR 22513 at commit [`9288933`](https://github.com/apache/spark/commit/9288933b4a71e646e67f551dcfd80f9ff9a470da). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22316 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsingNonempt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22515 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22514 **[Test build #96401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96401/testReport)** for PR 22514 at commit [`5debc60`](https://github.com/apache/spark/commit/5debc6096ae6e505d3386fd7eb569d154f158d55). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22513 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22458: [SPARK-25459] Add viewOriginalText back to CatalogTable
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22458 **[Test build #96399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96399/testReport)** for PR 22458 at commit [`f3d3100`](https://github.com/apache/spark/commit/f3d3100399be442da9fd5e417aeefb9662903c49). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsingNonempt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22515 **[Test build #96402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96402/testReport)** for PR 22515 at commit [`0f32b01`](https://github.com/apache/spark/commit/0f32b0170fe6295bfef604b5a679f9391b5ec78f). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22316 **[Test build #96404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96404/testReport)** for PR 22316 at commit [`382640b`](https://github.com/apache/spark/commit/382640be9bb9739929daea0bceb3093836d7f78d). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22375#discussion_r219406170

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala ---
@@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper
     val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) }
     assert(e.getMessage.contains("some_variable"))
   }
+
+  test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") {
+    val e = intercept[RuntimeException] {
+      checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null))
--- End diff --

Yes, you're right, my suggestion doesn't work in a case like that, sorry. We would need to make the check recursive. But I think you got the idea of what I am proposing here. Thanks.
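A recursive nullability check of the kind mentioned above could look roughly like the following sketch. It is not Spark code: `DType`, `IntT`, `ArrayT`, `MapT`, and `respectsNullability` are hypothetical names that only loosely mirror the shape of Spark's nested data types, and plain Scala `Seq` and `Map` values stand in for evaluated results.

```scala
object NullabilityCheckSketch {
  // Hypothetical, simplified type descriptors (not Spark's DataType classes).
  sealed trait DType
  case object IntT extends DType
  final case class ArrayT(element: DType, containsNull: Boolean) extends DType
  final case class MapT(key: DType, value: DType, valueContainsNull: Boolean) extends DType

  // Returns true when `value` holds no null in a position where `dt` forbids it.
  // The check descends into array elements and map keys/values recursively.
  def respectsNullability(value: Any, dt: DType): Boolean = (value, dt) match {
    case (s: Seq[_], ArrayT(et, containsNull)) =>
      s.forall(e => if (e == null) containsNull else respectsNullability(e, et))
    case (m: Map[_, _], MapT(kt, vt, valueContainsNull)) =>
      m.forall { case (k, v) =>
        k != null && respectsNullability(k, kt) &&
          (if (v == null) valueContainsNull else respectsNullability(v, vt))
      }
    case _ => true // leaf values carry no nested nullability in this sketch
  }
}
```

For the map from the test above, `respectsNullability(Map(3 -> 7, 6 -> null), MapT(IntT, IntT, valueContainsNull = false))` is false, and the same violation is also caught when that map is nested inside an array.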
[GitHub] spark issue #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22509 **[Test build #96391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96391/testReport)** for PR 22509 at commit [`8ad50d5`](https://github.com/apache/spark/commit/8ad50d5433ac5a0f888fb5909893317002d5aa51). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy.compar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22508 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy.compar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22508 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96390/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22509 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22509 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96391/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22331: [SPARK-25331][SS] Make FileStreamSink ignore part...
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/22331#discussion_r219399313 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StagingFileCommitProtocol.scala --- @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.streaming + +import org.apache.hadoop.fs.{FileAlreadyExistsException, FileContext, Path} +import org.apache.hadoop.mapreduce.{JobContext, TaskAttemptContext} + +import org.apache.spark.internal.Logging +import org.apache.spark.internal.io.FileCommitProtocol +import org.apache.spark.internal.io.FileCommitProtocol.TaskCommitMessage + +class StagingFileCommitProtocol(jobId: String, path: String) + extends FileCommitProtocol with Serializable with Logging + with ManifestCommitProtocol { + private var stagingDir: Option[Path] = None --- End diff -- It looks like you're using `Option` but always calling `.get` without any check. In `setupTask` that is fine, since the assignment happens there, but in `newTaskTempFile` it would be better to guard with `require`, which fails fast and lets `.get` always succeed afterwards.
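The `require` guard suggested above could look roughly like the following. This is only a minimal sketch, not the code from the pull request: `StagingSketch` is a hypothetical stand-in class, it uses `java.nio.file.Path` instead of Hadoop's `Path` so that it runs without extra dependencies, and the `_staging` directory layout is an assumption made for illustration.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical stand-in for the class under review; names and layout are assumptions.
class StagingSketch(path: String) {
  // Unset until setupTask runs, mirroring the Option field in the reviewed diff.
  private var stagingDir: Option[Path] = None

  def setupTask(taskId: String): Unit = {
    // The assignment happens here, so a .get right after it would be safe.
    stagingDir = Some(Paths.get(path, "_staging", taskId))
  }

  def newTaskTempFile(fileName: String): String = {
    // Fail fast with a clear message instead of a bare NoSuchElementException from .get.
    require(stagingDir.isDefined, "setupTask must be called before newTaskTempFile")
    stagingDir.get.resolve(fileName).toString
  }
}
```

With this shape, calling `newTaskTempFile` before `setupTask` raises an `IllegalArgumentException` carrying the message above, and every later `.get` in the method is known to succeed.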
[GitHub] spark issue #22512: [SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures w...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22512 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22511 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22511 cc @jiangxb1987 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...
Github user seancxmao commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219403696 --- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala --- @@ -462,6 +464,9 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo .option("lowerBound", "2018-07-04 03:30:00.0") .option("upperBound", "2018-07-27 14:11:05.0") .option("numPartitions", 2) + .option("oracle.jdbc.mapDateToTimestamp", "false") --- End diff -- Yes, we need this. Otherwise, Spark will read column `d` values as Catalyst type timestamp, which will fail the test. https://user-images.githubusercontent.com/12194089/45865915-9e730800-bdb1-11e8-9a42-a1394c601166.png --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22442: [SPARK-25447][SQL] Support JSON options by schema...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22442#discussion_r219403620 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3611,6 +3611,20 @@ object functions { */ def schema_of_json(e: Column): Column = withExpr(new SchemaOfJson(e.expr)) + /** + * Parses a column containing a JSON string and infers its schema using options. + * + * @param e a string column containing JSON data. + * @param options JSON datasource options that control JSON parsing and type inference. --- End diff -- As far as I can see, we don't fail. A simple example: if `multiLine` is enabled, `lineSep` is ignored. There are other examples. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22275 got it. so the size of each batch could grow. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22511 this seems like a big change, will we hit perf regression? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18544: [SPARK-21318][SQL]Improve exception message throw...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18544#discussion_r219412088 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala --- @@ -1440,6 +1441,8 @@ abstract class SessionCatalogSuite extends AnalysisTest { } assert(cause.getMessage.contains("Undefined function: 'undefined_fn'")) +// SPARK-21318: the error message should contains the current database name --- End diff -- what's the full error message? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22513 Please also explain in the PR description which module (core or sql?) these benchmark classes should be in. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Ben...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22513#discussion_r219414758 --- Diff: core/src/main/scala/org/apache/spark/sql/execution/benchmark/BenchmarkBase.scala --- @@ -15,7 +15,7 @@ * limitations under the License. */ -package org.apache.spark.util +package org.apache.spark.sql.execution.benchmark --- End diff -- ditto --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22497: [SPARK-25487][SQL][TEST] Refactor PrimitiveArrayBenchmar...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22497 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22494 **[Test build #96392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96392/testReport)** for PR 22494 at commit [`1ee9f02`](https://github.com/apache/spark/commit/1ee9f0208a3cb6de373e05366c19bf69967eecd8). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22497: [SPARK-25487][SQL][TEST] Refactor PrimitiveArrayBenchmar...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22497 Thanks! merging to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22461: [SPARK-25453][SQL][TEST] OracleIntegrationSuite I...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22461#discussion_r219392779 --- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala --- @@ -462,6 +464,9 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo .option("lowerBound", "2018-07-04 03:30:00.0") .option("upperBound", "2018-07-27 14:11:05.0") .option("numPartitions", 2) + .option("oracle.jdbc.mapDateToTimestamp", "false") --- End diff -- Do we need this line? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema w...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22509 lgtm, merging to master/2.4! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22497: [SPARK-25487][SQL][TEST] Refactor PrimitiveArrayBenchmar...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22497 Congratulations, @kiszk --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22514 **[Test build #96401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96401/testReport)** for PR 22514 at commit [`5debc60`](https://github.com/apache/spark/commit/5debc6096ae6e505d3386fd7eb569d154f158d55). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22227 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22163: [SPARK-25166][CORE]Reduce the number of write operations...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22163 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22513 > KryoBenchmark is in core, and UnsafeProjectionBenchmark, HashByteArrayBenchmark and HashBenchmark are in catalyst. If we move the benchmark base class to sql, benchmarks mentioned above would not be able to inherit from the benchmark base class. What do you think? @wangyum The cases you mentioned are currently using the `org.apache.spark.sql.execution.benchmark.BenchmarkBase` (the old one), so it seems fine. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3329/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Ben...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22513#discussion_r219397646 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala --- @@ -27,7 +27,7 @@ import org.apache.spark.sql.functions.monotonically_increasing_id import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.internal.SQLConf.ParquetOutputTimestampType import org.apache.spark.sql.types.{ByteType, Decimal, DecimalType, TimestampType} -import org.apache.spark.util.{Benchmark, BenchmarkBase => FileBenchmarkBase, Utils} +import org.apache.spark.util.Utils /** * Benchmark to measure read performance with Filter pushdown. --- End diff -- Thanks, I have updated the doc. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r219397495 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelperSuite.scala --- @@ -35,6 +36,13 @@ class ExpressionEvalHelperSuite extends SparkFunSuite with ExpressionEvalHelper val e = intercept[RuntimeException] { checkEvaluation(BadCodegenExpression(), 10) } assert(e.getMessage.contains("some_variable")) } + + test("SPARK-25388: checkEvaluation should fail if nullable in DataType is incorrect") { +val e = intercept[RuntimeException] { + checkEvaluation(MapIncorrectDataTypeExpression(), Map(3 -> 7, 6 -> null)) --- End diff -- Your first point is correct, since this patch addresses only the codegen-on case. We can add more code to address the codegen-off case. Regarding your second point, have we ever distinguished a wrong output from a badly written UT when we detect a difference between `expression` and `expected`? I think that the distinction is nice to have, but not mandatory. I have one question about your approach: ``` assert(containsNull(expected) && isNullable(expression.dataType)) ``` Since the two conditions above evaluate `expected` and `expression` independently, how does this work for the following case? I think that the assertion would pass: ``` expression: dataType = StructType(ArrayType(IntegerType, false), ArrayType(IntegerType, true)) Struct(Array(0, null), Array(1, 0)) expected: Struct(Array(0, 0), Array(1, null)) ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
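The counterexample in the comment above can be replayed with a small stand-alone model. This is a sketch under stated assumptions, not Catalyst code: `DType`, `IntT`, `ArrayT` and `StructT` are hypothetical simplified types, struct and array values are both modeled as `Seq`, and `conforms` is one possible recursive check that walks a value and its declared type together.

```scala
// Hypothetical, simplified type model; these are not Spark's DataType classes.
sealed trait DType
case object IntT extends DType
final case class ArrayT(element: DType, containsNull: Boolean) extends DType
final case class StructT(fields: Seq[DType]) extends DType

object NullabilitySketch {
  // Flat check on the value alone: is there a null anywhere inside it?
  def containsNull(v: Any): Boolean = v match {
    case null      => true
    case s: Seq[_] => s.exists(containsNull)
    case _         => false
  }

  // Flat check on the type alone: does any nested array allow nulls?
  def isNullable(t: DType): Boolean = t match {
    case ArrayT(e, cn) => cn || isNullable(e)
    case StructT(fs)   => fs.exists(isNullable)
    case IntT          => false
  }

  // Recursive check: a null is accepted only inside an array declared with containsNull = true.
  def conforms(v: Any, t: DType): Boolean = (v, t) match {
    case (null, _)                   => false // top-level nulls are rejected in this sketch
    case (xs: Seq[_], ArrayT(e, cn)) => xs.forall(x => if (x == null) cn else conforms(x, e))
    case (xs: Seq[_], StructT(fs))   =>
      xs.length == fs.length && xs.zip(fs).forall { case (x, f) => conforms(x, f) }
    case (_: Int, IntT)              => true
    case _                           => false
  }

  // The case from the review comment.
  val dataType: DType = StructT(Seq(ArrayT(IntT, containsNull = false), ArrayT(IntT, containsNull = true)))
  val actual: Seq[Any] = Seq(Seq(0, null), Seq(1, 0))   // null sits in the non-nullable array
  val expected: Seq[Any] = Seq(Seq(0, 0), Seq(1, null)) // null sits in the nullable array
}
```

In this model the flat condition `containsNull(expected) && isNullable(dataType)` holds, so the assertion passes even though `actual` puts a null into the array declared with `containsNull = false`. The paired walk reports the mismatch: `conforms(actual, dataType)` is false, while `conforms(expected, dataType)` is true.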
[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22494 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22455: [SPARK-24572][SPARKR] "eager execution" for R she...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22455#discussion_r219404319 --- Diff: R/pkg/R/DataFrame.R --- @@ -244,11 +245,15 @@ setMethod("showDF", #' @note show(SparkDataFrame) since 1.4.0 setMethod("show", "SparkDataFrame", function(object) { -cols <- lapply(dtypes(object), function(l) { - paste(l, collapse = ":") -}) -s <- paste(cols, collapse = ", ") -cat(paste(class(object), "[", s, "]\n", sep = "")) +if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", "false")[[1]], "true")) { --- End diff -- I see. The document generated looks like https://spark.apache.org/docs/latest/api/R/index.html. Then rewriting the description is good and clear. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22500: [SPARK-25488][TEST] Refactor MiscBenchmark to use...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22500#discussion_r219404392
--- Diff: sql/core/benchmarks/MiscBenchmark-results.txt ---
@@ -0,0 +1,132 @@
+
+filter & aggregate without group
+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+range/filter/sum:                          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+range/filter/sum wholestage off                36618 / 41080         57.3          17.5       1.0X
+range/filter/sum wholestage on                   2495 / 2609        840.4           1.2      14.7X
+
+
+
+range/limit/sum
+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+range/limit/sum:                           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+range/limit/sum wholestage off                     117 / 121       4477.9           0.2       1.0X
+range/limit/sum wholestage on                      178 / 187       2938.1           0.3       0.7X
+
+
+
+sample
+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+sample with replacement:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+sample with replacement wholestage off           9142 / 9182         14.3          69.8       1.0X
+sample with replacement wholestage on            5926 / 6107         22.1          45.2       1.5X
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+sample without replacement:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+sample without replacement wholestage off        1834 / 1837         71.5          14.0       1.0X
+sample without replacement wholestage on           784 / 803        167.2           6.0       2.3X
+
+
+
+collect
+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+collect:                                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+collect 1 million                                  186 / 215          5.6         177.5       1.0X
+collect 2 millions                                 361 / 393          2.9         344.2       0.5X
+collect 4 millions                                884 / 1053          1.2         843.4       0.2X
+
+
+
+collect limit
+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+
+collect limit:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+collect limit 1 million                            206 / 225          5.1         196.6       1.0X
+collect limit 2 millions                           407 / 419          2.6         387.8       0.5X
+
+
+
+generate explode
+
[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22494 > If your argument is, picking a precise precision for literal is an individual feature and not related to #20023, I'm OK to create a new config for it. Yes, this is, I think, a better option. Indeed, what I meant was this: let's imagine I am a Spark 2.3.0 user and I have `DECIMAL_OPERATIONS_ALLOW_PREC_LOSS` set to `false`. Before this patch, I can successfully run `select 1234567891 / (1.1 * 2 * 2 * 2 * 2)`. After this patch, this query would return `null` instead, as an overflow would happen. So this patch is "correcting" a regression from 2.2, but it is introducing another one relative to 2.3.0-2.3.1. Using another config is therefore a better workaround IMO. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
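The overflow described above can be traced with a little type arithmetic. The following is a rough sketch only: the `multiply` and `divide` formulas are my recollection of the result-type rules Spark applies when precision loss is not allowed, and the literal types (`Dec(1, 0)` for a precisely typed `2`, `Dec(10, 0)` for an integer literal treated as a plain `Int`) are assumptions. None of it is taken from the patch or verified against it, and `Dec`, `bounded` and `queryType` are hypothetical names.

```scala
object DecimalTypeSketch {
  val MaxPrecision = 38
  val MaxScale = 38

  // A decimal type as (precision, scale); integerDigits is what is left in front of the point.
  final case class Dec(precision: Int, scale: Int) {
    def integerDigits: Int = precision - scale
  }

  // Clamp both components to the maximum, without adjusting the scale any further.
  def bounded(p: Int, s: Int): Dec = Dec(math.min(p, MaxPrecision), math.min(s, MaxScale))

  // Assumed rule for multiplication: precision p1 + p2 + 1, scale s1 + s2.
  def multiply(a: Dec, b: Dec): Dec = bounded(a.precision + b.precision + 1, a.scale + b.scale)

  // Assumed rule for division when precision loss is disallowed.
  def divide(a: Dec, b: Dec): Dec = {
    var intDig = math.min(MaxScale, a.precision - a.scale + b.scale)
    var decDig = math.min(MaxScale, math.max(6, a.scale + b.precision + 1))
    val diff = (intDig + decDig) - MaxScale
    if (diff > 0) {
      decDig -= diff / 2 + 1
      intDig = MaxScale - decDig
    }
    bounded(intDig + decDig, decDig)
  }

  // Result type of 1234567891 / (1.1 * 2 * 2 * 2 * 2) for a given type of the literal 2.
  def queryType(two: Dec): Dec = {
    val denominator = (1 to 4).foldLeft(Dec(2, 1))((acc, _) => multiply(acc, two))
    divide(Dec(10, 0), denominator)
  }

  val withPreciseLiterals: Dec = queryType(Dec(1, 0)) // literal 2 typed as decimal(1, 0)
  val withIntLiterals: Dec = queryType(Dec(10, 0))    // literal 2 typed as decimal(10, 0)
}
```

Under these assumptions the precise literal types give `decimal(22, 11)`, which leaves 11 integer digits, while the wider literal types give `decimal(38, 32)`, which leaves only 6. The quotient is roughly 70 million and needs 8 integer digits, which would be consistent with the query succeeding in the first case and overflowing to `null` in the second.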
[GitHub] spark issue #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsingNonempt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22515 **[Test build #96412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96412/testReport)** for PR 22515 at commit [`0f32b01`](https://github.com/apache/spark/commit/0f32b0170fe6295bfef604b5a679f9391b5ec78f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsingNonempt...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22515 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Ben...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22513#discussion_r219414641 --- Diff: core/src/main/scala/org/apache/spark/sql/execution/benchmark/Benchmark.scala --- @@ -15,7 +15,7 @@ * limitations under the License. */ -package org.apache.spark.util +package org.apache.spark.sql.execution.benchmark --- End diff -- this class is in core and we should not have `sql` in the package name --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22411 Gentle ping @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22497: [SPARK-25487][SQL][TEST] Refactor PrimitiveArrayB...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22497 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22515: [SPARK-19724][SQL] allowCreatingManagedTableUsing...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22515 [SPARK-19724][SQL] allowCreatingManagedTableUsingNonemptyLocation should have legacy prefix One more legacy config to go ... You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark allowCreatingManagedTableUsingNonemptyLocation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22515.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22515 commit f7c372e6f803c86e189e984fa6c1dd81f84454e9 Author: Reynold Xin Date: 2018-09-21T02:10:10Z [SPARK-19724][SQL] allowCreatingManagedTableUsingNonemptyLocation should have legacy prefix --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org