[GitHub] AmplabJenkins removed a comment on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
AmplabJenkins removed a comment on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions URL: https://github.com/apache/spark/pull/23398#issuecomment-450544944 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100544/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
AmplabJenkins removed a comment on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions URL: https://github.com/apache/spark/pull/23398#issuecomment-450544943 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
AmplabJenkins commented on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions URL: https://github.com/apache/spark/pull/23398#issuecomment-450544944 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100544/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
AmplabJenkins commented on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions URL: https://github.com/apache/spark/pull/23398#issuecomment-450544943 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA removed a comment on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
SparkQA removed a comment on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions URL: https://github.com/apache/spark/pull/23398#issuecomment-450537610 **[Test build #100544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100544/testReport)** for PR 23398 at commit [`689a4d2`](https://github.com/apache/spark/commit/689a4d203dc2dec68c6b97c7d6529e90b094d31e). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
SparkQA commented on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions URL: https://github.com/apache/spark/pull/23398#issuecomment-450544874 **[Test build #100544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100544/testReport)** for PR 23398 at commit [`689a4d2`](https://github.com/apache/spark/commit/689a4d203dc2dec68c6b97c7d6529e90b094d31e). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class SimpleFunctionRegistry extends FunctionRegistry with Logging ` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] cloud-fan commented on issue #23397: [SPARK-26495][SQL] Simplify the SelectedField extractor.
cloud-fan commented on issue #23397: [SPARK-26495][SQL] Simplify the SelectedField extractor. URL: https://github.com/apache/spark/pull/23397#issuecomment-450543864 looks much cleaner with this recursive version. +1 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] twdsilva commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.makeGetter does not handle ByteType
twdsilva commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.makeGetter does not handle ByteType URL: https://github.com/apache/spark/pull/23400#issuecomment-450543202 @HyukjinKwon I ran the docker integration tests and they all passed ``` Run completed in 15 minutes, 45 seconds. Total number of tests run: 21 Suites: completed 5, aborted 0 Tests: succeeded 21, failed 0, canceled 0, ignored 5, pending 0 All tests passed. 18/12/29 22:59:25 INFO ShutdownHookManager: Shutdown hook called 18/12/29 22:59:25 INFO ShutdownHookManager: Deleting directory /Users/tdsilva/spark-twdsilva/external/docker-integration-tests/target/tmp/spark-0e1546d2-c0ca-41f1-a063-802348dee0fb [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 16:46 min [INFO] Finished at: 2018-12-29T22:59:25-08:00 [INFO] ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
AmplabJenkins removed a comment on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled URL: https://github.com/apache/spark/pull/23411#issuecomment-450542026 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100545/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
AmplabJenkins removed a comment on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled URL: https://github.com/apache/spark/pull/23411#issuecomment-450542024 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
AmplabJenkins commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled URL: https://github.com/apache/spark/pull/23411#issuecomment-450542026 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100545/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
SparkQA commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled URL: https://github.com/apache/spark/pull/23411#issuecomment-450542005 **[Test build #100545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100545/testReport)** for PR 23411 at commit [`c9a94f4`](https://github.com/apache/spark/commit/c9a94f4399f7b7896ef5f9fcb9ce6497c91e39fc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
AmplabJenkins commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled URL: https://github.com/apache/spark/pull/23411#issuecomment-450542024 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA removed a comment on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
SparkQA removed a comment on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled URL: https://github.com/apache/spark/pull/23411#issuecomment-450538226 **[Test build #100545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100545/testReport)** for PR 23411 at commit [`c9a94f4`](https://github.com/apache/spark/commit/c9a94f4399f7b7896ef5f9fcb9ce6497c91e39fc). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450541273 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450541274 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6485/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450541274 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6485/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450541273 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
SparkQA commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450541237 **[Test build #100550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100550/testReport)** for PR 23412 at commit [`4af2b05`](https://github.com/apache/spark/commit/4af2b05e7b3c2bdd591e448eb49b8c6a1e3dd5a9). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450540760 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
AmplabJenkins removed a comment on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450540762 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6484/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450540760 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
AmplabJenkins commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450540762 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6484/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] cloud-fan commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
cloud-fan commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#issuecomment-450540756 > I can only make within pivot. Can you make a try first? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
SparkQA commented on issue #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#issuecomment-450540738 **[Test build #100549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100549/testReport)** for PR 23388 at commit [`a12cf96`](https://github.com/apache/spark/commit/a12cf96ea2004586579110488789491129826f38). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
HyukjinKwon commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled URL: https://github.com/apache/spark/pull/23411#issuecomment-450540546 The time parser is kind of a breaking change but to me i'm fine with removing it considering thst we go for 3.0. It makes the codes difficult to maintain and blocks some features like date type inference. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon edited a comment on issue #23400: [SPARK-26499] [SQL] JdbcUtils.makeGetter does not handle ByteType
HyukjinKwon edited a comment on issue #23400: [SPARK-26499] [SQL] JdbcUtils.makeGetter does not handle ByteType URL: https://github.com/apache/spark/pull/23400#issuecomment-450540435 Looks good. For completeness, the integration test should be a good to do. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.makeGetter does not handle ByteType
HyukjinKwon commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.makeGetter does not handle ByteType URL: https://github.com/apache/spark/pull/23400#issuecomment-450540435 Looks good. For completeness, the integration test should be a good to go. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
KeiichiHirobe commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore URL: https://github.com/apache/spark/pull/23288#discussion_r244524420 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -559,6 +559,25 @@ case class DataSource( } globPath }.toSeq + +if (checkFilesExist) { Review comment: I have already removed that check at https://github.com/apache/spark/pull/23288/commits/239cfa4792b6bebd386e4a962cdf4b0e9b2471aa#diff-7a6cb188d2ae31eb3347b5629a679cecR563 Or are you refering to `checkFilesExist` at line 557 and suggesting removing argument `checkFilesExist` ? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#discussion_r244524354 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala ## @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions.{Alias, And, ArrayTransform, CreateArray, CreateMap, CreateNamedStruct, CreateNamedStructUnsafe, CreateStruct, EqualTo, ExpectsInputTypes, Expression, GetStructField, LambdaFunction, NamedLambdaVariable, UnaryExpression} +import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode} +import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Subquery, Window} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types._ + +/** + * We need to take care of special floating numbers (NaN and -0.0) in several places: + * 1. When compare values, different NaNs should be treated as same, `-0.0` and `0.0` should be + * treated as same. + * 2. In GROUP BY, different NaNs should belong to the same group, -0.0 and 0.0 should belong + * to the same group. + * 3. In join keys, different NaNs should be treated as same, `-0.0` and `0.0` should be + * treated as same. + * 4. In window partition keys, different NaNs should be treated as same, `-0.0` and `0.0` + * should be treated as same. + * + * Case 1 is fine, as we handle NaN and -0.0 well during comparison. For complex types, we + * recursively compare the fields/elements, so it's also fine. + * + * Case 2, 3 and 4 are problematic, as they compare `UnsafeRow` binary directly, and different + * NaNs have different binary representation, and the same thing happens for -0.0 and 0.0. + * + * This rule normalizes NaN and -0.0 in Window partition keys, Join keys and Aggregate grouping + * expressions. + */ +object NormalizeFloatingNumbers extends Rule[LogicalPlan] { Review comment: The major reason is we create `Join`s during optimizaiton (for subquery), and I'm also worried about join reorder may break it. I'll add comment for it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#discussion_r244524335 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala ## @@ -0,0 +1,179 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions.{Alias, And, ArrayTransform, CreateArray, CreateMap, CreateNamedStruct, CreateNamedStructUnsafe, CreateStruct, EqualTo, ExpectsInputTypes, Expression, GetStructField, LambdaFunction, NamedLambdaVariable, UnaryExpression} +import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, ExprCode} +import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Subquery, Window} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types._ + +/** + * We need to take care of special floating numbers (NaN and -0.0) in several places: + * 1. When compare values, different NaNs should be treated as same, `-0.0` and `0.0` should be + * treated as same. + * 2. In GROUP BY, different NaNs should belong to the same group, -0.0 and 0.0 should belong + * to the same group. + * 3. In join keys, different NaNs should be treated as same, `-0.0` and `0.0` should be + * treated as same. + * 4. In window partition keys, different NaNs should be treated as same, `-0.0` and `0.0` + * should be treated as same. + * + * Case 1 is fine, as we handle NaN and -0.0 well during comparison. For complex types, we + * recursively compare the fields/elements, so it's also fine. + * + * Case 2, 3 and 4 are problematic, as they compare `UnsafeRow` binary directly, and different + * NaNs have different binary representation, and the same thing happens for -0.0 and 0.0. + * + * This rule normalizes NaN and -0.0 in Window partition keys, Join keys and Aggregate grouping + * expressions. + */ +object NormalizeFloatingNumbers extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = plan match { +// A subquery will be rewritten into join later, and will go through this rule +// eventually. Here we skip subquery, as we only need to run this rule once. +case _: Subquery => plan + +case _ => plan transform { + case w: Window if w.partitionSpec.exists(p => needNormalize(p.dataType)) => +w.copy(partitionSpec = w.partitionSpec.map(normalize)) Review comment: assume the query is `select a, a + sum(a) over (partition by a) ...`. Since the project list is evaluated for each input row, I think the `a` in the project list should retain the different of -0.0. Thus I think only `partitionSpec` needs to be normalized. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
cloud-fan commented on issue #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource URL: https://github.com/apache/spark/pull/23325#issuecomment-450539611 if it's only for troubleshooting, I guess users can do `if(IsNull(FromJson(col)), col, FromJson(col))`. My major concern is, `PERMISSIVE` mode doesn't work well in `from_json`, because we can return map/array. Maybe another choice is, if `from_jsom` returns array/map, we should forbid `PERMISSIVE` mode. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
SparkQA commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450539414 **[Test build #100548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100548/testReport)** for PR 23400 at commit [`ae34533`](https://github.com/apache/spark/commit/ae34533c6f85d1441bf0adc5a6f2e0db1a557587). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
AmplabJenkins removed a comment on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450539402 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
AmplabJenkins commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450539404 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6483/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] twdsilva commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
twdsilva commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#discussion_r244524192 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ## @@ -239,7 +239,7 @@ object JdbcUtils extends Logging { case java.sql.Types.TIMESTAMP => TimestampType case java.sql.Types.TIMESTAMP_WITH_TIMEZONE => null - case java.sql.Types.TINYINT => IntegerType + case java.sql.Types.TINYINT => ByteType Review comment: @HyukjinKwon I have updated the PR, please take a look. @wangyum Sure, I will run the docker integration tests and post the results. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
AmplabJenkins removed a comment on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450539404 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6483/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
AmplabJenkins commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450539402 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
AmplabJenkins removed a comment on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450539302 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100547/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA removed a comment on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
SparkQA removed a comment on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450539253 **[Test build #100547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100547/testReport)** for PR 23400 at commit [`c202053`](https://github.com/apache/spark/commit/c202053a8297872faf70c17ebe38ef49fdd913ac). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
AmplabJenkins removed a comment on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450539301 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
AmplabJenkins commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450539302 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100547/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
SparkQA commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450539299 **[Test build #100547 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100547/testReport)** for PR 23400 at commit [`c202053`](https://github.com/apache/spark/commit/c202053a8297872faf70c17ebe38ef49fdd913ac). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
AmplabJenkins commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450539301 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
SparkQA commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450539253 **[Test build #100547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100547/testReport)** for PR 23400 at commit [`c202053`](https://github.com/apache/spark/commit/c202053a8297872faf70c17ebe38ef49fdd913ac). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] wangyum commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
wangyum commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#discussion_r244524099 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ## @@ -239,7 +239,7 @@ object JdbcUtils extends Logging { case java.sql.Types.TIMESTAMP => TimestampType case java.sql.Types.TIMESTAMP_WITH_TIMEZONE => null - case java.sql.Types.TINYINT => IntegerType + case java.sql.Types.TINYINT => ByteType Review comment: @twdsilva Could you run [docker-integration-tests]( https://github.com/apache/spark/tree/master/external/docker-integration-tests) manually. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450538754 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100546/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450538753 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450538753 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
SparkQA commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450538752 **[Test build #100546 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100546/testReport)** for PR 23412 at commit [`5e430ee`](https://github.com/apache/spark/commit/5e430eefa9e88e16dd3782f41f0f829e27d8023f). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
SparkQA removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450538537 **[Test build #100546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100546/testReport)** for PR 23412 at commit [`5e430ee`](https://github.com/apache/spark/commit/5e430eefa9e88e16dd3782f41f0f829e27d8023f). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450538754 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100546/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450538548 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6482/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450538548 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6482/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450538546 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
SparkQA commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450538537 **[Test build #100546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100546/testReport)** for PR 23412 at commit [`5e430ee`](https://github.com/apache/spark/commit/5e430eefa9e88e16dd3782f41f0f829e27d8023f). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-450538546 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] kiszk opened a new pull request #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
kiszk opened a new pull request #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412 ## What changes were proposed in this pull request? The PR makes hardcoded `spark.unsafe` configs to use ConfigEntry and put them in the `config` package. ## How was this patch tested? Existing UTs This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore
srowen commented on a change in pull request #23288: [SPARK-26339][SQL]Throws better exception when reading files that start with underscore URL: https://github.com/apache/spark/pull/23288#discussion_r244523845 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ## @@ -559,6 +559,25 @@ case class DataSource( } globPath }.toSeq + +if (checkFilesExist) { Review comment: What about removing that check entirely? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
AmplabJenkins removed a comment on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled URL: https://github.com/apache/spark/pull/23411#issuecomment-450538251 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6481/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
AmplabJenkins removed a comment on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled URL: https://github.com/apache/spark/pull/23411#issuecomment-450538249 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
AmplabJenkins commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled URL: https://github.com/apache/spark/pull/23411#issuecomment-450538251 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6481/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
SparkQA commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled URL: https://github.com/apache/spark/pull/23411#issuecomment-450538226 **[Test build #100545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100545/testReport)** for PR 23411 at commit [`c9a94f4`](https://github.com/apache/spark/commit/c9a94f4399f7b7896ef5f9fcb9ce6497c91e39fc). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
AmplabJenkins commented on issue #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled URL: https://github.com/apache/spark/pull/23411#issuecomment-450538249 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
HyukjinKwon commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#discussion_r244523793 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ## @@ -239,7 +239,7 @@ object JdbcUtils extends Logging { case java.sql.Types.TIMESTAMP => TimestampType case java.sql.Types.TIMESTAMP_WITH_TIMEZONE => null - case java.sql.Types.TINYINT => IntegerType + case java.sql.Types.TINYINT => ByteType Review comment: In that case, I think we don;t need to update migration guide. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
HyukjinKwon commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#discussion_r244523789 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ## @@ -239,7 +239,7 @@ object JdbcUtils extends Logging { case java.sql.Types.TIMESTAMP => TimestampType case java.sql.Types.TIMESTAMP_WITH_TIMEZONE => null - case java.sql.Types.TINYINT => IntegerType + case java.sql.Types.TINYINT => ByteType Review comment: SGTM This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] srowen opened a new pull request #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled
srowen opened a new pull request #23411: [SPARK-26503][CORE] Get rid of spark.sql.legacy.timeParser.enabled URL: https://github.com/apache/spark/pull/23411 ## What changes were proposed in this pull request? Per discussion in https://github.com/apache/spark/pull/23391#discussion_r244414750 this proposes to just remove the old pre-Spark-3 time parsing behavior. ## How was this patch tested? Existing tests. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions URL: https://github.com/apache/spark/pull/23398#discussion_r244523719 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala ## @@ -99,9 +99,10 @@ object StaticSQLConf { .createWithDefault(false) val SPARK_SESSION_EXTENSIONS = buildStaticConf("spark.sql.extensions") -.doc("Name of the class used to configure Spark Session extensions. The class should " + +.doc("List of the class names used to configure Spark Session extensions. The classes should " + Review comment: I updated the documentation as suggested with respect to the ordering. I think the comment about "listeners" is related to code that is close in proximity to the code for this pull request, but it is a different configuration item. So I didn't update the `spark.sql.queryExecutionListeners` documentation. If you think that documentation should be updated, let me know if I should include it as a part of this pull request or as a separate pull request. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
jamisonbennett commented on a change in pull request #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions URL: https://github.com/apache/spark/pull/23398#discussion_r244523670 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala ## @@ -114,6 +124,25 @@ class SparkSessionExtensionSuite extends SparkFunSuite { stop(session) } } + + test("use multiple custom class for extensions") { +val session = SparkSession.builder() + .master("local[1]") + .config("spark.sql.extensions", Seq( +classOf[MyExtensions].getCanonicalName, +classOf[MyExtensions2].getCanonicalName).mkString(",")) + .getOrCreate() +try { + assert(session.sessionState.planner.strategies.contains(MySparkStrategy(session))) + assert(session.sessionState.analyzer.extendedResolutionRules.contains(MyRule(session))) + assert(session.sessionState.functionRegistry +.lookupFunction(MyExtensions.myFunction._1).isDefined) + assert(session.sessionState.functionRegistry +.lookupFunction(MyExtensions2.myFunction._1).isDefined) Review comment: Can we have a test case for duplicated extension names? Done Can we have a negative test case for function name conflicts? MyExtension2.myFunction and MyExtension3.myFunction? Done I added documentation for the behavior. I added a warning message if a registered function is replaced. I added a test case for the ordering. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
AmplabJenkins removed a comment on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions URL: https://github.com/apache/spark/pull/23398#issuecomment-450537626 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6480/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
AmplabJenkins commented on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions URL: https://github.com/apache/spark/pull/23398#issuecomment-450537625 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
AmplabJenkins removed a comment on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions URL: https://github.com/apache/spark/pull/23398#issuecomment-450537625 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
AmplabJenkins commented on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions URL: https://github.com/apache/spark/pull/23398#issuecomment-450537626 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6480/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions
SparkQA commented on issue #23398: [SPARK-26493][SQL] Allow multiple spark.sql.extensions URL: https://github.com/apache/spark/pull/23398#issuecomment-450537610 **[Test build #100544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100544/testReport)** for PR 23398 at commit [`689a4d2`](https://github.com/apache/spark/commit/689a4d203dc2dec68c6b97c7d6529e90b094d31e). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] twdsilva commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
twdsilva commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#discussion_r244523627 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ## @@ -239,7 +239,7 @@ object JdbcUtils extends Logging { case java.sql.Types.TIMESTAMP => TimestampType case java.sql.Types.TIMESTAMP_WITH_TIMEZONE => null - case java.sql.Types.TINYINT => IntegerType + case java.sql.Types.TINYINT => ByteType Review comment: Sure that makes sense. However ```JdbcUtils.makeGetter``` does not handle ByteType currently, so if you have a dialect that maps TINYINT to ByteType it will fail with the following exception: ``` java.lang.IllegalArgumentException: Unsupported type tinyint at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter(JdbcUtils.scala:502) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetters$1.apply(JdbcUtils.scala:379) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetters$1.apply(JdbcUtils.scala:379) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetters(JdbcUtils.scala:379) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.(JdbcUtils.scala:340) ``` I will update the PR with with only the change to handle ```ByteType``` in ```JdbcUtils.makeGetter``` and add a test for this as well. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] srowen closed pull request #23310: [SPARK-26363][WebUI] Avoid duplicated KV store lookups in method `taskList`
srowen closed pull request #23310: [SPARK-26363][WebUI] Avoid duplicated KV store lookups in method `taskList` URL: https://github.com/apache/spark/pull/23310 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala b/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala index 312bcccb1cca1..0487f2f07c097 100644 --- a/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala +++ b/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala @@ -20,6 +20,7 @@ package org.apache.spark.status import java.util.{List => JList} import scala.collection.JavaConverters._ +import scala.collection.mutable.HashMap import org.apache.spark.{JobExecutionStatus, SparkConf} import org.apache.spark.status.api.v1 @@ -386,10 +387,9 @@ private[spark] class AppStatusStore( def taskList(stageId: Int, stageAttemptId: Int, maxTasks: Int): Seq[v1.TaskData] = { val stageKey = Array(stageId, stageAttemptId) - store.view(classOf[TaskDataWrapper]).index("stage").first(stageKey).last(stageKey).reverse() - .max(maxTasks).asScala.map { taskDataWrapper => - constructTaskData(taskDataWrapper) -}.toSeq.reverse +val taskDataWrapperIter = store.view(classOf[TaskDataWrapper]).index("stage") + .first(stageKey).last(stageKey).reverse().max(maxTasks).asScala +constructTaskDataList(taskDataWrapperIter).reverse } def taskList( @@ -428,9 +428,8 @@ private[spark] class AppStatusStore( } val ordered = if (ascending) indexed else indexed.reverse() -ordered.skip(offset).max(length).asScala.map { taskDataWrapper => - constructTaskData(taskDataWrapper) -}.toSeq +val taskDataWrapperIter = ordered.skip(offset).max(length).asScala +constructTaskDataList(taskDataWrapperIter) } def executorSummary(stageId: Int, attemptId: Int): Map[String, v1.ExecutorStageSummary] = { @@ -536,24 +535,29 @@ private[spark] class AppStatusStore( store.close() } - def constructTaskData(taskDataWrapper: TaskDataWrapper) : v1.TaskData = { -val taskDataOld: v1.TaskData = taskDataWrapper.toApi -val executorLogs: Option[Map[String, String]] = try { - Some(executorSummary(taskDataOld.executorId).executorLogs) -} catch { - case e: NoSuchElementException => e.getMessage -None -} -new v1.TaskData(taskDataOld.taskId, taskDataOld.index, - taskDataOld.attempt, taskDataOld.launchTime, taskDataOld.resultFetchStart, - taskDataOld.duration, taskDataOld.executorId, taskDataOld.host, taskDataOld.status, - taskDataOld.taskLocality, taskDataOld.speculative, taskDataOld.accumulatorUpdates, - taskDataOld.errorMessage, taskDataOld.taskMetrics, - executorLogs.getOrElse(Map[String, String]()), - AppStatusUtils.schedulerDelay(taskDataOld), - AppStatusUtils.gettingResultTime(taskDataOld)) + def constructTaskDataList(taskDataWrapperIter: Iterable[TaskDataWrapper]): Seq[v1.TaskData] = { +val executorIdToLogs = new HashMap[String, Map[String, String]]() +taskDataWrapperIter.map { taskDataWrapper => + val taskDataOld: v1.TaskData = taskDataWrapper.toApi + val executorLogs = executorIdToLogs.getOrElseUpdate(taskDataOld.executorId, { +try { + executorSummary(taskDataOld.executorId).executorLogs +} catch { + case e: NoSuchElementException => +Map.empty +} + }) + + new v1.TaskData(taskDataOld.taskId, taskDataOld.index, +taskDataOld.attempt, taskDataOld.launchTime, taskDataOld.resultFetchStart, +taskDataOld.duration, taskDataOld.executorId, taskDataOld.host, taskDataOld.status, +taskDataOld.taskLocality, taskDataOld.speculative, taskDataOld.accumulatorUpdates, +taskDataOld.errorMessage, taskDataOld.taskMetrics, +executorLogs, +AppStatusUtils.schedulerDelay(taskDataOld), +AppStatusUtils.gettingResultTime(taskDataOld)) +}.toSeq } - } private[spark] object AppStatusStore { This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] srowen commented on issue #23310: [SPARK-26363][WebUI] Avoid duplicated KV store lookups in method `taskList`
srowen commented on issue #23310: [SPARK-26363][WebUI] Avoid duplicated KV store lookups in method `taskList` URL: https://github.com/apache/spark/pull/23310#issuecomment-450537300 Merged to master This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API
HyukjinKwon commented on issue #23349: [SPARK-26403][SQL] Support pivoting using array column for `pivot(column)` API URL: https://github.com/apache/spark/pull/23349#issuecomment-450537268 Hey, @cloud-fan, I think this change doesn't make `Literal` or `pivot` worse. If you worry about that we support `WrappedArray` in `Literal`, I can only make within `pivot`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on issue #20658: [SPARK-23488][python] Add missing catalog methods to python API
HyukjinKwon commented on issue #20658: [SPARK-23488][python] Add missing catalog methods to python API URL: https://github.com/apache/spark/pull/20658#issuecomment-450537190 This needs a bit of breaking change but that's quite unlikely corner case. Since we're going ahead for Spark 3.0, I think we can fix it. Please address https://github.com/apache/spark/pull/20658#issuecomment-368561007 and take this over if the author is being active. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on issue #23263: [SPARK-23674][ML] Adds Spark ML Events to Instrumentation
HyukjinKwon commented on issue #23263: [SPARK-23674][ML] Adds Spark ML Events to Instrumentation URL: https://github.com/apache/spark/pull/23263#issuecomment-450536961 Hey, @mengxr, can you check if your comment is addressed? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] cloud-fan commented on a change in pull request #21826: [SPARK-24872] Replace the symbol '||' of Or operator with 'or'
cloud-fan commented on a change in pull request #21826: [SPARK-24872] Replace the symbol '||' of Or operator with 'or' URL: https://github.com/apache/spark/pull/21826#discussion_r244523473 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ## @@ -442,7 +442,7 @@ case class Or(left: Expression, right: Expression) extends BinaryOperator with P override def inputType: AbstractDataType = BooleanType - override def symbol: String = "||" + override def symbol: String = "or" Review comment: is this only for display? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
HyukjinKwon commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450536892 PySpark test failure looks orthogonal with this change. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
HyukjinKwon commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#discussion_r244474776 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ## @@ -239,7 +239,7 @@ object JdbcUtils extends Logging { case java.sql.Types.TIMESTAMP => TimestampType case java.sql.Types.TIMESTAMP_WITH_TIMEZONE => null - case java.sql.Types.TINYINT => IntegerType + case java.sql.Types.TINYINT => ByteType Review comment: Since `TINYINT` can represent an 8-bit unsigned integer value between 0 and 255, wouldn't it be more correct to use `ShortType`? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on issue #23402: [SPARK-26500][CORE] Add conf to support ingoring data locality
HyukjinKwon commented on issue #23402: [SPARK-26500][CORE] Add conf to support ingoring data locality URL: https://github.com/apache/spark/pull/23402#issuecomment-450536703 I'm closing this. @printstacktrace, please let me know if I missed how much the configuration improves the time-consuming logic. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon closed pull request #23402: [SPARK-26500][CORE] Add conf to support ingoring data locality
HyukjinKwon closed pull request #23402: [SPARK-26500][CORE] Add conf to support ingoring data locality URL: https://github.com/apache/spark/pull/23402 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala index 6f4c326442e1e..08aa2137472f8 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala @@ -188,6 +188,8 @@ private[spark] class DAGScheduler( /** If enabled, FetchFailed will not cause stage retry, in order to surface the problem. */ private val disallowStageRetryForTest = sc.getConf.getBoolean("spark.test.noStageRetry", false) + private val localityIgnore = sc.conf.getBoolean("spark.locality.ignore", false) + /** * Whether to unregister all the outputs on the host in condition that we receive a FetchFailure, * this is set default to false, which means, we only unregister the outputs related to the exact @@ -1105,24 +1107,29 @@ private[spark] class DAGScheduler( outputCommitCoordinator.stageStart( stage = s.id, maxPartitionId = s.rdd.partitions.length - 1) } -val taskIdToLocations: Map[Int, Seq[TaskLocation]] = try { - stage match { -case s: ShuffleMapStage => - partitionsToCompute.map { id => (id, getPreferredLocs(stage.rdd, id))}.toMap -case s: ResultStage => - partitionsToCompute.map { id => -val p = s.partitions(id) -(id, getPreferredLocs(stage.rdd, p)) - }.toMap +val taskIdToLocations: Map[Int, Seq[TaskLocation]] = + if (localityIgnore) { +Map.empty + } else { +try { + stage match { +case s: ShuffleMapStage => + partitionsToCompute.map { id => (id, getPreferredLocs(stage.rdd, id)) }.toMap +case s: ResultStage => + partitionsToCompute.map { id => +val p = s.partitions(id) +(id, getPreferredLocs(stage.rdd, p)) + }.toMap + } +} catch { + case NonFatal(e) => +stage.makeNewStageAttempt(partitionsToCompute.size) +listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties)) +abortStage(stage, s"Task creation failed: $e\n${Utils.exceptionString(e)}", Some(e)) +runningStages -= stage +return +} } -} catch { - case NonFatal(e) => -stage.makeNewStageAttempt(partitionsToCompute.size) -listenerBus.post(SparkListenerStageSubmitted(stage.latestInfo, properties)) -abortStage(stage, s"Task creation failed: $e\n${Utils.exceptionString(e)}", Some(e)) -runningStages -= stage -return -} stage.makeNewStageAttempt(partitionsToCompute.size, taskIdToLocations.values.toSeq) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
HyukjinKwon commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#discussion_r244523378 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ## @@ -239,7 +239,7 @@ object JdbcUtils extends Logging { case java.sql.Types.TIMESTAMP => TimestampType case java.sql.Types.TIMESTAMP_WITH_TIMEZONE => null - case java.sql.Types.TINYINT => IntegerType + case java.sql.Types.TINYINT => ByteType Review comment: Also, in case of MySQL, I think we should always treat the worst case. it can be `TINYINT UNSIGNED`, which range from 0 to 255. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23409: [SPARK-26502][SQL] Get rid of hiveResultString() in QueryExecution
AmplabJenkins removed a comment on issue #23409: [SPARK-26502][SQL] Get rid of hiveResultString() in QueryExecution URL: https://github.com/apache/spark/pull/23409#issuecomment-450536541 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23409: [SPARK-26502][SQL] Get rid of hiveResultString() in QueryExecution
AmplabJenkins removed a comment on issue #23409: [SPARK-26502][SQL] Get rid of hiveResultString() in QueryExecution URL: https://github.com/apache/spark/pull/23409#issuecomment-450536543 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100543/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23409: [SPARK-26502][SQL] Get rid of hiveResultString() in QueryExecution
AmplabJenkins commented on issue #23409: [SPARK-26502][SQL] Get rid of hiveResultString() in QueryExecution URL: https://github.com/apache/spark/pull/23409#issuecomment-450536541 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #23409: [SPARK-26502][SQL] Get rid of hiveResultString() in QueryExecution
AmplabJenkins commented on issue #23409: [SPARK-26502][SQL] Get rid of hiveResultString() in QueryExecution URL: https://github.com/apache/spark/pull/23409#issuecomment-450536543 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100543/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
HyukjinKwon commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#discussion_r244523342 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ## @@ -239,7 +239,7 @@ object JdbcUtils extends Logging { case java.sql.Types.TIMESTAMP => TimestampType case java.sql.Types.TIMESTAMP_WITH_TIMEZONE => null - case java.sql.Types.TINYINT => IntegerType + case java.sql.Types.TINYINT => ByteType Review comment: If that can be varied per implementation, we should choose the most conservative one here and let other dialects handle it as you said. Can you check JDBC type mapping? I think we don't have to fix it then. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA removed a comment on issue #23409: [SPARK-26502][SQL] Get rid of hiveResultString() in QueryExecution
SparkQA removed a comment on issue #23409: [SPARK-26502][SQL] Get rid of hiveResultString() in QueryExecution URL: https://github.com/apache/spark/pull/23409#issuecomment-450527784 **[Test build #100543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100543/testReport)** for PR 23409 at commit [`1c1b4df`](https://github.com/apache/spark/commit/1c1b4df5df083dfc5abbe4f2b7a88b640654bea1). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] SparkQA commented on issue #23409: [SPARK-26502][SQL] Get rid of hiveResultString() in QueryExecution
SparkQA commented on issue #23409: [SPARK-26502][SQL] Get rid of hiveResultString() in QueryExecution URL: https://github.com/apache/spark/pull/23409#issuecomment-450536492 **[Test build #100543 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100543/testReport)** for PR 23409 at commit [`1c1b4df`](https://github.com/apache/spark/commit/1c1b4df5df083dfc5abbe4f2b7a88b640654bea1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
HyukjinKwon commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450536412 Migration guide is located in https://github.com/apache/spark/blob/master/docs/sql-migration-guide-upgrade.md This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] wangyum commented on a change in pull request #22903: [SPARK-24196][SQL] Implement Spark's own GetSchemasOperation
wangyum commented on a change in pull request #22903: [SPARK-24196][SQL] Implement Spark's own GetSchemasOperation URL: https://github.com/apache/spark/pull/22903#discussion_r244523293 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetSchemasOperation.scala ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive.thriftserver + +import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveOperationType +import org.apache.hive.service.cli._ +import org.apache.hive.service.cli.operation.GetSchemasOperation +import org.apache.hive.service.cli.session.HiveSession + +import org.apache.spark.internal.Logging +import org.apache.spark.sql.SQLContext +import org.apache.spark.sql.catalyst.catalog.SessionCatalog + +/** + * Spark's own GetSchemasOperation + * + * @param sqlContext SQLContext to use + * @param parentSession a HiveSession from SessionManager + * @param catalogName catalog name. null if not applicable. + * @param schemaName database name, null or a concrete database name + */ +private[hive] class SparkGetSchemasOperation( +sqlContext: SQLContext, +parentSession: HiveSession, +catalogName: String, +schemaName: String) + extends GetSchemasOperation(parentSession, catalogName, schemaName) with Logging { + + val catalog: SessionCatalog = sqlContext.sessionState.catalog + + private final val RESULT_SET_SCHEMA = new TableSchema() +.addStringColumn("TABLE_SCHEM", "Schema name.") +.addStringColumn("TABLE_CATALOG", "Catalog name.") + + private val rowSet = RowSetFactory.create(RESULT_SET_SCHEMA, getProtocolVersion) + + override def runInternal(): Unit = { +setState(OperationState.RUNNING) +// Always use the latest class loader provided by executionHive's state. +val executionHiveClassLoader = sqlContext.sharedState.jarClassLoader +Thread.currentThread().setContextClassLoader(executionHiveClassLoader) + +if (isAuthV2Enabled) { + val cmdStr = s"catalog : $catalogName, schemaPattern : $schemaName" + authorizeMetaGets(HiveOperationType.GET_TABLES, null, cmdStr) +} + +try { + catalog.listDatabases(convertSchemaPattern(schemaName)).foreach { dbName => +rowSet.addRow(Array[AnyRef](dbName, "")) + } + setState(OperationState.FINISHED) +} catch { + case e: HiveSQLException => +setState(OperationState.ERROR) +throw e +} + } + + override def getNextRowSet(orientation: FetchOrientation, maxRows: Long): RowSet = { Review comment: It won't get the result: ```scala [info] - Spark's own GetSchemasOperation(SparkGetSchemasOperation) *** FAILED *** (4 seconds, 364 milliseconds) [info] rs.next() was false (SparkMetadataOperationSuite.scala:78) [info] org.scalatest.exceptions.TestFailedException: [info] at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528) [info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) [info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) [info] at org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite$$anonfun$1$$anonfun$org$apache$spark$sql$hive$thriftserver$SparkMetadataOperationSuite$$anonfun$$checkResult$1$1.apply(SparkMetadataOperationSuite.scala:78) [info] at org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite$$anonfun$1$$anonfun$org$apache$spark$sql$hive$thriftserver$SparkMetadataOperationSuite$$anonfun$$checkResult$1$1.apply(SparkMetadataOperationSuite.scala:77) ... ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] twdsilva commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
twdsilva commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450536275 I will investigate why the PySpark unit tests are failing. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] twdsilva commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
twdsilva commented on issue #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#issuecomment-450536188 In order to update the migration guide, would I have to create a separate PR for https://github.com/apache/spark-website ? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] twdsilva commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte…
twdsilva commented on a change in pull request #23400: [SPARK-26499] [SQL] JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of Byte… URL: https://github.com/apache/spark/pull/23400#discussion_r244523235 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ## @@ -239,7 +239,7 @@ object JdbcUtils extends Logging { case java.sql.Types.TIMESTAMP => TimestampType case java.sql.Types.TIMESTAMP_WITH_TIMEZONE => null - case java.sql.Types.TINYINT => IntegerType + case java.sql.Types.TINYINT => ByteType Review comment: @HyukjinKwon Thank you for reviewing. The range of ```TINYINT``` varies by database. It ranges from -128 to 127 for MySQL(https://dev.mysql.com/doc/refman/8.0/en/integer-types.html), H2 (http://www.h2database.com/html/datatypes.html#tinyint_type) and Phoenix (https://phoenix.apache.org/language/datatypes.html#tinyint_type). It ranges from 0 to 255 for SQL Server (https://docs.microsoft.com/en-us/sql/t-sql/data-types/int-bigint-smallint-and-tinyint-transact-sql?view=sql-server-2017) I think users can implement ```JdbcDialect``` to override the default JDBC to catalyst type mapping when using the JDBC DataSource. Let me know what you think. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] HyukjinKwon commented on issue #23405: [SPARK-26496][SS][TEST] Avoid to use Random.nextString in StreamingInnerJoinSuite
HyukjinKwon commented on issue #23405: [SPARK-26496][SS][TEST] Avoid to use Random.nextString in StreamingInnerJoinSuite URL: https://github.com/apache/spark/pull/23405#issuecomment-450534712 Thanks guys! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0 URL: https://github.com/apache/spark/pull/23388#discussion_r244522622 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ## @@ -102,8 +102,8 @@ object ExtractEquiJoinKeys extends Logging with PredicateHelper { type ReturnType = (JoinType, Seq[Expression], Seq[Expression], Option[Expression], LogicalPlan, LogicalPlan) - def unapply(plan: LogicalPlan): Option[ReturnType] = plan match { -case join @ Join(left, right, joinType, condition) => + def unapply(join: Join): Option[ReturnType] = join match { +case Join(left, right, joinType, condition) => Review comment: It's same as before, except that, `j @ ExtractEquiJoinKeys...` the `j` becomes `Join` type. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org