[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17130 **[Test build #74383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74383/testReport)** for PR 17130 at commit [`9ce0093`](https://github.com/apache/spark/commit/9ce00930ea18c7bb8fe0cc59b98f6ece34d20311). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17251

I added `StackCoercion` [here](https://github.com/apache/spark/pull/17251/commits/36d90811a77889b19c47347fc591a8e1a6a482f3), but reverted it. For `StackCoercion`, we need the schema, which is based on the number of rows (and the derived columns) from the first argument, that is, on the content of the first argument. That value is validated by `Stack.checkInputDataTypes`. It seems we cannot add `StackCoercion`. For example, `StackCoercion` will fail for the following tests.

```
// The first argument must be a positive constant integer.
val m = intercept[AnalysisException] {
  df.selectExpr("stack(1.1, 1, 2, 3)")
}.getMessage
assert(m.contains("The number of rows must be a positive constant integer."))
val m2 = intercept[AnalysisException] {
  df.selectExpr("stack(-1, 1, 2, 3)")
}.getMessage
assert(m2.contains("The number of rows must be a positive constant integer."))
```
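The dependency described in this comment can be illustrated with a small, Spark-free sketch. It assumes, as an understanding of `stack` rather than a quote of its source, that `stack(n, v1, ..., vk)` yields `n` rows of `ceil(k / n)` columns, so the output schema cannot be computed until the first argument is known to be a positive integer. `StackShapeSketch`, `stackShape`, and `columnOf` are hypothetical names used only for illustration.

```scala
// Hypothetical helper, not part of Spark: models how the shape of `stack`
// depends on the value of its first argument.
object StackShapeSketch {

  /** Returns (rows, columns) for stack(n, v1, ..., vk), or an error message. */
  def stackShape(n: Int, numValues: Int): Either[String, (Int, Int)] =
    if (n <= 0) {
      // Mirrors the message asserted in the tests quoted in the comment.
      Left("The number of rows must be a positive constant integer.")
    } else {
      Right((n, math.ceil(numValues.toDouble / n).toInt))
    }

  /** Output column that the i-th value (0-based) is assumed to land in. */
  def columnOf(i: Int, numColumns: Int): Int = i % numColumns
}
```

Under these assumptions, a rule that maps each NULL argument to the type of its column has to call something like `stackShape` first, which is why it cannot safely run before the first argument has been validated.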
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17251 **[Test build #74382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74382/testReport)** for PR 17251 at commit [`0cd5d88`](https://github.com/apache/spark/commit/0cd5d88609a2e36459498a86caec5046d9ebe2b1).
[GitHub] spark issue #17255: [SPARK-19918[SQL] Use TextFileFormat in implementation o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17255 Merged build finished. Test PASSed.
[GitHub] spark issue #17255: [SPARK-19918[SQL] Use TextFileFormat in implementation o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17255 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74375/ Test PASSed.
[GitHub] spark issue #17255: [SPARK-19918[SQL] Use TextFileFormat in implementation o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17255 **[Test build #74375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74375/testReport)** for PR 17255 at commit [`e2d34b8`](https://github.com/apache/spark/commit/e2d34b8ac0f7a7ebf4c8c120a639de9451b20f6e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17251 Merged build finished. Test FAILed.
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17251 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74376/ Test FAILed.
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17251 **[Test build #74376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74376/testReport)** for PR 17251 at commit [`36d9081`](https://github.com/apache/spark/commit/36d90811a77889b19c47347fc591a8e1a6a482f3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17251#discussion_r105528166

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -590,6 +591,22 @@ object TypeCoercion {
   }

   /**
+   * Coerces NullTypes of a Stack function to the corresponding column types.
+   */
+  object StackCoercion extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+      case s @ Stack(children) if s.childrenResolved =>
+        val schema = s.elementSchema
--- End diff --

Oops. This breaks [GeneratorFunctionSuite](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala#L30). For `StackCoercion`, we need the schema, which is based on the number of rows (and the derived columns) from the **first argument**, that is, on the content of the first argument. That value is validated by `Stack.checkInputDataTypes`. It seems we cannot add `StackCoercion` in this manner. What do you think about this?
[GitHub] spark pull request #17209: [SPARK-19853][SS] uppercase kafka topics fail whe...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/17209#discussion_r105528025

--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala ---
@@ -450,10 +445,22 @@ private[kafka010] class KafkaSourceProvider extends DataSourceRegister
 private[kafka010] object KafkaSourceProvider {
   private val STRATEGY_OPTION_KEYS = Set("subscribe", "subscribepattern", "assign")
-  private val STARTING_OFFSETS_OPTION_KEY = "startingoffsets"
-  private val ENDING_OFFSETS_OPTION_KEY = "endingoffsets"
+  private[kafka010] val STARTING_OFFSETS_OPTION_KEY = "startingoffsets"
+  private[kafka010] val ENDING_OFFSETS_OPTION_KEY = "endingoffsets"
   private val FAIL_ON_DATA_LOSS_OPTION_KEY = "failondataloss"
--- End diff --

change for unit test
[GitHub] spark issue #17087: [SPARK-19372][SQL] Fix throwing a Java exception at df.f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17087 Merged build finished. Test PASSed.
[GitHub] spark pull request #17243: [SPARK-19901][Core]Clean up the clunky method sig...
Github user ConeyLiu closed the pull request at: https://github.com/apache/spark/pull/17243
[GitHub] spark issue #17087: [SPARK-19372][SQL] Fix throwing a Java exception at df.f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17087 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74374/ Test PASSed.
[GitHub] spark issue #17243: [SPARK-19901][Core]Clean up the clunky method signature ...
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/17243 ok, I will close it.
[GitHub] spark issue #17087: [SPARK-19372][SQL] Fix throwing a Java exception at df.f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17087 **[Test build #74374 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74374/testReport)** for PR 17087 at commit [`c5fc5f1`](https://github.com/apache/spark/commit/c5fc5f140a9c5d661748033f4bad0f59e1ca88bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class InterpretedPredicate(expression: Expression) extends GenPredicate `
[GitHub] spark issue #17250: [SPARK-19911][STREAMING] Add builder interface for Kines...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17250 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74379/ Test PASSed.
[GitHub] spark issue #17250: [SPARK-19911][STREAMING] Add builder interface for Kines...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17250 Merged build finished. Test PASSed.
[GitHub] spark issue #17250: [SPARK-19911][STREAMING] Add builder interface for Kines...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17250 **[Test build #74379 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74379/testReport)** for PR 17250 at commit [`a604dc5`](https://github.com/apache/spark/commit/a604dc5952a1c9939d433371abc969670a8ff6ab). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class Builder[T: ClassTag](`
[GitHub] spark issue #17209: [SPARK-19853][SS] uppercase kafka topics fail when start...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17209 Merged build finished. Test PASSed.
[GitHub] spark issue #17209: [SPARK-19853][SS] uppercase kafka topics fail when start...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17209 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74380/ Test PASSed.
[GitHub] spark issue #17209: [SPARK-19853][SS] uppercase kafka topics fail when start...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17209 **[Test build #74380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74380/testReport)** for PR 17209 at commit [`50ef0e1`](https://github.com/apache/spark/commit/50ef0e11b85fee7947d840ccafc59bd23b17c189). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17256: [SPARK-19919][SQL] Defer throwing the exception for empt...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17256 **[Test build #74381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74381/testReport)** for PR 17256 at commit [`9d91da1`](https://github.com/apache/spark/commit/9d91da124e0723adee7744a64999ea1c07acfe66).
[GitHub] spark pull request #17256: [SPARK-19919][SQL] Defer throwing the exception f...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/17256

[SPARK-19919][SQL] Defer throwing the exception for empty paths in CSV datasource into `DataSource`

## What changes were proposed in this pull request?

This PR proposes to defer throwing the exception into `DataSource`.

Currently, if other datasources fail to infer the schema, they return `None`, and this is then validated in `DataSource` as below:

```
scala> spark.read.json("emptydir")
org.apache.spark.sql.AnalysisException: Unable to infer schema for JSON. It must be specified manually.;
```

```
scala> spark.read.orc("emptydir")
org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC. It must be specified manually.;
```

```
scala> spark.read.parquet("emptydir")
org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
```

However, the CSV datasource checks this within its own implementation and throws a different exception message, as below:

```
scala> spark.read.csv("emptydir")
java.lang.IllegalArgumentException: requirement failed: Cannot infer schema from an empty set of files
```

We could remove this duplicated check and validate this in one place, in the same way and with the same message.

## How was this patch tested?

Unit test in `CSVSuite` and manual test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-19919

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17256.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17256

commit 9d91da124e0723adee7744a64999ea1c07acfe66
Author: hyukjinkwon
Date: 2017-03-11T06:53:39Z

    Defer input path validation into DataSource in CSV datasource
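The idea in this pull request, letting each format report "nothing to infer" and raising the error in one shared place, can be sketched without Spark as follows. This is a simplified model under stated assumptions, not the actual `DataSource` code: `Format`, `Schema`, `Csv`, and `resolveSchema` are hypothetical names, and `IllegalStateException` stands in for Spark's `AnalysisException`.

```scala
// Hypothetical, simplified model; these names are illustrative and are not Spark's API.
object DeferredSchemaCheckSketch {
  final case class Schema(fields: Seq[String])

  trait Format {
    def name: String
    // Returns None when nothing can be inferred, instead of throwing.
    def inferSchema(files: Seq[String]): Option[Schema]
  }

  object Csv extends Format {
    val name = "CSV"
    // No format-specific `require(files.nonEmpty, ...)` check here any more.
    def inferSchema(files: Seq[String]): Option[Schema] =
      if (files.isEmpty) None else Some(Schema(Seq("_c0")))
  }

  // The single place where a missing schema becomes an error, so every
  // format reports the same message. IllegalStateException is a stand-in.
  def resolveSchema(format: Format, files: Seq[String]): Schema =
    format.inferSchema(files).getOrElse {
      throw new IllegalStateException(
        s"Unable to infer schema for ${format.name}. It must be specified manually.")
    }
}
```

With this shape, adding another format only requires returning `None` for the empty case; the wording of the error stays in one function.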
[GitHub] spark issue #17209: [SPARK-19853][SS] uppercase kafka topics fail when start...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17209 **[Test build #74380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74380/testReport)** for PR 17209 at commit [`50ef0e1`](https://github.com/apache/spark/commit/50ef0e11b85fee7947d840ccafc59bd23b17c189).
[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17188 @cloud-fan Could you check again? Thanks!
[GitHub] spark issue #17250: [SPARK-19911][STREAMING] Add builder interface for Kines...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17250 **[Test build #74379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74379/testReport)** for PR 17250 at commit [`a604dc5`](https://github.com/apache/spark/commit/a604dc5952a1c9939d433371abc969670a8ff6ab).
[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...
Github user shivaram closed the pull request at: https://github.com/apache/spark/pull/16290
[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17254 **[Test build #74378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74378/testReport)** for PR 17254 at commit [`36a3463`](https://github.com/apache/spark/commit/36a34632dbb000799c35727c00d1542d4bb1ce00).
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Merged build finished. Test PASSed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74372/ Test PASSed.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #74372 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74372/testReport)** for PR 15435 at commit [`e629030`](https://github.com/apache/spark/commit/e629030c14d83dd330f3ff5cfb79ee3e95f35081). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17254 **[Test build #74377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74377/testReport)** for PR 17254 at commit [`191d8a1`](https://github.com/apache/spark/commit/191d8a1c434d3eb39d40821d3ee6ad304b052f63).
[GitHub] spark issue #17255: [SPARK-19918[SQL] Use TextFileFormat in implementation o...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17255 (let me wait for the tests before cc'ing someone)
[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17254 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74373/ Test FAILed.
[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17254 Merged build finished. Test FAILed.
[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17254 **[Test build #74373 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74373/testReport)** for PR 17254 at commit [`c0dc3b7`](https://github.com/apache/spark/commit/c0dc3b72149d2b88384e6ce5208e355d8bd7f52a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17251 **[Test build #74376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74376/testReport)** for PR 17251 at commit [`36d9081`](https://github.com/apache/spark/commit/36d90811a77889b19c47347fc591a8e1a6a482f3).
[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17188 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74371/ Test PASSed.
[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17188 Merged build finished. Test PASSed.
[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17188 **[Test build #74371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74371/testReport)** for PR 17188 at commit [`75e1884`](https://github.com/apache/spark/commit/75e188495a9c9a55a46dfaaf7cecf41f8b60c130). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17188 Merged build finished. Test PASSed.
[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17188 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74370/ Test PASSed.
[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17188 **[Test build #74370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74370/testReport)** for PR 17188 at commit [`2013679`](https://github.com/apache/spark/commit/2013679ea4f726b2a5b86387753b98da5171113d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17240: [SPARK-19915][SQL] Improve join reorder: simplify cost e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17240 Merged build finished. Test PASSed.
[GitHub] spark issue #17240: [SPARK-19915][SQL] Improve join reorder: simplify cost e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17240 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74368/ Test PASSed.
[GitHub] spark issue #17240: [SPARK-19915][SQL] Improve join reorder: simplify cost e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17240 **[Test build #74368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74368/testReport)** for PR 17240 at commit [`82a1740`](https://github.com/apache/spark/commit/82a17405a9abc0e5690e46855011e3755c2d8f90). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17255: [SPARK-19918][SQL] Use TextFileFormat in implementation o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17255 **[Test build #74375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74375/testReport)** for PR 17255 at commit [`e2d34b8`](https://github.com/apache/spark/commit/e2d34b8ac0f7a7ebf4c8c120a639de9451b20f6e).
[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implement...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/17255

[SPARK-19918][SQL] Use TextFileFormat in implementation of JsonFileFormat

## What changes were proposed in this pull request?

This PR proposes to use the text datasource for JSON schema inference. It follows an approach similar to https://github.com/apache/spark/pull/15813. Using a Dataset for the initial loading when inferring the schema has advantages; please refer to SPARK-18362.

It seems the JSON case was supposed to be fixed together with that change but was taken out, according to https://github.com/apache/spark/pull/15813:

> A similar problem also affects the JSON file format and this patch originally fixed that as well, but I've decided to split that change into a separate patch so as not to conflict with changes in another JSON PR.

Also, the existing code path affects some functionality because it does not use FileScanRDD. This problem is described in SPARK-19885 (though that was the CSV case).

## How was this patch tested?

Existing tests should cover this.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark json-filescanrdd

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17255.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17255

commit 5e90d04eb9b1dc011188def339f92f3e8ef7e236
Author: hyukjinkwon
Date: 2017-03-11T05:42:43Z

    Use TextFileFormat in implementation of JsonFileFormat
[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17188 Merged build finished. Test PASSed.
[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17188 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74369/ Test PASSed.
[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17188 **[Test build #74369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74369/testReport)** for PR 17188 at commit [`1f31c27`](https://github.com/apache/spark/commit/1f31c2756a7f56e8e2f22a1e4054ea9e8e3d165c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17253: [SPARK-19916][SQL] simplify bad file handling
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17253 Merged build finished. Test PASSed.
[GitHub] spark issue #17253: [SPARK-19916][SQL] simplify bad file handling
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17253 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74367/ Test PASSed.
[GitHub] spark issue #17253: [SPARK-19916][SQL] simplify bad file handling
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17253 **[Test build #74367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74367/testReport)** for PR 17253 at commit [`05febbd`](https://github.com/apache/spark/commit/05febbdc58f566426796dbf814000381b309062f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17251 Basically, we can do that by bringing the logic from `Stack.checkInputDataTypes`.
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17251

@cloud-fan, in the following case the values of `stack` span multiple columns. So, by adding a *StackCoercion* rule, would we need to *insert Cast() for every NullType*?

```scala
scala> sql("select stack(4, null, 1.0, 'a', true, 2, null, 'b', true, 3, 3.0, null, false, 4, 4.0, 'd', null)").show
+----+----+----+-----+
|col0|col1|col2| col3|
+----+----+----+-----+
|null| 1.0|   a| true|
|   2|null|   b| true|
|   3| 3.0|null|false|
|   4| 4.0|   d| null|
+----+----+----+-----+
```
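To make the question above concrete, here is a minimal, standalone sketch of what "insert a cast for every NullType" could mean. It is a toy model, not Spark code: `StackNullSketch`, `DType`, `columnTypes` and `castsNeeded` are hypothetical names, Scala literals stand in for SQL literals (so `1.0` is treated as a double here), and the rule "the first non-null value decides the column type" is an assumption made only for illustration.

```scala
// Toy model only: none of these names exist in Spark.
object StackNullSketch {
  sealed trait DType
  case object NullT extends DType
  case object IntT extends DType
  case object DoubleT extends DType
  case object StringT extends DType
  case object BoolT extends DType

  // Map a Scala literal to a toy type; null maps to NullT.
  def typeOf(v: Any): DType = v match {
    case null       => NullT
    case _: Int     => IntT
    case _: Double  => DoubleT
    case _: String  => StringT
    case _: Boolean => BoolT
    case other      => sys.error(s"unsupported literal: $other")
  }

  // Values are laid out row by row; n rows give ceil(size / n) columns.
  def columnTypes(n: Int, values: Seq[Any]): Seq[DType] = {
    require(n > 0, "The number of rows must be a positive constant integer.")
    val numCols = (values.size + n - 1) / n
    (0 until numCols).map { c =>
      val colTypes = values.indices.filter(_ % numCols == c).map(i => typeOf(values(i)))
      // Assumption: the first non-null type decides the column; all-null columns stay NullT.
      colTypes.find(_ != NullT).getOrElse(NullT)
    }
  }

  // (row, column, target type) for every null literal that would need a cast.
  def castsNeeded(n: Int, values: Seq[Any]): Seq[(Int, Int, DType)] = {
    val types = columnTypes(n, values)
    val numCols = types.size
    values.zipWithIndex.collect {
      case (null, i) if types(i % numCols) != NullT =>
        (i / numCols, i % numCols, types(i % numCols))
    }
  }
}
```

Fed the sixteen values from the query above with `n = 4`, this model reports one cast per column, at positions (0,0), (1,1), (2,2) and (3,3), which is the "cast every NullType" situation being asked about.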
[GitHub] spark issue #15363: [SPARK-17791][SQL] Join reordering using star schema det...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15363 Merged build finished. Test PASSed.
[GitHub] spark issue #15363: [SPARK-17791][SQL] Join reordering using star schema det...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15363 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74364/ Test PASSed.
[GitHub] spark issue #15363: [SPARK-17791][SQL] Join reordering using star schema det...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15363 **[Test build #74364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74364/testReport)** for PR 15363 at commit [`fe0d390`](https://github.com/apache/spark/commit/fe0d390ba97d600bcc40445d0aade81ee5718f2b). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class StarSchemaDetection(conf: CatalystConf) extends PredicateHelper `
[GitHub] spark issue #17243: [SPARK-19901][Core]Clean up the clunky method signature ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17243 I'm not sure what this refers to. Unless it gives a clear and non-trivial improvement to the method signatures, I don't think it is worth the time to discuss and review. This particular change does not seem worthwhile.
[GitHub] spark issue #16788: [SPARK-16742] Kerberos impersonation support
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16788 @tgravescs, @vanzin - this PR for Mesos fundamentally changes how Spark handles Kerberos tokens; it would be good to have your views. +CC @jerryshao to also look at the PR, since you have worked on the YARN security changes.
[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105319503

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1028,7 +1028,7 @@ class DAGScheduler(
             val locs = taskIdToLocations(id)
             new ResultTask(stage.id, stage.latestInfo.attemptId,
               taskBinary, part, locs, id, properties, serializedTaskMetrics,
-              Option(jobId), Option(sc.applicationId), sc.applicationAttemptId)
+              Option(jobId), Option(sc.applicationId), sc.applicationAttemptId, Option(tokens))
           }
--- End diff --

The current Spark model on YARN manages tokens out of band with respect to the actual tasks (unlike Tez/MR, iirc, where the execution model itself is different). The tasks themselves do not propagate the credentials; the executors update the credentials directly, based on driver updates. This allows very long-running Spark tasks (> 24 hours, for example) to run, which per-task tokens might not allow.
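As a rough illustration of the out-of-band model described in the comment above, the sketch below keeps credentials in an executor-side holder that the driver refreshes, while tasks carry no tokens of their own. It is a simplified standalone model, not Spark's implementation: `Creds`, `ExecutorCredentials`, `TaskDesc` and `runStep` are hypothetical names, and expiry is reduced to a single timestamp.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical model; these types do not exist in Spark.
object OutOfBandCredentialsSketch {
  // Stand-in for a set of delegation tokens with one expiry time.
  final case class Creds(version: Int, expiresAtMillis: Long)

  // Executor-side holder: the driver pushes updates, tasks only read.
  final class ExecutorCredentials(initial: Creds) {
    private val current = new AtomicReference[Creds](initial)
    def update(fresh: Creds): Unit = current.set(fresh)
    def get: Creds = current.get()
  }

  // A task description that carries no credentials of its own.
  final case class TaskDesc(id: Int)

  // Each step of a long-running task looks up whatever is current at that moment
  // and reports whether those credentials are still valid.
  def runStep(task: TaskDesc, exec: ExecutorCredentials, nowMillis: Long): Boolean =
    exec.get.expiresAtMillis > nowMillis
}
```

In this model a task that started with credentials expiring at time 100 keeps working at time 150 as long as the holder was refreshed in between. A task that had captured its tokens at launch would have no such path, which is the concern raised about per-task tokens.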
[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105526180

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1006,7 +1006,7 @@ class DAGScheduler(
         runningStages -= stage
         return
     }
-
+    val tokens = KerberosUtil.getHadoopDelegationTokens
--- End diff --

See my comment below about how current Spark (on YARN) handles security: https://github.com/apache/spark/pull/16788/files#r105319503. @tgravescs or @vanzin can correct me if I am wrong (in case I am misremembering): on a secure HDFS, it is not necessary to provide a principal/keytab if the job finishes before token renewal becomes necessary. Given the above, the call chain in KerberosUtil.getHadoopDelegationTokens will throw an exception if they are missing and UGI security is enabled. I am not sure whether this is a requirement on Mesos, but it is not on YARN.
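The concern above is that a missing principal/keytab turns into an exception even though short jobs on YARN do not need them. A minimal sketch of a tolerant alternative is shown below, assuming the caller can live without freshly obtained tokens. Everything here is hypothetical: `KerberosArgs`, `Fetch` and `tokensIfConfigured` are illustrative names, and `fetch` merely stands in for whatever call actually obtains tokens.

```scala
// Hypothetical sketch; not the PR's code and not a Spark or Hadoop API.
object OptionalKeytabSketch {
  // Optional inputs, as they might arrive from spark-submit arguments.
  final case class KerberosArgs(principal: Option[String], keytab: Option[String])

  // Stand-in for the real token-obtaining call.
  type Fetch = (String, String) => Array[Byte]

  // Returns None instead of throwing when security is off
  // or when no principal/keytab pair was supplied.
  def tokensIfConfigured(
      securityEnabled: Boolean,
      args: KerberosArgs,
      fetch: Fetch): Option[Array[Byte]] =
    if (!securityEnabled) None
    else for { p <- args.principal; k <- args.keytab } yield fetch(p, k)
}
```

Returning an `Option` pushes the "is a keytab required here?" decision to the caller, so a cluster manager that needs one can still fail fast while others carry on.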
[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105526086

--- Diff: core/src/main/scala/org/apache/spark/scheduler/KerberosUser.scala ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+import org.apache.spark.{SparkConf, SparkEnv, SparkException}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.internal.Logging
+
+object KerberosUser extends Logging {
+
+  def securize (principal: String, keytab: String) : Unit = {
+    val hadoopConf = SparkHadoopUtil.get.newConfiguration(new SparkConf())
+    hadoopConf.set("hadoop.security.authentication", "Kerberos")
+    UserGroupInformation.setConfiguration(hadoopConf)
+    UserGroupInformation.loginUserFromKeytab(principal, keytab)
+  }
--- End diff --

This method is duplicated in KerberosUtil?
[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105319198

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -151,9 +152,13 @@ object SparkSubmit extends CommandLineUtils {
     val (childArgs, childClasspath, sysProps, childMainClass) = prepareSubmitEnvironment(args)

     def doRunMain(): Unit = {
+      if (args.principal != null && args.keytab!= null) {
+        KerberosUser.securize(args.principal, args.keytab)
--- End diff --

This will cause multiple calls to UGI.loginUserFromKeytab (in the YARN case it happens in SparkSubmit.prepareSubmitEnvironment), which causes various issues. An application must call UGI.loginUserFromKeytab only once; with more than one call, random things fail (DFS client, metastore, etc.), unfortunately, because of the way the loginUser is cached and used and the way Hadoop IPC renews.
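One way to picture the "only one login per application" constraint from the comment above is a process-wide guard around the login call. The sketch below is an assumption-laden illustration, not code from the PR: `LoginOnceSketch` and `loginOnce` are hypothetical, and the `login` parameter stands in for `UserGroupInformation.loginUserFromKeytab` so that the example runs without Hadoop on the classpath.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical guard; not part of Spark or Hadoop.
object LoginOnceSketch {
  private val loggedIn = new AtomicBoolean(false)

  // `login` stands in for the real keytab login call.
  // Returns true if this call performed the login, false if one had already happened.
  def loginOnce(principal: String, keytab: String)(login: (String, String) => Unit): Boolean =
    if (loggedIn.compareAndSet(false, true)) {
      try {
        login(principal, keytab)
        true
      } catch {
        // Let a later attempt retry if the first login failed.
        case e: Throwable =>
          loggedIn.set(false)
          throw e
      }
    } else false
}
```

With such a guard, a second call from `doRunMain` after `prepareSubmitEnvironment` has already logged in would become a no-op rather than a second login. Whether a guard or simply removing the extra call is the better fix is a design choice this sketch does not settle.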
[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105526225

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala ---
@@ -60,9 +60,10 @@ private[spark] class ResultTask[T, U](
     serializedTaskMetrics: Array[Byte],
     jobId: Option[Int] = None,
     appId: Option[String] = None,
-    appAttemptId: Option[String] = None)
+    appAttemptId: Option[String] = None,
+    tokens: Option[Array[Byte]] = None)
   extends Task[U](stageId, stageAttemptId, partition.index, localProperties, serializedTaskMetrics,
-    jobId, appId, appAttemptId)
+    jobId, appId, appAttemptId,tokens)
--- End diff --

The changes to both *Task.scala files change the security model not just for Mesos but also for YARN, and this is incompatible with the existing public API (credential managers, etc.), unless the PR plans to overhaul security in Spark for all cluster managers.
[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105319759 --- Diff: core/src/main/scala/org/apache/spark/scheduler/KerberosUtil.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.scheduler + +import java.security.PrivilegedExceptionAction + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.{FileSystem, Path} +import org.apache.hadoop.mapred.Master +import org.apache.hadoop.security.{Credentials, UserGroupInformation} +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.{SparkConf, SparkEnv, SparkException} +import org.apache.spark.internal.Logging + +object KerberosUtil extends Logging { + var proxyUser : Option[UserGroupInformation] = None + def securize (principal: String, keytab: String) : Unit = { +val hadoopConf = SparkHadoopUtil.get.newConfiguration(new SparkConf()) +hadoopConf.set("hadoop.security.authentication", "Kerberos") +UserGroupInformation.setConfiguration(hadoopConf) +UserGroupInformation.loginUserFromKeytab(principal, keytab) + } + + + def getHadoopDelegationTokens : Array[Byte] = { --- End diff -- Given how often this method is invoked (and where it is invoked from), how expensive is it? +CC @tgravescs, @vanzin
[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105318927 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -257,10 +257,6 @@ private[deploy] class SparkSubmitArguments(args: Seq[String], env: Map[String, S "either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.") } } - -if (proxyUser != null && principal != null) { - SparkSubmit.printErrorAndExit("Only one of --proxy-user or --principal can be provided.") -} --- End diff -- This validation is relevant for spark yarn support, and should not be removed.
[GitHub] spark issue #17087: [SPARK-19372][SQL] Fix throwing a Java exception at df.f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17087 **[Test build #74374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74374/testReport)** for PR 17087 at commit [`c5fc5f1`](https://github.com/apache/spark/commit/c5fc5f140a9c5d661748033f4bad0f59e1ca88bb).
[GitHub] spark pull request #17055: [SPARK-19723][SQL]create datasource table with an...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17055
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17251 just add a new rule like `object IfCoercion extends Rule[LogicalPlan]`, to handle `Stack` with null literal
[GitHub] spark issue #17055: [SPARK-19723][SQL]create datasource table with an non-ex...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17055 thanks, merging to master!
[GitHub] spark pull request #17253: [SPARK-19916][SQL] simplify bad file handling
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17253#discussion_r105526137 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala --- @@ -44,7 +44,7 @@ case class PartitionedFile( filePath: String, start: Long, length: Long, -locations: Array[String] = Array.empty) { +@transient locations: Array[String] = Array.empty) { --- End diff -- this is not for `FileScanRDD.filePartitions`; it is for the `FilePartition`s that are sent by the scheduler. The locations are only useful during planning, so we should not send them to executors.
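To illustrate the point about not shipping `locations` to executors, here is a minimal sketch of what `@transient` does to a case class field under plain Java serialization. The case class is a simplified copy of the one in the diff; `TransientDemo` and `roundTrip` are hypothetical names introduced for this example, and Spark's own task serialization path may differ in detail.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Simplified copy of the case class from the diff; only the fields shown there.
case class PartitionedFile(
    filePath: String,
    start: Long,
    length: Long,
    @transient locations: Array[String] = Array.empty)

// Hypothetical helper, not part of Spark: serialize a value to bytes and read it back.
object TransientDemo {
  def roundTrip[T <: java.io.Serializable](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

After a round trip the non-transient fields survive, while the transient field is not written and comes back as the JVM default for a reference, so code running on the receiving side should not rely on it.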
[GitHub] spark pull request #17055: [SPARK-19723][SQL]create datasource table with an...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17055#discussion_r105526116 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -2016,48 +2016,79 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } + test("create datasource table with a non-existing location") { +withTable("t", "t1") { + withTempPath { dir => +spark.sql(s"CREATE TABLE t(a int, b int) USING parquet LOCATION '$dir'") + +val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t")) +assert(table.location == makeQualifiedPath(dir.getAbsolutePath)) + +spark.sql("INSERT INTO TABLE t SELECT 1, 2") +assert(dir.exists()) + +checkAnswer(spark.table("t"), Row(1, 2)) + } + // partition table + withTempPath { dir => +spark.sql( + s""" + |CREATE TABLE t1(a int, b int) USING parquet PARTITIONED BY(a) LOCATION '$dir' + """.stripMargin) + +val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t1")) +assert(table.location == makeQualifiedPath(dir.getAbsolutePath)) + +spark.sql("INSERT INTO TABLE t1 PARTITION(a=1) SELECT 2") + +val partDir = new File(dir, "a=1") +assert(partDir.exists()) + +checkAnswer(spark.table("t1"), Row(2, 1)) + } +} + } + Seq(true, false).foreach { shouldDelete => -val tcName = if (shouldDelete) "non-existent" else "existed" +val tcName = if (shouldDelete) "non-existing" else "existed" test(s"CTAS for external data source table with a $tcName location") { withTable("t", "t1") { -withTempDir { - dir => -if (shouldDelete) { - dir.delete() -} -spark.sql( - s""" - |CREATE TABLE t - |USING parquet - |LOCATION '$dir' - |AS SELECT 3 as a, 4 as b, 1 as c, 2 as d - """.stripMargin) -val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t")) -assert(table.location == makeQualifiedPath(dir.getAbsolutePath)) +withTempDir { dir => + if (shouldDelete) { +dir.delete() + } --- End diff -- ok thanks~~
[GitHub] spark pull request #17253: [SPARK-19916][SQL] simplify bad file handling
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17253#discussion_r105526080 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala --- @@ -130,54 +144,35 @@ class FileScanRDD( // Sets InputFileBlockHolder for the file block's information InputFileBlockHolder.set(currentFile.filePath, currentFile.start, currentFile.length) - try { -if (ignoreCorruptFiles) { - currentIterator = new NextIterator[Object] { -private val internalIter = { - try { -// The readFunction may read files before consuming the iterator. -// E.g., vectorized Parquet reader. -readFunction(currentFile) - } catch { -case e @(_: RuntimeException | _: IOException) => - logWarning(s"Skipped the rest content in the corrupted file: $currentFile", e) - Iterator.empty - } -} - -override def getNext(): AnyRef = { - try { -if (internalIter.hasNext) { - internalIter.next() -} else { - finished = true - null -} - } catch { -case e: IOException => - logWarning(s"Skipped the rest content in the corrupted file: $currentFile", e) - finished = true - null + if (ignoreCorruptFiles) { +currentIterator = new NextIterator[Object] { + // The readFunction may read some bytes before consuming the iterator, e.g., + // vectorized Parquet reader. Here we use lazy val to delay the creation of + // iterator so that we will throw exception in `getNext`. + private lazy val internalIter = readCurrentFile() + + override def getNext(): AnyRef = { +try { + if (internalIter.hasNext) { +internalIter.next() + } else { +finished = true +null } +} catch { + // Throw FileNotFoundException even `ignoreCorruptFiles` is true + case e: java.io.FileNotFoundException => throw e --- End diff -- `FileNotFoundException` extends `IOException`
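Because `FileNotFoundException` is a subclass of `IOException`, a catch block that swallows `IOException` would also swallow a missing file unless the more specific case comes first. The following is a minimal sketch of that ordering, not the `FileScanRDD` code itself: `SkipCorrupt`, `readAll`, and the `open` parameter (standing in for `readCurrentFile()`) are hypothetical names used only for illustration.

```scala
import java.io.{FileNotFoundException, IOException}

// Hypothetical stand-in for the iterator logic discussed in the diff.
object SkipCorrupt {
  def readAll[T](open: () => Iterator[T], ignoreCorruptFiles: Boolean): List[T] = {
    val out = List.newBuilder[T]
    try {
      val it = open()
      while (it.hasNext) out += it.next()
    } catch {
      // Must come before the IOException case, since FileNotFoundException is a subclass.
      case e: FileNotFoundException => throw e
      // Other I/O errors are treated as corruption: keep what was read so far.
      case _: IOException if ignoreCorruptFiles => ()
    }
    out.result()
  }
}
```

With the two cases swapped, the `IOException` case would match first and a missing file would be silently skipped whenever `ignoreCorruptFiles` is true.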
[GitHub] spark pull request #17254: [SPARK-19917][SQL]qualified partition path stored...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17254#discussion_r105526016 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -2180,6 +2181,13 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { withTempDir { dir => assert(!dir.getAbsolutePath.startsWith("file:/")) +spark.sql(s"ALTER TABLE t SET LOCATION '$dir'") --- End diff -- The location in ALTER TABLE SET LOCATION should also be qualified
[GitHub] spark pull request #17254: [SPARK-19917][SQL]qualified partition path stored...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17254#discussion_r105526010 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1357,13 +1360,11 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { "PARTITION (a='2', b='6') LOCATION 'paris' PARTITION (a='3', b='7')") assert(catalog.listPartitions(tableIdent).map(_.spec).toSet == Set(part1, part2, part3)) assert(catalog.getPartition(tableIdent, part1).storage.locationUri.isDefined) -val partitionLocation = if (isUsingHiveMetastore) { - val tableLocation = catalog.getTableMetadata(tableIdent).storage.locationUri - assert(tableLocation.isDefined) - makeQualifiedPath(new Path(tableLocation.get.toString, "paris").toString) -} else { - new URI("paris") --- End diff -- For ALTER TABLE ADD PARTITION LOCATION, a relative location will be qualified using the table location as the parent path
[GitHub] spark pull request #17254: [SPARK-19917][SQL]qualified partition path stored...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17254#discussion_r105526000 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1201,7 +1202,9 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { verifyLocation(new URI("/swanky/steak/place")) // set table partition location without explicitly specifying database sql("ALTER TABLE tab1 PARTITION (a='1', b='2') SET LOCATION 'vienna'") -verifyLocation(new URI("vienna"), Some(partSpec)) +val table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("tab1")) --- End diff -- A relative location will be qualified using the table location as the parent path
[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17254 **[Test build #74373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74373/testReport)** for PR 17254 at commit [`c0dc3b7`](https://github.com/apache/spark/commit/c0dc3b72149d2b88384e6ce5208e355d8bd7f52a).
[GitHub] spark issue #13837: [SPARK-16126] [SQL] Better Error Message When using Data...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/13837 hi - where are we on this one?
[GitHub] spark pull request #17254: [SPARK-19917][SQL]qualified partition path stored...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17254

[SPARK-19917][SQL]qualified partition path stored in catalog

## What changes were proposed in this pull request?

The partition path should be qualified before it is stored in the catalog. There are four cases:

1. ALTER TABLE t PARTITION(b=1) SET LOCATION '/path/x' should be qualified to file:/path/x
2. ALTER TABLE t PARTITION(b=1) SET LOCATION 'x' should be qualified to file:/tablelocation/x
3. ALTER TABLE t ADD PARTITION(b=1) LOCATION '/path/x' should be qualified to file:/path/x
4. ALTER TABLE t ADD PARTITION(b=1) LOCATION 'x' should be qualified to file:/tablelocation/x

Currently only ALTER TABLE t ADD PARTITION(b=1) LOCATION for a Hive serde table stores the expected qualified path. The other cases should be made consistent with it. Another change qualifies the location in ALTER TABLE SET LOCATION.

## How was this patch tested?

Modified existing test cases.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/windpiger/spark qualifiedPartitionPath

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17254.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17254

commit 51dcf4092345e173a63e088e779fa44c4fc963b1 Author: windpiger Date: 2017-03-11T04:39:02Z [SPARK-19917][SQL]qualified partition path stored in catalog

commit 996a84d8d1ece4d311240e71a1a9be47828621f8 Author: windpiger Date: 2017-03-11T04:42:32Z remove empty line
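The qualification rule in the cases listed in this pull request (an absolute path keeps its own path, a relative one is resolved under the table location) can be sketched with `java.net.URI` alone. This is only an illustration under stated assumptions: `QualifyLocation.qualify` is a hypothetical helper, it is not the code in the PR, and it ignores details such as escaping of special characters in paths.

```scala
import java.net.URI

// Hypothetical helper, not Spark's implementation: resolve a partition
// location string against the table location.
object QualifyLocation {
  def qualify(tableLocation: URI, location: String): URI = {
    // URI.resolve treats the base as a directory only if its path ends with "/".
    val base =
      if (tableLocation.getPath.endsWith("/")) tableLocation
      else new URI(tableLocation.toString + "/")
    // An absolute path replaces the base path; a relative one is appended to it.
    base.resolve(new URI(location))
  }
}
```

For example, with a table location of `file:/tablelocation`, the input `'x'` resolves under the table directory while `'/path/x'` keeps its own path and only inherits the `file` scheme.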
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17219 Merged build finished. Test FAILed.
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17219 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74365/ Test FAILed.
[GitHub] spark issue #17219: [SPARK-19876][SS][WIP] OneTime Trigger Executor
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17219 **[Test build #74365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74365/testReport)** for PR 17219 at commit [`8c5b84f`](https://github.com/apache/spark/commit/8c5b84f8875f630dc197c018d1d68f440164805b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehouse dir t...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16290 Yes that's the plan.
[GitHub] spark pull request #17253: [SPARK-19916][SQL] simplify bad file handling
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17253#discussion_r105525303 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala --- @@ -44,7 +44,7 @@ case class PartitionedFile( filePath: String, start: Long, length: Long, -locations: Array[String] = Array.empty) { +@transient locations: Array[String] = Array.empty) { --- End diff -- do we need to mark it as `transient`? `filePartitions: Seq[FilePartition]` is already `transient` in `FileScanRDD`.
[GitHub] spark issue #17253: [SPARK-19916][SQL] simplify bad file handling
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17253 LGTM with very minor comment.
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #74372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74372/testReport)** for PR 15435 at commit [`e629030`](https://github.com/apache/spark/commit/e629030c14d83dd330f3ff5cfb79ee3e95f35081).
[GitHub] spark pull request #17253: [SPARK-19916][SQL] simplify bad file handling
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17253#discussion_r105525158 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala --- @@ -130,54 +144,35 @@ class FileScanRDD( // Sets InputFileBlockHolder for the file block's information InputFileBlockHolder.set(currentFile.filePath, currentFile.start, currentFile.length) - try { -if (ignoreCorruptFiles) { - currentIterator = new NextIterator[Object] { -private val internalIter = { - try { -// The readFunction may read files before consuming the iterator. -// E.g., vectorized Parquet reader. -readFunction(currentFile) - } catch { -case e @(_: RuntimeException | _: IOException) => - logWarning(s"Skipped the rest content in the corrupted file: $currentFile", e) - Iterator.empty - } -} - -override def getNext(): AnyRef = { - try { -if (internalIter.hasNext) { - internalIter.next() -} else { - finished = true - null -} - } catch { -case e: IOException => - logWarning(s"Skipped the rest content in the corrupted file: $currentFile", e) - finished = true - null + if (ignoreCorruptFiles) { +currentIterator = new NextIterator[Object] { + // The readFunction may read some bytes before consuming the iterator, e.g., + // vectorized Parquet reader. Here we use lazy val to delay the creation of + // iterator so that we will throw exception in `getNext`. + private lazy val internalIter = readCurrentFile() + + override def getNext(): AnyRef = { +try { + if (internalIter.hasNext) { +internalIter.next() + } else { +finished = true +null } +} catch { + // Throw FileNotFoundException even `ignoreCorruptFiles` is true + case e: java.io.FileNotFoundException => throw e --- End diff -- nit: `FileNotFoundException` will be thrown anyway, do we need this case?
[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17188 **[Test build #74371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74371/testReport)** for PR 17188 at commit [`75e1884`](https://github.com/apache/spark/commit/75e188495a9c9a55a46dfaaf7cecf41f8b60c130).
[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16867 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74359/ Test PASSed.
[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16867 Merged build finished. Test PASSed.
[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16867 **[Test build #74359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74359/testReport)** for PR 16867 at commit [`09719a2`](https://github.com/apache/spark/commit/09719a2b0e065f40d0dfe47c6b4342f0ad3a235c). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MedianHeap(implicit val ord: Ordering[Double])`
[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17188 **[Test build #74370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74370/testReport)** for PR 17188 at commit [`2013679`](https://github.com/apache/spark/commit/2013679ea4f726b2a5b86387753b98da5171113d).
[GitHub] spark issue #17055: [SPARK-19723][SQL]create datasource table with an non-ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17055 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74361/ Test PASSed.
[GitHub] spark issue #17055: [SPARK-19723][SQL]create datasource table with an non-ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17055 Merged build finished. Test PASSed.
[GitHub] spark issue #17055: [SPARK-19723][SQL]create datasource table with an non-ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17055 **[Test build #74361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74361/testReport)** for PR 17055 at commit [`ada275c`](https://github.com/apache/spark/commit/ada275cde7d7ca189342446b4ddefcdd97c45d85). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17188 **[Test build #74369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74369/testReport)** for PR 17188 at commit [`1f31c27`](https://github.com/apache/spark/commit/1f31c2756a7f56e8e2f22a1e4054ea9e8e3d165c).
[GitHub] spark issue #17253: [SPARK-19916][SQL] simplify bad file handling
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17253 **[Test build #74367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74367/testReport)** for PR 17253 at commit [`05febbd`](https://github.com/apache/spark/commit/05febbdc58f566426796dbf814000381b309062f).