[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13881 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61151/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13881 Merged build finished. Test PASSed.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13881

**[Test build #61151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61151/consoleFull)** for PR 13881 at commit [`7fb031e`](https://github.com/apache/spark/commit/7fb031eff488ca657e89220193866af0b39a358a).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf in gener...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13883 Thanks - can you describe the bug in the PR description?
[GitHub] spark issue #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf in gener...
Github user davies commented on the issue: https://github.com/apache/spark/pull/13883 https://gist.github.com/vlad17/964c0a93510d79cb130c33700f6139b7
[GitHub] spark pull request #13884: [SPARK-16181][SQL] outer join with isNull filter ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13884#discussion_r68355286

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -688,6 +688,14 @@ object FoldablePropagation extends Rule[LogicalPlan] {
        case c: Command =>
          stop = true
          c
+       // For outer join, although its output attributes are derived from its children, they are
+       // actually different attributes: the output of outer join is not always picked from its
+       // children, but can also be null.
+       // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes
+       // of outer join.
--- End diff --

Yea, I think we should consider it for 2.1.
[GitHub] spark pull request #13884: [SPARK-16181][SQL] outer join with isNull filter ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13884#discussion_r68355194

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -1541,4 +1541,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     val df = Seq(1, 1, 2).toDF("column.with.dot")
     checkAnswer(df.distinct(), Row(1) :: Row(2) :: Nil)
   }
+
+  test("SPARK-16181: outer join with isNull filter") {
+    val left = Seq("x").toDF("col")
+    val right = Seq("y").toDF("col").withColumn("new", lit(true))
+    val joined = left.join(right, left("col") === right("col"), "left_outer")
+
+    checkAnswer(joined, Row("x", null, null))
+    checkAnswer(joined.filter($"new".isNull), Row("x", null, null))
--- End diff --

Ah, this is subtle. `new` is replaced back to the original `col` from the right side, which is not nullable. Then `isNull` just returns false.
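The subtlety yhuai points out can be modeled without Spark. Below is a toy Python sketch (hypothetical helper, not Catalyst code) of why answering `IS NULL` from the pre-join, non-nullable right-side attribute gives the wrong answer after a left outer join:

```python
# Toy model (plain Python, not Catalyst) of the bug: after a left outer
# join, right-side columns become nullable even if they were
# non-nullable in the right child.

def left_outer_join(left_rows, right_rows):
    """Join on 'col'; unmatched left rows get None for every
    right-side column."""
    joined = []
    for l in left_rows:
        match = next((r for r in right_rows if r["col"] == l["col"]), None)
        joined.append({
            "left_col": l["col"],
            "right_col": match["col"] if match else None,
            "new": match["new"] if match else None,
        })
    return joined

rows = left_outer_join([{"col": "x"}], [{"col": "y", "new": True}])

# Correct semantics: the filter `new IS NULL` keeps the unmatched row.
kept = [r for r in rows if r["new"] is None]

# The mis-optimization amounts to answering `new IS NULL` from the
# pre-join attribute, which was non-nullable, i.e. constant-folding
# the predicate to False and dropping the row.
mis_optimized = [r for r in rows if False]
```

The sketch only captures the nullability flip; Catalyst's actual fix teaches `FoldablePropagation` to stop treating an outer join's output attributes as identical to its children's.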
[GitHub] spark issue #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf in gener...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13883 What is the bug?
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13881 **[Test build #61151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61151/consoleFull)** for PR 13881 at commit [`7fb031e`](https://github.com/apache/spark/commit/7fb031eff488ca657e89220193866af0b39a358a).
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13881

**[Test build #61150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61150/consoleFull)** for PR 13881 at commit [`f5a6893`](https://github.com/apache/spark/commit/f5a6893a1314de5f6a33bd6fb912a77a6cb19fa1).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13881 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61150/ Test FAILed.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13881 Merged build finished. Test FAILed.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13881 **[Test build #61150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61150/consoleFull)** for PR 13881 at commit [`f5a6893`](https://github.com/apache/spark/commit/f5a6893a1314de5f6a33bd6fb912a77a6cb19fa1).
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additional implementation wi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13680 Having 2 implementations is also kind of a branch: the virtual function call needs to be dispatched between these 2 implementations, while a single implementation can be marked as final and doesn't have this overhead.
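The design choice under discussion can be sketched in plain Python (a toy model of the tradeoff, not Spark's actual vector classes): one implementation pays a per-call branch in `isNullAt`, while two specialized implementations avoid the branch at the cost of dispatching between them at the call site.

```python
# Toy model of the tradeoff (hypothetical classes, not Spark's real API).

class BranchingVector:
    """Single implementation: isNullAt carries a branch on every call."""
    def __init__(self, values, nulls=None):
        self.values = values
        self.nulls = nulls  # None means "no nulls tracked"

    def is_null_at(self, i):
        if self.nulls is None:  # the per-call conditional branch
            return False
        return self.nulls[i]


class NoNullsVector:
    """Specialization 1: statically known to contain no nulls."""
    def __init__(self, values):
        self.values = values

    def is_null_at(self, i):
        return False


class NullableVector:
    """Specialization 2: direct null-flag lookup, no extra branch."""
    def __init__(self, values, nulls):
        self.values = values
        self.nulls = nulls

    def is_null_at(self, i):
        return self.nulls[i]
```

On the JVM, a call site that sees both specializations must dispatch between them, which is cloud-fan's point that "having 2 implementations is also kind of a branch"; only a single (ideally `final`) implementation lets the JIT devirtualize the call.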
[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r68353312

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala ---
@@ -85,8 +85,15 @@ private[sql] object FileSourceStrategy extends Strategy with Logging {
         ExpressionSet(normalizedFilters.filter(_.references.subsetOf(partitionSet)))
       logInfo(s"Pruning directories with: ${partitionKeyFilters.mkString(",")}")

-      val dataColumns =
-        l.resolve(fsRelation.dataSchema, fsRelation.sparkSession.sessionState.analyzer.resolver)
+      val dataColumns = l.resolve(fsRelation.dataSchema,
+        fsRelation.sparkSession.sessionState.analyzer.resolver).map { c =>
+        fsRelation.dataSchema.find(_.name == c.name).map { f =>
+          c match {
+            case a: AttributeReference => a.withMetadata(f.metadata)
+            case _ => c
+          }
+        }.getOrElse(c)
+      }
--- End diff --

We use metadata in the merged schema to mark optional fields (those not existing in all partitions), and this metadata is lost after resolving. If we don't add it back, the pushed-down filters will fail with a non-existing field error.
[GitHub] spark issue #13844: [SPARK-16133][ML] model loading backward compatibility f...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13844 LGTM. Merged into master and branch-2.0. Thanks!
[GitHub] spark issue #13884: [SPARK-16181][SQL] outer join with isNull filter may ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13884 **[Test build #61149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61149/consoleFull)** for PR 13884 at commit [`9316d7f`](https://github.com/apache/spark/commit/9316d7f0baec6d59e8a5a88cd872eca3e6720f9d).
[GitHub] spark pull request #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf i...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13883

[SPARK-16179] [PYSPARK] fix bugs for Python udf in generate

## What changes were proposed in this pull request?

This PR fixes a bug that occurs when a Python UDF is used in explode (a generator): GenerateExec requires that all the attributes in its expressions be resolvable from its children at creation time, so we should replace the children first, then replace its expressions.

## How was this patch tested?

Added regression tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark udf_in_generate

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13883.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13883

commit b9fd4bfb93dea18331987b83336b11f4f1f6e388
Author: Davies Liu
Date: 2016-06-24T04:43:35Z

    fix udf in generate
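The ordering constraint in the fix ("replace the children first, then the expressions") can be illustrated with a toy plan-node model (hypothetical names, not GenerateExec's real API): a node's expressions must resolve against whatever its children output at the moment the node is constructed.

```python
# Toy plan-node model (not Spark's GenerateExec) showing why children
# must be replaced before expressions that reference the new child's output.

class Leaf:
    def __init__(self, output):
        self.output = output  # attribute names this node produces

class PlanNode:
    """Expressions may only reference attributes the children produce;
    the invariant is checked at construction time, as in GenerateExec."""
    def __init__(self, expr_refs, children):
        available = {a for c in children for a in c.output}
        unresolved = set(expr_refs) - available
        if unresolved:
            raise ValueError(f"unresolved attributes: {unresolved}")
        self.expr_refs = expr_refs
        self.children = children

old_child = Leaf(["a"])                # before rewriting the Python UDF
new_child = Leaf(["a", "pythonUDF0"])  # after: child also emits the UDF result

# Wrong order: new expressions against the old child fail to resolve,
# because "pythonUDF0" is not yet available.
try:
    PlanNode(["pythonUDF0"], [old_child])
    wrong_order_ok = True
except ValueError:
    wrong_order_ok = False

# Right order (what the fix does): install the new child first.
fixed = PlanNode(["pythonUDF0"], [new_child])
```

The attribute name `pythonUDF0` is illustrative; the point is only that construction-time resolution forces the child swap to happen before the expression swap.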
[GitHub] spark issue #13844: [SPARK-16133][ML] model loading backward compatibility f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13844

**[Test build #61145 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61145/consoleFull)** for PR 13844 at commit [`718023d`](https://github.com/apache/spark/commit/718023d9fa899af580cc45851db7d53c83fe1efa).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #13884: [SPARK-16181][SQL] outer join with isNull filter ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13884#discussion_r68353553

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -688,6 +688,14 @@ object FoldablePropagation extends Rule[LogicalPlan] {
        case c: Command =>
          stop = true
          c
+       // For outer join, although its output attributes are derived from its children, they are
+       // actually different attributes: the output of outer join is not always picked from its
+       // children, but can also be null.
+       // TODO(cloud-fan): It seems more reasonable to use new attributes as the output attributes
+       // of outer join.
--- End diff --

cc @marmbrus @yhuai @rxin
[GitHub] spark pull request #13884: [SPARK-16181][SQL] outer join with isNull filter ...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/13884

[SPARK-16181][SQL] outer join with isNull filter may return wrong result

## What changes were proposed in this pull request?

The root cause is that the output attributes of an outer join are derived from its children, while they are actually different attributes (an outer join can return null). We have already added some special logic to handle it, e.g. `PushPredicateThroughJoin` won't push down predicates through the outer join side, and `FixNullability`. This PR adds one more piece of special logic, in `FoldablePropagation`.

## How was this patch tested?

New test in `DataFrameSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark bug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13884.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13884

commit 9316d7f0baec6d59e8a5a88cd872eca3e6720f9d
Author: Wenchen Fan
Date: 2016-06-24T04:48:58Z

    fix bug
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13701 @yhuai As I mentioned in the description, I am not sure whether we can manipulate row groups as we want, but I have manually tested it to show the number of rows actually scanned.
[GitHub] spark pull request #13844: [SPARK-16133][ML] model loading backward compatib...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13844
[GitHub] spark issue #13883: [SPARK-16179] [PYSPARK] fix bugs for Python udf in gener...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13883 **[Test build #61148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61148/consoleFull)** for PR 13883 at commit [`b9fd4bf`](https://github.com/apache/spark/commit/b9fd4bfb93dea18331987b83336b11f4f1f6e388).
[GitHub] spark issue #13844: [SPARK-16133][ML] model loading backward compatibility f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13844 Merged build finished. Test PASSed.
[GitHub] spark issue #13844: [SPARK-16133][ML] model loading backward compatibility f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13844 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61145/ Test PASSed.
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13701 Thank you for the testing. Can you also test the case where a file contains multiple row groups and we can avoid scanning the unneeded ones? Also, since it is not fixing a critical bug, let's not merge it into branch-2.0.
[GitHub] spark pull request #13877: [SPARK-16142] [R] group naiveBayes method docs in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13877
[GitHub] spark issue #13877: [SPARK-16142] [R] group naiveBayes method docs in a sing...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13877 Merged into master and branch-2.0. Thanks for reviewing!
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13881 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61144/ Test PASSed.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13881 Merged build finished. Test PASSed.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13881

**[Test build #61144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61144/consoleFull)** for PR 13881 at commit [`bd7d24d`](https://github.com/apache/spark/commit/bd7d24d4f5a79eca6ff9629706c254beba74bc45).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r68352913

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala ---
@@ -85,8 +85,15 @@ private[sql] object FileSourceStrategy extends Strategy with Logging {
         ExpressionSet(normalizedFilters.filter(_.references.subsetOf(partitionSet)))
       logInfo(s"Pruning directories with: ${partitionKeyFilters.mkString(",")}")

-      val dataColumns =
-        l.resolve(fsRelation.dataSchema, fsRelation.sparkSession.sessionState.analyzer.resolver)
+      val dataColumns = l.resolve(fsRelation.dataSchema,
+        fsRelation.sparkSession.sessionState.analyzer.resolver).map { c =>
+        fsRelation.dataSchema.find(_.name == c.name).map { f =>
+          c match {
+            case a: AttributeReference => a.withMetadata(f.metadata)
+            case _ => c
+          }
+        }.getOrElse(c)
+      }
--- End diff --

I guess a better question is whether it is part of the bug fix?
[GitHub] spark pull request #13701: [SPARK-15639][SQL] Try to push down filter at Row...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13701#discussion_r68352884

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala ---
@@ -85,8 +85,15 @@ private[sql] object FileSourceStrategy extends Strategy with Logging {
         ExpressionSet(normalizedFilters.filter(_.references.subsetOf(partitionSet)))
       logInfo(s"Pruning directories with: ${partitionKeyFilters.mkString(",")}")

-      val dataColumns =
-        l.resolve(fsRelation.dataSchema, fsRelation.sparkSession.sessionState.analyzer.resolver)
+      val dataColumns = l.resolve(fsRelation.dataSchema,
+        fsRelation.sparkSession.sessionState.analyzer.resolver).map { c =>
+        fsRelation.dataSchema.find(_.name == c.name).map { f =>
+          c match {
+            case a: AttributeReference => a.withMetadata(f.metadata)
+            case _ => c
+          }
+        }.getOrElse(c)
+      }
--- End diff --

Do we need this?
[GitHub] spark issue #13865: [SPARK-13709][SQL] Initialize deserializer with both tab...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/13865 lgtm
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13701 **[Test build #61147 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61147/consoleFull)** for PR 13701 at commit [`36fd059`](https://github.com/apache/spark/commit/36fd0596302a4ef7e411c2fe45a279082adaf69a).
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additonal implementation wi...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13680

@cloud-fan, for the first issue, we are on the same page. Your proposal is what I am thinking about as a possible solution; I will do that. For the second issue, it seems to be a design choice between:
1. introduce one conditional branch in `isNullAt()` within a single implementation
2. have two implementations, each without a conditional branch in `isNullAt()`
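To make that trade-off concrete, here is a hypothetical sketch of the two options; none of these class or trait names come from Spark.

```scala
// Option 1 vs. Option 2 for a columnar row's null check (illustrative only).
trait NullCheck { def isNullAt(ordinal: Int): Boolean }

// Option 1: one implementation, paying one conditional branch per call.
final class BranchingRow(nullBits: Array[Boolean], hasNulls: Boolean)
    extends NullCheck {
  def isNullAt(ordinal: Int): Boolean =
    if (hasNulls) nullBits(ordinal) else false
}

// Option 2: two implementations, each branch-free in isNullAt();
// the right one is chosen once, up front.
final class NullableRow(nullBits: Array[Boolean]) extends NullCheck {
  def isNullAt(ordinal: Int): Boolean = nullBits(ordinal)
}

final class NonNullableRow extends NullCheck {
  def isNullAt(ordinal: Int): Boolean = false
}
```

Option 2 trades extra code for a branch-free hot path; whether it wins in practice depends on whether call sites stay monomorphic enough for the JIT to devirtualize them.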
[GitHub] spark issue #13865: [SPARK-13709][SQL] Initialize deserializer with both tab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13865 **[Test build #61146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61146/consoleFull)** for PR 13865 at commit [`85e0eed`](https://github.com/apache/spark/commit/85e0eedd1d610d5c2cf486a43cda3401df856c33).
[GitHub] spark issue #13865: [SPARK-13709][SQL] Initialize deserializer with both tab...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13865 LGTM
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13701 retest this please.
[GitHub] spark issue #13877: [SPARK-16142] [R] group naiveBayes method docs in a sing...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/13877 The new document in the screenshot looks pretty good to me.
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68352383

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
@@ -65,4 +68,77 @@ class QueryPartitionSuite extends QueryTest with SQLTestUtils with TestHiveSingl
     sql("DROP TABLE IF EXISTS createAndInsertTest")
   }
 }
+
+  test("SPARK-13709: reading partitioned Avro table with nested schema") {
+    withTempDir { dir =>
+      val path = dir.getCanonicalPath
+      val tableName = "spark_13709"
+      val tempTableName = "spark_13709_temp"
+
+      new File(path, tableName).mkdir()
+      new File(path, tempTableName).mkdir()
+
+      val avroSchema =
+        """{
+          |  "name": "test_record",
+          |  "type": "record",
+          |  "fields": [ {
+          |    "name": "f0",
+          |    "type": "int"
+          |  }, {
+          |    "name": "f1",
+          |    "type": {
+          |      "type": "record",
+          |      "name": "inner",
+          |      "fields": [ {
+          |        "name": "f10",
+          |        "type": "int"
+          |      }, {
+          |        "name": "f11",
+          |        "type": "double"
+          |      } ]
+          |    }
+          |  } ]
+          |}
+        """.stripMargin
+
+      withTable(tableName, tempTableName) {
+        // Creates the external partitioned Avro table to be tested.
+        sql(
+          s"""CREATE EXTERNAL TABLE $tableName
+             |PARTITIONED BY (ds STRING)
+             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+             |STORED AS
+             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+             |LOCATION '$path/$tableName'
+             |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+           """.stripMargin
+        )
+
+        // Creates an temporary Avro table used to prepare testing Avro file.
+        sql(
+          s"""CREATE EXTERNAL TABLE $tempTableName
+             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+             |STORED AS
+             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+             |LOCATION '$path/$tempTableName'
+             |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+           """.stripMargin
+        )
+
+        // Generates Avro data.
+        sql(s"INSERT OVERWRITE TABLE $tempTableName SELECT 1, STRUCT(2, 2.5)")
+
+        // Adds generated Avro data as a new partition to the testing table.
+        sql(s"ALTER TABLE $tableName ADD PARTITION (ds = 'foo') LOCATION '$path/$tempTableName'")
+
+        checkAnswer(
+          sql(s"SELECT * FROM $tableName"),
--- End diff --

it's inside `withTable`, tables will be dropped automatically.
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68352349

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
(same diff hunk as quoted above)
--- End diff --

Yea, sure.
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68352336

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
(same diff hunk as quoted above)
--- End diff --

oh, nvm. We have withTable.
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68352322

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
(same diff hunk as quoted above)
--- End diff --

drop the table at the end of this test?
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68352282

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
(same diff hunk as quoted above)
--- End diff --

yea, it is a good idea to add comments to explain why this one failed.
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68352258

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---
(same diff hunk as quoted above)
--- End diff --

Yea, when reading data from a partition, the Avro deserializer needs to know the Avro schema defined in the table properties (`avro.schema.literal`). However, originally we only initialized the deserializer using the partition properties, which don't contain `avro.schema.literal`. This PR fixes it by merging the two sets of properties.
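The merge described in that comment can be sketched in isolation. `mergeProperties` is an illustrative name (the actual fix lives in Spark's Hive table-reading path), and the override order — partition-level values winning on conflict — is an assumption for this sketch:

```scala
// Hedged sketch of merging table-level and partition-level SerDe
// properties so that table-only keys such as 'avro.schema.literal'
// survive when a partition's own properties omit them.
def mergeProperties(
    tableProps: Map[String, String],
    partitionProps: Map[String, String]): Map[String, String] =
  // Start from the table properties; partition entries override on
  // key conflicts (assumed, not verified against Spark's code).
  tableProps ++ partitionProps
```

With this shape, a partition that carries no `avro.schema.literal` still sees the table's schema literal after the merge, which is the failure mode the PR addresses.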
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13701 Merged build finished. Test FAILed.
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13701 **[Test build #61143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61143/consoleFull)** for PR 13701 at commit [`36fd059`](https://github.com/apache/spark/commit/36fd0596302a4ef7e411c2fe45a279082adaf69a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13701 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61143/ Test FAILed.
[GitHub] spark pull request #13877: [SPARK-16142] [R] group naiveBayes method docs in...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/13877#discussion_r68352034

--- Diff: R/pkg/R/mllib.R ---
@@ -390,23 +376,41 @@ setMethod("predict", signature(object = "KMeansModel"),
     return(dataFrame(callJMethod(object@jobj, "transform", newData@sdf)))
   })

-#' Fit a Bernoulli naive Bayes model
+#' Naive Bayes Models
 #'
-#' Fit a Bernoulli naive Bayes model on a Spark DataFrame (only categorical data is supported).
+#' \code{spark.naiveBayes} fits a Bernoulli naive Bayes model against a SparkDataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#' Only categorical data is supported.
 #'
-#' @param data SparkDataFrame for training
+#' @param data A \code{SparkDataFrame} of observations and labels for model fitting
 #' @param formula A symbolic description of the model to be fitted. Currently only a few formula
 #'                operators are supported, including '~', '.', ':', '+', and '-'.
 #' @param smoothing Smoothing parameter
-#' @return a fitted naive Bayes model
+#' @return \code{spark.naiveBayes} returns a fitted naive Bayes model
 #' @rdname spark.naiveBayes
+#' @name spark.naiveBayes
 #' @seealso e1071: \url{https://cran.r-project.org/web/packages/e1071/}
--- End diff --

We could use the `\link` tag as discussed in http://stackoverflow.com/questions/25489042/linking-to-other-packages-in-documentation-in-roxygen2-in-r
[GitHub] spark issue #13844: [SPARK-16133][ML] model loading backward compatibility f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13844 **[Test build #61145 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61145/consoleFull)** for PR 13844 at commit [`718023d`](https://github.com/apache/spark/commit/718023d9fa899af580cc45851db7d53c83fe1efa).
[GitHub] spark issue #13844: [SPARK-16133][ML] model loading backward compatibility f...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/13844 @mengxr Thanks for your review. Sent update for the style issue.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13881 **[Test build #61144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61144/consoleFull)** for PR 13881 at commit [`bd7d24d`](https://github.com/apache/spark/commit/bd7d24d4f5a79eca6ff9629706c254beba74bc45).
[GitHub] spark pull request #13844: [SPARK-16133][ML] model loading backward compatib...
Github user hhbyyh commented on a diff in the pull request: https://github.com/apache/spark/pull/13844#discussion_r68350637

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -232,7 +233,9 @@ object MinMaxScalerModel extends MLReadable[MinMaxScalerModel] {
     override def load(path: String): MinMaxScalerModel = {
       val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
       val dataPath = new Path(path, "data").toString
-      val Row(originalMin: Vector, originalMax: Vector) = sparkSession.read.parquet(dataPath)
+      val data = sparkSession.read.parquet(dataPath)
+      val Row(originalMin: Vector, originalMax: Vector) = MLUtils.convertVectorColumnsToML(
+        data, "originalMin", "originalMax")
--- End diff --

Sorry to miss it. Will update right now.
[GitHub] spark issue #13881: [SPARK-3723] [MLlib] Adding instrumentation to random fo...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13881 ok to test
[GitHub] spark issue #13720: [SPARK-16004] [SQL] improve the display of CatalogTable ...
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/13720 ok, i will work on it based on comments. Thanks.
[GitHub] spark pull request #13879: [SPARK-16177] [ML] model loading backward compati...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13879
[GitHub] spark pull request #13844: [SPARK-16133][ML] model loading backward compatib...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/13844#discussion_r68350339

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---

@@ -232,7 +233,9 @@ object MinMaxScalerModel extends MLReadable[MinMaxScalerModel] {
   override def load(path: String): MinMaxScalerModel = {
     val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
     val dataPath = new Path(path, "data").toString
-    val Row(originalMin: Vector, originalMax: Vector) = sparkSession.read.parquet(dataPath)
+    val data = sparkSession.read.parquet(dataPath)
+    val Row(originalMin: Vector, originalMax: Vector) = MLUtils.convertVectorColumnsToML(
+      data, "originalMin", "originalMax")

--- End diff --

@hhbyyh Could you fix this style issue?
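The style issue flagged above is the continuation line of the wrapped call starting at column zero. A minimal, self-contained sketch of the requested layout (plain Scala, no Spark dependency; `Frame` and this local `convertVectorColumnsToML` are hypothetical stand-ins with the same shape as the DataFrame and MLUtils helper in the patch):

```scala
object IndentDemo {
  // Hypothetical stand-ins for the DataFrame and MLUtils helper in the patch.
  case class Frame(cols: Seq[String])
  def convertVectorColumnsToML(data: Frame, cols: String*): Frame =
    Frame(data.cols ++ cols)

  def main(args: Array[String]): Unit = {
    val data = Frame(Nil)
    // Wrapped call: the continuation arguments are indented relative to the
    // call site, instead of starting at column zero as in the original diff.
    val converted = convertVectorColumnsToML(
      data, "originalMin", "originalMax")
    assert(converted.cols == Seq("originalMin", "originalMax"))
  }
}
```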
[GitHub] spark issue #13879: [SPARK-16177] [ML] model loading backward compatibility ...
Github user mengxr commented on the issue: https://github.com/apache/spark/pull/13879 LGTM. Merged into master and branch-2.0. Thanks!
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13701 **[Test build #61143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61143/consoleFull)** for PR 13701 at commit [`36fd059`](https://github.com/apache/spark/commit/36fd0596302a4ef7e411c2fe45a279082adaf69a).
[GitHub] spark pull request #13874: [SQL][minor] ParserUtils.operationNotAllowed shou...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13874
[GitHub] spark issue #13874: [SQL][minor] ParserUtils.operationNotAllowed should thro...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13874 LGTM - thanks! Merging to master/2.0.
[GitHub] spark issue #13701: [SPARK-15639][SQL] Try to push down filter at RowGroups ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13701 ping @liancheng @yhuai again...
[GitHub] spark issue #13874: [SQL][minor] ParserUtils.operationNotAllowed should thro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61141/
[GitHub] spark issue #13874: [SQL][minor] ParserUtils.operationNotAllowed should thro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13874 Merged build finished. Test PASSed.
[GitHub] spark issue #13874: [SQL][minor] ParserUtils.operationNotAllowed should thro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13874

**[Test build #61141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61141/consoleFull)** for PR 13874 at commit [`ec0506f`](https://github.com/apache/spark/commit/ec0506f5a27c9581857d49cf296e4b0bde76297d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13699: [SPARK-15958] Make initial buffer size for the Sorter co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13699 Merged build finished. Test PASSed.
[GitHub] spark issue #13699: [SPARK-15958] Make initial buffer size for the Sorter co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13699 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61140/
[GitHub] spark issue #13699: [SPARK-15958] Make initial buffer size for the Sorter co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13699

**[Test build #61140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61140/consoleFull)** for PR 13699 at commit [`cf464a3`](https://github.com/apache/spark/commit/cf464a3eae5d2fa86f4946c302b15df9d9ee1a21).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13130: [SPARK-15340][SQL]Limit the size of the map used to cach...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13130 Merged build finished. Test FAILed.
[GitHub] spark issue #13130: [SPARK-15340][SQL]Limit the size of the map used to cach...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13130 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61142/
[GitHub] spark issue #13130: [SPARK-15340][SQL]Limit the size of the map used to cach...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13130

**[Test build #61142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61142/consoleFull)** for PR 13130 at commit [`82d78a3`](https://github.com/apache/spark/commit/82d78a36161167c76aebf313ce9541ce51989948).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13778 ping @vlad17 @davies @liancheng Anything else?
[GitHub] spark issue #13130: [SPARK-15340][SQL]Limit the size of the map used to cach...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13130 **[Test build #61142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61142/consoleFull)** for PR 13130 at commit [`82d78a3`](https://github.com/apache/spark/commit/82d78a36161167c76aebf313ce9541ce51989948).
[GitHub] spark issue #13130: [SPARK-15340][SQL]Limit the size of the map used to cach...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13130 retest this please
[GitHub] spark issue #13786: [SPARK-15294][R] Add `pivot` to SparkR
Github user Div333 commented on the issue: https://github.com/apache/spark/pull/13786 Hello everyone, thanks a lot for implementing the pivot functionality. I have started using SparkR recently and would like to know whether the pivot method is included in the library, as I can't find it in the documentation. I am also looking to use an unpivot functionality; it would be great if that were included.
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user JustinPihony commented on the issue: https://github.com/apache/spark/pull/12601 Bump @HyukjinKwon. I have replies to your comments. Could you please review them so that I can push my changes?
[GitHub] spark issue #13882: Branch 1.6
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/13882 Please close this PR.
[GitHub] spark pull request #13720: [SPARK-16004] [SQL] improve the display of Catalo...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13720#discussion_r68343766

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

@@ -522,7 +523,7 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
   private def describeSchema(schema: Seq[CatalogColumn], buffer: ArrayBuffer[Row]): Unit = {
     schema.foreach { column =>
-      append(buffer, column.name, column.dataType.toLowerCase, column.comment.orNull)
+      append(buffer, column.name, column.dataType.toLowerCase, column.comment.getOrElse(""))

--- End diff --

Yea. If it is null, let's keep it as null. Changing a null to an empty string actually destroys the information.
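The information loss yhuai describes follows directly from `Option` semantics and can be demonstrated in plain Scala (a minimal sketch; `Column` here is a hypothetical stand-in, not Spark's `CatalogColumn`):

```scala
// Hypothetical stand-in for a catalog column with an optional comment.
case class Column(name: String, comment: Option[String])

object NullVsEmptyDemo {
  def asOrNull(c: Column): String = c.comment.orNull         // keeps "no comment" as null
  def asOrEmpty(c: Column): String = c.comment.getOrElse("") // maps "no comment" to ""

  def main(args: Array[String]): Unit = {
    val noComment    = Column("id", None)
    val emptyComment = Column("name", Some(""))
    // orNull preserves the distinction between "no comment" and "empty comment"...
    assert(asOrNull(noComment) == null)
    assert(asOrNull(emptyComment) == "")
    // ...while getOrElse("") collapses both cases to the same value,
    // so a later consumer can no longer tell them apart.
    assert(asOrEmpty(noComment) == asOrEmpty(emptyComment))
  }
}
```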
[GitHub] spark issue #13882: Branch 1.6
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13882 Can one of the admins verify this patch?
[GitHub] spark pull request #13865: [SPARK-13709][SQL] Initialize deserializer with b...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13865#discussion_r68343609

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala ---

@@ -65,4 +68,77 @@ class QueryPartitionSuite extends QueryTest with SQLTestUtils with TestHiveSingl
     sql("DROP TABLE IF EXISTS createAndInsertTest")
   }
 }
+
+  test("SPARK-13709: reading partitioned Avro table with nested schema") {
+    withTempDir { dir =>
+      val path = dir.getCanonicalPath
+      val tableName = "spark_13709"
+      val tempTableName = "spark_13709_temp"
+
+      new File(path, tableName).mkdir()
+      new File(path, tempTableName).mkdir()
+
+      val avroSchema =
+        """{
+          |  "name": "test_record",
+          |  "type": "record",
+          |  "fields": [ {
+          |    "name": "f0",
+          |    "type": "int"
+          |  }, {
+          |    "name": "f1",
+          |    "type": {
+          |      "type": "record",
+          |      "name": "inner",
+          |      "fields": [ {
+          |        "name": "f10",
+          |        "type": "int"
+          |      }, {
+          |        "name": "f11",
+          |        "type": "double"
+          |      } ]
+          |    }
+          |  } ]
+          |}
+        """.stripMargin
+
+      withTable(tableName, tempTableName) {
+        // Creates the external partitioned Avro table to be tested.
+        sql(
+          s"""CREATE EXTERNAL TABLE $tableName
+             |PARTITIONED BY (ds STRING)
+             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+             |STORED AS
+             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+             |LOCATION '$path/$tableName'
+             |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+           """.stripMargin)
+
+        // Creates a temporary Avro table used to prepare the testing Avro file.
+        sql(
+          s"""CREATE EXTERNAL TABLE $tempTableName
+             |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+             |STORED AS
+             |  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+             |  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+             |LOCATION '$path/$tempTableName'
+             |TBLPROPERTIES ('avro.schema.literal' = '$avroSchema')
+           """.stripMargin)
+
+        // Generates Avro data.
+        sql(s"INSERT OVERWRITE TABLE $tempTableName SELECT 1, STRUCT(2, 2.5)")
+
+        // Adds the generated Avro data as a new partition of the table under test.
+        sql(s"ALTER TABLE $tableName ADD PARTITION (ds = 'foo') LOCATION '$path/$tempTableName'")
+
+        checkAnswer(
+          sql(s"SELECT * FROM $tableName"),

--- End diff --

Can you explain a bit more how this query fails without your patch?
[GitHub] spark pull request #13882: Branch 1.6
GitHub user liu549676915 opened a pull request: https://github.com/apache/spark/pull/13882 Branch 1.6

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-1.6

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13882.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13882

commit 0afad6678431846a6eebda8d5891da9115884915
Author: RJ Nowling
Date: 2016-01-05T23:05:04Z
[SPARK-12450][MLLIB] Un-persist broadcasted variables in KMeans. Author: RJ Nowling. Closes #10415 from rnowling/spark-12450. (cherry picked from commit 78015a8b7cc316343e302eeed6fe30af9f2961e8) Signed-off-by: Joseph K. Bradley

commit bf3dca2df4dd3be264691be1321e0c700d4f4e32
Author: BrianLondon
Date: 2016-01-05T23:15:07Z
[SPARK-12453][STREAMING] Remove explicit dependency on aws-java-sdk. Successfully ran the Kinesis demo on a live, AWS-hosted Kinesis stream against the master and 1.6 branches. For reasons I don't entirely understand it required a manual merge to 1.5, which I did as shown here: https://github.com/BrianLondon/spark/commit/075c22e89bc99d5e99be21f40e0d72154a1e23a2 The demo ran successfully on the 1.5 branch as well. According to `mvn dependency:tree` it is still pulling a fairly old version of the aws-java-sdk (1.9.37), but this appears to have fixed the Kinesis regression in 1.5.2. Author: BrianLondon. Closes #10492 from BrianLondon/remove-only. (cherry picked from commit ff89975543b153d0d235c0cac615d45b34aa8fe7) Signed-off-by: Sean Owen

commit c3135d02176cdd679b4a0e4883895b9e9f001a55
Author: Yanbo Liang
Date: 2016-01-06T06:35:41Z
[SPARK-12393][SPARKR] Add read.text and write.text for SparkR. Add ```read.text``` and ```write.text``` for SparkR. cc sun-rui felixcheung shivaram Author: Yanbo Liang. Closes #10348 from yanboliang/spark-12393. (cherry picked from commit d1fea41363c175a67b97cb7b3fe89f9043708739) Signed-off-by: Shivaram Venkataraman

commit 175681914af953b7ce1b2971fef83a2445de1f94
Author: zero323
Date: 2016-01-06T19:58:33Z
[SPARK-12006][ML][PYTHON] Fix GMM failure if initialModel is not None. If the initial model passed to GMM is not empty it causes `net.razorvine.pickle.PickleException`. It can be fixed by converting `initialModel.weights` to `list`. Author: zero323. Closes #9986 from zero323/SPARK-12006. (cherry picked from commit fcd013cf70e7890aa25a8fe3cb6c8b36bf0e1f04) Signed-off-by: Joseph K. Bradley

commit d821fae0ecca6393d3632977797d72ba594d26a9
Author: Shixiong Zhu
Date: 2016-01-06T20:03:01Z
[SPARK-12617][PYSPARK] Move Py4jCallbackConnectionCleaner to Streaming, because the callback server starts only in StreamingContext. Author: Shixiong Zhu. Closes #10621 from zsxwing/SPARK-12617-2. (cherry picked from commit 1e6648d62fb82b708ea54c51cd23bfe4f542856e) Signed-off-by: Shixiong Zhu

commit 8f0ead3e79beb2c5f2731ceaa34fe1c133763386
Author: huangzhaowei
Date: 2016-01-06T20:48:57Z
[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of the default root path to gain the streaming batch url. Author: huangzhaowei. Closes #10617 from SaintBacchus/SPARK-12672.

commit 39b0a348008b6ab532768b90fd578b77711af98c
Author: Shixiong Zhu
Date: 2016-01-06T21:53:25Z
Revert "[SPARK-12672][STREAMING][UI] Use the uiRoot function instead of the default root path to gain the streaming batch url." This reverts commit 8f0ead3e79beb2c5f2731ceaa34fe1c133763386. Will merge #10618 instead.

commit 11b901b22b1cdaa6d19b1b73885627ac601be275
Author: Liang-Chi Hsieh
Date: 2015-12-14T17:59:42Z
[SPARK-12016] [MLLIB] [PYSPARK] Wrap Word2VecModel when loading it in pyspark JIRA:
[GitHub] spark issue #13720: [SPARK-16004] [SQL] improve the display of CatalogTable ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13720 For the test: currently we only have one `desc table` test, in `HiveDDLSuite`; it would be good to have a dedicated test suite for it.
[GitHub] spark pull request #13720: [SPARK-16004] [SQL] improve the display of Catalo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13720#discussion_r68342801

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

@@ -522,7 +523,7 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
   private def describeSchema(schema: Seq[CatalogColumn], buffer: ArrayBuffer[Row]): Unit = {
     schema.foreach { column =>
-      append(buffer, column.name, column.dataType.toLowerCase, column.comment.orNull)
+      append(buffer, column.name, column.dataType.toLowerCase, column.comment.getOrElse(""))

--- End diff --

This is a behavior change. The result is not only used for display, but also used as a table that can be queried later. I'm not sure it's worth it. cc @yhuai
[GitHub] spark issue #13874: [SQL][minor] ParserUtils.operationNotAllowed should thro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13874 **[Test build #61141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61141/consoleFull)** for PR 13874 at commit [`ec0506f`](https://github.com/apache/spark/commit/ec0506f5a27c9581857d49cf296e4b0bde76297d).
[GitHub] spark issue #13880: SPARK-16178: Remove unnecessary Hive partition check.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13880 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61139/
[GitHub] spark issue #13880: SPARK-16178: Remove unnecessary Hive partition check.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13880 Merged build finished. Test PASSed.
[GitHub] spark issue #13880: SPARK-16178: Remove unnecessary Hive partition check.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13880

**[Test build #61139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61139/consoleFull)** for PR 13880 at commit [`919f520`](https://github.com/apache/spark/commit/919f52001f78f9b1de8a0088a3de312dd6447fae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additonal implementation wi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13680 @kiszk we should definitely put zero into the corresponding field when setting null. It will be a little harder than in `UnsafeRow`, as we need `setNullBoolean`, `setNullInt`, etc., but it's still doable. As for clearing out all null bits: yes, it's a big overhead for arrays with small elements, such as boolean arrays, but I'm not sure this is worth two different implementations. cc @rxin
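The "zero the slot when setting null" idea discussed above can be sketched in self-contained Scala (a hypothetical illustration of the technique, not Spark's actual `UnsafeArrayData` layout or API):

```scala
// A fixed-size int array with a null bitmask. Setting an element to null both
// flips its null bit and writes zero into the value slot, so stale bytes never
// leak into later equality or hashing over the raw values.
class NullableIntArray(numElements: Int) {
  private val values = new Array[Int](numElements)
  private val nullBits = new Array[Long]((numElements + 63) / 64)

  def setInt(i: Int, v: Int): Unit = {
    nullBits(i >> 6) &= ~(1L << (i & 63)) // clear the null bit
    values(i) = v
  }

  // A type-specific null setter, in the spirit of the setNullInt mentioned above.
  def setNullInt(i: Int): Unit = {
    nullBits(i >> 6) |= 1L << (i & 63) // mark the element null
    values(i) = 0                      // and zero out its value slot
  }

  def isNullAt(i: Int): Boolean = (nullBits(i >> 6) & (1L << (i & 63))) != 0
  def getInt(i: Int): Int = values(i)
}
```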
[GitHub] spark pull request #13832: [SPARK-16123] Avoid NegativeArraySizeException wh...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13832
[GitHub] spark issue #13832: [SPARK-16123] Avoid NegativeArraySizeException while res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13832 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61138/
[GitHub] spark issue #13832: [SPARK-16123] Avoid NegativeArraySizeException while res...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13832 Merged build finished. Test PASSed.
[GitHub] spark issue #13832: [SPARK-16123] Avoid NegativeArraySizeException while res...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13832 Merging to master/2.0. Thanks!
[GitHub] spark issue #13832: [SPARK-16123] Avoid NegativeArraySizeException while res...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13832 **[Test build #61138 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61138/consoleFull)** for PR 13832 at commit [`e2a1e1e`](https://github.com/apache/spark/commit/e2a1e1e757ada0f51bac8cf8b8a77b20d2d26c8e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve OptimizeIn optimizer to remov...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13876 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61137/
[GitHub] spark issue #13860: [SPARK-16157] [SQL] Add New Methods for comments in Stru...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/13860 @hvanhovell Sure, will do it! It sounds like you also like the suggestion by @cloud-fan . Let me do it too. Thanks!
[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve OptimizeIn optimizer to remov...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13876 Merged build finished. Test PASSed.
[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve OptimizeIn optimizer to remov...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13876 **[Test build #61137 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61137/consoleFull)** for PR 13876 at commit [`5a9f4ec`](https://github.com/apache/spark/commit/5a9f4ecdb349453a42ad2b06293183c55c0b1c44). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13837: [SPARK-16126] [SQL] Better Error Message When usi...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/13837#discussion_r68341391
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala ---
@@ -40,7 +40,7 @@ private[sql] class ParquetOptions(
     if (!shortParquetCompressionCodecNames.contains(codecName)) {
       val availableCodecs = shortParquetCompressionCodecNames.keys.map(_.toLowerCase)
       throw new IllegalArgumentException(s"Codec [$codecName] " +
-        s"is not available. Available codecs are ${availableCodecs.mkString(", ")}.")
+        s"is not available. Known codecs are ${availableCodecs.mkString(", ")}.")
--- End diff --
Just to make it consistent with the output of the other cases. See the code: https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CompressionCodecs.scala#L49-L51