[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16938 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16938 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73405/ Test FAILed.
[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16938 **[Test build #73405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73405/testReport)** for PR 16938 at commit [`1f2ce17`](https://github.com/apache/spark/commit/1f2ce17e3d2eca92bc01b6a22e908bd8fd1d9592). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17035: [SPARK-19705][SQL] Preferred location supporting HDFS ca...
Github user highfei2011 commented on the issue: https://github.com/apache/spark/pull/17035 The preferred-location calculation is more complex; which part of the code reflects that?
[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...
Github user watermen commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r102892814

--- Diff: core/src/main/scala/org/apache/spark/MapOutputStatistics.scala ---

@@ -23,5 +23,9 @@ package org.apache.spark
 * @param shuffleId ID of the shuffle
 * @param bytesByPartitionId approximate number of output bytes for each map output partition
 * (may be inexact due to use of compressed map statuses)
+ * @param numberOfOutput number of output for each pre-map output partition
 */
-private[spark] class MapOutputStatistics(val shuffleId: Int, val bytesByPartitionId: Array[Long])
+private[spark] class MapOutputStatistics(
+    val shuffleId: Int,
+    val bytesByPartitionId: Array[Long],
+    val numberOfOutput: Array[Int])

--- End diff --

Here, maybe `Long` is better.
[GitHub] spark issue #17000: [SPARK-18946][ML] sliceAggregate which is a new aggregat...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17000 cc @yanboliang - it seems actually similar in effect to the VL-BFGS work with RDD-based coefficients?
[GitHub] spark issue #17056: [SPARK-17495] [SQL] Support Decimal type in Hive-hash
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17056 **[Test build #73410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73410/testReport)** for PR 17056 at commit [`a378b3e`](https://github.com/apache/spark/commit/a378b3ef08cead4c915096f11de5bd371a405fef).
[GitHub] spark issue #17056: [SPARK-17495] [SQL] Support Decimal type in Hive-hash
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/17056 ok to test
[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r102891969

--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---

@@ -39,16 +40,18 @@ private[spark] sealed trait MapStatus {
 * necessary for correctness, since block fetchers are allowed to skip zero-size blocks.
 */
 def getSizeForBlock(reduceId: Int): Long
+
+ def numberOfOutput: Int

--- End diff --

Could the number of outputs be greater than 2G?
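The overflow concern behind the "greater than 2G" question and the "maybe Long is better" suggestion can be sketched in a few lines (hypothetical demo code, not Spark's actual implementation): a 32-bit `int` counter of output records silently wraps once it passes 2^31 - 1, while a `long` does not.

```java
// Hypothetical sketch (not Spark's actual code): why a Long is safer than
// an Int when counting shuffle output records that may exceed ~2G.
public class OverflowDemo {
    public static void main(String[] args) {
        int intCount = Integer.MAX_VALUE;               // ~2G records counted so far
        intCount += 1;                                  // silently wraps negative
        long longCount = (long) Integer.MAX_VALUE + 1;  // no wrap-around
        System.out.println(intCount);                   // -2147483648
        System.out.println(longCount);                  // 2147483648
    }
}
```

This is exactly the failure mode an `Array[Long]` for `numberOfOutput` would avoid.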
[GitHub] spark pull request #17056: [SPARK-17495] [SQL] Support Decimal type in Hive-...
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/17056

[SPARK-17495] [SQL] Support Decimal type in Hive-hash

## What changes were proposed in this pull request?

This makes Hive hash support the Decimal datatype. [Hive internally normalises decimals](https://github.com/apache/hive/blob/4ba713ccd85c3706d195aeef9476e6e6363f1c21/storage-api/src/java/org/apache/hadoop/hive/common/type/HiveDecimalV1.java#L307) and I have ported that logic as-is to HiveHash.

Generated code (in case any reviewer wants to examine):

```
/* 031 */ protected void processNext() throws java.io.IOException {
/* 032 */   while (inputadapter_input.hasNext() && !stopEarly()) {
/* 033 */     InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 034 */     project_value = 0;
/* 035 */
/* 036 */     boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
/* 037 */     Decimal inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getDecimal(0, 38, 0));
/* 038 */     if (!inputadapter_isNull) {
/* 039 */       project_childHash = org.apache.spark.sql.catalyst.expressions.HiveHashFunction.normalizeDecimal(
/* 040 */         inputadapter_value.toJavaBigDecimal(), true).hashCode();
/* 041 */     }
/* 042 */     project_value = (31 * project_value) + project_childHash;
/* 043 */     project_childHash = 0;
/* 044 */     project_rowWriter.write(0, project_value);
/* 045 */     append(project_result);
/* 046 */     if (shouldStop()) return;
/* 047 */   }
/* 048 */ }
```

## How was this patch tested?

Added unit tests.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/tejasapatil/spark SPARK-17495_decimal

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17056.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17056

commit a378b3ef08cead4c915096f11de5bd371a405fef
Author: Tejas Patil
Date: 2017-02-24T07:35:16Z

[SPARK-17495] [SQL] Support Decimal type
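The core idea behind normalizing decimals before hashing can be illustrated with plain `java.math.BigDecimal` (an illustrative sketch only; Hive's actual `HiveDecimalV1` normalization that this PR ports is more involved): numerically equal values such as 1.0 and 1.00 carry different scales, so their raw hash codes differ unless trailing zeros are stripped first.

```java
import java.math.BigDecimal;

// Illustrative sketch of why decimals need normalizing before hashing.
// `normalizedHash` is a hypothetical helper, not Hive's or Spark's API.
public class DecimalHashSketch {
    static int normalizedHash(BigDecimal d) {
        // Strip trailing zeros so numerically equal values hash identically.
        return d.stripTrailingZeros().hashCode();
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.0");
        BigDecimal b = new BigDecimal("1.00");
        System.out.println(a.compareTo(b) == 0);                     // true: numerically equal
        System.out.println(a.hashCode() == b.hashCode());            // false: scales differ
        System.out.println(normalizedHash(a) == normalizedHash(b));  // true
    }
}
```

Without such normalization, equal decimal values could land in different hash buckets, which would break bucketing compatibility with Hive.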
[GitHub] spark issue #17050: [SPARK-19722] [SQL] [MINOR] Clean up the usage of Elimin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17050 Merged build finished. Test PASSed.
[GitHub] spark issue #17053: [SPARK-18939][SQL] Timezone support in partition values.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17053 **[Test build #73409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73409/testReport)** for PR 17053 at commit [`c563a9a`](https://github.com/apache/spark/commit/c563a9a91e5ce872e10c7bfa528e9ea4688e333b).
[GitHub] spark issue #17055: [SPARK-19723][SQL]create datasource table with an non-ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17055 **[Test build #73408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73408/testReport)** for PR 17055 at commit [`89eb03a`](https://github.com/apache/spark/commit/89eb03ad763538ec84cdd447cb51079881b4f9ac).
[GitHub] spark issue #17050: [SPARK-19722] [SQL] [MINOR] Clean up the usage of Elimin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17050 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73397/ Test PASSed.
[GitHub] spark issue #17050: [SPARK-19722] [SQL] [MINOR] Clean up the usage of Elimin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17050 **[Test build #73397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73397/testReport)** for PR 17050 at commit [`3cde705`](https://github.com/apache/spark/commit/3cde705c6baa1e4a869149f3ca289a5c1e3a3000). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16938 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73400/ Test FAILed.
[GitHub] spark pull request #17055: [SPARK-19723][SQL]create datasource table with an...
GitHub user windpiger opened a pull request: https://github.com/apache/spark/pull/17055

[SPARK-19723][SQL]create datasource table with an non-existent location should work

## What changes were proposed in this pull request?

This is a follow-up to SPARK-19583. As we discussed in that [PR](https://github.com/apache/spark/pull/16938), the following DDL for a datasource table with a non-existent location should work:

```
CREATE TABLE ... (PARTITIONED BY ...) LOCATION path
```

Currently it throws an exception that the path does not exist.

## How was this patch tested?

Unit test added.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/windpiger/spark CTDataSourcePathNotExists

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17055.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17055

commit 89eb03ad763538ec84cdd447cb51079881b4f9ac
Author: windpiger
Date: 2017-02-24T07:33:23Z

[SPARK-19723][SQL]create datasource table with an non-existent location should work
[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16938 Merged build finished. Test FAILed.
[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16938 **[Test build #73400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73400/testReport)** for PR 16938 at commit [`afa1313`](https://github.com/apache/spark/commit/afa13136d6d24313c8f18bb7ed175bf45079476a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17053: [SPARK-18939][SQL] Timezone support in partition ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/17053#discussion_r102890922

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---

@@ -251,7 +251,8 @@ abstract class ExternalCatalog {
   def listPartitionsByFilter(
       db: String,
       table: String,
-      predicates: Seq[Expression]): Seq[CatalogTablePartition]
+      predicates: Seq[Expression],
+      defaultTimeZoneId: String): Seq[CatalogTablePartition]

--- End diff --

Thank you, I'll add it.
[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17001 Merged build finished. Test PASSed.
[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17001 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73396/ Test PASSed.
[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17001 **[Test build #73396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73396/testReport)** for PR 17001 at commit [`9c0773b`](https://github.com/apache/spark/commit/9c0773b1d477d39f29ec44f2dcfe34d129706efe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17054: Refactored the code to remove redundency of count operat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17054 Can one of the admins verify this patch?
[GitHub] spark pull request #17054: Refactored the code to remove redundency of count...
GitHub user HarshSharma8 opened a pull request: https://github.com/apache/spark/pull/17054

Refactored the code to remove redundency of count operation

## What changes were proposed in this pull request?

Removed a redundant count operation that computes the same result twice when it only needs to be performed once.

## How was this patch tested?

Since this only removes a duplicate of an operation that is already performed, the existing tests cover it.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/HarshSharma8/spark remove/redundency

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17054.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17054

commit 14785f52e5f4048ea687e97e7044b3de00716d89
Author: Harsh Sharma
Date: 2017-02-24T07:15:14Z

Refactored the code to remove redundency of count operation
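The refactoring idea this PR describes can be sketched generically (hypothetical code, not the actual Spark change): when the same expensive count is consumed twice, compute it once and reuse the result rather than triggering the full pass again.

```java
import java.util.List;

// Hypothetical sketch of removing a redundant count: compute once, reuse.
public class CountOnce {
    public static void main(String[] args) {
        List<String> records = List.of("a", "b", "c");

        // Before: records.stream().count() invoked twice, two full passes.
        // After: a single pass whose result is reused.
        long count = records.stream().count();
        if (count > 0) {
            System.out.println(count);  // 3
        }
    }
}
```

In Spark the payoff is much larger, since an RDD or DataFrame `count()` can trigger an entire distributed job each time it is called.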
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17052 **[Test build #73407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73407/testReport)** for PR 17052 at commit [`e8a24e1`](https://github.com/apache/spark/commit/e8a24e1cc5f1a638ca23b00adbbcd909db28549d).
[GitHub] spark pull request #17053: [SPARK-18939][SQL] Timezone support in partition ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17053#discussion_r102889140

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---

@@ -251,7 +251,8 @@ abstract class ExternalCatalog {
   def listPartitionsByFilter(
       db: String,
       table: String,
-      predicates: Seq[Expression]): Seq[CatalogTablePartition]
+      predicates: Seq[Expression],
+      defaultTimeZoneId: String): Seq[CatalogTablePartition]

--- End diff --

We need to document what a timezone id is here.
[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...
Github user lins05 commented on a diff in the pull request: https://github.com/apache/spark/pull/17051#discussion_r10239

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala ---

@@ -398,6 +398,27 @@ class FilterEstimationSuite extends StatsEstimationTestBase {
   // For all other SQL types, we compare the entire object directly.
   assert(filteredStats.attributeStats(ar) == expectedColStats)
 }
- }
+// If the filter has a binary operator (including those nested inside
+// AND/OR/NOT), swap the sides of the attribte and the literal, reverse the
+// operator, and then check again.
+val rewrittenFilter = filterNode transformExpressionsDown {
+  case op @ EqualTo(ar: AttributeReference, l: Literal) =>

--- End diff --

Hmm, we not only swap the sides of the attribute and the literal but also reverse the operator, e.g. `LessThan` would be changed to `GreaterThan`. So I guess we can't use `withNewChildren` here.
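The point in this review thread can be sketched outside Catalyst (hypothetical names, not Catalyst's API): mirroring a comparison such as `a < 5` into `5 > a` requires reversing the operator as well as swapping the operands, which is why a plain child swap alone would change the predicate's meaning.

```java
// Hypothetical sketch: mirroring a binary comparison reverses the operator.
public class MirrorComparison {
    enum Op { EQ, LT, GT, LE, GE }

    // Operator to use after the operands are swapped.
    static Op reverse(Op op) {
        switch (op) {
            case LT: return Op.GT;
            case GT: return Op.LT;
            case LE: return Op.GE;
            case GE: return Op.LE;
            default: return op;  // EQ is symmetric under operand swap
        }
    }

    public static void main(String[] args) {
        System.out.println(reverse(Op.LT));  // GT: "a < 5" mirrors to "5 > a"
        System.out.println(reverse(Op.EQ));  // EQ: equality needs no reversal
    }
}
```

Swapping children while keeping `LT` would turn `a < 5` into `5 < a`, a different predicate entirely; hence the extra reversal step.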
[GitHub] spark pull request #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.di...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16996
[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16996 thanks, merging to master!
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16594 LGTM, pending test
[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...
Github user lins05 commented on a diff in the pull request: https://github.com/apache/spark/pull/17051#discussion_r102888024

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala ---

@@ -398,6 +398,27 @@ class FilterEstimationSuite extends StatsEstimationTestBase {
   // For all other SQL types, we compare the entire object directly.
   assert(filteredStats.attributeStats(ar) == expectedColStats)
 }
- }
+// If the filter has a binary operator (including those nested inside
+// AND/OR/NOT), swap the sides of the attribte and the literal, reverse the
+// operator, and then check again.
+val rewrittenFilter = filterNode transformExpressionsDown {
+  case op @ EqualTo(ar: AttributeReference, l: Literal) =>

--- End diff --

👍 I tried to find something like this but failed to, so I resorted to the current code. Thanks for the tip!
[GitHub] spark issue #17053: [SPARK-18939][SQL] Timezone support in partition values.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17053 **[Test build #73406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73406/testReport)** for PR 17053 at commit [`49da287`](https://github.com/apache/spark/commit/49da287e174cf20e78c3ff0ef122d2ae0c34).
[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17051#discussion_r102887733 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala --- @@ -398,6 +398,27 @@ class FilterEstimationSuite extends StatsEstimationTestBase { // For all other SQL types, we compare the entire object directly. assert(filteredStats.attributeStats(ar) == expectedColStats) } - } +// If the filter has a binary operator (including those nested inside +// AND/OR/NOT), swap the sides of the attribute and the literal, reverse the +// operator, and then check again. +val rewrittenFilter = filterNode transformExpressionsDown { + case op @ EqualTo(ar: AttributeReference, l: Literal) => --- End diff -- nit: `case b @ BinaryComparison(ar: AttributeReference, l: Literal) => b.withNewChildren(Seq(l, ar))`
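The rewrite the test comment describes — swap the attribute/literal sides of a comparison and reverse the operator — is easy to sketch outside of Catalyst. Below is a minimal, self-contained Scala illustration; the `Expr`/`Attr`/`Lit` AST is hypothetical stand-in code, not Spark's actual expression classes:

```scala
// Hypothetical stand-ins for Catalyst expression nodes, for illustration only.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr

sealed trait BinCmp extends Expr {
  def left: Expr
  def right: Expr
  // Swap operands and reverse the operator, preserving semantics:
  // c > 5 becomes 5 < c, and c = 5 becomes 5 = c.
  def reversed: BinCmp
}
case class Eq(left: Expr, right: Expr) extends BinCmp { def reversed = Eq(right, left) }
case class Gt(left: Expr, right: Expr) extends BinCmp { def reversed = Lt(right, left) }
case class Lt(left: Expr, right: Expr) extends BinCmp { def reversed = Gt(right, left) }

// Rewrite only <attribute> <op> <literal> comparisons, leaving everything else
// untouched, mirroring the suggested
// `case b @ BinaryComparison(ar: AttributeReference, l: Literal) => ...` pattern.
def swapAttrLit(e: Expr): Expr = e match {
  case b: BinCmp => (b.left, b.right) match {
    case (_: Attr, _: Lit) => b.reversed
    case _                 => b
  }
  case other => other
}
```

Matching on a common `BinCmp` parent, as the reviewer suggests, avoids enumerating every comparison operator in the test.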
[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16938 **[Test build #73405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73405/testReport)** for PR 16938 at commit [`1f2ce17`](https://github.com/apache/spark/commit/1f2ce17e3d2eca92bc01b6a22e908bd8fd1d9592).
[GitHub] spark issue #17051: [SPARK-17075][SQL] Follow up: fix file line ending and i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17051 **[Test build #73404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73404/testReport)** for PR 17051 at commit [`8881d58`](https://github.com/apache/spark/commit/8881d58ad65fb7f32a74610561230e3e800611a9).
[GitHub] spark pull request #17053: [SPARK-18939][SQL] Timezone support in partition ...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/17053 [SPARK-18939][SQL] Timezone support in partition values.

## What changes were proposed in this pull request?

This is a follow-up pr of #16308 and #16750. This pr enables timezone support in partition values. We should use the `timeZone` option introduced at #16750 to parse/format partition values of the `TimestampType`. For example, if you have timestamp `"2016-01-01 00:00:00"` in `GMT` which will be used for partition values, the values written by the default timezone option, which is `"GMT"` because the session local timezone is `"GMT"` here, are:

```scala
scala> spark.conf.set("spark.sql.session.timeZone", "GMT")

scala> val df = Seq((1, new java.sql.Timestamp(145160640L))).toDF("i", "ts")
df: org.apache.spark.sql.DataFrame = [i: int, ts: timestamp]

scala> df.show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2016-01-01 00:00:00|
+---+-------------------+

scala> df.write.partitionBy("ts").save("/path/to/gmtpartition")
```

```sh
$ ls /path/to/gmtpartition/
_SUCCESS
ts=2016-01-01 00%3A00%3A00
```

whereas setting the option to `"PST"`, they are:

```scala
scala> df.write.option("timeZone", "PST").partitionBy("ts").save("/path/to/pstpartition")
```

```sh
$ ls /path/to/pstpartition/
_SUCCESS
ts=2015-12-31 16%3A00%3A00
```

We can properly read the partition values if the session local timezone and the timezone of the partition values are the same:

```scala
scala> spark.read.load("/path/to/gmtpartition").show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2016-01-01 00:00:00|
+---+-------------------+
```

And even if the timezones are different, we can properly read the values by setting the correct timezone option:

```scala
// wrong result
scala> spark.read.load("/path/to/pstpartition").show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2015-12-31 16:00:00|
+---+-------------------+

// correct result
scala> spark.read.option("timeZone", "PST").load("/path/to/pstpartition").show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2016-01-01 00:00:00|
+---+-------------------+
```

## How was this patch tested?

Existing tests and added some tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-18939

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17053.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17053

commit 54e33690093d97d33de48f9665020a6296a8a909 Author: Takuya UESHIN Date: 2017-02-14T07:12:10Z Modify FileFormatWriter to use timezone option.
commit 2f0ca106cf60d57389e1725f3f61a784dbe98f70 Author: Takuya UESHIN Date: 2017-02-14T09:13:22Z Use timeZone option for PartitioningAwareFileIndex.
commit 0e70ce6fe3b28c9448834f0dbb0c30f6a39669a2 Author: Takuya UESHIN Date: 2017-02-17T09:26:52Z Use stringSchema to make tests more explicit.
commit dae7eba86f3e1e3cb38c8b56c1f684374b9355f1 Author: Takuya UESHIN Date: 2017-02-20T07:57:09Z Use correct timezone for partition values for OptimizeMetadataOnlyQuery.
commit 49da287e174cf20e78c3ff0ef122d2ae0c34 Author: Takuya UESHIN Date: 2017-02-23T03:07:54Z Use correct timezone for partition values.
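The timezone-dependent formatting at the heart of the PR can be sketched without Spark at all. Below is a minimal plain-Scala illustration using `java.time`; the helper name `partitionDir` is my own, `America/Los_Angeles` stands in for the short ID `PST`, and percent-encoding is simplified to just the `:` character that appears in the `ls` output above:

```scala
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

// Render an epoch-millis timestamp as a partition directory name under a given
// timezone, percent-encoding ':' the way the partition paths above do.
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

def partitionDir(epochMillis: Long, tz: String): String = {
  val local = Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(tz)).format(fmt)
  "ts=" + local.replace(":", "%3A")
}

// 2016-01-01 00:00:00 in GMT, as epoch milliseconds.
val ts = 1451606400000L
```

With the timezone set to `GMT` this yields `ts=2016-01-01 00%3A00%3A00`, while `America/Los_Angeles` yields `ts=2015-12-31 16%3A00%3A00` — the same instant, two different directory names, which is exactly why the reader must be told which timezone wrote the values.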
[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...
Github user lins05 commented on a diff in the pull request: https://github.com/apache/spark/pull/17051#discussion_r102887655 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -1,511 +1,509 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.catalyst.plans.logical.statsEstimation - -import java.sql.{Date, Timestamp} - -import scala.collection.immutable.{HashSet, Map} -import scala.collection.mutable - -import org.apache.spark.internal.Logging -import org.apache.spark.sql.catalyst.CatalystConf -import org.apache.spark.sql.catalyst.expressions._ -import org.apache.spark.sql.catalyst.plans.logical._ -import org.apache.spark.sql.catalyst.util.DateTimeUtils -import org.apache.spark.sql.types._ - -case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Logging { - - /** - * We use a mutable colStats because we need to update the corresponding ColumnStat - * for a column after we apply a predicate condition. For example, column c has - * [min, max] value as [0, 100]. 
In a range condition such as (c > 40 AND c <= 50), - * we need to set the column's [min, max] value to [40, 100] after we evaluate the - * first condition c > 40. We need to set the column's [min, max] value to [40, 50] - * after we evaluate the second condition c <= 50. - */ - private var mutableColStats: mutable.Map[ExprId, ColumnStat] = mutable.Map.empty - - /** - * Returns an option of Statistics for a Filter logical plan node. - * For a given compound expression condition, this method computes filter selectivity - * (or the percentage of rows meeting the filter condition), which - * is used to compute row count, size in bytes, and the updated statistics after a given - * predicated is applied. - * - * @return Option[Statistics] When there is no statistics collected, it returns None. - */ - def estimate: Option[Statistics] = { -// We first copy child node's statistics and then modify it based on filter selectivity. -val stats: Statistics = plan.child.stats(catalystConf) -if (stats.rowCount.isEmpty) return None - -// save a mutable copy of colStats so that we can later change it recursively -mutableColStats = mutable.Map(stats.attributeStats.map(kv => (kv._1.exprId, kv._2)).toSeq: _*) - -// estimate selectivity of this filter predicate -val filterSelectivity: Double = calculateFilterSelectivity(plan.condition) match { - case Some(percent) => percent - // for not-supported condition, set filter selectivity to a conservative estimate 100% - case None => 1.0 -} - -// attributeStats has mapping Attribute-to-ColumnStat. -// mutableColStats has mapping ExprId-to-ColumnStat. -// We use an ExprId-to-Attribute map to facilitate the mapping Attribute-to-ColumnStat -val expridToAttrMap: Map[ExprId, Attribute] = - stats.attributeStats.map(kv => (kv._1.exprId, kv._1)) -// copy mutableColStats contents to an immutable AttributeMap. 
-val mutableAttributeStats: mutable.Map[Attribute, ColumnStat] = - mutableColStats.map(kv => expridToAttrMap(kv._1) -> kv._2) -val newColStats = AttributeMap(mutableAttributeStats.toSeq) - -val filteredRowCount: BigInt = - EstimationUtils.ceil(BigDecimal(stats.rowCount.get) * filterSelectivity) -val filteredSizeInBytes = - EstimationUtils.getOutputSize(plan.output, filteredRowCount, newColStats) - -Some(stats.copy(sizeInBytes = filteredSizeInBytes, rowCount = Some(filteredRowCount), - attributeStats = newColStats)) - } - - /** - * Returns a percentage of rows meeting a compound condition in Filter node. - * A compound condition is decomposed into multiple single conditions linked with AND, OR, NOT. - * For
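The row-count arithmetic in the quoted `estimate` method is straightforward to isolate. Here is a minimal plain-Scala sketch of the selectivity-to-row-count step (the helper name is mine, not Spark's), including the conservative 100% fallback for predicates the estimator does not support:

```scala
// Scale the child's row count by the estimated filter selectivity, rounding up,
// as the quoted code does with EstimationUtils.ceil; an unestimable predicate
// conservatively keeps all rows (selectivity 1.0).
def filteredRowCount(rowCount: BigInt, selectivity: Option[Double]): BigInt = {
  val pct = selectivity.getOrElse(1.0) // fallback: keep 100% of rows
  (BigDecimal(rowCount) * BigDecimal(pct))
    .setScale(0, BigDecimal.RoundingMode.CEILING)
    .toBigInt
}
```

Rounding up rather than down keeps the estimate from reaching zero rows for tiny selectivities, which matters because the row count feeds directly into the size-in-bytes estimate.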
[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...
Github user lins05 commented on a diff in the pull request: https://github.com/apache/spark/pull/17051#discussion_r102887355 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -1,511 +1,509 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.catalyst.plans.logical.statsEstimation - -import java.sql.{Date, Timestamp} - -import scala.collection.immutable.{HashSet, Map} -import scala.collection.mutable - -import org.apache.spark.internal.Logging -import org.apache.spark.sql.catalyst.CatalystConf -import org.apache.spark.sql.catalyst.expressions._ -import org.apache.spark.sql.catalyst.plans.logical._ -import org.apache.spark.sql.catalyst.util.DateTimeUtils -import org.apache.spark.sql.types._ - -case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Logging { - - /** - * We use a mutable colStats because we need to update the corresponding ColumnStat - * for a column after we apply a predicate condition. For example, column c has - * [min, max] value as [0, 100]. 
In a range condition such as (c > 40 AND c <= 50), - * we need to set the column's [min, max] value to [40, 100] after we evaluate the - * first condition c > 40. We need to set the column's [min, max] value to [40, 50] - * after we evaluate the second condition c <= 50. - */ - private var mutableColStats: mutable.Map[ExprId, ColumnStat] = mutable.Map.empty - - /** - * Returns an option of Statistics for a Filter logical plan node. - * For a given compound expression condition, this method computes filter selectivity - * (or the percentage of rows meeting the filter condition), which - * is used to compute row count, size in bytes, and the updated statistics after a given - * predicated is applied. - * - * @return Option[Statistics] When there is no statistics collected, it returns None. - */ - def estimate: Option[Statistics] = { -// We first copy child node's statistics and then modify it based on filter selectivity. -val stats: Statistics = plan.child.stats(catalystConf) -if (stats.rowCount.isEmpty) return None - -// save a mutable copy of colStats so that we can later change it recursively -mutableColStats = mutable.Map(stats.attributeStats.map(kv => (kv._1.exprId, kv._2)).toSeq: _*) - -// estimate selectivity of this filter predicate -val filterSelectivity: Double = calculateFilterSelectivity(plan.condition) match { - case Some(percent) => percent - // for not-supported condition, set filter selectivity to a conservative estimate 100% - case None => 1.0 -} - -// attributeStats has mapping Attribute-to-ColumnStat. -// mutableColStats has mapping ExprId-to-ColumnStat. -// We use an ExprId-to-Attribute map to facilitate the mapping Attribute-to-ColumnStat -val expridToAttrMap: Map[ExprId, Attribute] = - stats.attributeStats.map(kv => (kv._1.exprId, kv._1)) -// copy mutableColStats contents to an immutable AttributeMap. 
-val mutableAttributeStats: mutable.Map[Attribute, ColumnStat] = - mutableColStats.map(kv => expridToAttrMap(kv._1) -> kv._2) -val newColStats = AttributeMap(mutableAttributeStats.toSeq) - -val filteredRowCount: BigInt = - EstimationUtils.ceil(BigDecimal(stats.rowCount.get) * filterSelectivity) -val filteredSizeInBytes = - EstimationUtils.getOutputSize(plan.output, filteredRowCount, newColStats) - -Some(stats.copy(sizeInBytes = filteredSizeInBytes, rowCount = Some(filteredRowCount), - attributeStats = newColStats)) - } - - /** - * Returns a percentage of rows meeting a compound condition in Filter node. - * A compound condition is decomposed into multiple single conditions linked with AND, OR, NOT. - * For
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17052 **[Test build #73403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73403/testReport)** for PR 17052 at commit [`9eb57b7`](https://github.com/apache/spark/commit/9eb57b7294f2636e370be86cf975509917fdd861).
[GitHub] spark pull request #17052: [SPARK-19690][SS] Join a streaming DataFrame with...
GitHub user uncleGen opened a pull request: https://github.com/apache/spark/pull/17052 [SPARK-19690][SS] Join a streaming DataFrame with a batch DataFrame which has an aggregation may not work

## What changes were proposed in this pull request?

`StatefulAggregationStrategy` should check whether the logical plan is streaming or not. Test code:

```scala
case class Record(key: Int, value: String)
val df = spark.createDataFrame((1 to 100).map(i => Record(i, s"value_$i"))).groupBy("value").count
val lines = spark.readStream.format("socket").option("host", "localhost").option("port", "").load
val words = lines.as[String].flatMap(_.split(" "))
val result = words.join(df, "value")
```

before pr:

```
== Physical Plan ==
*Project [value#13, count#19L]
+- *BroadcastHashJoin [value#13], [value#1], Inner, BuildRight
   :- *Filter isnotnull(value#13)
   :  +- *SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true) AS value#13]
   :     +- MapPartitions , obj#12: java.lang.String
   :        +- DeserializeToObject value#5.toString, obj#11: java.lang.String
   :           +- StreamingRelation textSocket, [value#5]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
      +- *HashAggregate(keys=[value#1], functions=[count(1)])
         +- StateStoreSave [value#1], OperatorStateId(,0,0), Append, 0
            +- *HashAggregate(keys=[value#1], functions=[merge_count(1)])
               +- StateStoreRestore [value#1], OperatorStateId(,0,0)
                  +- *HashAggregate(keys=[value#1], functions=[merge_count(1)])
                     +- Exchange hashpartitioning(value#1, 200)
                        +- *HashAggregate(keys=[value#1], functions=[partial_count(1)])
                           +- *Project [value#1]
                              +- *Filter isnotnull(value#1)
                                 +- LocalTableScan [key#0, value#1]
```

after pr:

```
== Physical Plan ==
*Project [value#13, count#19L]
+- *BroadcastHashJoin [value#13], [value#1], Inner, BuildRight
   :- *Filter isnotnull(value#13)
   :  +- *SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true) AS value#13]
   :     +- MapPartitions , obj#12: java.lang.String
   :        +- DeserializeToObject value#5.toString, obj#11: java.lang.String
   :           +- StreamingRelation textSocket, [value#5]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
      +- *HashAggregate(keys=[value#1], functions=[count(1)])
         +- Exchange hashpartitioning(value#1, 200)
            +- *HashAggregate(keys=[value#1], functions=[partial_count(1)])
               +- *Project [value#1]
                  +- *Filter isnotnull(value#1)
                     +- LocalTableScan [key#0, value#1]
```

## How was this patch tested?

add new unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19690

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17052.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17052

commit e45b06e2495e09c6d7e7a50ee509044b526bf8d0 Author: uncleGen Date: 2017-02-22T10:18:31Z Join a streaming DataFrame with a batch DataFrame which has an aggregation may not work
commit 9eb57b7294f2636e370be86cf975509917fdd861 Author: uncleGen Date: 2017-02-24T06:38:41Z code clean
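The essence of the fix — only plan stateful aggregation (the `StateStoreSave`/`StateStoreRestore` operators) when the aggregate actually sits on a streaming subtree — can be sketched with a toy plan tree. These node classes are hypothetical illustrations, not Spark's `LogicalPlan` hierarchy:

```scala
// Toy logical-plan nodes: a plan is streaming iff any leaf below it is streaming.
sealed trait Plan {
  def children: Seq[Plan]
  def isStreaming: Boolean = children.exists(_.isStreaming)
}
case class StreamingSource() extends Plan {
  val children = Nil
  override val isStreaming = true
}
case class LocalTable() extends Plan { val children = Nil }
case class Aggregate(child: Plan) extends Plan { val children = Seq(child) }
case class Join(left: Plan, right: Plan) extends Plan { val children = Seq(left, right) }

// The strategy should emit stateful operators only for streaming aggregates;
// the batch side of a stream-batch join keeps a plain hash aggregate, as in
// the "after pr" plan above.
def needsStatefulAgg(agg: Aggregate): Boolean = agg.isStreaming
```

In the bug, the check was effectively made on the whole query (which is streaming because of the socket source) rather than on the aggregate's own subtree, so the batch-side aggregate was wrongly planned with state-store operators.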
[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16696 @cloud-fan @gatorsmile I've updated this pr and also added test cases, please review.
[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16594#discussion_r102887155 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -794,6 +795,7 @@ EXPLAIN: 'EXPLAIN'; FORMAT: 'FORMAT'; LOGICAL: 'LOGICAL'; CODEGEN: 'CODEGEN'; +COST: 'COST'; --- End diff -- Thanks! Updated.
[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17051#discussion_r102887105 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -1,511 +1,509 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -package org.apache.spark.sql.catalyst.plans.logical.statsEstimation - -import java.sql.{Date, Timestamp} - -import scala.collection.immutable.{HashSet, Map} -import scala.collection.mutable - -import org.apache.spark.internal.Logging -import org.apache.spark.sql.catalyst.CatalystConf -import org.apache.spark.sql.catalyst.expressions._ -import org.apache.spark.sql.catalyst.plans.logical._ -import org.apache.spark.sql.catalyst.util.DateTimeUtils -import org.apache.spark.sql.types._ - -case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Logging { - - /** - * We use a mutable colStats because we need to update the corresponding ColumnStat - * for a column after we apply a predicate condition. For example, column c has - * [min, max] value as [0, 100]. 
In a range condition such as (c > 40 AND c <= 50), - * we need to set the column's [min, max] value to [40, 100] after we evaluate the - * first condition c > 40. We need to set the column's [min, max] value to [40, 50] - * after we evaluate the second condition c <= 50. - */ - private var mutableColStats: mutable.Map[ExprId, ColumnStat] = mutable.Map.empty - - /** - * Returns an option of Statistics for a Filter logical plan node. - * For a given compound expression condition, this method computes filter selectivity - * (or the percentage of rows meeting the filter condition), which - * is used to compute row count, size in bytes, and the updated statistics after a given - * predicated is applied. - * - * @return Option[Statistics] When there is no statistics collected, it returns None. - */ - def estimate: Option[Statistics] = { -// We first copy child node's statistics and then modify it based on filter selectivity. -val stats: Statistics = plan.child.stats(catalystConf) -if (stats.rowCount.isEmpty) return None - -// save a mutable copy of colStats so that we can later change it recursively -mutableColStats = mutable.Map(stats.attributeStats.map(kv => (kv._1.exprId, kv._2)).toSeq: _*) - -// estimate selectivity of this filter predicate -val filterSelectivity: Double = calculateFilterSelectivity(plan.condition) match { - case Some(percent) => percent - // for not-supported condition, set filter selectivity to a conservative estimate 100% - case None => 1.0 -} - -// attributeStats has mapping Attribute-to-ColumnStat. -// mutableColStats has mapping ExprId-to-ColumnStat. -// We use an ExprId-to-Attribute map to facilitate the mapping Attribute-to-ColumnStat -val expridToAttrMap: Map[ExprId, Attribute] = - stats.attributeStats.map(kv => (kv._1.exprId, kv._1)) -// copy mutableColStats contents to an immutable AttributeMap. 
-val mutableAttributeStats: mutable.Map[Attribute, ColumnStat] = - mutableColStats.map(kv => expridToAttrMap(kv._1) -> kv._2) -val newColStats = AttributeMap(mutableAttributeStats.toSeq) - -val filteredRowCount: BigInt = - EstimationUtils.ceil(BigDecimal(stats.rowCount.get) * filterSelectivity) -val filteredSizeInBytes = - EstimationUtils.getOutputSize(plan.output, filteredRowCount, newColStats) - -Some(stats.copy(sizeInBytes = filteredSizeInBytes, rowCount = Some(filteredRowCount), - attributeStats = newColStats)) - } - - /** - * Returns a percentage of rows meeting a compound condition in Filter node. - * A compound condition is decomposed into multiple single conditions linked with AND, OR, NOT. - *
[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16944 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73393/ Test PASSed.
[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16594 **[Test build #73402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73402/testReport)** for PR 16594 at commit [`6e10f84`](https://github.com/apache/spark/commit/6e10f840fed50b7e48898e73967bc35a29a6e23b).
[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16944 Merged build finished. Test PASSed.
[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16944 **[Test build #73393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73393/testReport)** for PR 16944 at commit [`9b0b2bb`](https://github.com/apache/spark/commit/9b0b2bb3fbc7db9e71b3342014b729568290dffd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17051: [SPARK-17075][SQL] Follow up: fix file line ending and i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17051 **[Test build #73401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73401/testReport)** for PR 17051 at commit [`0f56d0f`](https://github.com/apache/spark/commit/0f56d0f1003268e4945ec5a427bbcc4bb7061a49).
[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...
GitHub user lins05 opened a pull request: https://github.com/apache/spark/pull/17051 [SPARK-17075][SQL] Follow up: fix file line ending and improve the tests ## What changes were proposed in this pull request? Fixed the line ending of `FilterEstimation.scala`. Also improved the tests to cover more cases. ## How was this patch tested? Existing unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lins05/spark fix-cbo-filter-file-encoding Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17051.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17051 commit ee6d9915b26254db176a5aa34c1d59e304e201e0 Author: Shuai Lin Date: 2017-02-24T05:59:41Z [SPARK-17075][SQL] Follow up: fix file line ending and improve the tests. commit 0f56d0f1003268e4945ec5a427bbcc4bb7061a49 Author: Shuai Lin Date: 2017-02-24T05:58:37Z Use transformExpressionsDown to rewrite the filter.
[GitHub] spark issue #17051: [SPARK-17075][SQL] Follow up: fix file line ending and i...
Github user lins05 commented on the issue: https://github.com/apache/spark/pull/17051 cc @ron8hu @cloud-fan
[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16938 **[Test build #73400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73400/testReport)** for PR 16938 at commit [`afa1313`](https://github.com/apache/spark/commit/afa13136d6d24313c8f18bb7ed175bf45079476a).
[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16938 **[Test build #73399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73399/testReport)** for PR 16938 at commit [`8559e4e`](https://github.com/apache/spark/commit/8559e4e8f9b8e8f773f4d336866a01ff15c9fc5e).
[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17049 **[Test build #73398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73398/testReport)** for PR 17049 at commit [`c31b2b0`](https://github.com/apache/spark/commit/c31b2b068a945ef8ca39532292989e7c205b9951).
[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/17049 Jenkins retest this please
[GitHub] spark pull request #17049: [SPARK-17495] [SQL] Add more tests for hive hash
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/17049#discussion_r102882775

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -71,6 +75,242 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkConsistencyBetweenInterpretedAndCodegen(Crc32, BinaryType)
   }

+  def checkHiveHash(value: Any, dataType: DataType, expected: Long): Unit = {
+    // Note: all expected hashes need to be computed using Hive 1.2.1
+    val actual = HiveHashFunction.hash(value, dataType, seed = 0)
+    assert(actual == expected)

--- End diff --

Added clue
[GitHub] spark pull request #17049: [SPARK-17495] [SQL] Add more tests for hive hash
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/17049#discussion_r102882772

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -781,12 +780,12 @@ object HiveHashFunction extends InterpretedHashFunction {
     var i = 0
     val length = struct.numFields
     while (i < length) {
-      result = (31 * result) + hash(struct.get(i, types(i)), types(i), seed + 1).toInt
+      result = (31 * result) + hash(struct.get(i, types(i)), types(i), 0).toInt

--- End diff --

The `seed` is something used in the murmur3 hash; hive hash does not need it. See the original impl in the Hive codebase: https://github.com/apache/hive/blob/4ba713ccd85c3706d195aeef9476e6e6363f1c21/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L638 Since the hashing-related methods in Spark already had a `seed` parameter, I had to add it to hive-hash. When I compute the hash, I always need to set `seed` to 0, which is what is done here.
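The seed-free accumulation described above can be sketched outside Spark. This is an illustrative toy (not Spark's or Hive's actual code): each field's hash folds into an accumulator with a 31 multiplier and no per-level seed, with the field hash simplified here to the int value itself (the real `HiveHashFunction.hash` handles every Spark `DataType`).

```python
def to_int32(n: int) -> int:
    """Wrap to a signed 32-bit int, mimicking Java int arithmetic."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

def hive_hash_struct(int_fields):
    """Toy Hive-style struct hash over int fields: 31*acc + field, no seed."""
    result = 0
    for v in int_fields:
        result = to_int32(31 * result + v)  # no seed threaded through levels
    return result

# Example: struct(1, 2, 3) -> 31*(31*1 + 2) + 3 = 1026
```

Because no seed participates, the result depends only on the field values, which is what makes the output comparable against hashes precomputed with Hive 1.2.1.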
[GitHub] spark pull request #17049: [SPARK-17495] [SQL] Add more tests for hive hash
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17049#discussion_r102881875

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -781,12 +780,12 @@ object HiveHashFunction extends InterpretedHashFunction {
     var i = 0
     val length = struct.numFields
     while (i < length) {
-      result = (31 * result) + hash(struct.get(i, types(i)), types(i), seed + 1).toInt
+      result = (31 * result) + hash(struct.get(i, types(i)), types(i), 0).toInt

--- End diff --

Could you explain the reason?
[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16696 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73391/ Test PASSed.
[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16696 Merged build finished. Test PASSed.
[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16696 **[Test build #73391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73391/testReport)** for PR 16696 at commit [`5692939`](https://github.com/apache/spark/commit/56929391719053e72791abe127b10a3316b51141).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17050: [SPARK-19722] [SQL] [MINOR] Clean up the usage of Elimin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17050 **[Test build #73397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73397/testReport)** for PR 17050 at commit [`3cde705`](https://github.com/apache/spark/commit/3cde705c6baa1e4a869149f3ca289a5c1e3a3000).
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15125 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73390/ Test PASSed.
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15125 Merged build finished. Test PASSed.
[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15125 **[Test build #73390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73390/testReport)** for PR 15125 at commit [`11bc349`](https://github.com/apache/spark/commit/11bc349e55eaa5f687d376d1a05f3509459dbecd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17050: [SPARK-19722] [SQL] [MINOR] Clean up the usage of...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/17050 [SPARK-19722] [SQL] [MINOR] Clean up the usage of EliminateSubqueryAliases

### What changes were proposed in this pull request?

In the PR https://github.com/apache/spark/pull/11403, we introduced the function `canonicalized` for eliminating useless subqueries. We can simply replace the call of the rule `EliminateSubqueryAliases` with the function `canonicalized`. After we changed view resolution and management, the stated reason for keeping `EliminateSubqueryAliases` in the optimizer is out of date. Thus, this PR also updates the reason to `eager analysis of Dataset`.

### How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark eliminateSubquery

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17050.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17050

commit d75b94c1abb4b60444a4191319c787a50a061bf9
Author: Xiao Li
Date: 2017-02-24T05:08:02Z

    fix.

commit 3cde705c6baa1e4a869149f3ca289a5c1e3a3000
Author: Xiao Li
Date: 2017-02-24T05:23:05Z

    clean
[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17049 Looks good except that comment.
[GitHub] spark pull request #17049: [SPARK-17495] [SQL] Add more tests for hive hash
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17049#discussion_r102881054

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -71,6 +75,242 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkConsistencyBetweenInterpretedAndCodegen(Crc32, BinaryType)
   }

+  def checkHiveHash(value: Any, dataType: DataType, expected: Long): Unit = {
+    // Note: all expected hashes need to be computed using Hive 1.2.1
+    val actual = HiveHashFunction.hash(value, dataType, seed = 0)
+    assert(actual == expected)

--- End diff --

We should add a clue; otherwise we will never be able to tell what's going on if the tests fail on those randomized values.

```
withClue(s"value is $value") {
  assert(..
}
```
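The same idea works in any assertion framework: attach the offending input to the failure message so a failure on a randomized value is diagnosable. A minimal Python analogue (plain asserts standing in for ScalaTest's `withClue`; `check_hash` is a hypothetical helper, not a PySpark API):

```python
def check_hash(value, hash_fn, expected):
    """Assert hash_fn(value) == expected, reporting the input on failure."""
    actual = hash_fn(value)
    # the message plays the role of withClue(s"value is $value")
    assert actual == expected, (
        f"value is {value!r}: got {actual}, expected {expected}"
    )

check_hash("abc", len, 3)  # passes; a failure would name the input value
```

Without the clue, a failure on one of hundreds of randomized inputs reports only the mismatched hashes, not which value produced them.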
[GitHub] spark issue #15047: [SPARK-17495] [SQL] Add Hash capability semantically equ...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15047 @gatorsmile + @rxin: I had made a note of your comments but was not able to get to them at the time because I had other time-critical projects to work on. I have put out a PR which improves the unit test coverage: https://github.com/apache/spark/pull/17049
[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17049 **[Test build #73395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73395/testReport)** for PR 17049 at commit [`c589350`](https://github.com/apache/spark/commit/c5893502f52d073f30344a9fa8c4e11287207959).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17049 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73395/ Test FAILed.
[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17049 Merged build finished. Test FAILed.
[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17049 **[Test build #73395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73395/testReport)** for PR 17049 at commit [`c589350`](https://github.com/apache/spark/commit/c5893502f52d073f30344a9fa8c4e11287207959).
[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17001 **[Test build #73396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73396/testReport)** for PR 17001 at commit [`9c0773b`](https://github.com/apache/spark/commit/9c0773b1d477d39f29ec44f2dcfe34d129706efe).
[GitHub] spark issue #17049: [SPARK-17495] Add more tests for hive hash
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/17049 ok to test
[GitHub] spark pull request #17049: [SPARK-17495] Add more tests for hive hash
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/17049 [SPARK-17495] Add more tests for hive hash

## What changes were proposed in this pull request?

This PR adds tests for hive-hash by comparing the outputs generated against Hive 1.2.1. The following datatypes are covered by this PR:
- null
- boolean
- byte
- short
- int
- long
- float
- double
- string
- array
- map
- struct

Datatypes that I have _NOT_ covered but will work on separately are:
- Decimal
- Calendar

## How was this patch tested?

NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tejasapatil/spark SPARK-17495_remaining_types

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17049.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17049

commit c5893502f52d073f30344a9fa8c4e11287207959
Author: Tejas Patil
Date: 2016-10-24T04:17:07Z

    Add more tests for hive hash
[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/17001

Yes, it is for HiveExternalCatalog. When I worked on this [PR](https://github.com/apache/spark/pull/16996), I found the logic:

> The hive.metastore.warehouse.dir in sparkConf still takes effect in Spark; it is not useless. The reason is that:
>
> 1. When we run Spark with Hive enabled, it will create a SharedState.
> 2. When creating the SharedState, it will create a HiveExternalCatalog. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L85
> 3. When creating the HiveExternalCatalog, it will create a HiveClientImpl via HiveUtils. https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L65
> 4. When creating the HiveClientImpl, it will call SessionState.start(state), which creates the default database using the hive.metastore.warehouse.dir in the hiveConf built inside HiveClientImpl. https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L189

The hiveConf created in HiveClientImpl is built from hadoopConf and sparkConf, and sparkConf will overwrite the value of the same key in hadoopConf. So it actually uses hive.metastore.warehouse.dir from sparkConf to create the default database; if we do not overwrite the value in sparkConf in SharedState, the database location is not the warehouse path we expect. So sparkContext.conf.set("hive.metastore.warehouse.dir", sparkWarehouseDir) should be retained.

**We can also find that the default database is not created in SharedState — the condition there is false, so it will not hit the create-database logic; it has already been created when we init the HiveClientImpl.** https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L96
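The precedence windpiger describes can be illustrated with a toy overlay (hypothetical function and keys; the real merging happens inside HiveClientImpl's hiveConf construction): the effective conf starts from hadoopConf, then sparkConf is applied on top, so a key set in sparkConf wins.

```python
def effective_hive_conf(hadoop_conf: dict, spark_conf: dict) -> dict:
    """Toy sketch: sparkConf entries overwrite hadoopConf entries."""
    conf = dict(hadoop_conf)   # base layer: hadoopConf
    conf.update(spark_conf)    # overlay: sparkConf wins on conflicting keys
    return conf

hadoop_conf = {"hive.metastore.warehouse.dir": "/user/hive/warehouse"}
spark_conf = {"hive.metastore.warehouse.dir": "/spark-warehouse"}
merged = effective_hive_conf(hadoop_conf, spark_conf)
# The default database is created under the sparkConf value, which is why
# SharedState must set that key in sparkConf to the intended warehouse path.
```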
[GitHub] spark issue #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17048 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73394/ Test PASSed.
[GitHub] spark issue #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17048 Merged build finished. Test PASSed.
[GitHub] spark issue #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17048 **[Test build #73394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73394/testReport)** for PR 17048 at commit [`adeb5b7`](https://github.com/apache/spark/commit/adeb5b7ea313662a6ab0803acbda1ec8b88bac9f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17038: [SPARK-19707][Core] Improve the invalid path check for s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17038 Merged build finished. Test PASSed.
[GitHub] spark issue #17038: [SPARK-19707][Core] Improve the invalid path check for s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17038 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73387/ Test PASSed.
[GitHub] spark issue #17038: [SPARK-19707][Core] Improve the invalid path check for s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17038 **[Test build #73387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73387/testReport)** for PR 17038 at commit [`db5c287`](https://github.com/apache/spark/commit/db5c287e1223522de9c17391c3ea3025c938158e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17048 **[Test build #73394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73394/testReport)** for PR 17048 at commit [`adeb5b7`](https://github.com/apache/spark/commit/adeb5b7ea313662a6ab0803acbda1ec8b88bac9f).
[GitHub] spark issue #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/17048 ping @jkbradley, backport for branch-2.1
[GitHub] spark pull request #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy metho...
GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/17048

[SPARK-14772][PYTHON][ML] Fixed Params.copy method to match Scala implementation

## What changes were proposed in this pull request?

Fixed the PySpark Params.copy method to behave like the Scala implementation. The main issue was that it did not account for the _defaultParamMap and merged it into the explicitly created param map.

## How was this patch tested?

Added a new unit test to verify the copy method behaves correctly for copying uid, explicitly created params, and default params.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark pyspark-ml-param_copy-Scala_sync-SPARK-14772-2_1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17048.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17048

commit adeb5b7ea313662a6ab0803acbda1ec8b88bac9f
Author: Bryan Cutler
Date: 2017-02-01T23:19:57Z

    fixed Params.copy method to account for _defaultParamMap and match Scala implementation
    modified test case to include an explicitly set param
    reworked test to be Python 2.6 compatible
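The behavior described above can be sketched as follows. This is an illustrative simplification, not the actual PySpark source: a hypothetical minimal `Params` class stands in for `pyspark.ml.param.Params`, showing a `copy()` that preserves the uid and keeps default params separate from explicitly set ones instead of merging them together.

```python
# Sketch only: a minimal stand-in for pyspark.ml.param.Params illustrating
# a copy() that keeps _defaultParamMap separate from the explicit param map.

class Params(object):
    def __init__(self):
        self.uid = "params_0"
        self._paramMap = {}          # explicitly set params
        self._defaultParamMap = {}   # default param values

    def copy(self, extra=None):
        """Return a copy preserving uid, defaults, and explicitly set params."""
        if extra is None:
            extra = {}
        that = Params()
        that.uid = self.uid
        # Copy defaults and explicit params into separate maps, so defaults
        # are not folded into the explicit param map.
        that._defaultParamMap = dict(self._defaultParamMap)
        that._paramMap = dict(self._paramMap)
        that._paramMap.update(extra)
        return that

p = Params()
p._defaultParamMap["maxIter"] = 10   # a default value
p._paramMap["regParam"] = 0.1        # an explicitly set value
q = p.copy({"regParam": 0.01})       # override via the extra map
```

After the copy, `q` shares `p`'s uid, `maxIter` remains only a default, and `regParam` is explicitly set to the overridden value, while `p` itself is unchanged.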
[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...
Github user kunalkhamar commented on the issue: https://github.com/apache/spark/pull/16826 jenkins test this please
[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16996 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73389/ Test PASSed.
[GitHub] spark issue #17047: [SPARK-19720][SPARK SUBMIT] Redact sensitive information...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17047 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73385/ Test PASSed.
[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16996 Merged build finished. Test PASSed.
[GitHub] spark issue #17047: [SPARK-19720][SPARK SUBMIT] Redact sensitive information...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17047 Merged build finished. Test PASSed.
[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16996 **[Test build #73389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73389/testReport)** for PR 16996 at commit [`86deb62`](https://github.com/apache/spark/commit/86deb6233faa3b64c999786741a0b0cf3cbbe457).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17047: [SPARK-19720][SPARK SUBMIT] Redact sensitive information...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17047 **[Test build #73385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73385/testReport)** for PR 17047 at commit [`000efb1`](https://github.com/apache/spark/commit/000efb1e3152f837e01ce1f80ae108c596f9baa5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...
Github user kunalkhamar commented on the issue: https://github.com/apache/spark/pull/16826 jenkins retest this please
[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16826 Merged build finished. Test PASSed.
[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16826 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73388/ Test PASSed.
[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16826 **[Test build #73388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73388/testReport)** for PR 16826 at commit [`16824f9`](https://github.com/apache/spark/commit/16824f916e87fd90706f9dfd7b7dd81d87b732dd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16944 **[Test build #73393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73393/testReport)** for PR 16944 at commit [`9b0b2bb`](https://github.com/apache/spark/commit/9b0b2bb3fbc7db9e71b3342014b729568290dffd).
[GitHub] spark issue #17036: [SPARK-19706][pyspark] add Column.contains in pyspark
Github user davies commented on the issue: https://github.com/apache/spark/pull/17036 lgtm
[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16395