[GitHub] spark pull request #21757: [SPARK-24797] [SQL] respect spark.sql.hive.conver...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21757#discussion_r202248225

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -254,13 +254,15 @@ class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan]
   override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case i @ InsertIntoTable(UnresolvedCatalogRelation(tableMeta), _, _, _, _)
-        if DDLUtils.isDatasourceTable(tableMeta) =>
+        if DDLUtils.isDatasourceTable(tableMeta) &&
+          DDLUtils.convertSchema(tableMeta, sparkSession) =>
--- End diff --

I do not think this is the right fix. If the original table is a native data source table, we will always use our parquet/orc reader instead of the hive serde.

---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21758

**[Test build #92962 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92962/testReport)** for PR 21758 at commit [`c25ec47`](https://github.com/apache/spark/commit/c25ec473ff078c071aec513953f56c64e6a228a4).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait BarrierTaskContext extends TaskContext`
  * `class BarrierTaskContextImpl(`
  * `class RDDBarrier[T: ClassTag](rdd: RDD[T])`
  * `case class WorkerOffer(`
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21758 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92962/ Test FAILed.
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21758 Merged build finished. Test FAILed.
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/21652 Looks like the integration tests have been failing for the past few runs. Otherwise LGTM.
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21758 Merged build finished. Test PASSed.
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21758 **[Test build #92962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92962/testReport)** for PR 21758 at commit [`c25ec47`](https://github.com/apache/spark/commit/c25ec473ff078c071aec513953f56c64e6a228a4).
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/919/ Test PASSed.
[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21758 cc @mengxr @gatorsmile @cloud-fan
[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the ext...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20795
[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...
GitHub user jiangxb1987 opened a pull request: https://github.com/apache/spark/pull/21758

[SPARK-24795][CORE] Implement barrier execution mode

## What changes were proposed in this pull request?

Propose new APIs and modify job/task scheduling to support barrier execution mode, which requires all tasks in the same barrier stage to start at the same time, and retries all tasks in case some tasks fail in the middle. The barrier execution mode is useful for some ML/DL workloads.

The proposed API changes include:
* `RDDBarrier`, which marks an RDD as barrier (Spark must launch all the tasks together for the current stage).
* `BarrierTaskContext`, which supports global sync of all tasks in a barrier stage, and provides extra `BarrierTaskInfo`s.

In DAGScheduler, we retry all tasks of a barrier stage in case some tasks fail in the middle; this is achieved by unregistering map outputs for a shuffleId (for ShuffleMapStage) or clearing the finished partitions in an active job (for ResultStage).

## How was this patch tested?

* Add `RDDBarrierSuite` to ensure we convert RDDs correctly;
* Add new test cases in `DAGSchedulerSuite` to ensure we do task scheduling correctly;
* Add new test cases in `SparkContextSuite` to ensure the barrier execution mode actually works (both under local mode and local cluster mode).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark barrier-execution-mode

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21758.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21758

commit c25ec473ff078c071aec513953f56c64e6a228a4
Author: Xingbo Jiang
Date: 2018-07-12T17:38:58Z

    implement barrier execution mode.
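The barrier semantics described above (every task in a barrier stage reaches a global sync point before any task proceeds) can be sketched outside Spark with Python's `threading.Barrier`. This is an illustrative analogy only, not the proposed Spark API; the task count and the squaring work are made up.

```python
import threading

# Stand-in for a barrier stage with 4 tasks. threading.Barrier plays the role
# of BarrierTaskContext.barrier(): no thread passes wait() until all arrive.
NUM_TASKS = 4
barrier = threading.Barrier(NUM_TASKS)
results = []
lock = threading.Lock()

def barrier_task(task_id):
    # Phase 1: each task does some local work.
    local = task_id * task_id
    # Global sync point: all NUM_TASKS tasks must reach here before any continues.
    barrier.wait()
    # Phase 2 only starts once the whole "stage" has finished phase 1.
    with lock:
        results.append((task_id, local))

threads = [threading.Thread(target=barrier_task, args=(i,)) for i in range(NUM_TASKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```

The retry story in the PR is the part this analogy cannot show: in Spark, if one task fails mid-stage, the whole barrier stage is retried rather than the single task.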
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the external c...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20795 Thanks! Merged to master
[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21698 IIUC the output produced by `rdd1.zip(rdd2).map(v => (computeKey(v._1, v._2), computeValue(v._1, v._2)))` shall always have the same cardinality, no matter how many tasks are retried, so where is the data loss issue?
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21745 Merged build finished. Test PASSed.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21745 **[Test build #92961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92961/testReport)** for PR 21745 at commit [`9e00db9`](https://github.com/apache/spark/commit/9e00db938ddc6293899170e19b41530b22fb525a).
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21745 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/918/ Test PASSed.
[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21757 Merged build finished. Test FAILed.
[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21757 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92960/ Test FAILed.
[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21757

**[Test build #92960 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92960/testReport)** for PR 21757 at commit [`a5d72cc`](https://github.com/apache/spark/commit/a5d72cc2cc77da7d8fab0cfc4a48959b774c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21745: [SPARK-24781][SQL] Using a reference from Dataset...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21745#discussion_r202242420

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2387,4 +2387,25 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     val mapWithBinaryKey = map(lit(Array[Byte](1.toByte)), lit(1))
     checkAnswer(spark.range(1).select(mapWithBinaryKey.getItem(Array[Byte](1.toByte))), Row(1))
   }
+
+  test("SPARK-24781: Using a reference from Dataset in Filter/Sort might not work") {
+    val df = Seq(("test1", 0), ("test2", 1)).toDF("name", "id")
+    val filter1 = df.select(df("name")).filter(df("id") === 0)
+    val filter2 = df.select(col("name")).filter(col("id") === 0)
+    checkAnswer(filter1, filter2.collect())
+
+    val sort1 = df.select(df("name")).orderBy(df("id"))
+    val sort2 = df.select(col("name")).orderBy(col("id"))
+    checkAnswer(sort1, sort2.collect())
+
+    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {
--- End diff --

Will update it in next commit.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21745 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92958/ Test PASSed.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21745 Merged build finished. Test PASSed.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21745

**[Test build #92958 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92958/testReport)** for PR 21745 at commit [`a98f416`](https://github.com/apache/spark/commit/a98f4161c682b90755e9599a437241dcaeb388b5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r202240358

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -225,12 +316,44 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith:
   def createFilter(schema: MessageType, predicate: sources.Filter): Option[FilterPredicate] = {
     val nameToType = getFieldMap(schema)
+    def isDecimalMatched(value: Any, decimalMeta: DecimalMetadata): Boolean = value match {
+      case decimal: JBigDecimal =>
+        decimal.scale == decimalMeta.getScale
+      case _ => false
+    }
+
+    // Decimal type must make sure that filter value's scale matched the file.
+    // If doesn't matched, which would cause data corruption.
+    // Other types must make sure that filter value's type matched the file.
--- End diff --

I would say like .. Parquet's type in the given file should be matched to the value's type in the pushed filter in order to push down the filter to Parquet.
[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r202239380

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -225,12 +316,44 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith:
   def createFilter(schema: MessageType, predicate: sources.Filter): Option[FilterPredicate] = {
     val nameToType = getFieldMap(schema)
+    def isDecimalMatched(value: Any, decimalMeta: DecimalMetadata): Boolean = value match {
+      case decimal: JBigDecimal =>
+        decimal.scale == decimalMeta.getScale
+      case _ => false
+    }
+
+    // Decimal type must make sure that filter value's scale matched the file.
--- End diff --

Shall we leave this comment around the decimal `case`s below or around `isDecimalMatched`?
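The scale check being discussed can be expressed in a few lines of plain Python, using the standard `decimal` module. The helper `is_decimal_matched` below is hypothetical and only mirrors the intent of the PR's `isDecimalMatched`; the `file_scale` parameter stands in for Parquet's `DecimalMetadata.getScale`.

```python
from decimal import Decimal

def is_decimal_matched(value, file_scale):
    """Only push a decimal filter down when the literal's scale equals the
    scale declared in the file's metadata; comparing unscaled values with
    mismatched scales would give wrong results."""
    if not isinstance(value, Decimal):
        return False
    # Decimal("1.23").as_tuple().exponent == -2, so scale = -exponent.
    return -value.as_tuple().exponent == file_scale

# For a file column declared as DECIMAL(10, 2):
assert is_decimal_matched(Decimal("12.34"), 2)       # scales match: safe to push down
assert not is_decimal_matched(Decimal("12.345"), 2)  # scale 3 != 2: do not push down
assert not is_decimal_matched(12.34, 2)              # not a decimal literal at all
```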
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21556 @rdblue, so basically you mean that equality comparison and null-safe equality comparison are pushed down identically and should be distinguished; otherwise there could be a potential problem? If so, yup, I agree with it. I think we won't actually have a chance to push down an equality comparison or a null-safe equality comparison with an actual `null` value from the optimizer. However, sure, I think we shouldn't rely on that. I think we should actually disallow either null-safe equality comparison or equality comparison with `null` in `ParquetFilters`. Thing is, I remember checking a long time ago (a few years back) that Parquet's equality comparison API is itself actually null-safe; this of course should be double-checked. Since this PR doesn't change the existing behaviour on this, and it looks like it needs some more investigation (e.g., checking whether what I remembered about Parquet's equality comparison is still true), it is probably okay to leave it as is.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21745 LGTM
[GitHub] spark pull request #21745: [SPARK-24781][SQL] Using a reference from Dataset...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21745#discussion_r202236189

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2387,4 +2387,25 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     val mapWithBinaryKey = map(lit(Array[Byte](1.toByte)), lit(1))
     checkAnswer(spark.range(1).select(mapWithBinaryKey.getItem(Array[Byte](1.toByte))), Row(1))
   }
+
+  test("SPARK-24781: Using a reference from Dataset in Filter/Sort might not work") {
+    val df = Seq(("test1", 0), ("test2", 1)).toDF("name", "id")
+    val filter1 = df.select(df("name")).filter(df("id") === 0)
+    val filter2 = df.select(col("name")).filter(col("id") === 0)
+    checkAnswer(filter1, filter2.collect())
+
+    val sort1 = df.select(df("name")).orderBy(df("id"))
+    val sort2 = df.select(col("name")).orderBy(col("id"))
+    checkAnswer(sort1, sort2.collect())
+
+    withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {
--- End diff --

This test case should be split into two.
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the external c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20795 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92954/ Test PASSed.
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the external c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20795 Merged build finished. Test PASSed.
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the external c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20795

**[Test build #92954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92954/testReport)** for PR 20795 at commit [`26f2f54`](https://github.com/apache/spark/commit/26f2f540d30f2e87405489513220468e7708742b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21645: [SPARK-24537][R]Add array_remove / array_zip / map_from_...
Github user huaxingao commented on the issue: https://github.com/apache/spark/pull/21645 Thanks! @HyukjinKwon @felixcheung
[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...
Github user yuanboliu commented on the issue: https://github.com/apache/spark/pull/21690 After applying this patch, my application runs successfully. This issue can happen when many topics (hundreds of them) are consumed.
[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...
Github user xdcjie commented on the issue: https://github.com/apache/spark/pull/21447 retest this please
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the external c...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20795 LGTM
[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the ext...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20795#discussion_r202231590

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1204,16 +1207,46 @@ class Analyzer(
    * only performs simple existence check according to the function identifier to quickly identify
    * undefined functions without triggering relation resolution, which may incur potentially
    * expensive partition/schema discovery process in some cases.
-   *
+   * In order to avoid duplicate external functions lookup, the external function identifier will
+   * store in the local hash set externalFunctionNameSet.
    * @see [[ResolveFunctions]]
    * @see https://issues.apache.org/jira/browse/SPARK-19737
    */
  object LookupFunctions extends Rule[LogicalPlan] {
-    override def apply(plan: LogicalPlan): LogicalPlan = plan.transformAllExpressions {
-      case f: UnresolvedFunction if !catalog.functionExists(f.name) =>
-        withPosition(f) {
-          throw new NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName)
-        }
+    override def apply(plan: LogicalPlan): LogicalPlan = {
+      val externalFunctionNameSet = new mutable.HashSet[FunctionIdentifier]()
+      plan.transformAllExpressions {
+        case f: UnresolvedFunction
+            if externalFunctionNameSet.contains(normalizeFuncName(f.name)) => f
+        case f: UnresolvedFunction if catalog.isRegisteredFunction(f.name) => f
+        case f: UnresolvedFunction if catalog.isPersistentFunction(f.name) =>
+          externalFunctionNameSet.add(normalizeFuncName(f.name))
+          f
+        case f: UnresolvedFunction =>
+          withPosition(f) {
+            throw new NoSuchFunctionException(f.name.database.getOrElse(catalog.getCurrentDatabase),
+              f.name.funcName)
+          }
+      }
+    }
+
+    def normalizeFuncName(name: FunctionIdentifier): FunctionIdentifier = {
--- End diff --

This is a common utility function. We can refactor the code later.
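The caching pattern in the diff above (look a function up in the external catalog at most once per traversal, remembering normalized identifiers in a local set) can be sketched in plain Python. All names here (`FakeCatalog`, `lookup_functions`) are made up for illustration, and lower-casing stands in for `normalizeFuncName`.

```python
class FakeCatalog:
    """Toy stand-in for Spark's SessionCatalog, counting expensive lookups."""
    def __init__(self, registered, persistent):
        self.registered = set(registered)    # built-in / temp functions
        self.persistent = set(persistent)    # functions in the external catalog
        self.external_lookups = 0            # counts metastore round trips

    def is_registered_function(self, name):
        return name in self.registered

    def is_persistent_function(self, name):
        self.external_lookups += 1           # the expensive call we want to cache
        return name in self.persistent

def lookup_functions(catalog, names):
    seen_external = set()                    # mirrors externalFunctionNameSet
    for name in names:
        name = name.lower()                  # mirrors normalizeFuncName
        if name in seen_external:
            continue                         # already verified: skip the lookup
        if catalog.is_registered_function(name):
            continue
        if catalog.is_persistent_function(name):
            seen_external.add(name)
        else:
            raise LookupError(f"NoSuchFunction: {name}")

catalog = FakeCatalog(registered={"upper"}, persistent={"my_udf"})
# "my_udf" appears three times (in varying case) but triggers one external lookup.
lookup_functions(catalog, ["upper", "MY_UDF", "my_udf", "My_Udf"])
print(catalog.external_lookups)  # 1
```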
[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21757 **[Test build #92960 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92960/testReport)** for PR 21757 at commit [`a5d72cc`](https://github.com/apache/spark/commit/a5d72cc2cc77da7d8fab0cfc4a48959b774c).
[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21757 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/917/ Test PASSed.
[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21757 Merged build finished. Test PASSed.
[GitHub] spark pull request #21757: [SQL][SPARK-24797] respect spark.sql.hive.convert...
GitHub user CodingCat opened a pull request: https://github.com/apache/spark/pull/21757

[SQL][SPARK-24797] respect spark.sql.hive.convertMetastoreOrc/Parquet when build…

## What changes were proposed in this pull request?

The current code path ignores the value of spark.sql.hive.convertMetastoreParquet when building a data source table: https://github.com/apache/spark/blob/e0559f238009e02c40f65678fec691c07904e8c0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L263

As a result, even if I turned off spark.sql.hive.convertMetastoreParquet, Spark SQL still uses its own parquet reader to access the table instead of delegating to the serde.

This PR checks the value of the configuration when building the data source table.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/CodingCat/spark SPARK-24797

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21757.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21757

commit a5d72cc2cc77da7d8fab0cfc4a48959b774c
Author: Nan Zhu
Date: 2018-07-13T02:44:25Z

    respect spark.sql.hive.convertMetastoreOrc/Parquet when building the data source table
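A minimal sketch of the decision this PR argues about, in plain Python with hypothetical names and a made-up `table.serde.format` key (this is not Spark's actual code path): native data source tables always use the native reader, while Hive serde tables consult the convertMetastore confs. The latter half is the behavior the PR wants respected; the first branch reflects the reviewer's objection that native tables are unaffected by the conf.

```python
def choose_reader(provider, conf):
    """Pick a reader for a table. `provider` is the table's data source
    provider ("parquet", "orc", or "hive"); `conf` is a plain dict of
    session configs. Hypothetical logic for illustration only."""
    # Native data source tables (created by Spark with provider parquet/orc)
    # always use the native reader; the conf only governs Hive serde tables.
    if provider != "hive":
        return "native"
    serde_format = conf.get("table.serde.format")  # hypothetical key: "parquet" or "orc"
    if serde_format == "parquet" and conf.get("spark.sql.hive.convertMetastoreParquet", True):
        return "native"
    if serde_format == "orc" and conf.get("spark.sql.hive.convertMetastoreOrc", True):
        return "native"
    return "hive-serde"

# Conf turned off: a Hive parquet table should fall back to the serde.
conf = {"table.serde.format": "parquet", "spark.sql.hive.convertMetastoreParquet": False}
print(choose_reader("hive", conf))   # hive-serde
# A native data source table ignores the conf entirely.
print(choose_reader("parquet", {}))  # native
```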
[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/21757 @felixcheung
[GitHub] spark pull request #21645: [SPARK-24537][R]Add array_remove / array_zip / ma...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21645
[GitHub] spark issue #21645: [SPARK-24537][R]Add array_remove / array_zip / map_from_...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21645 Merged to master.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748

**[Test build #92959 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92959/testReport)** for PR 21748 at commit [`88a9d7f`](https://github.com/apache/spark/commit/88a9d7fa94e17e55f8e28d8922cff759625b1e42).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92959/ Test PASSed.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test PASSed.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/916/ Test FAILed.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/916/
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test FAILed.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/916/
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92959 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92959/testReport)** for PR 21748 at commit [`88a9d7f`](https://github.com/apache/spark/commit/88a9d7fa94e17e55f8e28d8922cff759625b1e42).
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21748 test this please
[GitHub] spark pull request #21263: [SPARK-24084][ThriftServer] Add job group id for ...
Github user caneGuy closed the pull request at: https://github.com/apache/spark/pull/21263
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21745 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21698 > Given this, there is no ambiguity in cardinality of zip().map() ... which two tuples from rdd1 and rdd2 get zip'ed together can be arbitrary : and I agree about that. Yes, but the following `.groupByKey().map()` does have ambiguity in cardinality, because the tuples that get zipped together can be arbitrary, doesn't it?
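The nondeterminism being debated above can be illustrated without Spark at all. The following plain-Scala sketch (hypothetical, not taken from the PR) simulates what happens when the ordering of one side of a `zip()` changes between computations, as it can when an upstream shuffle stage is recomputed after a fetch failure: different pairs get zipped, so a downstream `groupByKey()` sees a different key-to-values assignment.

```scala
// Hypothetical illustration (plain Scala collections, no Spark): if the element
// order of one side of zip() differs between two computations of the same data,
// different pairs are formed, and grouping by key then yields different groups.
object ZipAmbiguity {
  // Zip keys with values, then group the values by key.
  def zipThenGroup(left: Seq[Int], right: Seq[String]): Map[Int, Seq[String]] =
    left.zip(right).groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }

  def main(args: Array[String]): Unit = {
    val keys = Seq(1, 2, 2)
    // Original computation: values arrive in order "a", "b", "c".
    val firstRun = zipThenGroup(keys, Seq("a", "b", "c"))
    // Recomputed upstream stage: same values, different order.
    val secondRun = zipThenGroup(keys, Seq("b", "a", "c"))
    // Key 1 is paired with "a" in one run and "b" in the other, so any
    // downstream map/filter over the groups can produce different results.
    println(firstRun)
    println(secondRun)
  }
}
```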
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92949/ Test PASSed.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21608 Merged build finished. Test PASSed.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21608 **[Test build #92949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92949/testReport)** for PR 21608 at commit [`98ee81b`](https://github.com/apache/spark/commit/98ee81ba8581e57ff0bc098d0b05254cf72adada). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21745 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/915/ Test PASSed.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21745 Merged build finished. Test PASSed.
[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21745 **[Test build #92958 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92958/testReport)** for PR 21745 at commit [`a98f416`](https://github.com/apache/spark/commit/a98f4161c682b90755e9599a437241dcaeb388b5).
[GitHub] spark pull request #21745: [SPARK-24781][SQL] Using a reference from Dataset...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21745#discussion_r202217276 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1165,15 +1173,19 @@ class Analyzer( (newExprs, AnalysisBarrier(newChild)) case p: Project => +// Resolving expressions against current plan. val maybeResolvedExprs = exprs.map(resolveExpression(_, p)) +// Recursively resolving expressions on the child of current plan. val (newExprs, newChild) = resolveExprsAndAddMissingAttrs(maybeResolvedExprs, p.child) -val missingAttrs = AttributeSet(newExprs) -- AttributeSet(maybeResolvedExprs) +// If some attributes used by expressions are resolvable only on the rewritten child +// plan, we need to add them into original projection. +val missingAttrs = (AttributeSet(newExprs) -- p.outputSet).intersect(newChild.outputSet) --- End diff -- Without this `intersect`, some tests fail, e.g., `group-analytics.sql` in `SQLQueryTestSuite`. Some attributes are resolved on parent plans, not on child plans. We can't add them as missing attributes here.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21583 Merged build finished. Test PASSed.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21583 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92957/ Test PASSed.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21583 **[Test build #92957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92957/testReport)** for PR 21583 at commit [`c0b5927`](https://github.com/apache/spark/commit/c0b5927ec80853403a129e15fded372a9170a0db). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21583 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/914/ Test FAILed.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21583 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/914/
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21583 Merged build finished. Test FAILed.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21583 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/914/
[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21556#discussion_r202214356 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -292,120 +292,120 @@ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz Select 1 decimal(9, 2) row (value = 7864320): Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative -Parquet Vectorized3785 / 3867 4.2 240.6 1.0X -Parquet Vectorized (Pushdown) 3820 / 3928 4.1 242.9 1.0X -Native ORC Vectorized 3981 / 4049 4.0 253.1 1.0X -Native ORC Vectorized (Pushdown) 702 / 735 22.4 44.6 5.4X +Parquet Vectorized4407 / 4852 3.6 280.2 1.0X +Parquet Vectorized (Pushdown) 1602 / 1634 9.8 101.8 2.8X --- End diff -- Here is a test: ```scala // decimal(9, 2) max values is 999.99 // 1024 * 1024 * 15 = 15728640 val path = "/tmp/spark/parquet" spark.range(1024 * 1024 * 15).selectExpr("cast((id) as decimal(9, 2)) as id").orderBy("id").write.mode("overwrite").parquet(path) ``` The generated parquet metadata: ```shell $ java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar meta /tmp/spark/parquet file: file:/tmp/spark/parquet/part-0-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet creator: parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a) extra: org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]} file schema: spark_schema id: OPTIONAL INT32 O:DECIMAL R:0 D:1 row group 1: RC:5728640 TS:36 OFFSET:4 id: INT32 SNAPPY DO:0 FPO:4 SZ:38/36/0.95 VC:5728640 ENC:PLAIN,BIT_PACKED,RLE ST:[no stats for this column] file: file:/tmp/spark/parquet/part-1-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet creator: parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a) extra: org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]} file schema: spark_schema id: OPTIONAL INT32 O:DECIMAL 
R:0 D:1 row group 1: RC:651016 TS:2604209 OFFSET:4 id: INT32 SNAPPY DO:0 FPO:4 SZ:2604325/2604209/1.00 VC:651016 ENC:PLAIN,BIT_PACKED,RLE ST:[min: 0.00, max: 651015.00, num_nulls: 0] file: file:/tmp/spark/parquet/part-2-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet creator: parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a) extra: org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]} file schema: spark_schema id: OPTIONAL INT32 O:DECIMAL R:0 D:1 row group 1: RC:3231146 TS:12925219 OFFSET:4 id: INT32 SNAPPY DO:0 FPO:4 SZ:12925864/12925219/1.00 VC:3231146 ENC:PLAIN,BIT_PACKED,RLE ST:[min: 651016.00, max: 3882161.00, num_nulls: 0] file: file:/tmp/spark/parquet/part-3-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet creator: parquet-mr version 1.10.0 (build 031a6654009e3b82020012a18434c582bd74c73a) extra: org.apache.spark.sql.parquet.row.metadata = {"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]} file schema: spark_schema id: OPTIONAL INT32 O:DECIMAL R:0 D:1 row group 1: RC:2887956 TS:11552408 OFFSET:4 id: INT32 SNAPPY DO:0 FPO:4 SZ:11552986/11552408/1.00 VC:2887956 ENC:PLAIN,BIT_PACKED,RLE ST:[min: 3882162.00, max: 6770117.00, num_nulls: 0] file: file:/tmp/spark/parquet/part-4-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet creator:
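The parquet-tools output above shows why pushdown helps: each row group carries `ST:[min: ..., max: ...]` statistics (except the first, which reports `no stats for this column`). The following hypothetical Scala sketch, not taken from Spark's actual reader, shows the pruning rule a reader can apply with those stats: skip a row group whose [min, max] range excludes the filter value, and always scan a row group that has no statistics.

```scala
// Hypothetical sketch of row-group pruning with column statistics. For a filter
// like `value = 7864320.00`, a row group can be skipped when its [min, max]
// range cannot contain the value; a group without stats must always be read.
object RowGroupPruning {
  final case class Stats(min: BigDecimal, max: BigDecimal)

  // Returns true when the row group may contain the value and must be scanned.
  def mustScan(stats: Option[Stats], value: BigDecimal): Boolean = stats match {
    case None => true // e.g. "ST:[no stats for this column]" above
    case Some(Stats(lo, hi)) => lo <= value && value <= hi
  }
}
```

With the sorted output above, only the one row group whose range covers 7864320.00 (plus the stats-less group) needs scanning, which is consistent with the large pushdown speedup in the benchmark.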
[GitHub] spark issue #21750: [SPARK-24754][ML] Minhash integer overflow
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21750 Merged build finished. Test PASSed.
[GitHub] spark issue #21750: [SPARK-24754][ML] Minhash integer overflow
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21750 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92953/ Test PASSed.
[GitHub] spark issue #21750: [SPARK-24754][ML] Minhash integer overflow
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21750 **[Test build #92953 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92953/testReport)** for PR 21750 at commit [`55f70ee`](https://github.com/apache/spark/commit/55f70ee3ee146a41c6f89121c2544959302cd79d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21583 Merged build finished. Test PASSed.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21583 **[Test build #92956 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92956/testReport)** for PR 21583 at commit [`c0b5927`](https://github.com/apache/spark/commit/c0b5927ec80853403a129e15fded372a9170a0db). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21583 **[Test build #92957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92957/testReport)** for PR 21583 at commit [`c0b5927`](https://github.com/apache/spark/commit/c0b5927ec80853403a129e15fded372a9170a0db).
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21583 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92956/ Test PASSed.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21583 test this please
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21583 Merged build finished. Test FAILed.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21583 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/913/ Test FAILed.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21583 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/913/
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21583 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/913/
[GitHub] spark pull request #21753: [SPARK-24790][SQL] Allow complex aggregate expres...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21753
[GitHub] spark pull request #21753: [SPARK-24790][SQL] Allow complex aggregate expres...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21753#discussion_r202211733 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -586,12 +581,17 @@ class Analyzer( } } -private def isAggregateExpression(expr: Expression): Boolean = { - expr match { -case Alias(e, _) => isAggregateExpression(e) -case AggregateExpression(_, _, _, _) => true -case _ => false - } +// Support any aggregate expression that can appear in an Aggregate plan except Pandas UDF. +// TODO: Support Pandas UDF. +private def checkValidAggregateExpression(expr: Expression): Unit = expr match { + case _: AggregateExpression => // OK and leave the argument check to CheckAnalysis. + case expr: PythonUDF if PythonUDF.isGroupedAggPandasUDF(expr) => --- End diff -- I created a JIRA for this support https://issues.apache.org/jira/browse/SPARK-24796
[GitHub] spark issue #21753: [SPARK-24790][SQL] Allow complex aggregate expressions i...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21753 LGTM Thanks! Merged to master.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21583 **[Test build #92956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92956/testReport)** for PR 21583 at commit [`c0b5927`](https://github.com/apache/spark/commit/c0b5927ec80853403a129e15fded372a9170a0db).
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21583 test this please
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21583 Merged build finished. Test PASSed.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21583 **[Test build #92955 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92955/testReport)** for PR 21583 at commit [`c0b5927`](https://github.com/apache/spark/commit/c0b5927ec80853403a129e15fded372a9170a0db). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21583 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92955/ Test PASSed.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21583 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/912/
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21583 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/912/ Test FAILed.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21583 Merged build finished. Test FAILed.
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21583 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/912/
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Merged build finished. Test FAILed.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92947/ Test FAILed.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21102 **[Test build #92947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92947/testReport)** for PR 21102 at commit [`7d789e2`](https://github.com/apache/spark/commit/7d789e221dd6c6d4d7176dcec87a867ec5386a60). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21163 Merged build finished. Test PASSed.
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/21556 @wangyum, can you explain what was happening with the `decimal(9,2)` benchmark more clearly? I asked additional questions, but the thread is on a line that changed so it's collapsed by default. Also, `valueCanMakeFilterOn` returns true for all null values, so I think we still have a problem there. Conversion from EqualNullSafe needs to support null filter values.
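The EqualNullSafe point above can be sketched in a few lines. This hypothetical Scala fragment (the `PushedFilter` ADT and `convertEqualNullSafe` are illustrative names, not Spark's actual ParquetFilters code) shows the null branch such a conversion needs: null-safe equality (`<=>`) against a null literal should translate to an IS NULL predicate instead of being dropped or mis-translated.

```scala
// Hypothetical sketch of converting a null-safe equality filter for pushdown.
// The key case is the null value: `col <=> null` means "col is null".
object NullSafeConversion {
  sealed trait PushedFilter
  final case class IsNull(col: String) extends PushedFilter
  final case class Eq(col: String, value: Any) extends PushedFilter

  def convertEqualNullSafe(col: String, value: Any): PushedFilter =
    if (value == null) IsNull(col) // null-safe equality with null => IS NULL
    else Eq(col, value)            // otherwise an ordinary equality predicate
}
```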
[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21163 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92950/ Test PASSed.