[GitHub] spark pull request #16565: [SPARK-17237][SPARK-17458][SQL][Backport-2.0] Pre...
Github user maropu closed the pull request at: https://github.com/apache/spark/pull/16565 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16565: [SPARK-17237][SPARK-17458][SQL][Backport-2.0] Preserve a...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16565 Okay and thanks!
[GitHub] spark issue #16565: [SPARK-17237][SPARK-17458][SQL][Backport-2.0] Preserve a...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16565 Thanks! Merging to 2.0. Could you please close it?
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16578 Does this take over https://github.com/apache/spark/pull/14957? If so, we might need `Closes #14957` in the PR description for the merge script to close that one, or let the author know this takes over that.
[GitHub] spark issue #16517: [SPARK-18243][SQL] Port Hive writing to use FileFormat i...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16517 Left a few comments. I am not 100% sure whether `HiveFileFormat` can completely replace the existing writer containers, but the other changes look good to me.
[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16517#discussion_r96131580 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -99,7 +99,7 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String) } private def getFilename(taskContext: TaskAttemptContext, ext: String): String = { -// The file name looks like part-r-0-2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb_3.gz.parquet +// The file name looks like part-0-2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb_3.gz.parquet --- End diff -- After more reading, the existing Hive table writers do not have such an issue. They are based on a unique [`TaskAttemptID`](https://hadoop.apache.org/docs/r2.6.3/api/org/apache/hadoop/mapreduce/TaskAttemptID.html), which is generated by the call to `FileOutputFormat.getTaskOutputPath`.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/16578 Maybe we also want to get feedback from @liancheng?
[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Reduce shuffled data size of Bl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15730 **[Test build #71390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71390/testReport)** for PR 15730 at commit [`55dabe0`](https://github.com/apache/spark/commit/55dabe07e655c44b07634b102f20db74d2107ad4).
[GitHub] spark pull request #15730: [SPARK-18218][ML][MLLib] Reduce shuffled data siz...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/15730#discussion_r96131333 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala --- @@ -459,14 +464,155 @@ class BlockMatrix @Since("1.3.0") ( */ @Since("1.3.0") def multiply(other: BlockMatrix): BlockMatrix = { +multiply(other, 1) + } + + /** + * Left multiplies this [[BlockMatrix]] to `other`, another [[BlockMatrix]]. This method add --- End diff -- All right, thanks!
[GitHub] spark issue #16553: [SPARK-9435][SQL] Reuse function in Java UDF to correctl...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16553 LGTM. cc @marmbrus for final sign off
[GitHub] spark pull request #16553: [SPARK-9435][SQL] Reuse function in Java UDF to c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16553#discussion_r96131308 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -109,9 +109,10 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends | * @since 1.3.0 | */ |def register(name: String, f: UDF$i[$extTypeArgs, _], returnType: DataType): Unit = { + | val func = f$anyCast.call($anyParams) | functionRegistry.registerFunction( |name, - |(e: Seq[Expression]) => ScalaUDF(f$anyCast.call($anyParams), returnType, e)) + |(e: Seq[Expression]) => ScalaUDF(func, returnType, e)) --- End diff -- I can confirm they are the same.
[GitHub] spark pull request #16576: [SPARK-19215] Add necessary check for `RDD.checkp...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16576#discussion_r96130960 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1539,6 +1539,9 @@ abstract class RDD[T: ClassTag]( // NOTE: we use a global lock here due to complexities downstream with ensuring // children RDD partitions point to the correct parent partitions. In the future // we should revisit this consideration. +if (doCheckpointCalled) { --- End diff -- `doCheckpointCalled && !isCheckpointed` If it has already been checkpointed successfully, then it is fine.
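The guard mridulm suggests can be sketched as a minimal state machine. This is illustrative only, not Spark's actual `RDD` internals; the class and field names merely mirror the two flags discussed in the review:

```scala
// Illustrative sketch of the suggested guard (not Spark's real RDD code):
// warn only when a job has already run on the RDD (doCheckpointCalled)
// AND no checkpoint has succeeded yet (!isCheckpointed). An RDD that was
// already checkpointed successfully needs no warning.
class CheckpointState {
  var doCheckpointCalled = false
  var isCheckpointed = false

  def shouldWarn: Boolean = doCheckpointCalled && !isCheckpointed
}
```

With this refinement, calling `checkpoint()` again on an RDD whose checkpoint already succeeded stays silent, while a late `checkpoint()` after a job ran (but before any checkpoint completed) triggers the warning.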
[GitHub] spark pull request #16576: [SPARK-19215] Add necessary check for `RDD.checkp...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16576#discussion_r96131251 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1726,6 +1729,10 @@ abstract class RDD[T: ClassTag]( // checkpoint ourselves dependencies.foreach(_.rdd.doCheckpoint()) } + if (storageLevel == StorageLevel.NONE) { +logInfo(s"do checkpoint on unpersisted RDD ${id}, it will cause RDD recomputation" + + " when saving checkpoint files.") --- End diff -- Can you please change the message to something like: `s"checkpoint on unpersisted RDD $this will result in recomputation"`
[GitHub] spark pull request #16576: [SPARK-19215] Add necessary check for `RDD.checkp...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16576#discussion_r96131096 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1539,6 +1539,9 @@ abstract class RDD[T: ClassTag]( // NOTE: we use a global lock here due to complexities downstream with ensuring // children RDD partitions point to the correct parent partitions. In the future // we should revisit this consideration. +if (doCheckpointCalled) { + logWarning(s"Because job has been executed on RDD ${id}, checkpoint won't work") --- End diff -- We have to decide whether we simply log a message and ignore it (if it is a design choice) or whether we need to fix this. From git, @mateiz and @zsxwing were the last people to work on this - it would be great to hear from them: this might be a variant of SPARK-6847, which @zsxwing fixed.
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14038 **[Test build #71389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71389/testReport)** for PR 14038 at commit [`d08ff73`](https://github.com/apache/spark/commit/d08ff737615775fbdb1df4da932d6f8f1230080c).
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14038 Merged build finished. Test PASSed.
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14038 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71388/
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14038 **[Test build #71388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71388/testReport)** for PR 14038 at commit [`85b0f61`](https://github.com/apache/spark/commit/85b0f6137472504b1bd2d982b9155128c0bfbeaf). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class PathFilter extends Serializable ` * `class MetadataLogFileIndex(sparkSession: SparkSession, path: Path, pathFilter: PathFilter)`
[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16517#discussion_r96130318 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -86,6 +86,42 @@ class DetermineHiveSerde(conf: SQLConf) extends Rule[LogicalPlan] { } } +class HiveAnalysis(session: SparkSession) extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { +case InsertIntoTable(table: MetastoreRelation, partSpec, query, overwrite, ifNotExists) +if hasBeenPreprocessed(table.output, table.partitionKeys.toStructType, partSpec, query) => + InsertIntoHiveTable(table, partSpec, query, overwrite, ifNotExists) + +case CreateTable(tableDesc, mode, Some(query)) if DDLUtils.isHiveTable(tableDesc) => + // Currently we will never hit this branch, as SQL string API can only use `Ignore` or + // `ErrorIfExists` mode, and `DataFrameWriter.saveAsTable` doesn't support hive serde + // tables yet. + if (mode == SaveMode.Append || mode == SaveMode.Overwrite) { +throw new AnalysisException( + "CTAS for hive serde tables does not support append or overwrite semantics.") + } --- End diff -- The code above needs to be merged with the latest master.
[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16517#discussion_r96130310 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -86,6 +86,42 @@ class DetermineHiveSerde(conf: SQLConf) extends Rule[LogicalPlan] { } } +class HiveAnalysis(session: SparkSession) extends Rule[LogicalPlan] { + override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { +case InsertIntoTable(table: MetastoreRelation, partSpec, query, overwrite, ifNotExists) +if hasBeenPreprocessed(table.output, table.partitionKeys.toStructType, partSpec, query) => + InsertIntoHiveTable(table, partSpec, query, overwrite, ifNotExists) + +case CreateTable(tableDesc, mode, Some(query)) if DDLUtils.isHiveTable(tableDesc) => + // Currently we will never hit this branch, as SQL string API can only use `Ignore` or + // `ErrorIfExists` mode, and `DataFrameWriter.saveAsTable` doesn't support hive serde + // tables yet. + if (mode == SaveMode.Append || mode == SaveMode.Overwrite) { +throw new AnalysisException( + "CTAS for hive serde tables does not support append or overwrite semantics.") + } + + val dbName = tableDesc.identifier.database.getOrElse(session.catalog.currentDatabase) + CreateHiveTableAsSelectCommand( +tableDesc.copy(identifier = tableDesc.identifier.copy(database = Some(dbName))), +query, +mode == SaveMode.Ignore) + } + + private def hasBeenPreprocessed( --- End diff -- Also add a code comment for this func? ``` /** * Returns true if the [[InsertIntoTable]] plan has already been preprocessed by analyzer rule * [[PreprocessTableInsertion]]. It is important that this rule([[HiveAnalysis]]) has to * be run after [[PreprocessTableInsertion]], to normalize the column names in partition spec and * fix the schema mismatch by adding Cast. */ ```
[GitHub] spark pull request #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.n...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16429#discussion_r96130191 --- Diff: python/pyspark/serializers.py --- @@ -382,18 +382,30 @@ def _hijack_namedtuple(): return global _old_namedtuple # or it will put in closure +global _old_namedtuple_kwdefaults # or it will put in closure too def _copy_func(f): return types.FunctionType(f.__code__, f.__globals__, f.__name__, f.__defaults__, f.__closure__) +def _kwdefaults(f): +kargs = getattr(f, "__kwdefaults__", None) --- End diff -- Could you put this comment into code?
[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16517#discussion_r96130080 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -99,7 +99,7 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String) } private def getFilename(taskContext: TaskAttemptContext, ext: String): String = { -// The file name looks like part-r-0-2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb_3.gz.parquet +// The file name looks like part-0-2dd664f9-d2c4-4ffe-878f-c6c70c1fb0cb_3.gz.parquet --- End diff -- The ext string always starts with `c`. Below is an example I got from a test case: `part-0-fd8f3fdd-653a-4ea0-ab6d-5c8ad610b184-c000.snappy.parquet`
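The naming scheme under discussion can be sketched roughly as follows. The helper and its parameter list are illustrative: the real `getFilename` lives in `HadoopMapReduceCommitProtocol` and derives the split number from a `TaskAttemptContext`, which is not reproduced here:

```scala
// Illustrative sketch of the part-file naming pattern quoted above.
// split: the task's partition index; jobId: the job UUID;
// ext: the writer-supplied suffix, which starts with "c" plus a file
// counter, e.g. "c000.snappy.parquet".
def partFileName(split: Int, jobId: String, ext: String): String =
  s"part-$split-$jobId-$ext"

val name = partFileName(0, "fd8f3fdd-653a-4ea0-ab6d-5c8ad610b184", "c000.snappy.parquet")
// name == "part-0-fd8f3fdd-653a-4ea0-ab6d-5c8ad610b184-c000.snappy.parquet"
```

Putting the per-file counter in `ext` rather than in the base name is what the review comment is pointing at: the committer only guarantees uniqueness of the `part-<split>-<jobId>` prefix, and the `c`-prefixed counter distinguishes multiple files from one task.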
[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16585 ping @rxin @cloud-fan
[GitHub] spark issue #16395: [SPARK-17075][SQL] implemented filter estimation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16395 Merged build finished. Test PASSed.
[GitHub] spark issue #16395: [SPARK-17075][SQL] implemented filter estimation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16395 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71387/
[GitHub] spark issue #16395: [SPARK-17075][SQL] implemented filter estimation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16395 **[Test build #71387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71387/testReport)** for PR 16395 at commit [`f83c713`](https://github.com/apache/spark/commit/f83c713ced6a4473f86fd4d35c586732ca6fd4a3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14038 **[Test build #71388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71388/testReport)** for PR 14038 at commit [`85b0f61`](https://github.com/apache/spark/commit/85b0f6137472504b1bd2d982b9155128c0bfbeaf).
[GitHub] spark issue #16410: [SPARK-19005][SQL] Keep column ordering when a schema is...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16410 Merged build finished. Test FAILed.
[GitHub] spark issue #16410: [SPARK-19005][SQL] Keep column ordering when a schema is...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16410 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71385/
[GitHub] spark issue #16410: [SPARK-19005][SQL] Keep column ordering when a schema is...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16410 **[Test build #71385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71385/testReport)** for PR 16410 at commit [`e3e095a`](https://github.com/apache/spark/commit/e3e095a7821de3c1e4da44280fe2c12f6a36d3f3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14038 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71384/
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14038 Merged build finished. Test FAILed.
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14038 **[Test build #71384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71384/testReport)** for PR 14038 at commit [`4e3628b`](https://github.com/apache/spark/commit/4e3628b8e6695793e69b9dc81647e18728c4f751). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class PathFilter extends Serializable ` * `class MetadataLogFileIndex(sparkSession: SparkSession, path: Path, pathFilter: PathFilter)`
[GitHub] spark issue #16565: [SPARK-17237][SPARK-17458][SQL][Backport-2.0] Preserve a...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16565 @gatorsmile How about this fix? plz check this again?
[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16517#discussion_r96128666

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -69,34 +69,31 @@ import org.apache.spark.util.SerializableJobConf
  * {{{
  * Map('a' -> Some('1'), 'b' -> None)
  * }}}.
- * @param child the logical plan representing data to write to.
+ * @param query the logical plan representing data to write to.
  * @param overwrite overwrite existing table or partitions.
  * @param ifNotExists If true, only write if the table or partition does not exist.
  */
 case class InsertIntoHiveTable(
     table: MetastoreRelation,
     partition: Map[String, Option[String]],
-    child: SparkPlan,
+    query: LogicalPlan,
     overwrite: Boolean,
-    ifNotExists: Boolean) extends UnaryExecNode {
+    ifNotExists: Boolean) extends RunnableCommand {

-  @transient private val sessionState = sqlContext.sessionState.asInstanceOf[HiveSessionState]
-  @transient private val externalCatalog = sqlContext.sharedState.externalCatalog
+  override protected def innerChildren: Seq[LogicalPlan] = query :: Nil
--- End diff --

+1 Let me see whether we can add such a test case to hit the bug without it.
[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16565 okay! I'll update them
[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16565 I think it is fine to do it together. Basically, your PR is to fix the bug of https://github.com/apache/spark/pull/15111
[GitHub] spark issue #16395: [SPARK-17075][SQL] implemented filter estimation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16395 **[Test build #71387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71387/testReport)** for PR 16395 at commit [`f83c713`](https://github.com/apache/spark/commit/f83c713ced6a4473f86fd4d35c586732ca6fd4a3).
[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16429 gentle ping
[GitHub] spark issue #16395: [SPARK-17075][SQL] implemented filter estimation
Github user ron8hu commented on the issue: https://github.com/apache/spark/pull/16395 cc @rxin @wzhfy Have updated code. Please review again. Thanks.
[GitHub] spark issue #16395: [SPARK-17075][SQL] implemented filter estimation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16395 Merged build finished. Test FAILed.
[GitHub] spark issue #16395: [SPARK-17075][SQL] implemented filter estimation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16395 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71386/ Test FAILed.
[GitHub] spark issue #16553: [SPARK-9435][SQL] Reuse function in Java UDF to correctl...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16553 @marmbrus, could you take another look when you have some time?
[GitHub] spark issue #16395: [SPARK-17075][SQL] implemented filter estimation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16395 **[Test build #71386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71386/testReport)** for PR 16395 at commit [`65c9635`](https://github.com/apache/spark/commit/65c9635e065406aa77e7681f7385fecd1444f8b5).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16395: [SPARK-17075][SQL] implemented filter estimation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16395 **[Test build #71386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71386/testReport)** for PR 16395 at commit [`65c9635`](https://github.com/apache/spark/commit/65c9635e065406aa77e7681f7385fecd1444f8b5).
[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r96128057

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/Range.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical.statsEstimation
+
+import java.math.{BigDecimal => JDecimal}
+import java.sql.{Date, Timestamp}
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.types.{BooleanType, DateType, TimestampType, _}
+
+/** Value range of a column. */
+trait Range
+
+/** For simplicity we use decimal to unify operations of numeric ranges. */
+case class NumericRange(min: JDecimal, max: JDecimal) extends Range
+
+/**
+ * This version of Spark does not have min/max for binary/string types, we define their default
+ * behaviors by this class.
+ */
+class DefaultRange extends Range
+
+/** This is for columns with only null values. */
+class NullRange extends Range
+
+object Range {
+  def apply(min: Option[Any], max: Option[Any], dataType: DataType): Range = dataType match {
+    case StringType | BinaryType => new DefaultRange()
+    case _ if min.isEmpty || max.isEmpty => new NullRange()
+    case _ => toNumericRange(min.get, max.get, dataType)
+  }
+
+  /**
+   * For simplicity we use decimal to unify operations of numeric types, the two methods below
+   * are the contract of conversion.
+   */
+  private def toNumericRange(min: Any, max: Any, dataType: DataType): NumericRange = {
+    dataType match {
+      case _: NumericType =>
+        NumericRange(new JDecimal(min.toString), new JDecimal(max.toString))
+      case BooleanType =>
+        val min1 = if (min.asInstanceOf[Boolean]) 1 else 0
+        val max1 = if (max.asInstanceOf[Boolean]) 1 else 0
+        NumericRange(new JDecimal(min1), new JDecimal(max1))
+      case DateType =>
+        val min1 = DateTimeUtils.fromJavaDate(min.asInstanceOf[Date])
+        val max1 = DateTimeUtils.fromJavaDate(max.asInstanceOf[Date])
+        NumericRange(new JDecimal(min1), new JDecimal(max1))
+      case TimestampType =>
+        val min1 = DateTimeUtils.fromJavaTimestamp(min.asInstanceOf[Timestamp])
+        val max1 = DateTimeUtils.fromJavaTimestamp(max.asInstanceOf[Timestamp])
+        NumericRange(new JDecimal(min1), new JDecimal(max1))
+      case _ =>
+        throw new AnalysisException(s"Type $dataType is not castable to numeric in estimation.")
--- End diff --

OK. removed this case.
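The decimal-unification trick in the Range.scala diff above can be illustrated outside Spark. The following is a minimal, stand-alone sketch (class and method names are ours, not Spark's): booleans, dates, and numbers are normalized onto `BigDecimal` so that range-overlap checks share one code path.

```java
import java.math.BigDecimal;
import java.sql.Date;

// Hypothetical stand-alone sketch of the technique: map every supported type
// onto BigDecimal so that range comparisons use a single comparator.
public class DecimalRangeSketch {
    static BigDecimal toDecimal(Object v) {
        if (v instanceof Boolean) return BigDecimal.valueOf(((Boolean) v) ? 1 : 0);
        // Spark encodes dates as days since the epoch; epoch millis suffice for the sketch.
        if (v instanceof Date)    return BigDecimal.valueOf(((Date) v).getTime());
        if (v instanceof Number)  return new BigDecimal(v.toString());
        throw new IllegalArgumentException("Unsupported type: " + v.getClass());
    }

    /** True when the closed intervals [min1, max1] and [min2, max2] intersect. */
    static boolean overlaps(Object min1, Object max1, Object min2, Object max2) {
        return toDecimal(min1).compareTo(toDecimal(max2)) <= 0
            && toDecimal(min2).compareTo(toDecimal(max1)) <= 0;
    }
}
```

The point of the design is that filter-selectivity estimation never has to branch on the column type once both bounds are decimals; only the conversion layer knows about booleans, dates, and timestamps.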
[GitHub] spark issue #15928: [SPARK-18478][SQL] Support codegen'd Hive UDFs
Github user maropu commented on the issue: https://github.com/apache/spark/pull/15928 @hvanhovell Could you check this?
[GitHub] spark issue #15945: [SPARK-12978][SQL] Merge unnecessary partial aggregates
Github user maropu commented on the issue: https://github.com/apache/spark/pull/15945 @cloud-fan ping
[GitHub] spark issue #16213: [SPARK-18020][Streaming][Kinesis] Checkpoint SHARD_END t...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16213 @brkyvz ping
[GitHub] spark issue #16565: [SPARK-17237][SQL][Backport-2.0] Remove backticks in a p...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16565 @gatorsmile ping
[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation
Github user ron8hu commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r96128021

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/Range.scala ---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.plans.logical.statsEstimation
+
+import java.math.{BigDecimal => JDecimal}
+import java.sql.{Date, Timestamp}
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.types.{BooleanType, DateType, TimestampType, _}
+
+/** Value range of a column. */
+trait Range
+
+/** For simplicity we use decimal to unify operations of numeric ranges. */
+case class NumericRange(min: JDecimal, max: JDecimal) extends Range
+
+/**
+ * This version of Spark does not have min/max for binary/string types, we define their default
+ * behaviors by this class.
+ */
+class DefaultRange extends Range
+
+/** This is for columns with only null values. */
+class NullRange extends Range
+
+object Range {
+  def apply(min: Option[Any], max: Option[Any], dataType: DataType): Range = dataType match {
+    case StringType | BinaryType => new DefaultRange()
+    case _ if min.isEmpty || max.isEmpty => new NullRange()
+    case _ => toNumericRange(min.get, max.get, dataType)
+  }
+
+  /**
+   * For simplicity we use decimal to unify operations of numeric types, the two methods below
+   * are the contract of conversion.
+   */
+  private def toNumericRange(min: Any, max: Any, dataType: DataType): NumericRange = {
+    dataType match {
+      case _: NumericType =>
+        NumericRange(new JDecimal(min.toString), new JDecimal(max.toString))
+      case BooleanType =>
+        val min1 = if (min.asInstanceOf[Boolean]) 1 else 0
+        val max1 = if (max.asInstanceOf[Boolean]) 1 else 0
+        NumericRange(new JDecimal(min1), new JDecimal(max1))
+      case DateType =>
+        val min1 = DateTimeUtils.fromJavaDate(min.asInstanceOf[Date])
+        val max1 = DateTimeUtils.fromJavaDate(max.asInstanceOf[Date])
+        NumericRange(new JDecimal(min1), new JDecimal(max1))
+      case TimestampType =>
+        val min1 = DateTimeUtils.fromJavaTimestamp(min.asInstanceOf[Timestamp])
--- End diff --

Yes. Added date and timestamp tests.
[GitHub] spark pull request #16410: [SPARK-19005][SQL] Keep column ordering when a sc...
Github user maropu closed the pull request at: https://github.com/apache/spark/pull/16410
[GitHub] spark issue #16410: [SPARK-19005][SQL] Keep column ordering when a schema is...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16410 I looked around the code and then I thought this is an expected behaviour, so I'll close this. Thanks!
[GitHub] spark issue #16410: [SPARK-19005][SQL] Keep column ordering when a schema is...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16410 **[Test build #71385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71385/testReport)** for PR 16410 at commit [`e3e095a`](https://github.com/apache/spark/commit/e3e095a7821de3c1e4da44280fe2c12f6a36d3f3).
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14038 **[Test build #71384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71384/testReport)** for PR 14038 at commit [`4e3628b`](https://github.com/apache/spark/commit/4e3628b8e6695793e69b9dc81647e18728c4f751).
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @vanzin @ash211 Thanks a lot for your comments; I've changed accordingly. Please give another look at this~~
[GitHub] spark pull request #16503: [SPARK-18113] Use ask to replace askWithRetry in ...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16503#discussion_r96127359

--- Diff: core/src/test/scala/org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala ---
@@ -221,6 +229,22 @@ private case class OutputCommitFunctions(tempDirPath: String) {
       if (ctx.attemptNumber == 0) failingOutputCommitter else successfulOutputCommitter)
   }

+  // Receiver should be idempotent for AskPermissionToCommitOutput
+  def callCanCommitMultipleTimes(iter: Iterator[Int]): Unit = {
+    val ctx = TaskContext.get()
+    val canCommit1 = SparkEnv.get.outputCommitCoordinator
+      .canCommit(ctx.stageId(), ctx.partitionId(), ctx.attemptNumber())
+    val canCommit2 = SparkEnv.get.outputCommitCoordinator
+      .canCommit(ctx.stageId(), ctx.partitionId(), ctx.attemptNumber())
+    if (canCommit1 && canCommit2) {
+      Utils.createDirectory(tempDirPath)
--- End diff --

Yes, using `assert` is better here.
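The idempotence the test above checks can be sketched in isolation. The following is a minimal, hypothetical model of a commit coordinator (names are ours, not Spark's): the first attempt to ask for a (stage, partition) wins, and repeating the same ask, e.g. after an RPC retry, returns the same answer instead of being denied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an idempotent commit coordinator: the authorized
// attempt per (stage, partition) is remembered, so a retried ask from the
// winning attempt is granted again rather than treated as a new contender.
public class CommitCoordinatorSketch {
    private final Map<String, Integer> authorized = new HashMap<>();

    public synchronized boolean canCommit(int stage, int partition, int attempt) {
        String key = stage + ":" + partition;
        Integer winner = authorized.putIfAbsent(key, attempt);
        // winner == null: first ask, this attempt is authorized.
        // winner == attempt: idempotent repeat of the winning ask.
        return winner == null || winner == attempt;
    }
}
```

Without the remembered-winner map, replacing `askWithRetry` with plain `ask` would make a retried `AskPermissionToCommitOutput` look like a second, conflicting attempt and be denied; the design choice above is what makes the retry safe.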
[GitHub] spark issue #16569: [SPARK-19206][DOC][DStream] Fix outdated parameter descr...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16569 LGTM
[GitHub] spark issue #16500: [SPARK-19120] Refresh Metadata Cache After Loading Hive ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16500 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71382/ Test PASSed.
[GitHub] spark issue #16500: [SPARK-19120] Refresh Metadata Cache After Loading Hive ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16500 Merged build finished. Test PASSed.
[GitHub] spark issue #16500: [SPARK-19120] Refresh Metadata Cache After Loading Hive ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16500 **[Test build #71382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71382/testReport)** for PR 16500 at commit [`14da2b6`](https://github.com/apache/spark/commit/14da2b652a93e20131b7c61077312bbc3b1cc0ae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16464 Merged build finished. Test PASSed.
[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16464 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71383/ Test PASSed.
[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16464 **[Test build #71383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71383/testReport)** for PR 16464 at commit [`95a6910`](https://github.com/apache/spark/commit/95a69106ca52844bafdf820b50ed8353d6c80a25).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set optimizer c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16464 **[Test build #71383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71383/testReport)** for PR 16464 at commit [`95a6910`](https://github.com/apache/spark/commit/95a69106ca52844bafdf820b50ed8353d6c80a25).
[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16566#discussion_r96125393

--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -38,6 +45,146 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #' @note LDAModel since 2.1.0
 setClass("LDAModel", representation(jobj = "jobj"))

+#' Bisecting K-Means Clustering Model
+#'
+#' Fits a bisecting k-means clustering model against a Spark DataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
+#'        operators are supported, including '~', '.', ':', '+', and '-'.
+#'        Note that the response variable of formula is empty in spark.bisectingKmeans.
+#' @param k the desired number of leaf clusters. Must be > 1.
+#'        The actual number could be smaller if there are no divisible leaf clusters.
+#' @param maxIter maximum iteration number.
+#' @param minDivisibleClusterSize The minimum number of points (if greater than or equal to 1.0)
+#'        or the minimum proportion of points (if less than 1.0) of a divisible cluster.
+#' @param seed the random seed.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.bisectingKmeans} returns a fitted bisecting k-means model.
+#' @rdname spark.bisectingKmeans
+#' @aliases spark.bisectingKmeans,SparkDataFrame,formula-method
+#' @name spark.bisectingKmeans
+#' @export
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' data(iris)
+#' df <- createDataFrame(iris)
+#' model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
+#' summary(model)
+#'
+#' # fitted values on training data
+#' fitted <- predict(model, df)
+#' head(select(fitted, "Sepal_Length", "prediction"))
+#'
+#' # save fitted model to input path
+#' path <- "path/to/model"
+#' write.ml(model, path)
+#'
+#' # can also read back the saved model and print
+#' savedModel <- read.ml(path)
+#' summary(savedModel)
+#' }
+#' @note spark.bisectingKmeans since 2.2.0
+#' @seealso \link{predict}, \link{read.ml}, \link{write.ml}
+setMethod("spark.bisectingKmeans", signature(data = "SparkDataFrame", formula = "formula"),
+          function(data, formula, k = 4, maxIter = 20, minDivisibleClusterSize = 1.0, seed = NULL) {
--- End diff --

I will address comments soon. Now, debugging. Thanks!
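For readers unfamiliar with the algorithm behind this wrapper, here is a minimal pure-Python sketch of bisecting k-means: repeatedly 2-means-split the largest cluster until `k` leaf clusters exist, or no remaining cluster is divisible. This is an illustration of the `k`/`maxIter`/divisibility semantics described in the roxygen docs above, not SparkR or MLlib code; all helper names are made up.

```python
import random

def two_means(points, max_iter=20, seed=0):
    """Split a list of 1-D points into two clusters with plain k-means (k=2)."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    for _ in range(max_iter):
        clusters = ([], [])
        for p in points:
            idx = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[idx].append(p)
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return clusters

def bisecting_kmeans(points, k):
    """Bisect the largest cluster until k leaf clusters exist,
    or until no cluster remains divisible (so the actual number
    of clusters can be smaller than k, as the docs note)."""
    clusters = [points]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        largest = clusters.pop(0)
        if len(largest) < 2:  # no divisible leaf cluster left
            clusters.append(largest)
            break
        left, right = two_means(largest)
        if not left or not right:  # degenerate split; stop dividing
            clusters.append(largest)
            break
        clusters.extend([left, right])
    return clusters
```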
[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16464#discussion_r96125283

--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -404,11 +411,14 @@ setMethod("summary", signature(object = "LDAModel"),
     vocabSize <- callJMethod(jobj, "vocabSize")
     topics <- dataFrame(callJMethod(jobj, "topics", maxTermsPerTopic))
     vocabulary <- callJMethod(jobj, "vocabulary")
+    trainingLogLikelihood <- callJMethod(jobj, "trainingLogLikelihood")
+    logPrior <- callJMethod(jobj, "logPrior")
--- End diff --

@felixcheung I will update it to NA. Thanks!
[GitHub] spark issue #16500: [SPARK-19120] Refresh Metadata Cache After Loading Hive ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16500 **[Test build #71382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71382/testReport)** for PR 16500 at commit [`14da2b6`](https://github.com/apache/spark/commit/14da2b652a93e20131b7c61077312bbc3b1cc0ae).
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Merged build finished. Test PASSed.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71380/
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16503 **[Test build #71380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71380/testReport)** for PR 16503 at commit [`55e4fd3`](https://github.com/apache/spark/commit/55e4fd3f99c2daa7e7f375b8a8b9df453d86d83b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Merged build finished. Test PASSed.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71377/
[GitHub] spark issue #16582: [SPARK-19220][UI] Make redirection to HTTPS apply to all...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16582 Pinging some (random?) people @ajbozarth @srowen @sarutak
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16503 **[Test build #71377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71377/testReport)** for PR 16503 at commit [`b867b92`](https://github.com/apache/spark/commit/b867b92d6d0641aa45ca6d23b846993e38a27814).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16579: [WIP][SPARK-19218][SQL] SET command should show a result...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Merged build finished. Test FAILed.
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344 Jenkins, test this please.
[GitHub] spark issue #16579: [WIP][SPARK-19218][SQL] SET command should show a result...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16579 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71381/
[GitHub] spark issue #16579: [WIP][SPARK-19218][SQL] SET command should show a result...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579 **[Test build #71381 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71381/testReport)** for PR 16579 at commit [`337d02d`](https://github.com/apache/spark/commit/337d02d3fcdec696eb9e89bcad8094d5b34c1f59).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71378/
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Merged build finished. Test FAILed.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16503 **[Test build #71378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71378/testReport)** for PR 16503 at commit [`eb5367a`](https://github.com/apache/spark/commit/eb5367a48b71268acf27088b21cf843c34e71020).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71379/
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Merged build finished. Test FAILed.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16503 **[Test build #71379 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71379/testReport)** for PR 16503 at commit [`fce3a36`](https://github.com/apache/spark/commit/fce3a360dbab2404d7c39c8da751f3ea6d75122b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #16583: [SPARK-19129] [SQL] SessionCatalog: Disallow empt...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16583#discussion_r96122416

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -247,6 +247,16 @@ class HiveDDLSuite
   }
 }

+  test("SPARK-19129: drop partition with a empty string will drop the whole table") {
+    val df = spark.createDataFrame(Seq((0, "a"), (1, "b"))).toDF("partCol1", "name")
+    df.write.mode("overwrite").partitionBy("partCol1").saveAsTable("partitionedTable")
+    val e = intercept[AnalysisException] {
+      spark.sql("alter table partitionedTable drop partition(partCol1='')")
--- End diff --

@tejasapatil Thank you for your research. So far, we do not completely follow Hive in the partition-related DDL commands; `DROP PARTITION` is an example. If the user-specified spec does not exist, we throw an exception, whereas Hive just silently ignores it without any exception, although Hive always reports which partitions were dropped after the command. Maybe we can improve this in a future PR. For now, this PR follows the same approach of blocking invalid inputs, that is, throwing an exception when the input partition spec is not valid.
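The hazard discussed here — an empty partition value matching, and therefore dropping, every partition — can be blocked by validating the spec up front. A minimal sketch in Python; `validate_partition_spec` is a hypothetical helper for illustration, not Spark's actual API:

```python
def validate_partition_spec(spec):
    """Reject partition specs with empty names or values, so a spec like
    {'partCol1': ''} raises instead of silently matching everything.
    Hypothetical helper, not SessionCatalog's real interface."""
    for name, value in spec.items():
        if not name:
            raise ValueError("partition column name must not be empty")
        if value == "":
            raise ValueError(
                f"partition value for column '{name}' must not be empty")
    return spec
```

With this check in place, `validate_partition_spec({"partCol1": ""})` raises instead of letting the empty value fall through to the metastore.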
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Merged build finished. Test PASSed.
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16503 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71375/
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16503 **[Test build #71375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71375/testReport)** for PR 16503 at commit [`26c9a2f`](https://github.com/apache/spark/commit/26c9a2fa2b76f538224de0d37e85aeb257b9a19c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16549: [SPARK-19151][SQL]DataFrameWriter.saveAsTable support hi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16549 Thanks! Merging to master.
[GitHub] spark pull request #16549: [SPARK-19151][SQL]DataFrameWriter.saveAsTable sup...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16549
[GitHub] spark issue #16549: [SPARK-19151][SQL]DataFrameWriter.saveAsTable support hi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16549 LGTM
[GitHub] spark pull request #16581: [SPARK-18589] [SQL] Fix Python UDF accessing attr...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/16581#discussion_r96121284

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
@@ -86,6 +86,19 @@ trait PredicateHelper {
    */
   protected def canEvaluate(expr: Expression, plan: LogicalPlan): Boolean =
     expr.references.subsetOf(plan.outputSet)
+
+  /**
+   * Returns true iff `expr` could be evaluated as a condition within join.
+   */
+  protected def canEvaluateWithinJoin(expr: Expression): Boolean = {
+    expr.find {
+      case e: SubqueryExpression =>
+        // non-correlated subquery will be replaced as literal
+        e.children.nonEmpty
+      case e: Unevaluable => true
--- End diff --

`Unevaluable` is not evaluable. This block tries to find a case that is not evaluable in a join, and then negates the result with `isEmpty`. I have to admit that we should document this.
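The find-then-negate pattern being discussed can be illustrated outside Catalyst. Below is a small Python sketch (illustration only; `Expr`, `find`, and `can_evaluate_within_join` are stand-ins, not Spark code): a pre-order `find` looks for any "blocking" node, and the check succeeds when none is found. Correlated subqueries (subquery expressions with children) and other unevaluable nodes block, while an uncorrelated subquery does not, because the optimizer replaces it with a literal.

```python
class Expr:
    """Minimal stand-in for a Catalyst expression tree node."""
    def __init__(self, name, children=(), unevaluable=False, subquery=False):
        self.name = name
        self.children = list(children)
        self.unevaluable = unevaluable
        self.subquery = subquery

    def find(self, predicate):
        """Pre-order search: return the first node matching `predicate`,
        or None, analogous to TreeNode.find."""
        if predicate(self):
            return self
        for child in self.children:
            found = child.find(predicate)
            if found is not None:
                return found
        return None

def can_evaluate_within_join(expr):
    """True iff no node in the tree blocks evaluation inside a join."""
    def blocks(e):
        if e.subquery:
            # a correlated subquery (one with children here) blocks;
            # an uncorrelated one will be replaced by a literal
            return bool(e.children)
        return e.unevaluable
    return expr.find(blocks) is None
```

Note how the positive predicate (`blocks`) plus the final `is None` mirrors the Scala `expr.find { ... }.isEmpty` that the review comment says deserves documenting.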
[GitHub] spark issue #16576: [SPARK-19215] Add necessary check for `RDD.checkpoint` t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16576 Merged build finished. Test PASSed.
[GitHub] spark issue #16576: [SPARK-19215] Add necessary check for `RDD.checkpoint` t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16576 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71374/
[GitHub] spark issue #16576: [SPARK-19215] Add necessary check for `RDD.checkpoint` t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16576 **[Test build #71374 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71374/testReport)** for PR 16576 at commit [`4b4fad7`](https://github.com/apache/spark/commit/4b4fad7a46c4a85ad8907118edc6b97b436c935f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16579: [WIP][SPARK-19218][SQL] SET command should show a result...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16579 **[Test build #71381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71381/testReport)** for PR 16579 at commit [`337d02d`](https://github.com/apache/spark/commit/337d02d3fcdec696eb9e89bcad8094d5b34c1f59).
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16503 **[Test build #71380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71380/testReport)** for PR 16503 at commit [`55e4fd3`](https://github.com/apache/spark/commit/55e4fd3f99c2daa7e7f375b8a8b9df453d86d83b).
[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16503 **[Test build #71379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71379/testReport)** for PR 16503 at commit [`fce3a36`](https://github.com/apache/spark/commit/fce3a360dbab2404d7c39c8da751f3ea6d75122b).
[GitHub] spark issue #16527: [SPARK-19146][Core]Drop more elements when stageData.tas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16527 Merged build finished. Test PASSed.