[GitHub] spark pull request #17070: [SPARK-19721][SS] Good error message for version ...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/17070#discussion_r106348166

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala
```scala
@@ -195,6 +195,11 @@ class HDFSMetadataLog[T <: AnyRef : ClassTag](sparkSession: SparkSession, path:
     val input = fileManager.open(batchMetadataFile)
     try {
       Some(deserialize(input))
+    } catch {
+      case ise: IllegalStateException =>
+        // re-throw the exception with the log file path added
+        throw new IllegalStateException(
+          s"Failed to read log file $batchMetadataFile. ${ise.getMessage}")
```
Comment: nit: please also add `ise` as the cause.
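For reference, a minimal sketch of the requested change: chaining the caught exception as the cause via `IllegalStateException`'s `(String, Throwable)` constructor preserves the original stack trace. The surrounding names follow the diff above.

```scala
} catch {
  case ise: IllegalStateException =>
    // Re-throw with the log file path added, passing `ise` as the cause
    // so the original stack trace stays attached to the new exception.
    throw new IllegalStateException(
      s"Failed to read log file $batchMetadataFile. ${ise.getMessage}", ise)
}
```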
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17287 Merged build finished. Test FAILed.
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17287 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74642/
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17287
**[Test build #74642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74642/testReport)** for PR 17287 at commit [`25da5f6`](https://github.com/apache/spark/commit/25da5f6bfe99e1bf81856a353e7d572a8594a759).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17287 Merged build finished. Test PASSed.
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17287 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74638/
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17287
**[Test build #74638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74638/testReport)** for PR 17287 at commit [`4214379`](https://github.com/apache/spark/commit/421437951df5d3bb551dc62428bbd3c23cd94f4e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17287 Merged build finished. Test PASSed.
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17287 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74637/
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17287
**[Test build #74637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74637/testReport)** for PR 17287 at commit [`80df8c7`](https://github.com/apache/spark/commit/80df8c74fc2280d9ca3d9fa2c6a624c6970ed6da).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17291: [SPARK-19949][SQL][WIP] unify bad record handling in CSV...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17291 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74645/
[GitHub] spark issue #17291: [SPARK-19949][SQL][WIP] unify bad record handling in CSV...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17291 Merged build finished. Test FAILed.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r106344896

Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
```scala
@@ -71,7 +71,6 @@ class JDBCSuite extends SparkFunSuite
     conn.prepareStatement("insert into test.people values ('mary', 2)").executeUpdate()
     conn.prepareStatement(
       "insert into test.people values ('joe ''foo'' \"bar\"', 3)").executeUpdate()
-    conn.commit()
```
Comment: This is a mistake.
[GitHub] spark issue #17291: [SPARK-19949][SQL][WIP] unify bad record handling in CSV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17291
**[Test build #74645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74645/testReport)** for PR 17291 at commit [`23c1c3e`](https://github.com/apache/spark/commit/23c1c3e01b64879e5889d6d08c8f824283574574).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class BadRecordException(`
  * `class DataSourceReader(mode: String, numFields: Int, corruptFieldIndex: Option[Int])`
  * `class RowWithBadRecord(var row: InternalRow, index: Int, var record: UTF8String)`
[GitHub] spark pull request #17308: [SPARK-19968][SS] Use a cached instance of `Kafka...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/17308#discussion_r106344532

Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala
```scala
@@ -32,7 +31,7 @@ import org.apache.spark.sql.types.{BinaryType, StringType}
  * automatically trigger task aborts.
  */
 private[kafka010] class KafkaWriteTask(
-    producerConfiguration: ju.Map[String, Object],
+    producerConfiguration: ju.HashMap[String, Object],
```
Comment: Ideally this should not have been changed, but it was changed to HashMap to avoid converting or casting it later.
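A minimal sketch of the alternative the review comment implies, under the assumption that the concrete map is only needed internally (class and field names here are illustrative, not the PR's actual code): keep the general `ju.Map` interface in the signature and copy once where a `HashMap` is really required.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

// Accepting the ju.Map interface keeps callers flexible; copy into a
// concrete HashMap internally only where one is actually required
// (for example, as a hashable cache key).
private[kafka010] class KafkaWriteTask(producerConfiguration: JMap[String, Object]) {
  private val config = new JHashMap[String, Object](producerConfiguration)
}
```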
[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16476 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74639/
[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16476 Merged build finished. Test FAILed.
[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16476
**[Test build #74639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74639/testReport)** for PR 16476 at commit [`4e60b7c`](https://github.com/apache/spark/commit/4e60b7c52c0ca9e20296256607ce78741d80cea3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait ImplicitCastInputTypesToSameType extends ExpectsInputTypes`
  * `case class Field(children: Seq[Expression]) extends Expression`
[GitHub] spark issue #17308: [SPARK-19968][SS] Use a cached instance of `KafkaProduce...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17308 **[Test build #74644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74644/testReport)** for PR 17308 at commit [`febf387`](https://github.com/apache/spark/commit/febf3874cf07bad04e574b571f1caa839c9c28b7).
[GitHub] spark issue #17291: [SPARK-19949][SQL][WIP] unify bad record handling in CSV...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17291 **[Test build #74645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74645/testReport)** for PR 17291 at commit [`23c1c3e`](https://github.com/apache/spark/commit/23c1c3e01b64879e5889d6d08c8f824283574574).
[GitHub] spark pull request #17308: [SPARK-19968][SS] Use a cached instance of `Kafka...
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/17308

[SPARK-19968][SS] Use a cached instance of `KafkaProducer` instead of creating one every batch.

## What changes were proposed in this pull request?

A new API for cleaning up resources in KafkaSink is added to the Sink trait. In summary, the cost of recreating a KafkaProducer for every batch write is high: it starts many threads, opens connections, and then closes them. The Kafka docs promise that a KafkaProducer instance is thread safe, and reusing one instance across multiple writer threads is encouraged. Furthermore, I have measured a 10x improvement in latency with this patch. TODO: post exact results.

## How was this patch tested?

Running distributed benchmarks comparing runs with and without this patch. Added relevant unit tests.

You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/ScrapCodes/spark cached-kafka-producer
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17308.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17308

commit febf3874cf07bad04e574b571f1caa839c9c28b7
Author: Prashant Sharma
Date: 2017-03-15T11:03:45Z
[SPARK-19968][SS] Use a cached instance of KafkaProducer instead of creating one every batch.
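A minimal sketch of the caching idea the PR describes, assuming a Guava loading cache keyed by the producer configuration; the object name and eviction policy here are illustrative, not the PR's actual code.

```scala
import java.util.concurrent.TimeUnit

import com.google.common.cache.{CacheBuilder, CacheLoader}
import org.apache.kafka.clients.producer.KafkaProducer

object CachedKafkaProducer {
  private type Producer = KafkaProducer[Array[Byte], Array[Byte]]

  // One producer per distinct configuration. KafkaProducer is thread safe,
  // so concurrent writer tasks can share a cached instance. A real
  // implementation would also close producers when they are evicted.
  private val cache = CacheBuilder.newBuilder()
    .expireAfterAccess(10, TimeUnit.MINUTES)
    .build(new CacheLoader[java.util.HashMap[String, Object], Producer] {
      override def load(config: java.util.HashMap[String, Object]): Producer =
        new KafkaProducer[Array[Byte], Array[Byte]](config)
    })

  def getOrCreate(config: java.util.HashMap[String, Object]): Producer =
    cache.get(config)
}
```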
[GitHub] spark issue #17270: [SPARK-19929] [SQL] Showing Hive Managed table's LOCATION...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17270 Yeah, you need to close the PR by yourself; we are unable to close it. Thanks.
[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16626 Could you add a scenario where users add a column name that already exists in the table schema?
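A minimal sketch of such a test case, under the assumption that a duplicate column name is rejected with an `AnalysisException` naming the column; the table name and message fragment are illustrative.

```scala
test("alter table add columns -- duplicate column name") {
  withTable("tab") {
    sql("CREATE TABLE tab (c1 INT) USING parquet")
    val e = intercept[AnalysisException] {
      sql("ALTER TABLE tab ADD COLUMNS (c1 STRING)")
    }
    // The error message should name the offending column.
    assert(e.getMessage.contains("c1"))
  }
}
```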
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17287 **[Test build #74643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74643/testReport)** for PR 17287 at commit [`d82e8ed`](https://github.com/apache/spark/commit/d82e8eda4eed494604b131f1448fd93be3c1e33a).
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r106342233

Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
```scala
@@ -71,7 +71,6 @@ class JDBCSuite extends SparkFunSuite
     conn.prepareStatement("insert into test.people values ('mary', 2)").executeUpdate()
     conn.prepareStatement(
       "insert into test.people values ('joe ''foo'' \"bar\"', 3)").executeUpdate()
-    conn.commit()
```
Comment: Why?
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r106342123

Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
```scala
@@ -1860,4 +1860,72 @@ class HiveDDLSuite
     }
   }
 }
+
+  Seq("PARQUET", "ORC", "TEXTFILE", "SEQUENCEFILE", "RCFILE", "AVRO").foreach { tableType =>
+    test(s"alter hive serde table add columns -- partitioned - $tableType") {
+      withTable("alter_add_partitioned") {
```
Comment: The name is confusing. Let us just simplify it to `tab`; we can already tell the scenario from the test case name.
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106342082

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
```scala
@@ -1270,6 +1376,7 @@ class SessionCatalogSuite extends PlanTest {
     }
     assert(cause.getMessage.contains("Undefined function: 'undefined_fn'"))
+    catalog.reset()
```
Comment: yes, you are right, let me add a try catch
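A minimal sketch of the cleanup pattern under discussion: wrapping the assertions in `try`/`finally` so `reset()` runs even when one fails. The intercepted exception type and the `newBasicCatalog()` helper are assumptions based on the suite, not verified against it.

```scala
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.FunctionIdentifier

val catalog = new SessionCatalog(newBasicCatalog()) // helper assumed from the suite
try {
  val cause = intercept[AnalysisException] {
    catalog.lookupFunction(FunctionIdentifier("undefined_fn"), Nil)
  }
  assert(cause.getMessage.contains("Undefined function: 'undefined_fn'"))
} finally {
  // `finally` (rather than a bare trailing call) guarantees the reset even
  // if an assertion above throws, so later tests see a clean catalog.
  catalog.reset()
}
```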
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r106341966

Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
```scala
@@ -1860,4 +1860,72 @@ class HiveDDLSuite
+  Seq("PARQUET", "ORC", "TEXTFILE", "SEQUENCEFILE", "RCFILE", "AVRO").foreach { tableType =>
```
Comment: If the list is complete, we can create a variable and reuse it in future test cases in `HiveCatalogedDDLSuite`. Let us create it now?
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17287 **[Test build #74642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74642/testReport)** for PR 17287 at commit [`25da5f6`](https://github.com/apache/spark/commit/25da5f6bfe99e1bf81856a353e7d572a8594a759).
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r106341499

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
```scala
@@ -175,6 +178,78 @@ case class AlterTableRenameCommand(
 }

 /**
+ * A command that add columns to a table
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   ALTER TABLE table_identifier
+ *   ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
+ * }}}
+ */
+case class AlterTableAddColumnsCommand(
+    table: TableIdentifier,
+    columns: Seq[StructField]) extends RunnableCommand {
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val catalogTable = verifyAlterTableAddColumn(catalog, table)
+
+    // If an exception is thrown here we can just assume the table is uncached;
+    // this can happen with Hive tables when the underlying catalog is in-memory.
+    val wasCached = Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false)
+    if (wasCached) {
+      try {
+        sparkSession.catalog.uncacheTable(table.unquotedString)
+      } catch {
+        case NonFatal(e) => log.warn(e.toString, e)
+      }
+    }
```
Comment: No need to check if it is cached or not. Just uncache it.
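A minimal sketch of the simplification the reviewer suggests, assuming `uncacheTable` tolerates being called on a table that is not cached (the existing `NonFatal` guard still catches any failure); names follow the diff above.

```scala
// Uncache unconditionally; probing isCached first only adds an extra
// catalog round trip and a race window between the check and the call.
try {
  sparkSession.catalog.uncacheTable(table.unquotedString)
} catch {
  case NonFatal(e) => log.warn(e.toString, e)
}
```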
[GitHub] spark issue #17242: [SPARK-19902][SQL] Support more expression canonicalizat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17242 Anyway, I will move it to the optimizer in the next update.
[GitHub] spark pull request #17286: [SPARK-19915][SQL] Exclude cartesian product cand...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17286#discussion_r106341181

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
```scala
@@ -696,6 +696,13 @@ object SQLConf {
       .intConf
       .createWithDefault(12)

+  val JOIN_REORDER_CARD_WEIGHT =
+    buildConf("spark.sql.cbo.joinReorder.card.weight")
+      .doc("The weight of cardinality (number of rows) for plan cost comparison in join reorder: " +
+        "rows * weight + size * (1 - weight).")
+      .doubleConf
+      .createWithDefault(0.7)
```
Comment: What is the boundary of this? Should we add a `check`?
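A minimal sketch of the bounds check being asked for, assuming the weight must lie in [0, 1] so that both cost terms stay non-negative; the error message text is illustrative.

```scala
val JOIN_REORDER_CARD_WEIGHT =
  buildConf("spark.sql.cbo.joinReorder.card.weight")
    .doc("The weight of cardinality (number of rows) for plan cost comparison in " +
      "join reorder: rows * weight + size * (1 - weight).")
    .doubleConf
    // Reject out-of-range values at set time instead of producing
    // nonsensical costs later during join reordering.
    .checkValue(weight => weight >= 0 && weight <= 1,
      "The weight value must be in [0, 1].")
    .createWithDefault(0.7)
```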
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106340900

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
```scala
@@ -1270,6 +1376,7 @@ class SessionCatalogSuite extends PlanTest {
     }
     assert(cause.getMessage.contains("Undefined function: 'undefined_fn'"))
+    catalog.reset()
```
Comment: Then, this `reset()` could be skipped if hitting an exception.
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/17084 Thanks for the PR. I think this is helpful. Will take a look next week. Quite swamped recently.
[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @kayousterhout Thanks a lot for the comments :) very helpful. I've refined, please take another look when you have time.
[GitHub] spark issue #16722: [SPARK-19591][ML][MLlib] Add sample weights to decision ...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16722 @jkbradley might you be able to take a look at the changes from @sethah? Thank you!
[GitHub] spark pull request #11780: [SPARK-8884][MLlib] 1-sample Anderson-Darling Goo...
Github user hhbyyh closed the pull request at: https://github.com/apache/spark/pull/11780
[GitHub] spark issue #11780: [SPARK-8884][MLlib] 1-sample Anderson-Darling Goodness-o...
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/11780 Close this.
[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r106340513

Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
```scala
@@ -893,6 +893,7 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     val taskSet = FakeTask.createTaskSet(4)
     // Set the speculation multiplier to be 0 so speculative tasks are launched immediately
     sc.conf.set("spark.speculation.multiplier", "0.0")
+    sc.conf.set("spark.speculation", "true")
```
Comment: This should be set, because the duration is inserted into `MedianHeap` only when `spark.speculation` is enabled (e.g. if I remove this, `MedianHeap` will be empty when `checkSpeculatableTasks` is called).
[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r106340321

Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
```scala
@@ -172,7 +172,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
     if (!isLocal && conf.getBoolean("spark.speculation", false)) {
       logInfo("Starting speculative execution thread")
-      speculationScheduler.scheduleAtFixedRate(new Runnable {
+      speculationScheduler.scheduleWithFixedDelay(new Runnable {
```
Comment: I was thinking `checkSpeculatableTasks` will synchronize on `TaskSchedulerImpl`. If `checkSpeculatableTasks` doesn't finish within 100ms, then the possibility exists for that thread to release and then immediately re-acquire the lock. Should this be included in this PR?
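A minimal sketch of the scheduling difference being discussed, using the standard `ScheduledExecutorService` semantics; the 100ms interval matches the discussion, the runnable body is a placeholder.

```scala
import java.util.concurrent.{Executors, TimeUnit}

val scheduler = Executors.newSingleThreadScheduledExecutor()
val check = new Runnable {
  override def run(): Unit = { /* checkSpeculatableTasks() */ }
}

// scheduleAtFixedRate: the period is measured from each *start*. If a check
// overruns 100ms, the next one fires immediately after it finishes, so slow
// checks can re-acquire the TaskSchedulerImpl lock back to back.
scheduler.scheduleAtFixedRate(check, 100, 100, TimeUnit.MILLISECONDS)

// scheduleWithFixedDelay: the 100ms delay is measured from the *end* of each
// run, guaranteeing a gap between consecutive checks regardless of duration.
scheduler.scheduleWithFixedDelay(check, 100, 100, TimeUnit.MILLISECONDS)
```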
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106340219

Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
```scala
@@ -999,257 +1083,279 @@ class SessionCatalogSuite extends PlanTest {
       expectedParts: CatalogTablePartition*): Boolean = {
     // ExternalCatalog may set a default location for partitions, here we ignore the partition
     // location when comparing them.
-    actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet ==
-      expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet
+    val actualPartsNormalize = actualParts.map(p =>
```
Comment: You need to leave a comment to explain it.
[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/17123 @crackcell I'm not sure about changing the UDF to be on a row instead of a column; I've found that the serialization costs are much higher and the Spark code performs much worse. Maybe an expert like @cloud-fan can comment more here? Can you keep the UDF on a column instead of a row?
[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r106339981

Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala
```scala
@@ -171,34 +173,34 @@ object Bucketizer extends DefaultParamsReadable[Bucketizer] {
    * Binary searching in several buckets to place each data point.
    * @param splits array of split points
    * @param feature data point
-   * @param keepInvalid NaN flag.
-   *                    Set "true" to make an extra bucket for NaN values;
-   *                    Set "false" to report an error for NaN values
+   * @param keepInvalid NaN/NULL flag.
+   *                    Set "true" to make an extra bucket for NaN/NULL values;
+   *                    Set "false" to report an error for NaN/NULL values
    * @return bucket for each data point
    * @throws SparkException if a feature is < splits.head or > splits.last
    */
   private[feature] def binarySearchForBuckets(
       splits: Array[Double],
-      feature: Double,
+      feature: Option[Double],
       keepInvalid: Boolean): Double = {
-    if (feature.isNaN) {
+    if (feature.getOrElse(Double.NaN).isNaN) {
```
Comment: I think you can equivalently write this as: `if (feature.isEmpty) {`
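A small caveat on that suggestion, sketched below: `feature.getOrElse(Double.NaN).isNaN` is true for *both* `None` and `Some(Double.NaN)`, so `isEmpty` is only equivalent if a `Some(NaN)` can never reach this point; `forall(_.isNaN)` matches the original check exactly.

```scala
val none: Option[Double] = None
val nan: Option[Double]  = Some(Double.NaN)
val ok: Option[Double]   = Some(1.5)

// Original check: true for None and Some(NaN), false for Some(1.5).
def invalidOrig(f: Option[Double]): Boolean = f.getOrElse(Double.NaN).isNaN

// isEmpty misses the Some(NaN) case...
assert(invalidOrig(nan) && !nan.isEmpty)

// ...while forall agrees with the original on every input:
def invalid(f: Option[Double]): Boolean = f.forall(_.isNaN)
assert(Seq(none, nan, ok).forall(f => invalid(f) == invalidOrig(f)))
```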
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17251 **[Test build #74641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74641/testReport)** for PR 17251 at commit [`c951084`](https://github.com/apache/spark/commit/c9510847c8eeb5f5da3b63c38ac835d1c3491815).
[GitHub] spark pull request #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in...
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/17123#discussion_r106339731

Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala
```scala
@@ -105,20 +106,21 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0") override val uid: String
     transformSchema(dataset.schema)
     val (filteredDataset, keepInvalid) = {
       if (getHandleInvalid == Bucketizer.SKIP_INVALID) {
-        // "skip" NaN option is set, will filter out NaN values in the dataset
+        // "skip" NaN/NULL option is set, will filter out NaN/NULL values in the dataset
         (dataset.na.drop().toDF(), false)
       } else {
         (dataset.toDF(), getHandleInvalid == Bucketizer.KEEP_INVALID)
       }
     }

-    val bucketizer: UserDefinedFunction = udf { (feature: Double) =>
+    val bucketizer: UserDefinedFunction = udf { (row: Row) =>
```
Comment: I believe you should try to avoid using a UDF on a row, because the serialization costs will be more expensive... hmm, how could we make this perform well and handle nulls? Does it work with Option[Double] instead of Row?
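One possible way to see NULLs without a whole-Row UDF, sketched under the assumption that a boxed `java.lang.Double` parameter receives SQL NULL as `null` (a primitive `Double` parameter cannot observe NULLs); the `splits` array and `binarySearch` helper below are simplified stand-ins for the Bucketizer internals, not its actual code.

```scala
import java.util.Arrays

import org.apache.spark.sql.functions.udf

val splits = Array(Double.NegativeInfinity, 0.0, 10.0, Double.PositiveInfinity)

// Simplified stand-in for Bucketizer's binary search over split points.
def binarySearch(splits: Array[Double], v: Double): Double = {
  val idx = Arrays.binarySearch(splits, v)
  (if (idx >= 0) idx else -idx - 2).toDouble
}

// The UDF stays on a single column; NULL and NaN both map to the extra
// "invalid" bucket, everything else goes through the binary search.
val bucketizer = udf { (feature: java.lang.Double) =>
  if (feature == null || feature.isNaN) splits.length.toDouble
  else binarySearch(splits, feature)
}
```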
[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17251 Thank you, @cloud-fan! I updated the PR according to the review comments.
[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16867 **[Test build #74640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74640/testReport)** for PR 16867 at commit [`104e867`](https://github.com/apache/spark/commit/104e86773d9e688e35a2273ce71379e8d03b9f81).
[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17130 Refined some comments and minor things. This should be ready for review. Thanks.
[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16209 @sureshthalamati https://github.com/apache/spark/pull/17171 has been resolved. Can you update your PR by allowing users to specify the schema in DDL format?
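For context, a hedged sketch of what "specify the schema in DDL format" could look like for the JDBC source: a DDL-formatted string such as `"id DECIMAL(38, 0), name STRING"` parsed into a StructType. The `customSchema` option name and the connection details below are assumptions for illustration, not necessarily what this PR ends up exposing.

```scala
// Override the types the JDBC dialect would otherwise infer by supplying
// a DDL-formatted schema string for (a subset of) the columns.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://host/db") // illustrative connection
  .option("dbtable", "people")
  .option("customSchema", "id DECIMAL(38, 0), name STRING")
  .load()
```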
[GitHub] spark issue #17085: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/17085 ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? thank you!
[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17251#discussion_r106338768

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
```scala
@@ -590,6 +591,23 @@ object TypeCoercion {
   }

   /**
+   * Coerces NullTypes of a Stack function to the corresponding column types.
+   */
+  object StackCoercion extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+      case s @ Stack(children) if s.childrenResolved && s.children.head.dataType == IntegerType &&
+          s.children.head.foldable =>
+        val schema = s.elementSchema
+        Stack(children.zipWithIndex.map {
+          case (e, 0) => e
+          case (Literal(null, NullType), index: Int) =>
+            Literal.create(null, schema.fields((index - 1) % schema.length).dataType)
```
Comment: Yep.
[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/17086 ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? thank you!
[GitHub] spark issue #17084: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/17084 ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? thank you!
[GitHub] spark issue #17289: [SPARK-19948] Document that saveAsTable uses catalog as ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17289

This is the design decision we need to make here. Spark SQL is kind of a federation system, and the two write APIs behave differently. The `saveAsTable` API expects users to register the table in the **global catalog** before usage; the `save` API skips the global catalog and relies on the connectors to communicate with the **local catalog**. Users might not realize the difference.

```Scala
df.write.format("xyz").mode(SaveMode.ErrorIfExists)
  .saveAsTable("j1")
```

```Scala
df.write.format("xyz").mode(SaveMode.ErrorIfExists)
  .save()
```
[GitHub] spark pull request #17286: [SPARK-19915][SQL] Exclude cartesian product cand...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17286#discussion_r106338345

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
```scala
@@ -128,38 +128,43 @@ case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] wi
 object JoinReorderDP extends PredicateHelper {

   def search(
-      conf: CatalystConf,
+      conf: SQLConf,
       items: Seq[LogicalPlan],
       conditions: Set[Expression],
       topOutput: AttributeSet): Option[LogicalPlan] = {

     // Level i maintains all found plans for i + 1 items.
     // Create the initial plans: each plan is a single item with zero cost.
-    val itemIndex = items.zipWithIndex
+    val itemIndex = items.zipWithIndex.map(_.swap).toMap
     val foundPlans = mutable.Buffer[JoinPlanMap](itemIndex.map {
-      case (item, id) => Set(id) -> JoinPlan(Set(id), item, Set(), Cost(0, 0))
-    }.toMap)
+      case (id, item) => Set(id) -> JoinPlan(Set(id), item, Set(), Cost(0, 0))
+    })

-    for (lev <- 1 until items.length) {
+    // Build plans for next levels until the last level has only one plan. This plan contains
+    // all items that can be joined, so there's no need to continue.
+    while (foundPlans.size < items.length && foundPlans.last.size > 1) {
       // Build plans for the next level.
       foundPlans += searchLevel(foundPlans, conf, conditions, topOutput)
     }

-    val plansLastLevel = foundPlans(items.length - 1)
-    if (plansLastLevel.isEmpty) {
-      // Failed to find a plan, fall back to the original plan
-      None
-    } else {
-      // There must be only one plan at the last level, which contains all items.
-      assert(plansLastLevel.size == 1 && plansLastLevel.head._1.size == items.length)
-      Some(plansLastLevel.head._2.plan)
-    }
+    // Find the best plan
+    assert(foundPlans.last.size <= 1)
```
Comment: how about
```scala
while (foundPlans.size < items.length && foundPlans.last.size > 0)
```
When we end the while loop, either we have reached level n or the current level has 0 entries. Then we pick the last level that has non-zero entries, pick the best entry from that level, and construct the final join plan.
[GitHub] spark pull request #17307: [SPARK-13369] Make number of consecutive fetch fa...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/17307#discussion_r106337950

Diff: docs/configuration.md
```
@@ -1506,6 +1506,11 @@ Apart from these, the following properties are also available, and may be useful
     of this setting is to act as a safety-net to prevent runaway uncancellable tasks from rendering
     an executor unusable.

+  spark.stage.maxConsecutiveAttempts
+  Default: 4
+  Number of consecutive stage retries allowed before a stage is aborted.
```
Comment: Hah, sorry for all of the comment changes from the combination of Imran and me!! But I agree that this was an issue before and would be good to update. Thanks for the many updates here @sitalkedia.
[GitHub] spark issue #17297: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17297
**[Test build #74631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74631/testReport)** for PR 17297 at commit [`901c9bf`](https://github.com/apache/spark/commit/901c9bf55247f0489519d976ca9729e5babbd292).
* This patch **fails from timeout after a configured wait of \`250m\`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17297: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17297 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74631/
[GitHub] spark issue #17297: [SPARK-14649][CORE] DagScheduler should not run duplicat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17297

Merged build finished. Test FAILed.
[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16476

**[Test build #74639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74639/testReport)** for PR 16476 at commit [`4e60b7c`](https://github.com/apache/spark/commit/4e60b7c52c0ca9e20296256607ce78741d80cea3).
[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17251#discussion_r106337249

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -156,9 +156,21 @@ case class Stack(children: Seq[Expression]) extends Generator {
     }
   }
+
+  private def findDataType(column: Integer): DataType = {
--- End diff --

Right.
[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/17251#discussion_r106337141

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -590,6 +591,23 @@ object TypeCoercion {
   }

   /**
+   * Coerces NullTypes of a Stack function to the corresponding column types.
+   */
+  object StackCoercion extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+      case s @ Stack(children) if s.childrenResolved && s.children.head.dataType == IntegerType &&
--- End diff --

Yep.
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106336735

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -76,468 +102,500 @@ class SessionCatalogSuite extends PlanTest {
   }

   test("create databases using invalid names") {
-    val catalog = new SessionCatalog(newEmptyCatalog())
-    testInvalidName(name => catalog.createDatabase(newDb(name), ignoreIfExists = true))
+    withEmptyCatalog { catalog =>
+      testInvalidName(
+        name => catalog.createDatabase(newDb(name), ignoreIfExists = true))
+    }
   }

   test("get database when a database exists") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    val db1 = catalog.getDatabaseMetadata("db1")
-    assert(db1.name == "db1")
-    assert(db1.description.contains("db1"))
+    withBasicCatalog { catalog =>
+      val db1 = catalog.getDatabaseMetadata("db1")
+      assert(db1.name == "db1")
+      assert(db1.description.contains("db1"))
+    }
   }

   test("get database should throw exception when the database does not exist") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    intercept[NoSuchDatabaseException] {
-      catalog.getDatabaseMetadata("db_that_does_not_exist")
+    withBasicCatalog { catalog =>
+      intercept[NoSuchDatabaseException] {
+        catalog.getDatabaseMetadata("db_that_does_not_exist")
+      }
     }
   }

   test("list databases without pattern") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    assert(catalog.listDatabases().toSet == Set("default", "db1", "db2", "db3"))
+    withBasicCatalog { catalog =>
+      assert(catalog.listDatabases().toSet == Set("default", "db1", "db2", "db3"))
+    }
   }

   test("list databases with pattern") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    assert(catalog.listDatabases("db").toSet == Set.empty)
-    assert(catalog.listDatabases("db*").toSet == Set("db1", "db2", "db3"))
-    assert(catalog.listDatabases("*1").toSet == Set("db1"))
-    assert(catalog.listDatabases("db2").toSet == Set("db2"))
+    withBasicCatalog { catalog =>
+      assert(catalog.listDatabases("db").toSet == Set.empty)
+      assert(catalog.listDatabases("db*").toSet == Set("db1", "db2", "db3"))
+      assert(catalog.listDatabases("*1").toSet == Set("db1"))
+      assert(catalog.listDatabases("db2").toSet == Set("db2"))
+    }
   }

   test("drop database") {
-    val catalog = new SessionCatalog(newBasicCatalog())
-    catalog.dropDatabase("db1", ignoreIfNotExists = false, cascade = false)
-    assert(catalog.listDatabases().toSet == Set("default", "db2", "db3"))
+    withBasicCatalog { catalog =>
+      catalog.dropDatabase("db1", ignoreIfNotExists = false, cascade = false)
+      assert(catalog.listDatabases().toSet == Set("default", "db2", "db3"))
+    }
   }

   test("drop database when the database is not empty") {
     // Throw exception if there are functions left
-    val externalCatalog1 = newBasicCatalog()
-    val sessionCatalog1 = new SessionCatalog(externalCatalog1)
-    externalCatalog1.dropTable("db2", "tbl1", ignoreIfNotExists = false, purge = false)
-    externalCatalog1.dropTable("db2", "tbl2", ignoreIfNotExists = false, purge = false)
-    intercept[AnalysisException] {
-      sessionCatalog1.dropDatabase("db2", ignoreIfNotExists = false, cascade = false)
+    withBasicCatalog { catalog =>
+      catalog.externalCatalog.dropTable("db2", "tbl1", ignoreIfNotExists = false, purge = false)
+      catalog.externalCatalog.dropTable("db2", "tbl2", ignoreIfNotExists = false, purge = false)
+      intercept[AnalysisException] {
+        catalog.dropDatabase("db2", ignoreIfNotExists = false, cascade = false)
+      }
     }
-
-    // Throw exception if there are tables left
-    val externalCatalog2 = newBasicCatalog()
-    val sessionCatalog2 = new SessionCatalog(externalCatalog2)
-    externalCatalog2.dropFunction("db2", "func1")
-    intercept[AnalysisException] {
-      sessionCatalog2.dropDatabase("db2", ignoreIfNotExists = false, cascade = false)
+    withBasicCatalog { catalog =>
+      // Throw exception if there are tables left
+      catalog.externalCatalog.dropFunction("db2", "func1")
+      intercept[AnalysisException] {
+        catalog.dropDatabase("db2", ignoreIfNotExists = false, cascade = false)
+      }
     }
-    // When cascade is true, it should drop them
-    val externalCatalog3 = newBasicCatalog()
-    val sessionCatalog3 = new SessionCatalog(externalCatalog3)
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17287

**[Test build #74638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74638/testReport)** for PR 17287 at commit [`4214379`](https://github.com/apache/spark/commit/421437951df5d3bb551dc62428bbd3c23cd94f4e).
[GitHub] spark issue #17287: [SPARK-19945][SQL]add test suite for SessionCatalog with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17287

**[Test build #74637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74637/testReport)** for PR 17287 at commit [`80df8c7`](https://github.com/apache/spark/commit/80df8c74fc2280d9ca3d9fa2c6a624c6970ed6da).
[GitHub] spark issue #16981: [SPARK-19637][SQL] Add to_json in FunctionRegistry
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16981

cc @maropu https://github.com/apache/spark/pull/17171 is merged. Are you interested in working on `from_json`? JIRA: https://issues.apache.org/jira/browse/SPARK-19967
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106336045

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -999,257 +1083,279 @@ class SessionCatalogSuite extends PlanTest {
       expectedParts: CatalogTablePartition*): Boolean = {
     // ExternalCatalog may set a default location for partitions, here we ignore the partition
     // location when comparing them.
-    actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet ==
-      expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet
+    val actualPartsNormalize = actualParts.map(p =>
+      p.copy(parameters = Map.empty, storage = p.storage.copy(
+        properties = Map.empty, locationUri = None, serde = None))).toSet
+
+    val expectedPartsNormalize = expectedParts.map(p =>
+      p.copy(parameters = Map.empty, storage = p.storage.copy(
+        properties = Map.empty, locationUri = None, serde = None))).toSet
+
+    actualPartsNormalize == expectedPartsNormalize
+    //actualParts.map(p =>
+    //  p.copy(storage = p.storage.copy(
+    //    properties = Map.empty, locationUri = None))).toSet ==
+    //  expectedParts.map(p =>
+    //    p.copy(storage = p.storage.copy(properties = Map.empty, locationUri = None))).toSet
--- End diff --

sorry, let me remove it
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106335967

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -1270,6 +1376,7 @@ class SessionCatalogSuite extends PlanTest {
     }
     assert(cause.getMessage.contains("Undefined function: 'undefined_fn'"))
+    catalog.reset()
--- End diff --

Here the `SessionCatalog` is instantiated with a different `conf` parameter; `withBasicCatalog` just leaves the conf at its default.
[GitHub] spark pull request #17088: [SPARK-19753][CORE] Un-register all shuffle outpu...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/17088#discussion_r106335824

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1365,19 +1369,27 @@ class DAGScheduler(
    */
   private[scheduler] def handleExecutorLost(
       execId: String,
-      filesLost: Boolean,
+      fileLost: Boolean,
+      hostLost: Boolean = false,
+      maybeHost: Option[String] = None,
--- End diff --

I find this method pretty confusing now, but it was also confusing before, and I'm not sure how to clean it up yet.

One minor thing: instead of having a `hostLost` and a `maybeHost`, could there be a `hostToDeregisterAllShuffleOutput: Option[String]`, so that you replace `if (hostLost) {...}` with `hostToDeregisterAllShuffleOutput.foreach{...}` etc.?
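A sketch of the signature squito is suggesting. The body is illustrative, not the actual DAGScheduler logic:

```scala
// Sketch (outside Spark): collapsing the (hostLost, maybeHost) pair into a
// single Option removes the inconsistent flag combinations by construction.
def handleExecutorLost(
    execId: String,
    fileLost: Boolean,
    hostToDeregisterAllShuffleOutput: Option[String] = None): Unit = {
  // ... the existing per-executor handling would stay here ...
  hostToDeregisterAllShuffleOutput.foreach { host =>
    // placeholder for: de-register every map output registered on `host`
    println(s"de-registering all shuffle output on $host")
  }
}
```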
[GitHub] spark pull request #17088: [SPARK-19753][CORE] Un-register all shuffle outpu...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/17088#discussion_r106330358

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1331,7 +1328,14 @@ class DAGScheduler(
         // TODO: mark the executor as failed only if there were lots of fetch failures on it
         if (bmAddress != null) {
-          handleExecutorLost(bmAddress.executorId, filesLost = true, Some(task.epoch))
+          if (!env.blockManager.externalShuffleServiceEnabled) {
--- End diff --

I think these two cases are reversed, aren't they? It's a bit harder to keep them straight with a negation in there; rather than switch the bodies, I'd just change it to `if (env.blockManager.externalShuffleServiceEnabled)`.
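One reading of the intended branches, as a self-contained sketch (the bodies are paraphrased from the PR's goal, which the test in the next comment also checks, not copied from the real code):

```scala
def deregisterAfterFetchFailure(
    externalShuffleServiceEnabled: Boolean,
    executorId: String,
    host: String): Unit = {
  // Positive condition first, as suggested, so the cases are easy to keep straight.
  if (externalShuffleServiceEnabled) {
    // Files are served by the host's shuffle service; a fetch failure
    // implicates the whole host, so drop all map output registered on it.
    println(s"removing all shuffle output on host $host")
  } else {
    // Files live with the executor; drop only that executor's output.
    println(s"removing shuffle output of executor $executorId")
  }
}
```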
[GitHub] spark pull request #17088: [SPARK-19753][CORE] Un-register all shuffle outpu...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/17088#discussion_r106335566

--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -394,6 +394,32 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with Timeou
     assertDataStructuresEmpty()
   }

+  test("All shuffle files should on the slave should be cleaned up when slave lost") {
+    // reset the test context with the right shuffle service config
+    afterEach()
+    val conf = new SparkConf()
+    conf.set("spark.shuffle.service.enabled", "true")
+    init(conf)
+    runEvent(ExecutorAdded("exec-hostA1", "hostA"))
+    runEvent(ExecutorAdded("exec-hostA2", "hostA"))
+    runEvent(ExecutorAdded("exec-hostB", "hostB"))
+    val shuffleMapRdd = new MyRDD(sc, 3, Nil)
+    val shuffleDep = new ShuffleDependency(shuffleMapRdd, new HashPartitioner(1))
+    val shuffleId = shuffleDep.shuffleId
+    val reduceRdd = new MyRDD(sc, 1, List(shuffleDep), tracker = mapOutputTracker)
+    submit(reduceRdd, Array(0))
+    complete(taskSets(0), Seq(
+      (Success, makeMapStatus("hostA", 1)),
+      (Success, makeMapStatus("hostA", 1)),
+      (Success, makeMapStatus("hostB", 1))))
+    scheduler.handleExecutorLost("exec-hostA1", fileLost = false, hostLost = true, Some("hostA"))
+    runEvent(ExecutorLost("exec-hostA1", SlaveLost("", true)))
+    val mapStatus = mapOutputTracker.mapStatuses.get(0).get.filter(_ != null)
--- End diff --

I think there are a couple of problems with this test.

* you are trying to change the behavior on a fetch failure, so really you should have tasks completing with a `FetchFailed`
* `makeMapStatus` is actually doing the wrong thing in this case, since it's expecting executor ids to be "exec-$host", but you've got a "1" or "2" appended to some of them

I think this is better:

```scala
submit(reduceRdd, Array(0))
// map stage completes successfully, with one task on each executor
complete(taskSets(0), Seq(
  (Success, MapStatus(BlockManagerId("exec-hostA1", "hostA", 12345), Array.fill[Long](1)(2))),
  (Success, MapStatus(BlockManagerId("exec-hostA2", "hostA", 12345), Array.fill[Long](1)(2))),
  (Success, makeMapStatus("hostB", 1))
))
// make sure our test setup is correct
val initialMapStatus = mapOutputTracker.mapStatuses.get(0).get
assert(initialMapStatus.count(_ != null) === 3)
assert(initialMapStatus.map{_.location.executorId}.toSet ===
  Set("exec-hostA1", "exec-hostA2", "exec-hostB"))
// reduce stage fails with a fetch failure from one host
complete(taskSets(1), Seq(
  (FetchFailed(BlockManagerId("exec-hostA2", "hostA", 12345), shuffleId, 0, 0, "ignored"), null)
))
// Here is the main assertion -- make sure that we de-register
// the map output from both executors on hostA
val mapStatus = mapOutputTracker.mapStatuses.get(0).get
assert(mapStatus.count(_ != null) === 1)
assert(mapStatus(2).location.executorId === "exec-hostB")
assert(mapStatus(2).location.host === "hostB")
```

this version fails until you reverse the if / else I pointed out in the dagscheduler.

it would also be nice if this included map output from multiple stages registered on the given host, so you could check that *all* output is deregistered, not just the one shuffleId which had an error.
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106335778

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -999,257 +1083,279 @@ class SessionCatalogSuite extends PlanTest {
       expectedParts: CatalogTablePartition*): Boolean = {
     // ExternalCatalog may set a default location for partitions, here we ignore the partition
     // location when comparing them.
-    actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet ==
-      expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet
+    val actualPartsNormalize = actualParts.map(p =>
--- End diff --

Yes, it is~
[GitHub] spark pull request #17171: [SPARK-19830] [SQL] Add parseTableSchema API to P...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17171
[GitHub] spark issue #17171: [SPARK-19830] [SQL] Add parseTableSchema API to ParserIn...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17171

thanks, merging to master!
[GitHub] spark issue #17289: [SPARK-19948] Document that saveAsTable uses catalog as ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17289

It's pretty weird that we need to check whether the table exists in the Spark catalog and then check whether the data exists in the data source, both driven by the same save mode specified by the user. I think the new behavior is more reasonable; otherwise we should ask users to provide two save modes.
[GitHub] spark pull request #16971: [SPARK-19573][SQL] Make NaN/null handling consist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16971#discussion_r106335040

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ---
@@ -245,7 +245,7 @@ object ApproximatePercentile {
       val result = new Array[Double](percentages.length)
       var i = 0
       while (i < percentages.length) {
-        result(i) = summaries.query(percentages(i))
+        result(i) = summaries.query(percentages(i)).get
--- End diff --

Thank you!
[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16971

ping @zhengruifeng
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r106334932

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +343,105 @@ object CaseKeyWhen {
     CaseWhen(cases, elseValue)
   }
 }
+
+/**
+ * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType.
+ * It's also acceptable to give parameters of different types. When the parameters have different
+ * types, comparing will be done based on type firstly. For example, ''999'' 's type is StringType,
+ * while 999's type is IntegerType, so that no further comparison need to be done since they have
+ * different types.
+ * If the search expression is NULL, the return value is 0 because NULL fails equality comparison
+ * with any value.
+ * To also point out, no implicit cast will be done in this expression.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(10, 9, 3, 10, 4);
+       3
+      > SELECT _FUNC_('a', 'b', 'c', 'd', 'a');
+       4
+      > SELECT _FUNC_('999', 'a', 999, 9.99, '999');
+       4
+  """)
+// scalastyle:on line.size.limit
+case class Field(children: Seq[Expression]) extends Expression {
+
+  /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */
+  override def nullable: Boolean = false
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+  private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter(
+    _._1.dataType.sameType(children.head.dataType)).map(_._2).toArray
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

If we try to cast all types to `DoubleType`, the test case with all-`StringType` parameters will fail, while if we try to cast all types to `StringType`, parameters like '3' and '3.0' won't be judged equal. I have a compromise: look at the first parameter; if it's of `NumericType`, implicitly cast all parameters to `DoubleType`, otherwise cast all parameters to `StringType`.
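A sketch of that compromise in the shape of a `TypeCoercion`-style rule. The rule name `FieldCoercion` and its wiring into the analyzer are hypothetical; `Field` is the expression added by this PR:

```scala
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{DoubleType, NumericType, StringType}

// Hypothetical rule: pick the comparison type from the first parameter,
// then cast every child that doesn't already have it.
object FieldCoercion extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
    case f @ Field(children) if f.childrenResolved &&
        !children.tail.forall(_.dataType.sameType(children.head.dataType)) =>
      val target = children.head.dataType match {
        case _: NumericType => DoubleType // numeric head: compare as doubles
        case _ => StringType              // otherwise: compare as strings
      }
      Field(children.map(c => if (c.dataType.sameType(target)) c else Cast(c, target)))
  }
}
```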
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106334939

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -999,257 +1083,279 @@ class SessionCatalogSuite extends PlanTest {
       expectedParts: CatalogTablePartition*): Boolean = {
     // ExternalCatalog may set a default location for partitions, here we ignore the partition
     // location when comparing them.
-    actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet ==
-      expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet
+    val actualPartsNormalize = actualParts.map(p =>
--- End diff --

Because Hive metastore fills the values after we call the Hive APIs?
[GitHub] spark issue #17171: [SPARK-19830] [SQL] Add parseTableSchema API to ParserIn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17171

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74635/ Test PASSed.
[GitHub] spark issue #17171: [SPARK-19830] [SQL] Add parseTableSchema API to ParserIn...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17171

Merged build finished. Test PASSed.
[GitHub] spark issue #17171: [SPARK-19830] [SQL] Add parseTableSchema API to ParserIn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17171

**[Test build #74635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74635/testReport)** for PR 17171 at commit [`b18ae84`](https://github.com/apache/spark/commit/b18ae84c1f0485d929e58d217c1881d037721881).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106334827

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -999,257 +1094,279 @@ class SessionCatalogSuite extends PlanTest {
       expectedParts: CatalogTablePartition*): Boolean = {
     // ExternalCatalog may set a default location for partitions, here we ignore the partition
     // location when comparing them.
-    actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet ==
-      expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet
+    val actualPartsNormalize = actualParts.map(p =>
--- End diff --

Because Hive metastore fills the values after we call the Hive APIs?
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106334626

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -999,257 +1083,279 @@ class SessionCatalogSuite extends PlanTest {
       expectedParts: CatalogTablePartition*): Boolean = {
     // ExternalCatalog may set a default location for partitions, here we ignore the partition
     // location when comparing them.
-    actualParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet ==
-      expectedParts.map(p => p.copy(storage = p.storage.copy(locationUri = None))).toSet
+    val actualPartsNormalize = actualParts.map(p =>
+      p.copy(parameters = Map.empty, storage = p.storage.copy(
+        properties = Map.empty, locationUri = None, serde = None))).toSet
+
+    val expectedPartsNormalize = expectedParts.map(p =>
+      p.copy(parameters = Map.empty, storage = p.storage.copy(
+        properties = Map.empty, locationUri = None, serde = None))).toSet
+
+    actualPartsNormalize == expectedPartsNormalize
+    //actualParts.map(p =>
+    //  p.copy(storage = p.storage.copy(
+    //    properties = Map.empty, locationUri = None))).toSet ==
+    //  expectedParts.map(p =>
+    //    p.copy(storage = p.storage.copy(properties = Map.empty, locationUri = None))).toSet
--- End diff --

?
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106334485

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala ---
@@ -1270,6 +1376,7 @@ class SessionCatalogSuite extends PlanTest {
     }
     assert(cause.getMessage.contains("Undefined function: 'undefined_fn'"))
+    catalog.reset()
--- End diff --

Instead of adding `reset`, why not use your new function `withBasicCatalog`?
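For reference, the helper being suggested follows the loan pattern; roughly this shape (illustrative; the PR's actual helper in SessionCatalogSuite may differ in details):

```scala
// Hand the test a fresh catalog, and always reset it afterwards, even if
// the test body throws.
private def withBasicCatalog(f: SessionCatalog => Unit): Unit = {
  val catalog = new SessionCatalog(newBasicCatalog())
  try {
    f(catalog)
  } finally {
    catalog.reset()
  }
}
```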
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r106334083

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +343,105 @@ object CaseKeyWhen {
+  private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter(
+    _._1.dataType.sameType(children.head.dataType)).map(_._2).toArray
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

I met a problem: if I first try to cast all parameters to `DoubleType` in `TypeCoercion` (as described in the second paragraph of my last comment), I have to actually perform the cast to know whether it can succeed, but that would 'execute' during the analysis stage, which doesn't seem right.
[GitHub] spark pull request #17287: [SPARK-19945][SQL]add test suite for SessionCatal...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17287#discussion_r106333835

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalSessionCatalogSuite.scala ---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import org.apache.spark.sql.catalyst.catalog.{CatalogTestUtils, ExternalCatalog, SessionCatalogSuite}
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+
+class HiveExternalSessionCatalogSuite extends SessionCatalogSuite with TestHiveSingleton {
+
+  protected override val isHiveExternalCatalog = true
+
+  private val externalCatalog = {
+    val catalog = spark.sharedState.externalCatalog
+    catalog.asInstanceOf[HiveExternalCatalog].client.reset()
+    catalog
+  }
+
+  protected val utils = new CatalogTestUtils {
+    override val tableInputFormat: String = "org.apache.hadoop.mapred.SequenceFileInputFormat"
+    override val tableOutputFormat: String =
+      "org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat"
+    override val defaultProvider: String = "parquet"
--- End diff --

The above input and output formats do not match what you specified here. Let's change it to `hive`.
[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16626

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74634/ Test PASSed.
[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16626

Merged build finished. Test PASSed.
[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16626

**[Test build #74634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74634/testReport)** for PR 16626 at commit [`7fbfc71`](https://github.com/apache/spark/commit/7fbfc7165e3bce388d4dc6e2c58487d4abf8d098).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17289: [SPARK-19948] Document that saveAsTable uses catalog as ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17289

```scala
test("saveAsTable API with SaveMode.Overwrite") {
  val df = spark.createDataFrame(sparkContext.parallelize(arr1x2), schema2)
  spark.read.jdbc(url1, "test.people", properties).show()
  df.write.format("jdbc").mode(SaveMode.ErrorIfExists)
    .option("url", url1)
    .option("dbtable", "test.people")
    .options(properties.asScala)
    .saveAsTable("j1")
  spark.read.jdbc(url1, "test.people", properties).show()
}
```

This is a test case I used. Previously, we respected the user-specified mode `SaveMode.ErrorIfExists`. Now, we are not sending the [mode to the _createRelation_ API](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L181). It might be an unexpected behavior change for external data source connectors.
[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17307

Merged build finished. Test PASSed.
[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17307

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74633/ Test PASSed.
[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17307

**[Test build #74633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74633/testReport)** for PR 17307 at commit [`ffd6bde`](https://github.com/apache/spark/commit/ffd6bdeb543556d5e7f448c888ff4f00b5ba152d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17242: [SPARK-19902][SQL] Support more expression canonicalizat...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17242

Hmm, so you don't think the canonicalizer should use this?
[GitHub] spark issue #17242: [SPARK-19902][SQL] Support more expression canonicalizat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17242

Not "integration", but "move": I think this logic belongs in the optimizer instead of the canonicalizer.
[GitHub] spark issue #16905: [SPARK-19567][CORE][SCHEDULER] Support some Schedulable ...
Github user squito commented on the issue: https://github.com/apache/spark/pull/16905

reopened https://issues.apache.org/jira/browse/SPARK-7420 for the failure

Jenkins, retest this please
[GitHub] spark issue #17289: [SPARK-19948] Document that saveAsTable uses catalog as ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17289

It's probably OK, since we turn an error case into a runnable one. But we should document this in the release notes.
[GitHub] spark pull request #17251: [SPARK-19910][SQL] `stack` should not reject NULL...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17251#discussion_r106329490

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -590,6 +591,23 @@ object TypeCoercion {
   }

   /**
+   * Coerces NullTypes of a Stack function to the corresponding column types.
+   */
+  object StackCoercion extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
+      case s @ Stack(children) if s.childrenResolved && s.children.head.dataType == IntegerType &&
+          s.children.head.foldable =>
+        val schema = s.elementSchema
+        Stack(children.zipWithIndex.map {
+          case (e, 0) => e
+          case (Literal(null, NullType), index: Int) =>
+            Literal.create(null, schema.fields((index - 1) % schema.length).dataType)
--- End diff --

we can call `findDataType((index - 1) % s.numFields)` here
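Roughly how the case might read with that suggestion applied. This is a sketch meant to slot into the `resolveExpressions` block quoted above; it assumes `findDataType` (added as a private helper on `Stack` in PR 17251) is made reachable from the rule, and that `numFields` matches `elementSchema.length`:

```scala
// Sketch: the null-literal case delegates to findDataType instead of
// re-deriving the type from elementSchema.
case s @ Stack(children) if s.childrenResolved &&
    s.children.head.dataType == IntegerType && s.children.head.foldable =>
  Stack(children.zipWithIndex.map {
    case (e, 0) => e
    case (Literal(null, NullType), index) =>
      Literal.create(null, s.findDataType((index - 1) % s.numFields))
    case (e, _) => e
  })
```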
[GitHub] spark issue #17289: [SPARK-19948] Document that saveAsTable uses catalog as ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17289

We introduced a behavior change in Spark 2.2. In Spark 2.1, we reported an error if the underlying JDBC table existed. Now we change [the mode to `SaveMode.Overwrite`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala#L159) if the table does not exist in the catalog.
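In sketch form, the decision being described looks roughly like this (a simplification, not the literal code at createDataSourceTables.scala#L159):

```scala
// Simplified: when the table is missing from the Spark catalog, the data is
// written with Overwrite regardless of the user-specified mode; the
// user-specified mode only arbitrates against the catalog entry.
val effectiveWriteMode =
  if (catalog.tableExists(tableIdent)) {
    mode // ErrorIfExists / Ignore / Append / Overwrite, as the user asked
  } else {
    SaveMode.Overwrite // table absent from catalog: (re)create and overwrite
  }
```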