[GitHub] spark pull request #16825: Avoid leak SparkContext in Signaling.cancelOnInte...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/16825

Avoid leak SparkContext in Signaling.cancelOnInterrupt

## What changes were proposed in this pull request?

`Signaling.cancelOnInterrupt` leaks a SparkContext per call, which makes ReplSuite unstable. This PR adds `SparkContext.getActive` so that `Signaling.cancelOnInterrupt` can obtain the active `SparkContext` and avoid the leak.

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-19481

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16825.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16825

commit 3554e33297140a51d554b57fbfce542ed66367df
Author: Shixiong Zhu
Date: 2017-02-06T22:40:16Z

Avoid leak SparkContext in Signaling.cancelOnInterrupt

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
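`SparkContext.getActive` lets callers look up the current context on demand instead of holding a captured reference. A minimal, self-contained sketch of that registry pattern (the `Context` class below is a hypothetical stand-in, not Spark's actual implementation):

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for SparkContext: a companion-object registry tracks
// the active instance, so code like Signaling.cancelOnInterrupt can call
// getActive when the signal fires instead of retaining a context that may
// already be stopped (the leak described above).
class Context(val name: String) {
  Context.setActive(this)
  def stop(): Unit = Context.clearActive(this)
}

object Context {
  private val active = new AtomicReference[Option[Context]](None)

  def setActive(c: Context): Unit = active.set(Some(c))

  // Clear only if `c` is still the active context.
  def clearActive(c: Context): Unit =
    active.updateAndGet(cur => if (cur.contains(c)) None else cur)

  // Analogous in shape to SparkContext.getActive.
  def getActive: Option[Context] = active.get()
}
```

With this shape, an interrupt handler can do `Context.getActive.foreach(...)` at signal time, so nothing pins a particular context in memory.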
[GitHub] spark issue #16625: [SPARK-17874][core] Add SSL port configuration.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16625 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72470/ Test PASSed.
[GitHub] spark issue #16625: [SPARK-17874][core] Add SSL port configuration.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16625 **[Test build #72470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72470/testReport)** for PR 16625 at commit [`a3f551b`](https://github.com/apache/spark/commit/a3f551b7e5d58b0f2933a9a48e7e928171e152b2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16043 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72469/ Test PASSed.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72468/ Test PASSed.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Merged build finished. Test PASSed.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16043 Merged build finished. Test PASSed.
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16795

The current failure about `ExtendedYarnTest` came from the `mesos` module. It seems to be irrelevant to this PR. Let me check that.

```
[info] Running Spark tests using Maven with these arguments: -Phadoop-2.3 -Phive -Pyarn -Pmesos -Phive-thriftserver -Pkinesis-asl -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test --fail-at-end
...
[INFO] Spark Project Mesos FAILURE [ 10.687 s]
...
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test (default-test) on project spark-mesos_2.11: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test failed: There was an error in the forked process
[ERROR] java.lang.RuntimeException: Unable to load category: org.apache.spark.tags.ExtendedYarnTest
[ERROR] at org.apache.maven.surefire.group.match.SingleGroupMatcher.loadGroupClasses(SingleGroupMatcher.java:139)
[ERROR] at ...
[ERROR] Caused by: java.lang.ClassNotFoundException: org.apache.spark.tags.ExtendedYarnTest
...
```
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16650 **[Test build #72468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72468/testReport)** for PR 16650 at commit [`cb24167`](https://github.com/apache/spark/commit/cb241672692db3e604c18bcd56f441f6863a09e4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16043 **[Test build #72469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72469/testReport)** for PR 16043 at commit [`32805cf`](https://github.com/apache/spark/commit/32805cfb2176ab74c21ca93ab53f92852ad7fb24). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16744 **[Test build #72465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72465/testReport)** for PR 16744 at commit [`eb75482`](https://github.com/apache/spark/commit/eb754825d1934d7eee4175b8adaefe51f46050dd). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SerializableKCLAuthProvider(`
[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16744 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72465/ Test PASSed.
[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16744 Merged build finished. Test PASSed.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72466/ Test PASSed.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Merged build finished. Test PASSed.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16650 **[Test build #72466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72466/testReport)** for PR 16650 at commit [`37248a2`](https://github.com/apache/spark/commit/37248a202c15807fffe9e25e5b630a27dda38204). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16795 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72467/ Test FAILed.
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16795 Merged build finished. Test FAILed.
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16795 **[Test build #72467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72467/testReport)** for PR 16795 at commit [`42ff642`](https://github.com/apache/spark/commit/42ff6426ec090ef6a1242d8556f39cbdef526d8b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #10949: [SPARK-12832][MESOS] mesos scheduler respect agent attri...
Github user evilezh commented on the issue: https://github.com/apache/spark/pull/10949 Any update on this? It is a real pain with the driver. As I see it, the patch is ready; the question is when it will be merged.
[GitHub] spark pull request #16813: [SPARK-19466][CORE][SCHEDULER] Improve Fair Sched...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/16813#discussion_r99686720

--- Diff: core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala ---

```diff
@@ -69,19 +72,29 @@ private[spark] class FairSchedulableBuilder(val rootPool: Pool, conf: SparkConf)
   val DEFAULT_WEIGHT = 1

   override def buildPools() {
-    var is: Option[InputStream] = None
+    var fileData: Option[FileData] = None
     try {
-      is = Option {
-        schedulerAllocFile.map { f =>
-          new FileInputStream(f)
-        }.getOrElse {
-          Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
-        }
-      }
-      is.foreach { i => buildFairSchedulerPool(i) }
+      fileData = schedulerAllocFile.map { f =>
+        Some(FileData(new FileInputStream(f), f))
+      }.getOrElse {
+        val is = Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
+        if(is != null) Some(FileData(is, DEFAULT_SCHEDULER_FILE))
+        else {
+          logWarning(s"No Fair Scheduler file found.")
+          None
+        }
+      }
+      fileData.foreach { data =>
+        logInfo(s"Fair Scheduler file: ${data.fileName} is found successfully and will be parsed.")
```

--- End diff --

s"Creating Fair Scheduler pools from ${data.fileName}"
[GitHub] spark pull request #16813: [SPARK-19466][CORE][SCHEDULER] Improve Fair Sched...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/16813#discussion_r99686323

--- Diff: core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala ---

```diff
@@ -69,19 +72,29 @@ private[spark] class FairSchedulableBuilder(val rootPool: Pool, conf: SparkConf)
   val DEFAULT_WEIGHT = 1

   override def buildPools() {
-    var is: Option[InputStream] = None
+    var fileData: Option[FileData] = None
     try {
-      is = Option {
-        schedulerAllocFile.map { f =>
-          new FileInputStream(f)
-        }.getOrElse {
-          Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
-        }
-      }
+      fileData = schedulerAllocFile.map { f =>
+        Some(FileData(new FileInputStream(f), f))
+      }.getOrElse {
+        val is = Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
+        if(is != null) Some(FileData(is, DEFAULT_SCHEDULER_FILE))
+        else {
+          logWarning(s"No Fair Scheduler file found.")
```

--- End diff --

"Fair Scheduler configuration file not found."
[GitHub] spark issue #16813: [SPARK-19466][CORE][SCHEDULER] Improve Fair Scheduler Lo...
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16813 Looks reasonable, but I'd prefer slightly different log messages.
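The fallback logic under review (user-configured file, else classpath resource, else a warning and `None`) can be sketched standalone. `FileData` and the warning below are simplified stand-ins for the Spark code, not its actual implementation:

```scala
import java.io.{FileInputStream, InputStream}

// Simplified stand-in for the file lookup in FairSchedulableBuilder.buildPools:
// prefer an explicitly configured file, fall back to a classpath resource,
// and return None (after a warning) when neither is available.
case class FileData(stream: InputStream, fileName: String)

object SchedulerFile {
  def open(configuredPath: Option[String], defaultResource: String): Option[FileData] =
    configuredPath.map { f =>
      Some(FileData(new FileInputStream(f), f))
    }.getOrElse {
      val is = getClass.getClassLoader.getResourceAsStream(defaultResource)
      if (is != null) Some(FileData(is, defaultResource))
      else {
        Console.err.println("Fair Scheduler configuration file not found.")
        None
      }
    }
}
```

Returning `Option[FileData]` rather than a bare `InputStream` keeps the file name available for the log messages the review is asking for.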
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99681895

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---

```diff
@@ -814,4 +816,50 @@ object DDLUtils {
       }
     }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
+      catalog: SessionCatalog,
+      table: TableIdentifier): CatalogTable = {
+    if (catalog.isTemporaryTable(table)) {
+      throw new AnalysisException(
+        s"${table.toString} is a temporary VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    val catalogTable = catalog.getTableMetadata(table)
+    if (catalogTable.tableType == CatalogTableType.VIEW) {
+      throw new AnalysisException(
+        s"${table.toString} is a VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    if (isDatasourceTable(catalogTable)) {
+      catalogTable.provider.get match {
+        case provider if provider.toLowerCase == "text" =>
+          // TextFileFormat can not support adding column either because text datasource table
+          // is resolved as a single-column table only.
+          throw new AnalysisException(
+            s"""${table.toString} is a text format datasource table,
+               |which does not support ALTER ADD COLUMNS.""".stripMargin)
+        case provider if provider.toLowerCase == "orc"
+            || provider.startsWith("org.apache.spark.sql.hive.orc") =>
+          // TODO Current native orc reader can not handle the difference between
+          // user-specified schema and inferred schema from ORC data file yet.
+          throw new AnalysisException(
+            s"""${table.toString} is an ORC datasource table,
+               |which does not support ALTER ADD COLUMNS.""".stripMargin)
+        case provider
+            if (!DataSource.lookupDataSource(provider).newInstance().isInstanceOf[FileFormat]) =>
```

--- End diff --

OK. I will use the white list of allowed FileFormat implementations.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99681470

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---

```diff
@@ -814,4 +816,50 @@ object DDLUtils {
       }
     }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
+      catalog: SessionCatalog,
+      table: TableIdentifier): CatalogTable = {
+    if (catalog.isTemporaryTable(table)) {
+      throw new AnalysisException(
+        s"${table.toString} is a temporary VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    val catalogTable = catalog.getTableMetadata(table)
+    if (catalogTable.tableType == CatalogTableType.VIEW) {
+      throw new AnalysisException(
+        s"${table.toString} is a VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    if (isDatasourceTable(catalogTable)) {
+      catalogTable.provider.get match {
+        case provider if provider.toLowerCase == "text" =>
+          // TextFileFormat can not support adding column either because text datasource table
+          // is resolved as a single-column table only.
+          throw new AnalysisException(
+            s"""${table.toString} is a text format datasource table,
+               |which does not support ALTER ADD COLUMNS.""".stripMargin)
+        case provider if provider.toLowerCase == "orc"
+            || provider.startsWith("org.apache.spark.sql.hive.orc") =>
```

--- End diff --

I will double-check this case. If `orc` is the only representation in CatalogTable.provider, I will reduce the logic here.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99681098

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---

```diff
@@ -814,4 +816,50 @@ object DDLUtils {
       }
     }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
+      catalog: SessionCatalog,
+      table: TableIdentifier): CatalogTable = {
+    if (catalog.isTemporaryTable(table)) {
+      throw new AnalysisException(
+        s"${table.toString} is a temporary VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    val catalogTable = catalog.getTableMetadata(table)
```

--- End diff --

I see. Will do. Thanks!
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99680917

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---

```diff
@@ -168,6 +168,43 @@ case class AlterTableRenameCommand(
   }
 }

 /**
+ * A command that add columns to a table
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   ALTER TABLE table_identifier
+ *   ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
+ * }}}
+ */
+case class AlterTableAddColumnsCommand(
+    table: TableIdentifier,
+    columns: Seq[StructField]) extends RunnableCommand {
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val catalogTable = DDLUtils.verifyAlterTableAddColumn(catalog, table)
+
+    // If an exception is thrown here we can just assume the table is uncached;
+    // this can happen with Hive tables when the underlying catalog is in-memory.
+    val wasCached = Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false)
```

--- End diff --

The current way is right. The implementation should not rely on the internal behavior of another function.
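The `Try(...).getOrElse(false)` idiom being defended here isolates the cache check from catalogs that throw. A minimal sketch of the pattern, with the cache lookup abstracted as a plain function:

```scala
import scala.util.Try

// The defensive pattern from the review: if querying cache status throws
// (e.g. a Hive table over an in-memory catalog), treat the table as
// uncached rather than failing the whole command.
def wasCached(isCached: String => Boolean, tableName: String): Boolean =
  Try(isCached(tableName)).getOrElse(false)
```

The point of the reviewer's comment is that this keeps the command correct even if the catalog's internal behavior changes, since the failure mode is handled locally.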
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99680331

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---

```diff
@@ -814,4 +816,50 @@ object DDLUtils {
       }
     }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
+      catalog: SessionCatalog,
+      table: TableIdentifier): CatalogTable = {
+    if (catalog.isTemporaryTable(table)) {
+      throw new AnalysisException(
+        s"${table.toString} is a temporary VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    val catalogTable = catalog.getTableMetadata(table)
+    if (catalogTable.tableType == CatalogTableType.VIEW) {
+      throw new AnalysisException(
+        s"${table.toString} is a VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    if (isDatasourceTable(catalogTable)) {
+      catalogTable.provider.get match {
+        case provider if provider.toLowerCase == "text" =>
+          // TextFileFormat can not support adding column either because text datasource table
+          // is resolved as a single-column table only.
+          throw new AnalysisException(
+            s"""${table.toString} is a text format datasource table,
+               |which does not support ALTER ADD COLUMNS.""".stripMargin)
+        case provider if provider.toLowerCase == "orc"
+            || provider.startsWith("org.apache.spark.sql.hive.orc") =>
+          // TODO Current native orc reader can not handle the difference between
+          // user-specified schema and inferred schema from ORC data file yet.
+          throw new AnalysisException(
+            s"""${table.toString} is an ORC datasource table,
+               |which does not support ALTER ADD COLUMNS.""".stripMargin)
+        case provider
+            if (!DataSource.lookupDataSource(provider).newInstance().isInstanceOf[FileFormat]) =>
```

--- End diff --

`FileFormat` only covers a few cases. It does not cover the other external data sources. How about using a white list here in this function?
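The white-list suggestion could take the following shape; the provider names below are illustrative placeholders, not Spark's actual list:

```scala
// Illustrative sketch of the suggested white list: only providers known to
// support ALTER TABLE ADD COLUMNS pass the check; anything else is rejected
// up front, instead of relying on an isInstanceOf[FileFormat] test.
object AddColumnsSupport {
  private val supported = Set("parquet", "json", "csv") // hypothetical list

  def verify(table: String, provider: String): Unit =
    if (!supported.contains(provider.toLowerCase))
      throw new IllegalArgumentException(
        s"$table with provider '$provider' does not support ALTER ADD COLUMNS.")
}
```

An explicit set makes the supported surface auditable and avoids accidentally admitting external data sources that merely happen to implement the `FileFormat` trait.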
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99680029

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---

```diff
@@ -814,4 +816,50 @@ object DDLUtils {
       }
     }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
+      catalog: SessionCatalog,
+      table: TableIdentifier): CatalogTable = {
+    if (catalog.isTemporaryTable(table)) {
+      throw new AnalysisException(
+        s"${table.toString} is a temporary VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    val catalogTable = catalog.getTableMetadata(table)
+    if (catalogTable.tableType == CatalogTableType.VIEW) {
+      throw new AnalysisException(
+        s"${table.toString} is a VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    if (isDatasourceTable(catalogTable)) {
+      catalogTable.provider.get match {
+        case provider if provider.toLowerCase == "text" =>
+          // TextFileFormat can not support adding column either because text datasource table
+          // is resolved as a single-column table only.
+          throw new AnalysisException(
+            s"""${table.toString} is a text format datasource table,
+               |which does not support ALTER ADD COLUMNS.""".stripMargin)
+        case provider if provider.toLowerCase == "orc"
+            || provider.startsWith("org.apache.spark.sql.hive.orc") =>
```

--- End diff --

When we store the metadata in the catalog, we unify different representations to `orc`, right? Can you find any case to break it?
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99679303

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
     }
   }
 }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
+      catalog: SessionCatalog,
+      table: TableIdentifier): CatalogTable = {
+    if (catalog.isTemporaryTable(table)) {
+      throw new AnalysisException(
+        s"${table.toString} is a temporary VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    val catalogTable = catalog.getTableMetadata(table)
--- End diff --

Call `getTempViewOrPermanentTableMetadata` instead of `getTableMetadata`. Then, you do not need the above check for temporary views. In addition, it also covers the cases for global views.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99679239

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
     }
   }
 }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
--- End diff --

Ok. I will move it to the AlterTableAddColumnsCommand class.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99679185

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -763,7 +763,9 @@ object DDLUtils {
   val HIVE_PROVIDER = "hive"

   def isHiveTable(table: CatalogTable): Boolean = {
-    table.provider.isDefined && table.provider.get.toLowerCase == HIVE_PROVIDER
+    // When `CatalogTable` is directly fetched from the catalog,
+    // CatalogTable.provider = None means the table is a Hive serde table.
+    !table.provider.isDefined || table.provider.get.toLowerCase == HIVE_PROVIDER
--- End diff --

I see. I will find another way. Thanks!
[GitHub] spark pull request #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroC...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16795#discussion_r99678662

--- Diff: sql/core/pom.xml ---
@@ -130,6 +130,12 @@
   test
+  org.apache.avro
--- End diff --

@srowen. Maven rejects the newly added test dependency, so I reverted the commit about moving it into the parent pom. To use different versions, it seems we should keep this here.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99678116

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
     }
   }
 }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
--- End diff --

Since this check is only used in `AlterTableAddColumnsCommand`, we do not need to move it here.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99677955

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -763,7 +763,9 @@ object DDLUtils {
   val HIVE_PROVIDER = "hive"

   def isHiveTable(table: CatalogTable): Boolean = {
-    table.provider.isDefined && table.provider.get.toLowerCase == HIVE_PROVIDER
+    // When `CatalogTable` is directly fetched from the catalog,
+    // CatalogTable.provider = None means the table is a Hive serde table.
+    !table.provider.isDefined || table.provider.get.toLowerCase == HIVE_PROVIDER
--- End diff --

The provider could be empty if the table is a VIEW. Thus, please do not modify the utility function here.
[GitHub] spark pull request #16821: [SPARK-19472][SQL] Parser should not mistake CASE...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16821
[GitHub] spark issue #16821: [SPARK-19472][SQL] Parser should not mistake CASE WHEN(....
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16821 Thanks! Merging to master/2.1/2.0
[GitHub] spark pull request #16821: [SPARK-19472][SQL] Parser should not mistake CASE...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16821#discussion_r99676254

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala ---
@@ -298,6 +298,8 @@ class ExpressionParserSuite extends PlanTest {
       CaseKeyWhen("a" === "a", Seq(true, 1)))
     assertEqual("case when a = 1 then b when a = 2 then c else d end",
       CaseWhen(Seq(('a === 1, 'b.expr), ('a === 2, 'c.expr)), 'd))
+    assertEqual("case when (1) + case when a > b then c else d end then f else g end",
+      CaseWhen(Seq((Literal(1) + CaseWhen(Seq(('a > 'b, 'c.expr)), 'd.expr), 'f.expr)), 'g))
--- End diff --

To other reviewers: before the fix, it works well if users do not put round brackets `( )`. For example: `case when 1 + case when a > b then c else d end then f else g end`
[GitHub] spark issue #16821: [SPARK-19472][SQL] Parser should not mistake CASE WHEN(....
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16821 LGTM
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16722 **[Test build #72471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72471/testReport)** for PR 16722 at commit [`48b1258`](https://github.com/apache/spark/commit/48b12586d2c24fd9852f9130376acd72d6e64467).
[GitHub] spark issue #14637: [SPARK-16967] move mesos to module
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14637 The release notes are for end users, and this doesn't impact end users. Developers are expected, more or less, to follow commits and dev@ to keep up with changes like this.
[GitHub] spark issue #14637: [SPARK-16967] move mesos to module
Github user drcrallen commented on the issue: https://github.com/apache/spark/pull/14637 FYI, we have a build process that packages Spark core. Now that Mesos is in its own artifact, this broke our build and deploy process, and it's not called out in the release notes.
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16795 Merged build finished. Test FAILed.
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16795 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72464/ Test FAILed.
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16795 **[Test build #72464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72464/testReport)** for PR 16795 at commit [`499f6fd`](https://github.com/apache/spark/commit/499f6fdc568414d66156d72b5a411833fd116bea).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99670310

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala ---
@@ -106,14 +122,18 @@ class DecisionTreeClassifier @Since("1.4.0") (
         ".train() called with non-matching numClasses and thresholds.length." +
         s" numClasses=$numClasses, but thresholds has length ${$(thresholds).length}")
     }
-
-    val oldDataset: RDD[LabeledPoint] = extractLabeledPoints(dataset, numClasses)
--- End diff --

For regressors, `extractLabeledPoints` doesn't do any extra checking. The larger issue is that we are manually "extracting instances" but we have convenience methods for labeled points. Since correcting it now, in this PR, likely means implementing the framework to correct it everywhere - which is a larger and orthogonal change - I think we could just add the check manually to the classifier, then create a JIRA that addresses consolidating these, probably by adding `extractInstances` methods analogous to their labeled point counterparts. This PR is large enough as is, without having to think about adding that method and then implementing it in all the other algos that manually extract instances, IMO.
[GitHub] spark issue #16625: [SPARK-17874][core] Add SSL port configuration.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16625 **[Test build #72470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72470/testReport)** for PR 16625 at commit [`a3f551b`](https://github.com/apache/spark/commit/a3f551b7e5d58b0f2933a9a48e7e928171e152b2).
[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99668948

--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/LabeledPoint.scala ---
@@ -35,4 +35,11 @@ case class LabeledPoint(@Since("2.0.0") label: Double, @Since("2.0.0") features:
   override def toString: String = {
     s"($label,$features)"
   }
+
+  private[spark] def toInstance: Instance = toInstance(1.0)
--- End diff --

Actually, I'd prefer to remove the no-arg function and be explicit everywhere. That way there is no ambiguity or unintended effects if someone changes the default value. Sound ok?
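To make the suggestion concrete, here is a minimal, self-contained sketch of the API shape being proposed - simplified stand-in types, not the real Spark `ml.feature` classes:

```scala
// Hypothetical, simplified stand-ins for ml.feature.{Instance, LabeledPoint};
// the point is the API shape: no default weight, callers must be explicit.
case class Instance(label: Double, weight: Double, features: Seq[Double])

case class LabeledPoint(label: Double, features: Seq[Double]) {
  // Only the explicit-weight form exists; there is no no-arg `toInstance`,
  // so changing a "default" weight later cannot silently affect call sites.
  def toInstance(weight: Double): Instance = Instance(label, weight, features)
}

val lp = LabeledPoint(1.0, Seq(0.5, 2.0))
val inst = lp.toInstance(1.0) // callers state the weight, even when it is 1.0
assert(inst.weight == 1.0)
```

The trade-off is a little verbosity at each call site in exchange for making every weight visible in the code.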
[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99668686

--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -590,8 +599,8 @@ private[spark] object RandomForest extends Logging {
     if (!isLeaf) {
       node.split = Some(split)
       val childIsLeaf = (LearningNode.indexToLevel(nodeIndex) + 1) == metadata.maxDepth
-      val leftChildIsLeaf = childIsLeaf || (stats.leftImpurity == 0.0)
-      val rightChildIsLeaf = childIsLeaf || (stats.rightImpurity == 0.0)
+      val leftChildIsLeaf = childIsLeaf || (math.abs(stats.leftImpurity) < 1e-16)
+      val rightChildIsLeaf = childIsLeaf || (math.abs(stats.rightImpurity) < 1e-16)
--- End diff --

I'd prefer not to refactor it in this PR. Updated to use `EPSILON` from ml.impl.Utils.
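For context, the tolerance is needed because impurity is aggregated from weighted floating-point sums: a node that is mathematically pure (impurity 0) need not produce exactly `0.0`. A self-contained Scala illustration of the effect (not Spark code):

```scala
// A mathematically-zero result computed from decimal doubles can come out as
// a tiny non-zero residue, so an exact `== 0.0` leaf test can miss pure nodes.
val residue = 0.1 + 0.2 - 0.3
assert(residue != 0.0)            // exact-zero comparison fails here
assert(math.abs(residue) < 1e-16) // a small absolute tolerance catches it
```

The same reasoning applies to `stats.leftImpurity`/`stats.rightImpurity`, which are differences of large weighted sums.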
[GitHub] spark pull request #16625: [SPARK-17874][core] Add SSL port configuration.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16625#discussion_r99668339

--- Diff: docs/security.md ---
@@ -49,10 +49,6 @@ component-specific configuration namespaces used to override the default settings
   Component
-  spark.ssl.fs
--- End diff --

Hmmm... that code is actually being used, not to set up the file server, but to configure HTTP clients that download from SSL-enabled servers. Let me see about making that clear in the configuration docs.
[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99666983

--- Diff: mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala ---
@@ -124,8 +129,8 @@ private[ml] object TreeTests extends SparkFunSuite {
    * make mistakes such as creating loops of Nodes.
    */
   private def checkEqual(a: Node, b: Node): Unit = {
-    assert(a.prediction === b.prediction)
-    assert(a.impurity === b.impurity)
+    assert(a.prediction ~== b.prediction absTol 1e-8)
+    assert(a.impurity ~== b.impurity absTol 1e-8)
--- End diff --

All over the test suites we use tolerances as literal doubles instead of making a variable for each and every one. I think it would be over-engineering to do this.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16043 **[Test build #72469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72469/testReport)** for PR 16043 at commit [`32805cf`](https://github.com/apache/spark/commit/32805cfb2176ab74c21ca93ab53f92852ad7fb24).
[GitHub] spark pull request #16625: [SPARK-17874][core] Add SSL port configuration.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16625#discussion_r99667702

--- Diff: docs/configuration.md ---
@@ -1797,6 +1797,20 @@ Apart from these, the following properties are also available, and may be useful
+  spark.ssl.[namespace].port
--- End diff --

This was intentional. "spark.ssl.port" doesn't make that much sense if you think about it; you want things like the master UI and history server UI to have different, well-known ports, so having this shared config key here doesn't make a lot of sense. For the other configs, such as algorithms and keystore locations, sharing configuration is ok.
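As a sketch of what this split looks like in practice - namespace names below are illustrative, so check the security docs for the ones your Spark version actually supports - the shared SSL material stays under `spark.ssl.*` while each service pins its own port:

```properties
# Shared SSL settings, inherited by every namespace
spark.ssl.enabled             true
spark.ssl.keyStore           /path/to/keystore.jks
spark.ssl.keyStorePassword   changeit

# Per-namespace ports stay distinct and well known (illustrative namespaces/ports)
spark.ssl.ui.port             4441
spark.ssl.historyServer.port  18481
```

This matches the comment's rationale: a single `spark.ssl.port` could not give the master UI and history server UI different well-known ports.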
[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99667381

--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
@@ -351,6 +370,36 @@ class DecisionTreeClassifierSuite
     dt.fit(df)
   }

+  test("training with sample weights") {
+    val df = linearMulticlassDataset
+    val numClasses = 3
+    val predEquals = (x: Double, y: Double) => x == y
+    // (impurity, maxDepth)
+    val testParams = Seq(
+      ("gini", 10),
+      ("entropy", 10),
+      ("gini", 5)
+    )
+    for ((impurity, maxDepth) <- testParams) {
+      val estimator = new DecisionTreeClassifier()
+        .setMaxDepth(maxDepth)
+        .setSeed(seed)
+        .setMinWeightFractionPerNode(0.049)
--- End diff --

We use param validators for this. Since those are already tested, I don't see a need.
[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r9967

--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -281,10 +283,26 @@ object MLTestingUtils extends SparkFunSuite {
       estimator: E with HasWeightCol,
       modelEquals: (M, M) => Unit): Unit = {
     estimator.set(estimator.weightCol, "weight")
-    val models = Seq(0.001, 1.0, 1000.0).map { w =>
+    val models = Seq(0.01, 1.0, 1000.0).map { w =>
--- End diff --

Yes, the decision tree tests have trouble with numerical precision when the weights are really small.
[GitHub] spark issue #16797: [SPARK-19455][SQL] Add option for case-insensitive Parqu...
Github user budde commented on the issue: https://github.com/apache/spark/pull/16797

I'll double check, but I don't think ```spark.sql.hive.manageFilesourcePartitions=false``` would solve this issue, since we're still deriving the file relation's dataSchema parameter from the schema of MetastoreRelation. The call to ```fileFormat.inferSchema()``` has been removed entirely.

If Spark SQL is set on using a table property to store the case-sensitive schema, then I think having a way to backfill this property for existing < 2.1 tables, as well as tables not created or managed by Spark, will be a necessity.

If the cleanest way to deal with this case-sensitivity problem is to bring back schema inference, then I think a good option would be to introduce a configuration param to indicate whether or not an inferred schema should be written back to the table as a property. We could also introduce another config param that allows a user to bypass schema inference even if a case-sensitive schema can't be read from the table properties. This could be helpful for users who would like to query external Hive tables that aren't managed by Spark and that they know aren't backed by files containing case-sensitive field names.

This would basically allow us to support the following use cases:

1) The MetastoreRelation is able to read a case-sensitive schema from the table properties. No inference is necessary.
2) The MetastoreRelation can't read a case-sensitive schema from the table properties. A case-sensitive schema is inferred and, if configured, written back as a table property.
3) The MetastoreRelation can't read a case-sensitive schema from the table properties. The user knows the underlying data files don't contain case-sensitive field names and has explicitly set a config param to skip the inference step.
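The three use cases above read like a small decision function. A hypothetical, self-contained Scala sketch of that flow - the names here are invented for illustration and are not actual Spark APIs or config keys:

```scala
// Hypothetical decision flow distilled from the three use cases above.
sealed trait SchemaSource
case object FromTableProperty     extends SchemaSource // case 1: property present
case object Inferred              extends SchemaSource // case 2: infer (optionally write back)
case object CaseInsensitiveSchema extends SchemaSource // case 3: user opted out of inference

def resolveSchemaSource(
    hasCaseSensitiveProperty: Boolean, // can the table property be read?
    skipInference: Boolean             // hypothetical opt-out config param
): SchemaSource = {
  if (hasCaseSensitiveProperty) FromTableProperty
  else if (skipInference) CaseInsensitiveSchema
  else Inferred
}
```

The key property of the flow is that inference only runs when the table property is missing and the user has not opted out.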
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/16760 @gatorsmile Yeah Sean. Actually, most likely I will need to work out a different schema than what I have currently for the generator tests. So I was planning to add the negative scenarios and generator tests in another PR.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16650 **[Test build #72468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72468/testReport)** for PR 16650 at commit [`cb24167`](https://github.com/apache/spark/commit/cb241672692db3e604c18bcd56f441f6863a09e4).
[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99665910

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Variance.scala ---
@@ -70,17 +70,24 @@ object Variance extends Impurity {
  * Note: Instances of this class do not hold the data; they operate on views of the data.
  */
 private[spark] class VarianceAggregator()
-  extends ImpurityAggregator(statsSize = 3) with Serializable {
+  extends ImpurityAggregator(statsSize = 4) with Serializable {

   /**
    * Update stats for one (node, feature, bin) with the given label.
    * @param allStats Flat stats array, with stats for this (node, feature, bin) contiguous.
    * @param offsetStart index of stats for this (node, feature, bin).
    */
-  def update(allStats: Array[Double], offset: Int, label: Double, instanceWeight: Double): Unit = {
+  def update(
+      allStats: Array[Double],
+      offset: Int,
+      label: Double,
+      numSamples: Int,
+      sampleWeight: Double): Unit = {
+    val instanceWeight = numSamples * sampleWeight
     allStats(offset) += instanceWeight
     allStats(offset + 1) += instanceWeight * label
     allStats(offset + 2) += instanceWeight * label * label
+    allStats(offset + 3) += numSamples
--- End diff --

Done.
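As a self-contained sketch of the four-slot accumulator in the diff above (a standalone function mirroring the update logic, not the actual `VarianceAggregator`):

```scala
// Stats layout: [weightSum, weightedLabelSum, weightedLabelSqSum, rawSampleCount].
// Keeping the raw count in slot 3 lets sample-count checks work independently
// of the sample weights.
def updateStats(
    stats: Array[Double], label: Double, numSamples: Int, sampleWeight: Double): Unit = {
  val instanceWeight = numSamples * sampleWeight
  stats(0) += instanceWeight
  stats(1) += instanceWeight * label
  stats(2) += instanceWeight * label * label
  stats(3) += numSamples
}

val stats = Array.fill(4)(0.0)
updateStats(stats, label = 2.0, numSamples = 3, sampleWeight = 0.5)
// instanceWeight = 1.5, so stats = [1.5, 3.0, 6.0, 3.0]
assert(stats.sameElements(Array(1.5, 3.0, 6.0, 3.0)))
```

Weighted variance is then recoverable from slots 0-2, while slot 3 preserves how many raw samples the bin actually saw.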
[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99665188

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala ---
@@ -79,7 +79,12 @@ private[spark] abstract class ImpurityAggregator(val statsSize: Int) extends Ser
    * @param allStats  Flat stats array, with stats for this (node, feature, bin) contiguous.
    * @param offsetStart  index of stats for this (node, feature, bin).
    */
-  def update(allStats: Array[Double], offset: Int, label: Double, instanceWeight: Double): Unit
+  def update(
+      allStats: Array[Double],
+      offset: Int,
+      label: Double,
+      numSamples: Int,
+      sampleWeight: Double): Unit
--- End diff --

I don't think it's necessary. It's a private class, and the only params currently in the doc are the ambiguous ones. These new ones should be self-explanatory.
[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...
Github user jsoltren commented on a diff in the pull request: https://github.com/apache/spark/pull/16650#discussion_r99664910

--- Diff: core/src/test/scala/org/apache/spark/deploy/StandaloneDynamicAllocationSuite.scala ---
@@ -467,6 +469,52 @@ class StandaloneDynamicAllocationSuite
     }
   }

+  test("kill all executors on localhost") {
+    sc = new SparkContext(appConf)
+    val appId = sc.applicationId
+    eventually(timeout(10.seconds), interval(10.millis)) {
+      val apps = getApplications()
+      assert(apps.size === 1)
+      assert(apps.head.id === appId)
+      assert(apps.head.executors.size === 2)
+      assert(apps.head.getExecutorLimit === Int.MaxValue)
+    }
+    val beforeList = getApplications().head.executors.keys.toSet
+    assert(killExecutorsOnHost(sc, "localhost").equals(true))
+
+    syncExecutors(sc)
+    val afterList = getApplications().head.executors.keys.toSet
+
+    eventually(timeout(10.seconds), interval(100.millis)) {
+      assert(beforeList.intersect(afterList).size == 0)
+    }
+  }
+
+  test("executor registration on a blacklisted host must fail") {
+    sc = new SparkContext(appConf.set(config.BLACKLIST_ENABLED.key, "true"))
+    val endpointRef = mock(classOf[RpcEndpointRef])
+    val mockAddress = mock(classOf[RpcAddress])
+    when(endpointRef.address).thenReturn(mockAddress)
+    val message = RegisterExecutor("one", endpointRef, "blacklisted-host", 10, Map.empty)
+
+    // Get "localhost" on a blacklist.
+    val taskScheduler = mock(classOf[TaskSchedulerImpl])
+    when(taskScheduler.nodeBlacklist()).thenReturn(Set("blacklisted-host"))
+    when(taskScheduler.sc).thenReturn(sc)
+    sc.taskScheduler = taskScheduler
+
+    // Create a fresh scheduler backend to blacklist "localhost".
+    sc.schedulerBackend.stop()
+    val backend =
+     new StandaloneSchedulerBackend(taskScheduler, sc, Array(masterRpcEnv.address.toSparkURL))
--- End diff --

Would be really nice to have automated style checks...
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16795 **[Test build #72467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72467/testReport)** for PR 16795 at commit [`42ff642`](https://github.com/apache/spark/commit/42ff6426ec090ef6a1242d8556f39cbdef526d8b).
[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99664877

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Entropy.scala ---
@@ -83,23 +83,29 @@ object Entropy extends Impurity {
    * @param numClasses  Number of classes for label.
    */
 private[spark] class EntropyAggregator(numClasses: Int)
-  extends ImpurityAggregator(numClasses) with Serializable {
+  extends ImpurityAggregator(numClasses + 1) with Serializable {
--- End diff --

Yes
[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/16722#discussion_r99664273

--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala ---
@@ -42,6 +42,7 @@ import org.apache.spark.rdd.RDD
 private[spark] class DecisionTreeMetadata(
     val numFeatures: Int,
     val numExamples: Long,
+    val weightedNumExamples: Double,
--- End diff --

Yeah, not all of the params are added to the doc, tbh I'm not sure how it was decided which ones were and were not.
[GitHub] spark issue #16810: [SPARK-19464][CORE][YARN][test-hadoop2.6] Remove support...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/16810 @srowen, I'll help with this. It turns out that we don't need to make any Jenkins configuration changes for the pull request builder. For the master branch builders, I've gone ahead and disabled the jobs and will complete their final removal in a few days after this patch merges.
[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/16650#discussion_r99664044

--- Diff: core/src/test/scala/org/apache/spark/deploy/StandaloneDynamicAllocationSuite.scala ---
@@ -467,6 +469,52 @@ class StandaloneDynamicAllocationSuite
     }
   }

+  test("kill all executors on localhost") {
+    sc = new SparkContext(appConf)
+    val appId = sc.applicationId
+    eventually(timeout(10.seconds), interval(10.millis)) {
+      val apps = getApplications()
+      assert(apps.size === 1)
+      assert(apps.head.id === appId)
+      assert(apps.head.executors.size === 2)
+      assert(apps.head.getExecutorLimit === Int.MaxValue)
+    }
+    val beforeList = getApplications().head.executors.keys.toSet
+    assert(killExecutorsOnHost(sc, "localhost").equals(true))
+
+    syncExecutors(sc)
+    val afterList = getApplications().head.executors.keys.toSet
+
+    eventually(timeout(10.seconds), interval(100.millis)) {
+      assert(beforeList.intersect(afterList).size == 0)
+    }
+  }
+
+  test("executor registration on a blacklisted host must fail") {
+    sc = new SparkContext(appConf.set(config.BLACKLIST_ENABLED.key, "true"))
+    val endpointRef = mock(classOf[RpcEndpointRef])
+    val mockAddress = mock(classOf[RpcAddress])
+    when(endpointRef.address).thenReturn(mockAddress)
+    val message = RegisterExecutor("one", endpointRef, "blacklisted-host", 10, Map.empty)
+
+    // Get "localhost" on a blacklist.
+    val taskScheduler = mock(classOf[TaskSchedulerImpl])
+    when(taskScheduler.nodeBlacklist()).thenReturn(Set("blacklisted-host"))
+    when(taskScheduler.sc).thenReturn(sc)
+    sc.taskScheduler = taskScheduler
+
+    // Create a fresh scheduler backend to blacklist "localhost".
+    sc.schedulerBackend.stop()
+    val backend =
+     new StandaloneSchedulerBackend(taskScheduler, sc, Array(masterRpcEnv.address.toSparkURL))
--- End diff --

super nit: looks like this is only indented one space, not two
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16650 **[Test build #72466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72466/testReport)** for PR 16650 at commit [`37248a2`](https://github.com/apache/spark/commit/37248a202c15807fffe9e25e5b630a27dda38204).
[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...
Github user jsoltren commented on a diff in the pull request: https://github.com/apache/spark/pull/16650#discussion_r99661763

--- Diff: core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala ---
@@ -456,4 +461,69 @@ class BlacklistTrackerSuite extends SparkFunSuite with BeforeAndAfterEach with M
     conf.remove(config)
   }
 }
+
+  test("blacklisting kills executors, configured by BLACKLIST_KILL_ENABLED") {
+    val allocationClientMock = mock[ExecutorAllocationClient]
+    when(allocationClientMock.killExecutors(any(), any(), any())).thenReturn(Seq("called"))
+    when(allocationClientMock.killExecutorsOnHost("hostA")).thenAnswer(new Answer[Boolean] {
+      override def answer(invocation: InvocationOnMock): Boolean = {
+        if (blacklist.nodeBlacklist.contains("hostA") == false) {
+          throw new IllegalStateException("hostA should be on the blacklist")
--- End diff --

Sure. I've used your text with very minor modification.
[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...
Github user budde commented on the issue: https://github.com/apache/spark/pull/16744 Amending this PR to upgrade the KCL/AWS SDK dependencies to more-current versions (1.7.3 and 1.11.76, respectively). The ```RegionUtils.getRegionByEndpoint()``` API was removed from the SDK, so I've had to replace it with a simple string split method for the examples and test suites that were utilizing it.
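To illustrate the kind of replacement described above (a sketch only — the object and method names are mine, not the PR's, and it assumes endpoints of the standard `kinesis.<region>.amazonaws.com` shape):

    // Sketch: derive a region name from a Kinesis endpoint by splitting the
    // hostname, instead of the removed RegionUtils.getRegionByEndpoint() API.
    object EndpointRegionSketch {
      def regionFromEndpoint(endpoint: String): String = {
        // Strip any scheme, then take the second dot-separated component:
        // "kinesis.us-west-2.amazonaws.com" -> "us-west-2"
        val host = endpoint.stripPrefix("https://").stripPrefix("http://")
        host.split('.')(1)
      }
    }

This trades the SDK's endpoint registry for a fixed naming convention, which is acceptable in examples and tests but would need validation for arbitrary user-supplied endpoints.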
[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16744 **[Test build #72465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72465/testReport)** for PR 16744 at commit [`eb75482`](https://github.com/apache/spark/commit/eb754825d1934d7eee4175b8adaefe51f46050dd).
[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...
Github user jsoltren commented on a diff in the pull request: https://github.com/apache/spark/pull/16650#discussion_r99661191

--- Diff: core/src/test/scala/org/apache/spark/deploy/StandaloneDynamicAllocationSuite.scala ---
@@ -467,6 +469,51 @@ class StandaloneDynamicAllocationSuite
     }
   }

+  test("kill all executors on localhost") {
+    sc = new SparkContext(appConf)
+    val appId = sc.applicationId
+    eventually(timeout(10.seconds), interval(10.millis)) {
+      val apps = getApplications()
+      assert(apps.size === 1)
+      assert(apps.head.id === appId)
+      assert(apps.head.executors.size === 2)
+      assert(apps.head.getExecutorLimit === Int.MaxValue)
+    }
+    val beforeList = getApplications().head.executors.keys.toSet
+    // kill all executors without replacement
+    assert(killExecutorsOnHost(sc, "localhost").equals(true))
+
+    syncExecutors(sc)
+    val afterList = getApplications().head.executors.keys.toSet
+
+    eventually(timeout(10.seconds), interval(100.millis)) {
+      assert(beforeList.intersect(afterList).size == 0)
+    }
+  }
+
+  test("executor registration on a blacklisted host must fail") {
+    sc = new SparkContext(appConf.set(config.BLACKLIST_ENABLED.key, "true"))
+    val endpointRef = mock(classOf[RpcEndpointRef])
+    val mockAddress = mock(classOf[RpcAddress])
+    when(endpointRef.address).thenReturn(mockAddress)
+    val message = RegisterExecutor("one", endpointRef, "localhost", 10, Map.empty)
+
+    // Get "localhost" on a blacklist.
+    val taskScheduler = mock(classOf[TaskSchedulerImpl])
+    when(taskScheduler.nodeBlacklist()).thenReturn(Set("localhost"))
--- End diff --

Let's call it "blacklisted-host".
[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...
Github user jsoltren commented on a diff in the pull request: https://github.com/apache/spark/pull/16650#discussion_r99660908

--- Diff: core/src/test/scala/org/apache/spark/deploy/StandaloneDynamicAllocationSuite.scala ---
@@ -467,6 +469,51 @@ class StandaloneDynamicAllocationSuite
     }
   }

+  test("kill all executors on localhost") {
+    sc = new SparkContext(appConf)
+    val appId = sc.applicationId
+    eventually(timeout(10.seconds), interval(10.millis)) {
+      val apps = getApplications()
+      assert(apps.size === 1)
+      assert(apps.head.id === appId)
+      assert(apps.head.executors.size === 2)
+      assert(apps.head.getExecutorLimit === Int.MaxValue)
+    }
+    val beforeList = getApplications().head.executors.keys.toSet
+    // kill all executors without replacement
--- End diff --

Best to just delete the comment, then. Done.
[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...
Github user jsoltren commented on a diff in the pull request: https://github.com/apache/spark/pull/16650#discussion_r99660755

--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -600,6 +603,16 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
    */
   protected def doKillExecutors(executorIds: Seq[String]): Future[Boolean] =
     Future.successful(false)
+
+  /**
+   * Request that the cluster manager kill all executors on a given host.
+   * @return whether the kill request is acknowledged.
+   */
+  final override def killExecutorsOnHost(host: String): Boolean = {
+    logInfo(s"Requesting to kill any and all executors on host ${host}")
+    driverEndpoint.send(KillExecutorsOnHost(host))
--- End diff --

Sure. I've paraphrased this a bit but it's a helpful comment to add.
[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...
Github user jsoltren commented on a diff in the pull request: https://github.com/apache/spark/pull/16650#discussion_r99660675

--- Diff: core/src/test/scala/org/apache/spark/deploy/StandaloneDynamicAllocationSuite.scala ---
@@ -489,6 +491,29 @@ class StandaloneDynamicAllocationSuite
     }
   }

+  test("executor registration on a blacklisted host must fail") {
+    sc = new SparkContext(appConf.set(config.BLACKLIST_ENABLED.key, "true"))
+    val endpointRef = mock(classOf[RpcEndpointRef])
+    val mockAddress = mock(classOf[RpcAddress])
+    when(endpointRef.address).thenReturn(mockAddress)
+    val message = RegisterExecutor("one", endpointRef, "localhost", 10, Map.empty)
+
+    // Get "localhost" on a blacklist.
+    val taskScheduler = mock(classOf[TaskSchedulerImpl])
+    when(taskScheduler.nodeBlacklist()).thenReturn(Set("localhost"))
+    when(taskScheduler.sc).thenReturn(sc)
+    sc.taskScheduler = taskScheduler
+
+    // Create a fresh scheduler backend to blacklist "localhost".
+    sc.schedulerBackend.stop()
+    val backend =
+      new StandaloneSchedulerBackend(taskScheduler, sc, Array(masterRpcEnv.address.toSparkURL))
+    backend.start()
+
+    backend.driverEndpoint.ask[Boolean](message)
+    verify(endpointRef).send(RegisterExecutorFailed(any()))
--- End diff --

Thanks for the tip. Fixed.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99660704

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
     }
   }
 }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
--- End diff --

oh. this is ddl util function.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99660453

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
     }
   }
 }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
--- End diff --

yes. you are right. I will change it to private.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r99659988

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
     }
   }
 }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
--- End diff --

This function should be a private function of `AlterTableAddColumnsCommand`, right?
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16795 **[Test build #72464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72464/testReport)** for PR 16795 at commit [`499f6fd`](https://github.com/apache/spark/commit/499f6fdc568414d66156d72b5a411833fd116bea).
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16760 @dilipbiswal Are you planning to submit another PR for `Generators` or do it in this PR?
[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16795 Retest this please
[GitHub] spark issue #16823: [SPARK] Config methods simplification at SparkSession#Bu...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16823 Agree, though we are talking about duplicating 1 line of code in 3 nearby places. It's not meaningfully duplicating anything.
[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16824 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72462/ Test PASSed.
[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16824 Merged build finished. Test PASSed.
[GitHub] spark pull request #16738: [SPARK-19398] Change one misleading log in TaskSe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16738
[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16824

**[Test build #72462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72462/testReport)** for PR 16824 at commit [`b3acaad`](https://github.com/apache/spark/commit/b3acaadfed5833c108c03aae7865b6ed2782169a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16795: [SPARK-19409][BUILD][TEST-MAVEN] Fix ParquetAvroCompatib...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16795 **[Test build #72463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72463/testReport)** for PR 16795 at commit [`499f6fd`](https://github.com/apache/spark/commit/499f6fdc568414d66156d72b5a411833fd116bea).
[GitHub] spark issue #16738: [SPARK-19398] Change one misleading log in TaskSetManage...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/16738 Merged this to master. Thanks for the fix @jinxing64 -- these fixes to improve readability / usability of the code are super useful!
[GitHub] spark issue #16823: [SPARK] Config methods simplification at SparkSession#Bu...
Github user pfcoperez commented on the issue: https://github.com/apache/spark/pull/16823

@andrewor14 @srowen In any case, I just wanted to add that copying code is basically the worst strategy. If you wanted to constrain the types for those three, and not just accept **AnyVal** subclasses, I would recommend doing something like:

```
def config(key: String, value: Double): Builder = config(key, value.toString)
def config(key: String, value: Boolean): Builder = config(key, value.toString)
def config(key: String, value: Long): Builder = config(key, value.toString)
```

Exactly the same interface, no copy-pasted code.
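Whichever implementation style is chosen, callers see the same `Builder` interface. A usage sketch (the config keys are just examples, not recommendations):

    // Each typed overload delegates to config(key, value.toString),
    // so Long and Boolean values work without manual conversion.
    val spark = SparkSession.builder()
      .appName("example")
      .config("spark.sql.shuffle.partitions", 200L)  // Long overload
      .config("spark.speculation", true)             // Boolean overload
      .getOrCreate()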
[GitHub] spark issue #16795: [SPARK-19409][BUILD] Fix ParquetAvroCompatibilitySuite f...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16795 Oh, thank you, @liancheng !
[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16824 Merged build finished. Test PASSed.
[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16824 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72461/
[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16824 **[Test build #72461 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72461/testReport)** for PR 16824 at commit [`00c8af3`](https://github.com/apache/spark/commit/00c8af35b704e989ad8536490f310a35c2e721fb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16795: [SPARK-19409][BUILD] Fix ParquetAvroCompatibilitySuite f...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16795 @dongjoon-hyun, you may add `[TEST-MAVEN]` in the PR title to ask Jenkins to test this PR using Maven.
[GitHub] spark issue #16791: [SPARK-19409][SPARK-17213] Cleanup Parquet workarounds/h...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16791 Ah, thank you for confirming and for the information!
[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16824 **[Test build #72462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72462/testReport)** for PR 16824 at commit [`b3acaad`](https://github.com/apache/spark/commit/b3acaadfed5833c108c03aae7865b6ed2782169a).
[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16824 **[Test build #72461 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72461/testReport)** for PR 16824 at commit [`00c8af3`](https://github.com/apache/spark/commit/00c8af35b704e989ad8536490f310a35c2e721fb).
[GitHub] spark issue #16791: [SPARK-19409][SPARK-17213] Cleanup Parquet workarounds/h...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16791 @HyukjinKwon Sorry that I didn't see your comment before this PR got merged. I believe PARQUET-686 had already been fixed by apache/parquet-mr#367 but wasn't marked as resolved in JIRA. Thanks for sending out #16817 for re-enabling the tests!
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/16715 @yanboliang, just a friendly reminder: please don't forget to review the PR when you have time. Thanks!
[GitHub] spark issue #16803: [SPARK-19458][BUILD][SQL]load hive jars from local repo ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16803 Merged build finished. Test PASSed.
[GitHub] spark issue #16803: [SPARK-19458][BUILD][SQL]load hive jars from local repo ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16803 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72457/