[GitHub] spark pull request #16825: Avoid leak SparkContext in Signaling.cancelOnInte...

2017-02-06 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/16825

Avoid leak SparkContext in Signaling.cancelOnInterrupt

## What changes were proposed in this pull request?

`Signaling.cancelOnInterrupt` leaks a SparkContext per call, which makes 
ReplSuite unstable.

This PR adds `SparkContext.getActive` so that 
`Signaling.cancelOnInterrupt` can look up the active `SparkContext` instead of 
holding its own reference, avoiding the leak.
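The shape of the fix can be sketched as follows. This is a hedged illustration, not the exact patch: `SparkContext.getActive` is the method this PR introduces, while `SignalUtils.register` and the job-cancellation body are assumed from the surrounding Spark codebase.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.util.SignalUtils

object Signaling {
  /**
   * Register a SIGINT handler that cancels all running jobs.
   * Resolving the context lazily on each signal, rather than capturing
   * a SparkContext in the handler's closure at registration time,
   * avoids pinning a stopped context in memory across calls.
   */
  def cancelOnInterrupt(): Unit = SignalUtils.register("INT") {
    SparkContext.getActive.exists { ctx =>
      if (!ctx.statusTracker.getActiveJobIds().isEmpty) {
        ctx.cancelAllJobs()
        true  // signal consumed; do not propagate
      } else {
        false // no active jobs; fall through to the default handler
      }
    }
  }
}
```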

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-19481

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16825.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16825


commit 3554e33297140a51d554b57fbfce542ed66367df
Author: Shixiong Zhu 
Date:   2017-02-06T22:40:16Z

Avoid leak SparkContext in Signaling.cancelOnInterrupt




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16625: [SPARK-17874][core] Add SSL port configuration.

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16625
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72470/
Test PASSed.





[GitHub] spark issue #16625: [SPARK-17874][core] Add SSL port configuration.

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16625
  
**[Test build #72470 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72470/testReport)**
 for PR 16625 at commit 
[`a3f551b`](https://github.com/apache/spark/commit/a3f551b7e5d58b0f2933a9a48e7e928171e152b2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16043
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72469/
Test PASSed.





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16650
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72468/
Test PASSed.





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16650
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16043
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/16795
  
The current failure about `ExtendedYarnTest` came from the `mesos` module. It 
seems to be unrelated to this PR. Let me check.
```
[info] Running Spark tests using Maven with these arguments:  -Phadoop-2.3 
-Phive -Pyarn -Pmesos -Phive-thriftserver -Pkinesis-asl 
-Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest
 test --fail-at-end
...
[INFO] Spark Project Mesos  FAILURE [ 
10.687 s]
...
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test (default-test) on 
project spark-mesos_2.11: Execution default-test of goal 
org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test failed: There was an 
error in the forked process
[ERROR] java.lang.RuntimeException: Unable to load category: 
org.apache.spark.tags.ExtendedYarnTest
[ERROR] at 
org.apache.maven.surefire.group.match.SingleGroupMatcher.loadGroupClasses(SingleGroupMatcher.java:139)
[ERROR] at ...
[ERROR] Caused by: java.lang.ClassNotFoundException: 
org.apache.spark.tags.ExtendedYarnTest
...
```
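This `ClassNotFoundException` typically means the failing module lacks a test-scoped dependency on the `spark-tags` test jar, which defines the category annotations surefire tries to load for `-Dtest.exclude.tags`. A hedged sketch of the missing dependency, with coordinates assumed from the Spark 2.1-era build:

```xml
<!-- Illustrative only: lets surefire resolve
     org.apache.spark.tags.ExtendedYarnTest in this module -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-tags_2.11</artifactId>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
```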





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16650
  
**[Test build #72468 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72468/testReport)**
 for PR 16650 at commit 
[`cb24167`](https://github.com/apache/spark/commit/cb241672692db3e604c18bcd56f441f6863a09e4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16043
  
**[Test build #72469 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72469/testReport)**
 for PR 16043 at commit 
[`32805cf`](https://github.com/apache/spark/commit/32805cfb2176ab74c21ca93ab53f92852ad7fb24).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16744
  
**[Test build #72465 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72465/testReport)**
 for PR 16744 at commit 
[`eb75482`](https://github.com/apache/spark/commit/eb754825d1934d7eee4175b8adaefe51f46050dd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SerializableKCLAuthProvider(`





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16744
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72465/
Test PASSed.





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16744
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16650
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72466/
Test PASSed.





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16650
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16650
  
**[Test build #72466 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72466/testReport)**
 for PR 16650 at commit 
[`37248a2`](https://github.com/apache/spark/commit/37248a202c15807fffe9e25e5b630a27dda38204).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16795
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72467/
Test FAILed.





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16795
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16795
  
**[Test build #72467 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72467/testReport)**
 for PR 16795 at commit 
[`42ff642`](https://github.com/apache/spark/commit/42ff6426ec090ef6a1242d8556f39cbdef526d8b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #10949: [SPARK-12832][MESOS] mesos scheduler respect agent attri...

2017-02-06 Thread evilezh
Github user evilezh commented on the issue:

https://github.com/apache/spark/pull/10949
  
Any update on this? It is a real pain with the driver. As far as I can see, 
the patch is ready; the question is when it will be merged.





[GitHub] spark pull request #16813: [SPARK-19466][CORE][SCHEDULER] Improve Fair Sched...

2017-02-06 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/16813#discussion_r99686720
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala ---
@@ -69,19 +72,29 @@ private[spark] class FairSchedulableBuilder(val 
rootPool: Pool, conf: SparkConf)
   val DEFAULT_WEIGHT = 1
 
   override def buildPools() {
-var is: Option[InputStream] = None
+var fileData: Option[FileData] = None
 try {
-  is = Option {
-schedulerAllocFile.map { f =>
-  new FileInputStream(f)
-}.getOrElse {
-  
Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
+  fileData = schedulerAllocFile.map { f =>
+Some(FileData(new FileInputStream(f), f))
+  }.getOrElse {
+val is = 
Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
+if(is != null) Some(FileData(is, DEFAULT_SCHEDULER_FILE))
+else {
+  logWarning(s"No Fair Scheduler file found.")
+  None
 }
   }
 
-  is.foreach { i => buildFairSchedulerPool(i) }
+  fileData.foreach { data =>
+logInfo(s"Fair Scheduler file: ${data.fileName} is found 
successfully and will be parsed.")
--- End diff --

s"Creating Fair Scheduler pools from ${data.fileName}"





[GitHub] spark pull request #16813: [SPARK-19466][CORE][SCHEDULER] Improve Fair Sched...

2017-02-06 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/16813#discussion_r99686323
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala ---
@@ -69,19 +72,29 @@ private[spark] class FairSchedulableBuilder(val 
rootPool: Pool, conf: SparkConf)
   val DEFAULT_WEIGHT = 1
 
   override def buildPools() {
-var is: Option[InputStream] = None
+var fileData: Option[FileData] = None
 try {
-  is = Option {
-schedulerAllocFile.map { f =>
-  new FileInputStream(f)
-}.getOrElse {
-  
Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
+  fileData = schedulerAllocFile.map { f =>
+Some(FileData(new FileInputStream(f), f))
+  }.getOrElse {
+val is = 
Utils.getSparkClassLoader.getResourceAsStream(DEFAULT_SCHEDULER_FILE)
+if(is != null) Some(FileData(is, DEFAULT_SCHEDULER_FILE))
+else {
+  logWarning(s"No Fair Scheduler file found.")
--- End diff --

"Fair Scheduler configuration file not found."





[GitHub] spark issue #16813: [SPARK-19466][CORE][SCHEDULER] Improve Fair Scheduler Lo...

2017-02-06 Thread markhamstra
Github user markhamstra commented on the issue:

https://github.com/apache/spark/pull/16813
  
Looks reasonable, but I'd prefer slightly different log messages.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99681895
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
   }
 }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
+  catalog: SessionCatalog,
+  table: TableIdentifier): CatalogTable = {
+if (catalog.isTemporaryTable(table)) {
+  throw new AnalysisException(
+s"${table.toString} is a temporary VIEW, which does not support 
ALTER ADD COLUMNS.")
+}
+
+val catalogTable = catalog.getTableMetadata(table)
+if (catalogTable.tableType == CatalogTableType.VIEW) {
+  throw new AnalysisException(
+s"${table.toString} is a VIEW, which does not support ALTER ADD 
COLUMNS.")
+}
+
+if (isDatasourceTable(catalogTable)) {
+  catalogTable.provider.get match {
+case provider if provider.toLowerCase == "text" =>
+  // TextFileFormat can not support adding column either because 
text datasource table
+  // is resolved as a single-column table only.
+  throw new AnalysisException(
+s"""${table.toString} is a text format datasource table,
+   |which does not support ALTER ADD COLUMNS.""".stripMargin)
+case provider if provider.toLowerCase == "orc"
+  || provider.startsWith("org.apache.spark.sql.hive.orc") =>
+  // TODO Current native orc reader can not handle the difference 
between
+  // user-specified schema and inferred schema from ORC data file 
yet.
+  throw new AnalysisException(
+s"""${table.toString} is an ORC datasource table,
+   |which does not support ALTER ADD COLUMNS.""".stripMargin)
+case provider
+  if 
(!DataSource.lookupDataSource(provider).newInstance().isInstanceOf[FileFormat]) 
=>
--- End diff --

OK. I will use a whitelist of allowed `FileFormat` implementations. 
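The whitelist idea can be sketched as below. The provider names and method names here are illustrative assumptions, not the final list Spark would ship.

```scala
// Illustrative whitelist: only providers known to tolerate a column
// appended to the schema pass the check; everything else is rejected
// up front with an AnalysisException by the caller.
private val addColumnSupportedProviders: Set[String] =
  Set("parquet", "json", "csv")

def supportsAlterAddColumns(provider: String): Boolean =
  addColumnSupportedProviders.contains(provider.toLowerCase)
```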





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99681470
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
   }
 }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
+  catalog: SessionCatalog,
+  table: TableIdentifier): CatalogTable = {
+if (catalog.isTemporaryTable(table)) {
+  throw new AnalysisException(
+s"${table.toString} is a temporary VIEW, which does not support 
ALTER ADD COLUMNS.")
+}
+
+val catalogTable = catalog.getTableMetadata(table)
+if (catalogTable.tableType == CatalogTableType.VIEW) {
+  throw new AnalysisException(
+s"${table.toString} is a VIEW, which does not support ALTER ADD 
COLUMNS.")
+}
+
+if (isDatasourceTable(catalogTable)) {
+  catalogTable.provider.get match {
+case provider if provider.toLowerCase == "text" =>
+  // TextFileFormat can not support adding column either because 
text datasource table
+  // is resolved as a single-column table only.
+  throw new AnalysisException(
+s"""${table.toString} is a text format datasource table,
+   |which does not support ALTER ADD COLUMNS.""".stripMargin)
+case provider if provider.toLowerCase == "orc"
+  || provider.startsWith("org.apache.spark.sql.hive.orc") =>
--- End diff --

I will double-check this case. If `orc` is the only representation in 
`CatalogTable.provider`, I will simplify the logic here. 





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99681098
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
   }
 }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
+  catalog: SessionCatalog,
+  table: TableIdentifier): CatalogTable = {
+if (catalog.isTemporaryTable(table)) {
+  throw new AnalysisException(
+s"${table.toString} is a temporary VIEW, which does not support 
ALTER ADD COLUMNS.")
+}
+
+val catalogTable = catalog.getTableMetadata(table)
--- End diff --

I see. Will do. Thanks!





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99680917
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -168,6 +168,43 @@ case class AlterTableRenameCommand(
 }
 
 /**
+ * A command that add columns to a table
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   ALTER TABLE table_identifier
+ *   ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
+ * }}}
+*/
+case class AlterTableAddColumnsCommand(
+table: TableIdentifier,
+columns: Seq[StructField]) extends RunnableCommand {
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val catalogTable = DDLUtils.verifyAlterTableAddColumn(catalog, table)
+
+// If an exception is thrown here we can just assume the table is 
uncached;
+// this can happen with Hive tables when the underlying catalog is 
in-memory.
+val wasCached = 
Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false)
--- End diff --

The current way is right. The implementation should not rely on the 
internal behavior of another function.
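The defensive pattern under discussion, isolated as a minimal self-contained sketch (the `isCached` stub below is hypothetical, standing in for `sparkSession.catalog.isCached`, which may throw for Hive tables backed by an in-memory catalog):

```scala
import scala.util.Try

// Hypothetical stand-in for sparkSession.catalog.isCached: throws for
// tables the in-memory catalog cannot resolve.
def isCached(name: String): Boolean =
  if (name.startsWith("hive_")) sys.error("table not found in in-memory catalog")
  else true

// The pattern: a thrown exception is treated as "assume uncached"
// rather than failing the whole ALTER TABLE command.
val wasCached = Try(isCached("hive_orders")).getOrElse(false)
// wasCached is false: the failure was swallowed, not propagated
```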





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99680331
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
   }
 }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
+  catalog: SessionCatalog,
+  table: TableIdentifier): CatalogTable = {
+if (catalog.isTemporaryTable(table)) {
+  throw new AnalysisException(
+s"${table.toString} is a temporary VIEW, which does not support 
ALTER ADD COLUMNS.")
+}
+
+val catalogTable = catalog.getTableMetadata(table)
+if (catalogTable.tableType == CatalogTableType.VIEW) {
+  throw new AnalysisException(
+s"${table.toString} is a VIEW, which does not support ALTER ADD 
COLUMNS.")
+}
+
+if (isDatasourceTable(catalogTable)) {
+  catalogTable.provider.get match {
+case provider if provider.toLowerCase == "text" =>
+  // TextFileFormat can not support adding column either because 
text datasource table
+  // is resolved as a single-column table only.
+  throw new AnalysisException(
+s"""${table.toString} is a text format datasource table,
+   |which does not support ALTER ADD COLUMNS.""".stripMargin)
+case provider if provider.toLowerCase == "orc"
+  || provider.startsWith("org.apache.spark.sql.hive.orc") =>
+  // TODO Current native orc reader can not handle the difference between
+  // user-specified schema and inferred schema from ORC data file yet.
+  throw new AnalysisException(
+s"""${table.toString} is an ORC datasource table,
+   |which does not support ALTER ADD COLUMNS.""".stripMargin)
+case provider
+  if (!DataSource.lookupDataSource(provider).newInstance().isInstanceOf[FileFormat]) =>
--- End diff --

`FileFormat` only covers a few cases. It does not cover the other external 
data sources. How about using a white list here in this function?
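
A minimal sketch of the whitelist idea suggested above. The helper name and the set of providers are hypothetical, purely for illustration; Spark's actual built-in source registry is larger and lives elsewhere.

```scala
// Hypothetical whitelist check: instead of reflectively testing whether a
// provider implements FileFormat (which misses external data sources),
// compare the provider name against an explicit list of sources known to
// support ALTER TABLE ADD COLUMNS. The names below are illustrative only.
val alterAddColumnsWhitelist: Set[String] = Set("parquet", "json", "csv")

def supportsAlterAddColumns(provider: String): Boolean =
  alterAddColumnsWhitelist.contains(provider.toLowerCase)
```

A whitelist trades generality for predictability: an unknown external provider is rejected by default instead of being accepted because it happens to implement a particular interface.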





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99680029
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
   }
 }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
+  catalog: SessionCatalog,
+  table: TableIdentifier): CatalogTable = {
+if (catalog.isTemporaryTable(table)) {
+  throw new AnalysisException(
+s"${table.toString} is a temporary VIEW, which does not support ALTER ADD COLUMNS.")
+}
+
+val catalogTable = catalog.getTableMetadata(table)
+if (catalogTable.tableType == CatalogTableType.VIEW) {
+  throw new AnalysisException(
+s"${table.toString} is a VIEW, which does not support ALTER ADD COLUMNS.")
+}
+
+if (isDatasourceTable(catalogTable)) {
+  catalogTable.provider.get match {
+case provider if provider.toLowerCase == "text" =>
+  // TextFileFormat can not support adding column either because text datasource table
+  // is resolved as a single-column table only.
+  throw new AnalysisException(
+s"""${table.toString} is a text format datasource table,
+   |which does not support ALTER ADD COLUMNS.""".stripMargin)
+case provider if provider.toLowerCase == "orc"
+  || provider.startsWith("org.apache.spark.sql.hive.orc") =>
--- End diff --

When we store the metadata in the catalog, we unify different 
representations to `orc`, right? Can you find any case to break it?





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99679303
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
   }
 }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
+  catalog: SessionCatalog,
+  table: TableIdentifier): CatalogTable = {
+if (catalog.isTemporaryTable(table)) {
+  throw new AnalysisException(
+s"${table.toString} is a temporary VIEW, which does not support 
ALTER ADD COLUMNS.")
+}
+
+val catalogTable = catalog.getTableMetadata(table)
--- End diff --

Call `getTempViewOrPermanentTableMetadata` instead of `getTableMetadata`. 
Then, you do not need the above check for temporary views. In addition, it also 
covers the cases for global views.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99679239
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
   }
 }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
--- End diff --

OK. I will move it to the `AlterTableAddColumnsCommand` class.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99679185
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -763,7 +763,9 @@ object DDLUtils {
   val HIVE_PROVIDER = "hive"
 
   def isHiveTable(table: CatalogTable): Boolean = {
-table.provider.isDefined && table.provider.get.toLowerCase == HIVE_PROVIDER
+// When `CatalogTable` is directly fetched from the catalog,
+// CatalogTable.provider = None means the table is a Hive serde table.
+!table.provider.isDefined || table.provider.get.toLowerCase == HIVE_PROVIDER
--- End diff --

I see. I will find another way. Thanks!





[GitHub] spark pull request #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroC...

2017-02-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/16795#discussion_r99678662
  
--- Diff: sql/core/pom.xml ---
@@ -130,6 +130,12 @@
   test
 
 
+  org.apache.avro
--- End diff --

@srowen .
Maven rejects the newly added test dependency, so I reverted the commit that moved it into the parent pom. To use different versions, it seems we should keep this here.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99678116
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
   }
 }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
--- End diff --

Since this check is only used in `AlterTableAddColumnsCommand`, we do not need to move it here.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99677955
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -763,7 +763,9 @@ object DDLUtils {
   val HIVE_PROVIDER = "hive"
 
   def isHiveTable(table: CatalogTable): Boolean = {
-table.provider.isDefined && table.provider.get.toLowerCase == HIVE_PROVIDER
+// When `CatalogTable` is directly fetched from the catalog,
+// CatalogTable.provider = None means the table is a Hive serde table.
+!table.provider.isDefined || table.provider.get.toLowerCase == HIVE_PROVIDER
--- End diff --

The provider could be empty if the table is a VIEW. Thus, please do not 
modify the utility function here.
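
The objection above can be shown with a minimal model. The classes and the check below are simplified stand-ins, not Spark's actual `CatalogTable` or `DDLUtils` code; they only illustrate why treating `provider == None` as "Hive serde table" is unsafe.

```scala
// Simplified model: a VIEW also has no provider, so relaxing isHiveTable
// to treat None as "Hive serde table" would misclassify views.
case class FakeCatalogTable(tableType: String, provider: Option[String])

// The relaxed check from the diff above, transplanted onto the model.
def looksLikeHiveTable(t: FakeCatalogTable): Boolean =
  !t.provider.isDefined || t.provider.get.toLowerCase == "hive"

val view = FakeCatalogTable("VIEW", None)
// looksLikeHiveTable(view) is true, even though the table is a view.
```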





[GitHub] spark pull request #16821: [SPARK-19472][SQL] Parser should not mistake CASE...

2017-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16821





[GitHub] spark issue #16821: [SPARK-19472][SQL] Parser should not mistake CASE WHEN(....

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16821
  
Thanks! Merging to master/2.1/2.0





[GitHub] spark pull request #16821: [SPARK-19472][SQL] Parser should not mistake CASE...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16821#discussion_r99676254
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala
 ---
@@ -298,6 +298,8 @@ class ExpressionParserSuite extends PlanTest {
   CaseKeyWhen("a" ===  "a", Seq(true, 1)))
 assertEqual("case when a = 1 then b when a = 2 then c else d end",
   CaseWhen(Seq(('a === 1, 'b.expr), ('a === 2, 'c.expr)), 'd))
+assertEqual("case when (1) + case when a > b then c else d end then f else g end",
+  CaseWhen(Seq((Literal(1) + CaseWhen(Seq(('a > 'b, 'c.expr)), 'd.expr), 'f.expr)), 'g))
--- End diff --

To other reviewers: before the fix, the query parsed correctly as long as users did not add round brackets ( ). For example, `case when 1 + case when a > b then c else d end then f else g end` worked fine.





[GitHub] spark issue #16821: [SPARK-19472][SQL] Parser should not mistake CASE WHEN(....

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16821
  
LGTM





[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16722
  
**[Test build #72471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72471/testReport)** for PR 16722 at commit [`48b1258`](https://github.com/apache/spark/commit/48b12586d2c24fd9852f9130376acd72d6e64467).





[GitHub] spark issue #14637: [SPARK-16967] move mesos to module

2017-02-06 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14637
  
The release notes are for end users, and this doesn't impact end users. 
Developers are expected, more or less, to follow commits and dev@ to keep up 
with changes like this.





[GitHub] spark issue #14637: [SPARK-16967] move mesos to module

2017-02-06 Thread drcrallen
Github user drcrallen commented on the issue:

https://github.com/apache/spark/pull/14637
  
FYI, we have a build process that packages Spark core. Now that Mesos is in its own artifact, this broke our build and deploy process, and it's not called out in the release notes.





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16795
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16795
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72464/
Test FAILed.





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16795
  
**[Test build #72464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72464/testReport)** for PR 16795 at commit [`499f6fd`](https://github.com/apache/spark/commit/499f6fdc568414d66156d72b5a411833fd116bea).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99670310
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala
 ---
@@ -106,14 +122,18 @@ class DecisionTreeClassifier @Since("1.4.0") (
".train() called with non-matching numClasses and thresholds.length." +
s" numClasses=$numClasses, but thresholds has length ${$(thresholds).length}")
 }
-
-val oldDataset: RDD[LabeledPoint] = extractLabeledPoints(dataset, numClasses)
--- End diff --

For regressors, `extractLabeledPoints` doesn't do any extra checking. The 
larger issue is that we are manually "extracting instances" but we have 
convenience methods for labeled points. Since correcting it now, in this PR, 
likely means implementing the framework to correct it everywhere - which is a 
larger and orthogonal change, I think we could just add the check manually to 
the classifier, then create a JIRA that addresses consolidating these, probably 
by adding `extractInstances` methods analogous their labeled point 
counterparts. This PR is large enough as is, without having to think about 
adding that method, then implementing it in all the other algos that manually 
extract instances, IMO.





[GitHub] spark issue #16625: [SPARK-17874][core] Add SSL port configuration.

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16625
  
**[Test build #72470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72470/testReport)** for PR 16625 at commit [`a3f551b`](https://github.com/apache/spark/commit/a3f551b7e5d58b0f2933a9a48e7e928171e152b2).





[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99668948
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/LabeledPoint.scala ---
@@ -35,4 +35,11 @@ case class LabeledPoint(@Since("2.0.0") label: Double, 
@Since("2.0.0") features:
   override def toString: String = {
 s"($label,$features)"
   }
+
+  private[spark] def toInstance: Instance = toInstance(1.0)
--- End diff --

Actually, I'd prefer to remove the no arg function and be explicit 
everywhere. That way there is no ambiguity or unintended effects if someone 
changes the default value. Sound ok?
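
A minimal sketch of the change being proposed: drop the zero-argument conversion and force callers to pass the weight explicitly. The classes below are simplified stand-ins for Spark's `LabeledPoint` and `Instance`, for illustration only.

```scala
// Simplified model of an instance that carries a sample weight.
case class Instance(label: Double, weight: Double, features: Vector[Double])

case class Point(label: Double, features: Vector[Double]) {
  // Explicit-weight conversion only: there is no default argument, so a
  // later change to any "default weight" cannot silently alter call sites.
  def toInstance(weight: Double): Instance = Instance(label, weight, features)
}

// Every caller must state the weight it intends, even the common 1.0 case.
val inst = Point(1.0, Vector(0.5, 2.0)).toInstance(1.0)
```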





[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99668686
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -590,8 +599,8 @@ private[spark] object RandomForest extends Logging {
 if (!isLeaf) {
   node.split = Some(split)
   val childIsLeaf = (LearningNode.indexToLevel(nodeIndex) + 1) == 
metadata.maxDepth
-  val leftChildIsLeaf = childIsLeaf || (stats.leftImpurity == 0.0)
-  val rightChildIsLeaf = childIsLeaf || (stats.rightImpurity == 0.0)
+  val leftChildIsLeaf = childIsLeaf || (math.abs(stats.leftImpurity) < 1e-16)
+  val rightChildIsLeaf = childIsLeaf || (math.abs(stats.rightImpurity) < 1e-16)
--- End diff --

I'd prefer not to refactor it in this PR. Updated to use `EPSILON` from 
ml.impl.Utils
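
The change in the diff above boils down to a tolerance-based purity check. A sketch, with `EPSILON` as a stand-in for the constant in `ml.impl.Utils`: with sample weights, accumulated impurity can end up a tiny nonzero float rather than exactly 0.0, so an exact `== 0.0` comparison can miss a pure node.

```scala
// Stand-in for the EPSILON constant in ml.impl.Utils (value assumed here).
val EPSILON = 1e-16

// A node counts as pure when its impurity is within EPSILON of zero,
// tolerating floating-point residue from weighted accumulation.
def isPure(impurity: Double): Boolean = math.abs(impurity) < EPSILON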





[GitHub] spark pull request #16625: [SPARK-17874][core] Add SSL port configuration.

2017-02-06 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16625#discussion_r99668339
  
--- Diff: docs/security.md ---
@@ -49,10 +49,6 @@ component-specific configuration namespaces used to 
override the default setting
 Component
   
   
-spark.ssl.fs
--- End diff --

Hmmm... that code is actually being used, not to set up the file server but to configure HTTP clients that download from SSL-enabled servers. Let me see about making that clear in the configuration docs.





[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99666983
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreeTests.scala ---
@@ -124,8 +129,8 @@ private[ml] object TreeTests extends SparkFunSuite {
*   make mistakes such as creating loops of Nodes.
*/
   private def checkEqual(a: Node, b: Node): Unit = {
-assert(a.prediction === b.prediction)
-assert(a.impurity === b.impurity)
+assert(a.prediction ~== b.prediction absTol 1e-8)
+assert(a.impurity ~== b.impurity absTol 1e-8)
--- End diff --

All over the test suites we use tolerances as literal doubles instead of 
making a variable for each and every one. I think it would be over-engineering 
to do this.





[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16043
  
**[Test build #72469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72469/testReport)** for PR 16043 at commit [`32805cf`](https://github.com/apache/spark/commit/32805cfb2176ab74c21ca93ab53f92852ad7fb24).





[GitHub] spark pull request #16625: [SPARK-17874][core] Add SSL port configuration.

2017-02-06 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16625#discussion_r99667702
  
--- Diff: docs/configuration.md ---
@@ -1797,6 +1797,20 @@ Apart from these, the following properties are also 
available, and may be useful
 
 
 
+spark.ssl.[namespace].port
--- End diff --

This was intentional. "spark.ssl.port" doesn't make that much sense if you 
think about it; you want things like the master UI and history server UI to 
have different, well known ports, so having this shared config key here doesn't 
make a lot of sense. For the other configs, such as algorithms and keystore 
locations, sharing configuration is ok.
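
The lookup behavior described above can be sketched as follows. This is not Spark's actual `SSLOptions` implementation; the function names and config map are illustrative. Most `spark.ssl.*` settings fall back from the namespace to the shared key, while the port is namespace-only so each service keeps its own well-known port.

```scala
// Namespaced settings fall back to the shared spark.ssl.* key...
def sslSetting(conf: Map[String, String], namespace: String, key: String): Option[String] =
  conf.get(s"spark.ssl.$namespace.$key").orElse(conf.get(s"spark.ssl.$key"))

// ...but the port deliberately has no shared fallback.
def sslPort(conf: Map[String, String], namespace: String): Option[Int] =
  conf.get(s"spark.ssl.$namespace.port").map(_.toInt)

val conf = Map(
  "spark.ssl.keyStore" -> "/path/shared.jks", // shared across namespaces
  "spark.ssl.ui.port"  -> "4443")             // per-namespace only
```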





[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99667381
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala
 ---
@@ -351,6 +370,36 @@ class DecisionTreeClassifierSuite
 dt.fit(df)
   }
 
+  test("training with sample weights") {
+val df = linearMulticlassDataset
+val numClasses = 3
+val predEquals = (x: Double, y: Double) => x == y
+// (impurity, maxDepth)
+val testParams = Seq(
+  ("gini", 10),
+  ("entropy", 10),
+  ("gini", 5)
+)
+for ((impurity, maxDepth) <- testParams) {
+  val estimator = new DecisionTreeClassifier()
+.setMaxDepth(maxDepth)
+.setSeed(seed)
+.setMinWeightFractionPerNode(0.049)
--- End diff --

We use param validators for this. Since those are already tested, I don't 
see a need.





[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r9967
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -281,10 +283,26 @@ object MLTestingUtils extends SparkFunSuite {
   estimator: E with HasWeightCol,
   modelEquals: (M, M) => Unit): Unit = {
 estimator.set(estimator.weightCol, "weight")
-val models = Seq(0.001, 1.0, 1000.0).map { w =>
+val models = Seq(0.01, 1.0, 1000.0).map { w =>
--- End diff --

Yes, the decision tree tests have trouble with numerical precision when the 
weights are really small. 





[GitHub] spark issue #16797: [SPARK-19455][SQL] Add option for case-insensitive Parqu...

2017-02-06 Thread budde
Github user budde commented on the issue:

https://github.com/apache/spark/pull/16797
  
I'll double check, but I don't think 
```spark.sql.hive.manageFilesourcePartitions=false``` would solve this issue 
since we're still deriving the file relation's dataSchema parameter from the 
schema of MetastoreRelation. The call to ```fileFormat.inferSchema()``` has 
been removed entirely.

If Spark SQL is set on using a table property to store the case-sensitive 
schema then I think having a way to backfill this property for existing < 2.1 
tables as well as tables not created or managed by Spark will be a necessity. 
If the cleanest way to deal with this case sensitivity problem is to bring back 
schema inference then I think a good option would be to introduce a 
configuration param to indicate whether or not an inferred schema should be 
written back to the table as a property.

We could also introduce another config param that allows a user to bypass 
schema inference even if a case-sensitive schema can't be read from the table 
properties. This could be helpful for users who would like to query external 
Hive tables that aren't managed by Spark and that they know aren't backed by 
files containing case-sensitive field names.

This would basically allow us to support the following use cases:

1) The MetastoreRelation is able to read a case-sensitive schema from the 
table properties. No inference is necessary.
2) The MetastoreRelation can't read a case-sensitive schema from the table 
properties. A case-sensitive schema is inferred and, if configured, written 
back as a table property.
3) The MetastoreRelation can't read a case-sensitive schema from the table 
properties. The user knows the underlying data files don't contain 
case-sensitive field names and has explicitly set a config param to skip the 
inference step.
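
The three use cases above can be sketched as a small decision function. This is only an illustration of the proposed flow, not actual Spark code; the type names, config fields, and function name below are all hypothetical.

```scala
// Hypothetical sketch of the three schema-resolution cases described above.
// None of these names are real Spark APIs; they only illustrate the flow.
sealed trait SchemaSource
case object FromTableProperty extends SchemaSource // case 1: property present
case object Inferred extends SchemaSource          // case 2: infer (optionally write back)
case object MetastoreOnly extends SchemaSource     // case 3: user skips inference

// Hypothetical stand-ins for the two config params proposed above.
case class SchemaConf(writeBackInferredSchema: Boolean, skipInference: Boolean)

def resolveSchemaSource(hasCaseSensitiveProp: Boolean, conf: SchemaConf): SchemaSource = {
  if (hasCaseSensitiveProp) {
    // Case 1: a case-sensitive schema was read from table properties.
    FromTableProperty
  } else if (conf.skipInference) {
    // Case 3: user knows the data files are not case-sensitive.
    MetastoreOnly
  } else {
    // Case 2: infer, and optionally persist as a table property.
    Inferred
  }
}
```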





[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

2017-02-06 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/16760
  
@gatorsmile Yeah Sean. Actually, most likely I will need to work out a 
different schema than what I have currently for the generator tests, so I was 
planning to add the negative scenarios and generator tests in another PR.





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16650
  
**[Test build #72468 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72468/testReport)**
 for PR 16650 at commit 
[`cb24167`](https://github.com/apache/spark/commit/cb241672692db3e604c18bcd56f441f6863a09e4).





[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99665910
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Variance.scala ---
@@ -70,17 +70,24 @@ object Variance extends Impurity {
  * Note: Instances of this class do not hold the data; they operate on 
views of the data.
  */
 private[spark] class VarianceAggregator()
-  extends ImpurityAggregator(statsSize = 3) with Serializable {
+  extends ImpurityAggregator(statsSize = 4) with Serializable {
 
   /**
* Update stats for one (node, feature, bin) with the given label.
* @param allStats  Flat stats array, with stats for this (node, 
feature, bin) contiguous.
* @param offsetStart index of stats for this (node, feature, bin).
*/
-  def update(allStats: Array[Double], offset: Int, label: Double, 
instanceWeight: Double): Unit = {
+  def update(
+  allStats: Array[Double],
+  offset: Int,
+  label: Double,
+  numSamples: Int,
+  sampleWeight: Double): Unit = {
+val instanceWeight = numSamples * sampleWeight
 allStats(offset) += instanceWeight
 allStats(offset + 1) += instanceWeight * label
 allStats(offset + 2) += instanceWeight * label * label
+allStats(offset + 3) += numSamples
--- End diff --

Done.
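
The four-slot layout in the diff above can be illustrated in isolation. This is a minimal sketch mirroring the quoted `update` body, with an assumed `variance` helper added to show how the accumulated slots are consumed; it is not the full Spark `VarianceAggregator`.

```scala
// Flat stats layout from the diff above, for one (node, feature, bin):
//   allStats(offset)     = sum of instance weights
//   allStats(offset + 1) = weighted sum of labels
//   allStats(offset + 2) = weighted sum of squared labels
//   allStats(offset + 3) = raw (unweighted) sample count
def update(allStats: Array[Double], offset: Int, label: Double,
           numSamples: Int, sampleWeight: Double): Unit = {
  val instanceWeight = numSamples * sampleWeight
  allStats(offset) += instanceWeight
  allStats(offset + 1) += instanceWeight * label
  allStats(offset + 2) += instanceWeight * label * label
  allStats(offset + 3) += numSamples
}

// Assumed helper: weighted variance recovered from the accumulated slots
// via E[x^2] - E[x]^2.
def variance(allStats: Array[Double], offset: Int): Double = {
  val count = allStats(offset)
  val mean = allStats(offset + 1) / count
  allStats(offset + 2) / count - mean * mean
}
```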





[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99665188
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala ---
@@ -79,7 +79,12 @@ private[spark] abstract class ImpurityAggregator(val 
statsSize: Int) extends Ser
* @param allStats  Flat stats array, with stats for this (node, 
feature, bin) contiguous.
* @param offsetStart index of stats for this (node, feature, bin).
*/
-  def update(allStats: Array[Double], offset: Int, label: Double, 
instanceWeight: Double): Unit
+  def update(
+  allStats: Array[Double],
+  offset: Int,
+  label: Double,
+  numSamples: Int,
+  sampleWeight: Double): Unit
--- End diff --

I don't think it's necessary. It's a private class, and the only params 
currently in the doc are the ambiguous ones. These new ones should be 
self-explanatory.





[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...

2017-02-06 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/16650#discussion_r99664910
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/StandaloneDynamicAllocationSuite.scala
 ---
@@ -467,6 +469,52 @@ class StandaloneDynamicAllocationSuite
 }
   }
 
+  test("kill all executors on localhost") {
+sc = new SparkContext(appConf)
+val appId = sc.applicationId
+eventually(timeout(10.seconds), interval(10.millis)) {
+  val apps = getApplications()
+  assert(apps.size === 1)
+  assert(apps.head.id === appId)
+  assert(apps.head.executors.size === 2)
+  assert(apps.head.getExecutorLimit === Int.MaxValue)
+}
+val beforeList = getApplications().head.executors.keys.toSet
+assert(killExecutorsOnHost(sc, "localhost").equals(true))
+
+syncExecutors(sc)
+val afterList = getApplications().head.executors.keys.toSet
+
+eventually(timeout(10.seconds), interval(100.millis)) {
+  assert(beforeList.intersect(afterList).size == 0)
+}
+  }
+
+  test("executor registration on a blacklisted host must fail") {
+sc = new SparkContext(appConf.set(config.BLACKLIST_ENABLED.key, 
"true"))
+val endpointRef = mock(classOf[RpcEndpointRef])
+val mockAddress = mock(classOf[RpcAddress])
+when(endpointRef.address).thenReturn(mockAddress)
+val message = RegisterExecutor("one", endpointRef, "blacklisted-host", 
10, Map.empty)
+
+// Get "localhost" on a blacklist.
+val taskScheduler = mock(classOf[TaskSchedulerImpl])
+when(taskScheduler.nodeBlacklist()).thenReturn(Set("blacklisted-host"))
+when(taskScheduler.sc).thenReturn(sc)
+sc.taskScheduler = taskScheduler
+
+// Create a fresh scheduler backend to blacklist "localhost".
+sc.schedulerBackend.stop()
+val backend =
+ new StandaloneSchedulerBackend(taskScheduler, sc, 
Array(masterRpcEnv.address.toSparkURL))
--- End diff --

Would be really nice to have automated style checks...





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16795
  
**[Test build #72467 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72467/testReport)**
 for PR 16795 at commit 
[`42ff642`](https://github.com/apache/spark/commit/42ff6426ec090ef6a1242d8556f39cbdef526d8b).





[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99664877
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Entropy.scala ---
@@ -83,23 +83,29 @@ object Entropy extends Impurity {
  * @param numClasses  Number of classes for label.
  */
 private[spark] class EntropyAggregator(numClasses: Int)
-  extends ImpurityAggregator(numClasses) with Serializable {
+  extends ImpurityAggregator(numClasses + 1) with Serializable {
--- End diff --

Yes





[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...

2017-02-06 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/16722#discussion_r99664273
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala 
---
@@ -42,6 +42,7 @@ import org.apache.spark.rdd.RDD
 private[spark] class DecisionTreeMetadata(
 val numFeatures: Int,
 val numExamples: Long,
+val weightedNumExamples: Double,
--- End diff --

Yeah, not all of the params are added to the doc; to be honest, I'm not sure 
how it was decided which ones were and were not.





[GitHub] spark issue #16810: [SPARK-19464][CORE][YARN][test-hadoop2.6] Remove support...

2017-02-06 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/16810
  
@srowen, I'll help with this. It turns out that we don't need to make any 
Jenkins configuration changes for the pull request builder. For the master 
branch builders, I've gone ahead and disabled the jobs and will complete their 
final removal in a few days after this patch merges.





[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...

2017-02-06 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/16650#discussion_r99664044
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/StandaloneDynamicAllocationSuite.scala
 ---
@@ -467,6 +469,52 @@ class StandaloneDynamicAllocationSuite
 }
   }
 
+  test("kill all executors on localhost") {
+sc = new SparkContext(appConf)
+val appId = sc.applicationId
+eventually(timeout(10.seconds), interval(10.millis)) {
+  val apps = getApplications()
+  assert(apps.size === 1)
+  assert(apps.head.id === appId)
+  assert(apps.head.executors.size === 2)
+  assert(apps.head.getExecutorLimit === Int.MaxValue)
+}
+val beforeList = getApplications().head.executors.keys.toSet
+assert(killExecutorsOnHost(sc, "localhost").equals(true))
+
+syncExecutors(sc)
+val afterList = getApplications().head.executors.keys.toSet
+
+eventually(timeout(10.seconds), interval(100.millis)) {
+  assert(beforeList.intersect(afterList).size == 0)
+}
+  }
+
+  test("executor registration on a blacklisted host must fail") {
+sc = new SparkContext(appConf.set(config.BLACKLIST_ENABLED.key, 
"true"))
+val endpointRef = mock(classOf[RpcEndpointRef])
+val mockAddress = mock(classOf[RpcAddress])
+when(endpointRef.address).thenReturn(mockAddress)
+val message = RegisterExecutor("one", endpointRef, "blacklisted-host", 
10, Map.empty)
+
+// Get "localhost" on a blacklist.
+val taskScheduler = mock(classOf[TaskSchedulerImpl])
+when(taskScheduler.nodeBlacklist()).thenReturn(Set("blacklisted-host"))
+when(taskScheduler.sc).thenReturn(sc)
+sc.taskScheduler = taskScheduler
+
+// Create a fresh scheduler backend to blacklist "localhost".
+sc.schedulerBackend.stop()
+val backend =
+ new StandaloneSchedulerBackend(taskScheduler, sc, 
Array(masterRpcEnv.address.toSparkURL))
--- End diff --

super nit: looks like this is only indented one space, not two





[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16650
  
**[Test build #72466 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72466/testReport)**
 for PR 16650 at commit 
[`37248a2`](https://github.com/apache/spark/commit/37248a202c15807fffe9e25e5b630a27dda38204).





[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...

2017-02-06 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/16650#discussion_r99661763
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala ---
@@ -456,4 +461,69 @@ class BlacklistTrackerSuite extends SparkFunSuite with 
BeforeAndAfterEach with M
   conf.remove(config)
 }
   }
+
+  test("blacklisting kills executors, configured by 
BLACKLIST_KILL_ENABLED") {
+val allocationClientMock = mock[ExecutorAllocationClient]
+when(allocationClientMock.killExecutors(any(), any(), 
any())).thenReturn(Seq("called"))
+when(allocationClientMock.killExecutorsOnHost("hostA")).thenAnswer(new 
Answer[Boolean] {
+  override def answer(invocation: InvocationOnMock): Boolean = {
+if (blacklist.nodeBlacklist.contains("hostA") == false) {
+  throw new IllegalStateException("hostA should be on the 
blacklist")
--- End diff --

Sure. I've used your text with very minor modification.





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-06 Thread budde
Github user budde commented on the issue:

https://github.com/apache/spark/pull/16744
  
Amending this PR to upgrade the KCL/AWS SDK dependencies to more-current 
versions (1.7.3 and 1.11.76, respectively). The 
```RegionUtils.getRegionByEndpoint()``` API was removed from the SDK, so I've 
had to replace it with a simple string split method for the examples and test 
suites that were utilizing it.
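
The kind of string-split fallback described above could look like the following. This is only a sketch under the assumption that endpoints follow the standard `kinesis.<region>.amazonaws.com` form; the actual helper in the PR may differ, and `regionFromEndpoint` is a hypothetical name.

```scala
// Hypothetical replacement for the removed RegionUtils.getRegionByEndpoint.
// Assumes endpoints of the form "kinesis.<region>.amazonaws.com",
// optionally prefixed with a scheme.
def regionFromEndpoint(endpoint: String): String = {
  val host = endpoint.stripPrefix("https://").stripPrefix("http://")
  host.split('.') match {
    // e.g. Array("kinesis", "us-west-2", "amazonaws", "com")
    case Array(_, region, _*) => region
    case _ =>
      throw new IllegalArgumentException(s"Cannot parse region from $endpoint")
  }
}
```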





[GitHub] spark issue #16744: [SPARK-19405][STREAMING] Support for cross-account Kines...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16744
  
**[Test build #72465 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72465/testReport)**
 for PR 16744 at commit 
[`eb75482`](https://github.com/apache/spark/commit/eb754825d1934d7eee4175b8adaefe51f46050dd).





[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...

2017-02-06 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/16650#discussion_r99661191
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/StandaloneDynamicAllocationSuite.scala
 ---
@@ -467,6 +469,51 @@ class StandaloneDynamicAllocationSuite
 }
   }
 
+  test("kill all executors on localhost") {
+sc = new SparkContext(appConf)
+val appId = sc.applicationId
+eventually(timeout(10.seconds), interval(10.millis)) {
+  val apps = getApplications()
+  assert(apps.size === 1)
+  assert(apps.head.id === appId)
+  assert(apps.head.executors.size === 2)
+  assert(apps.head.getExecutorLimit === Int.MaxValue)
+}
+val beforeList = getApplications().head.executors.keys.toSet
+// kill all executors without replacement
+assert(killExecutorsOnHost(sc, "localhost").equals(true))
+
+syncExecutors(sc)
+val afterList = getApplications().head.executors.keys.toSet
+
+eventually(timeout(10.seconds), interval(100.millis)) {
+  assert(beforeList.intersect(afterList).size == 0)
+}
+  }
+
+  test("executor registration on a blacklisted host must fail") {
+sc = new SparkContext(appConf.set(config.BLACKLIST_ENABLED.key, 
"true"))
+val endpointRef = mock(classOf[RpcEndpointRef])
+val mockAddress = mock(classOf[RpcAddress])
+when(endpointRef.address).thenReturn(mockAddress)
+val message = RegisterExecutor("one", endpointRef, "localhost", 10, 
Map.empty)
+
+// Get "localhost" on a blacklist.
+val taskScheduler = mock(classOf[TaskSchedulerImpl])
+when(taskScheduler.nodeBlacklist()).thenReturn(Set("localhost"))
--- End diff --

Let's call it "blacklisted-host".





[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...

2017-02-06 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/16650#discussion_r99660908
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/StandaloneDynamicAllocationSuite.scala
 ---
@@ -467,6 +469,51 @@ class StandaloneDynamicAllocationSuite
 }
   }
 
+  test("kill all executors on localhost") {
+sc = new SparkContext(appConf)
+val appId = sc.applicationId
+eventually(timeout(10.seconds), interval(10.millis)) {
+  val apps = getApplications()
+  assert(apps.size === 1)
+  assert(apps.head.id === appId)
+  assert(apps.head.executors.size === 2)
+  assert(apps.head.getExecutorLimit === Int.MaxValue)
+}
+val beforeList = getApplications().head.executors.keys.toSet
+// kill all executors without replacement
--- End diff --

Best to just delete the comment, then. Done.





[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...

2017-02-06 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/16650#discussion_r99660755
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
 ---
@@ -600,6 +603,16 @@ class CoarseGrainedSchedulerBackend(scheduler: 
TaskSchedulerImpl, val rpcEnv: Rp
*/
   protected def doKillExecutors(executorIds: Seq[String]): Future[Boolean] 
=
 Future.successful(false)
+
+  /**
+   * Request that the cluster manager kill all executors on a given host.
+   * @return whether the kill request is acknowledged.
+   */
+  final override def killExecutorsOnHost(host: String): Boolean = {
+logInfo(s"Requesting to kill any and all executors on host ${host}")
+driverEndpoint.send(KillExecutorsOnHost(host))
--- End diff --

Sure. I've paraphrased this a bit but it's a helpful comment to add.





[GitHub] spark pull request #16650: [SPARK-16554][CORE] Automatically Kill Executors ...

2017-02-06 Thread jsoltren
Github user jsoltren commented on a diff in the pull request:

https://github.com/apache/spark/pull/16650#discussion_r99660675
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/StandaloneDynamicAllocationSuite.scala
 ---
@@ -489,6 +491,29 @@ class StandaloneDynamicAllocationSuite
 }
   }
 
+  test("executor registration on a blacklisted host must fail") {
+sc = new SparkContext(appConf.set(config.BLACKLIST_ENABLED.key, 
"true"))
+val endpointRef = mock(classOf[RpcEndpointRef])
+val mockAddress = mock(classOf[RpcAddress])
+when(endpointRef.address).thenReturn(mockAddress)
+val message = RegisterExecutor("one", endpointRef, "localhost", 10, 
Map.empty)
+
+// Get "localhost" on a blacklist.
+val taskScheduler = mock(classOf[TaskSchedulerImpl])
+when(taskScheduler.nodeBlacklist()).thenReturn(Set("localhost"))
+when(taskScheduler.sc).thenReturn(sc)
+sc.taskScheduler = taskScheduler
+
+// Create a fresh scheduler backend to blacklist "localhost".
+sc.schedulerBackend.stop()
+val backend =
+ new StandaloneSchedulerBackend(taskScheduler, sc, 
Array(masterRpcEnv.address.toSparkURL))
+backend.start()
+
+backend.driverEndpoint.ask[Boolean](message)
+verify(endpointRef).send(RegisterExecutorFailed(any()))
--- End diff --

Thanks for the tip. Fixed.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99660704
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
   }
 }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
--- End diff --

Oh, this is a DDL util function. 





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99660453
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
   }
 }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
--- End diff --

Yes, you are right. I will change it to private.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r99659988
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -814,4 +816,50 @@ object DDLUtils {
   }
 }
   }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  def verifyAlterTableAddColumn(
--- End diff --

This function should be a private function of `AlterTableAddColumnsCommand`, 
right?





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16795
  
**[Test build #72464 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72464/testReport)**
 for PR 16795 at commit 
[`499f6fd`](https://github.com/apache/spark/commit/499f6fdc568414d66156d72b5a411833fd116bea).





[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

2017-02-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16760
  
@dilipbiswal Are you planning to submit another PR for `Generators` or do 
it in this PR?





[GitHub] spark issue #16795: [SPARK-19409][BUILD][test-maven] Fix ParquetAvroCompatib...

2017-02-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/16795
  
Retest this please





[GitHub] spark issue #16823: [SPARK] Config methods simplification at SparkSession#Bu...

2017-02-06 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16823
  
Agree, though we are talking about duplicating 1 line of code in 3 nearby 
places. It's not meaningfully duplicating anything.





[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16824
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72462/
Test PASSed.





[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16824
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #16738: [SPARK-19398] Change one misleading log in TaskSe...

2017-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16738





[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16824
  
**[Test build #72462 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72462/testReport)**
 for PR 16824 at commit 
[`b3acaad`](https://github.com/apache/spark/commit/b3acaadfed5833c108c03aae7865b6ed2782169a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16795: [SPARK-19409][BUILD][TEST-MAVEN] Fix ParquetAvroCompatib...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16795
  
**[Test build #72463 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72463/testReport)**
 for PR 16795 at commit 
[`499f6fd`](https://github.com/apache/spark/commit/499f6fdc568414d66156d72b5a411833fd116bea).





[GitHub] spark issue #16738: [SPARK-19398] Change one misleading log in TaskSetManage...

2017-02-06 Thread kayousterhout
Github user kayousterhout commented on the issue:

https://github.com/apache/spark/pull/16738
  
Merged this to master.  Thanks for the fix @jinxing64 -- these fixes to 
improve readability / usability of the code are super useful!





[GitHub] spark issue #16823: [SPARK] Config methods simplification at SparkSession#Bu...

2017-02-06 Thread pfcoperez
Github user pfcoperez commented on the issue:

https://github.com/apache/spark/pull/16823
  
@andrewor14 @srowen In any case, I just wanted to add that copying code is basically the worst strategy.

If you wanted to constrain the types to those three and not just any
**AnyVal** subclass, I would recommend something like:

```scala
def config(key: String, value: Double): Builder = config(key, value.toString)
def config(key: String, value: Boolean): Builder = config(key, value.toString)
def config(key: String, value: Long): Builder = config(key, value.toString)
```

Exactly the same interface, no copy-pasted code.
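To illustrate the pattern, here is a minimal, self-contained sketch; note that the `Builder` below is a hypothetical stand-in for this example, not Spark's actual `SparkSession.Builder`. Each typed overload delegates to the single String-based `config`, so the storage logic lives in one place:

```scala
// Simplified stand-in for a fluent config builder (hypothetical, for illustration).
class Builder {
  private val options = scala.collection.mutable.Map[String, String]()

  // The one "real" implementation: everything is stored as a String.
  def config(key: String, value: String): Builder = {
    options(key) = value
    this
  }

  // Typed overloads all funnel into the String version above,
  // so no logic is duplicated.
  def config(key: String, value: Double): Builder = config(key, value.toString)
  def config(key: String, value: Boolean): Builder = config(key, value.toString)
  def config(key: String, value: Long): Builder = config(key, value.toString)

  def get(key: String): Option[String] = options.get(key)
}

// Callers keep the same fluent interface regardless of the value type.
val b = new Builder()
  .config("spark.sql.shuffle.partitions", 200L)
  .config("spark.ui.enabled", false)
```

Calling code stays identical whichever overload it hits; only the conversion to String differs.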





[GitHub] spark issue #16795: [SPARK-19409][BUILD] Fix ParquetAvroCompatibilitySuite f...

2017-02-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/16795
  
Oh, thank you, @liancheng !





[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16824
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16824
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72461/
Test PASSed.





[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16824
  
**[Test build #72461 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72461/testReport)**
 for PR 16824 at commit 
[`00c8af3`](https://github.com/apache/spark/commit/00c8af35b704e989ad8536490f310a35c2e721fb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16795: [SPARK-19409][BUILD] Fix ParquetAvroCompatibilitySuite f...

2017-02-06 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/16795
  
@dongjoon-hyun, you may add `[TEST-MAVEN]` in the PR title to ask Jenkins 
to test this PR using Maven.





[GitHub] spark issue #16791: [SPARK-19409][SPARK-17213] Cleanup Parquet workarounds/h...

2017-02-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16791
  
Ah, thank you for confirming and the information!





[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16824
  
**[Test build #72462 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72462/testReport)**
 for PR 16824 at commit 
[`b3acaad`](https://github.com/apache/spark/commit/b3acaadfed5833c108c03aae7865b6ed2782169a).





[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...

2017-02-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16824
  
**[Test build #72461 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72461/testReport)**
 for PR 16824 at commit 
[`00c8af3`](https://github.com/apache/spark/commit/00c8af35b704e989ad8536490f310a35c2e721fb).





[GitHub] spark issue #16791: [SPARK-19409][SPARK-17213] Cleanup Parquet workarounds/h...

2017-02-06 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/16791
  
@HyukjinKwon Sorry that I didn't see your comment before this PR got 
merged. I believe PARQUET-686 had already been fixed by apache/parquet-mr#367 
but wasn't marked as resolved in JIRA. Thanks for sending out #16817 for 
re-enabling the tests!





[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...

2017-02-06 Thread Yunni
Github user Yunni commented on the issue:

https://github.com/apache/spark/pull/16715
  
@yanboliang, just a friendly reminder please don't forget to review the PR 
when you have time. Thanks!





[GitHub] spark issue #16803: [SPARK-19458][BUILD][SQL]load hive jars from local repo ...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16803
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16803: [SPARK-19458][BUILD][SQL]load hive jars from local repo ...

2017-02-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16803
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72457/
Test PASSed.




