[GitHub] spark pull request: [SPARK-14073][Streaming][test-maven]Move flume...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11895#issuecomment-200687758
  
**[Test build #54008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54008/consoleFull)** for PR 11895 at commit [`f5d5976`](https://github.com/apache/spark/commit/f5d597681f6af472ffd5bba75674644bfc6cb4ac).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-23 Thread tejasapatil
Github user tejasapatil commented on the pull request:

https://github.com/apache/spark/pull/11723#issuecomment-200687233
  
@rxin : I would really like to have this PR in trunk. As things stand, anyone using their own scheduler has to maintain a patch on top of each open-source release to wire that glue into Spark. Regarding API breakage across Spark releases: I agree that breaking APIs is bad, but it would at least be better than the current model of dealing with this, which is doing a merge for *every* release.





[GitHub] spark pull request: [SPARK-14073][Streaming][test-maven]Move flume...

2016-03-23 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11895#discussion_r57277319
  
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java ---
@@ -303,27 +305,42 @@ public void pivot() {
 Assert.assertEquals(3.0, actual.get(1).getDouble(2), 0.01);
   }
 
+  private String getResource(String resource) {
+    try {
+      // The following "getResource" has different behaviors in SBT and Maven.
+      // When running in Jenkins, the file path may contain "@" when there are multiple
+      // SparkPullRequestBuilders running in the same worker
+      // (e.g., /home/jenkins/workspace/SparkPullRequestBuilder@2)
+      // When running in SBT, "@" in the file path will be returned as "@"; however,
+      // when running in Maven, "@" will be encoded as "%40".
--- End diff --

Here is the debug output from my runs in Jenkins:

```
Running test.org.apache.spark.sql.JavaDataFrameSuite

raw:file:/home/jenkins/workspace/SparkPullRequestBuilder%402/sql/core/target/scala-2.11/test-classes/text-suite.txt
Find: /home/jenkins/workspace/SparkPullRequestBuilder@2/sql/core/target/scala-2.11/test-classes/text-suite.txt

raw:file:/home/jenkins/workspace/SparkPullRequestBuilder%402/sql/core/target/scala-2.11/test-classes/text-suite.txt
Find: /home/jenkins/workspace/SparkPullRequestBuilder@2/sql/core/target/scala-2.11/test-classes/text-suite.txt

raw:file:/home/jenkins/workspace/SparkPullRequestBuilder%402/sql/core/target/scala-2.11/test-classes/text-suite.txt
Find: /home/jenkins/workspace/SparkPullRequestBuilder@2/sql/core/target/scala-2.11/test-classes/text-suite.txt

raw:file:/home/jenkins/workspace/SparkPullRequestBuilder%402/sql/core/target/scala-2.11/test-classes/text-suite.txt
Find: /home/jenkins/workspace/SparkPullRequestBuilder@2/sql/core/target/scala-2.11/test-classes/text-suite.txt
Tests run: 17, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 7.628 sec <<< FAILURE! - in test.org.apache.spark.sql.JavaDataFrameSuite
testTextLoad(test.org.apache.spark.sql.JavaDataFrameSuite)  Time elapsed: 0.356 sec  <<< ERROR!
org.apache.spark.sql.AnalysisException: Path does not exist: file:/home/jenkins/workspace/SparkPullRequestBuilder%402/sql/core/target/scala-2.11/test-classes/text-suite.txt;
at test.org.apache.spark.sql.JavaDataFrameSuite.testTextLoad(JavaDataFrameSuite.java:349)

testGenericLoad(test.org.apache.spark.sql.JavaDataFrameSuite)  Time elapsed: 0.322 sec  <<< ERROR!
org.apache.spark.sql.AnalysisException: Path does not exist: file:/home/jenkins/workspace/SparkPullRequestBuilder%402/sql/core/target/scala-2.11/test-classes/text-suite.txt;
at test.org.apache.spark.sql.JavaDataFrameSuite.testGenericLoad(JavaDataFrameSuite.java:311)

Running test.org.apache.spark.sql.JavaRowSuite
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec - in test.org.apache.spark.sql.JavaRowSuite

Results :

Tests in error:
  JavaDataFrameSuite.testGenericLoad:311 ? Analysis Path does not exist: file:/h...
  JavaDataFrameSuite.testTextLoad:349 ? Analysis Path does not exist: file:/home...
```
This is the line returned by
`Thread.currentThread().getContextClassLoader().getResource("text-suite.txt").toString()`:
```
file:/home/jenkins/workspace/SparkPullRequestBuilder%402/sql/core/target/scala-2.11/test-classes/text-suite.txt
```
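One way to sidestep the SBT/Maven difference may be to convert the resource URL into a `java.nio.file.Path` rather than using the raw URL string, since URI-to-Path conversion decodes percent-escapes such as `%40`. A minimal sketch (not the PR's actual fix; the path below is just the example from the log above):

```java
import java.net.URI;
import java.nio.file.Paths;

public class ResourcePathDemo {
    // Convert a (possibly percent-encoded) file: URL string to a decoded
    // filesystem path. Paths.get(URI) decodes escapes such as "%40" -> "@".
    public static String toLocalPath(String rawUrl) {
        return Paths.get(URI.create(rawUrl)).toString();
    }

    public static void main(String[] args) {
        String raw = "file:/home/jenkins/workspace/SparkPullRequestBuilder%402/"
                + "sql/core/target/scala-2.11/test-classes/text-suite.txt";
        // Prints the decoded path containing "@2" rather than "%402".
        System.out.println(toLocalPath(raw));
    }
}
```

The same decoding happens with `new java.io.File(url.toURI())`, which avoids branching on whether the build ran under SBT or Maven.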






[GitHub] spark pull request: [SPARK-14073][Streaming][test-maven]Move flume...

2016-03-23 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/11895#discussion_r57277180
  
--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java ---
@@ -303,27 +305,42 @@ public void pivot() {
 Assert.assertEquals(3.0, actual.get(1).getDouble(2), 0.01);
   }
 
+  private String getResource(String resource) {
+    try {
+      // The following "getResource" has different behaviors in SBT and Maven.
+      // When running in Jenkins, the file path may contain "@" when there are multiple
+      // SparkPullRequestBuilders running in the same worker
+      // (e.g., /home/jenkins/workspace/SparkPullRequestBuilder@2)
+      // When running in SBT, "@" in the file path will be returned as "@"; however,
+      // when running in Maven, "@" will be encoded as "%40".
--- End diff --

I can reproduce this behavior locally by putting Spark in a folder whose path contains "@".





[GitHub] spark pull request: [SPARK-14073][Streaming][test-maven]Move flume...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11895#issuecomment-200684929
  
**[Test build #54007 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54007/consoleFull)** for PR 11895 at commit [`14d859d`](https://github.com/apache/spark/commit/14d859d1378a52442b84507cfdd690aad43d1531).





[GitHub] spark pull request: [SPARK-14109][SQL] Fix HDFSMetadataLog to fall...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11925#issuecomment-200682795
  
**[Test build #54006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54006/consoleFull)** for PR 11925 at commit [`dcf8096`](https://github.com/apache/spark/commit/dcf80967b6f85f30d34db5730ead611b83912f53).





[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/11723#discussion_r57276779
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/ExternalClusterManagerSuite.scala
 ---
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.executor.TaskMetrics
+import org.apache.spark.scheduler.SchedulingMode.SchedulingMode
+import org.apache.spark.storage.BlockManagerId
+
+class ExternalClusterManagerSuite extends SparkFunSuite with LocalSparkContext
+{
+  test("launch of backend and scheduler") {
+    val conf = new SparkConf().setMaster("myclusterManager").
+      setAppName("testcm").set("spark.driver.allowMultipleContexts", "true")
+    sc = new SparkContext(conf)
+    // check if the scheduler components are created
+    assert(sc.schedulerBackend.isInstanceOf[FakeSchedulerBackend])
+    assert(sc.taskScheduler.isInstanceOf[FakeScheduler])
+  }
+}
+
+class CheckExternalClusterManager extends ExternalClusterManager {
+
+  def canCreate(masterURL: String): Boolean = masterURL == "myclusterManager"
+
+  def createTaskScheduler(sc: SparkContext): TaskScheduler = new FakeScheduler
+
+  def createSchedulerBackend(sc: SparkContext, scheduler: TaskScheduler): SchedulerBackend =
+    new FakeSchedulerBackend()
+
+  def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit = {}
+
+}
+
+class FakeScheduler extends TaskScheduler {
--- End diff --

To keep this consistent with the external cluster manager declared above, you could rename this to `TestTaskScheduler` or `DummyTaskScheduler`.





[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/11723#discussion_r57276711
  
--- Diff: dev/.rat-excludes ---
@@ -98,3 +98,4 @@ LZ4BlockInputStream.java
 spark-deps-.*
 .*csv
 .*tsv
+org.apache.spark.scheduler.ExternalClusterManager
--- End diff --

Why is this needed?





[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/11723#discussion_r57276691
  
--- Diff: 
core/src/test/resources/META-INF/services/org.apache.spark.scheduler.ExternalClusterManager
 ---
@@ -0,0 +1 @@
+org.apache.spark.scheduler.CheckExternalClusterManager
--- End diff --

Why is this file needed?
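Presumably this file is the standard `java.util.ServiceLoader` registration: a resource under `META-INF/services/` named after the interface (`org.apache.spark.scheduler.ExternalClusterManager`), listing the implementation classes to instantiate at runtime. A hedged sketch of the discovery mechanism (illustrative only, not the PR's actual code):

```java
import java.util.ServiceLoader;

public class ServiceDiscoveryDemo {
    // Count the implementations registered for a service interface.
    // ServiceLoader scans every META-INF/services/<interface-name> file
    // on the classpath and instantiates the classes listed inside it.
    public static <T> int countRegistered(Class<T> service) {
        int count = 0;
        for (T impl : ServiceLoader.load(service)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // No META-INF/services/java.lang.Runnable file exists on a plain
        // classpath, so nothing is discovered here.
        System.out.println("Runnable services: " + countRegistered(Runnable.class));
    }
}
```

If the PR resolves cluster managers this way, the test suite needs this entry so `CheckExternalClusterManager` is discoverable at runtime. That would also explain the `.rat-excludes` addition: the file's entire content is a class name, so it cannot carry an Apache license header.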





[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/11723#discussion_r57276663
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/ExternalClusterManager.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.SparkContext
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * :: DeveloperApi ::
+ * A cluster manager interface to plugin external scheduler.
+ *
+ */
+@DeveloperApi
+private[spark] trait ExternalClusterManager {
+
+  /**
+   * Check if this cluster manager instance can create scheduler components
+   * for a certain master URL.
+   * @param masterURL the master URL
+   * @return True if the cluster manager can create scheduler backend/
+   */
+  def canCreate(masterURL : String): Boolean
+
+  /**
+   * Create a task scheduler instance for the given SparkContext
+   * @param sc SparkContext
+   * @return TaskScheduler that will be responsible for task handling
+   */
+  def createTaskScheduler (sc: SparkContext): TaskScheduler
+
+  /**
+   * Create a scheduler backend for the given SparkContext and scheduler. This is
+   * called after task scheduler is created using [[ExternalClusterManager.createTaskScheduler()]].
+   * @param sc SparkContext
+   * @param scheduler TaskScheduler that will be used with the scheduler backend.
+   * @return SchedulerBackend that works with a TaskScheduler
+   */
+  def createSchedulerBackend (sc: SparkContext, scheduler: TaskScheduler): SchedulerBackend
--- End diff --

can you also include the `masterURL` as a param ?





[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/11723#discussion_r57276660
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/ExternalClusterManager.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.SparkContext
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * :: DeveloperApi ::
+ * A cluster manager interface to plugin external scheduler.
+ *
+ */
+@DeveloperApi
+private[spark] trait ExternalClusterManager {
+
+  /**
+   * Check if this cluster manager instance can create scheduler components
+   * for a certain master URL.
+   * @param masterURL the master URL
+   * @return True if the cluster manager can create scheduler backend/
+   */
+  def canCreate(masterURL : String): Boolean
+
+  /**
+   * Create a task scheduler instance for the given SparkContext
+   * @param sc SparkContext
+   * @return TaskScheduler that will be responsible for task handling
+   */
+  def createTaskScheduler (sc: SparkContext): TaskScheduler
--- End diff --

can you also include the `masterURL` as a param ?





[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/11723#discussion_r57276618
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/ExternalClusterManagerSuite.scala
 ---
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkFunSuite}
+import org.apache.spark.executor.TaskMetrics
+import org.apache.spark.scheduler.SchedulingMode.SchedulingMode
+import org.apache.spark.storage.BlockManagerId
+
+class ExternalClusterManagerSuite extends SparkFunSuite with LocalSparkContext
+{
+  test("launch of backend and scheduler") {
+    val conf = new SparkConf().setMaster("myclusterManager").
+      setAppName("testcm").set("spark.driver.allowMultipleContexts", "true")
+    sc = new SparkContext(conf)
+    // check if the scheduler components are created
+    assert(sc.schedulerBackend.isInstanceOf[FakeSchedulerBackend])
+    assert(sc.taskScheduler.isInstanceOf[FakeScheduler])
+  }
+}
+
+class CheckExternalClusterManager extends ExternalClusterManager {
--- End diff --

Rename to `TestExternalClusterManager` or `DummyExternalClusterManager`.





[GitHub] spark pull request: [SPARK-13343] [CORE] speculative tasks that di...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11916#issuecomment-200679647
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53998/
Test FAILed.





[GitHub] spark pull request: [SPARK-13343] [CORE] speculative tasks that di...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11916#issuecomment-200679645
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [DO_NOT_MERGE]Reproduce DataFrameReaderWriterS...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11922#issuecomment-200679591
  
**[Test build #54005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54005/consoleFull)** for PR 11922 at commit [`54bd806`](https://github.com/apache/spark/commit/54bd80670820074bfc57d96dc887ebee5bbb523f).





[GitHub] spark pull request: [SPARK-13343] [CORE] speculative tasks that di...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11916#issuecomment-200679249
  
**[Test build #53998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53998/consoleFull)** for PR 11916 at commit [`7c033b6`](https://github.com/apache/spark/commit/7c033b6d6dd7eb1d9296d82a965facec95dd6757).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/11723#discussion_r57276344
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/ExternalClusterManager.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import org.apache.spark.SparkContext
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * :: DeveloperApi ::
+ * A cluster manager interface to plugin external scheduler.
+ *
--- End diff --

Remove the extra blank line in the doc comment.





[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/11723#discussion_r57276268
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -149,7 +149,14 @@ private[spark] class Executor(
   tr.kill(interruptThread)
 }
   }
-
+  def killAllTasks (interruptThread: Boolean) : Unit = {
--- End diff --

I could not see this method being called from anywhere. If you don't plan to use it, please remove it.





[GitHub] spark pull request: [SPARK-13995][SQL] Extract correct IsNotNull c...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11809#issuecomment-200678468
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53999/
Test PASSed.





[GitHub] spark pull request: [SPARK-13995][SQL] Extract correct IsNotNull c...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11809#issuecomment-200678465
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13995][SQL] Extract correct IsNotNull c...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11809#issuecomment-200677948
  
**[Test build #53999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53999/consoleFull)** for PR 11809 at commit [`81c46c7`](https://github.com/apache/spark/commit/81c46c72117f30679f8d11c908340dc9067a14e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13874][Doc]Remove docs of streaming-akk...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11824#issuecomment-200677903
  
**[Test build #54004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54004/consoleFull)** for PR 11824 at commit [`dcdd3cb`](https://github.com/apache/spark/commit/dcdd3cb6e7aafa73e8ea8302c29ccf8a376f).





[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/11723#discussion_r57276201
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -149,7 +149,14 @@ private[spark] class Executor(
        tr.kill(interruptThread)
      }
    }
-
+  def killAllTasks (interruptThread: Boolean) : Unit = {
--- End diff --

add a space between the methods





[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/11723#discussion_r57276211
  
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -149,7 +149,14 @@ private[spark] class Executor(
        tr.kill(interruptThread)
      }
    }
-
+  def killAllTasks (interruptThread: Boolean) : Unit = {
+    // kill all the running tasks
+    for (taskRunner <- runningTasks.values().asScala) {
+      if (taskRunner != null) {
+        taskRunner.kill(interruptThread)
+      }
+    }
+  }
--- End diff --

add a space between the methods
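The method under review simply walks the live view of the running-task map and kills each runner. A self-contained Java stand-in of that pattern (TaskKiller and TaskRunner are hypothetical names for illustration, not Spark's classes) might look like:

```java
import java.util.concurrent.ConcurrentHashMap;

public class TaskKiller {
    // Minimal stand-in for a task runner; kill() just records the request.
    static class TaskRunner {
        volatile boolean killed = false;
        volatile boolean interrupted = false;
        void kill(boolean interruptThread) {
            killed = true;
            interrupted = interruptThread;
        }
    }

    final ConcurrentHashMap<Long, TaskRunner> runningTasks = new ConcurrentHashMap<>();

    // Same shape as the diff's killAllTasks: iterate the map's live values()
    // view and kill every runner. ConcurrentHashMap never stores nulls, but
    // the null check mirrors the Scala snippet under review.
    void killAllTasks(boolean interruptThread) {
        for (TaskRunner runner : runningTasks.values()) {
            if (runner != null) {
                runner.kill(interruptThread);
            }
        }
    }

    public static void main(String[] args) {
        TaskKiller exec = new TaskKiller();
        exec.runningTasks.put(1L, new TaskRunner());
        exec.runningTasks.put(2L, new TaskRunner());
        exec.killAllTasks(true);
        System.out.println(exec.runningTasks.get(1L).killed); // true
    }
}
```

Iterating `values()` is safe while other threads add or remove tasks, which is why the concurrent map is used instead of locking the whole executor.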





[GitHub] spark pull request: [SPARK-14032] [SQL] Eliminate Unnecessary Dist...

2016-03-23 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11854#issuecomment-200676690
  
The motivation for the `distinctSet` function is to obtain a uniqueness 
constraint from the child operators. The output of `Distinct`, `Intersect`, 
`Except`, and `Aggregate` (iff its aggregate expressions are identical to its 
grouping expressions) is always guaranteed to be unique, so the parent 
operators can use this fact for query optimization. 
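As a toy illustration of why that constraint helps (plain Java collections, nothing Spark-specific): once a child guarantees unique output, a second distinct over it is provably a no-op, which is exactly the kind of rewrite a parent operator can apply.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DistinctElimination {
    public static void main(String[] args) {
        // "Child" operator already guarantees uniqueness (a Distinct).
        List<Integer> child = Stream.of(3, 1, 3, 2, 1)
                .distinct()
                .collect(Collectors.toList());

        // A "parent" Distinct over that output cannot change anything,
        // so an optimizer may drop it once the uniqueness constraint
        // (the distinctSet) is known.
        List<Integer> parent = child.stream()
                .distinct()
                .collect(Collectors.toList());

        System.out.println(child.equals(parent)); // true: the second distinct is redundant
    }
}
```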





[GitHub] spark pull request: [SPARK-13874][Doc]Remove docs of streaming-akk...

2016-03-23 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/11824#issuecomment-200676764
  
Retest this please





[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/11723#discussion_r57276135
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2443,8 +2443,34 @@ object SparkContext extends Logging {
          "in the form mesos://zk://host:port. Current Master URL will stop working in Spark 2.0.")
        createTaskScheduler(sc, "mesos://" + zkUrl, deployMode)

-      case _ =>
-        throw new SparkException("Could not parse Master URL: '" + master + "'")
+      case masterUrl =>
+        val cm = getClusterManager(masterUrl) match {
+          case Some(clusterMgr) => clusterMgr
+          case None => throw new SparkException("Could not parse Master URL: '" + master + "'")
+        }
+        try {
+          val scheduler = cm.createTaskScheduler(sc)
+          val backend = cm.createSchedulerBackend(sc, scheduler)
+          cm.initialize(scheduler, backend)
+          (backend, scheduler)
+        } catch {
+          case e: Exception => {
+            throw new SparkException("External scheduler cannot be instantiated", e)
+          }
+        }
+    }
+  }
+
+  private def getClusterManager(url: String): Option[ExternalClusterManager] = {
+    val loader = Utils.getContextOrSparkClassLoader
+    val serviceLoader = ServiceLoader.load(classOf[ExternalClusterManager], loader)
+
+    serviceLoader.asScala.filter(_.canCreate(url)).toList match {
+      // exactly one registered manager
+      case head :: Nil => Some(head)
+      case Nil => None
+      case multipleMgrs => sys.error(s"Multiple Cluster Managers registered " +
--- End diff --

Can you include the list of matching cluster managers in the message?
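For what it's worth, the exactly-one-provider selection with an informative multi-match message could look like the following Java sketch. ManagerLookup, ClusterManager, select, and manager are illustrative names, not Spark's API; the real code resolves candidates via java.util.ServiceLoader rather than an explicit list.

```java
import java.util.ArrayList;
import java.util.List;

public class ManagerLookup {
    // Stand-in for ExternalClusterManager.
    interface ClusterManager {
        boolean canCreate(String masterUrl);
        String name();
    }

    // Mirrors the match in getClusterManager: exactly one candidate wins,
    // zero means "not found" (null here, so the caller can raise its own
    // "Could not parse Master URL" error), and multiple candidates fail
    // with a message that lists the conflicting managers.
    static ClusterManager select(List<ClusterManager> registered, String url) {
        List<ClusterManager> matches = new ArrayList<>();
        for (ClusterManager cm : registered) {
            if (cm.canCreate(url)) {
                matches.add(cm);
            }
        }
        if (matches.size() == 1) {
            return matches.get(0);
        }
        if (matches.isEmpty()) {
            return null;
        }
        List<String> names = new ArrayList<>();
        for (ClusterManager cm : matches) {
            names.add(cm.name());
        }
        throw new IllegalStateException(
                "Multiple cluster managers registered for URL " + url + ": " + names);
    }

    // Helper to build a manager that matches URLs with a given prefix.
    static ClusterManager manager(String name, String prefix) {
        return new ClusterManager() {
            public boolean canCreate(String masterUrl) { return masterUrl.startsWith(prefix); }
            public String name() { return name; }
        };
    }

    public static void main(String[] args) {
        List<ClusterManager> registered = List.of(
                manager("yarn-mgr", "yarn"), manager("custom-mgr", "custom://"));
        System.out.println(select(registered, "custom://host:7077").name()); // custom-mgr
        System.out.println(select(registered, "bogus://") == null);          // true
    }
}
```

Carrying the list of names into the exception message makes the "multiple managers" failure diagnosable without a debugger, which is what the review comment asks for.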





[GitHub] spark pull request: [SPARK-10691][ML] Make LogisticRegressionModel...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11928#issuecomment-200676208
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54000/
Test PASSed.





[GitHub] spark pull request: [SPARK-10691][ML] Make LogisticRegressionModel...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11928#issuecomment-200676206
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10691][ML] Make LogisticRegressionModel...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11928#issuecomment-200675840
  
**[Test build #54000 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54000/consoleFull)**
 for PR 11928 at commit 
[`836ec92`](https://github.com/apache/spark/commit/836ec920108646481c9f11ff4426c9efdd4c7df9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14110] [CORE] PipedRDD to print the com...

2016-03-23 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11927#issuecomment-200674840
  
LGTM pending Jenkins.






[GitHub] spark pull request: [SPARK-14110] [CORE] PipedRDD to print the com...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11927#issuecomment-200675269
  
**[Test build #2676 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2676/consoleFull)**
 for PR 11927 at commit 
[`9929da9`](https://github.com/apache/spark/commit/9929da95ad29a2568b5891d4627627220999c3b7).





[GitHub] spark pull request: [SPARK-13934][SQL] fixed table identifier

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11929#issuecomment-200674607
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-13934][SQL] fixed table identifier

2016-03-23 Thread wangyang1992
GitHub user wangyang1992 opened a pull request:

https://github.com/apache/spark/pull/11929

[SPARK-13934][SQL] fixed table identifier

## What changes were proposed in this pull request?
A table identifier that starts with something that parses as scientific notation 
(like 1e34) throws an exception:
val tableName = "1e34abcd"
hc.sql("select 123").registerTempTable(tableName)
hc.dropTempTable(tableName)
The last line throws a RuntimeException. (java.lang.RuntimeException: 
[1.1] failure: identifier expected)

Fix this by changing the scientific-notation parser: if a scientific-notation 
literal is followed by one or more identifier characters, it is no longer 
treated as a valid numeric token.
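The tokenization rule described above can be sketched with a regex negative lookahead. This is a hedged Java illustration only — SciNotationToken and the exact character class are assumptions, not the parser's actual code:

```java
import java.util.regex.Pattern;

public class SciNotationToken {
    // A scientific-notation literal counts as a numeric token only when it is
    // NOT immediately followed by an identifier character. The (?![a-zA-Z0-9_])
    // negative lookahead implements the "don't see it as a valid token" rule.
    static final Pattern SCI =
            Pattern.compile("\\d+(\\.\\d+)?[eE][+-]?\\d+(?![a-zA-Z0-9_])");

    // Does the input begin with a complete scientific-notation literal?
    static boolean startsWithSciLiteral(String s) {
        return SCI.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(startsWithSciLiteral("1e34"));     // true: a plain literal
        System.out.println(startsWithSciLiteral("1e34abcd")); // false: part of an identifier
    }
}
```

With that rule, "1e34abcd" falls through to the identifier production instead of failing mid-token, which is what lets registerTempTable/dropTempTable round-trip such names.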

## How was this patch tested?

Unit test is added.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyang1992/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11929.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11929


commit 81287d31648b229bd3e617ef9ebce985fb54dca0
Author: wangyang 
Date:   2016-03-24T04:30:27Z

fixed table identifier







[GitHub] spark pull request: [SPARK-13923][SPARK-14014][SQL] Session catalo...

2016-03-23 Thread andrewor14
Github user andrewor14 closed the pull request at:

https://github.com/apache/spark/pull/11923





[GitHub] spark pull request: [SPARK-13923][SPARK-14014][SQL] Session catalo...

2016-03-23 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/11923#issuecomment-200672391
  
I'm going to close this for now since we reverted the original patch.





[GitHub] spark pull request: [SPARK-14110] [CORE] PipedRDD to print the com...

2016-03-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/11927#discussion_r57275794
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
@@ -205,6 +206,8 @@ private[spark] class PipedRDD[T: ClassTag](
   private def propagateChildException(): Unit = {
     val t = childThreadException.get()
     if (t != null) {
+      logError(s"Caught exception ${t.getMessage} while running pipe(). Command ran: " +
--- End diff --

changed





[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-03-23 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/9#issuecomment-200672196
  
@jkbradley Sure, I'll leave it here and check back later. :)





[GitHub] spark pull request: [SPARK-14110] [CORE] PipedRDD to print the com...

2016-03-23 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11927#discussion_r57275342
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
@@ -205,6 +206,8 @@ private[spark] class PipedRDD[T: ClassTag](
   private def propagateChildException(): Unit = {
     val t = childThreadException.get()
     if (t != null) {
+      logError(s"Caught exception ${t.getMessage} while running pipe(). Command ran: " +
--- End diff --

The exception message might be very long; it would be better to put it at 
the end rather than in the middle.





[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

2016-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/11621#discussion_r57275251
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -223,3 +223,20 @@ def _call_java(self, name, *args):
         sc = SparkContext._active_spark_context
         java_args = [_py2java(sc, arg) for arg in args]
         return _java2py(sc, m(*java_args))
+
+
+class JavaCallable(object):
--- End diff --

JavaCallable seems reasonable.  Could you modify it so that JavaModel can 
inherit from it and eliminate the duplicate code?





[GitHub] spark pull request: [SPARK-14032] [SQL] Eliminate Unnecessary Dist...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11854#issuecomment-200665778
  
**[Test build #54003 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54003/consoleFull)**
 for PR 11854 at commit 
[`7d95bc1`](https://github.com/apache/spark/commit/7d95bc17c2523fa25bcac59ed03886e8b0cb8c40).





[GitHub] spark pull request: [SPARK-14032] [SQL] Eliminate Unnecessary Dist...

2016-03-23 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11854#issuecomment-200665098
  
Added a function `distinctSet` to `QueryPlan`. It returns the set of 
attributes whose combination uniquely identifies a row. Maybe I should create 
a separate PR for just this and add a few test cases to verify its 
correctness. 





[GitHub] spark pull request: [SPARK-14111][SQL] Correct output nullability ...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11926#issuecomment-200663369
  
**[Test build #54002 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54002/consoleFull)**
 for PR 11926 at commit 
[`5912fd3`](https://github.com/apache/spark/commit/5912fd3794e0f62817a4c44025de09bfcec4c944).





[GitHub] spark pull request: [SPARK-14111][SQL] Correct output nullability ...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11926#issuecomment-200663292
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14111][SQL] Correct output nullability ...

2016-03-23 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11926#issuecomment-200663281
  
retest this please.





[GitHub] spark pull request: [SPARK-14111][SQL] Correct output nullability ...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11926#issuecomment-200663293
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53997/
Test FAILed.





[GitHub] spark pull request: [SPARK-14111][SQL] Correct output nullability ...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11926#issuecomment-200663135
  
**[Test build #53997 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53997/consoleFull)**
 for PR 11926 at commit 
[`5912fd3`](https://github.com/apache/spark/commit/5912fd3794e0f62817a4c44025de09bfcec4c944).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12443][SQL] encoderFor should support D...

2016-03-23 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/10399#issuecomment-200662936
  
ping @marmbrus please let me know if this is ok for you now, thanks. 





[GitHub] spark pull request: [SPARK-14102][CORE] Block `reset` command in S...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11920#issuecomment-200662831
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53995/
Test FAILed.





[GitHub] spark pull request: [SPARK-14102][CORE] Block `reset` command in S...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11920#issuecomment-200662829
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14102][CORE] Block `reset` command in S...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11920#issuecomment-200662764
  
**[Test build #53995 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53995/consoleFull)**
 for PR 11920 at commit 
[`7a438ae`](https://github.com/apache/spark/commit/7a438ae78733ea987d39c04fa53f66de715b1c2a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10691][ML] Make LogisticRegressionModel...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11928#issuecomment-200662362
  
**[Test build #54000 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54000/consoleFull)**
 for PR 11928 at commit 
[`836ec92`](https://github.com/apache/spark/commit/836ec920108646481c9f11ff4426c9efdd4c7df9).





[GitHub] spark pull request: [SPARK-13742][Core] Add non-iterator interface...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11578#issuecomment-200662364
  
**[Test build #54001 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54001/consoleFull)**
 for PR 11578 at commit 
[`d51d553`](https://github.com/apache/spark/commit/d51d5537d5d93dbbfe6c8fbfa8574780fdfb8c68).





[GitHub] spark pull request: [SPARK-10691][ML] Make LogisticRegressionModel...

2016-03-23 Thread jkbradley
GitHub user jkbradley opened a pull request:

https://github.com/apache/spark/pull/11928

[SPARK-10691][ML] Make LogisticRegressionModel, LinearRegressionModel 
evaluate() public

## What changes were proposed in this pull request?

Made the evaluate method public. Fixed LogisticRegressionModel.evaluate to 
handle the case where probabilityCol is not specified.

## How was this patch tested?

There were already unit tests for these methods.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkbradley/spark public-evaluate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11928.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11928


commit 836ec920108646481c9f11ff4426c9efdd4c7df9
Author: Joseph K. Bradley 
Date:   2016-03-24T04:40:59Z

Made LogisticRegression, LinearRegression evaluate() public







[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...

2016-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/11621#discussion_r57274042
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -231,6 +232,210 @@ def intercept(self):
         """
         return self._call_java("intercept")

+    @property
+    @since("2.0.0")
+    def summary(self):
+        """
+        Gets summary (e.g. residuals, mse, r-squared ) of model on
+        training set. An exception is thrown if
+        `trainingSummary == None`.
+        """
+        java_blrt_summary = self._call_java("summary")
+        return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
+
+    @property
+    @since("2.0.0")
+    def hasSummary(self):
+        """
+        Indicates whether a training summary exists for this model
+        instance.
+        """
+        return self._call_java("hasSummary")
+
+    """
--- End diff --

I want to make it public.  I'll send a PR now.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-200660263
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53993/
Test FAILed.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-200660261
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9893#issuecomment-200660184
  
**[Test build #53993 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53993/consoleFull)** for PR 9893 at commit [`2163b47`](https://github.com/apache/spark/commit/2163b47635a7e076515930e5a486a898d1cc6f6e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DB2IntegrationSuite extends DockerJDBCIntegrationSuite `





[GitHub] spark pull request: [SPARK-13949] PySpark ml DecisionTreeClassifie...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11892#issuecomment-200660088
  
**[Test build #2675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2675/consoleFull)** for PR 11892 at commit [`bad5c3e`](https://github.com/apache/spark/commit/bad5c3ea9556ad8fe02d60342bb7a545b92214fb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11301#issuecomment-200659749
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53996/
Test FAILed.





[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11301#issuecomment-200659748
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11301#issuecomment-200659678
  
**[Test build #53996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53996/consoleFull)** for PR 11301 at commit [`078b272`](https://github.com/apache/spark/commit/078b272767ea653d9cced56b4e928b81aaabfb8f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13963][ML] Adding binary toggle param t...

2016-03-23 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/11832#issuecomment-200659606
  
LGTM. Ready to merge?





[GitHub] spark pull request: [SPARK-13579][build][wip] Stop building the ma...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11796#issuecomment-200659325
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13579][build][wip] Stop building the ma...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11796#issuecomment-200659327
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53992/
Test FAILed.





[GitHub] spark pull request: [SPARK-13579][build][wip] Stop building the ma...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11796#issuecomment-200658889
  
**[Test build #53992 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53992/consoleFull)** for PR 11796 at commit [`5c660aa`](https://github.com/apache/spark/commit/5c660aac3b6d44622ab39656367a6f45b0b46402).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13963][ML] Adding binary toggle param t...

2016-03-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/11832#discussion_r57273260
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/feature/HashingTF.scala ---
@@ -53,9 +65,10 @@ class HashingTF(val numFeatures: Int) extends 
Serializable {
   @Since("1.1.0")
   def transform(document: Iterable[_]): Vector = {
 val termFrequencies = mutable.HashMap.empty[Int, Double]
+val setTF = if (binary) (i: Int) => 1.0 else (i: Int) => termFrequencies.getOrElse(i, 0.0) + 1.0
--- End diff --

Oh yep you're right
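The `setTF` closure under review can be illustrated with a small plain-Python sketch of the hashing trick (the `hashing_tf` name and `crc32` bucketing are assumptions for the illustration, not Spark's implementation): with `binary=True`, every observed bucket gets 1.0 instead of an incremented count.

```python
import zlib


def hashing_tf(document, num_features=16, binary=False):
    """Map terms to buckets via the hashing trick; count or binary-indicate."""
    freqs = {}
    for term in document:
        # Stable hash so results are reproducible across runs.
        idx = zlib.crc32(term.encode("utf-8")) % num_features
        # binary=True sets every seen bucket to 1.0 rather than counting,
        # mirroring the two branches of the `setTF` closure in the diff above.
        freqs[idx] = 1.0 if binary else freqs.get(idx, 0.0) + 1.0
    return freqs
```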





[GitHub] spark pull request: [SPARK-13949] PySpark ml DecisionTreeClassifie...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11892#issuecomment-200657826
  
**[Test build #2675 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2675/consoleFull)** for PR 11892 at commit [`bad5c3e`](https://github.com/apache/spark/commit/bad5c3ea9556ad8fe02d60342bb7a545b92214fb).





[GitHub] spark pull request: [SPARK-13949] PySpark ml DecisionTreeClassifie...

2016-03-23 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/11892#issuecomment-200657678
  
add to whitelist





[GitHub] spark pull request: [SPARK-10780][ML][WIP] Add initial model to km...

2016-03-23 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/9#issuecomment-200657397
  
@yinxusen Well, I still think we should support initial models, and 
hopefully some of the code (like unit tests) would be reusable.  Feel free to 
leave this open until you can see how much needs to be changed.





[GitHub] spark pull request: [SPARK-12183][ML][MLLIB] Remove mllib tree imp...

2016-03-23 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/11855#issuecomment-200656539
  
I'll go ahead and merge this with master.





[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11301#issuecomment-200653658
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53994/
Test FAILed.





[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11301#issuecomment-200653654
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11301#issuecomment-200653343
  
**[Test build #53994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53994/consoleFull)** for PR 11301 at commit [`c0b1f47`](https://github.com/apache/spark/commit/c0b1f473ddbdc46633431aa4f88cfc0d0b872164).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14110] [CORE] PipedRDD to print the com...

2016-03-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11927#issuecomment-200652969
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-13713][SQL] Migrate parser from ANTLR3 ...

2016-03-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11557#discussion_r57272561
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ng/PlanParserSuite.scala
 ---
@@ -0,0 +1,408 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.parser.ng
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.types.IntegerType
+
+class PlanParserSuite extends PlanTest {
+  import CatalystSqlParser._
+  import org.apache.spark.sql.catalyst.dsl.expressions._
+  import org.apache.spark.sql.catalyst.dsl.plans._
+
+  def assertEqual(sqlCommand: String, plan: LogicalPlan): Unit = {
+comparePlans(parsePlan(sqlCommand), plan)
+  }
+
+  def intercept(sqlCommand: String, messages: String*): Unit = {
+val e = intercept[ParseException](parsePlan(sqlCommand))
+messages.foreach { message =>
+  assert(e.message.contains(message))
+}
+  }
+
+  test("case insensitive") {
+val plan = table("a").select(star())
+assertEqual("sELEct * FroM a", plan)
+assertEqual("select * fRoM a", plan)
+assertEqual("SELECT * FROM a", plan)
+  }
+
+  test("show functions") {
+assertEqual("show functions", ShowFunctions(None, None))
+assertEqual("show functions foo", ShowFunctions(None, Some("foo")))
+assertEqual("show functions foo.bar", ShowFunctions(Some("foo"), 
Some("bar")))
+assertEqual("show functions 'foo.*'", ShowFunctions(None, 
Some("foo\\.*")))
+intercept("show functions foo.bar.baz", "SHOW FUNCTIONS unsupported 
name")
+  }
+
+  test("describe function") {
+assertEqual("describe function bar", DescribeFunction("bar", 
isExtended = false))
+assertEqual("describe function extended bar", DescribeFunction("bar", 
isExtended = true))
+assertEqual("describe function foo.bar", DescribeFunction("foo.bar", 
isExtended = false))
+assertEqual("describe function extended f.bar", 
DescribeFunction("f.bar", isExtended = true))
+  }
+
+  test("set operations") {
+val a = table("a").select(star())
+val b = table("b").select(star())
+
+assertEqual("select * from a union select * from b", 
Distinct(a.unionAll(b)))
+assertEqual("select * from a union distinct select * from b", 
Distinct(a.unionAll(b)))
+assertEqual("select * from a union all select * from b", a.unionAll(b))
+assertEqual("select * from a except select * from b", a.except(b))
+intercept("select * from a except all select * from b", "EXCEPT ALL is 
not supported.")
+assertEqual("select * from a except distinct select * from b", 
a.except(b))
+assertEqual("select * from a intersect select * from b", 
a.intersect(b))
+intercept("select * from a intersect all select * from b", "INTERSECT 
ALL is not supported.")
+assertEqual("select * from a intersect distinct select * from b", 
a.intersect(b))
+  }
+
+  test("common table expressions") {
+def cte(plan: LogicalPlan, namedPlans: (String, LogicalPlan)*): With = 
{
+  val ctes = namedPlans.map {
+case (name, cte) =>
+  name -> SubqueryAlias(name, cte)
+  }.toMap
+  With(plan, ctes)
+}
+assertEqual(
+  "with cte1 as (select * from a) select * from cte1",
+  cte(table("cte1").select(star()), "cte1" -> 
table("a").select(star(
+assertEqual(
+  "with cte1 (select 1) select * from cte1",
+  cte(table("cte1").select(star()), "cte1" -> 
OneRowRelation.select(1)))
+assertEqual(
+  "with cte1 (select 1), cte2 as (select * from cte1) select * from 
cte2",
+  cte(table("cte2").select(star()),
+"cte1" -> OneRowRelation.select(1),
+"cte2" -> 

[GitHub] spark pull request: [SPARK-14110] [CORE] PipedRDD to print the com...

2016-03-23 Thread tejasapatil
GitHub user tejasapatil opened a pull request:

https://github.com/apache/spark/pull/11927

[SPARK-14110] [CORE] PipedRDD to print the command ran on non zero exit

## What changes were proposed in this pull request?

In case of a failure in the subprocess launched by PipedRDD, the failure
exception reads "Subprocess exited with status XXX". Debugging this is not easy
for users, especially if there are multiple pipe() operations in the Spark
application.

Changes done:
- Changed the exception message when non-zero exit code is seen
- If the reader and writer threads see an exception, simply log the command
that was run. The current model is to propagate the exception as-is so that
upstream Spark logic will take the right action based on what the exception was
(e.g. for a fetch failure it needs to retry, but for some fatal exceptions it
will decide to fail the stage / job). Wrapping the exception in a generic
exception would therefore not work. Altering the exception message would keep
that guarantee, but that is ugly (plus not all exceptions have a constructor
taking a string message)
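The proposed behavior can be sketched outside Spark (a hedged illustration, not PipedRDD's actual code; `run_piped_command` is a made-up name): on a non-zero exit, the raised error surfaces the command itself, which makes failures debuggable when an application has several pipe() stages.

```python
import subprocess
import sys


def run_piped_command(command):
    """Run a command; on non-zero exit, raise with the command in the message."""
    proc = subprocess.run(command)
    if proc.returncode != 0:
        # Including the command (not just the status) tells the user *which*
        # pipe() stage failed, which is the point of the change above.
        raise RuntimeError(
            f"Subprocess exited with status {proc.returncode}. "
            f"Command ran: {' '.join(command)}"
        )


if __name__ == "__main__":
    # A child process that deliberately exits with status 3.
    run_piped_command([sys.executable, "-c", "import sys; sys.exit(3)"])
```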

## How was this patch tested?

- Added a new test case
- Ran all existing tests for PipedRDD

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tejasapatil/spark SPARK-14110-piperdd-failure

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11927.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11927


commit c194d731686e16f9baf062f3a9321572b206fdaa
Author: Tejas Patil 
Date:   2016-03-24T03:49:11Z

PipedRDD to print the command ran on non zero exit

- Changed the exception message when a non-zero exit code is seen
- If the reader and writer threads see an exception, we simply log the command
that was run. This is done because we want to propagate the exception as-is so
that upstream Spark logic will take the right action based on what the
exception was (e.g. for a fetch failure, it needs to retry; but for some fatal
exceptions, it will decide to fail the stage / job)







[GitHub] spark pull request: [SPARK-13713][SQL] Migrate parser from ANTLR3 ...

2016-03-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11557#discussion_r57272283
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ng/PlanParserSuite.scala
 ---
@@ -0,0 +1,408 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.parser.ng
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.types.IntegerType
+
+class PlanParserSuite extends PlanTest {
+  import CatalystSqlParser._
+  import org.apache.spark.sql.catalyst.dsl.expressions._
+  import org.apache.spark.sql.catalyst.dsl.plans._
+
+  def assertEqual(sqlCommand: String, plan: LogicalPlan): Unit = {
+comparePlans(parsePlan(sqlCommand), plan)
+  }
+
+  def intercept(sqlCommand: String, messages: String*): Unit = {
+val e = intercept[ParseException](parsePlan(sqlCommand))
+messages.foreach { message =>
+  assert(e.message.contains(message))
+}
+  }
+
+  test("case insensitive") {
+val plan = table("a").select(star())
+assertEqual("sELEct * FroM a", plan)
+assertEqual("select * fRoM a", plan)
+assertEqual("SELECT * FROM a", plan)
+  }
+
+  test("show functions") {
+assertEqual("show functions", ShowFunctions(None, None))
+assertEqual("show functions foo", ShowFunctions(None, Some("foo")))
+assertEqual("show functions foo.bar", ShowFunctions(Some("foo"), 
Some("bar")))
+assertEqual("show functions 'foo.*'", ShowFunctions(None, 
Some("foo\\.*")))
+intercept("show functions foo.bar.baz", "SHOW FUNCTIONS unsupported 
name")
+  }
+
+  test("describe function") {
+assertEqual("describe function bar", DescribeFunction("bar", 
isExtended = false))
+assertEqual("describe function extended bar", DescribeFunction("bar", 
isExtended = true))
+assertEqual("describe function foo.bar", DescribeFunction("foo.bar", 
isExtended = false))
+assertEqual("describe function extended f.bar", 
DescribeFunction("f.bar", isExtended = true))
+  }
+
+  test("set operations") {
+val a = table("a").select(star())
+val b = table("b").select(star())
+
+assertEqual("select * from a union select * from b", 
Distinct(a.unionAll(b)))
+assertEqual("select * from a union distinct select * from b", 
Distinct(a.unionAll(b)))
+assertEqual("select * from a union all select * from b", a.unionAll(b))
+assertEqual("select * from a except select * from b", a.except(b))
+intercept("select * from a except all select * from b", "EXCEPT ALL is 
not supported.")
+assertEqual("select * from a except distinct select * from b", 
a.except(b))
+assertEqual("select * from a intersect select * from b", 
a.intersect(b))
+intercept("select * from a intersect all select * from b", "INTERSECT 
ALL is not supported.")
+assertEqual("select * from a intersect distinct select * from b", 
a.intersect(b))
+  }
+
+  test("common table expressions") {
+def cte(plan: LogicalPlan, namedPlans: (String, LogicalPlan)*): With = 
{
+  val ctes = namedPlans.map {
+case (name, cte) =>
+  name -> SubqueryAlias(name, cte)
+  }.toMap
+  With(plan, ctes)
+}
+assertEqual(
+  "with cte1 as (select * from a) select * from cte1",
+  cte(table("cte1").select(star()), "cte1" -> 
table("a").select(star(
+assertEqual(
+  "with cte1 (select 1) select * from cte1",
+  cte(table("cte1").select(star()), "cte1" -> 
OneRowRelation.select(1)))
+assertEqual(
+  "with cte1 (select 1), cte2 as (select * from cte1) select * from 
cte2",
+  cte(table("cte2").select(star()),
+"cte1" -> OneRowRelation.select(1),
+"cte2" -> 

[GitHub] spark pull request: [SPARK-13995][SQL] Extract correct IsNotNull c...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11809#issuecomment-200650936
  
**[Test build #53999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53999/consoleFull)** for PR 11809 at commit [`81c46c7`](https://github.com/apache/spark/commit/81c46c72117f30679f8d11c908340dc9067a14e7).





[GitHub] spark pull request: [SPARK-13713][SQL] Migrate parser from ANTLR3 ...

2016-03-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11557#discussion_r57272248
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ng/PlanParserSuite.scala
 ---
@@ -0,0 +1,408 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.parser.ng
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.types.IntegerType
+
+class PlanParserSuite extends PlanTest {
+  import CatalystSqlParser._
+  import org.apache.spark.sql.catalyst.dsl.expressions._
+  import org.apache.spark.sql.catalyst.dsl.plans._
+
+  def assertEqual(sqlCommand: String, plan: LogicalPlan): Unit = {
+comparePlans(parsePlan(sqlCommand), plan)
+  }
+
+  def intercept(sqlCommand: String, messages: String*): Unit = {
+val e = intercept[ParseException](parsePlan(sqlCommand))
+messages.foreach { message =>
+  assert(e.message.contains(message))
+}
+  }
+
+  test("case insensitive") {
+val plan = table("a").select(star())
+assertEqual("sELEct * FroM a", plan)
+assertEqual("select * fRoM a", plan)
+assertEqual("SELECT * FROM a", plan)
+  }
+
+  test("show functions") {
+assertEqual("show functions", ShowFunctions(None, None))
+assertEqual("show functions foo", ShowFunctions(None, Some("foo")))
+assertEqual("show functions foo.bar", ShowFunctions(Some("foo"), Some("bar")))
+assertEqual("show functions 'foo.*'", ShowFunctions(None, Some("foo\\.*")))
+intercept("show functions foo.bar.baz", "SHOW FUNCTIONS unsupported name")
+  }
+
+  test("describe function") {
+assertEqual("describe function bar", DescribeFunction("bar", isExtended = false))
+assertEqual("describe function extended bar", DescribeFunction("bar", isExtended = true))
+assertEqual("describe function foo.bar", DescribeFunction("foo.bar", isExtended = false))
+assertEqual("describe function extended f.bar", DescribeFunction("f.bar", isExtended = true))
+  }
+
+  test("set operations") {
+val a = table("a").select(star())
+val b = table("b").select(star())
+
+assertEqual("select * from a union select * from b", Distinct(a.unionAll(b)))
+assertEqual("select * from a union distinct select * from b", Distinct(a.unionAll(b)))
+assertEqual("select * from a union all select * from b", a.unionAll(b))
+assertEqual("select * from a except select * from b", a.except(b))
+intercept("select * from a except all select * from b", "EXCEPT ALL is not supported.")
+assertEqual("select * from a except distinct select * from b", a.except(b))
+assertEqual("select * from a intersect select * from b", a.intersect(b))
+intercept("select * from a intersect all select * from b", "INTERSECT ALL is not supported.")
+assertEqual("select * from a intersect distinct select * from b", a.intersect(b))
+  }
+
+  test("common table expressions") {
+def cte(plan: LogicalPlan, namedPlans: (String, LogicalPlan)*): With = {
+  val ctes = namedPlans.map {
+case (name, cte) =>
+  name -> SubqueryAlias(name, cte)
+  }.toMap
+  With(plan, ctes)
+}
+assertEqual(
+  "with cte1 as (select * from a) select * from cte1",
+  cte(table("cte1").select(star()), "cte1" -> table("a").select(star())))
+assertEqual(
+  "with cte1 (select 1) select * from cte1",
+  cte(table("cte1").select(star()), "cte1" -> OneRowRelation.select(1)))
+assertEqual(
+  "with cte1 (select 1), cte2 as (select * from cte1) select * from cte2",
+  cte(table("cte2").select(star()),
+"cte1" -> OneRowRelation.select(1),
+"cte2" ->

[GitHub] spark pull request: [SPARK-13995][SQL] Extract correct IsNotNull c...

2016-03-23 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11809#issuecomment-200650254
  
retest this please.





[GitHub] spark pull request: [SPARK-14062][Yarn] Fix log4j and upload metri...

2016-03-23 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/11885#issuecomment-200646933
  
OK, thanks a lot for your explanation :smile: .





[GitHub] spark pull request: [SPARK-13866][SQL] Handle decimal type in CSV ...

2016-03-23 Thread falaki
Github user falaki commented on a diff in the pull request:

https://github.com/apache/spark/pull/11724#discussion_r57272004
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -108,14 +109,38 @@ private[csv] object CSVInferSchema {
   }
 
   private def tryParseDouble(field: String): DataType = {
-    if ((allCatch opt field.toDouble).isDefined) {
+    val doubleTry = allCatch opt field.toDouble
--- End diff --

I think option 1 is good. If Decimal cannot handle a number, then we resort to Double.
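The "option 1" ordering discussed here can be sketched outside Spark. This is a hedged toy model, not the actual patch: `InferSketch`, `tryParseDecimal`, and the simplified `DataType` objects are assumptions chosen for illustration. The point is the fallback chain: try Decimal first, resort to Double when `BigDecimal` cannot represent the field (e.g. "NaN"), and fall through to String otherwise.

```scala
import scala.util.control.Exception.allCatch

// Toy sketch of "option 1": Decimal first, then Double, then String.
// Names and types here are illustrative assumptions, not Spark's code.
object InferSketch {
  sealed trait DataType
  case object DecimalType extends DataType
  case object DoubleType  extends DataType
  case object StringType  extends DataType

  def tryParseDecimal(field: String): DataType =
    (allCatch opt BigDecimal(field)) match {
      case Some(_) => DecimalType
      case None    => tryParseDouble(field) // Decimal failed, resort to Double
    }

  def tryParseDouble(field: String): DataType =
    if ((allCatch opt field.toDouble).isDefined) DoubleType else StringType
}
```

For example, "NaN" is rejected by `BigDecimal` but accepted by `toDouble`, so it infers as Double rather than Decimal.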





[GitHub] spark pull request: [SPARK-13713][SQL] Migrate parser from ANTLR3 ...

2016-03-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11557#discussion_r57271935
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ng/ExpressionParserSuite.scala ---
@@ -0,0 +1,494 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.parser.ng
+
+import java.sql.{Date, Timestamp}
+
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.{UnresolvedAttribute, _}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.CalendarInterval
+
+/**
+ * Test basic expression parsing. If a type of expression is supported it should be tested here.
+ *
+ * Please note that some of the expressions test don't have to be sound expressions, only their
+ * structure needs to be valid. Unsound expressions should be caught by the Analyzer or
+ * CheckAnalysis classes.
+ */
+class ExpressionParserSuite extends PlanTest {
+  import CatalystSqlParser._
+  import org.apache.spark.sql.catalyst.dsl.expressions._
+  import org.apache.spark.sql.catalyst.dsl.plans._
+
+  def assertEqual(sqlCommand: String, e: Expression): Unit = {
+compareExpressions(parseExpression(sqlCommand), e)
+  }
+
+  def intercept(sqlCommand: String, messages: String*): Unit = {
+val e = intercept[ParseException](parseExpression(sqlCommand))
+messages.foreach { message =>
+  assert(e.message.contains(message))
+}
+  }
+
+  test("star expressions") {
+// Global Star
+assertEqual("*", UnresolvedStar(None))
+
+// Targeted Star
+assertEqual("a.b.*", UnresolvedStar(Option(Seq("a", "b"))))
+  }
+
+  // NamedExpression (Alias/Multialias)
+  test("named expressions") {
+// No Alias
+val r0 = 'a
+assertEqual("a", r0)
+
+// Single Alias.
+val r1 = 'a as "b"
+assertEqual("a as b", r1)
+assertEqual("a b", r1)
+
+// Multi-Alias
+assertEqual("a as (b, c)", MultiAlias('a, Seq("b", "c")))
+assertEqual("a() (b, c)", MultiAlias('a.function(), Seq("b", "c")))
+
+// Numeric literals without a space between the literal qualifier and the alias, should not be
+// interpreted as such. An unresolved reference should be returned instead.
+// TODO add the JIRA-ticket number.
+assertEqual("1SL", Symbol("1SL"))
+
+// Aliased star is allowed.
+assertEqual("a.* b", UnresolvedStar(Option(Seq("a"))) as 'b)
+  }
+
+  test("binary logical expressions") {
+// And
+assertEqual("a and b", 'a && 'b)
+
+// Or
+assertEqual("a or b", 'a || 'b)
+
+// Combination And/Or check precedence
+assertEqual("a and b or c and d", ('a && 'b) || ('c && 'd))
+assertEqual("a or b or c and d", 'a || 'b || ('c && 'd))
+
+// Multiple AND/OR get converted into a balanced tree
+assertEqual("a or b or c or d or e or f", (('a || 'b) || 'c) || (('d || 'e) || 'f))
+assertEqual("a and b and c and d and e and f", (('a && 'b) && 'c) && (('d && 'e) && 'f))
+  }
+
+  test("long binary logical expressions") {
+def testVeryBinaryExpression(op: String, clazz: Class[_]): Unit = {
+  val sql = (1 to 1000).map(x => s"$x == $x").mkString(op)
+  val e = parseExpression(sql)
+  assert(e.collect { case _: EqualTo => true }.size === 1000)
+  assert(e.collect { case x if clazz.isInstance(x) => true }.size === 999)
+}
+testVeryBinaryExpression(" AND ", classOf[And])
+testVeryBinaryExpression(" OR ", classOf[Or])
+  }
+
+  test("not expressions") {
+assertEqual("not a", !'a)
+assertEqual("!a", !'a)
+assertEqual("not true > true", Not(GreaterThan(true, true)))
+  }

[GitHub] spark pull request: [SPARK-14062][Yarn] Fix log4j and upload metri...

2016-03-23 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/11885#issuecomment-200644697
  
As I've said above, Spark does *not* use the Hadoop configuration from the
classpath in the executors. It uses the Hadoop configuration broadcast from the
driver.

So no matter what you add to the executor's classpath, it *will not* be used.

And in any case, using the configuration present on the submitting node is
more correct than using whatever configuration might or might not be available
on the cluster nodes, which was the whole point of uploading the configuration
archive to the AM in the first place.





[GitHub] spark pull request: [SPARK-13713][SQL] Migrate parser from ANTLR3 ...

2016-03-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11557#discussion_r57271566
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ng/ExpressionParserSuite.scala ---

[GitHub] spark pull request: [SPARK-13343] [CORE] speculative tasks that di...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11916#issuecomment-200643122
  
**[Test build #53998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53998/consoleFull)** for PR 11916 at commit [`7c033b6`](https://github.com/apache/spark/commit/7c033b6d6dd7eb1d9296d82a965facec95dd6757).





[GitHub] spark pull request: [SPARK-13343] [CORE] speculative tasks that di...

2016-03-23 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/11916#issuecomment-200641412
  
ok to test





[GitHub] spark pull request: [SPARK-13713][SQL] Migrate parser from ANTLR3 ...

2016-03-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11557#discussion_r57271266
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ng/ExpressionParserSuite.scala ---

[GitHub] spark pull request: [SPARK-14111][SQL] Correct output nullability ...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11926#issuecomment-200640177
  
**[Test build #53997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53997/consoleFull)** for PR 11926 at commit [`5912fd3`](https://github.com/apache/spark/commit/5912fd3794e0f62817a4c44025de09bfcec4c944).





[GitHub] spark pull request: [SPARK-14062][Yarn] Fix log4j and upload metri...

2016-03-23 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/11885#issuecomment-200640251
  
Thanks a lot for your explanation.

I'm not sure if I understand correctly: currently we add `/etc/hadoop` to the
classpath by default for the AM and executors. If we now also add
`__spark_conf__` to the executors' classpath, there will be another copy of the
Hadoop conf, and we create a `Configuration()` at executor start, which adds
some specific configurations like s3 and `spark.hadoop.xxx`.

If the two copies, one in the cluster's Hadoop home and one sent from the
client, differ, I'm not sure whether there is any side effect.

It's just my concern; we haven't met such an issue yet.





[GitHub] spark pull request: [SPARK-13713][SQL] Migrate parser from ANTLR3 ...

2016-03-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11557#discussion_r57271242
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ng/ExpressionParserSuite.scala ---

[GitHub] spark pull request: [SPARK-14111][SQL] Correct output nullability ...

2016-03-23 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/11926

[SPARK-14111][SQL] Correct output nullability with constraints for logical plans

## What changes were proposed in this pull request?
JIRA: https://issues.apache.org/jira/browse/SPARK-14111

We use the output of a logical plan as its schema, so output nullability is
important and must be kept correct.

With constraints and optimization, we can in fact change the output
nullability of logical plans, but we don't reflect such changes in the output
attributes. So the output nullability is not correct now.

## How was this patch tested?

Modified `InferFiltersFromConstraintsSuite` and `AnalysisSuite`. Existing
tests.
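The core idea can be sketched with toy types (illustrative assumptions, not Spark's classes): once a plan carries a not-null constraint on an attribute, that attribute should be reported as non-nullable in the plan's output.

```scala
// Toy model: reflect not-null constraints into output attribute nullability.
// Attr and applyConstraints are illustrative assumptions, not Spark API.
case class Attr(name: String, nullable: Boolean)

// For every attribute covered by a not-null constraint, flip its
// reported nullability to false; leave the others untouched.
def applyConstraints(output: Seq[Attr], notNull: Set[String]): Seq[Attr] =
  output.map { a =>
    if (notNull.contains(a.name)) a.copy(nullable = false) else a
  }
```

For instance, after an inferred `IsNotNull(a)` filter, `a` in the output becomes non-nullable while `b` keeps its original nullability.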

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 output-nullable-with-constraint3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11926.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11926


commit fb6c8cd182dae1aded07baf59cb185f1afde84e7
Author: Liang-Chi Hsieh 
Date:   2016-03-15T09:01:04Z

Modify output nullable with constraint for Join.

commit 2e4eca4c8336da790321569709475eaff8f193b5
Author: Liang-Chi Hsieh 
Date:   2016-03-15T09:51:11Z

Replace attributes in condition with correct ones.

commit aef73d5e71b9c997ac98a7172036c7c79f9e9b1c
Author: Liang-Chi Hsieh 
Date:   2016-03-16T04:32:41Z

Refactor.

commit 5bf4b4b544ef2aa25d93c974e94f8314a6626ef7
Author: Liang-Chi Hsieh 
Date:   2016-03-16T04:47:27Z

Refactor.

commit 93a73b79fbeee9c1bd722f5aa66257409d3c2512
Author: Liang-Chi Hsieh 
Date:   2016-03-16T04:59:28Z

Modify output nullability with constraints for Filter operator.

commit d85e8381f9b61d68288f8009ffa35b2cfd1ed771
Author: Liang-Chi Hsieh 
Date:   2016-03-16T07:12:43Z

Merge remote-tracking branch 'upstream/master' into output-nullable-with-constraint

commit c7d54a0fb78c826903c0db8f1b1ac7b0d54bb303
Author: Liang-Chi Hsieh 
Date:   2016-03-16T07:30:08Z

Fix a bug.

commit 76a8566e9506e6a41107c8cb244a76d4525b7a44
Author: Liang-Chi Hsieh 
Date:   2016-03-16T10:04:32Z

Fix test.

commit bbced69407e1d400fb66b684c9c6529b96f15d88
Author: Liang-Chi Hsieh 
Date:   2016-03-17T03:07:05Z

Merge remote-tracking branch 'upstream/master' into output-nullable-with-constraint

Conflicts:

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala

commit 6b4a98cdd44a447e6ab8c3aee908647243e62449
Author: Liang-Chi Hsieh 
Date:   2016-03-17T07:54:05Z

fix bug.

commit 665bf50880b3f87353c5037946db28f8dda7d1c2
Author: Liang-Chi Hsieh 
Date:   2016-03-17T11:41:35Z

Fix scala style.

commit ac795610e2091a1534b80e7eea01630c4cb5deb4
Author: Liang-Chi Hsieh 
Date:   2016-03-18T07:45:12Z

Fix it.

commit 7f68967eeb3f303c552dadc760788d3fe9d090f5
Author: Liang-Chi Hsieh 
Date:   2016-03-19T04:33:06Z

Modify attribute nullability for filter pushdown.

commit a7b8daef9e82e184226f101a5fd81fcc070dc25c
Author: Liang-Chi Hsieh 
Date:   2016-03-21T04:32:25Z

Reset nullability for project and filter list when preparing to scan an in-memory relation.

commit 23b328d1c01806841943ad8dd0ab3eed8963d7e2
Author: Liang-Chi Hsieh 
Date:   2016-03-21T05:02:21Z

Unnecessary change removed.

commit da3f35b4d315cc3c2576ac781cd3e8beef5eb774
Author: Liang-Chi Hsieh 
Date:   2016-03-21T08:16:21Z

Fix python test.

commit cdc5878e4ccb98f87b4496b85a2ef95e1722a1f6
Author: Liang-Chi Hsieh 
Date:   2016-03-24T03:09:45Z

Correct output nullability of logical plans.

commit 5912fd3794e0f62817a4c44025de09bfcec4c944
Author: Liang-Chi Hsieh 
Date:   2016-03-24T03:31:16Z

Merge remote-tracking branch 'upstream/master' into 
output-nullable-with-constraint3

Conflicts:

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: 

[GitHub] spark pull request: [SPARK-14062][Yarn] Fix log4j and upload metri...

2016-03-23 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/11885#issuecomment-200638308
  
There's no "several paths". Spark will broadcast the hadoop configs before 
running tasks and use that in the executors, so Spark won't use whatever is in 
the executor's classpath anyway.
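
The broadcast behavior vanzin describes can be sketched conceptually as follows. This is illustrative Python, not Spark's actual implementation (Spark internally wraps the Hadoop `Configuration` in a serializable holder on the JVM side); the function names here are hypothetical. The point it shows: the driver serializes its Hadoop configuration and ships it with tasks, so the executor uses that copy rather than whatever configuration files happen to be on its own classpath.

```python
import pickle

# Conceptual sketch only -- not Spark's real code. The driver's view of the
# Hadoop configuration is serialized and shipped to executors with the tasks.

def driver_ship_config(hadoop_conf: dict) -> bytes:
    """Driver side: serialize the config to send along with tasks."""
    return pickle.dumps(hadoop_conf)

def executor_effective_config(shipped: bytes, local_classpath_conf: dict) -> dict:
    """Executor side: the shipped (broadcast) copy wins; the config on the
    executor's local classpath is ignored."""
    return pickle.loads(shipped)

driver_conf = {"fs.defaultFS": "hdfs://namenode:8020"}
executor_local = {"fs.defaultFS": "file:///"}  # stale config on the executor's classpath

shipped = driver_ship_config(driver_conf)
print(executor_effective_config(shipped, executor_local)["fs.defaultFS"])
# -> hdfs://namenode:8020
```

This is why, as noted above, differing configuration files on the executors' classpaths do not create a precedence problem: only the driver's copy is consulted.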





[GitHub] spark pull request: [SPARK-14085] [SQL] Star Expansion for Hash

2016-03-23 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11904#issuecomment-200636903
  
thanks, merging to master!





[GitHub] spark pull request: [SPARK-14062][Yarn] Fix log4j and upload metri...

2016-03-23 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/11885#issuecomment-200637889
  
My concern is about Hadoop-related configurations: which one takes precedence 
if several paths have different configurations?





[GitHub] spark pull request: [SPARK-14085] [SQL] Star Expansion for Hash

2016-03-23 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/11904#issuecomment-200637711
  
Thank you! 





[GitHub] spark pull request: [SPARK-14062][Yarn] Fix log4j and upload metri...

2016-03-23 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/11885#issuecomment-200635731
  
I don't think there's any harm in using the archive everywhere; it's 
currently used only in the AM, mostly as an optimization, since it wasn't really 
used in the executors (aside from the oversight of log4j.properties).





[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...

2016-03-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11301#issuecomment-200635290
  
**[Test build #53996 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53996/consoleFull)**
 for PR 11301 at commit 
[`078b272`](https://github.com/apache/spark/commit/078b272767ea653d9cced56b4e928b81aaabfb8f).





[GitHub] spark pull request: [SPARK-13903][SQL] Modify output nullability w...

2016-03-23 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11722#issuecomment-200637490
  
I came up with another approach for this problem. Closing this now.




