[GitHub] spark pull request #18067: [SPARK-20849][DOC][SPARKR] Document R DecisionTre...

2017-05-24 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/18067#discussion_r118423872
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -776,6 +778,20 @@ newDF <- createDataFrame(data.frame(x = c(1.5, 3.2)))
 head(predict(isoregModel, newDF))
 ```
 
+ Decision Tree
+
+`spark.decisionTree` fits a [decision 
tree](https://en.wikipedia.org/wiki/Decision_tree_learning) classification or 
regression model on a `SparkDataFrame`.
+Users can call `summary` to get a summary of the fitted model, `predict` 
to make predictions, and `write.ml`/`read.ml` to save/load fitted models.
+
+We use the `longley` dataset to train a decision tree and make predictions:
+
+```{r}
+df <- createDataFrame(longley)
--- End diff --

option 2: do you mean using `{r, warning=FALSE}` like other examples?
I think both are OK.
Which do you prefer?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-05-24 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/16578
  
Sorry about test failures. Will fix tomorrow.





[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

2017-05-24 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18025
  
@felixcheung All comments are addressed now and I think this is ready for 
review.





[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

2017-05-24 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18025
  
- The new commit resolves the Name issue. `@title` does not help, because it 
sets the header in the second line, `\title{Aggregate functions for Column 
operations}`. The solution is to use `@name NULL` for the generics. Now we 
have:


![image](https://cloud.githubusercontent.com/assets/11082368/26437454/3780b8d4-40d2-11e7-83e9-80eec206f000.png)

- I also added several more practical examples, although most of these 
functions are very straightforward to use. 


![image](https://cloud.githubusercontent.com/assets/11082368/26437488/5be621be-40d2-11e7-8df8-0e5c99fb6ef6.png)
 






[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18025
  
**[Test build #77341 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77341/testReport)**
 for PR 18025 at commit 
[`038eac3`](https://github.com/apache/spark/commit/038eac3a60b330a29fc7099c31913175f6593e3c).





[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18104
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18104
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77339/
Test PASSed.





[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18104
  
**[Test build #77339 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77339/testReport)**
 for PR 18104 at commit 
[`a72ab8c`](https://github.com/apache/spark/commit/a72ab8c3153743ee5b5d0fe4ba797023aac6e88c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18097: [Spark-20873][SQL] Improve the error message for ...

2017-05-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18097#discussion_r118421574
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnType.scala
 ---
@@ -684,7 +684,7 @@ private[columnar] object ColumnType {
   case struct: StructType => STRUCT(struct)
   case udt: UserDefinedType[_] => apply(udt.sqlType)
   case other =>
-throw new Exception(s"Unsupported type: $other")
+throw new Exception(s"Unsupported type: ${other.typeName}")
--- End diff --

`typeName` -> `simpleString`
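
For context on the suggestion above, here is a minimal, self-contained sketch of why the two names can produce different error messages. `SketchType`, `IntSketch`, `ArraySketch`, and `UnsupportedTypeMessage` are hypothetical stand-ins written for this illustration, not Spark's classes; they only assume the convention that `typeName` gives the bare kind of a type while `simpleString` also spells out nested element types.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's DataType classes.
sealed trait SketchType {
  def typeName: String      // bare kind of the type
  def simpleString: String  // readable form, including nested types
}

case object IntSketch extends SketchType {
  val typeName = "integer"
  val simpleString = "int"
}

final case class ArraySketch(element: SketchType) extends SketchType {
  val typeName = "array"
  def simpleString: String = s"array<${element.simpleString}>"
}

object UnsupportedTypeMessage {
  // Builds the error text both ways so the difference is visible.
  def withTypeName(t: SketchType): String = s"Unsupported type: ${t.typeName}"
  def withSimpleString(t: SketchType): String = s"Unsupported type: ${t.simpleString}"
}
```

Under these assumptions, the `simpleString` variant keeps the element type in the message, which is the extra detail a reader of the error would likely want.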





[GitHub] spark pull request #18094: [Spark-20775][SQL] Added scala support from_json

2017-05-24 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18094#discussion_r118421269
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3072,6 +3072,22 @@ object functions {
* @since 2.1.0
*/
   def from_json(e: Column, schema: String, options: java.util.Map[String, 
String]): Column = {
+from_json(e, schema, options.asScala.toMap)
+  }
+
+  /**
+* (Scala-specific) Parses a column containing a JSON string into a 
`StructType` or `ArrayType` of `StructType`s
--- End diff --

ditto.
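
The diff above has the Java-facing overload convert its `java.util.Map` and delegate to a Scala-facing one. A minimal sketch of that pattern, outside Spark, might look like the following. `OptionsOverloads` and `describe` are hypothetical names used only for this illustration; the sketch uses `scala.collection.JavaConverters`, which is what `asScala` relied on in Scala 2.11/2.12 (newer Scala versions offer `scala.jdk.CollectionConverters` instead).

```scala
import scala.collection.JavaConverters._

// Hypothetical example object; not part of Spark's functions API.
object OptionsOverloads {
  // Scala-facing variant: does the actual work on an immutable Scala Map.
  def describe(options: Map[String, String]): String =
    options.toSeq.sorted.map { case (k, v) => s"$k=$v" }.mkString(",")

  // Java-facing variant: converts once, then delegates to the Scala variant.
  def describe(options: java.util.Map[String, String]): String =
    describe(options.asScala.toMap)
}
```

Keeping the logic in one overload and converting in the other means the two entry points cannot drift apart, which is presumably why each doc comment is tagged as Java-specific or Scala-specific.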





[GitHub] spark pull request #18094: [Spark-20775][SQL] Added scala support from_json

2017-05-24 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18094#discussion_r118421254
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3060,7 +3060,7 @@ object functions {
 from_json(e, schema, Map.empty[String, String])
 
   /**
-   * Parses a column containing a JSON string into a `StructType` or 
`ArrayType` of `StructType`s
+   * (Java-specific) Parses a column containing a JSON string into a 
`StructType` or `ArrayType` of `StructType`s
--- End diff --

nit: ScalaStyle check will fail saying `File line length exceeds 100 
characters`.
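
To catch this kind of failure before pushing, a rough local check can be sketched as below. This is not how ScalaStyle itself is implemented; `LineLengthCheck` and `overlongLines` are hypothetical names, and the only fact taken from the comment above is the 100-character limit.

```scala
// Hypothetical helper for illustration; the real check is done by ScalaStyle.
object LineLengthCheck {
  // 100 is the limit quoted in the review comment above.
  val MaxLineLength = 100

  // Returns the 1-based numbers of lines that are longer than the limit.
  def overlongLines(source: String, limit: Int = MaxLineLength): Seq[Int] =
    source.split("\n", -1).toSeq.zipWithIndex.collect {
      case (line, idx) if line.length > limit => idx + 1
    }
}
```

A doc comment that trips the check can usually be fixed by wrapping the sentence onto a second comment line.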





[GitHub] spark pull request #18097: [Spark-20873][SQL] Improve the error message for ...

2017-05-24 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18097#discussion_r118421460
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/ColumnTypeSuite.scala
 ---
@@ -144,4 +144,18 @@ class ColumnTypeSuite extends SparkFunSuite with 
Logging {
   ColumnType(DecimalType(19, 0))
 }
   }
+
+  test("show type name in type mismatch error") {
+val invalidType = new DataType {
+override def defaultSize: Int = 1
+override private[spark] def asNullable: DataType = null
--- End diff --

`null` -> `this`





[GitHub] spark issue #18097: [Spark-20873][SQL] Improve the error message for unsuppo...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18097
  
**[Test build #77340 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77340/testReport)**
 for PR 18097 at commit 
[`f53de3e`](https://github.com/apache/spark/commit/f53de3ea2606ecf5073d2577d0f82feb0671b8a0).





[GitHub] spark issue #18097: [Spark-20873][SQL] Improve the error message for unsuppo...

2017-05-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18097
  
ok to test





[GitHub] spark issue #13067: [SPARK-4131] [SQL] Support INSERT OVERWRITE [LOCAL] DIRE...

2017-05-24 Thread santhavathi
Github user santhavathi commented on the issue:

https://github.com/apache/spark/pull/13067
  
Is this feature available yet?








[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18104
  
**[Test build #77339 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77339/testReport)**
 for PR 18104 at commit 
[`a72ab8c`](https://github.com/apache/spark/commit/a72ab8c3153743ee5b5d0fe4ba797023aac6e88c).





[GitHub] spark issue #18091: [SPARK-20868][CORE] UnsafeShuffleWriter should verify th...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18091
  
**[Test build #77338 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77338/testReport)**
 for PR 18091 at commit 
[`c79de07`](https://github.com/apache/spark/commit/c79de072fd4c0e32f5a62d15f8d921095d4e3bf0).





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18064
  
**[Test build #77337 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77337/testReport)**
 for PR 18064 at commit 
[`eec0946`](https://github.com/apache/spark/commit/eec0946842657539d69deab43641d32d247f67ec).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18064
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77337/
Test FAILed.





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18064
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18091: [SPARK-20868][CORE] UnsafeShuffleWriter should verify th...

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18091
  
retest this please





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18064
  
**[Test build #77337 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77337/testReport)**
 for PR 18064 at commit 
[`eec0946`](https://github.com/apache/spark/commit/eec0946842657539d69deab43641d32d247f67ec).





[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18104
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77336/
Test FAILed.





[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18104
  
**[Test build #77336 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77336/testReport)**
 for PR 18104 at commit 
[`313dcbc`](https://github.com/apache/spark/commit/313dcbc99c408c81d6bd5e5395bb373e1d0f418a).
 * This patch **fails R style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18104
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18104
  
**[Test build #77336 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77336/testReport)**
 for PR 18104 at commit 
[`313dcbc`](https://github.com/apache/spark/commit/313dcbc99c408c81d6bd5e5395bb373e1d0f418a).





[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18058
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18058
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77334/
Test PASSed.





[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18058
  
**[Test build #77334 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77334/testReport)**
 for PR 18058 at commit 
[`44267cb`](https://github.com/apache/spark/commit/44267cb56dafd59fb9a43cd72b18d5c1c2cf0c6b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16985: [SPARK-19122][SQL] Unnecessary shuffle+sort added if joi...

2017-05-24 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/16985
  
@cloud-fan : ping





[GitHub] spark issue #17181: [SPARK-19824][Core] Standalone master JSON not showing c...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17181
  
**[Test build #77335 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77335/testReport)**
 for PR 17181 at commit 
[`f8b5eaf`](https://github.com/apache/spark/commit/f8b5eaf37547a77e03a63de7a6b44e3886b38aec).





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18064
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18064
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77328/
Test FAILed.





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18064
  
**[Test build #77328 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77328/testReport)**
 for PR 18064 at commit 
[`57f9dde`](https://github.com/apache/spark/commit/57f9dde7d4469bbd7f1e04a04fac2041a2d743e6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17181: [SPARK-19824][Core] Standalone master JSON not showing c...

2017-05-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17181
  
Could you please check whether any other values are inconsistent between 
the UI and the JSON API?





[GitHub] spark issue #12252: [SPARK-14460] [SQL] properly handling of column name con...

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/12252
  
seems it's fixed in https://github.com/apache/spark/pull/15662 ?





[GitHub] spark issue #17181: [SPARK-19824][Core] Standalone master JSON not showing c...

2017-05-24 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17181
  
ok to test





[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

2017-05-24 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18025
  
re: title, would explicitly adding `@title` help?
re: multiple classes - agreed, a link or `@seealso` should be good. Wouldn't 
`?coalesce` show the overloads, though?





[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18104
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18104
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77330/
Test PASSed.





[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18104
  
**[Test build #77330 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77330/testReport)**
 for PR 18104 at commit 
[`dab72a6`](https://github.com/apache/spark/commit/dab72a60441e337e9143c4144795a723d5cc0867).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-24 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/18058
  
Hi, I'm not familiar with pyspark. I'm wondering whether a unit test is 
needed for verification. If so, how should I check it? Thanks.
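
One possible shape for such a check, sketched in plain Python so that it runs without a Spark installation. `HasNumPartitionsSketch` is a hypothetical stand-in for the mixin in this PR, not the real `pyspark.ml` `Params` API, and the "at least 1" range check is an assumption taken from the param description rather than from the Python code in the diff:

```python
import unittest


class HasNumPartitionsSketch(object):
    """Hypothetical stand-in for the PySpark mixin; not the real Params API."""

    def __init__(self):
        # Explicitly set params only; numPartitions starts out unset.
        self._param_map = {}

    def setNumPartitions(self, value):
        # Coerce to int, roughly as an int type converter would.
        value = int(value)
        # Assumption of this sketch: reject values below 1.
        if value < 1:
            raise ValueError("numPartitions must be at least 1")
        self._param_map["numPartitions"] = value
        return self

    def isSet(self, name):
        return name in self._param_map

    def getNumPartitions(self):
        return self._param_map["numPartitions"]


class NumPartitionsTest(unittest.TestCase):
    def test_unset_by_default(self):
        self.assertFalse(HasNumPartitionsSketch().isSet("numPartitions"))

    def test_set_then_get(self):
        model = HasNumPartitionsSketch().setNumPartitions(4)
        self.assertEqual(model.getNumPartitions(), 4)

    def test_rejects_zero(self):
        with self.assertRaises(ValueError):
            HasNumPartitionsSketch().setNumPartitions(0)
```

A real test would presumably construct the actual estimator and assert the same three things: the param is unset by default, a set value round-trips through the getter, and invalid values are rejected.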





[GitHub] spark issue #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (expert)...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18058
  
**[Test build #77334 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77334/testReport)**
 for PR 18058 at commit 
[`44267cb`](https://github.com/apache/spark/commit/44267cb56dafd59fb9a43cd72b18d5c1c2cf0c6b).





[GitHub] spark pull request #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (...

2017-05-24 Thread facaiy
Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18058#discussion_r118416434
  
--- Diff: python/pyspark/ml/fpm.py ---
@@ -49,6 +49,32 @@ def getMinSupport(self):
         return self.getOrDefault(self.minSupport)
 
 
+class HasNumPartitions(Params):
+    """
+    Mixin for param support.
+    """
+
+    numPartitions = Param(
+        Params._dummy(),
+        "numPartitions",
+        """Number of partitions (at least 1) used by parallel FP-growth.
+        By default the param is not set,
+        and partition number of the input dataset is used.""",
+        typeConverter=TypeConverters.toInt)
+
+    def setNumPartitions(self, value):
+        """
+        Sets the value of :py:attr:`numPartitions`.
+        """
+        return self._set(numPartitions=value)
+
+    def getNumPartitions(self):
+        """
+        Gets the value of numPartitions or its default value.
--- End diff --

added.





[GitHub] spark pull request #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (...

2017-05-24 Thread facaiy
Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18058#discussion_r118416400
  
--- Diff: python/pyspark/ml/fpm.py ---
@@ -49,6 +49,32 @@ def getMinSupport(self):
         return self.getOrDefault(self.minSupport)
 
 
+class HasNumPartitions(Params):
+    """
+    Mixin for param support.
+    """
+
+    numPartitions = Param(
+        Params._dummy(),
+        "numPartitions",
+        """Number of partitions (at least 1) used by parallel FP-growth.
--- End diff --

replaced.





[GitHub] spark pull request #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (...

2017-05-24 Thread facaiy
Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18058#discussion_r118416391
  
--- Diff: python/pyspark/ml/fpm.py ---
@@ -49,6 +49,32 @@ def getMinSupport(self):
         return self.getOrDefault(self.minSupport)
 
 
+class HasNumPartitions(Params):
+    """
+    Mixin for param support.
--- End diff --

modified.





[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-24 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17967
  
yes I'd hold this for a day.





[GitHub] spark issue #18057: [SPARK-20786][SQL][Backport-2.2]Improve ceil and floor h...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18057
  
**[Test build #77333 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77333/testReport)**
 for PR 18057 at commit 
[`4c68688`](https://github.com/apache/spark/commit/4c68688d3c970a0ca95c5afb6f1a60fb02b14421).





[GitHub] spark issue #18090: [SPARK-20250][Core]Improper OOM error when a task been k...

2017-05-24 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18090
  
LGTM





[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17770
  
**[Test build #77332 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77332/testReport)**
 for PR 17770 at commit 
[`8314cc3`](https://github.com/apache/spark/commit/8314cc310d9cf5d807a7e9b9de3c962dc37bf3e8).





[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17770
  
**[Test build #77331 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77331/testReport)**
 for PR 17770 at commit 
[`b82b018`](https://github.com/apache/spark/commit/b82b0181c16b64968feaf560eb1422193746efde).





[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

2017-05-24 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18025
  
@felixcheung 
- The links to `stddev_samp` etc. have already been removed in the latest commit. 
- About collecting all the examples into one, I think that will work for this 
particular case, but I'm not sure about it in general. These methods are still 
spread out in the `.R` file, and if we decide to change the grouping of these 
functions later on, it will be very difficult if we don't have examples in 
those methods. 
- For a method that is defined for multiple classes but whose meanings are 
drastically different, I agree that it's best to document by class. One 
downside is that a generic `?coalesce` can only go to one help page, e.g., the help 
for SparkDataFrame, not the other classes. However, we can add links to the 
`coalesce` methods for the other classes in the `SeeAlso` section. 





[GitHub] spark issue #18091: [SPARK-20868][CORE] UnsafeShuffleWriter should verify th...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18091
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77323/
Test FAILed.





[GitHub] spark issue #18091: [SPARK-20868][CORE] UnsafeShuffleWriter should verify th...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18091
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18057: [SPARK-20786][SQL][Backport-2.2]Improve ceil and floor h...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18057
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77327/
Test FAILed.





[GitHub] spark issue #18091: [SPARK-20868][CORE] UnsafeShuffleWriter should verify th...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18091
  
**[Test build #77323 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77323/testReport)**
 for PR 18091 at commit 
[`c79de07`](https://github.com/apache/spark/commit/c79de072fd4c0e32f5a62d15f8d921095d4e3bf0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18057: [SPARK-20786][SQL][Backport-2.2]Improve ceil and floor h...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18057
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18057: [SPARK-20786][SQL][Backport-2.2]Improve ceil and floor h...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18057
  
**[Test build #77327 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77327/testReport)**
 for PR 18057 at commit 
[`eaf236a`](https://github.com/apache/spark/commit/eaf236af538d4f3454d598dc5ba5a254e12647d6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17972: [SPARK-20723][ML]Add intermediate storage level to tree ...

2017-05-24 Thread phatak-dev
Github user phatak-dev commented on the issue:

https://github.com/apache/spark/pull/17972
  
@MLnick can you start a jenkins build?





[GitHub] spark issue #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test runs

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18104
  
**[Test build #77330 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77330/testReport)**
 for PR 18104 at commit 
[`dab72a6`](https://github.com/apache/spark/commit/dab72a60441e337e9143c4144795a723d5cc0867).





[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-24 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17967
  
@felixcheung @yanboliang I'm fine with either the ASCII table or the HTML 
table. It's your call. 
I hope we can get past this minor doc issue and get this PR in soon. I can update 
the doc later if we find a better way. Thanks much. 





[GitHub] spark pull request #18104: [SPARK-20877][SPARKR][WIP] add timestamps to test...

2017-05-24 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/18104

[SPARK-20877][SPARKR][WIP] add timestamps to test runs

## What changes were proposed in this pull request?

to investigate how long they run

## How was this patch tested?

Jenkins, AppVeyor

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rtimetest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18104.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18104


commit dab72a60441e337e9143c4144795a723d5cc0867
Author: Felix Cheung 
Date:   2017-05-25T03:53:24Z

timestamp tests







[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16578
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18079: [SPARK-20841][SQL] Support column aliases for catalog ta...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18079
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77322/
Test PASSed.





[GitHub] spark issue #18079: [SPARK-20841][SQL] Support column aliases for catalog ta...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18079
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16578
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77326/
Test FAILed.





[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16578
  
**[Test build #77326 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77326/testReport)**
 for PR 16578 at commit 
[`9f2f340`](https://github.com/apache/spark/commit/9f2f3409172ba09d15494f9faf861bb6ad683911).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18079: [SPARK-20841][SQL] Support column aliases for catalog ta...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18079
  
**[Test build #77322 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77322/testReport)**
 for PR 18079 at commit 
[`b0e5805`](https://github.com/apache/spark/commit/b0e5805951471bb6bb8da98af75e99ac3057bc63).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77321/
Test PASSed.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #77321 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77321/testReport)**
 for PR 16989 at commit 
[`b07a3b6`](https://github.com/apache/spark/commit/b07a3b61ba483989b2c205e88cf9fdc73a4205df).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public final class FileSegmentManagedBuffer extends ManagedBuffer `





[GitHub] spark pull request #18083: [SPARK-20863] Add metrics/instrumentation to Live...

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18083#discussion_r118413353
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
@@ -124,11 +136,13 @@ private[spark] class LiveListenerBus(val sparkContext: SparkContext) extends Spa
   logError(s"$name has already stopped! Dropping event $event")
   return
 }
+metrics.numEventsReceived.inc()
 val eventAdded = eventQueue.offer(event)
 if (eventAdded) {
   eventLock.release()
 } else {
   onDropEvent(event)
+  metrics.numDroppedEvents.inc()
--- End diff --

Would it be better to move this into `onDropEvent`?
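
A minimal sketch of the suggestion, written in plain Java with JDK classes only. `BusSketch` and its fields are hypothetical stand-ins rather than Spark's `LiveListenerBus`, and `AtomicLong` stands in for the metrics counters. It assumes that `post()` counts every call, including events that are later dropped, and that the drop counter lives inside `onDropEvent`:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the listener bus; not Spark's actual class.
class BusSketch {
    final AtomicLong numEventsReceived = new AtomicLong();
    final AtomicLong numDroppedEvents = new AtomicLong();
    private final LinkedBlockingQueue<String> eventQueue;

    BusSketch(int capacity) {
        eventQueue = new LinkedBlockingQueue<>(capacity);
    }

    void post(String event) {
        // Counts every call to post(), including events that end up dropped.
        numEventsReceived.incrementAndGet();
        // offer() returns false instead of blocking when the queue is full.
        if (!eventQueue.offer(event)) {
            onDropEvent(event);
        }
    }

    void onDropEvent(String event) {
        // Incrementing here keeps the drop counter next to the drop handling,
        // so any other caller of onDropEvent is counted as well.
        numDroppedEvents.incrementAndGet();
    }
}
```

With a capacity of 1 and three posts, this sketch would record three received events and two dropped ones, so "received" is the larger number whenever drops occur.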





[GitHub] spark pull request #18083: [SPARK-20863] Add metrics/instrumentation to Live...

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18083#discussion_r118413314
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
@@ -124,11 +136,13 @@ private[spark] class LiveListenerBus(val sparkContext: SparkContext) extends Spa
   logError(s"$name has already stopped! Dropping event $event")
   return
 }
+metrics.numEventsReceived.inc()
--- End diff --

Do we also count dropped events here?





[GitHub] spark pull request #18083: [SPARK-20863] Add metrics/instrumentation to Live...

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18083#discussion_r118413299
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
@@ -226,3 +240,34 @@ private[spark] object LiveListenerBus {
   val name = "SparkListenerBus"
 }
 
+private[spark] class LiveListenerBusMetrics(queue: LinkedBlockingQueue[_]) extends Source {
+  override val sourceName: String = "LiveListenerBus"
+  override val metricRegistry: MetricRegistry = new MetricRegistry
+
+  /**
+   * The total number of events posted to the LiveListenerBus. This counts the number of times
+   * that `post()` is called, which might be less than the total number of events processed in
+   * case events are dropped.
--- End diff --

According to the code, we also count dropped events, don't we?





[GitHub] spark pull request #18083: [SPARK-20863] Add metrics/instrumentation to Live...

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18083#discussion_r118413115
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
@@ -226,3 +240,34 @@ private[spark] object LiveListenerBus {
   val name = "SparkListenerBus"
 }
 
+private[spark] class LiveListenerBusMetrics(queue: LinkedBlockingQueue[_]) extends Source {
+  override val sourceName: String = "LiveListenerBus"
+  override val metricRegistry: MetricRegistry = new MetricRegistry
+
+  /**
+   * The total number of events posted to the LiveListenerBus. This counts the number of times
+   * that `post()` is called, which might be less than the total number of events processed in
+   * case events are dropped.
+   */
+  val numEventsReceived: Counter = metricRegistry.counter(MetricRegistry.name("numEventsReceived"))
+
+  /**
+   * The total number of events that were dropped without being delivered to listeners.
+   */
+  val numDroppedEvents: Counter = metricRegistry.counter(MetricRegistry.name("numEventsDropped"))
+
+  /**
+   * The amount of time taken to post a single event to all listeners.
+   */
+  val eventProcessingTime: Timer = metricRegistry.timer(MetricRegistry.name("eventProcessingTime"))
+
+  /**
+   * The number of of messages waiting in the queue.
+   */
+  val queueSize: Gauge[Int] = {
--- End diff --

do we need this metric? Users can easily get it by looking at the 
`spark.scheduler.listenerbus.eventqueue.size` config.





[GitHub] spark pull request #18083: [SPARK-20863] Add metrics/instrumentation to Live...

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18083#discussion_r118413024
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
@@ -226,3 +240,34 @@ private[spark] object LiveListenerBus {
   val name = "SparkListenerBus"
 }
 
+private[spark] class LiveListenerBusMetrics(queue: LinkedBlockingQueue[_]) extends Source {
+  override val sourceName: String = "LiveListenerBus"
+  override val metricRegistry: MetricRegistry = new MetricRegistry
+
+  /**
+   * The total number of events posted to the LiveListenerBus. This counts the number of times
+   * that `post()` is called, which might be less than the total number of events processed in
+   * case events are dropped.
+   */
+  val numEventsReceived: Counter = metricRegistry.counter(MetricRegistry.name("numEventsReceived"))
+
+  /**
+   * The total number of events that were dropped without being delivered to listeners.
+   */
+  val numDroppedEvents: Counter = metricRegistry.counter(MetricRegistry.name("numEventsDropped"))
+
+  /**
+   * The amount of time taken to post a single event to all listeners.
+   */
+  val eventProcessingTime: Timer = metricRegistry.timer(MetricRegistry.name("eventProcessingTime"))
+
+  /**
+   * The number of of messages waiting in the queue.
--- End diff --

nit: double `of` here





[GitHub] spark pull request #18083: [SPARK-20863] Add metrics/instrumentation to Live...

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18083#discussion_r118412901
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
@@ -33,25 +37,24 @@ import org.apache.spark.util.Utils
  * has started will events be actually propagated to all attached listeners. This listener bus
  * is stopped when `stop()` is called, and it will drop further events after stopping.
  */
-private[spark] class LiveListenerBus(val sparkContext: SparkContext) extends SparkListenerBus {
+private[spark] class LiveListenerBus(conf: SparkConf) extends SparkListenerBus {
 
   self =>
 
   import LiveListenerBus._
 
+  private var sparkContext: SparkContext = _
+
   // Cap the capacity of the event queue so we get an explicit error (rather than
   // an OOM exception) if it's perpetually being added to more quickly than it's being drained.
-  private lazy val EVENT_QUEUE_CAPACITY = validateAndGetQueueSize()
-  private lazy val eventQueue = new LinkedBlockingQueue[SparkListenerEvent](EVENT_QUEUE_CAPACITY)
-
-  private def validateAndGetQueueSize(): Int = {
-val queueSize = sparkContext.conf.get(LISTENER_BUS_EVENT_QUEUE_SIZE)
-if (queueSize <= 0) {
-  throw new SparkException("spark.scheduler.listenerbus.eventqueue.size must be > 0!")
-}
-queueSize
+  private val eventQueue = {
+val capacity = conf.get(LISTENER_BUS_EVENT_QUEUE_SIZE)
+require(capacity > 0, s"${LISTENER_BUS_EVENT_QUEUE_SIZE.key} must be > 0!")
--- End diff --

This constraint can be moved into the definition of `LISTENER_BUS_EVENT_QUEUE_SIZE` using `TypedConfigBuilder.checkValue`.
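
To illustrate the idea of attaching the check to the config entry itself, here is a minimal sketch in plain Java. `IntConfSketch` and its methods are hypothetical and are not Spark's config builder API. Only the shape matters: the validator and its error message are declared once, next to the key, instead of at each call site.

```java
import java.util.function.IntPredicate;

// Hypothetical miniature of a typed config entry (not Spark's real builder).
final class IntConfSketch {
    private final String key;
    private final IntPredicate validator;
    private final String errorMsg;

    IntConfSketch(String key, IntPredicate validator, String errorMsg) {
        this.key = key;
        this.validator = validator;
        this.errorMsg = errorMsg;
    }

    // Every reader of the entry gets the same validation.
    int read(int rawValue) {
        if (!validator.test(rawValue)) {
            throw new IllegalArgumentException(key + ": " + errorMsg);
        }
        return rawValue;
    }

    // The constraint is declared once, where the entry is defined.
    static final IntConfSketch EVENT_QUEUE_SIZE = new IntConfSketch(
        "spark.scheduler.listenerbus.eventqueue.size", v -> v > 0, "must be > 0");
}
```

Under this assumption, the `require(capacity > 0, ...)` call in the bus constructor would become unnecessary, since reading an invalid value fails before the queue is built.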





[GitHub] spark pull request #18058: [SPARK-20768][PYSPARK][ML] Expose numPartitions (...

2017-05-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18058#discussion_r118412802
  
--- Diff: python/pyspark/ml/fpm.py ---
@@ -49,6 +49,32 @@ def getMinSupport(self):
 return self.getOrDefault(self.minSupport)
 
 
+class HasNumPartitions(Params):
+"""
+Mixin for param support.
+"""
+
+numPartitions = Param(
+Params._dummy(),
+"numPartitions",
+"""Number of partitions (at least 1) used by parallel FP-growth.
--- End diff --

Does this need to be scrubbed? I think we have `"""` everywhere.





[GitHub] spark issue #18101: [SPARK-20874][Examples]Add Structured Streaming Kafka So...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18101
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77320/
Test PASSed.





[GitHub] spark issue #18101: [SPARK-20874][Examples]Add Structured Streaming Kafka So...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18101
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17967: [SPARK-14659][ML] RFormula consistent with R when handli...

2017-05-24 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17967
  
Given that, I think I'm OK with an ASCII table as a one-time thing.
Thoughts?






[GitHub] spark issue #18101: [SPARK-20874][Examples]Add Structured Streaming Kafka So...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18101
  
**[Test build #77320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77320/testReport)** for PR 18101 at commit [`e0c758d`](https://github.com/apache/spark/commit/e0c758d05452076ab96177e81e88e0974ef85846).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

2017-05-24 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18025
  
Also, since we have an Rd now, what do you think about collecting all the examples into one block? That should eliminate the `Not run` on every other line.

I think this would also be a great opportunity to do more than a simple `head(select(...))`: something expanded and more practical. What do you think?

Also see https://github.com/apache/spark/pull/18025#issuecomment-303838880

I like this approach. These are my comments from your screenshot; I'll review more closely after more changes, thanks!





[GitHub] spark issue #18090: [SPARK-20250][Core]Improper OOM error when a task been k...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18090
  
**[Test build #77329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77329/testReport)** for PR 18090 at commit [`1a45ff5`](https://github.com/apache/spark/commit/1a45ff5ced6cffd3d8ed41574df3bdd8e463bc21).





[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

2017-05-24 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18025
  
I guess we don't need a link to stddev_samp since it's on the same page.
Shouldn't std_dev and var_samp also be on this page?






[GitHub] spark issue #18090: [SPARK-20250][Core]Improper OOM error when a task been k...

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18090
  
LGTM





[GitHub] spark pull request #18090: [SPARK-20250][Core]Improper OOM error when a task...

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18090#discussion_r118412069
  
--- Diff: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java ---
@@ -184,6 +185,10 @@ public long acquireExecutionMemory(long required, MemoryConsumer consumer) {
 break;
   }
 }
+  } catch (ClosedByInterruptException e) {
--- End diff --

Surprisingly, this is also an `IOException`... good catch!
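
For context, `java.nio.channels.ClosedByInterruptException` sits under `IOException` in the JDK hierarchy (via `AsynchronousCloseException` and `ClosedChannelException`), so a broad `catch (IOException e)` would swallow it unless the more specific clause comes first. A small sketch, using a hypothetical helper class that is not part of Spark:

```java
import java.io.IOException;
import java.nio.channels.ClosedByInterruptException;

// Hypothetical helper: reports which catch clause handles a given exception.
class CatchOrderSketch {
    static String classify(IOException thrown) {
        try {
            throw thrown;
        } catch (ClosedByInterruptException e) {
            // The specific clause must precede the IOException clause;
            // the reverse order is rejected by the compiler as unreachable.
            return "interrupted";
        } catch (IOException e) {
            return "io-error";
        }
    }
}
```

Here `classify(new ClosedByInterruptException())` takes the first branch, while a plain `IOException` falls through to the second.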





[GitHub] spark issue #18025: [WIP][SparkR] Grouped documentation for sql functions

2017-05-24 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18025
  
I think we need to give it a title explicitly - see the header/first line 
of 
https://cloud.githubusercontent.com/assets/11082368/26429381/64dd117e-409b-11e7-9661-659b5fbe8206.png





[GitHub] spark issue #18090: [SPARK-20250][Core]Improper OOM error when a task been k...

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18090
  
ok to test





[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18064
  
**[Test build #77328 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77328/testReport)** for PR 18064 at commit [`57f9dde`](https://github.com/apache/spark/commit/57f9dde7d4469bbd7f1e04a04fac2041a2d743e6).





[GitHub] spark pull request #18067: [SPARK-20849][DOC][SPARKR] Document R DecisionTre...

2017-05-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18067#discussion_r118411586
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -430,7 +430,7 @@ We use `svm` in package `e1071` as an example. We use all default settings excep
 costs <- exp(seq(from = log(1), to = log(1000), length.out = 5))
 train <- function(cost) {
   stopifnot(requireNamespace("e1071", quietly = TRUE))
-  model <- e1071::svm(Species ~ ., data = iris, cost = cost)
+  model <- e1071::svm(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris, cost = cost)
--- End diff --

this isn't reverted?





[GitHub] spark pull request #18067: [SPARK-20849][DOC][SPARKR] Document R DecisionTre...

2017-05-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18067#discussion_r118411791
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -776,6 +778,20 @@ newDF <- createDataFrame(data.frame(x = c(1.5, 3.2)))
 head(predict(isoregModel, newDF))
 ```
 
+ Decision Tree
+
+`spark.decisionTree` fits a [decision tree](https://en.wikipedia.org/wiki/Decision_tree_learning) classification or regression model on a `SparkDataFrame`.
+Users can call `summary` to get a summary of the fitted model, `predict` to make predictions, and `write.ml`/`read.ml` to save/load fitted models.
+
+We use the `longley` dataset to train a decision tree and make predictions:
+
+```{r}
+df <- createDataFrame(longley)
--- End diff --

As commented before, please check: I'm pretty sure `createDataFrame(longley)` will cause a warning.
```
longley
     GNP.deflator     GNP Unemployed Armed.Forces Population Year Employed
1947         83.0 234.289      235.6        159.0    107.608 1947   60.323
1948         88.5 259.426      232.5        145.6    108.632 1948   61.122
```
so our options are:
- don't use longley (my earlier suggestion)
- use longley but keep `warning=FALSE`





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77319/
Test PASSed.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16989
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-05-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #77319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77319/testReport)** for PR 16989 at commit [`188862e`](https://github.com/apache/spark/commit/188862e1a8f80c5147f504ff931ce427ba7c9084).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public final class FileSegmentManagedBuffer extends ManagedBuffer `





[GitHub] spark pull request #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations...

2017-05-24 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18064#discussion_r118410964
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriter.scala ---
@@ -86,12 +86,10 @@ private[kafka010] object KafkaWriter extends Logging {
   topic: Option[String] = None): Unit = {
 val schema = queryExecution.analyzed.output
 validateQuery(queryExecution, kafkaParameters, topic)
-SQLExecution.withNewExecutionId(sparkSession, queryExecution) {
--- End diff --

If you mean `KafkaSourceProvider`, is it the same code path as `KafkaSink`? In `KafkaSink.addBatch`, `KafkaWriter.write` is also called to write into Kafka.





[GitHub] spark issue #17113: [SPARK-13669][Core] Improve the blacklist mechanism to h...

2017-05-24 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/17113
  
Thanks @tgravescs , I will update the code soon.





[GitHub] spark issue #18065: [SPARK-20844] Remove experimental from Structured Stream...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18065
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18065: [SPARK-20844] Remove experimental from Structured Stream...

2017-05-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18065
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77317/
Test PASSed.




