[GitHub] spark issue #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `me...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20713
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87875/
Test PASSed.


---




[GitHub] spark issue #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `me...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20713
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `me...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20713
  
**[Test build #87875 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87875/testReport)**
 for PR 20713 at commit 
[`fd538ca`](https://github.com/apache/spark/commit/fd538ca84936549af623d3678d43ae935a6549e3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20715: [SPARK-23434][SQL][BRANCH-2.2] Spark should not warn `me...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20715
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20700: [SPARK-23546][SQL] Refactor stateless methods/val...

2018-03-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20700#discussion_r171773544
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -537,13 +537,38 @@ class CodegenContext {
 extraClasses.append(code)
   }
 
-  final val JAVA_BOOLEAN = "boolean"
-  final val JAVA_BYTE = "byte"
-  final val JAVA_SHORT = "short"
-  final val JAVA_INT = "int"
-  final val JAVA_LONG = "long"
-  final val JAVA_FLOAT = "float"
-  final val JAVA_DOUBLE = "double"
+  /**
+   * Returns true if the Java type has a special accessor and setter in 
[[InternalRow]].
--- End diff --

This comment is copied, but it doesn't seem entirely correct: `InternalRow` also has a special accessor for `Decimal`, which is not a primitive type here.
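
A hedged illustration of the distinction (assuming the usual `InternalRow` accessors; not part of this PR): `InternalRow` has type-specialized getters for the Java primitive types and also for `Decimal`, so "has a special accessor" and "is a primitive type" are different predicates.

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.Decimal

object AccessorSketch {
  // getInt is a primitive accessor; getDecimal is a "special" accessor even
  // though Decimal is not a Java primitive type (precision/scale are arbitrary here).
  def readFirstTwoColumns(row: InternalRow): (Int, Decimal) =
    (row.getInt(0), row.getDecimal(1, 38, 18))
}
```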


---




[GitHub] spark pull request #20700: [SPARK-23546][SQL] Refactor stateless methods/val...

2018-03-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20700#discussion_r171774581
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -734,23 +734,26 @@ case class Cast(child: Expression, dataType: 
DataType, timeZoneId: Option[String
 val keyToStringFunc = dataToStringFunc("keyToString", kt)
 val valueToStringFunc = dataToStringFunc("valueToString", vt)
 val loopIndex = ctx.freshName("loopIndex")
+val getValueMapKeyArray0 = CodeGenerator.getValue(s"$map.keyArray()", 
kt, "0")
+val getValueMapValArray0 = 
CodeGenerator.getValue(s"$map.valueArray()", vt, "0")
+val getValueMapKeyArray = CodeGenerator.getValue(s"$map.keyArray()", 
kt, loopIndex)
+val getValueMapValArray = CodeGenerator.getValue(s"$map.valueArray()", 
vt, loopIndex)
--- End diff --

`getMapKeyArray` and `getMapValueArray`


---




[GitHub] spark issue #20715: [SPARK-23434][SQL][BRANCH-2.2] Spark should not warn `me...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20715
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87874/
Test PASSed.


---




[GitHub] spark pull request #20700: [SPARK-23546][SQL] Refactor stateless methods/val...

2018-03-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20700#discussion_r171774915
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -833,8 +669,8 @@ class CodegenContext {
   } else if ($isNullB) {
 return 1;
   } else {
-${javaType(elementType)} $elementA = ${getValue("a", 
elementType, "i")};
-${javaType(elementType)} $elementB = ${getValue("b", 
elementType, "i")};
+$jt $elementA = ${CodeGenerator.getValue("a", elementType, 
"i")};
+$jt $elementB = ${CodeGenerator.getValue("b", elementType, 
"i")};
--- End diff --

Inside `CodegenContext`, I think we can just do `import CodeGenerator._`.
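
A toy sketch of the suggestion (hypothetical names, not the actual Spark sources): importing the members of the stateless helper object inside the class keeps the existing call sites unqualified.

```scala
// Hypothetical helper object standing in for the refactored CodeGenerator.
object StatelessHelpers {
  def getValue(input: String, ordinal: String): String = s"$input.get($ordinal)"
}

class GeneratorContext {
  import StatelessHelpers._ // bring the stateless methods into scope

  // Call sites can stay as getValue(...) instead of StatelessHelpers.getValue(...).
  def example(): String = getValue("a", "i")
}
```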


---




[GitHub] spark pull request #20700: [SPARK-23546][SQL] Refactor stateless methods/val...

2018-03-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20700#discussion_r171774634
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -734,23 +734,26 @@ case class Cast(child: Expression, dataType: 
DataType, timeZoneId: Option[String
 val keyToStringFunc = dataToStringFunc("keyToString", kt)
 val valueToStringFunc = dataToStringFunc("valueToString", vt)
 val loopIndex = ctx.freshName("loopIndex")
+val getValueMapKeyArray0 = CodeGenerator.getValue(s"$map.keyArray()", 
kt, "0")
+val getValueMapValArray0 = 
CodeGenerator.getValue(s"$map.valueArray()", vt, "0")
--- End diff --

`getMapFirstKey` and `getMapFirstValue`


---




[GitHub] spark issue #20714: [SPARK-23457][SQL][BRANCH-2.3] Register task completion ...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20714
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87873/
Test FAILed.


---




[GitHub] spark issue #20715: [SPARK-23434][SQL][BRANCH-2.2] Spark should not warn `me...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20715
  
**[Test build #87874 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87874/testReport)**
 for PR 20715 at commit 
[`314fae2`](https://github.com/apache/spark/commit/314fae2d36a3b0916fd6e04713a923f1a6f203c2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20714: [SPARK-23457][SQL][BRANCH-2.3] Register task completion ...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20714
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20714: [SPARK-23457][SQL][BRANCH-2.3] Register task completion ...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20714
  
**[Test build #87873 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87873/testReport)**
 for PR 20714 at commit 
[`d15eba7`](https://github.com/apache/spark/commit/d15eba754a59721bc7d9cdc7d374f2f323d21e41).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...

2018-03-01 Thread sandecho
Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20709
  
@sethah Would you recommend closing this one and reopening the previous one?


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20700
  
**[Test build #87881 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87881/testReport)**
 for PR 20700 at commit 
[`9fed753`](https://github.com/apache/spark/commit/9fed75338b9e3dc6e834407c177decc2f38c8743).


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20700
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...

2018-03-01 Thread sandecho
Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20708
  
Thanks @wbstclair. That's a good suggestion. Although I would have to move it from MLlib to ML, the rest will be the same.


---




[GitHub] spark issue #20700: [SPARK-23546][SQL] Refactor stateless methods/values in ...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20700
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1224/
Test PASSed.


---




[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...

2018-03-01 Thread wbstclair
Github user wbstclair commented on the issue:

https://github.com/apache/spark/pull/20708
  
Sorry for being out of touch! I have been busy with work. I will look at your code soon. I also have an implementation that adheres to the style guidelines; it is a decent first draft but still needs to be tested. I think we can build a decent offering out of the two.

On Mar 1, 2018 9:54 PM, "Sandeep Kumar Choudhary" 
wrote:

> @sethah  Thank you. I accept your
> recommendation. I will take it to ML. Secondly I have written unit tests
> and I have also adhere to style guidelines. But my concern is that no one
> is having a discussion on the JIRA. Even the creator of the JIRA
> @wbstclair  is not reachable.
>



---




[GitHub] spark pull request #20681: [SPARK-23518][SQL] Avoid metastore access when th...

2018-03-01 Thread liufengdb
Github user liufengdb commented on a diff in the pull request:

https://github.com/apache/spark/pull/20681#discussion_r171770982
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -67,6 +67,8 @@ sparkSession <- if (windows_with_hadoop()) {
 sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
   }
 sc <- callJStatic("org.apache.spark.sql.api.r.SQLUtils", 
"getJavaSparkContext", sparkSession)
+# materialize the catalog implementation
+listTables()
--- End diff --

`test_sparkSQL.R` is the only test that uses `newJObject("org.apache.spark.sql.hive.test.TestHiveContext", ssc, FALSE)` on the `ssc`, so the catalog-implementation Spark conf gets changed there. That is why `test_sparkSQL.R` is the only one broken.
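
For illustration, a Scala-side sketch of the same idea (hedged; the actual change is just the R `listTables()` call above): touching the catalog forces the session's catalog implementation to be instantiated under the current conf before a later test swaps it out.

```scala
import org.apache.spark.sql.SparkSession

object MaterializeCatalogExample extends App {
  val spark = SparkSession.builder()
    .master("local[1]")
    .appName("materialize-catalog")
    .getOrCreate()

  // Listing tables is enough to instantiate the catalog implementation.
  spark.catalog.listTables().collect()

  spark.stop()
}
```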


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87880 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87880/testReport)**
 for PR 19222 at commit 
[`cf2d532`](https://github.com/apache/spark/commit/cf2d532ae9c8688ef314a51a89c76abe2fd5d857).


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1223/
Test PASSed.


---




[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20682
  
**[Test build #87879 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87879/testReport)**
 for PR 20682 at commit 
[`4dac3a7`](https://github.com/apache/spark/commit/4dac3a7e2fe3fd058b4c771039da02e2565108a1).


---




[GitHub] spark issue #20682: [SPARK-23522][Python] always use sys.exit over builtin e...

2018-03-01 Thread benjaminp
Github user benjaminp commented on the issue:

https://github.com/apache/spark/pull/20682
  
I've now fixed the style as you suggested and got `python/run-tests` passing.


---




[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...

2018-03-01 Thread sandecho
Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20708
  
@sethah Thank you. I accept your recommendation and will move it to ML. Secondly, I have written unit tests and have also adhered to the style guidelines. But my concern is that no one is discussing this on the JIRA. Even the creator of the JIRA, @wbstclair, is not reachable.



---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20681
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20681
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87867/
Test PASSed.


---




[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20327
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87866/
Test PASSed.


---




[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20327
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20327: [SPARK-12963][CORE] NM host for driver end points

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20327
  
**[Test build #87866 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87866/testReport)**
 for PR 20327 at commit 
[`a674863`](https://github.com/apache/spark/commit/a674863b8243fddc065270d292a6be18e38fbc30).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20681
  
**[Test build #87867 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87867/testReport)**
 for PR 20681 at commit 
[`d0eacc2`](https://github.com/apache/spark/commit/d0eacc2048cf07193aca20f8011b677099884278).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class HiveMetastoreLazyInitializationSuite extends SparkFunSuite `


---




[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...

2018-03-01 Thread sandecho
Github user sandecho commented on the issue:

https://github.com/apache/spark/pull/20709
  
Actually, the previous pull request could not be merged, so I opened a new one. Can you please run the tests?


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87878 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87878/testReport)**
 for PR 19222 at commit 
[`c9f401a`](https://github.com/apache/spark/commit/c9f401ab5ff2a94950dd57643d2192b47c175d3a).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87878/
Test FAILed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1222/
Test PASSed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20647
  
**[Test build #87877 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87877/testReport)**
 for PR 20647 at commit 
[`1164eec`](https://github.com/apache/spark/commit/1164eecb55d2120bc71c059ea6944ea38d13d300).


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #87878 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87878/testReport)**
 for PR 19222 at commit 
[`c9f401a`](https://github.com/apache/spark/commit/c9f401ab5ff2a94950dd57643d2192b47c175d3a).


---




[GitHub] spark issue #20706: [SPARK-23550][core] Cleanup `Utils`.

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20706
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87863/
Test PASSed.


---




[GitHub] spark issue #20706: [SPARK-23550][core] Cleanup `Utils`.

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20706
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...

2018-03-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20647#discussion_r171767913
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2StringFormat.scala
 ---
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.commons.lang3.StringUtils
+
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.DataSourceRegister
+import org.apache.spark.sql.sources.v2.DataSourceV2
+import org.apache.spark.sql.sources.v2.reader._
+import org.apache.spark.util.Utils
+
+/**
+ * A trait that can be used by data source v2 related query plans(both 
logical and physical), to
+ * provide a string format of the data source information for explain.
+ */
+trait DataSourceV2StringFormat {
+
+  /**
+   * The instance of this data source implementation. Note that we only 
consider its class in
+   * equals/hashCode, not the instance itself.
+   */
+  def source: DataSourceV2
+
+  /**
+   * The output of the data source reader, w.r.t. column pruning.
+   */
+  def output: Seq[Attribute]
+
+  /**
+   * The options for this data source reader.
+   */
+  def options: Map[String, String]
+
+  /**
+   * The created data source reader. Here we use it to get the filters 
that has been pushed down
+   * so far, itself doesn't take part in the equals/hashCode.
+   */
+  def reader: DataSourceReader
+
+  private lazy val filters = reader match {
+case s: SupportsPushDownCatalystFilters => 
s.pushedCatalystFilters().toSet
+case s: SupportsPushDownFilters => s.pushedFilters().toSet
+case _ => Set.empty
+  }
+
+  private def sourceName: String = source match {
+case registered: DataSourceRegister => registered.shortName()
+case _ => source.getClass.getSimpleName.stripSuffix("$")
+  }
+
+  def metadataString: String = {
+val entries = scala.collection.mutable.ArrayBuffer.empty[(String, 
String)]
+
+if (filters.nonEmpty) {
+  entries += "Pushed Filters" -> filters.mkString("[", ", ", "]")
+}
+
+// TODO: we should only display some standard options like path, 
table, etc.
+if (options.nonEmpty) {
+  entries += "Options" -> Utils.redact(options).map {
+case (k, v) => s"$k=$v"
+  }.mkString("[", ",", "]")
+}
+
+val outputStr = Utils.truncatedString(output, "[", ", ", "]")
+
+val entriesStr = if (entries.nonEmpty) {
+  Utils.truncatedString(entries.map {
+case (key, value) => key + ": " + 
StringUtils.abbreviate(redact(value), 100)
+  }, " (", ", ", ")")
+} else {
+  ""
+}
+
+s"$sourceName$outputStr$entriesStr"
--- End diff --

`outputStr` doesn't need a leading space: we want `Relation[a: int]` instead of `Relation [a: int]`, which also matches how data source v1 is explained.

`entriesStr` does have a leading space; it is added via the parameters of `mkString`: `" (", ", ", ")"`.


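A tiny sketch of the intended formatting, with hypothetical values, showing why `outputStr` is concatenated without a space while `entriesStr` carries its own leading space:

```scala
object ExplainFormatDemo extends App {
  val sourceName = "Relation"
  // outputStr gets no leading space, so the relation name and columns join directly.
  val outputStr  = Seq("a: int").mkString("[", ", ", "]")
  // entriesStr carries its own leading space via mkString's start parameter.
  val entriesStr = Seq("Pushed Filters: [IsNotNull(a)]").mkString(" (", ", ", ")")

  println(s"$sourceName$outputStr$entriesStr")
  // prints: Relation[a: int] (Pushed Filters: [IsNotNull(a)])
}
```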
---




[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1221/
Test PASSed.


---




[GitHub] spark issue #20706: [SPARK-23550][core] Cleanup `Utils`.

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20706
  
**[Test build #87863 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87863/testReport)**
 for PR 20706 at commit 
[`d026fff`](https://github.com/apache/spark/commit/d026fff645bbec19b15d630af55625fc98333555).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18784
  
**[Test build #87876 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87876/testReport)**
 for PR 18784 at commit 
[`2c1574e`](https://github.com/apache/spark/commit/2c1574ea5ab55f1b0af1764aaf40ba9de5824d0b).


---




[GitHub] spark pull request #20111: [SPARK-22883][ML][TEST] Streaming tests for spark...

2018-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20111


---




[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-01 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171765261
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/OneHotEncoderEstimatorSuite.scala
 ---
@@ -103,11 +96,12 @@ class OneHotEncoderEstimatorSuite
   .setInputCols(Array("size"))
   .setOutputCols(Array("encoded"))
 val model = encoder.fit(df)
-val output = model.transform(df)
-val group = AttributeGroup.fromStructField(output.schema("encoded"))
-assert(group.size === 2)
-assert(group.getAttr(0) === 
BinaryAttribute.defaultAttr.withName("small").withIndex(0))
-assert(group.getAttr(1) === 
BinaryAttribute.defaultAttr.withName("medium").withIndex(1))
+testTransformerByGlobalCheckFunc[(Double)](df, model, "encoded") { 
rows =>
+val group = 
AttributeGroup.fromStructField(rows.head.schema("encoded"))
+assert(group.size === 2)
+assert(group.getAttr(0) === 
BinaryAttribute.defaultAttr.withName("small").withIndex(0))
+assert(group.getAttr(1) === 
BinaryAttribute.defaultAttr.withName("medium").withIndex(1))
+}
--- End diff --

Thanks.


---




[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-01 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171765221
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/NormalizerSuite.scala ---
@@ -17,94 +17,72 @@
 
 package org.apache.spark.ml.feature
 
-import org.apache.spark.SparkFunSuite
 import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, 
Vectors}
-import org.apache.spark.ml.util.DefaultReadWriteTest
+import org.apache.spark.ml.util.{DefaultReadWriteTest, MLTest}
 import org.apache.spark.ml.util.TestingUtils._
-import org.apache.spark.mllib.util.MLlibTestSparkContext
 import org.apache.spark.sql.{DataFrame, Row}
 
 
-class NormalizerSuite extends SparkFunSuite with MLlibTestSparkContext 
with DefaultReadWriteTest {
+class NormalizerSuite extends MLTest with DefaultReadWriteTest {
 
   import testImplicits._
 
-  @transient var data: Array[Vector] = _
-  @transient var dataFrame: DataFrame = _
-  @transient var normalizer: Normalizer = _
-  @transient var l1Normalized: Array[Vector] = _
-  @transient var l2Normalized: Array[Vector] = _
+  @transient val data: Seq[Vector] = Seq(
+Vectors.sparse(3, Seq((0, -2.0), (1, 2.3))),
+Vectors.dense(0.0, 0.0, 0.0),
+Vectors.dense(0.6, -1.1, -3.0),
+Vectors.sparse(3, Seq((1, 0.91), (2, 3.2))),
+Vectors.sparse(3, Seq((0, 5.7), (1, 0.72), (2, 2.7))),
+Vectors.sparse(3, Seq()))
--- End diff --

But `@transient` is about skipping serialization of this field.


---




[GitHub] spark issue #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `me...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20713
  
**[Test build #87875 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87875/testReport)**
 for PR 20713 at commit 
[`fd538ca`](https://github.com/apache/spark/commit/fd538ca84936549af623d3678d43ae935a6549e3).


---




[GitHub] spark issue #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `me...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20713
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `me...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20713
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1220/
Test PASSed.


---




[GitHub] spark issue #20633: [SPARK-23455][ML] Default Params in ML should be saved s...

2018-03-01 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20633
  
Also pinging @MLnick @WeichenXu123 for review.


---




[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...

2018-03-01 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20464
  
Since 2.3 has been released, pinging @felixcheung again.


---




[GitHub] spark issue #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `me...

2018-03-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20713
  
Retest this please.


---




[GitHub] spark issue #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `me...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20713
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `me...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20713
  
**[Test build #87872 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87872/testReport)**
 for PR 20713 at commit 
[`fd538ca`](https://github.com/apache/spark/commit/fd538ca84936549af623d3678d43ae935a6549e3).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `me...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20713
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87872/
Test FAILed.


---




[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-01 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171764043
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -313,13 +306,14 @@ class RFormulaSuite extends MLTest with 
DefaultReadWriteTest {
   Seq(("male", "foo", 4), ("female", "bar", 4), ("female", "bar", 5), 
("male", "baz", 5))
 .toDF("id", "a", "b")
 val model = formula.fit(original)
+val attr = NominalAttribute.defaultAttr
 val expected = Seq(
 ("male", "foo", 4, Vectors.dense(0.0, 1.0, 4.0), 1.0),
 ("female", "bar", 4, Vectors.dense(1.0, 0.0, 4.0), 0.0),
 ("female", "bar", 5, Vectors.dense(1.0, 0.0, 5.0), 0.0),
 ("male", "baz", 5, Vectors.dense(0.0, 0.0, 5.0), 1.0)
 ).toDF("id", "a", "b", "features", "label")
-// assert(result.schema.toString == resultSchema.toString)
+  .select($"id", $"a", $"b", $"features", $"label".as("label", 
attr.toMetadata()))
--- End diff --

It was indented at the level of the `val` plus two extra spaces. Should I align the dots in the same column?


---




[GitHub] spark issue #20715: [SPARK-23434][SQL][BRANCH-2.2] Spark should not warn `me...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20715
  
**[Test build #87874 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87874/testReport)**
 for PR 20715 at commit 
[`314fae2`](https://github.com/apache/spark/commit/314fae2d36a3b0916fd6e04713a923f1a6f203c2).


---




[GitHub] spark issue #20714: [SPARK-23457][SQL][BRANCH-2.3] Register task completion ...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20714
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1219/
Test PASSed.


---




[GitHub] spark issue #20715: [SPARK-23434][SQL][BRANCH-2.2] Spark should not warn `me...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20715
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20714: [SPARK-23457][SQL][BRANCH-2.3] Register task completion ...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20714
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20715: [SPARK-23434][SQL][BRANCH-2.2] Spark should not warn `me...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20715
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1218/
Test PASSed.


---




[GitHub] spark pull request #20715: [SPARK-23434][SQL][BRANCH-2.2] Spark should not w...

2018-03-01 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/20715

[SPARK-23434][SQL][BRANCH-2.2] Spark should not warn `metadata directory` 
for a HDFS file path

## What changes were proposed in this pull request?

In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), it emits a misleading warning while looking up `people.json/_spark_metadata`. The root cause is the difference between `LocalFileSystem` and `DistributedFileSystem`: `LocalFileSystem.exists()` returns `false`, but `DistributedFileSystem.exists` raises `org.apache.hadoop.security.AccessControlException`.

```scala
scala> spark.version
res0: String = 2.4.0-SNAPSHOT

scala> 
spark.read.json("file:///usr/hdp/current/spark-client/examples/src/main/resources/people.json").show
++---+
| age|   name|
++---+
|null|Michael|
|  30|   Andy|
|  19| Justin|
++---+

scala> spark.read.json("hdfs:///tmp/people.json")
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for 
metadata directory.
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for 
metadata directory.
```

After this PR,
```scala
scala> spark.read.json("hdfs:///tmp/people.json").show
++---+
| age|   name|
++---+
|null|Michael|
|  30|   Andy|
|  19| Justin|
++---+
```
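
For illustration, a hypothetical guard (not the actual patch) capturing the behavioral difference described above: a permission error from `DistributedFileSystem` is treated the same as a missing `_spark_metadata` directory instead of surfacing as a warning.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

object MetadataDirCheck {
  def looksLikeMetadataDir(fs: FileSystem, file: Path): Boolean = {
    val metadataPath = new Path(file, "_spark_metadata")
    try {
      // LocalFileSystem.exists() returns false for a missing path, but on a
      // kerberized cluster DistributedFileSystem may throw AccessControlException.
      fs.exists(metadataPath) && fs.getFileStatus(metadataPath).isDirectory
    } catch {
      case _: Exception => false
    }
  }
}
```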

## How was this patch tested?

Manual.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-23434-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20715.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20715


commit 314fae2d36a3b0916fd6e04713a923f1a6f203c2
Author: Dongjoon Hyun 
Date:   2018-02-21T00:02:44Z

[SPARK-23434][SQL] Spark should not warn `metadata directory` for a HDFS 
file path

## What changes were proposed in this pull request?

In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), it emits a misleading warning while looking up `people.json/_spark_metadata`. The root cause is the difference between `LocalFileSystem` and `DistributedFileSystem`: `LocalFileSystem.exists()` returns `false`, but `DistributedFileSystem.exists` raises `org.apache.hadoop.security.AccessControlException`.

```scala
scala> spark.version
res0: String = 2.4.0-SNAPSHOT

scala> 
spark.read.json("file:///usr/hdp/current/spark-client/examples/src/main/resources/people.json").show
++---+
| age|   name|
++---+
|null|Michael|
|  30|   Andy|
|  19| Justin|
++---+

scala> spark.read.json("hdfs:///tmp/people.json")
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for 
metadata directory.
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for 
metadata directory.
```

After this PR,
```scala
scala> spark.read.json("hdfs:///tmp/people.json").show
++---+
| age|   name|
++---+
|null|Michael|
|  30|   Andy|
|  19| Justin|
++---+
```

## How was this patch tested?

Manual.

Author: Dongjoon Hyun 

Closes #20616 from dongjoon-hyun/SPARK-23434.




---




[GitHub] spark issue #20714: [SPARK-23457][SQL][BRANCH-2.3] Register task completion ...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20714
  
**[Test build #87873 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87873/testReport)**
 for PR 20714 at commit 
[`d15eba7`](https://github.com/apache/spark/commit/d15eba754a59721bc7d9cdc7d374f2f323d21e41).


---




[GitHub] spark pull request #20714: [SPARK-23457][SQL][BRANCH-2.3] Register task comp...

2018-03-01 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/20714

[SPARK-23457][SQL][BRANCH-2.3] Register task completion listeners first in 
ParquetFileFormat

## What changes were proposed in this pull request?

ParquetFileFormat leaks opened files in some cases. This PR prevents that by registering task completion listeners first, before initialization.

- 
[spark-branch-2.3-test-sbt-hadoop-2.7](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/205/testReport/org.apache.spark.sql/FileBasedDataSourceSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/)
- 
[spark-master-test-sbt-hadoop-2.6](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4228/testReport/junit/org.apache.spark.sql.execution.datasources.parquet/ParquetQuerySuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/)

```
Caused by: sbt.ForkMain$ForkError: java.lang.Throwable: null
at 
org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:36)
at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:70)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at 
org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:538)
at 
org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:149)
at 
org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:133)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:400)
at
```
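
A minimal sketch of the ordering fix (hypothetical types, not the actual Spark code): by registering the cleanup callback before initialization, a failing `initialize()` can no longer leak the opened file.

```scala
import java.io.Closeable

trait InitializableReader extends Closeable {
  def initialize(): Unit
}

final class TaskCompletionHooks {
  private var hooks = List.empty[() => Unit]
  def addListener(f: () => Unit): Unit = hooks ::= f
  def runAll(): Unit = hooks.foreach(_.apply())
}

object SafeOpen {
  // Register cleanup before initialization: even if initialize() throws,
  // the task-completion hook will still close the reader.
  def openSafely(reader: InitializableReader, task: TaskCompletionHooks): Unit = {
    task.addListener(() => reader.close())
    reader.initialize()
  }
}
```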

## How was this patch tested?

Manual. The following test case generates the same leakage.

```scala
  test("SPARK-23457 Register task completion listeners first in 
ParquetFileFormat") {
withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE.key -> 
s"${Int.MaxValue}") {
  withTempDir { dir =>
val basePath = dir.getCanonicalPath
Seq(0).toDF("a").write.format("parquet").save(new Path(basePath, 
"first").toString)
Seq(1).toDF("a").write.format("parquet").save(new Path(basePath, 
"second").toString)
val df = spark.read.parquet(
  new Path(basePath, "first").toString,
  new Path(basePath, "second").toString)
val e = intercept[SparkException] {
  df.collect()
}
assert(e.getCause.isInstanceOf[OutOfMemoryError])
  }
}
  }
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-23457-2.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20714.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20714


commit d15eba754a59721bc7d9cdc7d374f2f323d21e41
Author: Dongjoon Hyun 
Date:   2018-02-20T05:33:03Z

[SPARK-23457][SQL] Register task completion listeners first in 
ParquetFileFormat

ParquetFileFormat leaks opened files in some cases. This PR prevents that by registering task completion listeners first, before initialization.

- 
[spark-branch-2.3-test-sbt-hadoop-2.7](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/205/testReport/org.apache.spark.sql/FileBasedDataSourceSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/)
- 
[spark-master-test-sbt-hadoop-2.6](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/4228/testReport/junit/org.apache.spark.sql.execution.datasources.parquet/ParquetQuerySuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/)

```
Caused by: sbt.ForkMain$ForkError: java.lang.Throwable: null
at 
org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:36)
at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:70)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at 
org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:538)
at 
org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:149)
at 
org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:133)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:400)
at
```


[GitHub] spark issue #20632: [SPARK-3159][ML] Add decision tree pruning

2018-03-01 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/20632
  
Jenkins test this please.


---




[GitHub] spark issue #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `me...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20713
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1217/
Test PASSed.


---




[GitHub] spark issue #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `me...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20713
  
**[Test build #87872 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87872/testReport)**
 for PR 20713 at commit 
[`fd538ca`](https://github.com/apache/spark/commit/fd538ca84936549af623d3678d43ae935a6549e3).


---




[GitHub] spark issue #20709: [SPARK-18844][MLLIB] Adding more binary classification e...

2018-03-01 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/20709
  
Why did you close the old one and re-open this? The discussion is lost now.


---




[GitHub] spark issue #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `me...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20713
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171762299
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/SQLTransformerSuite.scala ---
@@ -63,13 +68,17 @@ class SQLTransformerSuite
   }
 
   test("SPARK-22538: SQLTransformer should not unpersist given dataset") {
-val df = spark.range(10)
+val df = spark.range(10).toDF()
 df.cache()
 df.count()
 assert(df.storageLevel != StorageLevel.NONE)
-new SQLTransformer()
+val sqlTrans = new SQLTransformer()
   .setStatement("SELECT id + 1 AS id1 FROM __THIS__")
-  .transform(df)
-assert(df.storageLevel != StorageLevel.NONE)
+testTransformerByGlobalCheckFunc[Long](
+  df,
+  sqlTrans,
+  "id1") { rows =>
+  assert(df.storageLevel != StorageLevel.NONE)
--- End diff --

Moving `assert(df.storageLevel != StorageLevel.NONE)` here seems meaningless, because you do not use the `rows` parameter.


---




[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171760358
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -32,10 +31,20 @@ class RFormulaSuite extends MLTest with 
DefaultReadWriteTest {
   def testRFormulaTransform[A: Encoder](
   dataframe: DataFrame,
   formulaModel: RFormulaModel,
-  expected: DataFrame): Unit = {
+  expected: DataFrame,
+  expectedAttributes: AttributeGroup*): Unit = {
+val resultSchema = formulaModel.transformSchema(dataframe.schema)
+assert(resultSchema.json == expected.schema.json)
+assert(resultSchema == expected.schema)
 val (first +: rest) = expected.schema.fieldNames.toSeq
 val expectedRows = expected.collect()
 testTransformerByGlobalCheckFunc[A](dataframe, formulaModel, first, 
rest: _*) { rows =>
+  assert(rows.head.schema.toString() == resultSchema.toString())
+  for (expectedAttributeGroup <- expectedAttributes) {
+val attributeGroup =
+  
AttributeGroup.fromStructField(rows.head.schema(expectedAttributeGroup.name))
+assert(attributeGroup == expectedAttributeGroup)
--- End diff --

Should we use `===` instead?


---




[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171757214
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/NormalizerSuite.scala ---
@@ -17,94 +17,72 @@
 
 package org.apache.spark.ml.feature
 
-import org.apache.spark.SparkFunSuite
 import org.apache.spark.ml.linalg.{DenseVector, SparseVector, Vector, 
Vectors}
-import org.apache.spark.ml.util.DefaultReadWriteTest
+import org.apache.spark.ml.util.{DefaultReadWriteTest, MLTest}
 import org.apache.spark.ml.util.TestingUtils._
-import org.apache.spark.mllib.util.MLlibTestSparkContext
 import org.apache.spark.sql.{DataFrame, Row}
 
 
-class NormalizerSuite extends SparkFunSuite with MLlibTestSparkContext 
with DefaultReadWriteTest {
+class NormalizerSuite extends MLTest with DefaultReadWriteTest {
 
   import testImplicits._
 
-  @transient var data: Array[Vector] = _
-  @transient var dataFrame: DataFrame = _
-  @transient var normalizer: Normalizer = _
-  @transient var l1Normalized: Array[Vector] = _
-  @transient var l2Normalized: Array[Vector] = _
+  @transient val data: Seq[Vector] = Seq(
+Vectors.sparse(3, Seq((0, -2.0), (1, 2.3))),
+Vectors.dense(0.0, 0.0, 0.0),
+Vectors.dense(0.6, -1.1, -3.0),
+Vectors.sparse(3, Seq((1, 0.91), (2, 3.2))),
+Vectors.sparse(3, Seq((0, 5.7), (1, 0.72), (2, 2.7))),
+Vectors.sparse(3, Seq()))
--- End diff --

My only doubt is that when the test suite object is serialized and then 
deserialized, `data` will be lost. But I am not sure in which cases serialization 
would occur.
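
To illustrate the concern with a standalone sketch (not tied to MLTest): a plain `@transient val` comes back as `null` after a Java serialization round trip, while a `@transient lazy val` is recomputed on first access.

```scala
import java.io._

class Holder extends Serializable {
  @transient val eagerData: Seq[Int] = Seq(1, 2, 3)      // not serialized; null after readObject
  @transient lazy val lazyData: Seq[Int] = Seq(1, 2, 3)  // not serialized; re-evaluated on access
}

def roundTrip[T <: AnyRef](obj: T): T = {
  val buf = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(buf)
  oos.writeObject(obj)
  oos.close()
  new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    .readObject().asInstanceOf[T]
}

val restored = roundTrip(new Holder)
assert(restored.eagerData == null)         // the eager @transient val is lost
assert(restored.lazyData == Seq(1, 2, 3))  // the lazy val is rebuilt on access
```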


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-03-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20619
  
Thank you, @cloud-fan !


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171757269
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/OneHotEncoderEstimatorSuite.scala
 ---
@@ -103,11 +96,12 @@ class OneHotEncoderEstimatorSuite
   .setInputCols(Array("size"))
   .setOutputCols(Array("encoded"))
 val model = encoder.fit(df)
-val output = model.transform(df)
-val group = AttributeGroup.fromStructField(output.schema("encoded"))
-assert(group.size === 2)
-assert(group.getAttr(0) === 
BinaryAttribute.defaultAttr.withName("small").withIndex(0))
-assert(group.getAttr(1) === 
BinaryAttribute.defaultAttr.withName("medium").withIndex(1))
+testTransformerByGlobalCheckFunc[(Double)](df, model, "encoded") { 
rows =>
+val group = 
AttributeGroup.fromStructField(rows.head.schema("encoded"))
+assert(group.size === 2)
+assert(group.getAttr(0) === 
BinaryAttribute.defaultAttr.withName("small").withIndex(0))
+assert(group.getAttr(1) === 
BinaryAttribute.defaultAttr.withName("medium").withIndex(1))
+}
--- End diff --

Discussed with @jkbradley. Agreed with you.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171762031
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -538,21 +540,28 @@ class RFormulaSuite extends MLTest with 
DefaultReadWriteTest {
 
 // Handle unseen labels.
 val formula2 = new RFormula().setFormula("b ~ a + id")
-intercept[SparkException] {
-  formula2.fit(df1).transform(df2).collect()
-}
+testTransformerByInterceptingException[(Int, String, String)](
+  df2,
+  formula2.fit(df1),
+  "Unseen label:",
+  "label")
+
 val model3 = formula2.setHandleInvalid("skip").fit(df1)
 val model4 = formula2.setHandleInvalid("keep").fit(df1)
 
+val attr = NominalAttribute.defaultAttr
 val expected3 = Seq(
   (1, "foo", "zq", Vectors.dense(0.0, 1.0), 0.0),
   (2, "bar", "zq", Vectors.dense(1.0, 2.0), 0.0)
 ).toDF("id", "a", "b", "features", "label")
+  .select($"id", $"a", $"b", $"features", $"label".as("label", 
attr.toMetadata()))
--- End diff --

nit: indent


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171761570
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -32,10 +31,20 @@ class RFormulaSuite extends MLTest with 
DefaultReadWriteTest {
   def testRFormulaTransform[A: Encoder](
   dataframe: DataFrame,
   formulaModel: RFormulaModel,
-  expected: DataFrame): Unit = {
+  expected: DataFrame,
+  expectedAttributes: AttributeGroup*): Unit = {
+val resultSchema = formulaModel.transformSchema(dataframe.schema)
+assert(resultSchema.json == expected.schema.json)
--- End diff --

You compare `schema.json` instead of `schema.toString`. Are you sure they 
have the same effect?
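
For reference, a quick illustrative check of where the two renderings can differ (purely a sketch; the key point is that `json` serializes per-column metadata):

```scala
import org.apache.spark.sql.types._

val meta = new MetadataBuilder().putString("note", "x").build()
val withMeta = StructType(Seq(StructField("label", DoubleType, nullable = true, meta)))
val noMeta   = StructType(Seq(StructField("label", DoubleType, nullable = true)))

// `json` includes the metadata, so these two differ; whether `toString`
// also differs depends on how StructField renders itself.
println(withMeta.json == noMeta.json)       // false
println(withMeta.toString == noMeta.toString)
```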


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171762037
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -538,21 +540,28 @@ class RFormulaSuite extends MLTest with 
DefaultReadWriteTest {
 
 // Handle unseen labels.
 val formula2 = new RFormula().setFormula("b ~ a + id")
-intercept[SparkException] {
-  formula2.fit(df1).transform(df2).collect()
-}
+testTransformerByInterceptingException[(Int, String, String)](
+  df2,
+  formula2.fit(df1),
+  "Unseen label:",
+  "label")
+
 val model3 = formula2.setHandleInvalid("skip").fit(df1)
 val model4 = formula2.setHandleInvalid("keep").fit(df1)
 
+val attr = NominalAttribute.defaultAttr
 val expected3 = Seq(
   (1, "foo", "zq", Vectors.dense(0.0, 1.0), 0.0),
   (2, "bar", "zq", Vectors.dense(1.0, 2.0), 0.0)
 ).toDF("id", "a", "b", "features", "label")
+  .select($"id", $"a", $"b", $"features", $"label".as("label", 
attr.toMetadata()))
+
 val expected4 = Seq(
   (1, "foo", "zq", Vectors.dense(0.0, 1.0, 1.0), 0.0),
   (2, "bar", "zq", Vectors.dense(1.0, 0.0, 2.0), 0.0),
   (3, "bar", "zy", Vectors.dense(1.0, 0.0, 3.0), 2.0)
 ).toDF("id", "a", "b", "features", "label")
+  .select($"id", $"a", $"b", $"features", $"label".as("label", 
attr.toMetadata()))
--- End diff --

nit: indent


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171761966
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -381,31 +386,31 @@ class RFormulaSuite extends MLTest with 
DefaultReadWriteTest {
 NumericAttribute.defaultAttr)).toMetadata()
 val original = base.select(base.col("id"), base.col("vec").as("vec2", 
metadata))
 val model = formula.fit(original)
-val result = model.transform(original)
-val attrs = AttributeGroup.fromStructField(result.schema("features"))
+val expected = Seq(
+  (1, Vectors.dense(0.0, 1.0), Vectors.dense(0.0, 1.0), 1.0),
+  (2, Vectors.dense(1.0, 2.0), Vectors.dense(1.0, 2.0), 2.0)
+).toDF("id", "vec2", "features", "label")
+  .select($"id", $"vec2".as("vec2", metadata), $"features", $"label")
--- End diff --

nit: indent


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171759634
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/OneHotEncoderEstimatorSuite.scala
 ---
@@ -151,29 +146,30 @@ class OneHotEncoderEstimatorSuite
 
 val df = spark.createDataFrame(sc.parallelize(data), schema)
 
-val dfWithTypes = df
-  .withColumn("shortInput", df("input").cast(ShortType))
-  .withColumn("longInput", df("input").cast(LongType))
-  .withColumn("intInput", df("input").cast(IntegerType))
-  .withColumn("floatInput", df("input").cast(FloatType))
-  .withColumn("decimalInput", df("input").cast(DecimalType(10, 0)))
-
-val cols = Array("input", "shortInput", "longInput", "intInput",
-  "floatInput", "decimalInput")
-for (col <- cols) {
-  val encoder = new OneHotEncoderEstimator()
-.setInputCols(Array(col))
+class NumericTypeWithEncoder[A](val numericType: NumericType)
+  (implicit val encoder: Encoder[(A, Vector)])
+
+val types = Seq(
+  new NumericTypeWithEncoder[Short](ShortType),
+  new NumericTypeWithEncoder[Long](LongType),
+  new NumericTypeWithEncoder[Int](IntegerType),
+  new NumericTypeWithEncoder[Float](FloatType),
+  new NumericTypeWithEncoder[Byte](ByteType),
+  new NumericTypeWithEncoder[Double](DoubleType),
+  new NumericTypeWithEncoder[Decimal](DecimalType(10, 
0))(ExpressionEncoder()))
--- End diff --

Oh I see. This is a syntax issue: `testTransformer` needs a generic type 
parameter. When I designed the `testTransformer` helper function, I could not 
eliminate that generic parameter, which makes things difficult.
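
A simplified standalone sketch of the pattern under discussion (the `check` function below is only a stand-in for the real `testTransformer` helper, which needs `Encoder[(A, Vector)]`): bundling the `Encoder` with each `NumericType` lets a single loop supply the right encoder for every type.

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.types._

// Each entry carries the NumericType together with the Encoder the generic
// helper needs, so the loop body can pass it explicitly.
class NumericTypeWithEncoder[A](val numericType: NumericType)
                               (implicit val encoder: Encoder[A])

// Stand-in for the real helper, which also requires a type parameter.
def check[A](label: String)(implicit enc: Encoder[A]): Unit =
  println(s"$label -> encoder schema: ${enc.schema.simpleString}")

val types = Seq(
  new NumericTypeWithEncoder[Short](ShortType)(Encoders.scalaShort),
  new NumericTypeWithEncoder[Long](LongType)(Encoders.scalaLong),
  new NumericTypeWithEncoder[Int](IntegerType)(Encoders.scalaInt))

for (t <- types) {
  check(t.numericType.simpleString)(t.encoder)
}
```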


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20686: [SPARK-22915][MLlib] Streaming tests for spark.ml...

2018-03-01 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20686#discussion_r171761941
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala ---
@@ -313,13 +306,14 @@ class RFormulaSuite extends MLTest with 
DefaultReadWriteTest {
   Seq(("male", "foo", 4), ("female", "bar", 4), ("female", "bar", 5), 
("male", "baz", 5))
 .toDF("id", "a", "b")
 val model = formula.fit(original)
+val attr = NominalAttribute.defaultAttr
 val expected = Seq(
 ("male", "foo", 4, Vectors.dense(0.0, 1.0, 4.0), 1.0),
 ("female", "bar", 4, Vectors.dense(1.0, 0.0, 4.0), 0.0),
 ("female", "bar", 5, Vectors.dense(1.0, 0.0, 5.0), 0.0),
 ("male", "baz", 5, Vectors.dense(0.0, 0.0, 5.0), 1.0)
 ).toDF("id", "a", "b", "features", "label")
-// assert(result.schema.toString == resultSchema.toString)
+  .select($"id", $"a", $"b", $"features", $"label".as("label", 
attr.toMetadata()))
--- End diff --

nit: indent


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20713: [SPARK-23434][SQL][BRANCH-2.3] Spark should not w...

2018-03-01 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/20713

[SPARK-23434][SQL][BRANCH-2.3] Spark should not warn `metadata directory` 
for a HDFS file path

## What changes were proposed in this pull request?

In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), 
it emits a misleading warning while looking up `people.json/_spark_metadata`. 
The root cause is the difference between `LocalFileSystem` and 
`DistributedFileSystem`: `LocalFileSystem.exists()` returns `false`, but 
`DistributedFileSystem.exists` raises 
`org.apache.hadoop.security.AccessControlException`.

```scala
scala> spark.version
res0: String = 2.4.0-SNAPSHOT

scala> spark.read.json("file:///usr/hdp/current/spark-client/examples/src/main/resources/people.json").show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

scala> spark.read.json("hdfs:///tmp/people.json")
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
```

After this PR,
```scala
scala> spark.read.json("hdfs:///tmp/people.json").show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+
```

## How was this patch tested?

Manual.
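
For illustration only, a sketch of the kind of guard this behavior implies; this is not the literal patch, and the helper below is hypothetical:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.security.AccessControlException

// Hypothetical helper: only a directory can contain a `_spark_metadata`
// subdirectory, and a permission or missing-file error while probing should
// not surface as a misleading warning.
def looksLikeMetadataDir(path: Path, hadoopConf: Configuration): Boolean = {
  try {
    val fs = path.getFileSystem(hadoopConf)
    fs.getFileStatus(path).isDirectory &&
      fs.exists(new Path(path, "_spark_metadata"))
  } catch {
    case _: AccessControlException | _: java.io.FileNotFoundException => false
  }
}
```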

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-23434-2.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20713


commit fd538ca84936549af623d3678d43ae935a6549e3
Author: Dongjoon Hyun 
Date:   2018-02-21T00:02:44Z

[SPARK-23434][SQL] Spark should not warn `metadata directory` for a HDFS 
file path

## What changes were proposed in this pull request?

In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), 
it emits a misleading warning while looking up `people.json/_spark_metadata`. 
The root cause is the difference between `LocalFileSystem` and 
`DistributedFileSystem`: `LocalFileSystem.exists()` returns `false`, but 
`DistributedFileSystem.exists` raises 
`org.apache.hadoop.security.AccessControlException`.

```scala
scala> spark.version
res0: String = 2.4.0-SNAPSHOT

scala> spark.read.json("file:///usr/hdp/current/spark-client/examples/src/main/resources/people.json").show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

scala> spark.read.json("hdfs:///tmp/people.json")
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
18/02/15 05:00:48 WARN streaming.FileStreamSink: Error while looking for metadata directory.
```

After this PR,
```scala
scala> spark.read.json("hdfs:///tmp/people.json").show
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+
```

## How was this patch tested?

Manual.

Author: Dongjoon Hyun 

Closes #20616 from dongjoon-hyun/SPARK-23434.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...

2018-03-01 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/20708
  
* Only committers can trigger the tests. 
* MLlib is in maintenance-only mode, so we wouldn't accept this patch as is. 
* If this were to go into ML, I think you'd need to discuss it more 
thoroughly on the JIRA. Another good alternative is to make this a Spark 
package. 
* It's best if you write unit tests and adhere to the style guides when you 
submit new patches. 

I would recommend closing this PR until more discussion has taken place 
about whether or not this is a good fit for Spark ML. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20616: [SPARK-23434][SQL] Spark should not warn `metadata direc...

2018-03-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20616
  
Thank you, @cloud-fan.
Then, I'll make a backport PR to pass Jenkins once more for each branch.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20712: [SPARK-23563] config codegen compile cache size

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20712
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20712: [SPARK-23563] config codegen compile cache size

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20712
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20712: [SPARK-23563] config codegen compile cache size

2018-03-01 Thread passionke
GitHub user passionke opened a pull request:

https://github.com/apache/spark/pull/20712

[SPARK-23563] config codegen compile cache size

## What changes were proposed in this pull request?

Make the size of the codegen compiled class cache configurable.

## How was this patch tested?
Spark SQL integration tests.
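
For illustration, a hypothetical sketch of what such a config entry could look like in `SQLConf` (the key name and default below are made up, not taken from this PR):

```scala
// Hypothetical config entry; the actual key, default, and wiring are defined by the patch.
val CODEGEN_CACHE_MAX_ENTRIES = buildConf("spark.sql.codegen.cache.maxEntries")
  .doc("Maximum number of compiled codegen classes kept in the cache used by CodeGenerator.")
  .intConf
  .checkValue(_ > 0, "The cache size must be positive.")
  .createWithDefault(100)
```

The cache construction in `CodeGenerator` would then read this value instead of a hard-coded maximum size.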



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/passionke/spark 
feature/codegenerator_cache_size

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20712.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20712


commit 195fb75a86029dc2784bcacf0a5f96155480201e
Author: 万两 
Date:   2018-03-02T03:28:46Z

config codegen compile cache size




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20616: [SPARK-23434][SQL] Spark should not warn `metadata direc...

2018-03-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20616
  
no objection from my side.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20619: [SPARK-23457][SQL] Register task completion listeners fi...

2018-03-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20619
  
Yea, please go ahead.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20705
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87861/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20705
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19108
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19108
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87868/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20705: [SPARK-23553][TESTS] Tests should not assume the default...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20705
  
**[Test build #87861 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87861/testReport)**
 for PR 20705 at commit 
[`5192e6a`](https://github.com/apache/spark/commit/5192e6a152781bdc44a2908cc78e19b0421fb287).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19108
  
**[Test build #87868 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87868/testReport)**
 for PR 19108 at commit 
[`aa9772e`](https://github.com/apache/spark/commit/aa9772e227a6406596d9cec97d73ef285204f785).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20692: [SPARK-23531][SQL] Show attribute type in explain

2018-03-01 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/20692
  
Good point on nested types. I don't think heavy nesting is the usual case, 
but we can definitely improve the explain result in the long term by separating 
it out. Maybe just using a high-level type (e.g. struct<...>) for now would 
work?
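
A minimal sketch of that idea (illustrative only; the helper name is made up and this is not code from the PR):

```scala
import org.apache.spark.sql.types._

// Hypothetical rendering for explain output: keep scalar types readable and
// collapse nested struct types into a short summary.
def shortTypeString(dt: DataType): String = dt match {
  case _: StructType      => "struct<...>"
  case ArrayType(et, _)   => s"array<${shortTypeString(et)}>"
  case MapType(kt, vt, _) => s"map<${shortTypeString(kt)},${shortTypeString(vt)}>"
  case other              => other.simpleString
}
```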


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20711: [SPARKR][DOC] fix link in vignettes

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20711
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


