[GitHub] spark issue #20314: [SPARK-23104][K8S][Docs] Changes to Kubernetes scheduler...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20314
  
**[Test build #86337 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86337/testReport)**
 for PR 20314 at commit 
[`b13ad38`](https://github.com/apache/spark/commit/b13ad382f3ce8a3f33b553e954b78fa9185882ba).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20314: [SPARK-23104][K8S][Docs] Changes to Kubernetes scheduler...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20314
  
**[Test build #86336 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86336/testReport)**
 for PR 20314 at commit 
[`365aa9c`](https://github.com/apache/spark/commit/365aa9c0662dcf246d5dd54a1fb01cdc69e59dcd).


---




[GitHub] spark pull request #20307: [SPARK-23141][SQL][PYSPARK] Support data type str...

2018-01-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20307#discussion_r162308708
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2108,7 +2108,8 @@ def udf(f=None, returnType=StringType()):
 can fail on special rows, the workaround is to incorporate the 
condition into the functions.
 
 :param f: python function if used as a standalone function
-:param returnType: a :class:`pyspark.sql.types.DataType` object
+:param returnType: the return type of the registered user-defined 
function. The value can be
--- End diff --

This looks like a typo: `the return type of the registered user-defined function.` -> 
`the return type of the user-defined function.`?


---




[GitHub] spark pull request #20314: [SPARK-23104][K8S][Docs] Changes to Kubernetes sc...

2018-01-18 Thread foxish
GitHub user foxish opened a pull request:

https://github.com/apache/spark/pull/20314

[SPARK-23104][K8S][Docs] Changes to Kubernetes scheduler documentation

## What changes were proposed in this pull request?

Docs changes:
- Adding a warning that the backend is experimental.
- Removing a defunct internal-only option from documentation
- Clarifying that node selectors can be used right away, and other minor 
cosmetic changes

## How was this patch tested?

Docs only change


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/foxish/spark ambiguous-docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20314.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20314


commit 27fc9cc2a4f000aa532240db7c871037292324c6
Author: foxish 
Date:   2018-01-17T22:46:15Z

Basic changes

commit 5369564344f2655b5453740aba6de867383c7ac3
Author: foxish 
Date:   2018-01-18T10:49:03Z

Add section about backend

commit 7b45c8d728a114704647ce714643db1e35174b7f
Author: foxish 
Date:   2018-01-18T10:49:41Z

Remove option to set executor pod prefix

commit 365aa9c0662dcf246d5dd54a1fb01cdc69e59dcd
Author: foxish 
Date:   2018-01-18T10:57:37Z

clarify




---




[GitHub] spark issue #20314: [SPARK-23104][K8S][Docs] Changes to Kubernetes scheduler...

2018-01-18 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20314
  
cc/ @vanzin @sameeragarwal @liyinan926 @ash211 


---




[GitHub] spark issue #20312: [Docs] change to dataset for java code in structured-str...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20312
  
**[Test build #4059 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4059/testReport)**
 for PR 20312 at commit 
[`10afff2`](https://github.com/apache/spark/commit/10afff276d23cbe98f8e4187791fb1358eff15fb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20305
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20305
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86328/
Test PASSed.


---




[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20305
  
**[Test build #86328 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86328/testReport)**
 for PR 20305 at commit 
[`a978dcc`](https://github.com/apache/spark/commit/a978dcc3052f0df4485594c6cd4a944d8b6dab5e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20312: [Docs] change to dataset for java code in structured-str...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20312
  
**[Test build #4059 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4059/testReport)**
 for PR 20312 at commit 
[`10afff2`](https://github.com/apache/spark/commit/10afff276d23cbe98f8e4187791fb1358eff15fb).


---




[GitHub] spark issue #20313: [SPARK-22974][ML] Attach attributes to output column of ...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20313
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86332/
Test PASSed.


---




[GitHub] spark issue #20313: [SPARK-22974][ML] Attach attributes to output column of ...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20313
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20313: [SPARK-22974][ML] Attach attributes to output column of ...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20313
  
**[Test build #86332 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86332/testReport)**
 for PR 20313 at commit 
[`aeae308`](https://github.com/apache/spark/commit/aeae308055cd16d95ef9ff86df882ec1aa20).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20305
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20305
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86326/
Test PASSed.


---




[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20305
  
**[Test build #86326 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86326/testReport)**
 for PR 20305 at commit 
[`094b7eb`](https://github.com/apache/spark/commit/094b7ebbaf7bfe75e706cf42565f0c077938e821).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20309
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20309
  
**[Test build #86334 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86334/testReport)**
 for PR 20309 at commit 
[`5f905aa`](https://github.com/apache/spark/commit/5f905aabdbf8ae402c90769e6d7841a7a5d76d70).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20309
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86334/
Test PASSed.


---




[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...

2018-01-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20306#discussion_r162299712
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, 
timeZoneId: Option[String
  |$evPrim = $buffer.build();
""".stripMargin
 }
+  case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, 
ctx)
--- End diff --

How about what I suggested at 
https://github.com/apache/spark/pull/20306#discussion_r162269190?


---




[GitHub] spark issue #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType castin...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20306
  
**[Test build #86335 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86335/testReport)**
 for PR 20306 at commit 
[`74c1735`](https://github.com/apache/spark/commit/74c17353bb6372b123c5aee1b6d58a21de36f99a).


---




[GitHub] spark issue #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType castin...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20306
  
retest this please


---




[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20306#discussion_r162296434
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, 
timeZoneId: Option[String
  |$evPrim = $buffer.build();
""".stripMargin
 }
+  case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, 
ctx)
--- End diff --

maybe we should do the same thing for python UDT in the future, and leave 
it for now.


---




[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...

2018-01-18 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20306#discussion_r162293870
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, 
timeZoneId: Option[String
  |$evPrim = $buffer.build();
""".stripMargin
 }
+  case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, 
ctx)
--- End diff --

Did you mean something like this? 
https://github.com/apache/spark/compare/master...maropu:SPARK-23054-2

If so, this just dumps the internal structure:
```
scala> val df1 = Seq((1, Vectors.dense(Array(1.0, 2.0, 3.0)))).toDF("a", "b")
scala> df1.selectExpr("CAST(b AS STRING)").show(false)
+----------------------+
|b                     |
+----------------------+
|[1,,, [1.0, 2.0, 3.0]]|
+----------------------+

scala> val df2 = Seq((1, Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)))).toDF("a", "b")
scala> df2.selectExpr("CAST(b AS STRING)").show(false)
+--------------------------+
|b                         |
+--------------------------+
|[0, 3, [0, 2], [1.0, 3.0]]|
+--------------------------+
```
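
To make the point above concrete, here is a minimal pure-Python sketch of why casting through `sqlType` exposes the internal layout. It assumes, based only on the output shown above, that a vector is stored as a four-field struct (type tag, size, indices, values) and that nulls render as empty fields. The helpers `serialize_dense`, `serialize_sparse`, and `cast_to_string` are hypothetical names for illustration, not Spark APIs.

```python
def serialize_dense(values):
    # Assumed layout: (type tag 1, no size, no indices, values).
    return (1, None, None, [float(v) for v in values])


def serialize_sparse(size, indices, values):
    # Assumed layout: (type tag 0, size, indices, values).
    return (0, size, list(indices), [float(v) for v in values])


def cast_to_string(value):
    # Render structs/arrays as "[a, b, c]"; a null field renders as nothing,
    # so consecutive nulls show up as bare commas.
    if value is None:
        return ""
    if isinstance(value, (list, tuple)):
        parts = []
        for i, item in enumerate(value):
            text = cast_to_string(item)
            if i == 0:
                parts.append(text)
            elif item is None:
                parts.append(",")
            else:
                parts.append(", " + text)
        return "[" + "".join(parts) + "]"
    return str(value)


print(cast_to_string(serialize_dense([1.0, 2.0, 3.0])))
print(cast_to_string(serialize_sparse(3, [0, 2], [1.0, 3.0])))
```

Under these assumptions the struct dump matches the two rows shown above, which is why the result looks nothing like an array-like vector string.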


---




[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...

2018-01-18 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20306#discussion_r162293629
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, 
timeZoneId: Option[String
  |$evPrim = $buffer.build();
""".stripMargin
 }
+  case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, 
ctx)
--- End diff --

Yes, it works to cast to string.

Btw, as for `VectorUDT`, seems like `DenseVector` and `SparseVector` 
override `toString()` at least for `show()` on purpose(?):

https://github.com/apache/spark/blob/74c17353bb6372b123c5aee1b6d58a21de36f99a/python/pyspark/ml/classification.py#L1497-L1503

If we also use cast to string for `show()`, the result will be like:

```
+-----------------+----------+
|         features|prediction|
+-----------------+----------+
|[1,,, [1.0, 0.0]]|       1.0|
|[1,,, [0.0, 0.0]]|       0.0|
+-----------------+----------+
```

I'm not sure we can change the string here.


---




[GitHub] spark pull request #20275: [SPARK-23085][ML] API parity for mllib.linalg.Vec...

2018-01-18 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20275#discussion_r162292944
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---
@@ -113,6 +113,13 @@ class VectorsSuite extends SparkFunSuite with Logging {
 assert(vec.toArray === arr)
   }
 
+  test("zero-length sparse vector") {
--- End diff --

While we're doing this we may as well also add a test to `intercept` the 
exception for negative size (as per the other sparse vector construction 
tests), for both `ml` and `mllib`
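
As a rough illustration of the pair of tests being suggested (a zero-length vector is accepted, a negative size is rejected), here is a pure-Python sketch. `make_sparse` and `raises_value_error` are hypothetical stand-ins, not the Spark constructors or test helpers; the actual tests would use `intercept` in the Scala suites.

```python
def make_sparse(size, indices, values):
    # Hypothetical constructor check; not Spark's implementation.
    if size < 0:
        raise ValueError("size must be non-negative, got %d" % size)
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    if any(i < 0 or i >= size for i in indices):
        raise ValueError("index out of range for size %d" % size)
    return (size, list(indices), [float(v) for v in values])


def raises_value_error(fn):
    # Small helper playing the role of an `intercept`-style assertion.
    try:
        fn()
    except ValueError:
        return True
    return False


# A zero-length sparse vector is valid and has no entries.
assert make_sparse(0, [], []) == (0, [], [])
# A negative size must be rejected.
assert raises_value_error(lambda: make_sparse(-1, [], []))
```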


---




[GitHub] spark pull request #20275: [SPARK-23085][ML] API parity for mllib.linalg.Vec...

2018-01-18 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/20275#discussion_r162292520
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---
@@ -113,6 +113,13 @@ class VectorsSuite extends SparkFunSuite with Logging {
 assert(vec.toArray === arr)
   }
 
+  test("zero-length sparse vector") {
--- End diff --

We may as well add the same test to `ml.linalg`


---




[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20309
  
**[Test build #86334 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86334/testReport)**
 for PR 20309 at commit 
[`5f905aa`](https://github.com/apache/spark/commit/5f905aabdbf8ae402c90769e6d7841a7a5d76d70).


---




[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...

2018-01-18 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/20309
  
jenkins retest this please


---




[GitHub] spark issue #20310: revert [SPARK-10030] Use tags to control which tests to ...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20310
  
**[Test build #86333 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86333/testReport)**
 for PR 20310 at commit 
[`b6c46b5`](https://github.com/apache/spark/commit/b6c46b5900bf1109f836674b8ba5ee3cb4712771).


---




[GitHub] spark issue #20313: [SPARK-22974][ML] Attach attributes to output column of ...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20313
  
**[Test build #86332 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86332/testReport)**
 for PR 20313 at commit 
[`aeae308`](https://github.com/apache/spark/commit/aeae308055cd16d95ef9ff86df882ec1aa20).


---




[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20306#discussion_r162287902
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, 
timeZoneId: Option[String
  |$evPrim = $buffer.build();
""".stripMargin
 }
+  case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, 
ctx)
--- End diff --

Like the python UDT, we recursively call `castToStringCode(pudt.sqlType, 
ctx)`; does it work?


---




[GitHub] spark pull request #20313: [SPARK-22974][ML] Attach attributes to output col...

2018-01-18 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/20313

[SPARK-22974][ML] Attach attributes to output column of CountVectorModel

## What changes were proposed in this pull request?

The output column from `CountVectorModel` lacks attributes, so a later 
transformer such as `Interaction` can raise an error because no attributes are available.

## How was this patch tested?

Added test.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-22974

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20313.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20313


commit aeae308055cd16d95ef9ff86df882ec1aa20
Author: Liang-Chi Hsieh 
Date:   2018-01-18T09:25:54Z

Attach attributes to output column of CountVectorModel.




---




[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20023
  
LGTM, pending jenkins


---




[GitHub] spark issue #20311: [SPARK-23144][SS] Added console sink for continuous proc...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20311
  
**[Test build #86331 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86331/testReport)**
 for PR 20311 at commit 
[`6f69669`](https://github.com/apache/spark/commit/6f69669c6b34a6d6bbcd11c3fb635262fe802d28).


---




[GitHub] spark issue #20312: [Docs] change to dataset for java code in structured-str...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20312
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20311: [SPARK-23144][SS] Added console sink for continuous proc...

2018-01-18 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/20311
  
@jose-torres PTAL


---




[GitHub] spark pull request #20312: [Docs] change to dataset for java code in structu...

2018-01-18 Thread brandonJY
GitHub user brandonJY opened a pull request:

https://github.com/apache/spark/pull/20312

[Docs] change to dataset for java code in 
structured-streaming-kafka-integration document

## What changes were proposed in this pull request?

In the latest structured-streaming-kafka-integration document, the Java code 
example for Kafka integration uses `DataFrame`; shouldn't it be 
changed to `Dataset`?

## How was this patch tested?

A manual test was performed on the updated example Java code with 
Spark 2.2.1 and Kafka 1.0.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/brandonJY/spark patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20312.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20312


commit 10afff276d23cbe98f8e4187791fb1358eff15fb
Author: brandonJY 
Date:   2018-01-18T08:57:56Z

[Docs] change to dataset for java code in 
structured-streaming-kafka-integration document

In the latest structured-streaming-kafka-integration document, the Java code 
example for Kafka integration uses `DataFrame`; shouldn't it be 
changed to `Dataset`?




---




[GitHub] spark pull request #20311: [SPARK-23144][SS] Added console sink for continuo...

2018-01-18 Thread tdas
GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/20311

[SPARK-23144][SS] Added console sink for continuous processing

## What changes were proposed in this pull request?
Refactored ConsoleWriter into ConsoleMicrobatchWriter and 
ConsoleContinuousWriter.

## How was this patch tested?
new unit test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-23144

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20311.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20311


commit 6f69669c6b34a6d6bbcd11c3fb635262fe802d28
Author: Tathagata Das 
Date:   2018-01-18T09:07:00Z

added console sink for continuous processing




---




[GitHub] spark issue #19492: [SPARK-22228][SQL] Add support for array...

2018-01-18 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/19492
  
@viirya did you have any chance to look at this? Thanks.


---




[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20309
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86329/
Test FAILed.


---




[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20309
  
**[Test build #86329 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86329/testReport)**
 for PR 20309 at commit 
[`5f905aa`](https://github.com/apache/spark/commit/5f905aabdbf8ae402c90769e6d7841a7a5d76d70).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20309
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20023
  
**[Test build #86330 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86330/testReport)**
 for PR 20023 at commit 
[`b4b0350`](https://github.com/apache/spark/commit/b4b0350dea09db897b70485ef1fad41a742eae30).


---




[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...

2018-01-18 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20306#discussion_r162277049
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, 
timeZoneId: Option[String
  |$evPrim = $buffer.build();
""".stripMargin
 }
+  case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, 
ctx)
--- End diff --

But, `VectorUDT.sqlType` has non-array formats:

https://github.com/apache/spark/blob/1c76a91e5fae11dcb66c453889e587b48039fdc9/mllib/src/main/scala/org/apache/spark/ml/linalg/VectorUDT.scala#L88

In this case, how do we convert `VectorUDT` data into array-like strings 
(e.g., [0, 1, 2, ...])?


---




[GitHub] spark issue #20309: [SPARK-23143][SS][PYTHON] Added python API for setting c...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20309
  
**[Test build #86329 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86329/testReport)**
 for PR 20309 at commit 
[`5f905aa`](https://github.com/apache/spark/commit/5f905aabdbf8ae402c90769e6d7841a7a5d76d70).


---




[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20305
  
**[Test build #86328 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86328/testReport)**
 for PR 20305 at commit 
[`a978dcc`](https://github.com/apache/spark/commit/a978dcc3052f0df4485594c6cd4a944d8b6dab5e).


---




[GitHub] spark pull request #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20306#discussion_r162275882
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -838,6 +839,7 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
  |$evPrim = $buffer.build();
""".stripMargin
 }
+  case pudt: PythonUserDefinedType => castToStringCode(pudt.sqlType, ctx)
--- End diff --

New thought: since UDT is not finalized yet (it is internal only), the only 
thing we care about is having a reasonable string representation.

It's unclear whether a UDT class always has a reasonable `toString`, and 
`UDT.deserialize` may be pretty slow. How about we always use `UDT.sqlType` for 
string casting and `showString`?
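
As a rough illustration of that proposal, the sketch below uses a toy type model in plain Scala. The names `DType`, `UdtT`, and `castToString` are hypothetical and only loosely mirror Catalyst; the point is that the UDT case delegates to its underlying SQL type instead of deserializing the value or calling `toString` on a user class.

```scala
// Toy type model; all names are hypothetical, not Catalyst's classes.
sealed trait DType
case object IntT extends DType
case object DoubleT extends DType
final case class ArrayT(element: DType) extends DType
// A user-defined type exposes the SQL type of its serialized form.
final case class UdtT(name: String, sqlType: DType) extends DType

object CastToStringSketch {
  // Returns a function that renders a serialized value of the given type.
  def castToString(t: DType): Any => String = t match {
    case IntT | DoubleT => v => v.toString
    case ArrayT(elem) =>
      val render = castToString(elem)
      v => v.asInstanceOf[Seq[Any]].map(render).mkString("[", ", ", "]")
    // Delegate to the underlying SQL type: no deserialize, no reliance on toString.
    case UdtT(_, sqlType) => castToString(sqlType)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical UDT whose serialized form is an array of doubles.
    val pointUdt = UdtT("ExamplePoint", ArrayT(DoubleT))
    println(castToString(pointUdt)(Seq(1.0, 2.0)))
  }
}
```

The trade-off is that the string reflects the serialized layout, which reads well for array-backed UDTs but less so for struct-backed ones.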


---




[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20276
  
**[Test build #86327 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86327/testReport)**
 for PR 20276 at commit 
[`d0bdddf`](https://github.com/apache/spark/commit/d0bdddfffc18258ba1536c9cff4ea0856026094c).


---




[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20276
  
retest this please


---




[GitHub] spark pull request #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hi...

2018-01-18 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20305#discussion_r162274760
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala 
---
@@ -98,20 +98,7 @@ class HiveSessionStateBuilder(session: SparkSession, parentState: Option[Session
   override def extraPlanningStrategies: Seq[Strategy] =
 super.extraPlanningStrategies ++ customPlanningStrategies
 
-  override def strategies: Seq[Strategy] = {
-experimentalMethods.extraStrategies ++
-  extraPlanningStrategies ++ Seq(
-  FileSourceStrategy,
-  DataSourceStrategy(conf),
-  SpecialLimits,
-  InMemoryScans,
-  HiveTableScans,
-  Scripts,
-  Aggregation,
-  JoinSelection,
-  BasicOperators
-)
-  }
+  override def strategies: Seq[Strategy] = Seq(HiveTableScans, Scripts) ++ super.strategies
--- End diff --

OK, let me update it.


---




[GitHub] spark pull request #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hi...

2018-01-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20305#discussion_r162274134
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala 
---
@@ -98,20 +98,7 @@ class HiveSessionStateBuilder(session: SparkSession, parentState: Option[Session
   override def extraPlanningStrategies: Seq[Strategy] =
 super.extraPlanningStrategies ++ customPlanningStrategies
 
-  override def strategies: Seq[Strategy] = {
-experimentalMethods.extraStrategies ++
-  extraPlanningStrategies ++ Seq(
-  FileSourceStrategy,
-  DataSourceStrategy(conf),
-  SpecialLimits,
-  InMemoryScans,
-  HiveTableScans,
-  Scripts,
-  Aggregation,
-  JoinSelection,
-  BasicOperators
-)
-  }
+  override def strategies: Seq[Strategy] = Seq(HiveTableScans, Scripts) ++ super.strategies
--- End diff --

This breaks the assumption that `experimentalMethods.extraStrategies` 
should always run first.

I think we can just do:
```
override def extraPlanningStrategies: Seq[Strategy] =
  super.extraPlanningStrategies ++ customPlanningStrategies ++
    Seq(HiveTableScans, Scripts)
```
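
To show why the two variants differ, here is a small self-contained sketch. The builder classes and the use of plain strings for strategies are hypothetical stand-ins, and the base ordering (experimental strategies, then extras, then built-ins) is assumed from the removed code in the diff above.

```scala
// Toy stand-ins; strategy names are plain strings, not Spark's Strategy objects.
class BaseBuilder(experimental: Seq[String]) {
  def extraPlanningStrategies: Seq[String] = Nil
  // Assumed base ordering: experimental first, then extras, then built-ins.
  def strategies: Seq[String] =
    experimental ++ extraPlanningStrategies ++ Seq("FileSourceStrategy", "BasicOperators")
}

// Variant from the diff: prepend the Hive strategies to the whole list.
class PrependingBuilder(experimental: Seq[String]) extends BaseBuilder(experimental) {
  override def strategies: Seq[String] =
    Seq("HiveTableScans", "Scripts") ++ super.strategies
}

// Variant from the suggestion: contribute them through the extras hook.
class HookBuilder(experimental: Seq[String]) extends BaseBuilder(experimental) {
  override def extraPlanningStrategies: Seq[String] =
    super.extraPlanningStrategies ++ Seq("HiveTableScans", "Scripts")
}

object StrategyOrderDemo {
  def main(args: Array[String]): Unit = {
    val exp = Seq("MyExperimentalStrategy")
    // Prepending pushes the experimental strategy out of first place.
    println(new PrependingBuilder(exp).strategies.mkString(", "))
    // The hook keeps the experimental strategy first.
    println(new HookBuilder(exp).strategies.mkString(", "))
  }
}
```

In this toy model the hook variant also places the Hive strategies ahead of the built-in ones, which seems to fit the earlier observation that moving them to the end fails tests.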


---




[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20305
  
**[Test build #86326 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86326/testReport)**
 for PR 20305 at commit 
[`094b7eb`](https://github.com/apache/spark/commit/094b7ebbaf7bfe75e706cf42565f0c077938e821).


---




[GitHub] spark issue #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType castin...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20306
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86314/
Test FAILed.


---




[GitHub] spark issue #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType castin...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20306
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20305
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86318/
Test FAILed.


---




[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20298
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86315/
Test FAILed.


---




[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20276
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86324/
Test FAILed.


---




[GitHub] spark issue #20306: [SPARK-23054][SQL][PYSPARK][FOLLOWUP] Use sqlType castin...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20306
  
**[Test build #86314 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86314/testReport)**
 for PR 20306 at commit 
[`74c1735`](https://github.com/apache/spark/commit/74c17353bb6372b123c5aee1b6d58a21de36f99a).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20276
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20305
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20298
  
**[Test build #86315 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86315/testReport)**
 for PR 20298 at commit 
[`38916f7`](https://github.com/apache/spark/commit/38916f769252938fbce891cf1d21972e50a01181).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20298: [SPARK-22976][Core]: Cluster mode driver dir removed whi...

2018-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20298
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hive Sess...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20305
  
**[Test build #86318 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86318/testReport)**
 for PR 20305 at commit 
[`f17b44d`](https://github.com/apache/spark/commit/f17b44de6e4d2ece008d3856fdcc037cce7dd147).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-01-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20276
  
**[Test build #86324 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86324/testReport)**
 for PR 20276 at commit 
[`d0bdddf`](https://github.com/apache/spark/commit/d0bdddfffc18258ba1536c9cff4ea0856026094c).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20305: [SPARK-23140][SQL] Add DataSourceV2Strategy to Hi...

2018-01-18 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20305#discussion_r162270415
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala 
---
@@ -101,6 +102,7 @@ class HiveSessionStateBuilder(session: SparkSession, parentState: Option[Session
   override def strategies: Seq[Strategy] = {
 experimentalMethods.extraStrategies ++
   extraPlanningStrategies ++ Seq(
--- End diff --

Looks like the ordering matters: if I put the Hive-related strategies at the 
end, some unit tests fail.


---



