[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21413
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3689/
Test PASSed.


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-29 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21427
  
I guess sending configurations is not that difficult.
We can write configs (as a `Map[String, String]`, to allow further configurations 
in the future?) before `PythonUDFRunner.writeUDFs(dataOut, funcs, argOffsets)` 
in `ArrowPythonRunner.writeCommand()` (and `PythonUDFRunner.writeCommand()`?), 
and read them before reading the UDFs in `worker.py`. The `timezone` can be 
included in the configs.
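
For illustration, a minimal sketch (my own, not the merged implementation) of how `worker.py` could read such a config map before reading the UDFs. `read_int` and `UTF8Deserializer` are existing helpers in `pyspark.serializers`; the wire layout (an entry count followed by length-prefixed key/value strings) is an assumption:

```python
from pyspark.serializers import read_int, UTF8Deserializer

utf8_deserializer = UTF8Deserializer()

def read_runner_conf(infile):
    # Assumed layout: the JVM writes the number of entries, then each
    # key and value as length-prefixed UTF-8 strings.
    runner_conf = {}
    for _ in range(read_int(infile)):
        key = utf8_deserializer.loads(infile)
        value = utf8_deserializer.loads(infile)
        runner_conf[key] = value
    return runner_conf

# e.g. timezone = read_runner_conf(infile).get("spark.sql.session.timeZone")
```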


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3555/



---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21427
  
Yup, my impression was that there could be a corner case too, but I wasn't 
sure how much the corner case makes sense, and I haven't checked it closely yet. 
I believe elaborating on the case might be helpful to judge whether we should 
block this or not. The current approach looks fine in general to me, though. I 
think it's fine if it's a bit of a behaviour change as long as we mention it in 
the migration guide. cc @cloud-fan too.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3555/



---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3688/
Test PASSed.


---




[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21437
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21437
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91274/
Test FAILed.


---




[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21437
  
**[Test build #91274 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91274/testReport)** for PR 21437 at commit [`9d95c12`](https://github.com/apache/spark/commit/9d95c12a0ada0520f426723406a7d99aada2760d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21413
  
**[Test build #91282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91282/testReport)** for PR 21413 at commit [`d8f3906`](https://github.com/apache/spark/commit/d8f3906be4d4178d3c41bff41eaeb39f430ade6b).


---




[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-29 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21413#discussion_r191611779
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -619,6 +627,22 @@ def getSubsamplingRate(self):
 """
 return self.getOrDefault(self.subsamplingRate)
 
+@since("1.4.0")
+def setFeatureSubsetStrategy(self, value):
+"""
+Sets the value of :py:attr:`featureSubsetStrategy`.
+
+.. note:: Deprecated in 2.1.0 and will be removed in 2.4.0.
--- End diff --

Sorry.  Fixed. 


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91281 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91281/testReport)** for PR 21366 at commit [`5b9c00f`](https://github.com/apache/spark/commit/5b9c00fa39d1c83435ca65de5394345e5d6f1f00).


---




[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-29 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191611421
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

Thanks @BryanCutler for catching this stupid stuff. I'm not that comfortable 
with Python. 


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91273/
Test FAILed.


---




[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

2018-05-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21288#discussion_r191610297
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
@@ -131,211 +132,214 @@ object FilterPushdownBenchmark {
 }
 
 /*
OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 4.14.26-46.32.amzn1.x86_64
Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Select 0 string row (value IS NULL):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
---------------------------------------------------------------------------------------------
-Parquet Vectorized                         8452 / 8504          1.9         537.3       1.0X
-Parquet Vectorized (Pushdown)               274 /  281         57.3          17.4      30.8X
-Native ORC Vectorized                      8167 / 8185          1.9         519.3       1.0X
-Native ORC Vectorized (Pushdown)            365 /  379         43.1          23.2      23.1X
+Parquet Vectorized                         2961 / 3123          5.3         188.3       1.0X
+Parquet Vectorized (Pushdown)              3057 / 3121          5.1         194.4       1.0X
--- End diff --

I have not tried it yet, but is it related to the recent change we made in 
the parquet reader?


---




[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91273/testReport)** for PR 21366 at commit [`b30ed39`](https://github.com/apache/spark/commit/b30ed39ebecc72cadfc9ec20b135d60f618762a4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-29 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21413#discussion_r191609540
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -619,6 +627,22 @@ def getSubsamplingRate(self):
 """
 return self.getOrDefault(self.subsamplingRate)
 
+@since("1.4.0")
+def setFeatureSubsetStrategy(self, value):
+"""
+Sets the value of :py:attr:`featureSubsetStrategy`.
+
+.. note:: Deprecated in 2.1.0 and will be removed in 2.4.0.
--- End diff --

sorry, this should be `.. note:: Deprecated in 2.4.0 and will be removed in 3.0.0.`


---




[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21451
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21451
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91271/
Test PASSed.


---




[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

2018-05-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21453
  
Jenkins, add to whitelist.



---




[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

2018-05-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21453
  
Jenkins, test this please.



---




[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21451
  
**[Test build #91271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91271/testReport)** for PR 21451 at commit [`68c5d5f`](https://github.com/apache/spark/commit/68c5d5f5f60da7cbc0ce356acd8e5ab31db70ea5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

2018-05-29 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21454
  
LGTM



---




[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...

2018-05-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21442#discussion_r191607288
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -219,10 +219,15 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
 object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
 case q: LogicalPlan => q transformExpressionsDown {
-  case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+  case In(v, list) if list.isEmpty =>
+// When v is not nullable, the following expression will be optimized
+// to FalseLiteral which is tested in OptimizeInSuite.scala
+If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
   case expr @ In(v, list) if expr.inSetConvertible =>
 val newList = ExpressionSet(list).toSeq
-if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) {
+if (newList.length == 1) {
+  EqualTo(v, newList.head)
--- End diff --

This will fail, since the schema mismatches when the data type is a struct. 
The test cases were added a few days ago: 
https://github.com/apache/spark/pull/21425
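
As an aside, the empty-list rewrite quoted in the diff above encodes SQL three-valued logic; a tiny Python truth table of the intended semantics (illustrative only):

```python
def empty_in(v):
    # Mirrors If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType)):
    # NULL stays NULL (None), any non-null value yields FALSE.
    return None if v is None else False

for v in (1, None):
    print(v, "IN () ->", empty_in(v))  # 1 -> False, None -> None
```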


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-29 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/21427
  
I'm sorry for the late review, but I think the current fix is still a 
behavior change.


---




[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21413
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21413
  
**[Test build #91280 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91280/testReport)** for PR 21413 at commit [`714ab33`](https://github.com/apache/spark/commit/714ab3338f952c13c1a306b50bb967c605a38076).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21413
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91280/
Test PASSed.


---




[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21413
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3687/
Test PASSed.


---




[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21413
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Python GBT...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21413
  
**[Test build #91280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91280/testReport)** for PR 21413 at commit [`714ab33`](https://github.com/apache/spark/commit/714ab3338f952c13c1a306b50bb967c605a38076).


---




[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-29 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21413#discussion_r191602398
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -619,6 +627,22 @@ def getSubsamplingRate(self):
 """
 return self.getOrDefault(self.subsamplingRate)
 
+@since("1.4.0")
+def setFeatureSubsetStrategy(self, value):
+"""
+Sets the value of :py:attr:`featureSubsetStrategy`.
+
+.. note:: Deprecated in 2.1.0 and will be removed in 3.0.0.
--- End diff --

Fixed. Thanks!


---




[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

2018-05-29 Thread PenguinToast
Github user PenguinToast commented on the issue:

https://github.com/apache/spark/pull/21454
  
Retest this please


---




[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21454
  
**[Test build #91279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91279/testReport)** for PR 21454 at commit [`badbf0e`](https://github.com/apache/spark/commit/badbf0e6766a99565e061063041f231d119d6d3a).


---




[GitHub] spark issue #21346: [SPARK-6237][NETWORK] Network-layer changes to allow str...

2018-05-29 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21346
  
So, one thing that I was thinking about is whether it would be worth it to 
make error handling a little better here. I think this is no worse than the 
current status quo, and looking at the related PR I'm not sure how much better 
this would make things, but...

The current implementation sends a  "header" message + the streamed payload 
as a single RPC, so there's a single opportunity for the receiver to return an 
error. That means that if, for example, the receiver does not have enough space 
to store a block that is being uploaded, it can return an error, but the sender 
will still try to send all the block data to the receiver (which will just 
ignore it).

I'm wondering if it would be worth trying to implement this as a couple of 
"chained RPCs": one that sends the metadata and a second one that streams the 
data. That way the receiver can error out on the first RPC and the sender can 
just throw away the second RPC, instead of having to transfer everything.

It might create the "some state needs to be stored somewhere" problem on 
the receiver side, though. Haven't really thought that far yet.
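
A toy sketch of the two-phase idea, with all names hypothetical (this is not Spark's network layer):

```python
class Receiver(object):
    """Stands in for the remote end; tracks how much space is left."""
    def __init__(self, free_bytes):
        self.free_bytes = free_bytes

    def prepare_upload(self, meta):
        # RPC 1: metadata only, so the receiver can reject before any data moves.
        if meta["size"] > self.free_bytes:
            return False, "not enough space"
        return True, None

    def receive_chunk(self, chunk):
        # RPC 2: the streamed payload, sent only after acceptance.
        self.free_bytes -= len(chunk)

def upload_block(receiver, meta, chunks):
    ok, reason = receiver.prepare_upload(meta)
    if not ok:
        raise RuntimeError("upload rejected: " + reason)
    for chunk in chunks:
        receiver.receive_chunk(chunk)

upload_block(Receiver(free_bytes=1 << 20), {"size": 4096}, [b"x" * 4096])
```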


---




[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21437
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3686/
Test PASSed.


---




[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21437
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21437
  
**[Test build #91277 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91277/testReport)** for PR 21437 at commit [`2ea9cbc`](https://github.com/apache/spark/commit/2ea9cbc80787f1417fa4502c3c2b9b89f46d0632).


---




[GitHub] spark issue #21428: [SPARK-24235][SS] Implement continuous shuffle writer fo...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21428
  
**[Test build #91278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91278/testReport)** for PR 21428 at commit [`65837ac`](https://github.com/apache/spark/commit/65837ac611991f2db7710d0657e56b222a2f5c74).


---




[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21454
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91270/
Test FAILed.


---




[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-05-29 Thread jose-torres
Github user jose-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/21428#discussion_r191596882
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/shuffle/ContinuousShuffleSuite.scala ---
@@ -40,22 +60,129 @@ class ContinuousShuffleReadSuite extends StreamTest {
 messages.foreach(endpoint.askSync[Unit](_))
   }
 
-  // In this unit test, we emulate that we're in the task thread where
-  // ContinuousShuffleReadRDD.compute() will be evaluated. This requires a task context
-  // thread local to be set.
-  var ctx: TaskContextImpl = _
+  private def readRDDEndpoint(rdd: ContinuousShuffleReadRDD) = {
+rdd.partitions(0).asInstanceOf[ContinuousShuffleReadPartition].endpoint
+  }
 
-  override def beforeEach(): Unit = {
-super.beforeEach()
-ctx = TaskContext.empty()
-TaskContext.setTaskContext(ctx)
+  private def readEpoch(rdd: ContinuousShuffleReadRDD) = {
+rdd.compute(rdd.partitions(0), ctx).toSeq.map(_.getInt(0))
   }
 
-  override def afterEach(): Unit = {
-ctx.markTaskCompleted(None)
-TaskContext.unset()
-ctx = null
-super.afterEach()
+  test("one epoch") {
--- End diff --

Reordered.


---




[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21454
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-29 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191596607
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

Right. 


---




[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21454
  
**[Test build #91270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91270/testReport)** for PR 21454 at commit [`f198f28`](https://github.com/apache/spark/commit/f198f28b1a3d7380a09e5687438a264101cc6965).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF shou...

2018-05-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21427#discussion_r191596459
  
--- Diff: python/pyspark/worker.py ---
@@ -111,9 +114,16 @@ def wrapped(key_series, value_series):
 "Number of columns of the returned pandas.DataFrame "
 "doesn't match specified schema. "
 "Expected: {} Actual: {}".format(len(return_type), len(result.columns)))
-arrow_return_types = (to_arrow_type(field.dataType) for field in return_type)
-return [(result[result.columns[i]], arrow_type)
-for i, arrow_type in enumerate(arrow_return_types)]
+try:
+# Assign result columns by schema name
+return [(result[field.name], to_arrow_type(field.dataType)) for field in return_type]
+except KeyError:
+if all(not isinstance(name, basestring) for name in result.columns):
+# Assign result columns by position if they are not named with strings
+return [(result[result.columns[i]], to_arrow_type(field.dataType))
+for i, field in enumerate(return_type)]
+else:
+raise
--- End diff --

Ah, I saw you added documentation for this behavior. Looks good.
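
A quick pandas illustration of the two paths in the patched logic above (assumes only that pandas is available):

```python
import pandas as pd

named = pd.DataFrame({"id": [1], "v": [2.0]})  # string columns: matched by name
positional = pd.DataFrame([[1, 2.0]])          # integer columns 0, 1: by position

print(list(named.columns))       # ['id', 'v']
print(list(positional.columns))  # [0, 1] -> triggers the KeyError fallback
```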


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191594326
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
--- End diff --

Do we need `u` here?


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191594348
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
--- End diff --

ditto.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191593987
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
+vertical = False
+return self._jdf.showString(
+console_row, console_truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you're
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+(eager_eval, console_row, console_truncate) = self._get_repl_config()
+if eager_eval:
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+console_row, console_truncate, vertical)
--- End diff --

ditto.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191591921
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
--- End diff --

How about declaring those as `@property`?
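
A sketch of that suggestion (illustrative, not the merged code), with the three configs exposed as lazily-read properties on a minimal stand-in class:

```python
class DataFrameLike(object):
    # Minimal stand-in for pyspark.sql.DataFrame, just to show the shape.
    def __init__(self, sql_ctx):
        self.sql_ctx = sql_ctx

    @property
    def _eager_eval(self):
        return self.sql_ctx.getConf(
            "spark.sql.repl.eagerEval.enabled", "false").lower() == "true"

    @property
    def _max_num_rows(self):
        return int(self.sql_ctx.getConf(
            "spark.sql.repl.eagerEval.maxNumRows", "20"))

    @property
    def _truncate(self):
        return int(self.sql_ctx.getConf(
            "spark.sql.repl.eagerEval.truncate", "20"))
```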


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191591799
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
--- End diff --

What's `_support_repr_html` for?


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191593927
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
+vertical = False
+return self._jdf.showString(
+console_row, console_truncate, vertical)
--- End diff --

I guess

```python
return self._jdf.showString(
console_row, console_truncate, vertical=False)
```

should work without `vertical` variable.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191591455
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -291,37 +289,57 @@ class Dataset[T] private[sql](
 }
   }
 
+  rows = rows.map {
+_.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell, colWidths(i))
+  } else {
+StringUtils.rightPad(cell, colWidths(i))
+  }
+}
+  }
--- End diff --

We should do this in `showString`? And we can move `minimumColWidth` into 
the `showString` in that case?
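
The padding step in question, as a Python analogue (illustrative): right-align when truncating, left-align otherwise.

```python
def pad(cell, width, truncate):
    # Mirrors StringUtils.leftPad / rightPad in the hunk above.
    return cell.rjust(width) if truncate > 0 else cell.ljust(width)

print(pad("abc", 6, truncate=20))  # '   abc'
print(pad("abc", 6, truncate=0))   # 'abc   '
```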


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191595442
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -231,16 +234,17 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * Compose the string representing rows for output
+   * Get rows represented in Sequence by specific truncate and vertical requirement.
    *
-   * @param _numRows Number of rows to show
+   * @param numRows Number of rows to return
    * @param truncate If set to more than 0, truncates strings to `truncate` characters and
    *   all cells will be aligned right.
-   * @param vertical If set to true, prints output rows vertically (one line per column value).
+   * @param vertical If set to true, the rows to return don't need truncate.
    */
-  private[sql] def showString(
-  _numRows: Int, truncate: Int = 20, vertical: Boolean = false): String = {
-val numRows = _numRows.max(0).min(Int.MaxValue - 1)
--- End diff --

Don't we need to check the `numRows` range when called from 
`getRowsToPython`?
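
For reference, the removed check clamps the requested row count; a one-line Python analogue (illustrative):

```python
def clamp_num_rows(n, int_max=2**31 - 1):
    # Mirrors _numRows.max(0).min(Int.MaxValue - 1) from the old showString.
    return max(0, min(n, int_max - 1))
```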


---




[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...

2018-05-29 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21442#discussion_r191595951
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -219,10 +219,15 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
 object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
 case q: LogicalPlan => q transformExpressionsDown {
-  case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+  case In(v, list) if list.isEmpty =>
+// When v is not nullable, the following expression will be optimized
+// to FalseLiteral which is tested in OptimizeInSuite.scala
+If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
   case expr @ In(v, list) if expr.inSetConvertible =>
 val newList = ExpressionSet(list).toSeq
-if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) {
+if (newList.length == 1) {
--- End diff --

Sounds good.


---




[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...

2018-05-29 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21450
  
This doesn't seem to be addressing the issue reported in the bug. The exact 
same error happens with your patch:

```
$ ./bin/run-example 
Exception in thread "main" java.lang.IllegalArgumentException: Missing application resource.
    at org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:241)
    at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:185)
    at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:300)
    at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:166)
    at org.apache.spark.launcher.Main.main(Main.java:86)
```



---




[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21450
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21450
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91267/
Test PASSed.


---




[GitHub] spark pull request #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocal...

2018-05-29 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21437#discussion_r191589537
  
--- Diff: python/pyspark/taskcontext.py ---
@@ -88,3 +89,9 @@ def taskAttemptId(self):
 TaskAttemptID.
 """
 return self._taskAttemptId
+
+def getLocalProperty(self, key):
+"""
+Get a local property set upstream in the driver, or None if it is missing.
--- End diff --

If it's missing it will result in a `KeyError`, maybe you want `return 
self._localProperties.get(key)` which returns `None` as the default?  That 
seems better to me too, although you might want to add an optional `default` 
value.
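
A sketch of the suggested semantics (illustrative; the optional `default` parameter is the extra idea floated above):

```python
class TaskContextSketch(object):
    def __init__(self, local_properties):
        self._localProperties = local_properties

    def getLocalProperty(self, key, default=None):
        # dict.get returns `default` instead of raising KeyError,
        # unlike self._localProperties[key].
        return self._localProperties.get(key, default)

ctx = TaskContextSketch({"group": "etl"})
print(ctx.getLocalProperty("group"))    # 'etl'
print(ctx.getLocalProperty("missing"))  # None
```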


---




[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21450
  
**[Test build #91267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91267/testReport)** for PR 21450 at commit [`a69850b`](https://github.com/apache/spark/commit/a69850b6fdcbe2e234e70a597d9ad6beae6a6937).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21457: [SPARK-24414][ui] Calculate the correct number of tasks ...

2018-05-29 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/21457
  
+1 pending SparkQA; changes look good, and I manually verified against both 
of the JIRA use cases.


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21427
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91275/
Test PASSed.


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21427
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21427
  
**[Test build #91275 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91275/testReport)** for PR 21427 at commit [`e322e1a`](https://github.com/apache/spark/commit/e322e1a1caa6cf422ed8b33244656345e4c13bb3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...

2018-05-29 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21442#discussion_r191585661
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -219,10 +219,15 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
 object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
 case q: LogicalPlan => q transformExpressionsDown {
-  case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+  case In(v, list) if list.isEmpty =>
+// When v is not nullable, the following expression will be optimized
+// to FalseLiteral which is tested in OptimizeInSuite.scala
+If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
   case expr @ In(v, list) if expr.inSetConvertible =>
 val newList = ExpressionSet(list).toSeq
-if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) {
+if (newList.length == 1) {
+  EqualTo(v, newList.head)
+} else if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) {
   val hSet = newList.map(e => e.eval(EmptyRow))
   InSet(v, HashSet() ++ hSet)
 } else if (newList.size < list.size) {
--- End diff --

nit: In line 235 the comment 
```// newList.length == list.length```
can be updated as 
```// newList.length == list.length && newList.length > 1```



---




[GitHub] spark issue #21454: [SPARK-24337][Core] Improve error messages for Spark con...

2018-05-29 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21454
  
IIUC this PR prints the config key in the error message if the config 
value (either the default or the one read from the config map) can't be cast 
properly. Personally I think it adds some value to include this change. I only 
have some nits.


---




[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...

2018-05-29 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21442#discussion_r191585050
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -219,10 +219,15 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
 object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
 case q: LogicalPlan => q transformExpressionsDown {
-  case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+  case In(v, list) if list.isEmpty =>
+// When v is not nullable, the following expression will be optimized
+// to FalseLiteral which is tested in OptimizeInSuite.scala
+If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
   case expr @ In(v, list) if expr.inSetConvertible =>
 val newList = ExpressionSet(list).toSeq
-if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) {
+if (newList.length == 1) {
+  EqualTo(v, newList.head)
+} else if (newList.size > SQLConf.get.optimizerInSetConversionThreshold) {
--- End diff --

nit: size => length
because we use `length` in the previous `if`


---




[GitHub] spark pull request #21454: [SPARK-24337][Core] Improve error messages for Sp...

2018-05-29 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21454#discussion_r191584812
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -448,6 +473,20 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
 */
   private[spark] def getenv(name: String): String = System.getenv(name)
 
+  /**
+   * Wrapper method for get*() methods which require some specific value format. This catches
+   * any [[NumberFormatException]] or [[IllegalArgumentException]] and re-raises it with the
+   * incorrectly configured key in the exception message.
+   */
+  private def catchIllegalArgument[T](key: String)(getValue: => T): T = {
--- End diff --

According to what it actually does, `catchIllegalArgument` doesn't seem to be 
a great name for this function; maybe `catchIllegalValue`?
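
A Python analogue of the wrapper under discussion (illustrative only; the real code is Scala): run the getter and re-raise conversion errors with the offending key in the message.

```python
def catch_illegal_value(key, get_value):
    try:
        return get_value()
    except ValueError as e:
        raise ValueError(
            "Illegal value for config key %s: %s" % (key, e))

# catch_illegal_value("spark.executor.cores", lambda: int("four"))
# -> ValueError: Illegal value for config key spark.executor.cores: ...
```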


---




[GitHub] spark pull request #21454: [SPARK-24337][Core] Improve error messages for Sp...

2018-05-29 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21454#discussion_r191582665
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -394,23 +407,35 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
   }
 
 
-  /** Get a parameter as an integer, falling back to a default if not set */
-  def getInt(key: String, defaultValue: Int): Int = {
+  /**
+   * Get a parameter as an integer, falling back to a default if not set
+   * @throws IllegalArgumentException If the value can't be interpreted as an integer
+   */
+  def getInt(key: String, defaultValue: Int): Int = catchIllegalArgument(key) {
 getOption(key).map(_.toInt).getOrElse(defaultValue)
   }
 
-  /** Get a parameter as a long, falling back to a default if not set */
-  def getLong(key: String, defaultValue: Long): Long = {
+  /**
+   * Get a parameter as a long, falling back to a default if not set
+   * @throws IllegalArgumentException If the value can't be interpreted as an long
+   */
+  def getLong(key: String, defaultValue: Long): Long = catchIllegalArgument(key) {
 getOption(key).map(_.toLong).getOrElse(defaultValue)
   }
 
-  /** Get a parameter as a double, falling back to a default if not set */
-  def getDouble(key: String, defaultValue: Double): Double = {
+  /**
+   * Get a parameter as a double, falling back to a default if not ste
+   * @throws IllegalArgumentException If the value can't be interpreted as an double
+   */
+  def getDouble(key: String, defaultValue: Double): Double = catchIllegalArgument(key) {
 getOption(key).map(_.toDouble).getOrElse(defaultValue)
   }
 
-  /** Get a parameter as a boolean, falling back to a default if not set */
-  def getBoolean(key: String, defaultValue: Boolean): Boolean = {
+  /**
+   * Get a parameter as a boolean, falling back to a default if not set
+   * @throws IllegalArgumentException If the value can't be interpreted as an boolean
--- End diff --

nit: `an boolean` -> `a boolean`


---




[GitHub] spark pull request #21454: [SPARK-24337][Core] Improve error messages for Sp...

2018-05-29 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21454#discussion_r191582611
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -394,23 +407,35 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
   }
 
 
-  /** Get a parameter as an integer, falling back to a default if not set */
-  def getInt(key: String, defaultValue: Int): Int = {
+  /**
+   * Get a parameter as an integer, falling back to a default if not set
+   * @throws IllegalArgumentException If the value can't be interpreted as an integer
+   */
+  def getInt(key: String, defaultValue: Int): Int = catchIllegalArgument(key) {
 getOption(key).map(_.toInt).getOrElse(defaultValue)
   }
 
-  /** Get a parameter as a long, falling back to a default if not set */
-  def getLong(key: String, defaultValue: Long): Long = {
+  /**
+   * Get a parameter as a long, falling back to a default if not set
+   * @throws IllegalArgumentException If the value can't be interpreted as an long
+   */
+  def getLong(key: String, defaultValue: Long): Long = catchIllegalArgument(key) {
 getOption(key).map(_.toLong).getOrElse(defaultValue)
   }
 
-  /** Get a parameter as a double, falling back to a default if not set */
-  def getDouble(key: String, defaultValue: Double): Double = {
+  /**
+   * Get a parameter as a double, falling back to a default if not ste
+   * @throws IllegalArgumentException If the value can't be interpreted as an double
--- End diff --

nit: `an double` -> `a double`


---




[GitHub] spark pull request #21454: [SPARK-24337][Core] Improve error messages for Sp...

2018-05-29 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21454#discussion_r191582499
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -394,23 +407,35 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
   }
 
 
-  /** Get a parameter as an integer, falling back to a default if not set */
-  def getInt(key: String, defaultValue: Int): Int = {
+  /**
+   * Get a parameter as an integer, falling back to a default if not set
+   * @throws IllegalArgumentException If the value can't be interpreted as an integer
+   */
+  def getInt(key: String, defaultValue: Int): Int = catchIllegalArgument(key) {
 getOption(key).map(_.toInt).getOrElse(defaultValue)
   }
 
-  /** Get a parameter as a long, falling back to a default if not set */
-  def getLong(key: String, defaultValue: Long): Long = {
+  /**
+   * Get a parameter as a long, falling back to a default if not set
+   * @throws IllegalArgumentException If the value can't be interpreted as an long
--- End diff --

nit: `an long` -> `a long`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
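
A note on the pattern under review in the hunks above: the `catchIllegalArgument(key)` wrapper they introduce is not itself shown in the diff. A minimal sketch of what such a helper could look like, assuming it only rethrows parse failures as `IllegalArgumentException` naming the offending key (the actual helper in PR #21454 may differ):

    private def catchIllegalArgument[T](key: String)(getValue: => T): T = {
      try {
        getValue
      } catch {
        // NumberFormatException (from _.toInt / _.toLong / _.toDouble) is a
        // subclass of IllegalArgumentException, and _.toBoolean throws
        // IllegalArgumentException directly, so one case covers all getters.
        case e: IllegalArgumentException =>
          throw new IllegalArgumentException(
            s"Illegal value for config key $key: ${e.getMessage}", e)
      }
    }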



[GitHub] spark issue #21457: [SPARK-24414][ui] Calculate the correct number of tasks ...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21457
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3685/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21457: [SPARK-24414][ui] Calculate the correct number of tasks ...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21457
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21413: [SPARK-23161][PYSPARK][ML]Add missing APIs to Pyt...

2018-05-29 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21413#discussion_r191581932
  
--- Diff: python/pyspark/ml/regression.py ---
@@ -619,6 +627,22 @@ def getSubsamplingRate(self):
         """
         return self.getOrDefault(self.subsamplingRate)
 
+    @since("1.4.0")
+    def setFeatureSubsetStrategy(self, value):
+        """
+        Sets the value of :py:attr:`featureSubsetStrategy`.
+
+        .. note:: Deprecated in 2.1.0 and will be removed in 3.0.0.
--- End diff --

This should technically be marked as deprecated in 2.4.0, even though the 
Scala version was deprecated earlier.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21409
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91269/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21409
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21409: [SPARK-24365][SQL] Add Data Source write benchmark

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21409
  
**[Test build #91269 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91269/testReport)** for PR 21409 at commit [`e90fa00`](https://github.com/apache/spark/commit/e90fa00e8963eb985bdd30d9a262c61f6ca1ce61).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21457: [SPARK-24414][ui] Calculate the correct number of tasks ...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21457
  
**[Test build #91276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91276/testReport)** for PR 21457 at commit [`40b6cb7`](https://github.com/apache/spark/commit/40b6cb7117598560d91bf6efb148c482eadd8daf).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21457: [SPARK-24414][ui] Calculate the correct number of...

2018-05-29 Thread vanzin
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/21457

[SPARK-24414][ui] Calculate the correct number of tasks for a stage.

This change takes into account all non-pending tasks when calculating
the number of tasks to be shown. This also means that when the stage
is pending, the task table (or, in fact, most of the data in the stage
page) will not be rendered.

I also fixed the label when the known number of tasks is larger than
the recorded number of tasks (it was inverted).


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-24414

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21457.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21457


commit 40b6cb7117598560d91bf6efb148c482eadd8daf
Author: Marcelo Vanzin 
Date:   2018-05-29T21:12:12Z

[SPARK-24414][ui] Calculate the correct number of tasks for a stage.

This change takes into account all non-pending tasks when calculating
the number of tasks to be shown. This also means that when the stage
is pending, the task table (or, in fact, most of the data in the stage
page) will not be rendered.

I also fixed the label when the known number of tasks is larger than
the recorded number of tasks (it was inverted).




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21427
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21427
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3684/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

2018-05-29 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/21453
  
Here is the issue on the Scala side: https://github.com/scala/bug/issues/10913


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21427
  
**[Test build #91275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91275/testReport)** for PR 21427 at commit [`e322e1a`](https://github.com/apache/spark/commit/e322e1a1caa6cf422ed8b33244656345e4c13bb3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

2018-05-29 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/21453
  
I'm also looking at this issue. The challenge is that one of the hacks we 
use to initialize Spark before the REPL sees any files was removed in Scala 2.11.12.


https://github.com/apache/spark/blob/master/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala#L109

We might need to work with the Scala team to upgrade our Scala version.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
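
For readers without the repl code at hand: the hack in question pre-initializes the interpreter by quietly evaluating bootstrap lines before any user input. A rough sketch of the idea, assuming a `processLine`-style hook like the one the 2.11 REPL exposed (illustrative only, not the exact code at the link above):

    // Evaluate bootstrap statements so that `spark`, `sc` and the usual
    // imports already exist when the user's first line is compiled.
    def initializeSpark(processLine: String => Unit): Unit = {
      processLine("""
        @transient val spark = org.apache.spark.repl.Main.createSparkSession()
        @transient val sc = spark.sparkContext
        """)
      processLine("import org.apache.spark.SparkContext._")
      processLine("import spark.implicits._")
    }

Per the comment above, Scala 2.11.12 removed the interception point this relied on, which is why the upgrade is not a drop-in change.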



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread daniel-shields
Github user daniel-shields commented on the issue:

https://github.com/apache/spark/pull/21449
  
This case can also occur when the datasets are different but share a common 
lineage. Consider the following:

    df = spark.range(10)
    df1 = df.groupby('id').count()
    df2 = df.groupby('id').sum('id')
    df1.join(df2, df2['id'].eqNullSafe(df1['id'])).collect()

This currently fails with eqNullSafe, but works with ==.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
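
For completeness, the same shape in Scala (a sketch assuming a live `SparkSession` named `spark`; `<=>` is the Scala spelling of `eqNullSafe`):

    val df = spark.range(10)
    val df1 = df.groupBy("id").count()
    val df2 = df.groupBy("id").sum("id")
    // df1 and df2 are different datasets but share df's lineage; per the
    // report above, the null-safe condition fails while === works.
    df1.join(df2, df2("id") <=> df1("id")).collect()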



[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-29 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21390
  
Are there any other concerns over this PR?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21437
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3683/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21437
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3549/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21455: [SPARK-24093][DStream][Minor]Make some fields of KafkaSt...

2018-05-29 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/21455
  
@jerryshao can you review this minor update?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

2018-05-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r191567638
  
--- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/IntegrationTestBackend.scala ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.k8s.integrationtest.backend
+
+import io.fabric8.kubernetes.client.DefaultKubernetesClient
+
+import org.apache.spark.deploy.k8s.integrationtest.backend.minikube.MinikubeTestBackend
+
+private[spark] trait IntegrationTestBackend {
+  def initialize(): Unit
+  def getKubernetesClient: DefaultKubernetesClient
+  def cleanUp(): Unit = {}
+}
+
+private[spark] object IntegrationTestBackendFactory {
+  val DeployModeConfigKey = "spark.kubernetes.test.deployMode"
--- End diff --

nit: lower case `d` in the var name


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
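
The quoted hunk ends before the factory body, so the following usage sketch is hypothetical (in particular, the `getTestBackend` method name is an assumption); it only illustrates the lifecycle the trait implies:

    // Acquire a backend, initialize it, and always clean up afterwards.
    val backend = IntegrationTestBackendFactory.getTestBackend()
    backend.initialize()
    try {
      val client = backend.getKubernetesClient
      // drive integration tests through the live client here
    } finally {
      backend.cleanUp()
    }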



[GitHub] spark pull request #20697: [SPARK-23010][k8s] Initial checkin of k8s integra...

2018-05-29 Thread mccheah
Github user mccheah commented on a diff in the pull request:

https://github.com/apache/spark/pull/20697#discussion_r191568423
  
--- Diff: resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh ---
@@ -0,0 +1,91 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+TEST_ROOT_DIR=$(git rev-parse --show-toplevel)
+UNPACKED_SPARK_TGZ="$TEST_ROOT_DIR/target/spark-dist-unpacked"
+IMAGE_TAG_OUTPUT_FILE="$TEST_ROOT_DIR/target/image-tag.txt"
+DEPLOY_MODE="minikube"
+IMAGE_REPO="docker.io/kubespark"
+IMAGE_TAG="N/A"
+SPARK_TGZ="N/A"
+
+# Parse arguments
+while (( "$#" )); do
+  case $1 in
+--unpacked-spark-tgz)
+  UNPACKED_SPARK_TGZ="$2"
+  shift
+  ;;
+--image-repo)
+  IMAGE_REPO="$2"
+  shift
+  ;;
+--image-tag)
+  IMAGE_TAG="$2"
+  shift
+  ;;
+--image-tag-output-file)
+  IMAGE_TAG_OUTPUT_FILE="$2"
+  shift
+  ;;
+--deploy-mode)
+  DEPLOY_MODE="$2"
+  shift
+  ;;
+--spark-tgz)
+  SPARK_TGZ="$2"
+  shift
+  ;;
+*)
+  break
+  ;;
+  esac
+  shift
+done
+
+if [[ $SPARK_TGZ == "N/A" ]];
+then
+  echo "Must specify a Spark tarball to build Docker images against with --spark-tgz." && exit 1;
--- End diff --

Can we just use the repository and not require a tarball?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21437: [SPARK-24397][PYSPARK] Added TaskContext.getLocalPropert...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21437
  
**[Test build #91274 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91274/testReport)** for PR 21437 at commit [`9d95c12`](https://github.com/apache/spark/commit/9d95c12a0ada0520f426723406a7d99aada2760d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3549/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3682/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21366: [SPARK-24248][K8S] Use the Kubernetes API to populate an...

2018-05-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-05-29 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21449
  
@daniel-shields in that case you have two different datasets, `df1` and `df2`, 
so their columns are two distinct attributes and the check `a.sameRef(b)` would 
return false. This is applied only to self-joins, i.e. when you have the same 
dataset on both sides.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
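
In Catalyst terms, "the same attribute" means "the same expression ID". A conceptual sketch of the check being described, assuming the usual `exprId` semantics rather than the literal code from the PR:

    import org.apache.spark.sql.catalyst.expressions.AttributeReference

    // Two attribute references denote the same column iff they carry the same
    // exprId, regardless of which Dataset object they are accessed through.
    def refersToSameColumn(a: AttributeReference, b: AttributeReference): Boolean =
      a.sameRef(b)  // equivalent to comparing a.exprId == b.exprId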



[GitHub] spark issue #21403: [SPARK-24341][WIP][SQL] Support IN subqueries with struc...

2018-05-29 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21403
  
@juliuszsompolski yes, you're right, sorry: SPARK-24395 uses literals, not subqueries.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21450: [SPARK-24319][SPARK SUBMIT] Fix spark-submit execution w...

2018-05-29 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/21450
  
cc @vanzin


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


