[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9180#issuecomment-150897857
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9269#issuecomment-150899254
  
Merged build started.





[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9268#issuecomment-150899253
  
Merged build started.





[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9180#issuecomment-150899252
  
Merged build started.





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9270#issuecomment-150900046
  
**[Test build #44315 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44315/consoleFull)**
 for PR 9270 at commit 
[`5e8efea`](https://github.com/apache/spark/commit/5e8efeacdf35df7281224338866a9b18207fd27f).





[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9269#issuecomment-150900013
  
**[Test build #44314 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44314/consoleFull)**
 for PR 9269 at commit 
[`2822191`](https://github.com/apache/spark/commit/2822191e1c270237ca085757721c9746ad9b5734).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `logError("Sink class " + classPath + " cannot be instantiated")`





[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9269#issuecomment-150900038
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9269#issuecomment-150900040
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44314/
Test PASSed.





[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150900314
  
**[Test build #44316 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44316/consoleFull)**
 for PR 8840 at commit 
[`cdf6dc4`](https://github.com/apache/spark/commit/cdf6dc424abba99a7fd091fca5ce2af56255f69a).





[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9180#issuecomment-150900669
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9180#issuecomment-150900670
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44313/
Test FAILed.





[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9180#issuecomment-150900664
  
**[Test build #44313 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44313/consoleFull)**
 for PR 9180 at commit 
[`0a43033`](https://github.com/apache/spark/commit/0a4303356455f28ca3b87ffd446cb5ef5f25d0e2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9267#issuecomment-150897568
  
**[Test build #44311 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44311/consoleFull)**
 for PR 9267 at commit 
[`81f667a`](https://github.com/apache/spark/commit/81f667a4537b60071cb1888ca88aa4bd0734ad2d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class FPGrowthModel[Item: ClassTag: TypeTag] @Since("1.3.0") (`





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/9270

[SPARK-9162][SQL] Implement code generation for ScalaUDF

JIRA: https://issues.apache.org/jira/browse/SPARK-9162

Currently, ScalaUDF extends CodegenFallback and does not provide a code
generation implementation. This patch implements code generation for ScalaUDF.
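The difference between the two paths can be sketched as follows. This is a toy model, not Spark's actual `Expression` API: `evalFallback` stands in for the `CodegenFallback` route (every row goes through a boxed `Any => Any` call), while `genCode` stands in for code generation (the expression emits source text that the engine would compile once and run directly). All names here are illustrative.

```scala
// Toy sketch of interpreted fallback vs. code generation for a UDF.
object ScalaUdfCodegenSketch {
  // The UDF as the engine sees it: an untyped function over Any.
  val udf: Any => Any = (x: Any) => x.asInstanceOf[Int] * 2

  // Fallback path: interpret the call per row, boxing every value.
  def evalFallback(rows: Seq[Int]): Seq[Any] =
    rows.map(row => udf(row))

  // Codegen path: produce the source text a generator would emit for this
  // expression; a real engine compiles this instead of interpreting.
  def genCode(inputExpr: String, resultTerm: String): String =
    s"Object $resultTerm = udf.apply($inputExpr);"

  def main(args: Array[String]): Unit = {
    assert(evalFallback(Seq(1, 2, 3)) == Seq(2, 4, 6))
    println(genCode("row.getInt(0)", "result1"))
  }
}
```

The payoff of the second path is that the per-row boxing and virtual dispatch of the fallback disappear from the hot loop.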



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 scalaudf-codegen

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9270.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9270


commit 5e8efeacdf35df7281224338866a9b18207fd27f
Author: Liang-Chi Hsieh 
Date:   2015-10-25T08:00:31Z

Add codegen support for ScalaUDF.







[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9268#issuecomment-150900205
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44312/
Test PASSed.





[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9268#issuecomment-150900204
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r42942564
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -128,30 +136,32 @@ private[sql] object PartitioningUtils {
   private[sql] def parsePartition(
       path: Path,
       defaultPartitionName: String,
-      typeInference: Boolean): Option[PartitionValues] = {
+      typeInference: Boolean): (Option[PartitionValues], Option[Path]) = {
--- End diff --

A base path is not always associated with a `PartitionValues`. Even if there 
is no partition, we can still have a base path.

That is why I don't change `case class PartitionValues(columnNames: 
Seq[String], literals: Seq[Literal])` into something like `case class 
PartitionValues(path: String, columnNames: Seq[String], literals: 
Seq[Literal])`.
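The distinction can be sketched with a toy model (this is not Spark's actual `PartitioningUtils`; the simplified `parsePartition` and string-based types are illustrative only). A path with no `column=value` segments yields no `PartitionValues`, yet it still has a base path, which is why the base path travels beside the values rather than inside them:

```scala
// Toy model: base path returned alongside optional partition values.
case class PartitionValues(columnNames: Seq[String], literals: Seq[String])

def parsePartition(path: String): (Option[PartitionValues], Option[String]) = {
  val segments = path.split("/").filter(_.nonEmpty).toSeq
  // Base path = prefix before the first "column=value" segment.
  val (baseSegs, partSegs) = segments.span(s => !s.contains("="))
  val basePath = Some("/" + baseSegs.mkString("/"))
  if (partSegs.isEmpty) {
    (None, basePath) // no partition columns, but the base path is still known
  } else {
    val pairs = partSegs.map { s =>
      val Array(k, v) = s.split("=", 2)
      (k, v)
    }
    (Some(PartitionValues(pairs.map(_._1), pairs.map(_._2))), basePath)
  }
}
```

Here `parsePartition("/table/data")` returns `(None, Some("/table/data"))`: no values, but a usable base path.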





[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...

2015-10-25 Thread KaiXinXiaoLei
GitHub user KaiXinXiaoLei opened a pull request:

https://github.com/apache/spark/pull/9268

[SPARK-11298] When the driver sends the message "GetExecutorLossReason" to 
the AM, the SparkContext may stop

I pulled the latest code from GitHub and ran "bin/spark-shell --master yarn 
--conf spark.dynamicAllocation.enabled=true --conf 
spark.dynamicAllocation.initialExecutors=1 --conf 
spark.shuffle.service.enabled=true". The following errors appeared:
15/10/25 12:11:02 ERROR TransportChannelHandler: Connection to 
/9.96.1.113:35066 has been quiet for 12 ms while there are outstanding 
requests. Assuming connection is dead; please adjust spark.network.timeout if 
this is wrong.
15/10/25 12:11:02 ERROR TransportResponseHandler: Still have 1 requests 
outstanding when connection from vm113/9.96.1.113:35066 is closed
15/10/25 12:11:02 WARN NettyRpcEndpointRef: Ignore message 
Failure(java.io.IOException: Connection from vm113/9.96.1.113:35066 closed)
15/10/25 12:11:02 ERROR YarnScheduler: Lost executor 1 on vm111: Slave lost

From the log, the error appears when the driver sends the message 
"GetExecutorLossReason" to the AM. From the code, I think the AM should 
reply when it receives this message.
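The failure mode can be sketched with a toy model (this is not Spark's RPC layer; the `AmMessage` type and handler functions below are hypothetical). In an ask/reply protocol, a request only completes when the endpoint replies; a handler with no case for a message leaves the driver's request outstanding until the network timeout fires, which is what the log above shows:

```scala
// Toy model: an ask completes only if the endpoint replies.
sealed trait AmMessage
case object GetExecutorLossReason extends AmMessage

// Before the fix (hypothetical): the AM has no reply for this message, so
// the ask never completes; None stands in for the eventual timeout.
def amBeforeFix(msg: AmMessage): Option[String] = msg match {
  case _ => None
}

// After the fix: the AM always sends some reply, so the driver's ask
// completes instead of timing out and tearing down the connection.
def amAfterFix(msg: AmMessage): Option[String] = msg match {
  case GetExecutorLossReason => Some("executor loss reason pending")
}
```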

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/KaiXinXiaoLei/spark replayAM

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9268


commit cf661ded1f864d9bb75293a63f758640c847010f
Author: KaiXinXiaoLei 
Date:   2015-10-25T06:33:01Z

reply







[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9268#issuecomment-150896693
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9267#issuecomment-150897588
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44311/
Test PASSed.





[GitHub] spark pull request: [SPARK-6724] [MLlib] Support model save/load f...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9267#issuecomment-150897587
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9269#issuecomment-150899436
  
**[Test build #44314 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44314/consoleFull)**
 for PR 9269 at commit 
[`2822191`](https://github.com/apache/spark/commit/2822191e1c270237ca085757721c9746ad9b5734).





[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9268#issuecomment-150899445
  
**[Test build #44312 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44312/consoleFull)**
 for PR 9268 at commit 
[`cf661de`](https://github.com/apache/spark/commit/cf661ded1f864d9bb75293a63f758640c847010f).





[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150900104
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150900105
  
Merged build started.





[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...

2015-10-25 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/8956#issuecomment-150900439
  
@rxin I am curious: although I have not observed a significant performance 
improvement from a simple projection + filter experiment so far, does pushing 
these filters down to the Parquet side let us retrieve less data and reduce 
the memory footprint? If so, even at the same performance level, is this 
patch still worth merging?
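The footprint question can be sketched with a toy model (this is not Parquet's API; both functions and the `Iterator`-based "scan" are illustrative). Without pushdown, the scan materializes every row before the engine filters; with pushdown, the predicate runs inside the scan, so non-matching rows are never collected. Throughput may look similar while peak memory differs:

```scala
// Toy illustration: where the predicate runs determines peak footprint.
def scanThenFilter(rows: Iterator[Int], p: Int => Boolean): Seq[Int] = {
  val all = rows.toVector // whole scan materialized first
  all.filter(p)           // then filtered in the engine
}

def scanWithPushdown(rows: Iterator[Int], p: Int => Boolean): Seq[Int] =
  rows.filter(p).toVector // only matching rows are ever collected
```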






[GitHub] spark pull request: [SPARK-11299][DOCS] Fix link to Scala DataFram...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9269#issuecomment-150898011
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-11299] Fix link to Scala DataFrame Func...

2015-10-25 Thread JoshRosen
GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/9269

[SPARK-11299] Fix link to Scala DataFrame Functions reference

The SQL programming guide's link to the DataFrame functions reference 
points to the wrong location; this patch fixes that.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark SPARK-11299

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9269.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9269


commit 2822191e1c270237ca085757721c9746ad9b5734
Author: Josh Rosen 
Date:   2015-10-25T07:04:22Z

Fix link to Scala DataFrame functions reference







[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9180#issuecomment-150899440
  
**[Test build #44313 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44313/consoleFull)**
 for PR 9180 at commit 
[`0a43033`](https://github.com/apache/spark/commit/0a4303356455f28ca3b87ffd446cb5ef5f25d0e2).





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9270#issuecomment-150899665
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9270#issuecomment-150899670
  
Merged build started.





[GitHub] spark pull request: [SPARK-11298] When driver sends message "GetEx...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9268#issuecomment-150900188
  
**[Test build #44312 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44312/consoleFull)**
 for PR 9268 at commit 
[`cf661de`](https://github.com/apache/spark/commit/cf661ded1f864d9bb75293a63f758640c847010f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `logError("Sink class " + classPath + " cannot be instantiated")`





[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...

2015-10-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8956#issuecomment-150904443
  
If we don't observe performance improvements, it's definitely not worth it. 
Can you post how you measured it, along with the performance results? Thanks.





[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...

2015-10-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8956#issuecomment-150905113
  
How does pushdown avoid OOM?






[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...

2015-10-25 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/8956#issuecomment-150904969
  
OK, thanks. We found that with pushed-down filters we can avoid OOM problems 
when processing large data in our daily usage, so I am wondering whether it 
is helpful to others too.

I will post the performance test results later.





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/9270#discussion_r42943239
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ---
@@ -959,6 +963,861 @@ case class ScalaUDF(
   }
   }
 
+  // Generate codes used to convert the arguments to Scala type for 
user-defined funtions
+  private[this] def genCodeForConverter(ctx: CodeGenContext, index: Int): 
String  = {
+val converterClassName = classOf[Any => Any].getName
+val typeConvertersClassName = CatalystTypeConverters.getClass.getName 
+ ".MODULE$"
+val expressionClassName = classOf[Expression].getName
+val scalaUDFClassName = classOf[ScalaUDF].getName
+
+val converterTerm = ctx.freshName("converter" + index.toString)
+ctx.addMutableState(converterClassName, converterTerm,
+  s"this.$converterTerm = 
($converterClassName)$typeConvertersClassName.createToScalaConverter(((${expressionClassName})((($scalaUDFClassName)expressions[${ctx.references.size
 - 1}]).getChildren().apply($index))).dataType());")
+converterTerm
+  }
+
+  override def genCode(
+  ctx: CodeGenContext,
+  ev: GeneratedExpressionCode): String = {
+
+ctx.references += this
+
+val scalaUDFClassName = classOf[ScalaUDF].getName
+val converterClassName = classOf[Any => Any].getName
+val typeConvertersClassName = CatalystTypeConverters.getClass.getName 
+ ".MODULE$"
+val expressionClassName = classOf[Expression].getName
+
+// Generate codes used to convert the returned value of user-defined 
functions to Catalyst type
+val catalystConverterTerm = ctx.freshName("catalystConverter")
+ctx.addMutableState(converterClassName, catalystConverterTerm,
+  s"this.$catalystConverterTerm = 
($converterClassName)$typeConvertersClassName.createToCatalystConverter((($scalaUDFClassName)expressions[${ctx.references.size
 - 1}]).dataType());")
+
+val resultTerm = ctx.freshName("result")
+
+val (evalCode, callFunc) = children.size match {
--- End diff --

Using scalaUDFClassName should not work because it is just `ScalaUDF`'s 
class name. I will try the static array approach later. Thanks.





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9270#discussion_r42942860
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
(same ScalaUDF codegen hunk as quoted in full earlier in this thread)
--- End diff --

can we put these branches in a loop?





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/9270#discussion_r42943031
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
(same ScalaUDF codegen hunk as quoted in full earlier in this thread)
--- End diff --

Hmm, part of this can be reduced as you show, but it seems we still have a 
(smaller) pattern match. I will do it later.





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/9270#discussion_r42943173
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
(same ScalaUDF codegen hunk as quoted in full earlier in this thread)
--- End diff --

But I don't think we have callFunc yet? I only create callFunc later; it is 
the code used to invoke the function in codegen.





[GitHub] spark pull request: [SPARK-11277] [SQL] sort_array throws exceptio...

2015-10-25 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/9247#issuecomment-150918873
  
Hi @jliwork , thanks for working on it!
But sorting on an array of null type doesn't make sense to me; can you check 
the behaviour of other SQL systems like Hive? And how about struct type? It's 
also order-able.





[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

2015-10-25 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/9232#discussion_r42945007
  
--- Diff: 
yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala 
---
@@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite 
with Matchers with Logging
   System.clearProperty("SPARK_YARN_MODE")
 }
   }
+
+  test("Obtain tokens For HiveMetastore") {
+val hadoopConf = new Configuration()
+hadoopConf.set("hive.metastore.kerberos.principal", "bob")
+// thrift picks up on port 0 and bails out, without trying to talk to 
endpoint
+hadoopConf.set("hive.metastore.uris", "http://localhost:0")
+val util = new YarnSparkHadoopUtil
+val e = intercept[InvocationTargetException] {
+  val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, 
"alice")
+  fail(s"Expected an exception, got the token $token")
+}
+val inner = e.getCause
+if (inner == null) {
+  fail("No inner cause", e)
+}
+if (!inner.isInstanceOf[HiveException]) {
+  fail(s"Not a hive exception", inner)
--- End diff --

I want to include the inner exception if it's of the wrong type, so that 
the JUnit/Jenkins report can diagnose the failure. An 
`assert(inner.isInstanceOf[HiveException])` will say when the exception was of 
the wrong type, but not include the details, including the stack trace.


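A minimal sketch of the point above, with stand-in names and exception types (this is not the actual Spark test code): attaching the unexpected exception as a cause, the way ScalaTest's `fail(message, cause)` does, preserves its type and stack trace in the JUnit/Jenkins report, where a bare boolean assertion would only say "false was not true".

```scala
import java.lang.reflect.InvocationTargetException

// Hypothetical helper mirroring the test's structure; IllegalStateException
// stands in for HiveException so the sketch is self-contained.
def checkInnerCause(e: Throwable): Unit = {
  val inner = e.getCause
  if (inner == null) {
    // like fail("No inner cause", e): the report keeps e's full trace
    throw new AssertionError("No inner cause", e)
  }
  if (!inner.isInstanceOf[IllegalStateException]) {
    // like fail(s"Not a hive exception", inner): the wrong exception's type
    // and trace land in the report instead of a bare boolean failure
    throw new AssertionError(s"Not the expected exception: $inner", inner)
  }
}

// The expected wrapper/cause pair passes silently:
checkInnerCause(new InvocationTargetException(new IllegalStateException("boom")))
```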



[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...

2015-10-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/9269#issuecomment-150905138
  
Thanks - I've merged it.






[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/9270#discussion_r42942936
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
(same ScalaUDF codegen hunk as quoted in full earlier in this thread)
--- End diff --

Did you mean using a script to generate it, like the `f`?





[GitHub] spark pull request: [SPARK-11299][DOC] Fix link to Scala DataFrame...

2015-10-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9269





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/9270#discussion_r42943369
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
(same ScalaUDF codegen hunk as quoted in full earlier in this thread)
--- End diff --

If you meant `function: AnyRef` in `ScalaUDF`, it is the user-defined 
function. `function.getClass.getName` will get something like ` 
org.apache.spark.sql.UDFSuite$$anonfun$10$$anonfun$apply$mcV$sp$12` in 
`UDFSuite`. So I think it should not work.


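A small illustration of viirya's point (hypothetical snippet, not Spark code): a Scala lambda compiles to a synthetic, compiler-generated class, so its runtime class name is not a stable type that generated Java source could reference or cast to, unlike `classOf[ScalaUDF].getName`, which names a real declared class.

```scala
// A plain function value compiles to a synthetic class.
val f: Int => Int = x => x + 1
val lambdaClassName = f.getClass.getName
// Typically something like "...$$anonfun$..." on Scala 2.10/2.11, or a
// LambdaMetafactory name like "...$$Lambda$..." on 2.12+; either way it
// contains synthetic '$' segments and is an implementation detail, not a
// type that generated code can compile against.
assert(lambdaClassName.contains("$"))
```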



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150911391
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44316/
Test PASSed.





[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150911390
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150911363
  
**[Test build #44316 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44316/consoleFull)**
 for PR 8840 at commit 
[`cdf6dc4`](https://github.com/apache/spark/commit/cdf6dc424abba99a7fd091fca5ce2af56255f69a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

2015-10-25 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/9232#discussion_r42944993
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -142,6 +145,104 @@ class YarnSparkHadoopUtil extends SparkHadoopUtil {
 val containerIdString = 
System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name())
 ConverterUtils.toContainerId(containerIdString)
   }
+
+  /**
+   * Obtains token for the Hive metastore, using the current user as the 
principal.
+   * Some exceptions are caught and downgraded to a log message.
+   * @param conf hadoop configuration; the Hive configuration will be 
based on this
+   * @return a token, or `None` if there's no need for a token (no 
metastore URI or principal
+   * in the config), or if a binding exception was caught and 
downgraded.
+   */
+  def obtainTokenForHiveMetastore(conf: Configuration): 
Option[Token[DelegationTokenIdentifier]] = {
+try {
+  obtainTokenForHiveMetastoreInner(conf, 
UserGroupInformation.getCurrentUser().getUserName)
+} catch {
+  case e: NoSuchMethodException =>
+logInfo("Hive Method not found", e)
+None
+  case e: ClassNotFoundException =>
--- End diff --

+1. I'd left it in there as it may have had a valid reason for being there, 
but I do think it's correct. Config problems are something to detect and throw 
up.

Note that `Client.obtainTokenForHBase()` has similar behaviour; this patch 
doesn't address it. When someone sits down to do it, the policy about how to 
react to failures could be converted into a wrapper around a closure which 
executes the token retrieval (here `obtainTokenForHiveMetastoreInner`), so 
there'd be no divergence.


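The "wrapper around a closure" idea could look roughly like this sketch (all names here are hypothetical, not the actual Spark API): one shared policy decides which reflection/classpath failures are downgraded to a log line, while genuine configuration errors are left to propagate, so Hive and HBase token retrieval never diverge.

```scala
import java.util.logging.Logger

object TokenHelper {
  private val log = Logger.getLogger("TokenHelper")

  /** Run a token-retrieval closure, downgrading "service not on classpath"
   *  failures to an info log; anything else (e.g. bad config) is rethrown. */
  def obtainTokenSafely[T](serviceName: String)(retrieve: => T): Option[T] = {
    try Some(retrieve)
    catch {
      case e: NoSuchMethodException =>
        log.info(s"$serviceName method not found: $e"); None
      case e: ClassNotFoundException =>
        log.info(s"$serviceName class not found: $e"); None
      // IllegalArgumentException (config error) deliberately not caught.
    }
  }
}

// Usage: Hive and HBase retrieval would share the same failure policy.
val t = TokenHelper.obtainTokenSafely("Hive") {
  throw new ClassNotFoundException("org.apache.hadoop.hive.ql.metadata.Hive")
}
assert(t.isEmpty)
```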



[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9180#issuecomment-150905267
  
**[Test build #44317 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44317/consoleFull)**
 for PR 9180 at commit 
[`59383fd`](https://github.com/apache/spark/commit/59383fd41f1d6b96274c564eb2fb7c96f5ab07e0).





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9270#discussion_r42942985
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
(same ScalaUDF codegen hunk as quoted in full earlier in this thread)
--- End diff --

Maybe I'm missing something, but I think you can just write a loop instead 
of having branches?

```
...
val funcClassName = callFunc.getClass.getName
...
val evals = children.map(_.gen(ctx))
...
// generate callFunc
```



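A minimal sketch of the loop-based generation being suggested (`Eval` and `genCallFunc` are hypothetical names, not the actual Spark codegen API): instead of one match arm per arity, build one converted-argument string per child and splice them into a single call.

```scala
case class Eval(code: String, value: String) // stand-in for GeneratedExpressionCode

// One converter application per argument, whatever the arity:
def genCallFunc(
    childEvals: Seq[Eval],
    converterTerms: Seq[String],
    funcTerm: String): (String, String) = {
  val convertedArgs = childEvals.zip(converterTerms).map {
    case (ev, conv) => s"$conv.apply(${ev.value})"
  }
  val evalCode = childEvals.map(_.code).mkString("\n")
  val callFunc = s"$funcTerm.apply(${convertedArgs.mkString(", ")})"
  (evalCode, callFunc)
}

val (code, call) = genCallFunc(
  Seq(Eval("int a = 1;", "a"), Eval("int b = 2;", "b")),
  Seq("converter0", "converter1"),
  "udf")
// call == "udf.apply(converter0.apply(a), converter1.apply(b))"
```

The same `map`/`mkString` shape works for any number of children, which is what removes the 22-arm pattern match.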



[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...

2015-10-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8956#issuecomment-150905272
  
Is that the case? I thought we load them one by one (or a small batch at a 
time) and then apply the filter directly on them?






[GitHub] spark pull request: [SPARK-11297] Add new code tags

2015-10-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/9265#issuecomment-150905287
  
Can you post a before and after screenshot?





[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...

2015-10-25 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/8956#issuecomment-150905206
  
Because we can pre-filter the data? Without pushdown, all the data is 
loaded into memory and only filtered afterwards.


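A toy model of the memory argument under discussion (not Spark's actual code path): applying the predicate while streaming the scan keeps only matching rows resident, whereas load-then-filter must first materialize every row. Real Parquet pushdown goes further and can skip whole row groups using column statistics.

```scala
val keep = Set(3, 7, 42)

// Without pushdown: peak memory ~ all N rows, since toVector
// materializes everything before the filter runs.
def loadThenFilter(rows: Iterator[Int]): Vector[Int] =
  rows.toVector.filter(keep.contains)

// Pushdown-style scanning: the filter runs while streaming,
// so peak memory ~ number of matching rows.
def filteredScan(rows: Iterator[Int]): Vector[Int] =
  rows.filter(keep.contains).toVector

assert(filteredScan((1 to 100).iterator) == Vector(3, 7, 42))
```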



[GitHub] spark pull request: [SPARK-11279] [PYSPARK] Add DataFrame#toDF in ...

2015-10-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9248#discussion_r42943011
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1266,6 +1266,17 @@ def drop(self, col):
 raise TypeError("col should be a string or a Column")
 return DataFrame(jdf, self.sql_ctx)
 
+def toDF(self, *cols):
--- End diff --

I think you need to add the ignore utf8 prefix annotation





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9270#discussion_r42943320
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ---
@@ -959,6 +963,861 @@ case class ScalaUDF(
   }
   }
 
+  // Generate code used to convert the arguments to Scala types for 
user-defined functions
+  private[this] def genCodeForConverter(ctx: CodeGenContext, index: Int): 
String  = {
+val converterClassName = classOf[Any => Any].getName
+val typeConvertersClassName = CatalystTypeConverters.getClass.getName 
+ ".MODULE$"
+val expressionClassName = classOf[Expression].getName
+val scalaUDFClassName = classOf[ScalaUDF].getName
+
+val converterTerm = ctx.freshName("converter" + index.toString)
+ctx.addMutableState(converterClassName, converterTerm,
+  s"this.$converterTerm = 
($converterClassName)$typeConvertersClassName.createToScalaConverter(((${expressionClassName})((($scalaUDFClassName)expressions[${ctx.references.size
 - 1}]).getChildren().apply($index))).dataType());")
+converterTerm
+  }
+
+  override def genCode(
+  ctx: CodeGenContext,
+  ev: GeneratedExpressionCode): String = {
+
+ctx.references += this
+
+val scalaUDFClassName = classOf[ScalaUDF].getName
+val converterClassName = classOf[Any => Any].getName
+val typeConvertersClassName = CatalystTypeConverters.getClass.getName 
+ ".MODULE$"
+val expressionClassName = classOf[Expression].getName
+
+// Generate code used to convert the returned value of the user-defined 
function to Catalyst type
+val catalystConverterTerm = ctx.freshName("catalystConverter")
+ctx.addMutableState(converterClassName, catalystConverterTerm,
+  s"this.$catalystConverterTerm = 
($converterClassName)$typeConvertersClassName.createToCatalystConverter((($scalaUDFClassName)expressions[${ctx.references.size
 - 1}]).dataType());")
+
+val resultTerm = ctx.freshName("result")
+
+val (evalCode, callFunc) = children.size match {
--- End diff --

What about function?





[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

2015-10-25 Thread Lewuathe
Github user Lewuathe commented on the pull request:

https://github.com/apache/spark/pull/9180#issuecomment-150917384
  
@dbtsai Sorry for bothering you so many times, but could you check again please?





[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

2015-10-25 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/9232#discussion_r42944947
  
--- Diff: 
yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala 
---
@@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite 
with Matchers with Logging
   System.clearProperty("SPARK_YARN_MODE")
 }
   }
+
+  test("Obtain tokens For HiveMetastore") {
+val hadoopConf = new Configuration()
+hadoopConf.set("hive.metastore.kerberos.principal", "bob")
+// thrift picks up on port 0 and bails out, without trying to talk to endpoint
+hadoopConf.set("hive.metastore.uris", "http://localhost:0")
+val util = new YarnSparkHadoopUtil
+val e = intercept[InvocationTargetException] {
+  val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
+  fail(s"Expected an exception, got the token $token")
--- End diff --

I wanted to include any token returned in the assertion failure, on the basis 
that if something came back, it would be useful to find out what went wrong. 
`intercept`, just like JUnit's `@Test(expected=)` feature, picks up on the 
failure to raise the specific exception, but doesn't appear to say much else.
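For reference, ScalaTest's `intercept` returns the caught exception so the test can inspect it afterwards; the pattern can be sketched in plain Java (the `intercept` helper below is a hypothetical re-implementation for illustration, not ScalaTest's code):

```java
import java.util.function.Supplier;

// Minimal sketch of the intercept pattern: run a block expecting a specific
// exception type, hand the exception back for further asserts, and fail with
// the produced value if nothing was thrown at all.
public class InterceptSketch {
    static <T extends Throwable> T intercept(Class<T> expected, Supplier<?> body) {
        Object result;
        try {
            result = body.get();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t); // caller can assert on the exception
            }
            throw new AssertionError("Wrong exception type: " + t, t);
        }
        // Mirrors the review comment: report whatever value came back, since
        // that is the best clue to why no exception was raised.
        throw new AssertionError("Expected " + expected.getName()
            + ", got the result " + result);
    }

    public static void main(String[] args) {
        IllegalStateException e = intercept(IllegalStateException.class, () -> {
            throw new IllegalStateException("no metastore at port 0");
        });
        System.out.println(e.getMessage()); // prints: no metastore at port 0
    }
}
```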





[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9180#issuecomment-150904892
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9180#issuecomment-150904896
  
Merged build started.





[GitHub] spark pull request: [SPARK-6428][SQL] Removed unnecessary typecast...

2015-10-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/9262#issuecomment-150905300
  
Thanks - I've merged this.






[GitHub] spark pull request: [SPARK-10895][SPARK-11164][SQL] Push down InSe...

2015-10-25 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/8956#issuecomment-150905344
  
Hmm, I am not sure about that. I assumed that the Parquet relation 
will read all the data first if no pushdown filters are applied, and then 
Spark's `Filter` operation will be applied later. Maybe @liancheng can answer this?





[GitHub] spark pull request: [SPARK-11292][SQL] Python API for text data so...

2015-10-25 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/9259#issuecomment-150905342
  
cc @davies





[GitHub] spark pull request: [SPARK-6428][SQL] Removed unnecessary typecast...

2015-10-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9262





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9270#discussion_r42943091
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ---
@@ -959,6 +963,861 @@ case class ScalaUDF(
   }
   }
 
+  // Generate code used to convert the arguments to Scala types for 
user-defined functions
+  private[this] def genCodeForConverter(ctx: CodeGenContext, index: Int): 
String  = {
+val converterClassName = classOf[Any => Any].getName
+val typeConvertersClassName = CatalystTypeConverters.getClass.getName 
+ ".MODULE$"
+val expressionClassName = classOf[Expression].getName
+val scalaUDFClassName = classOf[ScalaUDF].getName
+
+val converterTerm = ctx.freshName("converter" + index.toString)
+ctx.addMutableState(converterClassName, converterTerm,
+  s"this.$converterTerm = 
($converterClassName)$typeConvertersClassName.createToScalaConverter(((${expressionClassName})((($scalaUDFClassName)expressions[${ctx.references.size
 - 1}]).getChildren().apply($index))).dataType());")
+converterTerm
+  }
+
+  override def genCode(
+  ctx: CodeGenContext,
+  ev: GeneratedExpressionCode): String = {
+
+ctx.references += this
+
+val scalaUDFClassName = classOf[ScalaUDF].getName
+val converterClassName = classOf[Any => Any].getName
+val typeConvertersClassName = CatalystTypeConverters.getClass.getName 
+ ".MODULE$"
+val expressionClassName = classOf[Expression].getName
+
+// Generate code used to convert the returned value of the user-defined 
function to Catalyst type
+val catalystConverterTerm = ctx.freshName("catalystConverter")
+ctx.addMutableState(converterClassName, catalystConverterTerm,
+  s"this.$catalystConverterTerm = 
($converterClassName)$typeConvertersClassName.createToCatalystConverter((($scalaUDFClassName)expressions[${ctx.references.size
 - 1}]).dataType());")
+
+val resultTerm = ctx.freshName("result")
+
+val (evalCode, callFunc) = children.size match {
--- End diff --

For that one, can you just do callFunc.getClass.getName?





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9270#discussion_r42943201
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ---
@@ -959,6 +963,861 @@ case class ScalaUDF(
   }
   }
 
+  // Generate code used to convert the arguments to Scala types for 
user-defined functions
+  private[this] def genCodeForConverter(ctx: CodeGenContext, index: Int): 
String  = {
+val converterClassName = classOf[Any => Any].getName
+val typeConvertersClassName = CatalystTypeConverters.getClass.getName 
+ ".MODULE$"
+val expressionClassName = classOf[Expression].getName
+val scalaUDFClassName = classOf[ScalaUDF].getName
+
+val converterTerm = ctx.freshName("converter" + index.toString)
+ctx.addMutableState(converterClassName, converterTerm,
+  s"this.$converterTerm = 
($converterClassName)$typeConvertersClassName.createToScalaConverter(((${expressionClassName})((($scalaUDFClassName)expressions[${ctx.references.size
 - 1}]).getChildren().apply($index))).dataType());")
+converterTerm
+  }
+
+  override def genCode(
+  ctx: CodeGenContext,
+  ev: GeneratedExpressionCode): String = {
+
+ctx.references += this
+
+val scalaUDFClassName = classOf[ScalaUDF].getName
+val converterClassName = classOf[Any => Any].getName
+val typeConvertersClassName = CatalystTypeConverters.getClass.getName 
+ ".MODULE$"
+val expressionClassName = classOf[Expression].getName
+
+// Generate code used to convert the returned value of the user-defined 
function to Catalyst type
+val catalystConverterTerm = ctx.freshName("catalystConverter")
+ctx.addMutableState(converterClassName, catalystConverterTerm,
+  s"this.$catalystConverterTerm = 
($converterClassName)$typeConvertersClassName.createToCatalystConverter((($scalaUDFClassName)expressions[${ctx.references.size
 - 1}]).dataType());")
+
+val resultTerm = ctx.freshName("result")
+
+val (evalCode, callFunc) = children.size match {
--- End diff --

Ah sorry. Can we just use scalaUDFClassName?

If not, we can create a static array in the beginning that covers all 22 
versions of this.
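That idea can be sketched as follows: Scala function objects of arity 0 to 22 are instances of `scala.Function0` through `scala.Function22`, so the class names the generated code needs can be precomputed once and indexed by the number of children (an illustrative sketch, not Spark's actual code):

```java
// Hypothetical sketch of the "static array covering all 22 versions" idea:
// precompute the scala.FunctionN class names once and index by arity.
public class FunctionClassNames {
    static final String[] FUNCTION_CLASS_NAMES = new String[23];
    static {
        for (int arity = 0; arity <= 22; arity++) {
            FUNCTION_CLASS_NAMES[arity] = "scala.Function" + arity;
        }
    }

    public static void main(String[] args) {
        // A two-argument UDF would be cast to this interface in generated code.
        System.out.println(FUNCTION_CLASS_NAMES[2]); // prints: scala.Function2
    }
}
```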






[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9180#issuecomment-150911125
  
**[Test build #44317 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44317/consoleFull)**
 for PR 9180 at commit 
[`59383fd`](https://github.com/apache/spark/commit/59383fd41f1d6b96274c564eb2fb7c96f5ab07e0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9270#issuecomment-150911095
  
**[Test build #44315 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44315/consoleFull)**
 for PR 9270 at commit 
[`5e8efea`](https://github.com/apache/spark/commit/5e8efeacdf35df7281224338866a9b18207fd27f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/9270#discussion_r42943394
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ---
@@ -959,6 +963,861 @@ case class ScalaUDF(
   }
   }
 
+  // Generate code used to convert the arguments to Scala types for 
user-defined functions
+  private[this] def genCodeForConverter(ctx: CodeGenContext, index: Int): 
String  = {
+val converterClassName = classOf[Any => Any].getName
+val typeConvertersClassName = CatalystTypeConverters.getClass.getName 
+ ".MODULE$"
+val expressionClassName = classOf[Expression].getName
+val scalaUDFClassName = classOf[ScalaUDF].getName
+
+val converterTerm = ctx.freshName("converter" + index.toString)
+ctx.addMutableState(converterClassName, converterTerm,
+  s"this.$converterTerm = 
($converterClassName)$typeConvertersClassName.createToScalaConverter(((${expressionClassName})((($scalaUDFClassName)expressions[${ctx.references.size
 - 1}]).getChildren().apply($index))).dataType());")
+converterTerm
+  }
+
+  override def genCode(
+  ctx: CodeGenContext,
+  ev: GeneratedExpressionCode): String = {
+
+ctx.references += this
+
+val scalaUDFClassName = classOf[ScalaUDF].getName
+val converterClassName = classOf[Any => Any].getName
+val typeConvertersClassName = CatalystTypeConverters.getClass.getName 
+ ".MODULE$"
+val expressionClassName = classOf[Expression].getName
+
+// Generate code used to convert the returned value of the user-defined 
function to Catalyst type
+val catalystConverterTerm = ctx.freshName("catalystConverter")
+ctx.addMutableState(converterClassName, catalystConverterTerm,
+  s"this.$catalystConverterTerm = 
($converterClassName)$typeConvertersClassName.createToCatalystConverter((($scalaUDFClassName)expressions[${ctx.references.size
 - 1}]).dataType());")
+
+val resultTerm = ctx.freshName("result")
+
+val (evalCode, callFunc) = children.size match {
--- End diff --

Why wouldn't it work? Isn't that better because we can even specialize it?






[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9270#issuecomment-150911128
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44315/
Test PASSed.





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/9270#discussion_r42943415
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ---
@@ -959,6 +963,861 @@ case class ScalaUDF(
   }
   }
 
+  // Generate code used to convert the arguments to Scala types for 
user-defined functions
+  private[this] def genCodeForConverter(ctx: CodeGenContext, index: Int): 
String  = {
+val converterClassName = classOf[Any => Any].getName
+val typeConvertersClassName = CatalystTypeConverters.getClass.getName 
+ ".MODULE$"
+val expressionClassName = classOf[Expression].getName
+val scalaUDFClassName = classOf[ScalaUDF].getName
+
+val converterTerm = ctx.freshName("converter" + index.toString)
+ctx.addMutableState(converterClassName, converterTerm,
+  s"this.$converterTerm = 
($converterClassName)$typeConvertersClassName.createToScalaConverter(((${expressionClassName})((($scalaUDFClassName)expressions[${ctx.references.size
 - 1}]).getChildren().apply($index))).dataType());")
+converterTerm
+  }
+
+  override def genCode(
+  ctx: CodeGenContext,
+  ev: GeneratedExpressionCode): String = {
+
+ctx.references += this
+
+val scalaUDFClassName = classOf[ScalaUDF].getName
+val converterClassName = classOf[Any => Any].getName
+val typeConvertersClassName = CatalystTypeConverters.getClass.getName 
+ ".MODULE$"
+val expressionClassName = classOf[Expression].getName
+
+// Generate code used to convert the returned value of the user-defined 
function to Catalyst type
+val catalystConverterTerm = ctx.freshName("catalystConverter")
+ctx.addMutableState(converterClassName, catalystConverterTerm,
+  s"this.$catalystConverterTerm = 
($converterClassName)$typeConvertersClassName.createToCatalystConverter((($scalaUDFClassName)expressions[${ctx.references.size
 - 1}]).dataType());")
+
+val resultTerm = ctx.freshName("result")
+
+val (evalCode, callFunc) = children.size match {
--- End diff --

Hmm, you are right. I should give it a try.





[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9180#issuecomment-150911153
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11207][ML] Add test cases for solver se...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9180#issuecomment-150911155
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44317/
Test PASSed.





[GitHub] spark pull request: [SPARK-9162][SQL] Implement code generation fo...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9270#issuecomment-150911127
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10471] [CORE] [MESOS] prevent getting o...

2015-10-25 Thread felixb
Github user felixb commented on the pull request:

https://github.com/apache/spark/pull/8639#issuecomment-150916898
  
Is there anything else I can do?





[GitHub] spark pull request: [SPARK-11265] [YARN] [WIP] YarnClient can't ge...

2015-10-25 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/9232#discussion_r42944952
  
--- Diff: 
yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala 
---
@@ -245,4 +247,28 @@ class YarnSparkHadoopUtilSuite extends SparkFunSuite 
with Matchers with Logging
   System.clearProperty("SPARK_YARN_MODE")
 }
   }
+
+  test("Obtain tokens For HiveMetastore") {
+val hadoopConf = new Configuration()
+hadoopConf.set("hive.metastore.kerberos.principal", "bob")
+// thrift picks up on port 0 and bails out, without trying to talk to endpoint
+hadoopConf.set("hive.metastore.uris", "http://localhost:0")
+val util = new YarnSparkHadoopUtil
+val e = intercept[InvocationTargetException] {
+  val token = util.obtainTokenForHiveMetastoreInner(hadoopConf, "alice")
+  fail(s"Expected an exception, got the token $token")
+}
+val inner = e.getCause
+if (inner == null) {
+  fail("No inner cause", e)
--- End diff --

good point





[GitHub] spark pull request: [SPARK-10562][SQL] support mixed case partitio...

2015-10-25 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9226#issuecomment-150980439
  
@liancheng My concern is that if we do not lowercase those partition names, 
Hive will not be able to read those partition dirs.





[GitHub] spark pull request: [SPARK-10622] [core] [yarn] Differentiate dead...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8887#issuecomment-150992179
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10622] [core] [yarn] Differentiate dead...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8887#issuecomment-150992191
  
Merged build started.





[GitHub] spark pull request: [SPARK-10342] [SQL] [WIP] Cooperative memory m...

2015-10-25 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9241#discussion_r42953420
  
--- Diff: 
core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
@@ -227,62 +238,147 @@ public BytesToBytesMap(
*/
   public int numElements() { return numElements; }
 
-  public static final class BytesToBytesMapIterator implements Iterator<Location> {
+  public final class BytesToBytesMapIterator implements Iterator<Location> {
 
-private final int numRecords;
-private final Iterator<MemoryBlock> dataPagesIterator;
+private int numRecords;
 private final Location loc;
 
 private MemoryBlock currentPage = null;
-private int currentRecordNumber = 0;
+private int recordsInPage = 0;
 private Object pageBaseObject;
 private long offsetInPage;
 
 // Whether this iterator is destructive. When it is true, it frees
 // each page as it moves onto the next one.
 private boolean destructive = false;
-private BytesToBytesMap bmap;
 
-private BytesToBytesMapIterator(
-int numRecords, Iterator<MemoryBlock> dataPagesIterator, Location loc,
-boolean destructive, BytesToBytesMap bmap) {
+private LinkedList<UnsafeSorterSpillWriter> spillWriters =
+  new LinkedList<>();
+private UnsafeSorterSpillReader reader = null;
+
+private BytesToBytesMapIterator(int numRecords, Location loc, boolean 
destructive) {
   this.numRecords = numRecords;
-  this.dataPagesIterator = dataPagesIterator;
   this.loc = loc;
   this.destructive = destructive;
-  this.bmap = bmap;
-  if (dataPagesIterator.hasNext()) {
-advanceToNextPage();
-  }
+  destructiveIterator = this;
 }
 
 private void advanceToNextPage() {
-  if (destructive && currentPage != null) {
-dataPagesIterator.remove();
-this.bmap.taskMemoryManager.freePage(currentPage);
-this.bmap.shuffleMemoryManager.release(currentPage.size());
+  synchronized (this) {
+int nextIdx = dataPages.indexOf(currentPage) + 1;
+if (destructive && currentPage != null) {
+  dataPages.remove(currentPage);
+  taskMemoryManager.freePage(currentPage);
+  shuffleMemoryManager.release(currentPage.size());
+  nextIdx --;
+}
+if (dataPages.size() > nextIdx) {
+  currentPage = dataPages.get(nextIdx);
+  pageBaseObject = currentPage.getBaseObject();
+  offsetInPage = currentPage.getBaseOffset();
+  recordsInPage = Platform.getInt(pageBaseObject, offsetInPage);
+  offsetInPage += 4;
+} else {
+  currentPage = null;
+  try {
+reader = spillWriters.removeFirst().getReader(blockManager);
+recordsInPage = -1;
+  } catch (IOException e) {
+// Scala iterator does not handle exception
+Platform.throwException(e);
+  }
+}
   }
-  currentPage = dataPagesIterator.next();
-  pageBaseObject = currentPage.getBaseObject();
-  offsetInPage = currentPage.getBaseOffset();
 }
 
 @Override
 public boolean hasNext() {
-  return currentRecordNumber != numRecords;
+  return numRecords > 0;
 }
 
 @Override
 public Location next() {
-  int totalLength = Platform.getInt(pageBaseObject, offsetInPage);
-  if (totalLength == END_OF_PAGE_MARKER) {
+  if (recordsInPage == 0) {
 advanceToNextPage();
-totalLength = Platform.getInt(pageBaseObject, offsetInPage);
   }
-  loc.with(currentPage, offsetInPage);
-  offsetInPage += 4 + totalLength;
-  currentRecordNumber++;
-  return loc;
+  numRecords --;
+  if (currentPage != null) {
+int totalLength = Platform.getInt(pageBaseObject, offsetInPage);
+loc.with(currentPage, offsetInPage);
+offsetInPage += 4 + totalLength;
+recordsInPage --;
+return loc;
+  } else {
+assert(reader != null);
+if (!reader.hasNext()) {
+  advanceToNextPage();
+}
+try {
+  reader.loadNext();
+} catch (IOException e) {
+  // Scala iterator does not handle exception
+  Platform.throwException(e);
+}
+loc.with(reader.getBaseObject(), reader.getBaseOffset(), 
reader.getRecordLength());
+return loc;
+  }
+}
+
+public long spill(long numBytes) throws IOException {
+  synchronized (this) {
+if (!destructive || dataPages.size() == 1) {
+  

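The new iterator in the diff above drains two sources of records: in-memory data pages, which a destructive iterator frees as it advances, and batches that were previously spilled to disk. A hedged, toy sketch of that shape (simplified types, nothing from Spark's actual API):

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Queue;

// Toy sketch of the pattern above (hypothetical types): drain in-memory
// "pages" first, dropping each page once consumed, then fall back to
// batches that stand in for spill files.
class PageThenSpillIterator implements Iterator<Integer> {
    private final Queue<List<Integer>> pages;   // stands in for MemoryBlock pages
    private final Queue<List<Integer>> spills;  // stands in for spill-file readers
    private Iterator<Integer> current = Collections.emptyIterator();
    private int numRecords;

    PageThenSpillIterator(Queue<List<Integer>> pages, Queue<List<Integer>> spills, int numRecords) {
        this.pages = pages;
        this.spills = spills;
        this.numRecords = numRecords;
    }

    @Override
    public boolean hasNext() {
        // Like the diff's `numRecords > 0`: one total count across all sources.
        return numRecords > 0;
    }

    @Override
    public Integer next() {
        while (!current.hasNext()) {
            // "Free" the exhausted page by discarding it, then advance; once
            // the in-memory pages run out, read from the spill queue.
            List<Integer> nextBatch = !pages.isEmpty() ? pages.poll() : spills.poll();
            if (nextBatch == null) throw new NoSuchElementException();
            current = nextBatch.iterator();
        }
        numRecords--;
        return current.next();
    }
}
```

With two pages `[1, 2]` and `[3]` plus one spilled batch `[4, 5]`, the iterator yields all five records in order, pages first.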
[GitHub] spark pull request: [SPARK-11272][Core][UI][WIP] Support importing...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9238#issuecomment-150995352
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11272][Core][UI][WIP] Support importing...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9238#issuecomment-150995368
  
Merged build started.





[GitHub] spark pull request: [SPARK-11306] Fix hang when JVM exits.

2015-10-25 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/9273#issuecomment-150995282
  
LGTM.





[GitHub] spark pull request: [SPARK-10484][SQL] Optimize the cartesian join...

2015-10-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8652#discussion_r42954743
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -268,6 +268,27 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
 
   object CartesianProduct extends Strategy {
     def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
+      // Not like the equal-join, BroadcastNestedLoopJoin doesn't support condition
+      // for cartesian join, as in cartesian join, probably, the records satisfy the
+      // condition, but exists in another partition of the large table, so we may not able
+      // to eliminate the duplicates.
--- End diff --

Yes, the comment is stale. 

If we restrict the outer join condition to `None` here, then it's more like a
`CartesianProduct`; that's why I put the rule in `CartesianProduct`. More
importantly, we'd like those two rules to take higher priority than the rule on
Line 292.

I totally agree with combining `CartesianProduct` and `BroadcastNestedLoopJoin`,
as the latter is just a special case of the former.

Will update the code soon.
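The priority point in this comment can be sketched generically: planner rules are tried top-down, so the broadcast cases outrank the generic cartesian fallback simply by being listed first. A hedged toy example with hypothetical names, not Spark's planner API:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Toy sketch (hypothetical names): apply rules in order; the first rule
// that produces a physical plan wins, so rule order encodes priority.
public class RuleOrder {
    static String firstMatch(String logicalPlan, List<Function<String, Optional<String>>> rules) {
        for (Function<String, Optional<String>> rule : rules) {
            Optional<String> physical = rule.apply(logicalPlan);
            if (physical.isPresent()) {
                return physical.get(); // earlier rules win over later ones
            }
        }
        throw new IllegalStateException("no rule matched: " + logicalPlan);
    }

    static final List<Function<String, Optional<String>>> RULES = List.of(
        // Higher-priority rule: fires only when one side is broadcastable.
        p -> p.contains("broadcastable")
            ? Optional.of("BroadcastNestedLoopJoin") : Optional.empty(),
        // Generic fallback, tried last.
        p -> Optional.of("CartesianProduct"));

    public static void main(String[] args) {
        System.out.println(firstMatch("join broadcastable small", RULES));
        System.out.println(firstMatch("join big big", RULES));
    }
}
```

In `SparkStrategies` the same effect comes from pattern-match case order: the `CanBroadcast` cases sit above the plain cartesian cases in the diff shown here.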





[GitHub] spark pull request: [SPARK-11178] Improving naming around task fai...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9164#issuecomment-151007358
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44322/
Test FAILed.





[GitHub] spark pull request: [SPARK-11178] Improving naming around task fai...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9164#issuecomment-151007357
  
Build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11272][Core][UI] Support importing and ...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9238#issuecomment-151009628
  
**[Test build #44324 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44324/consoleFull)**
 for PR 9238 at commit 
[`af8b3cb`](https://github.com/apache/spark/commit/af8b3cb03f89d2b05e2566371d44405c5f8237d3).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11272][Core][UI] Support importing and ...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9238#issuecomment-151009668
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44324/
Test FAILed.





[GitHub] spark pull request: [SPARK-10484][SQL] Optimize the cartesian join...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8652#issuecomment-151009666
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10484][SQL] Optimize the cartesian join...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8652#issuecomment-151009673
  
Merged build started.





[GitHub] spark pull request: [SPARK-10484][SQL] Optimize the cartesian join...

2015-10-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8652#discussion_r42957158
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -295,8 +295,21 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
   }
 
-  object BroadcastNestedLoopJoin extends Strategy {
+  object CartesianProduct extends Strategy {
     def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
+      case logical.Join(
+          CanBroadcast(left), right, joinType, condition) if joinType != LeftSemiJoin =>
+        execution.joins.BroadcastNestedLoopJoin(
+          planLater(left), planLater(right), joins.BuildLeft, joinType, condition) :: Nil
+      case logical.Join(
+          left, CanBroadcast(right), joinType, condition) if joinType != LeftSemiJoin =>
+        execution.joins.BroadcastNestedLoopJoin(
+          planLater(left), planLater(right), joins.BuildRight, joinType, condition) :: Nil
+      case logical.Join(left, right, _, None) =>
+        execution.joins.CartesianProduct(planLater(left), planLater(right)) :: Nil
+      case logical.Join(left, right, Inner, Some(condition)) =>
+        execution.Filter(condition,
+          execution.joins.CartesianProduct(planLater(left), planLater(right))) :: Nil
       case logical.Join(left, right, joinType, condition) =>
         val buildSide =
           if (right.statistics.sizeInBytes <= left.statistics.sizeInBytes) {
--- End diff --

Yes, I think so.





[GitHub] spark pull request: [SPARK-11272][Core][UI] Support importing and ...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9238#issuecomment-151009667
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10484][SQL] Optimize the cartesian join...

2015-10-25 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8652#discussion_r42957141
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanner.scala ---
@@ -44,8 +44,7 @@ class SparkPlanner(val sqlContext: SQLContext) extends SparkStrategies {
       EquiJoinSelection ::
       InMemoryScans ::
       BasicOperators ::
-      CartesianProduct ::
-      BroadcastNestedLoopJoin :: Nil)
+      CartesianProduct :: Nil)
--- End diff --

Any suggestion for the name?





[GitHub] spark pull request: [SPARK-10484][SQL] Optimize the cartesian join...

2015-10-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/8652#discussion_r42958162
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanner.scala ---
@@ -44,8 +44,7 @@ class SparkPlanner(val sqlContext: SQLContext) extends SparkStrategies {
       EquiJoinSelection ::
       InMemoryScans ::
       BasicOperators ::
-      CartesianProduct ::
-      BroadcastNestedLoopJoin :: Nil)
+      CartesianProduct :: Nil)
--- End diff --

How about `NonEquiJoinSelection`, w.r.t. `EquiJoinSelection`?





[GitHub] spark pull request: [SPARK-10562][SQL] support mixed case partitio...

2015-10-25 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9226#issuecomment-151017266
  
Had a discussion with @liancheng and @cloud-fan. We think this fix is 
better than #9251. When a data source table is partitioned, we will not save 
data in a Hive-compatible way. So, let's make sure we can read data back. 
Later, when we want to save partitioned data in a hive compatible way, we can 
discuss the right way for that in that JIRA (we may want to add a flag for it 
since the saved metadata may be quite different).





[GitHub] spark pull request: [SPARK-10891][STREAMING][KINESIS] Add MessageH...

2015-10-25 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/8954#issuecomment-151019368
  
Great. I will merge this. 





[GitHub] spark pull request: [SPARK-10984] Simplify *MemoryManager class st...

2015-10-25 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/9127#issuecomment-151020332
  
Woohoo, this passes tests!

There are still a few minor follow-up tasks that I'd like to do for this, 
but I'm going to defer them to separate patches: this patch is fairly large and 
has conflicts with several other memory-management-related patches that are 
in-flight or which will be opened soon. @andrewor14, if you have any post-hoc 
review comments, I'll handle them in a followup. @davies, this should unblock 
your open patch.

Merging to master.





[GitHub] spark pull request: [SPARK-10891][STREAMING][KINESIS] Add MessageH...

2015-10-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8954





[GitHub] spark pull request: [SPARK-8582][Core]Optimize checkpointing to av...

2015-10-25 Thread ryan-williams
Github user ryan-williams commented on a diff in the pull request:

https://github.com/apache/spark/pull/9258#discussion_r42951707
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -258,6 +258,16 @@ abstract class RDD[T: ClassTag](
    * subclasses of RDD.
    */
   final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
+    if (!isCheckpointedAndMaterialized) {
--- End diff --

just out of curiosity, any reason not to do an `if`/`else` here?
```
if (!isCheckpointedAndMaterialized &&
    checkpointData.exists(_.isInstanceOf[ReliableRDDCheckpointData[T]])) {
  SparkEnv.get.checkpointMananger.getOrCompute(
    this, checkpointData.get.asInstanceOf[ReliableRDDCheckpointData[T]], split, context)
} else {
  computeOrReadCache(split, context)
}
```






[GitHub] spark pull request: [SPARK-11306] Fix hang when JVM exits.

2015-10-25 Thread kayousterhout
GitHub user kayousterhout opened a pull request:

https://github.com/apache/spark/pull/9273

[SPARK-11306] Fix hang when JVM exits.

This commit fixes a bug where, in Standalone mode, if a task fails and 
crashes the JVM, the
failure is considered a "normal failure" (meaning it's considered unrelated 
to the task), so
the failure isn't counted against the task's maximum number of failures:

https://github.com/apache/spark/commit/af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0#diff-a755f3d892ff2506a7aa7db52022d77cL138.
As a result, if a task fails in a way that results in it crashing the JVM, 
it will continuously be
re-launched, resulting in a hang. This commit fixes that problem.

This bug was introduced by #8007; @andrewor14 @mcchea @vanzin can you take 
a look at this?

This error is hard to trigger because we handle executor losses through 2 
code paths (the second is via Akka, where Akka notices that the executor 
endpoint is disconnected).  In my setup, the Akka code path completes first, 
and doesn't have this bug, so things work fine (see my recent email to the dev 
list about this).  If I manually disable the Akka code path, I can see the hang 
(and this commit fixes the issue).
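The accounting problem described above can be illustrated with a minimal, self-contained sketch (hypothetical class and method names, not Spark's actual scheduler): if an executor exit caused by the application is not counted against the task's maximum failures, the task is relaunched indefinitely.

```java
// Toy sketch (not Spark's scheduler): a task may be retried only while its
// failure count stays below the maximum. The fix amounts to counting
// app-caused executor exits as task failures.
public class TaskFailureTracker {
    private final int maxFailures;
    private int failures = 0;

    public TaskFailureTracker(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    /** Record an executor loss; returns true if the task may be relaunched. */
    public boolean onExecutorLost(boolean exitCausedByApp) {
        if (exitCausedByApp) {
            // Without this increment, a task that crashes the JVM would be
            // relaunched forever, which is the hang this PR fixes.
            failures++;
        }
        return failures < maxFailures;
    }
}
```

Here `maxFailures` plays the role of `spark.task.maxFailures`; the real scheduler tracks this per task, alongside the executor-loss reason.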

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kayousterhout/spark-1 SPARK-11306

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9273.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9273


commit 42a1defca0b2f0c9558b6ad8d24c6b1eb389ea10
Author: Kay Ousterhout 
Date:   2015-10-25T23:46:20Z

[SPARK-11306] Fix hang when JVM exits.

This commit fixes a bug where, in Standalone mode, if a task fails and 
crashes the JVM, the
failure is considered a "normal failure" (meaning it's considered unrelated 
to the task), so
the failure isn't counted against the task's maximum number of failures:

https://github.com/apache/spark/commit/af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0#diff-a755f3d892ff2506a7aa7db52022d77cL138.
As a result, if a task fails in a way that results in it crashing the JVM, 
it will continuously be
re-launched, resulting in a hang. This commit fixes that problem.






