[GitHub] spark issue #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict probabi...

2017-07-14 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/16441
  
Should be in 2.2.0
On Sat, 15 Jul 2017 at 07:54, yonglyhoo  wrote:

> In which release is this fix going to be available? Thanks!






[GitHub] spark issue #16441: [SPARK-14975][ML] Fixed GBTClassifier to predict probabi...

2017-07-14 Thread yonglyhoo
Github user yonglyhoo commented on the issue:

https://github.com/apache/spark/pull/16441
  
In which release is this fix going to be available? Thanks!





[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18618
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18618
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79631/
Test PASSed.





[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18618
  
**[Test build #79631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79631/testReport)** for PR 18618 at commit [`eaa910d`](https://github.com/apache/spark/commit/eaa910dfdb31448724384352674440232fb584b6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames i...

2017-07-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18618#discussion_r127576287
  
--- Diff: python/pyspark/sql/types.py ---
@@ -562,6 +562,16 @@ def jsonValue(self):
     def fromJson(cls, json):
         return StructType([StructField.fromJson(f) for f in json["fields"]])
 
+    def fieldNames(self):
+        """
+        Returns all field names in a list.
+
+        >>> struct = StructType([StructField("f1", StringType(), True)])
+        >>> struct.fieldNames()
+        ['f1']
+        """
+        return list(self.names)
--- End diff --

Just to note that this `list` call is required to make a copy, preventing the unexpected behaviour described in the PR description when the names are manipulated:

```python
>>> df = spark.range(1)
>>> a = df.schema.fieldNames()
>>> b = df.schema.names
>>> df.schema.names[0] = "a"
>>> a
['id']
>>> b
['a']
>>> a[0] = ""
>>> a
['']
>>> b
['a']
```





[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18618
  
**[Test build #79631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79631/testReport)** for PR 18618 at commit [`eaa910d`](https://github.com/apache/spark/commit/eaa910dfdb31448724384352674440232fb584b6).





[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18618
  
Either way is fine with me. Let me update this to return a list. I was just thinking that structs/rows are tuple-like, so the output here could be as well.





[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...

2017-07-14 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/17980
  
If the new test cases work against the existing ORC component, how about updating the test cases first?





[GitHub] spark pull request #18571: [SPARK-21344][SQL] BinaryType comparison does sig...

2017-07-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18571





[GitHub] spark issue #18571: [SPARK-21344][SQL] BinaryType comparison does signed byt...

2017-07-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18571
  
Thanks! Merging to master/2.2/2.1/2.0





[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18640
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18640
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79627/
Test PASSed.





[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18640
  
**[Test build #79627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79627/testReport)** for PR 18640 at commit [`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18633
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18633
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79626/
Test PASSed.





[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18633
  
**[Test build #79626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79626/testReport)** for PR 18633 at commit [`9a4f012`](https://github.com/apache/spark/commit/9a4f01217b7ff6219097fe7f4080ecc51125).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18639: [SPARK-21408][core] Better default number of RPC dispatc...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18639
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79624/
Test PASSed.





[GitHub] spark issue #18639: [SPARK-21408][core] Better default number of RPC dispatc...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18639
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18639: [SPARK-21408][core] Better default number of RPC dispatc...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18639
  
**[Test build #79624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79624/testReport)** for PR 18639 at commit [`85029e0`](https://github.com/apache/spark/commit/85029e0a1c4347259997cae7e85bf6d432f83385).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18281
  
**[Test build #79630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79630/testReport)** for PR 18281 at commit [`a95a8af`](https://github.com/apache/spark/commit/a95a8af2073b29aac751ae58489b737a3d7a39ae).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18281
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79630/
Test FAILed.





[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18281
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18281
  
**[Test build #79630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79630/testReport)** for PR 18281 at commit [`a95a8af`](https://github.com/apache/spark/commit/a95a8af2073b29aac751ae58489b737a3d7a39ae).





[GitHub] spark issue #18637: [SPARK-15526][ML][FOLLOWUP][test-maven] Make JPMML provi...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18637
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79621/
Test PASSed.





[GitHub] spark issue #18637: [SPARK-15526][ML][FOLLOWUP][test-maven] Make JPMML provi...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18637
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18637: [SPARK-15526][ML][FOLLOWUP][test-maven] Make JPMML provi...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18637
  
**[Test build #79621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79621/testReport)** for PR 18637 at commit [`4b20f82`](https://github.com/apache/spark/commit/4b20f822c9d76a2168fc4fe8094d8f22eecd5d93).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18487: [SPARK-21243][Core] Limit no. of map outputs in a shuffl...

2017-07-14 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18487
  
`maxReqsInFlight` and `maxBytesInFlight` make it hard to control the number of blocks in a single request. When the number of map outputs is very high, this change can alleviate the pressure on the shuffle server.
@dhruve what is the proper value for `maxBlocksInFlightPerAddress`? I like this PR if there is no performance issue. It would be great if you could post some benchmarks.
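
For illustration, a minimal sketch of what such a per-address cap could look like (hypothetical `FetchRequest` and `splitRequests` names, not the PR's actual code): the blocks destined for one shuffle server are chunked so that no single request exceeds the configured limit.

```scala
// Sketch only: cap the number of blocks per fetch request to one address.
case class FetchRequest(address: String, blocks: Seq[(String, Long)])

def splitRequests(
    address: String,
    blocks: Seq[(String, Long)], // (blockId, size) pairs
    maxBlocksInFlightPerAddress: Int): Seq[FetchRequest] = {
  // grouped() chunks the block list, so one shuffle server never receives
  // a single request asking for an unbounded number of blocks.
  blocks.grouped(maxBlocksInFlightPerAddress)
    .map(chunk => FetchRequest(address, chunk))
    .toSeq
}
```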





[GitHub] spark issue #18571: [SPARK-21344][SQL] BinaryType comparison does signed byt...

2017-07-14 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18571
  
LGTM





[GitHub] spark pull request #18633: [SPARK-21411][YARN] Lazily create FS within kerbe...

2017-07-14 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18633#discussion_r127565024
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala ---
@@ -42,7 +42,7 @@ import org.apache.spark.internal.Logging
 private[spark] class HadoopDelegationTokenManager(
     sparkConf: SparkConf,
     hadoopConf: Configuration,
-    fileSystems: Set[FileSystem])
+    fileSystems: () => Set[FileSystem])
--- End diff --

Ohh, I see, that looks like another issue, let me change the code.





[GitHub] spark issue #16571: [SPARK-19208][ML][WIP] MaxAbsScaler and MinMaxScaler are...

2017-07-14 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/16571
  
This PR is very similar to my earlier PR, #14950. Is that right? @jkbradley





[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-14 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18630#discussion_r127562278
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala ---
@@ -66,4 +75,50 @@ object DriverWrapper {
         System.exit(-1)
     }
   }
+
+  // R or Python are not supported in cluster mode so just get jars and files for the driver
+  private def setupDependencies(loader: MutableURLClassLoader, userJar: String): Unit = {
+    var packagesExclusions = sys.props.get("spark.jars.excludes").orNull
+    var packages = sys.props.get("spark.jars.packages").orNull
+    var repositories = sys.props.get("spark.jars.repositories").orNull
+    val hadoopConf = new HadoopConfiguration()
+    val childClasspath = new ArrayBuffer[String]()
+    var jars = sys.props.get("spark.jars").orNull
+    var files = sys.props.get("spark.files").orNull
+    var ivyRepoPath = sys.props.get("spark.jars.ivy").orNull
+
+    val exclusions: Seq[String] =
+      if (!StringUtils.isBlank(packagesExclusions)) {
+        packagesExclusions.split(",")
+      } else {
+        Nil
+      }
+
+    // Create the IvySettings, either load from file or build defaults
+    val ivySettings = sys.props.get("spark.jars.ivySettings").map { ivySettingsFile =>
+      SparkSubmitUtils.loadIvySettings(ivySettingsFile, Option(repositories), Option(ivyRepoPath))
+    }.getOrElse {
+      SparkSubmitUtils.buildIvySettings(Option(repositories), Option(ivyRepoPath))
+    }
+
+    val resolvedMavenCoordinates = SparkSubmitUtils.resolveMavenCoordinates(packages,
+      ivySettings, exclusions = exclusions)
+
+    if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
+      jars = SparkSubmit.mergeFileLists(jars, resolvedMavenCoordinates)
+    }
+
+    // filter out the user jar
+    jars = jars.split(",").filterNot(_.contains(userJar.split("/").last)).mkString(",")
+    jars = Option(jars).map(SparkSubmit.downloadFileList(_, hadoopConf)).orNull
--- End diff --

The API interface has changed, so this is not compilable.





[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-14 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18630#discussion_r127561962
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -473,6 +474,12 @@ object SparkSubmit extends CommandLineUtils {
       OptionAssigner(args.driverExtraLibraryPath, ALL_CLUSTER_MGRS, ALL_DEPLOY_MODES,
         sysProp = "spark.driver.extraLibraryPath"),
 
+      // Standalone only - propagate attributes for dependency resolution at the driver side
+      OptionAssigner(args.packages, STANDALONE, CLUSTER, sysProp = "spark.jars.packages"),
--- End diff --

You can merge this with the mesos cluster `OptionAssigner`, like `OptionAssigner(args.packages, STANDALONE | MESOS, CLUSTER, sysProp = "spark.jars.packages")`.





[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-14 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18630#discussion_r127563323
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala ---
@@ -66,4 +75,50 @@ object DriverWrapper {
 System.exit(-1)
 }
   }
+
+  // R or Python are not supported in cluster mode so just get jars and 
files for the driver
+  private def setupDependencies(loader: MutableURLClassLoader, userJar: 
String): Unit = {
+
+var packagesExclusions = sys.props.get("spark.jars.excludes").orNull
+var packages = sys.props.get("spark.jars.packages").orNull
+var repositories = sys.props.get("spark.jars.repositories").orNull
+val hadoopConf = new HadoopConfiguration()
+val childClasspath = new ArrayBuffer[String]()
+var jars = sys.props.get("spark.jars").orNull
--- End diff --

Here I think `jars` and `files` can only be remote resources; otherwise, how could the remote driver access resources that are local to SparkSubmit? So I think some defensive code may be necessary.
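
A minimal sketch of such a defensive check (hypothetical helper name; only the scheme validation described above):

```scala
import java.net.URI

// Sketch only: reject resources a remote driver could not resolve, i.e.
// paths with no scheme or with a local file scheme.
def requireRemote(paths: Seq[String], confKey: String): Unit = {
  paths.foreach { p =>
    val scheme = Option(new URI(p).getScheme).getOrElse("file")
    require(scheme != "file" && scheme != "local",
      s"$confKey must contain only remote resources in cluster mode, got: $p")
  }
}
```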





[GitHub] spark issue #18616: [SPARK-21377][YARN] Make jars specify with --jars/--pack...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18616
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18616: [SPARK-21377][YARN] Make jars specify with --jars/--pack...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18616
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79629/
Test PASSed.





[GitHub] spark issue #18616: [SPARK-21377][YARN] Make jars specify with --jars/--pack...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18616
  
**[Test build #79629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79629/testReport)** for PR 18616 at commit [`545ae9a`](https://github.com/apache/spark/commit/545ae9a71bfea6d7093ca47df827c8f1fa7aad28).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18633: [SPARK-21411][YARN] Lazily create FS within kerbe...

2017-07-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18633#discussion_r127561224
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopFSDelegationTokenProvider.scala ---
@@ -45,11 +45,11 @@ private[deploy] class HadoopFSDelegationTokenProvider(fileSystems: Set[FileSyste
 
     val newCreds = fetchDelegationTokens(
       getTokenRenewer(hadoopConf),
-      fileSystems)
+      fileSystems())
--- End diff --

Cache the result of this call since it might be used again below?
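
A minimal sketch of the suggested caching, assuming the PR's `fileSystems: () => Set[FileSystem]` parameter (the surrounding method is a hypothetical stand-in):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// Sketch only: evaluate the closure once and reuse the result below,
// instead of re-creating the file systems on each use.
def obtainTokens(hadoopConf: Configuration, fileSystems: () => Set[FileSystem]): Unit = {
  val fss = fileSystems() // cached in a local val
  fss.foreach(fs => println(s"would fetch a delegation token via $fs"))
  fss.foreach(fs => println(s"would compute the renewal interval via $fs")) // reuses fss
}
```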





[GitHub] spark pull request #18633: [SPARK-21411][YARN] Lazily create FS within kerbe...

2017-07-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18633#discussion_r127562035
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala ---
@@ -42,7 +42,7 @@ import org.apache.spark.internal.Logging
 private[spark] class HadoopDelegationTokenManager(
     sparkConf: SparkConf,
     hadoopConf: Configuration,
-    fileSystems: Set[FileSystem])
+    fileSystems: () => Set[FileSystem])
--- End diff --

I think this closure needs to take a `Configuration` as a parameter.

If you look at the old code, when renewing tokens the file system list is built using `AMCredentialRenewer.freshHadoopConf` as the configuration, not the `ApplicationMaster` configuration as your code does. That sounds more correct.
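
A minimal sketch of the suggested signature, assuming the caller should pick which `Configuration` the file systems are built from (stand-in class name, not the PR's code):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

// Sketch only: the closure takes the Configuration as a parameter, so the
// renewal path can pass a fresh conf while the normal path passes its own.
class TokenManagerSketch(fileSystems: Configuration => Set[FileSystem]) {
  def obtainTokens(conf: Configuration): Unit = {
    val fss = fileSystems(conf) // the caller decides which conf to use
    println(s"fetching tokens for ${fss.size} file systems")
  }
}
```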





[GitHub] spark issue #18628: [SPARK-18061][ThriftServer] Add spnego auth support for ...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18628
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79628/
Test PASSed.





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-07-14 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/18630
  
Are you trying to support `--packages` in standalone cluster mode?





[GitHub] spark issue #18628: [SPARK-18061][ThriftServer] Add spnego auth support for ...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18628
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18628: [SPARK-18061][ThriftServer] Add spnego auth support for ...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18628
  
**[Test build #79628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79628/testReport)** for PR 18628 at commit [`787e72c`](https://github.com/apache/spark/commit/787e72c419bb1ea29af6c7eee6b9c71d02d24a57).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18607: [SPARK-21362][SQL][Adding Apache Drill JDBC Dialect]

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18607
  
**[Test build #3842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3842/testReport)** for PR 18607 at commit [`6d6e2d7`](https://github.com/apache/spark/commit/6d6e2d77b032f2deaa4f44ea1589135a4745e959).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18616: [SPARK-21377][YARN] Make jars specify with --jars/--pack...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18616
  
**[Test build #79629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79629/testReport)** for PR 18616 at commit [`545ae9a`](https://github.com/apache/spark/commit/545ae9a71bfea6d7093ca47df827c8f1fa7aad28).





[GitHub] spark issue #18616: [SPARK-21377][YARN] Make jars specify with --jars/--pack...

2017-07-14 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/18616
  
Thanks @vanzin for your review, I will update it soon.





[GitHub] spark issue #18428: [Spark-21221][ML] CrossValidator and TrainValidationSpli...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18428
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79623/
Test PASSed.





[GitHub] spark issue #18428: [Spark-21221][ML] CrossValidator and TrainValidationSpli...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18428
  
**[Test build #79623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79623/testReport)** for PR 18428 at commit [`6a7162d`](https://github.com/apache/spark/commit/6a7162dfbefbc900cc103f6fd7d7df5510cf2154).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18628: [SPARK-18061][ThriftServer] Add spnego auth support for ...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18628
  
**[Test build #79628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79628/testReport)** for PR 18628 at commit [`787e72c`](https://github.com/apache/spark/commit/787e72c419bb1ea29af6c7eee6b9c71d02d24a57).





[GitHub] spark issue #18428: [Spark-21221][ML] CrossValidator and TrainValidationSpli...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18428
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-14 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/18513#discussion_r127558429
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml.Transformer
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.linalg.Vectors
+import org.apache.spark.ml.param.{IntParam, ParamMap, ParamValidators}
+import org.apache.spark.ml.param.shared.{HasInputCols, HasOutputCol}
+import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable, SchemaUtils}
+import org.apache.spark.mllib.feature.{HashingTF => OldHashingTF}
+import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+import org.apache.spark.util.collection.OpenHashMap
+
+/**
+ * Feature hashing projects a set of categorical or numerical features into a feature vector of
+ * specified dimension (typically substantially smaller than that of the original feature
+ * space). This is done using the hashing trick (https://en.wikipedia.org/wiki/Feature_hashing)
+ * to map features to indices in the feature vector.
+ *
+ * The [[FeatureHasher]] transformer operates on multiple columns. Each column may be numeric
+ * (representing a real feature) or string (representing a categorical feature). Boolean columns
+ * are also supported, and treated as categorical features. For numeric features, the hash value of
+ * the column name is used to map the feature value to its index in the feature vector.
+ * For categorical features, the hash value of the string "column_name=value" is used to map to the
+ * vector index, with an indicator value of `1.0`. Thus, categorical features are "one-hot" encoded
+ * (similarly to using [[OneHotEncoder]] with `dropLast=false`).
+ *
+ * Null (missing) values are ignored (implicitly zero in the resulting feature vector).
+ *
+ * Since a simple modulo is used to transform the hash function to a vector index,
+ * it is advisable to use a power of two as the numFeatures parameter;
+ * otherwise the features will not be mapped evenly to the vector indices.
+ *
+ * {{{
+ *   val df = Seq(
+ *     (2.0, true, "1", "foo"),
+ *     (3.0, false, "2", "bar")
+ *   ).toDF("real", "bool", "stringNum", "string")
+ *
+ *   val hasher = new FeatureHasher()
+ *     .setInputCols("real", "bool", "stringNum", "num")
+ *     .setOutputCol("features")
+ *
+ *   hasher.transform(df).show()
+ *
+ *   +----+-----+---------+------+--------------------+
+ *   |real| bool|stringNum|string|            features|
+ *   +----+-----+---------+------+--------------------+
+ *   | 2.0| true|        1|   foo|(262144,[51871,63...|
+ *   | 3.0|false|        2|   bar|(262144,[6031,806...|
+ *   +----+-----+---------+------+--------------------+
+ * }}}
+ */
+@Since("2.3.0")
+class FeatureHasher(@Since("2.3.0") override val uid: String) extends Transformer
+  with HasInputCols with HasOutputCol with DefaultParamsWritable {
+
+  @Since("2.3.0")
+  def this() = this(Identifiable.randomUID("featureHasher"))
+
+  /**
+   * Number of features. Should be greater than 0.
+   * (default = 2^18^)
+   * @group param
+   */
+  @Since("2.3.0")
+  val numFeatures = new IntParam(this, "numFeatures", "number of features (> 0)",
+    ParamValidators.gt(0))
+
+  setDefault(numFeatures -> (1 << 18))
+
+  /** @group getParam */
+  @Since("2.3.0")
+  def getNumFeatures: Int = $(numFeatures)
+
+  /** @group setParam */
+  @Since("2.3.0")
+  def setNumFeatures(value: Int): this.type = set(numFeatures, value)
+
+  /** @group 
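
As a side note on the mechanism the scaladoc describes, here is a minimal, self-contained sketch of the hashing trick (illustrative only; the hash function and helper names are assumptions, not this PR's implementation):

```scala
import scala.util.hashing.MurmurHash3

// Sketch only: numeric features hash the column name and keep their value;
// categorical (string/boolean) features hash "column_name=value" with an
// indicator value of 1.0; a modulo maps each hash to a vector index.
// Hash collisions simply overwrite each other in this sketch.
def hashFeatures(row: Map[String, Any], numFeatures: Int): Map[Int, Double] = {
  def indexOf(s: String): Int = {
    val h = MurmurHash3.stringHash(s)
    ((h % numFeatures) + numFeatures) % numFeatures // non-negative modulo
  }
  row.collect {
    case (col, v: Double)  => indexOf(col) -> v          // real feature
    case (col, v: Boolean) => indexOf(s"$col=$v") -> 1.0 // boolean as categorical
    case (col, v: String)  => indexOf(s"$col=$v") -> 1.0 // categorical feature
  }
}

// e.g. hashFeatures(Map("real" -> 2.0, "bool" -> true, "string" -> "foo"), 1 << 18)
```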

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-14 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/18513#discussion_r127554746
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---

[Same quoted diff context as the previous comment; the rest of the message is truncated in the archive.]

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-14 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/18513#discussion_r127555147
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml.Transformer
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.linalg.Vectors
+import org.apache.spark.ml.param.{IntParam, ParamMap, ParamValidators}
+import org.apache.spark.ml.param.shared.{HasInputCols, HasOutputCol}
+import org.apache.spark.ml.util.{DefaultParamsReadable, 
DefaultParamsWritable, Identifiable, SchemaUtils}
+import org.apache.spark.mllib.feature.{HashingTF => OldHashingTF}
+import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+import org.apache.spark.util.collection.OpenHashMap
+
+/**
+ * Feature hashing projects a set of categorical or numerical features 
into a feature vector of
+ * specified dimension (typically substantially smaller than that of the 
original feature
+ * space). This is done using the hashing trick 
(https://en.wikipedia.org/wiki/Feature_hashing)
+ * to map features to indices in the feature vector.
+ *
+ * The [[FeatureHasher]] transformer operates on multiple columns. Each 
column may be numeric
+ * (representing a real feature) or string (representing a categorical 
feature). Boolean columns
+ * are also supported, and treated as categorical features. For numeric 
features, the hash value of
+ * the column name is used to map the feature value to its index in the 
feature vector.
+ * For categorical features, the hash value of the string 
"column_name=value" is used to map to the
+ * vector index, with an indicator value of `1.0`. Thus, categorical 
features are "one-hot" encoded
+ * (similarly to using [[OneHotEncoder]] with `dropLast=false`).
+ *
+ * Null (missing) values are ignored (implicitly zero in the resulting 
feature vector).
+ *
+ * Since a simple modulo is used to transform the hash function to a 
vector index,
+ * it is advisable to use a power of two as the numFeatures parameter;
+ * otherwise the features will not be mapped evenly to the vector indices.
+ *
+ * {{{
+ *   val df = Seq(
+ *     (2.0, true, "1", "foo"),
+ *     (3.0, false, "2", "bar")
+ *   ).toDF("real", "bool", "stringNum", "string")
+ *
+ *   val hasher = new FeatureHasher()
+ *     .setInputCols("real", "bool", "stringNum", "string")
+ *     .setOutputCol("features")
+ *
+ *   hasher.transform(df).show()
+ *
+ *   +----+-----+---------+------+--------------------+
+ *   |real| bool|stringNum|string|            features|
+ *   +----+-----+---------+------+--------------------+
+ *   | 2.0| true|        1|   foo|(262144,[51871,63...|
+ *   | 3.0|false|        2|   bar|(262144,[6031,806...|
+ *   +----+-----+---------+------+--------------------+
+ * }}}
+ */
+@Since("2.3.0")
+class FeatureHasher(@Since("2.3.0") override val uid: String) extends 
Transformer
+  with HasInputCols with HasOutputCol with DefaultParamsWritable {
+
+  @Since("2.3.0")
+  def this() = this(Identifiable.randomUID("featureHasher"))
+
+  /**
+   * Number of features. Should be greater than 0.
+   * (default = 2^18^)
+   * @group param
+   */
+  @Since("2.3.0")
+  val numFeatures = new IntParam(this, "numFeatures", "number of features 
(> 0)",
+ParamValidators.gt(0))
+
+  setDefault(numFeatures -> (1 << 18))
+
+  /** @group getParam */
+  @Since("2.3.0")
+  def getNumFeatures: Int = $(numFeatures)
+
+  /** @group setParam */
+  @Since("2.3.0")
+  def setNumFeatures(value: Int): this.type = set(numFeatures, value)
+
+  /** @group 

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-14 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/18513#discussion_r127498459
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ---
@@ -0,0 +1,185 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml.Transformer
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.linalg.Vectors
+import org.apache.spark.ml.param.{IntParam, ParamMap, ParamValidators}
+import org.apache.spark.ml.param.shared.{HasInputCols, HasOutputCol}
+import org.apache.spark.ml.util.{DefaultParamsReadable, 
DefaultParamsWritable, Identifiable, SchemaUtils}
+import org.apache.spark.mllib.feature.{HashingTF => OldHashingTF}
+import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+import org.apache.spark.util.collection.OpenHashMap
+
+/**
+ * Feature hashing projects a set of categorical or numerical features 
into a feature vector of
+ * specified dimension (typically substantially smaller than that of the 
original feature
+ * space). This is done using the hashing trick 
(https://en.wikipedia.org/wiki/Feature_hashing)
+ * to map features to indices in the feature vector.
+ *
+ * The [[FeatureHasher]] transformer operates on multiple columns. Each 
column may be numeric
+ * (representing a real feature) or string (representing a categorical 
feature). Boolean columns
+ * are also supported, and treated as categorical features. For numeric 
features, the hash value of
+ * the column name is used to map the feature value to its index in the 
feature vector.
+ * For categorical features, the hash value of the string 
"column_name=value" is used to map to the
+ * vector index, with an indicator value of `1.0`. Thus, categorical 
features are "one-hot" encoded
+ * (similarly to using [[OneHotEncoder]] with `dropLast=false`).
+ *
+ * Null (missing) values are ignored (implicitly zero in the resulting 
feature vector).
+ *
+ * Since a simple modulo is used to transform the hash function to a 
vector index,
+ * it is advisable to use a power of two as the numFeatures parameter;
+ * otherwise the features will not be mapped evenly to the vector indices.
+ *
+ * {{{
+ *   val df = Seq(
+ *     (2.0, true, "1", "foo"),
+ *     (3.0, false, "2", "bar")
+ *   ).toDF("real", "bool", "stringNum", "string")
+ *
+ *   val hasher = new FeatureHasher()
+ *     .setInputCols("real", "bool", "stringNum", "string")
+ *     .setOutputCol("features")
+ *
+ *   hasher.transform(df).show()
+ *
+ *   +----+-----+---------+------+--------------------+
+ *   |real| bool|stringNum|string|            features|
+ *   +----+-----+---------+------+--------------------+
+ *   | 2.0| true|        1|   foo|(262144,[51871,63...|
+ *   | 3.0|false|        2|   bar|(262144,[6031,806...|
+ *   +----+-----+---------+------+--------------------+
+ * }}}
+ */
+@Since("2.3.0")
+class FeatureHasher(@Since("2.3.0") override val uid: String) extends 
Transformer
+  with HasInputCols with HasOutputCol with DefaultParamsWritable {
+
+  @Since("2.3.0")
+  def this() = this(Identifiable.randomUID("featureHasher"))
+
+  /**
+   * Number of features. Should be greater than 0.
+   * (default = 2^18^)
+   * @group param
+   */
+  @Since("2.3.0")
+  val numFeatures = new IntParam(this, "numFeatures", "number of features 
(> 0)",
+ParamValidators.gt(0))
+
+  setDefault(numFeatures -> (1 << 18))
+
+  /** @group getParam */
+  @Since("2.3.0")
+  def getNumFeatures: Int = $(numFeatures)
+
+  /** @group setParam */
+  @Since("2.3.0")
+  def setNumFeatures(value: Int): this.type = set(numFeatures, value)
+
+  /** @group 

[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-14 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/18513#discussion_r127557871
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/FeatureHasherSuite.scala ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.linalg.{Vector, Vectors}
+import org.apache.spark.ml.param.ParamsSuite
+import org.apache.spark.ml.util.DefaultReadWriteTest
+import org.apache.spark.ml.util.TestingUtils._
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.functions.col
+import org.apache.spark.sql.types._
+
+class FeatureHasherSuite extends SparkFunSuite
+  with MLlibTestSparkContext
+  with DefaultReadWriteTest {
+
+  import testImplicits._
+
+  import HashingTFSuite.murmur3FeatureIdx
+
+  implicit val vectorEncoder = ExpressionEncoder[Vector]()
--- End diff --

private





[GitHub] spark pull request #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-14 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/18513#discussion_r127491688
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/feature/FeatureHasherSuite.scala ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.linalg.{Vector, Vectors}
+import org.apache.spark.ml.param.ParamsSuite
+import org.apache.spark.ml.util.DefaultReadWriteTest
+import org.apache.spark.ml.util.TestingUtils._
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.functions.col
+import org.apache.spark.sql.types._
+
+class FeatureHasherSuite extends SparkFunSuite
+  with MLlibTestSparkContext
+  with DefaultReadWriteTest {
+
+  import testImplicits._
+
+  import HashingTFSuite.murmur3FeatureIdx
+
+  implicit val vectorEncoder = ExpressionEncoder[Vector]()
+
+  test("params") {
+ParamsSuite.checkParams(new FeatureHasher)
+  }
+
+  test("specify input cols using varargs or array") {
+val featureHasher1 = new FeatureHasher()
+  .setInputCols("int", "double", "float", "stringNum", "string")
+val featureHasher2 = new FeatureHasher()
+  .setInputCols(Array("int", "double", "float", "stringNum", "string"))
+assert(featureHasher1.getInputCols === featureHasher2.getInputCols)
+  }
+
+  test("feature hashing") {
+val df = Seq(
+  (2.0, true, "1", "foo"),
+  (3.0, false, "2", "bar")
+).toDF("real", "bool", "stringNum", "string")
+
+val n = 100
+val hasher = new FeatureHasher()
+  .setInputCols("real", "bool", "stringNum", "string")
+  .setOutputCol("features")
+  .setNumFeatures(n)
+val output = hasher.transform(df)
+val attrGroup = 
AttributeGroup.fromStructField(output.schema("features"))
+require(attrGroup.numAttributes === Some(n))
--- End diff --

make this an `assert`





[GitHub] spark issue #18428: [Spark-21221][ML] CrossValidator and TrainValidationSpli...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18428
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79622/
Test PASSed.





[GitHub] spark issue #18428: [Spark-21221][ML] CrossValidator and TrainValidationSpli...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18428
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18428: [Spark-21221][ML] CrossValidator and TrainValidationSpli...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18428
  
**[Test build #79622 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79622/testReport)**
 for PR 18428 at commit 
[`a6bd197`](https://github.com/apache/spark/commit/a6bd1972cfcebe66b67f263ed82fc533f5287e9b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread ajaysaini725
Github user ajaysaini725 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r127558107
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -1229,11 +1229,30 @@ def test_output_columns(self):
  (2.0, Vectors.dense(0.5, 0.5))],
 ["label", "features"])
 lr = LogisticRegression(maxIter=5, regParam=0.01)
-ovr = OneVsRest(classifier=lr)
+ovr = OneVsRest(classifier=lr, parallelism=1)
 model = ovr.fit(df)
 output = model.transform(df)
 self.assertEqual(output.columns, ["label", "features", 
"prediction"])
 
+def test_parallelism_doesnt_change_output(self):
+df = self.spark.createDataFrame([(0.0, Vectors.dense(1.0, 0.8)),
+ (1.0, Vectors.sparse(2, [], [])),
+ (2.0, Vectors.dense(0.5, 0.5))],
+["label", "features"])
+ovrPar1 = OneVsRest(classifier=LogisticRegression(maxIter=5, 
regParam=.01), parallelism=1)
+modelPar1 = ovrPar1.fit(df)
+ovrPar2 = OneVsRest(classifier=LogisticRegression(maxIter=5, 
regParam=.01), parallelism=2)
+modelPar2 = ovrPar2.fit(df)
+self.assertEqual(modelPar1.getPredictionCol(), 
modelPar2.getPredictionCol())
+for model in modelPar1.models:
+foundCloseCoeffs = False
+for model2 in modelPar2.models:
--- End diff --

See comment about the Scala version.





[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18640
  
**[Test build #79627 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79627/testReport)**
 for PR 18640 at commit 
[`0f29656`](https://github.com/apache/spark/commit/0f29656cd2a933fad37af33e59115d376026d09d).





[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18640
  
This aims to reduce the review scope for #17980.
cc @kiszk.





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread ajaysaini725
Github user ajaysaini725 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r127557890
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -101,6 +101,50 @@ class OneVsRestSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defau
 assert(expectedMetrics.confusionMatrix ~== ovaMetrics.confusionMatrix 
absTol 400)
   }
 
+  test("one-vs-rest: tuning parallelism does not change output") {
+val numClasses = 3
+val ovaPar1 = new OneVsRest()
+  .setClassifier(new LogisticRegression)
+
+val ovaModelPar1 = ovaPar1.fit(dataset)
+
+val transformedDatasetPar1 = ovaModelPar1.transform(dataset)
+
+val ovaResultsPar1 = transformedDatasetPar1.select("prediction", 
"label").rdd.map {
+  row => (row.getDouble(0), row.getDouble(1))
+}
+
+val ovaPar2 = new OneVsRest()
+  .setClassifier(new LogisticRegression)
+  .setParallelism(2)
+
+val ovaModelPar2 = ovaPar2.fit(dataset)
+
+val transformedDatasetPar2 = ovaModelPar2.transform(dataset)
+
+val ovaResultsPar2 = transformedDatasetPar2.select("prediction", 
"label").rdd.map {
+  row => (row.getDouble(0), row.getDouble(1))
+}
+
+val metricsPar1 = new MulticlassMetrics(ovaResultsPar1)
+val metricsPar2 = new MulticlassMetrics(ovaResultsPar2)
+assert(metricsPar1.confusionMatrix == metricsPar2.confusionMatrix)
+
+for (i <- 0 until ovaModelPar1.models.length) {
+  var foundCloseCoeffs = false
+  val currentCoeffs = ovaModelPar1.models(i)
+  
.asInstanceOf[LogisticRegressionModel].coefficients
+  for (j <- 0 until ovaModelPar2.models.length) {
--- End diff --

Because of the parallelism, the order in which the models are written to the 
array might differ between runs. As a result, it is necessary to compare 
all pairs of models in the two arrays, not just the models at the same 
indices (a minimal sketch follows).
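
A minimal sketch of that order-insensitive check, reusing `ovaModelPar1` and 
`ovaModelPar2` from the test above (`coeffsClose` and its tolerance are 
assumed helpers, not part of the PR):

```scala
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.linalg.Vector

// Assumed helper: element-wise closeness of two coefficient vectors.
def coeffsClose(a: Vector, b: Vector, tol: Double = 1e-9): Boolean =
  a.size == b.size &&
    a.toArray.zip(b.toArray).forall { case (x, y) => math.abs(x - y) <= tol }

val coeffs1 = ovaModelPar1.models.map(_.asInstanceOf[LogisticRegressionModel].coefficients)
val coeffs2 = ovaModelPar2.models.map(_.asInstanceOf[LogisticRegressionModel].coefficients)

// Order-insensitive comparison: every parallelism=1 model must have a
// close counterpart somewhere among the parallelism=2 models.
coeffs1.foreach { c1 =>
  assert(coeffs2.exists(c2 => coeffsClose(c1, c2)), s"no close match for $c1")
}
```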





[GitHub] spark pull request #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-14 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/18640

[SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

## What changes were proposed in this pull request?

Like Parquet, this PR aims to depend on the latest Apache ORC 1.4 for 
Apache Spark 2.3. There are two key benefits for now.

- Stability: Apache ORC 1.4.0 has many fixes, and we can rely more on the 
ORC community.
- Maintainability: Reduces the Hive dependency and lets us remove old 
legacy code later.

Later, we can get two more key benefits by [adding a new 
ORCFileFormat](#17980) in SPARK-20728, too.
- Usability: Users can use ORC data sources without the hive module, i.e., 
without -Phive (a minimal usage sketch follows this list).
- Speed: Use Spark ColumnarBatch and ORC RowBatch together. This will 
be faster than the current implementation in Spark.
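
A minimal usage sketch for the usability point above, assuming only the 
built-in DataFrame reader API (the path below is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

object OrcReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("OrcReadSketch").getOrCreate()

    // Read ORC through the data source API; with the new ORCFileFormat this
    // would no longer require building Spark with -Phive.
    val df = spark.read.format("orc").load("/tmp/example.orc") // hypothetical path

    df.printSchema()
    spark.stop()
  }
}
```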

## How was this patch tested?

Pass the jenkins.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-21422

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18640.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18640


commit 0f29656cd2a933fad37af33e59115d376026d09d
Author: Dongjoon Hyun 
Date:   2017-07-14T22:01:55Z

[SPARK-21422][BUILD] Depend on Apache ORC 1.4.0







[GitHub] spark pull request #18616: [SPARK-21377][YARN] Make jars specify with --jars...

2017-07-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18616#discussion_r127556964
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
 ---
@@ -438,6 +441,24 @@ private[spark] class ApplicationMaster(
 registerAM(sparkConf, rpcEnv, driverRef, 
sparkConf.getOption("spark.driver.appUIAddress"),
   securityMgr)
 
+// If the credentials file config is present, we must periodically 
renew tokens. So create
+// a new AMDelegationTokenRenewer
+if (sparkConf.contains(CREDENTIALS_FILE_PATH.key)) {
--- End diff --

nit: `.key` is not necessary (also in other call)





[GitHub] spark pull request #18616: [SPARK-21377][YARN] Make jars specify with --jars...

2017-07-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18616#discussion_r127557008
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
 ---
@@ -438,6 +441,24 @@ private[spark] class ApplicationMaster(
 registerAM(sparkConf, rpcEnv, driverRef, 
sparkConf.getOption("spark.driver.appUIAddress"),
   securityMgr)
 
+// If the credentials file config is present, we must periodically 
renew tokens. So create
+// a new AMDelegationTokenRenewer
+if (sparkConf.contains(CREDENTIALS_FILE_PATH.key)) {
+  // Start a short-lived thread for AMCredentialRenewer, the only 
purpose is to set the
+  // classloader so that main jar and secondary jars could be used by 
AMCredentialRenewer.
+  val credentialRenewerThread = new Thread {
+setName("AMCredentialRenewerThread")
--- End diff --

nit: `s/Thread/Starter`





[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...

2017-07-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/17980
  
Hi, @kiszk. I will start with `Adding Apache ORC dependency (pom and 
dependency changes)` in 
[SPARK-21422](https://issues.apache.org/jira/browse/SPARK-21422) first.





[GitHub] spark pull request #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types ...

2017-07-14 Thread zasdfgbnm
Github user zasdfgbnm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18444#discussion_r127555013
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala 
---
@@ -57,11 +57,11 @@ private[spark] object SerDeUtil extends Logging {
 //  };
 // TODO: support Py_UNICODE with 2 bytes
--- End diff --

Since `array('u')` passes the tests, should this TODO be removed?





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-14 Thread zasdfgbnm
Github user zasdfgbnm commented on the issue:

https://github.com/apache/spark/pull/18444
  
I updated my code according to @HyukjinKwon's suggestion





[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18633
  
**[Test build #79626 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79626/testReport)**
 for PR 18633 at commit 
[`9a4f012`](https://github.com/apache/spark/commit/9a4f01217b7ff6219097fe7f4080ecc51125).





[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...

2017-07-14 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/18633
  
Jenkins, retest this please.





[GitHub] spark issue #18616: [SPARK-21377][YARN] Make jars specify with --jars/--pack...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18616
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79625/
Test PASSed.





[GitHub] spark issue #18637: [SPARK-15526][ML][FOLLOWUP][test-maven] Make JPMML provi...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18637
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79619/
Test PASSed.





[GitHub] spark issue #18616: [SPARK-21377][YARN] Make jars specify with --jars/--pack...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18616
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18637: [SPARK-15526][ML][FOLLOWUP][test-maven] Make JPMML provi...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18637
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18616: [SPARK-21377][YARN] Make jars specify with --jars/--pack...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18616
  
**[Test build #79625 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79625/testReport)**
 for PR 18616 at commit 
[`a9e1a21`](https://github.com/apache/spark/commit/a9e1a21e9b89fa025a1b823680d7acabc34c833a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18638: [SPARK-21421][SS]Add the query id as a local prop...

2017-07-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18638





[GitHub] spark issue #18513: [SPARK-13969][ML] Add FeatureHasher transformer

2017-07-14 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/18513
  
Just to clarify:

* If I want to treat a column that is represented by integers as 
categorical, I'd have to map those integers to strings, right? I believe that's 
one of your bullets above (a minimal sketch of that mapping follows this list).
* This is effectively going to one-hot encode categorical columns, 
which is going to create linearly dependent columns since there is no parameter 
to drop the last column. Maybe there's a good solution, but I don't think we 
have to address it here. Just wanted to check.
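
A minimal sketch of the integer-to-string mapping in the first bullet, 
assuming the FeatureHasher API shown in this PR (column names and data are 
made up):

```scala
import org.apache.spark.ml.feature.FeatureHasher
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object CategoricalIntSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CategoricalIntSketch").getOrCreate()
    import spark.implicits._

    // "rating" is an integer code that should be treated as categorical.
    val df = Seq((1, 2.0), (2, 3.0), (1, 2.5)).toDF("rating", "real")

    // Map the integers to strings so FeatureHasher hashes "ratingStr=1"
    // (one-hot style) instead of scaling by the numeric value.
    val prepared = df.withColumn("ratingStr", col("rating").cast("string"))

    val hasher = new FeatureHasher()
      .setInputCols("ratingStr", "real")
      .setOutputCol("features")
    hasher.transform(prepared).show(truncate = false)

    spark.stop()
  }
}
```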





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18444
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79618/
Test PASSed.





[GitHub] spark issue #18637: [SPARK-15526][ML][FOLLOWUP][test-maven] Make JPMML provi...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18637
  
**[Test build #79619 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79619/testReport)**
 for PR 18637 at commit 
[`4b20f82`](https://github.com/apache/spark/commit/4b20f822c9d76a2168fc4fe8094d8f22eecd5d93).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18444
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18444
  
**[Test build #79618 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79618/testReport)**
 for PR 18444 at commit 
[`6c49a48`](https://github.com/apache/spark/commit/6c49a4856ab26d9f207bd781f123fe01b03e13b2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18638: [SPARK-21421][SS]Add the query id as a local property to...

2017-07-14 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/18638
  
Thanks! Merging to master.





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r127552088
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -101,6 +101,50 @@ class OneVsRestSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defau
 assert(expectedMetrics.confusionMatrix ~== ovaMetrics.confusionMatrix 
absTol 400)
   }
 
+  test("one-vs-rest: tuning parallelism does not change output") {
+val numClasses = 3
+val ovaPar1 = new OneVsRest()
+  .setClassifier(new LogisticRegression)
+
+val ovaModelPar1 = ovaPar1.fit(dataset)
+
+val transformedDatasetPar1 = ovaModelPar1.transform(dataset)
+
+val ovaResultsPar1 = transformedDatasetPar1.select("prediction", 
"label").rdd.map {
+  row => (row.getDouble(0), row.getDouble(1))
+}
+
+val ovaPar2 = new OneVsRest()
+  .setClassifier(new LogisticRegression)
+  .setParallelism(2)
+
+val ovaModelPar2 = ovaPar2.fit(dataset)
+
+val transformedDatasetPar2 = ovaModelPar2.transform(dataset)
+
+val ovaResultsPar2 = transformedDatasetPar2.select("prediction", 
"label").rdd.map {
+  row => (row.getDouble(0), row.getDouble(1))
+}
+
+val metricsPar1 = new MulticlassMetrics(ovaResultsPar1)
+val metricsPar2 = new MulticlassMetrics(ovaResultsPar2)
+assert(metricsPar1.confusionMatrix == metricsPar2.confusionMatrix)
+
+for (i <- 0 until ovaModelPar1.models.length) {
+  var foundCloseCoeffs = false
+  val currentCoeffs = ovaModelPar1.models(i)
+  
.asInstanceOf[LogisticRegressionModel].coefficients
--- End diff --

and below





[GitHub] spark pull request #18370: [SPARK-9825][yarn] Do not overwrite final Hadoop ...

2017-07-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18370





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r127552778
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -1511,27 +1512,47 @@ class OneVsRest(Estimator, OneVsRestParams, 
MLReadable, MLWritable):
 .. versionadded:: 2.0.0
 """
 
+parallelism = Param(Params._dummy(), "parallelism",
+"number of threads to use when fitting models in 
parallel",
+typeConverter=TypeConverters.toInt)
+
 @keyword_only
 def __init__(self, featuresCol="features", labelCol="label", 
predictionCol="prediction",
- classifier=None):
+ classifier=None, parallelism=1):
 """
 __init__(self, featuresCol="features", labelCol="label", 
predictionCol="prediction", \
  classifier=None)
--- End diff --

Add parallelism=1 here too





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r127552478
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -1511,27 +1512,47 @@ class OneVsRest(Estimator, OneVsRestParams, 
MLReadable, MLWritable):
 .. versionadded:: 2.0.0
 """
 
+parallelism = Param(Params._dummy(), "parallelism",
--- End diff --

Since this is a shared Param in Scala, do you want to make it one in Python 
too?  You can add it to _shared_params_code_gen.py and then re-generate the 
shared.py file.





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r127552072
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -101,6 +101,50 @@ class OneVsRestSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defau
 assert(expectedMetrics.confusionMatrix ~== ovaMetrics.confusionMatrix 
absTol 400)
   }
 
+  test("one-vs-rest: tuning parallelism does not change output") {
+val numClasses = 3
+val ovaPar1 = new OneVsRest()
+  .setClassifier(new LogisticRegression)
+
+val ovaModelPar1 = ovaPar1.fit(dataset)
+
+val transformedDatasetPar1 = ovaModelPar1.transform(dataset)
+
+val ovaResultsPar1 = transformedDatasetPar1.select("prediction", 
"label").rdd.map {
+  row => (row.getDouble(0), row.getDouble(1))
+}
+
+val ovaPar2 = new OneVsRest()
+  .setClassifier(new LogisticRegression)
+  .setParallelism(2)
+
+val ovaModelPar2 = ovaPar2.fit(dataset)
+
+val transformedDatasetPar2 = ovaModelPar2.transform(dataset)
+
+val ovaResultsPar2 = transformedDatasetPar2.select("prediction", 
"label").rdd.map {
+  row => (row.getDouble(0), row.getDouble(1))
+}
+
+val metricsPar1 = new MulticlassMetrics(ovaResultsPar1)
+val metricsPar2 = new MulticlassMetrics(ovaResultsPar2)
+assert(metricsPar1.confusionMatrix == metricsPar2.confusionMatrix)
+
+for (i <- 0 until ovaModelPar1.models.length) {
+  var foundCloseCoeffs = false
+  val currentCoeffs = ovaModelPar1.models(i)
+  
.asInstanceOf[LogisticRegressionModel].coefficients
--- End diff --

fix indentation





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r127552824
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -1511,27 +1512,47 @@ class OneVsRest(Estimator, OneVsRestParams, 
MLReadable, MLWritable):
 .. versionadded:: 2.0.0
 """
 
+parallelism = Param(Params._dummy(), "parallelism",
+"number of threads to use when fitting models in 
parallel",
+typeConverter=TypeConverters.toInt)
+
 @keyword_only
 def __init__(self, featuresCol="features", labelCol="label", 
predictionCol="prediction",
- classifier=None):
+ classifier=None, parallelism=1):
 """
 __init__(self, featuresCol="features", labelCol="label", 
predictionCol="prediction", \
  classifier=None)
 """
 super(OneVsRest, self).__init__()
+self._setDefault(parallelism=1)
 kwargs = self._input_kwargs
 self._set(**kwargs)
 
 @keyword_only
 @since("2.0.0")
-def setParams(self, featuresCol=None, labelCol=None, 
predictionCol=None, classifier=None):
+def setParams(self, featuresCol="features", labelCol="label", 
predictionCol="prediction",
+  classifier=None, parallelism=1):
 """
 setParams(self, featuresCol=None, labelCol=None, 
predictionCol=None, classifier=None):
--- End diff --

ditto: add parallelism=1 to doc





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r127550679
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/util/HasParallelism.scala ---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.util
+
+import scala.concurrent.ExecutionContext
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml.param.{IntParam, Params, ParamValidators}
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * Common parameter for estimators trained in a multithreaded environment.
+ */
+private[ml] trait HasParallelism extends Params {
+
+  /**
+   * param for the number of threads to use when running parallel one vs. 
rest.
+   * The implementation of parallel one vs. rest runs the classification 
for
+   * each class in a separate thread.
+   * @group expertParam
+   */
+  @Since("2.3.0")
+  val parallelism = new IntParam(this, "parallelism",
+"the number of threads to use when running parallel algorithms", 
ParamValidators.gtEq(1))
+
+  setDefault(parallelism -> 1)
+
+  /** @group getParam */
--- End diff --

expertGetParam





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r127553451
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -1229,11 +1229,30 @@ def test_output_columns(self):
  (2.0, Vectors.dense(0.5, 0.5))],
 ["label", "features"])
 lr = LogisticRegression(maxIter=5, regParam=0.01)
-ovr = OneVsRest(classifier=lr)
+ovr = OneVsRest(classifier=lr, parallelism=1)
 model = ovr.fit(df)
 output = model.transform(df)
 self.assertEqual(output.columns, ["label", "features", 
"prediction"])
 
+def test_parallelism_doesnt_change_output(self):
+df = self.spark.createDataFrame([(0.0, Vectors.dense(1.0, 0.8)),
+ (1.0, Vectors.sparse(2, [], [])),
+ (2.0, Vectors.dense(0.5, 0.5))],
+["label", "features"])
+ovrPar1 = OneVsRest(classifier=LogisticRegression(maxIter=5, 
regParam=.01), parallelism=1)
+modelPar1 = ovrPar1.fit(df)
+ovrPar2 = OneVsRest(classifier=LogisticRegression(maxIter=5, 
regParam=.01), parallelism=2)
+modelPar2 = ovrPar2.fit(df)
+self.assertEqual(modelPar1.getPredictionCol(), 
modelPar2.getPredictionCol())
+for model in modelPar1.models:
+foundCloseCoeffs = False
+for model2 in modelPar2.models:
--- End diff --

As in Scala, this seems like a roundabout way to compare the models.  Can 
you just zip the two arrays of models together and compare the pairs?





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r127551356
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -325,8 +326,11 @@ final class OneVsRest @Since("1.4.0") (
   multiclassLabeled.persist(StorageLevel.MEMORY_AND_DISK)
 }
 
+val executionContext = getExecutionContext
+instr.logParams(parallelism)
--- End diff --

This can be logged with the other Params above.





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r127551019
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/util/HasParallelism.scala ---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.util
+
+import scala.concurrent.ExecutionContext
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml.param.{IntParam, Params, ParamValidators}
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * Common parameter for estimators trained in a multithreaded environment.
+ */
+private[ml] trait HasParallelism extends Params {
+
+  /**
+   * Param for the number of threads to use when running parallel one vs. rest.
+   * The implementation of parallel one vs. rest runs the classification for
+   * each class in a separate thread.
+   * @group expertParam
+   */
+  @Since("2.3.0")
+  val parallelism = new IntParam(this, "parallelism",
+"the number of threads to use when running parallel algorithms", 
ParamValidators.gtEq(1))
+
+  setDefault(parallelism -> 1)
+
+  /** @group getParam */
+  def getParallelism: Int = $(parallelism)
+
+  /** @group setParam */
+  @Since("2.3.0")
+  def setParallelism(value: Int): this.type = {
+    set(parallelism, value)
+  }
+
+  protected def getExecutionContext: ExecutionContext = {
--- End diff --

make package private: ```private[ml]```
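For illustration, a hedged sketch of the helper with the suggested visibility. The diff is cut off at the signature, so the body below is a plausible implementation rather than the PR's actual code; it uses pool helpers from `org.apache.spark.util.ThreadUtils`:

```scala
// Sketch only: restrict the helper to the ml package as suggested.
private[ml] def getExecutionContext: ExecutionContext = {
  getParallelism match {
    case 1 =>
      // Serial case: run the futures on the calling thread.
      ThreadUtils.sameThread
    case n =>
      // Parallel case: a daemon cached pool capped at n threads.
      ExecutionContext.fromExecutorService(
        ThreadUtils.newDaemonCachedThreadPool(s"${getClass.getSimpleName}-thread-pool", n))
  }
}
```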





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r127550735
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/HasParallelism.scala ---
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.util
+
+import scala.concurrent.ExecutionContext
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.ml.param.{IntParam, Params, ParamValidators}
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * Common parameter for estimators trained in a multithreaded environment.
+ */
+private[ml] trait HasParallelism extends Params {
+
+  /**
+   * Param for the number of threads to use when running parallel one vs. rest.
+   * The implementation of parallel one vs. rest runs the classification for
+   * each class in a separate thread.
+   * @group expertParam
+   */
+  @Since("2.3.0")
+  val parallelism = new IntParam(this, "parallelism",
+"the number of threads to use when running parallel algorithms", 
ParamValidators.gtEq(1))
+
+  setDefault(parallelism -> 1)
+
+  /** @group getParam */
+  def getParallelism: Int = $(parallelism)
+
+  /** @group setParam */
--- End diff --

Use `@group expertSetParam` here, since `parallelism` is an expert Param.
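For illustration, a sketch of the setter with the suggested group tag; the body is copied from the diff above:

```scala
/** @group expertSetParam */
@Since("2.3.0")
def setParallelism(value: Int): this.type = {
  set(parallelism, value)
}
```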





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r127553419
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -101,6 +101,50 @@ class OneVsRestSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
    assert(expectedMetrics.confusionMatrix ~== ovaMetrics.confusionMatrix absTol 400)
   }
 
+  test("one-vs-rest: tuning parallelism does not change output") {
+val numClasses = 3
+val ovaPar1 = new OneVsRest()
+  .setClassifier(new LogisticRegression)
+
+val ovaModelPar1 = ovaPar1.fit(dataset)
+
+val transformedDatasetPar1 = ovaModelPar1.transform(dataset)
+
+val ovaResultsPar1 = transformedDatasetPar1.select("prediction", "label").rdd.map {
+  row => (row.getDouble(0), row.getDouble(1))
+}
+
+val ovaPar2 = new OneVsRest()
+  .setClassifier(new LogisticRegression)
+  .setParallelism(2)
+
+val ovaModelPar2 = ovaPar2.fit(dataset)
+
+val transformedDatasetPar2 = ovaModelPar2.transform(dataset)
+
+val ovaResultsPar2 = transformedDatasetPar2.select("prediction", "label").rdd.map {
+  row => (row.getDouble(0), row.getDouble(1))
+}
+
+val metricsPar1 = new MulticlassMetrics(ovaResultsPar1)
+val metricsPar2 = new MulticlassMetrics(ovaResultsPar2)
+assert(metricsPar1.confusionMatrix == metricsPar2.confusionMatrix)
+
+for (i <- 0 until ovaModelPar1.models.length) {
+  var foundCloseCoeffs = false
+  val currentCoeffs = ovaModelPar1.models(i)
+    .asInstanceOf[LogisticRegressionModel].coefficients
+  for (j <- 0 until ovaModelPar2.models.length) {
--- End diff --

This seems like a roundabout way to compare the models.  Can you just zip 
the two arrays of models together and compare the pairs?
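A hedged sketch of the zip-based comparison being suggested, assuming both runs train the per-class submodels in the same class order (so they pair up by index) and that the submodels are `LogisticRegressionModel` instances, as elsewhere in this suite:

```scala
// Pair up the submodels from the two runs and compare them directly.
// The absTol value is illustrative, not taken from the PR.
ovaModelPar1.models.zip(ovaModelPar2.models).foreach {
  case (m1: LogisticRegressionModel, m2: LogisticRegressionModel) =>
    assert(m1.coefficients ~== m2.coefficients absTol 1e-4)
    assert(m1.intercept ~== m2.intercept absTol 1e-4)
}
```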





[GitHub] spark issue #18547: [SPARK-21321][Spark Core] Spark very verbose on shutdown

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18547
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...

2017-07-14 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18555
  
Checking whether the values are set is not enough. We need to check whether these parameters actually take effect. That means we need to test the resulting Spark behavior.
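For context, a minimal sketch of the kind of validation this PR adds; the config key and bounds below are made up for the example, not entries from the PR:

```scala
// Illustrative only: in Spark these entries live in the
// org.apache.spark.internal.config package object.
// checkValue rejects bad values at set time, but, per the review comment,
// a test should also verify the value actually changes runtime behavior.
private[spark] val EXAMPLE_MAX_RETRIES = ConfigBuilder("spark.example.maxRetries")
  .intConf
  .checkValue(_ >= 0, "The max number of retries must be non-negative.")
  .createWithDefault(3)
```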





[GitHub] spark issue #18547: [SPARK-21321][Spark Core] Spark very verbose on shutdown

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18547
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79617/
Test PASSed.





[GitHub] spark issue #18370: [SPARK-9825][yarn] Do not overwrite final Hadoop config ...

2017-07-14 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/18370
  
> it feels like we shouldn't ship the hadoop conf dir but that might break some people

Yeah, we rely on that to ship other configs (like Hive and HBase) with the 
application without the user having to mess with `--files` and whatnot.

Thanks, merging to master.





[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...

2017-07-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/17980
  
Thank you for the review, @kiszk.
- The first one, [adding the new ORC source](https://github.com/apache/spark/pull/17924), is smaller than this one.
- Also, there is a [smaller version without the vectorized part](https://github.com/apache/spark/pull/17943).

If we need to split further, we can also move the part adding the `Apache ORC` dependency into a separate PR. Would that help the review?

In general, about 50% of the lines are new ORC test code duplicated from the old ORC test code. Could you recommend a better way?




