[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...

2015-01-18 Thread rnowling
Github user rnowling commented on the pull request:

https://github.com/apache/spark/pull/4087#issuecomment-70446766
  
[~leahmcguire],

Thanks for the patch!

A few comments:
1. PySpark calls the Scala API for MLlib, so for API compatibility, we 
can't use enumerations on the public APIs.  I suggest using a string for the 
train() functions but keeping the enumeration for the internal API (a rough 
sketch follows below).
2. Can you create a new JIRA for updating the PySpark MLlib NB API?  I can 
post details on what needs to change there -- if you don't want to do the PR 
for that, I can.
3. The populateMatrix function is verbose.  Breeze seems to support 
element-wise operations 
(https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet), which 
might negate the need for the populateMatrix function.
4. Can you update the MLlib docs in docs/mllib-naive-bayes.md ?

Thanks!   
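
As a rough illustration of point 1 (a sketch only; the names and signatures here are hypothetical, not the actual MLlib API):

```
object NaiveBayesSketch {
  // Internal enumeration, kept off the public API.
  object ModelType extends Enumeration {
    val Multinomial, Bernoulli = Value
  }

  // A public train(..., modelType: String) overload would translate the string once, here,
  // so PySpark can keep passing a plain string through the Py4J bridge.
  def parseModelType(modelType: String): ModelType.Value = modelType.toLowerCase match {
    case "multinomial" => ModelType.Multinomial
    case "bernoulli"   => ModelType.Bernoulli
    case other => throw new IllegalArgumentException(s"Unknown model type: $other")
  }
}
```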





[GitHub] spark pull request: [SQL] fix typo in class description

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4100#issuecomment-70453298
  
  [Test build #25746 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25746/consoleFull)
 for   PR 4100 at commit 
[`b13b9d6`](https://github.com/apache/spark/commit/b13b9d6345df178e49fb1a5be6016008b0b08488).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-5217 Spark UI should report pending stag...

2015-01-18 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/4043#issuecomment-70453703
  
@pwendell - patch updated to latest master.





[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4098#issuecomment-70453833
  
  [Test build #25740 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25740/consoleFull)
 for   PR 4098 at commit 
[`b349b77`](https://github.com/apache/spark/commit/b349b77509229eee3ea5a7f3fbad6737b82d2e95).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `val elem = s"array (class $`
  * `val elem = s"externalizable object (class $`
  * `val elem = s"object (class $`
  * `  implicit class ObjectStreamClassMethods(val desc: ObjectStreamClass) 
extends AnyVal `






[GitHub] spark pull request: Bug fix for SPARK-5242: ec2/spark_ec2.py lauc...

2015-01-18 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/4038#issuecomment-70442384
  
cc @shivaram 

I haven't had a chance to look at this more closely yet, and likely won't 
until next weekend.





[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support

2015-01-18 Thread tianyi
Github user tianyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/3946#discussion_r23142124
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suite.scala
 ---
@@ -384,4 +388,32 @@ class HiveThriftServer2Suite extends FunSuite with 
Logging {
   }
 }
   }
+
+  test("SPARK-5100 monitor page") {
--- End diff --

@JoshRosen, I have talked with @liancheng about the UISeleniumSuite. I did 
not add more complex web UI tests because we were worried that they would take 
too much time to run. 





[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2495#issuecomment-70447973
  
  [Test build #25737 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25737/consoleFull)
 for   PR 2495 at commit 
[`0461ed0`](https://github.com/apache/spark/commit/0461ed06a66966480a93085e41fdb0a620804222).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2495#issuecomment-70447975
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25737/
Test FAILed.





[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3897#issuecomment-70448381
  
  [Test build #25738 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25738/consoleFull)
 for   PR 3897 at commit 
[`8232aa8`](https://github.com/apache/spark/commit/8232aa8b07a10cb6d1e07e8be49741585f1b4126).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...

2015-01-18 Thread jackylk
Github user jackylk commented on the pull request:

https://github.com/apache/spark/pull/2847#issuecomment-70450975
  
Yes, I have tested the parallel FP-Growth algorithm using an open data set 
from http://fimi.ua.ac.be/data/; the performance test results can be found at 
https://issues.apache.org/jira/browse/SPARK-4001

All modifications are done except for the 7th (generic type), so please review 
the code for now.
I am still considering whether it is worthwhile to implement the generic type, 
since it adds more complexity to the code.





[GitHub] spark pull request: [SPARK-4699][SQL] make caseSensitive configura...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3558#issuecomment-70456370
  
  [Test build #25751 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25751/consoleFull)
 for   PR 3558 at commit 
[`05b09a3`](https://github.com/apache/spark/commit/05b09a3c1008869571e438c12e8593def7ecdc2c).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4699][SQL] make caseSensitive configura...

2015-01-18 Thread jackylk
Github user jackylk commented on the pull request:

https://github.com/apache/spark/pull/3558#issuecomment-70456295
  
I have updated the code based on SPARK-3965 (SPARK-5168)





[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3946#issuecomment-70446535
  
  [Test build #25735 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25735/consoleFull)
 for   PR 3946 at commit 
[`daed3d1`](https://github.com/apache/spark/commit/daed3d126a5112d9e4e94fac7592ff804775ec05).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...

2015-01-18 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/4068#issuecomment-70447398
  
The problem is this: currently the `GetField` class is an operation which 
picks the first field whose name equals the required `fieldName`, using 
case-sensitive matching. As I said before, we parse `a.b[0].c.d` into 
`GetField(GetField(GetItem(Unresolved(a.b), 0), c), d)`. For the `a.b` part, 
we can check anything we want before building the `GetField`, but for the two outer 
`GetField`s we can only do the check in the `Analyzer` (or we could expose the `resolver` 
to `GetField`, but that's not recommended).

So we need a way to indicate whether a `GetField` still needs analysis or not.

For SPARK-3698, we can do this by searching for the required field 
case-sensitively; if that succeeds, we are done. If not, we still have a chance if the 
resolver is case-insensitive, so we can do the check in the `Analyzer` as @marmbrus 
did in https://github.com/apache/spark/pull/3724.

For SPARK-5278 here, it's more complicated. It seems to me that the only 
way is adding a flag to `GetField`, or introducing an `UnresolvedGetField`.

What do you think? @marmbrus @liancheng
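
To make the `UnresolvedGetField` idea concrete, here is a toy sketch (not Catalyst's actual classes; names and shapes are illustrative only):

```
// Toy model: keep field access unresolved until the Analyzer can apply the
// configured (case-sensitive or case-insensitive) resolver.
sealed trait Expr
case class Unresolved(name: String) extends Expr
case class GetItem(child: Expr, ordinal: Int) extends Expr
case class GetField(child: Expr, fieldName: String) extends Expr           // already checked
case class UnresolvedGetField(child: Expr, fieldName: String) extends Expr // checked later by the Analyzer

// `a.b[0].c.d` would then parse to:
val parsed: Expr =
  UnresolvedGetField(
    UnresolvedGetField(
      GetItem(Unresolved("a.b"), 0),
      "c"),
    "d")
```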





[GitHub] spark pull request: [SQL] fix typo in class description

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4100#issuecomment-70452986
  
  [Test build #25744 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25744/consoleFull)
 for   PR 4100 at commit 
[`fcc8c85`](https://github.com/apache/spark/commit/fcc8c857aef468d1a86c085554a2a7184ff769a3).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4961] [CORE] Put HadoopRDD.getPartition...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3794#issuecomment-70452982
  
  [Test build #25745 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25745/consoleFull)
 for   PR 3794 at commit 
[`b535a53`](https://github.com/apache/spark/commit/b535a531ee853c29d63cda0154be54512740bc78).
 * This patch merges cleanly.





[GitHub] spark pull request: [SQL] fix typo in class description

2015-01-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4100#issuecomment-70457188
  
Thanks. Merging in master.





[GitHub] spark pull request: [SPARK-5282][mllib]: RowMatrix easily gets int...

2015-01-18 Thread hhbyyh
Github user hhbyyh commented on the pull request:

https://github.com/apache/spark/pull/4069#issuecomment-70441745
  
@srowen Would you mind taking another look? Thanks





[GitHub] spark pull request: [SPARK-4908][SQL]narrow the scope of synchroni...

2015-01-18 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/4001#issuecomment-70441804
  
`HiveShim.getCommandProcess` delegates to methods defined in 
`CommandProcessorFactory`, which tries to find a cached `Driver` object and 
initialize it. The underlying `Driver` cache map is synchronized. However, I'm 
not quite sure whether `Driver` itself is thread-safe. Also, `HiveServer2` actually 
creates a new `Driver` instance for every SQL statement and never caches them. 
Considering all of the above, I'd agree that the risks are greater than the 
benefits. A better solution would be to avoid `HiveShim.getCommandProcess` 
(which caches `Driver` objects) and instead mimic what `HiveServer2` does by 
creating a new `Driver` instance for every SQL statement.
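
For illustration, a minimal sketch of the "new `Driver` per statement" approach (assuming Hive's `org.apache.hadoop.hive.ql.Driver` API; this is not the actual Spark code):

```
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.Driver

// Sketch: mimic HiveServer2 by creating a fresh Driver per SQL statement
// instead of going through the cached CommandProcessorFactory path.
def runStatement(sql: String, hiveConf: HiveConf): Int = {
  val driver = new Driver(hiveConf)   // new instance, never cached or shared
  try {
    driver.run(sql).getResponseCode   // 0 on success
  } finally {
    driver.close()                    // release resources held by this statement
  }
}
```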





[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3897#issuecomment-70445416
  
  [Test build #25734 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25734/consoleFull)
 for   PR 3897 at commit 
[`932289f`](https://github.com/apache/spark/commit/932289f6d808932da9fa54c21b32c61efca5a18f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support

2015-01-18 Thread tianyi
Github user tianyi commented on the pull request:

https://github.com/apache/spark/pull/3946#issuecomment-70446387
  
Rebased onto the latest master.





[GitHub] spark pull request: [SPARK-2848] Shade Guava in uber-jars.

2015-01-18 Thread mfawzymkh
Github user mfawzymkh commented on the pull request:

https://github.com/apache/spark/pull/1813#issuecomment-70446580
  
Do we have an ETA for getting this pull request merged to master? The Guava 
shading issue is causing a problem for client libs that have a dependency on 
swift-service when Spark is compiled with hadoop-2.4.





[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...

2015-01-18 Thread rnowling
Github user rnowling commented on a diff in the pull request:

https://github.com/apache/spark/pull/4087#discussion_r23142812
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -75,9 +106,12 @@ class NaiveBayesModel private[mllib] (
  * document classification.  By making every vector a 0-1 vector, it can 
also be used as
  * Bernoulli NB ([[http://tinyurl.com/p7c96j6]]). The input feature values 
must be nonnegative.
  */
-class NaiveBayes private (private var lambda: Double) extends Serializable 
with Logging {
+class NaiveBayes private (private var lambda: Double,
+  var model: NaiveBayesModels) extends 
Serializable with Logging {
 
-  def this() = this(1.0)
+  def this(lambda: Double) = this(lambda, NaiveBayesModels.Multinomial)
+
+  def this() = this(1.0, NaiveBayesModels.Multinomial)
--- End diff --

I suggest removing the default model from the internal API.  Backwards 
compatibility only matters for the public API.





[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3946#issuecomment-70449375
  
  [Test build #25733 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25733/consoleFull)
 for   PR 3946 at commit 
[`14a461d`](https://github.com/apache/spark/commit/14a461dc3dc05b66bd8f6c4027e4c1a39a84d90d).
 * This patch **passes all tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3946#issuecomment-70449381
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25733/
Test PASSed.





[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3997#issuecomment-70449484
  
  [Test build #25739 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25739/consoleFull)
 for   PR 3997 at commit 
[`0d9d130`](https://github.com/apache/spark/commit/0d9d13040e4d2730ec1c8ceaf5d8d48ead9d0bd8).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2

2015-01-18 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/4098

[SPARK-5307] SerializationDebugger - take 2

This patch adds a SerializationDebugger that is used to add the serialization 
path to a NotSerializableException. When a NotSerializableException is 
encountered, the debugger visits the object graph to find the path towards the 
object that cannot be serialized, and constructs information to help the user 
find the object.

Compared with an earlier attempt, this one provides extra information 
including field names, array offsets, writeExternal calls, etc.

An example serialization stack:
```
Serialization stack:
  -object not serializable (class: 
org.apache.spark.serializer.NotSerializable, value: 
org.apache.spark.serializer.NotSerializable@2c43caa4)
  -element of array (index: 0)
  -array (class [Ljava.lang.Object;, size 1)
  -field (class: org.apache.spark.serializer.SerializableArray, name: 
arrayField, type: class [Ljava.lang.Object;)
  -object (class org.apache.spark.serializer.SerializableArray, 
org.apache.spark.serializer.SerializableArray@193c5908)
  -writeExternal data
  -externalizable object (class 
org.apache.spark.serializer.ExternalizableClass, 
org.apache.spark.serializer.ExternalizableClass@320bdadc)
```
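
Conceptually, the debugger does a reflective walk over the object graph; a much-simplified sketch of that idea follows (illustration only, not the actual implementation; it ignores cycles, arrays, and Externalizable handling):

```
import java.io.{Serializable => JSerializable}
import java.lang.reflect.Modifier

// Depth-first search over declared fields, recording the path to the first
// value that is not Serializable.
def findNonSerializablePath(obj: Any, path: List[String] = Nil): Option[List[String]] = obj match {
  case null => None
  case o if !o.isInstanceOf[JSerializable] =>
    Some((s"object of class ${o.getClass.getName}" :: path).reverse)
  case o =>
    o.getClass.getDeclaredFields.iterator
      .filterNot(f => Modifier.isStatic(f.getModifiers) || Modifier.isTransient(f.getModifiers))
      .flatMap { f =>
        f.setAccessible(true)
        findNonSerializablePath(f.get(o), s"field ${f.getName}" :: path)
      }
      .find(_ => true)
}
```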

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SerializationDebugger

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4098.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4098


commit b349b77509229eee3ea5a7f3fbad6737b82d2e95
Author: Reynold Xin r...@databricks.com
Date:   2015-01-19T05:55:01Z

[SPARK-5307] SerializationDebugger to help debug NotSerializableException - 
take 2

This patch adds a SerializationDebugger that is used to add serialization 
path to
a NotSerializableException. When a NotSerializableException is encountered, 
the debugger
visits the object graph to find the path towards the object that cannot be 
serialized,
and constructs information to help user to find the object.

Compared with an earlier attempt, this one provides extra information 
including
field names, array offsets, writeExternal calls, etc.







[GitHub] spark pull request: [SPARK-4987] [SQL] parquet timestamp type supp...

2015-01-18 Thread felixcheung
Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/3820#issuecomment-70450284
  
I've tested this PR but the result seems to be off.
Parquet generated from Hive with timestamp values set by 
'from_utc_timestamp('1970-01-01 08:00:00','PST')'

What I see with this PR:
scala> t.take(10).foreach(println(_))
...
15/01/18 22:06:41 INFO NewHadoopRDD: Input split: ParquetInputSplit{part: 
file:/users/x/parquetwithtimestamp start: 0 end: 25448 length: 25448 hosts: [] 
requestedSchema: message root {
  optional binary code (UTF8);
  optional binary description (UTF8);
  optional int32 total_emp;
  optional int32 salary;
  optional int96 timestamp;
}
 readSupportMetadata: 
{org.apache.spark.sql.parquet.row.metadata={type:struct,fields:[{name:code,type:string,nullable:true,metadata:{}},{name:description,type:string,nullable:true,metadata:{}},{name:total_emp,type:integer,nullable:true,metadata:{}},{name:salary,type:integer,nullable:true,metadata:{}},{name:timestamp,type:timestamp,nullable:true,metadata:{}}]},
 
org.apache.spark.sql.parquet.row.requested_schema={type:struct,fields:[{name:code,type:string,nullable:true,metadata:{}},{name:description,type:string,nullable:true,metadata:{}},{name:total_emp,type:integer,nullable:true,metadata:{}},{name:salary,type:integer,nullable:true,metadata:{}},{name:timestamp,type:timestamp,nullable:true,metadata:{}}]}}}
15/01/18 22:06:41 WARN ParquetRecordReader: Can not initialize counter due 
to context is not a instance of TaskInputOutputContext, but is 
org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
15/01/18 22:06:41 INFO InternalParquetRecordReader: RecordReader 
initialized will read a total of 823 records.
15/01/18 22:06:41 INFO InternalParquetRecordReader: at row 0. reading next 
block
15/01/18 22:06:41 INFO CodecPool: Got brand-new decompressor [.snappy]
15/01/18 22:06:41 INFO InternalParquetRecordReader: block read in memory in 
27 ms. row count = 823
[00-,All Occupations,134354250,40690,1974-01-07 17:58:00.08896]
[11-,Management occupations,6003930,96150,1974-01-07 17:58:00.08896]

Expect: 1970-01-01 08:00:00

Actual: 1974-01-07 17:58:00.08896

Any idea?
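
For context only: Parquet int96 timestamps are commonly interpreted as a Julian day number plus nanoseconds within the day (my assumption about the encoding in play here, not a statement about what this PR does). Roughly:

```
import java.sql.Timestamp

// Assumed int96 layout (the common Impala/Hive convention):
// 8 bytes nanoseconds-of-day followed by 4 bytes Julian day number.
val JulianDayOfUnixEpoch = 2440588L   // Julian day number of 1970-01-01
val MillisPerDay = 86400000L

def int96ToTimestamp(julianDay: Int, nanosOfDay: Long): Timestamp = {
  val millis = (julianDay - JulianDayOfUnixEpoch) * MillisPerDay + nanosOfDay / 1000000L
  new Timestamp(millis)
}
```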






[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2847#issuecomment-70450297
  
  [Test build #25742 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25742/consoleFull)
 for   PR 2847 at commit 
[`eb3e4ca`](https://github.com/apache/spark/commit/eb3e4ca0709696b6b2b8afd1cfc56a5a9f87555d).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4098#issuecomment-70450293
  
  [Test build #25741 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25741/consoleFull)
 for   PR 4098 at commit 
[`572d0cb`](https://github.com/apache/spark/commit/572d0cbfdbbd816d45290b38c6c6c86d2447efdc).
 * This patch merges cleanly.





[GitHub] spark pull request: [SQL] fix typo in class description

2015-01-18 Thread jackylk
GitHub user jackylk opened a pull request:

https://github.com/apache/spark/pull/4100

[SQL] fix typo in class description



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/spark patch-9

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4100.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4100


commit fcc8c857aef468d1a86c085554a2a7184ff769a3
Author: Jacky Li jacky.li...@gmail.com
Date:   2015-01-19T06:52:57Z

[SQL] fix typo in class description







[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4068#issuecomment-70454202
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25747/
Test FAILed.





[GitHub] spark pull request: [SPARK-5297][Streaming] Fix Java file stream t...

2015-01-18 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/4101

[SPARK-5297][Streaming] Fix Java file stream type erasure problem

The current Java file stream doesn't support custom key/value types because of 
the loss of type information; details can be seen in 
[SPARK-5297](https://issues.apache.org/jira/browse/SPARK-5297). This fixes the 
problem by getting the correct `ClassTag` from `Class[_]`.
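
A minimal sketch of that idea (illustrative names, assuming the Java API receives `Class[_]` objects from the caller):

```
import scala.reflect.ClassTag

// Recover a ClassTag from the runtime Class the Java API receives, so the
// underlying Scala fileStream keeps the key/value/input-format type information
// instead of falling back to ClassTag[Object].
def classTagOf[T](clazz: Class[T]): ClassTag[T] = ClassTag[T](clazz)
```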

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-5297

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4101.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4101


commit 6c179f50bde9cfa46c6e2225313c992b231fb25f
Author: jerryshao saisai.s...@intel.com
Date:   2015-01-19T06:49:00Z

Fix Java fileInputStream type erasure problem

commit ec0131c1a2f4a4097d6d7b2f8a27d7abbf39b746
Author: jerryshao saisai.s...@intel.com
Date:   2015-01-19T07:12:35Z

Add Mima exclusion







[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4068#issuecomment-70454200
  
  [Test build #25747 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25747/consoleFull)
 for   PR 4068 at commit 
[`bfe069b`](https://github.com/apache/spark/commit/bfe069bfb3ac6e80fa82849b4e1dee90a606e731).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3586][streaming]Support nested director...

2015-01-18 Thread wangxiaojing
Github user wangxiaojing commented on the pull request:

https://github.com/apache/spark/pull/2765#issuecomment-70437483
  
@tdas 





[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...

2015-01-18 Thread rnowling
Github user rnowling commented on a diff in the pull request:

https://github.com/apache/spark/pull/4087#discussion_r23142620
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -75,9 +106,12 @@ class NaiveBayesModel private[mllib] (
  * document classification.  By making every vector a 0-1 vector, it can 
also be used as
  * Bernoulli NB ([[http://tinyurl.com/p7c96j6]]). The input feature values 
must be nonnegative.
  */
-class NaiveBayes private (private var lambda: Double) extends Serializable 
with Logging {
+class NaiveBayes private (private var lambda: Double,
+  var model: NaiveBayesModels) extends 
Serializable with Logging {
--- End diff --

Model should probably be a val, not a var.





[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2

2015-01-18 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4098#issuecomment-70449497
  
Links to the earlier attempts: https://github.com/apache/spark/pull/4093 by 
me and https://github.com/apache/spark/issues/3518 by @ilganeli







[GitHub] spark pull request: [SPARK-5022] [Sql] Change VectorUDT to object

2015-01-18 Thread MechCoder
GitHub user MechCoder opened a pull request:

https://github.com/apache/spark/pull/4099

[SPARK-5022] [Sql] Change VectorUDT to object



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MechCoder/spark spark-5022

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4099.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4099


commit 0014a59c0d0d263482208c16cc8601205fe565bf
Author: MechCoder manojkumarsivaraj...@gmail.com
Date:   2015-01-19T06:16:15Z

[SPARK-5022] Change VectorUDT to object







[GitHub] spark pull request: [SPARK-5022] [Sql] Change VectorUDT to object

2015-01-18 Thread MechCoder
Github user MechCoder commented on the pull request:

https://github.com/apache/spark/pull/4099#issuecomment-70450637
  
cc @rxin I am unable to understand how to change this line,

`@SQLUserDefinedType(udt = classOf[VectorUDT])`. I tried doing 
`@SQLUserDefinedType(udt = VectorUDT.getClass)`.

Sorry if this seems dumb; I'm relatively new.





[GitHub] spark pull request: SPARK-2630 Input data size of CoalescedRDD cou...

2015-01-18 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/2310#issuecomment-70440912
  
Sounds good, I concur.  Thanks!





[GitHub] spark pull request: [SPARK-5257] [MLlib] SparseVector indices must...

2015-01-18 Thread MechCoder
Github user MechCoder commented on the pull request:

https://github.com/apache/spark/pull/4096#issuecomment-70441989
  
Alright, but maybe the documentation can be updated to say that the indices 
should be non-negative?





[GitHub] spark pull request: use defaultParallelism for defaultMinPartition...

2015-01-18 Thread idanz
Github user idanz commented on the pull request:

https://github.com/apache/spark/pull/4094#issuecomment-70443024
  
I see. I don't want to repeat old discussions, so to be more pragmatic: the real
problem for me is setting the partition size when using Spark SQL.
My cluster uses 128MB blocks for HDFS, and when I use hiveContext.sql, it
just creates partitions of that size.
This causes memory issues, so I wanted to use smaller partitions.
However, the only way I found to do that requires setting mapred.map.tasks,
which is an undocumented setting.

Would you suggest opening a new ticket for this requirement?

Thanks

Here are some links to prior discussions of this:

   - https://issues.apache.org/jira/browse/SPARK-822
   - mesos/spark#718 https://github.com/mesos/spark/pull/718
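
For what it's worth, a hedged example of the workaround described above (a sketch against the Spark 1.x HiveContext API, with an illustrative table name; whether `mapred.map.tasks` actually shrinks the scan partitions is exactly the open question here):

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("smaller-sql-partitions"))
val hiveContext = new HiveContext(sc)

// Ask Hive/Hadoop for more map tasks so each SQL scan partition covers
// less than a full 128MB HDFS block.
hiveContext.setConf("mapred.map.tasks", "400")
val result = hiveContext.sql("SELECT * FROM some_table")  // some_table is a placeholder
```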






[GitHub] spark pull request: [WIP][SPARK-4131][SQL] Writing data into the f...

2015-01-18 Thread nieldomingo
Github user nieldomingo commented on the pull request:

https://github.com/apache/spark/pull/2997#issuecomment-70443145
  
this would really help me





[GitHub] spark pull request: [SPARK-3218, SPARK-3219, SPARK-3261, SPARK-342...

2015-01-18 Thread derrickburns
Github user derrickburns commented on the pull request:

https://github.com/apache/spark/pull/2634#issuecomment-70443890
  
@mengxr

I have implemented several variants of Kullback-Leibler divergence in my
separate GitHub repository,
https://github.com/derrickburns/generalized-kmeans-clustering. These
variants are more efficient than the standard KL-divergence, which is
defined on R+^n, because they take advantage of extra knowledge of the
domain. I have used these variants with much success (i.e. much faster
running time) in my large-scale clustering runs.
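
For reference, the standard discrete KL divergence that these variants specialize (the definition is standard; the domain-specific optimizations in the linked repository are not shown here):

```
// D_KL(p || q) = sum_i p(i) * log(p(i) / q(i)), with the convention 0 * log(0) = 0.
def klDivergence(p: Array[Double], q: Array[Double]): Double = {
  require(p.length == q.length, "distributions must have the same dimension")
  var sum = 0.0
  var i = 0
  while (i < p.length) {
    if (p(i) > 0.0) sum += p(i) * math.log(p(i) / q(i))
    i += 1
  }
  sum
}
```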







[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3946#issuecomment-70444727
  
  [Test build #25733 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25733/consoleFull)
 for   PR 3946 at commit 
[`14a461d`](https://github.com/apache/spark/commit/14a461dc3dc05b66bd8f6c4027e4c1a39a84d90d).
 * This patch **does not merge cleanly**.





[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...

2015-01-18 Thread jongyoul
Github user jongyoul commented on the pull request:

https://github.com/apache/spark/pull/3897#issuecomment-70444764
  
Rebase is not finished.





[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...

2015-01-18 Thread rnowling
Github user rnowling commented on a diff in the pull request:

https://github.com/apache/spark/pull/4087#discussion_r23142579
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -32,28 +42,42 @@ import org.apache.spark.rdd.RDD
  * @param pi log of class priors, whose dimension is C, number of labels
  * @param theta log of class conditional probabilities, whose dimension is 
C-by-D,
  *  where D is number of features
+ * @param model The type of NB model to fit from the enumeration 
NaiveBayesModels, can be
+ *  Multinomial or Bernoulli
  */
+
 class NaiveBayesModel private[mllib] (
 val labels: Array[Double],
 val pi: Array[Double],
-val theta: Array[Array[Double]]) extends ClassificationModel with 
Serializable {
-
-  private val brzPi = new BDV[Double](pi)
-  private val brzTheta = new BDM[Double](theta.length, theta(0).length)
+val theta: Array[Array[Double]],
+val model: NaiveBayesModels) extends ClassificationModel with 
Serializable {
 
-  {
-    // Need to put an extra pair of braces to prevent Scala treating `i` as a member.
+  def populateMatrix(arrayIn: Array[Array[Double]],
+                     matrixIn: BDM[Double],
+                     transformation: (Double) => Double = (x) => x) = {
     var i = 0
-    while (i < theta.length) {
+    while (i < arrayIn.length) {
       var j = 0
-      while (j < theta(i).length) {
-        brzTheta(i, j) = theta(i)(j)
+      while (j < arrayIn(i).length) {
+        matrixIn(i, j) = transformation(theta(i)(j))
         j += 1
       }
       i += 1
     }
   }
 
+  private val brzPi = new BDV[Double](pi)
+  private val brzTheta = new BDM[Double](theta.length, theta(0).length)
+  populateMatrix(theta, brzTheta)
+
+  private val brzNegTheta: Option[BDM[Double]] = model match {
--- End diff --

Why use an Option if this method is only called for Bernoulli anyway?  





[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...

2015-01-18 Thread jongyoul
Github user jongyoul commented on the pull request:

https://github.com/apache/spark/pull/3897#issuecomment-70445944
  
retest this please.





[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...

2015-01-18 Thread jongyoul
Github user jongyoul commented on the pull request:

https://github.com/apache/spark/pull/3897#issuecomment-70455615
  
@mateiz I've rebased this PR and the tests have finished successfully. Please 
merge it.





[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4068#issuecomment-70455616
  
  [Test build #25750 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25750/consoleFull)
 for   PR 4068 at commit 
[`d8c1dc9`](https://github.com/apache/spark/commit/d8c1dc958148a4b052b387f5573b147cfd9385da).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4068#issuecomment-70456466
  
  [Test build #25750 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25750/consoleFull)
 for   PR 4068 at commit 
[`d8c1dc9`](https://github.com/apache/spark/commit/d8c1dc958148a4b052b387f5573b147cfd9385da).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4068#issuecomment-70456470
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25750/
Test FAILed.





[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...

2015-01-18 Thread rnowling
Github user rnowling commented on a diff in the pull request:

https://github.com/apache/spark/pull/4087#discussion_r23142512
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -32,28 +42,42 @@ import org.apache.spark.rdd.RDD
  * @param pi log of class priors, whose dimension is C, number of labels
  * @param theta log of class conditional probabilities, whose dimension is 
C-by-D,
  *  where D is number of features
+ * @param model The type of NB model to fit from the enumeration 
NaiveBayesModels, can be
+ *  Multinomial or Bernoulli
  */
+
 class NaiveBayesModel private[mllib] (
 val labels: Array[Double],
 val pi: Array[Double],
-val theta: Array[Array[Double]]) extends ClassificationModel with 
Serializable {
-
-  private val brzPi = new BDV[Double](pi)
-  private val brzTheta = new BDM[Double](theta.length, theta(0).length)
+val theta: Array[Array[Double]],
+val model: NaiveBayesModels) extends ClassificationModel with 
Serializable {
 
-  {
-// Need to put an extra pair of braces to prevent Scala treating `i` 
as a member.
+  def populateMatrix(arrayIn: Array[Array[Double]],
--- End diff --

This function seems excessive.  Does the Breeze library support 
element-wise log/exp and addition/subtraction with matrices?  If so, that would 
be cleaner and less verbose.
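
For illustration, a rough sketch of the kind of element-wise Breeze usage I 
have in mind (variable names are placeholders, not from this patch):

    import breeze.linalg.{DenseMatrix => BDM}
    import breeze.numerics.{exp, log}

    // Placeholder row-major class-conditional probabilities.
    val thetaArray: Array[Array[Double]] = Array(Array(0.2, 0.8), Array(0.5, 0.5))
    // One matrix row per inner array; no hand-written population loop needed.
    val brzTheta: BDM[Double] = BDM(thetaArray: _*)
    // Element-wise log/exp plus matrix subtraction, e.g. log(theta) and log(1 - theta).
    val logTheta = log(brzTheta)
    val logNegTheta = log(BDM.ones[Double](brzTheta.rows, brzTheta.cols) - exp(logTheta))

If element-wise UFuncs like these are available in the Breeze version we 
depend on, populateMatrix could probably go away entirely.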


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4894][mllib] Added Bernoulli option to ...

2015-01-18 Thread rnowling
Github user rnowling commented on a diff in the pull request:

https://github.com/apache/spark/pull/4087#discussion_r23142533
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -32,28 +42,42 @@ import org.apache.spark.rdd.RDD
  * @param pi log of class priors, whose dimension is C, number of labels
  * @param theta log of class conditional probabilities, whose dimension is 
C-by-D,
  *  where D is number of features
+ * @param model The type of NB model to fit from the enumeration 
NaiveBayesModels, can be
+ *  Multinomial or Bernoulli
  */
+
 class NaiveBayesModel private[mllib] (
 val labels: Array[Double],
 val pi: Array[Double],
-val theta: Array[Array[Double]]) extends ClassificationModel with 
Serializable {
-
-  private val brzPi = new BDV[Double](pi)
-  private val brzTheta = new BDM[Double](theta.length, theta(0).length)
+val theta: Array[Array[Double]],
--- End diff --

This should probably be converted to a Breeze matrix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3897#issuecomment-70445835
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25732/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3897#issuecomment-70445832
  
  [Test build #25732 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25732/consoleFull)
 for   PR 3897 at commit 
[`25f3617`](https://github.com/apache/spark/commit/25f3617182c8d4491a0545e3231dbdafe668c4a5).
 * This patch **passes all tests**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `  command.setValue(scd $`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3946#issuecomment-70446696
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25735/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3946#issuecomment-70446695
  
  [Test build #25735 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25735/consoleFull)
 for   PR 3946 at commit 
[`daed3d1`](https://github.com/apache/spark/commit/daed3d126a5112d9e4e94fac7592ff804775ec05).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...

2015-01-18 Thread ankurdave
Github user ankurdave commented on the pull request:

https://github.com/apache/spark/pull/2495#issuecomment-70447655
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3650] Fix TriangleCount handling of rev...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2495#issuecomment-70447688
  
  [Test build #25737 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25737/consoleFull)
 for   PR 2495 at commit 
[`0461ed0`](https://github.com/apache/spark/commit/0461ed06a66966480a93085e41fdb0a620804222).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3897#issuecomment-70447740
  
  [Test build #25734 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25734/consoleFull)
 for   PR 3897 at commit 
[`932289f`](https://github.com/apache/spark/commit/932289f6d808932da9fa54c21b32c61efca5a18f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3897#issuecomment-70447746
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25734/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3897#issuecomment-70452235
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25738/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3897#issuecomment-70452231
  
  [Test build #25738 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25738/consoleFull)
 for   PR 3897 at commit 
[`8232aa8`](https://github.com/apache/spark/commit/8232aa8b07a10cb6d1e07e8be49741585f1b4126).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3997#issuecomment-70453518
  
  [Test build #25739 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25739/consoleFull)
 for   PR 3997 at commit 
[`0d9d130`](https://github.com/apache/spark/commit/0d9d13040e4d2730ec1c8ceaf5d8d48ead9d0bd8).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3997#issuecomment-70453523
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25739/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5278][SQL] complete the check of ambigu...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4068#issuecomment-70453627
  
  [Test build #25747 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25747/consoleFull)
 for   PR 4068 at commit 
[`bfe069b`](https://github.com/apache/spark/commit/bfe069bfb3ac6e80fa82849b4e1dee90a606e731).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-5217 Spark UI should report pending stag...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4043#issuecomment-70453629
  
  [Test build #25748 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25748/consoleFull)
 for   PR 4043 at commit 
[`3b11803`](https://github.com/apache/spark/commit/3b11803ae9b64acba2d64ad02d1e31d756783eaf).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4984][CORE][WEBUI] Adding a pop-up cont...

2015-01-18 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/3819#issuecomment-70437399
  
Hmm, I agree with you, but I haven't found an easy way to spot a truncated 
description.
If we add `...` for truncated descriptions, we would need to handle window 
scaling and different screen sizes, which makes this more complex.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5088] Use spark-class for running execu...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3897#issuecomment-70441182
  
  [Test build #25732 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25732/consoleFull)
 for   PR 3897 at commit 
[`25f3617`](https://github.com/apache/spark/commit/25f3617182c8d4491a0545e3231dbdafe668c4a5).
 * This patch **does not merge cleanly**.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4098#issuecomment-70449764
  
  [Test build #25740 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25740/consoleFull)
 for   PR 4098 at commit 
[`b349b77`](https://github.com/apache/spark/commit/b349b77509229eee3ea5a7f3fbad6737b82d2e95).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5022] [Sql] Change VectorUDT to object

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4099#issuecomment-70451197
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25743/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5022] [Sql] Change VectorUDT to object

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4099#issuecomment-70451196
  
  [Test build #25743 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25743/consoleFull)
 for   PR 4099 at commit 
[`0014a59`](https://github.com/apache/spark/commit/0014a59c0d0d263482208c16cc8601205fe565bf).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2847#issuecomment-70452828
  
  [Test build #25742 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25742/consoleFull)
 for   PR 2847 at commit 
[`eb3e4ca`](https://github.com/apache/spark/commit/eb3e4ca0709696b6b2b8afd1cfc56a5a9f87555d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2847#issuecomment-70452833
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25742/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4098#issuecomment-70453837
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25740/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2

2015-01-18 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/4098#discussion_r23146186
  
--- Diff: 
core/src/test/scala/org/apache/spark/serializer/SerializationDebuggerSuite.scala
 ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.serializer
+
+import java.io.{ObjectOutput, ObjectInput}
+
+import org.scalatest.{BeforeAndAfterEach, FunSuite}
+
+
+class SerializationDebuggerSuite extends FunSuite with BeforeAndAfterEach {
+
+  import SerializationDebugger.find
+
+  override def beforeEach(): Unit = {
+    SerializationDebugger.enableDebugging = true
+  }
+
+  test("primitives, strings, and nulls") {
+    assert(find(1) === List.empty)
+    assert(find(1L) === List.empty)
+    assert(find(1.toShort) === List.empty)
+    assert(find(1.0) === List.empty)
+    assert(find("1") === List.empty)
+    assert(find(null) === List.empty)
+  }
+
+  test("primitive arrays") {
+    assert(find(Array[Int](1, 2)) === List.empty)
+    assert(find(Array[Long](1, 2)) === List.empty)
+  }
+
+  test("non-primitive arrays") {
+    assert(find(Array("aa", "bb")) === List.empty)
+    assert(find(Array(new SerializableClass1)) === List.empty)
+  }
+
+  test("serializable object") {
+    assert(find(new Foo(1, "b", 'c', 'd', null, null, null)) === List.empty)
+  }
+
+  test("nested arrays") {
+    val foo1 = new Foo(1, "b", 'c', 'd', null, null, null)
+    val foo2 = new Foo(1, "b", 'c', 'd', null, Array(foo1), null)
+    assert(find(new Foo(1, "b", 'c', 'd', null, Array(foo2), null)) === List.empty)
+  }
+
+  test("nested objects") {
+    val foo1 = new Foo(1, "b", 'c', 'd', null, null, null)
+    val foo2 = new Foo(1, "b", 'c', 'd', null, null, foo1)
+    assert(find(new Foo(1, "b", 'c', 'd', null, null, foo2)) === List.empty)
+  }
+
+  test("cycles (should not loop forever)") {
+    val foo1 = new Foo(1, "b", 'c', 'd', null, null, null)
+    foo1.g = foo1
+    assert(find(new Foo(1, "b", 'c', 'd', null, null, foo1)) === List.empty)
+  }
+
+  test("root object not serializable") {
+    val s = find(new NotSerializable)
+    assert(s.size === 1)
+    assert(s.head.contains("NotSerializable"))
+  }
+
+  test("array containing not serializable element") {
+    val s = find(new SerializableArray(Array(new NotSerializable)))
+    assert(s.size === 5)
+    assert(s(0).contains("NotSerializable"))
+    assert(s(1).contains("element of array"))
+    assert(s(2).contains("array"))
+    assert(s(3).contains("arrayField"))
+    assert(s(4).contains("SerializableArray"))
+  }
+
+  test("object containing not serializable field") {
+    val s = find(new SerializableClass2(new NotSerializable))
+    assert(s.size === 3)
+    assert(s(0).contains("NotSerializable"))
+    assert(s(1).contains("objectField"))
+    assert(s(2).contains("SerializableClass2"))
+  }
+
+  test("externalizable class writing out not serializable object") {
+    val s = find(new ExternalizableClass)
+    assert(s.size === 5)
+    assert(s(0).contains("NotSerializable"))
+    assert(s(1).contains("objectField"))
+    assert(s(2).contains("SerializableClass2"))
+    assert(s(3).contains("writeExternal"))
+    assert(s(4).contains("ExternalizableClass"))
+  }
+}
+
+
+class SerializableClass1 extends Serializable
+
+
+class SerializableClass2(val objectField: Object) extends Serializable
+
+
+class SerializableArray(val arrayField: Array[Object]) extends Serializable
+
+
+class ExternalizableClass extends java.io.Externalizable {
+  override def writeExternal(out: ObjectOutput): Unit = {
+    out.writeInt(1)
+    out.writeObject(new SerializableClass2(new NotSerializable))
+  }
+
+  override def readExternal(in: ObjectInput): Unit = {}
+}
+
+
+class Foo(
+a: Int,
+b: 

[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3946#issuecomment-70447201
  
  [Test build #25736 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25736/consoleFull)
 for   PR 3946 at commit 
[`fb507df`](https://github.com/apache/spark/commit/fb507df555db0084ea7d91ae8a7167d0164480c0).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: use defaultParallelism for defaultMinPartition...

2015-01-18 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4094#issuecomment-70448190
  
Hey @idanz first of all, we should add some comments to the code 
referencing SPARK-822, so that we don't go through this all over again for the 
core Spark API.

Second, maybe we should have a configuration option in Spark SQL that 
allows you to tune this for input tables there (if that's doable). It would be 
more narrowly scoped to only Spark SQL.
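
For example, something along these lines (the key name below is purely 
hypothetical and only meant to illustrate the scoping):

    // sc is assumed to be an existing SparkContext; the key below does not
    // exist today and is only an illustration of a SQL-scoped setting.
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    sqlContext.setConf("spark.sql.sources.minPartitions", "64")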


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3946#issuecomment-70450761
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25736/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5100][SQL] add thriftserver-ui support

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3946#issuecomment-70450760
  
  [Test build #25736 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25736/consoleFull)
 for   PR 3946 at commit 
[`fb507df`](https://github.com/apache/spark/commit/fb507df555db0084ea7d91ae8a7167d0164480c0).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5022] [Sql] Change VectorUDT to object

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4099#issuecomment-70450843
  
  [Test build #25743 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25743/consoleFull)
 for   PR 4099 at commit 
[`0014a59`](https://github.com/apache/spark/commit/0014a59c0d0d263482208c16cc8601205fe565bf).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4098#issuecomment-70454498
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25741/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5307] SerializationDebugger - take 2

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4098#issuecomment-70454492
  
  [Test build #25741 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25741/consoleFull)
 for   PR 4098 at commit 
[`572d0cb`](https://github.com/apache/spark/commit/572d0cbfdbbd816d45290b38c6c6c86d2447efdc).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `val elem = sarray (class $`
  * `val elem = sexternalizable object (class $`
  * `val elem = sobject (class $`
  * `  implicit class ObjectStreamClassMethods(val desc: ObjectStreamClass) 
extends AnyVal `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5297][Streaming] Fix Java file stream t...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4101#issuecomment-70454568
  
  [Test build #25749 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25749/consoleFull)
 for   PR 4101 at commit 
[`ec0131c`](https://github.com/apache/spark/commit/ec0131c1a2f4a4097d6d7b2f8a27d7abbf39b746).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SQL][Minor] Refactors deeply nested FP style ...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4091#issuecomment-70400460
  
  [Test build #25718 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25718/consoleFull)
 for   PR 4091 at commit 
[`cd8860b`](https://github.com/apache/spark/commit/cd8860bf30ad99480794f85529cfcc7230ba01ee).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....

2015-01-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3997#issuecomment-70400053
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25714/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5186] [MLLIB] Vector.equals and Vector....

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3997#issuecomment-70400050
  
  [Test build #25714 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25714/consoleFull)
 for   PR 3997 at commit 
[`93f0d46`](https://github.com/apache/spark/commit/93f0d461487f9582a6bc2a34f09179dbe8672d3d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5307] SerializationDebugger to help deb...

2015-01-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4093#issuecomment-70400534
  
  [Test build #25716 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25716/consoleFull)
 for   PR 4093 at commit 
[`bde6512`](https://github.com/apache/spark/commit/bde6512a55765a48ca74f321068f9ab91516edae).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `out.stack.map(o = s  - $o (class $`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-18 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-70434374
  
@hhbyyh  Yes, please review the design doc linked from the JIRA.  There is 
quite a bit of functionality which will not be in this initial PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SQL][minor] Put DataTypes.java in java dir.

2015-01-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4097


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...

2015-01-18 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3571#issuecomment-70435538
  
Hi @jacek-lewandowski,

Thanks for bringing this up to date.  I took a quick pass through and left 
some minor comments.

Just to clarify: this only adds SSL support for internal HttpServer and 
Akka traffic, and not the Spark web UI?  When we last discussed this in #2739, 
I think the idea was that SSLOptions could use namespaced configs in order to 
allow the web UI to use different SSL configurations than, say, Akka.  I see 
that there's some namespace support built into this patch (the `ns` argument to 
`parse`); is this sufficient to support HTTPS in the UI?  Also, does it 
support scenarios where I want to enable SSL only for the UI or only for Akka?  
Settings like `spark.ssl.enabled` sound like they're systemwide settings, so we 
should think through how these might interact with different UI configurations, 
etc.  I'm not asking to implement SSL for the UI in this patch, but I'd like to 
just make sure that the SSLOptions configuration code will be compatible with 
it.
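
To make the question concrete, here is a purely hypothetical sketch of what I 
mean by namespaced settings (none of these keys are claimed to exist in the 
patch):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.ssl.enabled", "false")       // hypothetical global default
      .set("spark.ssl.akka.enabled", "true")   // hypothetical per-subsystem override
      .set("spark.ssl.ui.enabled", "false")    // hypothetical: UI opts out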

It would be great if you could add a short summary of this PR's changes to 
the PR description, since that description will become this PR's commit message.

There's a big block comment at the top of `SecurityManager.scala` which 
should be updated to reflect this PR's changes (it currently says "We 
currently do not support SSL (https) ...").

It would also be great to add a small section to the security documentation 
(`docs/security.md`) to mention how to configure this.  The documentation 
should mention the relevant Spark options, describe how/why someone would use 
the `useNodeLocalConf` setting, etc.  It could also contain a pointer to 
external instructions for generating your own keystore / truststores, etc., 
since this isn't a trivial process.  The new configuration options should also 
be documented in `docs/configuration.md` alongside the other security 
configurations.  In addition, the documentation should describe how the key 
stores are / aren't distributed depending on the choice of cluster manager.  If 
this works in fundamentally different ways on different cluster managers, then 
the docs should make this clear so users know what to expect.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...

2015-01-18 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/4074#discussion_r23139343
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala 
---
@@ -436,6 +436,12 @@ trait JavaRDDLike[T, This : JavaRDDLike[T, This]] 
extends Serializable {
   def first(): T = rdd.first()
 
   /**
+   * @return true if and only if the RDD contains no elements at all. Note 
that an RDD
+   * may be empty even when it has at least 1 partition.
+   */
+  def isEmpty(): Boolean = rdd.isEmpty()
--- End diff --

Sounds good.
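
For reference, a tiny usage sketch of the behavior described in the new doc 
comment (the app name and values are made up):

    import java.util.Collections
    import org.apache.spark.api.java.JavaSparkContext

    val jsc = new JavaSparkContext("local", "isEmpty-sketch")
    // Four partitions but zero elements: isEmpty() should still return true.
    val empty = jsc.parallelize(Collections.emptyList[Integer](), 4)
    assert(empty.isEmpty())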


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-5270 [CORE] Elegantly check if RDD is em...

2015-01-18 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4074#issuecomment-70435986
  
LGTM @srowen - are you still working on it or is it good from your end? 
Will leave a bit of time for others to comment as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5249] Added type specific set functions...

2015-01-18 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4042#issuecomment-70436107
  
Okay - @AdamGS thanks for sending this patch, but I think we'll pass on 
adding this API. Overall we're pretty conservative about adding APIs like this 
when there isn't a compelling reason.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5249] Added type specific set functions...

2015-01-18 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4042#issuecomment-70436115
  
Let's close this issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4920][UI]: back port the PR-3763 to bra...

2015-01-18 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/3768


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: use defaultParallelism for defaultMinPartition...

2015-01-18 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4094#issuecomment-70436485
  
Yeah, this has always been broken. What's even more confusing is what 
Hadoop actually does with this minSplits if you trace the code through Hadoop - 
I remember looking through it and the logic on the Hadoop side is really 
complicated. @idanz can you create a JIRA for this? Also, can you explain what 
Hadoop is actually doing with this parameter - IIRC it's not as simple as what 
it appears to be.

An issue with changing this is that we could cause behavior to change in a 
very unexpected way for Hadoop RDD's. Right now this is effectively a no-op 
because it is almost always set to 2. I've only seen it affect things when 
someone is running a file in local mode that really could have been processed 
with a single split.

If we change it, it could affect user applications a bunch. For instance in 
a large cluster it will actually cause all reads of Hadoop files to be split 
over # cores tasks, even if there is just a small amount of data in the 
file. That might not be desirable.

I wonder if we should just set it to 2 (i.e. hard code it) and just add a 
note saying it's set this way for legacy reasons, and that really users should 
pass in their own minSplits when creating a hadoopRDD if they want to control 
the read splits.
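
For reference, the explicit override looks something like this (the path and 
partition count are made up; `sc` is an existing SparkContext):

    // Callers who care about read parallelism pass minPartitions themselves
    // instead of relying on defaultMinPartitions.
    val lines = sc.textFile("hdfs:///some/input/path", minPartitions = 64)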




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: use defaultParallelism for defaultMinPartition...

2015-01-18 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4094#issuecomment-70436546
  
Here's some links to prior discussions of this:

- https://issues.apache.org/jira/browse/SPARK-822
- https://github.com/mesos/spark/pull/718


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5257] [MLlib] SparseVector indices must...

2015-01-18 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4096#issuecomment-70436617
  
@MechCoder Similar to #3791, this will significantly hurt performance. 
Having indices be nonnegative and ordered is a contract. If you want to 
ensure these, please use the factory method `Vectors.sparse(size, entries)` to 
construct a sparse vector. Do you mind closing this PR?
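
For reference, a minimal sketch of the factory-method route (the values are 
made up):

    import org.apache.spark.mllib.linalg.Vectors

    // Entries may be given in any order; the factory method sorts them by
    // index, so the ordering contract holds at construction time.
    val sv = Vectors.sparse(5, Seq((3, 2.5), (0, 1.0)))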


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


