[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-09-11 Thread sabhyankar
Github user sabhyankar closed the pull request at: https://github.com/apache/spark/pull/8241 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-09-11 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-139660661 closing per discussion on parent Jira --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-09-09 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-139096528 @sabhyankar Thanks for discussing these issues. Given the various arguments, I'd vote for: * eliminating the trait --- It's a nice idea but seems like it saves

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-28 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135895566 @sabhyankar Yes, the first issue also applies to the current approach. But I was expecting to solve it with this effort (broadcast less and only once). It is awkward to

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-28 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135881841 Hi @mengxr The current approach of rebroadcasting on every predict broadcasts the entire model and so I suppose the first issue you identified also applies to the

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135469129 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135482010 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135469703 [Test build #41697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41697/consoleFull) for PR 8241 at commit

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-27 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135481837 [Test build #41697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41697/console) for PR 8241 at commit

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135469158 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135468036 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135482014 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-27 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135629255 Couple issues. Some were already mentioned by @feynmanliang : 1. Broadcasting the entire model. For example, we might just need some pieces of it. For linear

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-26 Thread feynmanliang
Github user feynmanliang commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135118506 @jkbradley and I talked offline, summary: lets make it `private[mllib]` and clearly document in the trait's scaladoc that it's to prevent rebroadcasting during

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-26 Thread feynmanliang
Github user feynmanliang commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r38010012 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -21,16 +21,15 @@ import java.lang.{Iterable = JIterable}

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-26 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135121895 @feynmanliang - That sounds good - I can document the intent of the Broadcastable trait in the scaladoc. The reason I made this trait private[spark] is because we

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-26 Thread sabhyankar
Github user sabhyankar commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r38011826 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -21,16 +21,15 @@ import java.lang.{Iterable = JIterable}

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-26 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135046260 @jkbradley @feynmanliang I can certainly update the PR and change the filename and trait name to be the same (Broadcastable). I understand the concern

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-26 Thread sabhyankar
Github user sabhyankar commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r37980863 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -21,16 +21,15 @@ import java.lang.{Iterable = JIterable}

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-26 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135169071 I'm OK with it being private to spark (not mllib) as long as the docs warn about usage. --- If your project is set up for it, you can reply to this email and have

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-26 Thread feynmanliang
Github user feynmanliang commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135254343 LGTM pending tests @jkbradley @mengxr can you trigger tests? --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-26 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-135241069 Sounds good @jkbradley . I dont think there is anything pending on my side for this PR. Let me know if you see something otherwise. I will update the other PRs once

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-134796319 @feynmanliang @jkbradley thank you for the review guys. Traits will not allow context bound (to ClassTag) for type parameters and so I ended up using implicits in

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread feynmanliang
Github user feynmanliang commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-134831829 OK, I understand why the ClassTag is necessary, thanks. @jkbradley @sabhyankar I have some concerns with this PR: 1. In the other PRs, `lclBcModel`

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread feynmanliang
Github user feynmanliang commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-134803105 @sabhyankar why is the ClassTag necessary? What's the error you're getting? You can tag multiple JIRA issues in a single PR, see #7507 for example --- If

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-134806330 @feynmanliang without the ClassTag, the compiler complains that no ClassTag is available for our type T. No ClassTag available for T [error] case

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread feynmanliang
Github user feynmanliang commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r37946459 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -21,16 +21,15 @@ import java.lang.{Iterable = JIterable}

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-134810307 I think the classtag is needed because Broadcast needs the classtag. @feynmanliang This trait is for spark.mllib too, not just spark.ml. (Model is only a

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-134809744 @sabhyankar Actually, I'd recommend just keeping 1 PR per class. I think that's easier for avoiding conflicts, even if the changes are all pretty similar. --- If

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-134826539 @jkbradley I have resolved the conflicts and updated the PR with the changes identified. Let me know if you see any issues. Since I am not combining PRs, this will

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r37936395 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r37933321 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r37933318 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r37933322 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-134771542 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-134771502 I'll make some comments, but can you please resolved conflicts here? --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-134771964 I like this approach. I'll check again after the conflicts have been resolved. Thanks! --- If your project is set up for it, you can reply to this email and have

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread feynmanliang
Github user feynmanliang commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r37934185 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-25 Thread feynmanliang
Github user feynmanliang commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r37938624 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-24 Thread feynmanliang
Github user feynmanliang commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-134412363 @sabhyankar looks good overall: * Do you mind resolving merge conflicts? * Can you put #8248 #8247 #8243 #8241 and #8249 all in one PR and close the

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-19 Thread sabhyankar
Github user sabhyankar commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r37410650 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -83,9 +86,13 @@ class NaiveBayesModel private[spark] (

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-19 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-132631199 Hi @holdenk I have moved the common logic to a trait and am mixin it with the model. Let me know if you see something that needs to be corrected. I am still getting

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-19 Thread holdenk
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-132745648 Awesome, great work :) Looking at the trait, it seems like you could maybe even move the var inside of the trait. Also passing the RDD around to get the context out of

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-19 Thread holdenk
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-132746927 I know my own PRs get triggered by jenkins, lets see if I can tell jenkins this is good to test or if we will need to get someone else. --- If your project is set up

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-19 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-132805558 I forgot to mention, I am also only passing in the SparkContext now. I should have done that to begin with :( --- If your project is set up for it, you can reply to

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-19 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-132805093 Thanks @holdenk ! That makes sense. I moved the private var to the trait. Let me know if you see something else that is out of place. I believe the generic types I

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-19 Thread holdenk
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-132836131 Glad to help, maybe @jkbradley can say if this is ok to test to jenkins :) --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-18 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-132362277 Thanks for pointing that out @holdenk ! I have pushed a change to the PR! --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-18 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-132374365 @holdenk Not sure if you are reviewing the other PRs, but the fix should now be in all of them. Thx! --- If your project is set up for it, you can reply to this

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-18 Thread sabhyankar
Github user sabhyankar commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-132364417 @holdenk yep :) I am updating those as well! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-18 Thread holdenk
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-132365264 Great :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-18 Thread holdenk
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r37352728 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -83,9 +86,12 @@ class NaiveBayesModel private[spark] ( }

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-18 Thread holdenk
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r37364912 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -19,6 +19,8 @@ package org.apache.spark.mllib.classification

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-18 Thread holdenk
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-132387819 I can take a look at some of the others too, I'm not a committer though so will still need a follow up review, but I can take a first pass :) --- If your project is

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-18 Thread holdenk
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/8241#discussion_r37365280 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala --- @@ -83,9 +86,13 @@ class NaiveBayesModel private[spark] ( }

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-18 Thread holdenk
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-132363721 Awesome, it also seemed to be in many of your related PRs, you might want to update those as well. --- If your project is set up for it, you can reply to this email

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8241#issuecomment-131855133 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-10017] [MLlib]: ML model broadcasts sho...

2015-08-17 Thread sabhyankar
GitHub user sabhyankar opened a pull request: https://github.com/apache/spark/pull/8241 [SPARK-10017] [MLlib]: ML model broadcasts should be stored in private vars: mllib NaiveBayes ML model broadcasts should be stored in private vars: mllib NaiveBayes You can merge this pull