Github user sabhyankar closed the pull request at:
https://github.com/apache/spark/pull/8241
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-139660661
closing per discussion on parent Jira
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-139096528
@sabhyankar Thanks for discussing these issues. Given the various
arguments, I'd vote for:
* eliminating the trait --- It's a nice idea but seems like it saves
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135895566
@sabhyankar Yes, the first issue also applies to the current approach. But
I was expecting to solve it with this effort (broadcast less and only once). It
is awkward to
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135881841
Hi @mengxr The current approach of rebroadcasting on every predict
broadcasts the entire model and so I suppose the first issue you identified
also applies to the
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135469129
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135482010
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135469703
[Test build #41697 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41697/consoleFull)
for PR 8241 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135481837
[Test build #41697 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41697/console)
for PR 8241 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135469158
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135468036
ok to test
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135482014
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135629255
Couple issues. Some were already mentioned by @feynmanliang :
1. Broadcasting the entire model. For example, we might just need some
pieces of it. For linear
Github user feynmanliang commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135118506
@jkbradley and I talked offline, summary: lets make it `private[mllib]` and
clearly document in the trait's scaladoc that it's to prevent rebroadcasting
during
Github user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r38010012
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -21,16 +21,15 @@ import java.lang.{Iterable = JIterable}
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135121895
@feynmanliang - That sounds good - I can document the intent of the
Broadcastable trait in the scaladoc. The reason I made this trait
private[spark] is because we
Github user sabhyankar commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r38011826
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -21,16 +21,15 @@ import java.lang.{Iterable = JIterable}
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135046260
@jkbradley @feynmanliang I can certainly update the PR and change the
filename and trait name to be the same (Broadcastable).
I understand the concern
Github user sabhyankar commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37980863
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -21,16 +21,15 @@ import java.lang.{Iterable = JIterable}
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135169071
I'm OK with it being private to spark (not mllib) as long as the docs warn
about usage.
---
If your project is set up for it, you can reply to this email and have
Github user feynmanliang commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135254343
LGTM pending tests
@jkbradley @mengxr can you trigger tests?
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-135241069
Sounds good @jkbradley . I dont think there is anything pending on my side
for this PR. Let me know if you see something otherwise. I will update the
other PRs once
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134796319
@feynmanliang @jkbradley thank you for the review guys. Traits will not
allow context bound (to ClassTag) for type parameters and so I ended up using
implicits in
Github user feynmanliang commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134831829
OK, I understand why the ClassTag is necessary, thanks.
@jkbradley @sabhyankar I have some concerns with this PR:
1. In the other PRs, `lclBcModel`
Github user feynmanliang commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134803105
@sabhyankar why is the ClassTag necessary? What's the error you're getting?
You can tag multiple JIRA issues in a single PR, see #7507 for example
---
If
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134806330
@feynmanliang without the ClassTag, the compiler complains that no ClassTag
is available for our type T.
No ClassTag available for T
[error] case
Github user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37946459
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -21,16 +21,15 @@ import java.lang.{Iterable = JIterable}
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134810307
I think the classtag is needed because Broadcast needs the classtag.
@feynmanliang This trait is for spark.mllib too, not just spark.ml. (Model
is only a
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134809744
@sabhyankar Actually, I'd recommend just keeping 1 PR per class. I think
that's easier for avoiding conflicts, even if the changes are all pretty
similar.
---
If
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134826539
@jkbradley I have resolved the conflicts and updated the PR with the
changes identified. Let me know if you see any issues. Since I am not combining
PRs, this will
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37936395
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37933321
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37933318
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37933322
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134771542
ok to test
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134771502
I'll make some comments, but can you please resolved conflicts here?
---
If your project is set up for it, you can reply to this email and have your
reply appear on
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134771964
I like this approach. I'll check again after the conflicts have been
resolved. Thanks!
---
If your project is set up for it, you can reply to this email and have
Github user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37934185
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software
Github user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37938624
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/util/ModelBroadcast.scala ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software
Github user feynmanliang commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-134412363
@sabhyankar looks good overall:
* Do you mind resolving merge conflicts?
* Can you put #8248 #8247 #8243 #8241 and #8249 all in one PR and close
the
Github user sabhyankar commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37410650
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -83,9 +86,13 @@ class NaiveBayesModel private[spark] (
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132631199
Hi @holdenk I have moved the common logic to a trait and am mixin it with
the model. Let me know if you see something that needs to be corrected. I am
still getting
Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132745648
Awesome, great work :) Looking at the trait, it seems like you could maybe
even move the var inside of the trait. Also passing the RDD around to get the
context out of
Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132746927
I know my own PRs get triggered by jenkins, lets see if I can tell jenkins
this is good to test or if we will need to get someone else.
---
If your project is set up
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132805558
I forgot to mention, I am also only passing in the SparkContext now. I
should have done that to begin with :(
---
If your project is set up for it, you can reply to
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132805093
Thanks @holdenk ! That makes sense. I moved the private var to the trait.
Let me know if you see something else that is out of place. I believe the
generic types I
Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132836131
Glad to help, maybe @jkbradley can say if this is ok to test to jenkins :)
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132362277
Thanks for pointing that out @holdenk ! I have pushed a change to the PR!
---
If your project is set up for it, you can reply to this email and have your
reply
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132374365
@holdenk Not sure if you are reviewing the other PRs, but the fix should
now be in all of them. Thx!
---
If your project is set up for it, you can reply to this
Github user sabhyankar commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132364417
@holdenk yep :) I am updating those as well!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132365264
Great :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37352728
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -83,9 +86,12 @@ class NaiveBayesModel private[spark] (
}
Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37364912
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -19,6 +19,8 @@ package org.apache.spark.mllib.classification
Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132387819
I can take a look at some of the others too, I'm not a committer though so
will still need a follow up review, but I can take a first pass :)
---
If your project is
Github user holdenk commented on a diff in the pull request:
https://github.com/apache/spark/pull/8241#discussion_r37365280
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -83,9 +86,13 @@ class NaiveBayesModel private[spark] (
}
Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-132363721
Awesome, it also seemed to be in many of your related PRs, you might want
to update those as well.
---
If your project is set up for it, you can reply to this email
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/8241#issuecomment-131855133
Can one of the admins verify this patch?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
GitHub user sabhyankar opened a pull request:
https://github.com/apache/spark/pull/8241
[SPARK-10017] [MLlib]: ML model broadcasts should be stored in private
vars: mllib NaiveBayes
ML model broadcasts should be stored in private vars: mllib NaiveBayes
You can merge this pull
58 matches
Mail list logo