Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/216
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-81983258
@LIDIAgroup Sorry that I don't have enough bandwidth to review this PR.
Since there are unresolved performance issues, do you mind closing this PR for
now? I recommend
Github user leizongxiong commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-61396457
Can this branch be published with Spark version 1.2.0, @mengxr?
Github user leizongxiong commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r19710343
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,276 @@
+/*
+ * Licensed
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-54694750
Can one of the admins verify this patch?
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16053704
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,276 @@
+/*
+ * Licensed to
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16053868
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerModel.scala
---
@@ -0,0 +1,82 @@
+/*
+ * Licensed
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16054011
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16053983
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16054027
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16054017
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed
Github user avulanov commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r16053912
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerModel.scala
---
@@ -0,0 +1,82 @@
+/*
+ * Licensed
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-51784545
@mengxr I've tested the code on a few examples after making it compatible
with the current version of `LabeledPoint`. It seems to work and produce
results similar to what
Github user LIDIAgroup commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11196809
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/ArrayAccumulator.scala
---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11167994
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11168845
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,317 @@
+/*
+ * Licensed to
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38988486
@LIDIAgroup Thanks for the update! The new code didn't pass the style
check. Please run `sbt/sbt scalastyle` to see the error messages! I saw the
following from Travis log
Github user LIDIAgroup commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11065687
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/InfoTheory.scala
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user LIDIAgroup commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11019751
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed
Github user LIDIAgroup commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r11019805
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed
Github user LIDIAgroup commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10969423
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/InfoTheory.scala
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user LIDIAgroup commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38718686
I'll make some changes that, imho, will improve the discretizer in some
aspects:
1. I'll change the accumulator from a `Map` to an `Array`. This implies
collecting
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38727471
@LIDIAgroup For the second item, it is very common to have different
training and discretizing data. For example, we have a labeled dataset
containing a subset of members,
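The point above, that cut points are learned on one (labeled) dataset and then applied to a different one, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `DiscretizerSketch`, `Thresholds`, `fit` and `bin` are hypothetical and are not the API proposed in the pull request, and the fitting rule is a deliberately naive one (a cut at the midpoint wherever the label changes between adjacent sorted values), not the entropy-minimization criterion the PR implements. It uses plain Scala collections rather than RDDs.

```scala
// Hypothetical sketch; names and the fitting rule are assumptions, not the PR's code.
object DiscretizerSketch {

  /** Cut points for one continuous feature, in ascending order. */
  final case class Thresholds(cuts: Vector[Double]) {
    /** Bin index of x: the number of cut points less than or equal to x. */
    def bin(x: Double): Int = cuts.count(_ <= x)
  }

  /**
   * Learn cut points from labeled (label, value) pairs.
   * Naive rule: sort by value and cut at the midpoint between adjacent
   * values whenever the label changes.
   */
  def fit(labeled: Seq[(Double, Double)]): Thresholds = {
    val sorted = labeled.sortBy(_._2)
    val cuts = sorted.zip(sorted.drop(1)).collect {
      case ((l1, v1), (l2, v2)) if l1 != l2 && v1 != v2 => (v1 + v2) / 2.0
    }
    Thresholds(cuts.toVector)
  }

  def main(args: Array[String]): Unit = {
    // Small labeled subset used only for fitting.
    val training = Seq((0.0, 1.0), (0.0, 2.0), (1.0, 5.0), (1.0, 6.0))
    val model = fit(training)

    // Different, unlabeled values that are discretized with the fitted cut points.
    val unlabeled = Seq(0.5, 3.0, 4.0, 9.0)
    println(unlabeled.map(model.bin))
  }
}
```

Keeping the fitted cut points in a separate value (`Thresholds` here) is what lets the same thresholds be reused on data that carries no labels.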
Github user LIDIAgroup commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38559061
We've tried to follow all suggestions made by @mengxr. If you feel that we
should make any other change, please don't hesitate to tell us; we're
willing to discuss
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38592747
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38592748
Merged build started.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10941192
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/DiscretizerModel.scala
---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10946507
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10946987
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/DiscretizerModel.scala
---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947062
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/DiscretizerModel.scala
---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947486
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947517
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947586
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala
---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947573
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947648
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala
---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38607525
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38607527
One or more automated tests failed
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13438/
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10947993
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala
---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10948196
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/Utils.scala ---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10951390
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/Utils.scala ---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10951721
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala
---
@@ -0,0 +1,71 @@
+/*
+ * Licensed
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10951916
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10953908
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10954204
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+ * Licensed to
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38622854
@LIDIAgroup, I made one pass through the code. My major concern is the
complexity of the algorithm. Could you help answer the following questions?
0. What is the
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38471741
Can one of the admins verify this patch?
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38472118
Jenkins, test this please.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38472286
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38472412
Merged build finished.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/216#issuecomment-38472415
One or more automated tests failed
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13400/
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10897487
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/EMDDiscretizer.scala
---
@@ -0,0 +1,402 @@
+/*
+* Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10897825
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala
---
@@ -0,0 +1,53 @@
+/*
+* Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10898196
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala
---
@@ -0,0 +1,53 @@
+/*
+* Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10898273
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EMDDiscretizerSuite.scala
---
@@ -0,0 +1,60 @@
+package
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10898285
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EMDDiscretizerSuite.scala
---
@@ -0,0 +1,60 @@
+package
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10898309
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EMDDiscretizerSuite.scala
---
@@ -0,0 +1,60 @@
+package
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10898397
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/discretization/EMDDiscretizerSuite.scala
---
@@ -0,0 +1,60 @@
+package
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/216#discussion_r10897344
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/discretization/DiscretizerModel.scala
---
@@ -0,0 +1,47 @@
+/*
+* Licensed to the Apache