[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2015-04-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/216 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2015-03-16 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-81983258 @LIDIAgroup Sorry that I don't have enough bandwidth to review this PR. Since there are unresolved performance issues, do you mind closing this PR for now? I recommend

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-11-02 Thread leizongxiong
Github user leizongxiong commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-61396457 Can this branch be published with the Spark 1.2.0 release? @mengxr

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-11-02 Thread leizongxiong
Github user leizongxiong commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r19710343 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala --- @@ -0,0 +1,276 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-54694750 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-08-11 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r16053704 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala --- @@ -0,0 +1,276 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-08-11 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r16053868 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerModel.scala --- @@ -0,0 +1,82 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-08-11 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r16054011 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-08-11 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r16053983 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-08-11 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r16054027 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-08-11 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r16054017 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-08-11 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r16053912 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerModel.scala --- @@ -0,0 +1,82 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-08-11 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-51784545 @mengxr I've tested the code on a few examples after making it compatible with the current version of `LabeledPoint`. It seems to work and produce results similar to what

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-04-02 Thread LIDIAgroup
Github user LIDIAgroup commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r11196809 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/ArrayAccumulator.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-04-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r11167994 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-04-01 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r11168845 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala --- @@ -0,0 +1,317 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-29 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38988486 @LIDIAgroup Thanks for the update! The new code didn't pass the style check. Please run `sbt/sbt scalastyle` to see the error messages! I saw the following in the Travis log

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-28 Thread LIDIAgroup
Github user LIDIAgroup commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r11065687 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/InfoTheory.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-27 Thread LIDIAgroup
Github user LIDIAgroup commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r11019751 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala --- @@ -0,0 +1,402 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-27 Thread LIDIAgroup
Github user LIDIAgroup commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r11019805 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala --- @@ -0,0 +1,402 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-26 Thread LIDIAgroup
Github user LIDIAgroup commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10969423 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/InfoTheory.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-26 Thread LIDIAgroup
Github user LIDIAgroup commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38718686 I'll make some changes that, imho, will improve the discretizer in some aspects: 1. I'll change the accumulator from a `Map` to an `Array`. This implies collecting

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-26 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38727471 @LIDIAgroup For the second item, it is very common to have different training and discretizing data. For example, we have a labeled dataset containing a subset of members,

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread LIDIAgroup
Github user LIDIAgroup commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38559061 We've tried to follow all suggestions made by @mengxr. If you feel that we should make any other change, please don't hesitate to tell us; we're willing to discuss

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38592747 Merged build triggered.

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38592748 Merged build started.

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10941192 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/DiscretizerModel.scala --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10946507 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala --- @@ -0,0 +1,402 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10946987 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/DiscretizerModel.scala --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10947062 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/DiscretizerModel.scala --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10947486 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala --- @@ -0,0 +1,402 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10947517 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala --- @@ -0,0 +1,402 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10947586 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala --- @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10947573 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala --- @@ -0,0 +1,402 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10947648 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala --- @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38607525 Merged build finished.

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38607527 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13438/

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10947993 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala --- @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10948196 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/Utils.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10951390 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/Utils.scala --- @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10951721 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizerSuite.scala --- @@ -0,0 +1,71 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10951916 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala --- @@ -0,0 +1,402 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10953908 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala --- @@ -0,0 +1,402 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10954204 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EntropyMinimizationDiscretizer.scala --- @@ -0,0 +1,402 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-25 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38622854 @LIDIAgroup, I made one pass through the code. My major concern is the complexity of the algorithm. Could you help answer the following questions? 0. What is the

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38471741 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38472118 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38472286 Merged build started.

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38472412 Merged build finished.

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/216#issuecomment-38472415 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13400/

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10897487 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/EMDDiscretizer.scala --- @@ -0,0 +1,402 @@ +/* +* Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10897825 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala --- @@ -0,0 +1,53 @@ +/* +* Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10898196 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/MapAccumulator.scala --- @@ -0,0 +1,53 @@ +/* +* Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10898273 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/discretization/EMDDiscretizerSuite.scala --- @@ -0,0 +1,60 @@ +package

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10898285 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/discretization/EMDDiscretizerSuite.scala --- @@ -0,0 +1,60 @@ +package

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10898309 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/discretization/EMDDiscretizerSuite.scala --- @@ -0,0 +1,60 @@ +package

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10898397 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/discretization/EMDDiscretizerSuite.scala --- @@ -0,0 +1,60 @@ +package

[GitHub] spark pull request: [SPARK-1303] [MLLIB] Added discretization capa...

2014-03-24 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/216#discussion_r10897344 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/discretization/DiscretizerModel.scala --- @@ -0,0 +1,47 @@ +/* +* Licensed to the Apache