[GitHub] flink pull request: [FLINK-2425]Provide access to task manager con...

2015-08-03 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/952#issuecomment-127185418 Ah. Okay. You can fix the remaining things. This build also fails on the StreamCheckpointingITCase. I'll file another JIRA ticket for that. I've observed

[GitHub] flink pull request: [FLINK-2425]Provide access to task manager con...

2015-08-03 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/952#issuecomment-127139227 ...have it actually wrap the configuration, and have all the getX() methods delegate to the config, while all the setX() methods fail with an exception
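
The pattern described in this comment (getters delegate to a wrapped configuration, setters fail) can be sketched roughly as follows. This is only an illustration: `SimpleConfig` and `ReadOnlyConfig` are hypothetical names invented here, not the classes from the pull request, and the real configuration class has many more accessors.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a mutable configuration (assumption, not Flink's class).
class SimpleConfig {
    private final Map<String, String> values = new HashMap<>();

    public String getString(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }

    public void setString(String key, String value) {
        values.put(key, value);
    }
}

// Read-only view: every getter delegates to the wrapped config,
// every setter fails with an exception.
class ReadOnlyConfig extends SimpleConfig {
    private final SimpleConfig delegate;

    ReadOnlyConfig(SimpleConfig delegate) {
        this.delegate = delegate;
    }

    @Override
    public String getString(String key, String defaultValue) {
        return delegate.getString(key, defaultValue);
    }

    @Override
    public void setString(String key, String value) {
        throw new UnsupportedOperationException("configuration is read-only");
    }
}
```

Because the view wraps the original object instead of copying it, later changes made through the original remain visible to readers of the view; whether that is desirable depends on the use case.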

[GitHub] flink pull request: [WIP] User defined communication between tasks...

2015-08-03 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/975#issuecomment-127331693 The basic methodology is this: 1. `TaskManager` keeps asking `JobManager` for running `TaskManagers` at some interval, same as the `heartbeat` interval. 2

[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...

2015-08-02 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-127037112 @StephanEwen, it would be much simpler to have an interface. But then we leave the part about implementing the `setRuntimeContext` and `getRuntimeContext

[GitHub] flink pull request: [FLINK-2433][docs]Add script to build local do...

2015-08-02 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/954#issuecomment-127041808 Yes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] flink pull request: [FLINK-2433][docs]Add script to build local do...

2015-08-02 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/954#issuecomment-127042540 Ah. Yes, actually. I tried running the webclient on cygwin once and it didn't work [I was ignorant of the fact that I had to add those two lines in the bash

[GitHub] flink pull request: [FLINK-2459][cli]Cli API and doc fixes.

2015-08-02 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/971#issuecomment-127055248 @StephanEwen, I've force pushed this branch to only contain the name change. You can merge this again.

[GitHub] flink pull request: [FLINK-2459][cli]Cli API and doc fixes.

2015-08-02 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/971#issuecomment-127040180 Pushed a fix for changing the name to `quiet`

[GitHub] flink pull request: [FLINK-2425]Provide access to task manager con...

2015-08-02 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/952#issuecomment-127071830 Added test to verify all setter methods are overridden by the `UnmodifiableConfiguration` class.

[GitHub] flink pull request: [FLINK-2459][cli]Cli API and doc fixes.

2015-08-01 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/971#issuecomment-126943148 @mxm, you should verify whether we should do away with the logging test, since you reviewed it. If not, the problem is easily fixed by setting a free port instead

[GitHub] flink pull request: [FLINK-2458][FLINK-2449]Access distributed cac...

2015-08-01 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/970 [FLINK-2458][FLINK-2449]Access distributed cache entries for CollectionExecution and in Iterative tasks. 1. This PR adds support for accessing distributed cache entries when running

[GitHub] flink pull request: [FLINK-2459][cli]Cli API and doc fixes.

2015-08-01 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/971 [FLINK-2459][cli]Cli API and doc fixes. 1. Remove CliFrontendLoggingTest. Test directly that the logging flag is interpreted correctly. 2. [hotfix] Doc fix for cli api 3. [hotfix

[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...

2015-07-31 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/966#issuecomment-126706999 Ah yes. I'll update them in a while. There's actually some problem with the unit test I've written too. Travis fails sporadically.

[GitHub] flink pull request: [FLINK-2248][client]Add flag to disable sysout...

2015-07-31 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/957#issuecomment-126672392 Yeah. I think we should remove it then. Too many entries tend to confuse people.

[GitHub] flink pull request: [FLINK-1819][core]Allow access to RuntimeConte...

2015-07-31 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/966 [FLINK-1819][core]Allow access to RuntimeContext from Input and Output formats 1. Introduces new Rich Input and Output formats, similar to Rich Functions. 2. Makes all existing input

[GitHub] flink pull request: [FLINK-2248][client]Add flag to disable sysout...

2015-07-31 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/957#issuecomment-126672811 I've updated the code to remove any changes in Configuration.

[GitHub] flink pull request: [FLINK-2248][client]Add flag to disable sysout...

2015-07-31 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/957#issuecomment-126675074 Sure. No problem. :)

[GitHub] flink pull request: [FLINK-2248][client]Add flag to disable sysout...

2015-07-31 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/957#issuecomment-126695988 Travis build passes. You can merge it @mxm

[GitHub] flink pull request: [FLINK-2248][client]Add flag to disable sysout...

2015-07-31 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/957#discussion_r35967625 --- Diff: flink-clients/src/test/java/org/apache/flink/client/CliFrontendLoggingTest.java --- @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache

[GitHub] flink pull request: [FLINK-2248][client]Add flag to disable sysout...

2015-07-31 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/957#issuecomment-126614333 @mxm , updated.

[GitHub] flink pull request: [FLINK-2248][client]Add flag to disable sysout...

2015-07-31 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/957#issuecomment-126623148 Done.

[GitHub] flink pull request: [FLINK-2248][client]Add flag to disable sysout...

2015-07-31 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/957#issuecomment-126661277 yarnfifo case failure. Nothing to do with the changes made in this PR.

[GitHub] flink pull request: [FLINK-2248][client]Add flag to disable sysout...

2015-07-31 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/957#issuecomment-126642822 There was no specific need for it. However, since the CliFrontend only passes the Client the configuration, I decided to include it with that. Further, I think

[GitHub] flink pull request: [FLINK-2238][api]Add env.fromCollection(set) m...

2015-07-30 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/956 [FLINK-2238][api]Add env.fromCollection(set) method to scala api Used the same technique as in env.fromCollection(Seq) to first convert the data to a Java Collection. You can merge this pull

[GitHub] flink pull request: [FLINK-2238][scala api]Add env.fromCollection(...

2015-07-30 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/956#issuecomment-126257935 Ah. Yes. That does make more sense. :+1: Updating now.

[GitHub] flink pull request: [FLINK-2248]Add flag to disable sysout logging...

2015-07-30 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/957 [FLINK-2248]Add flag to disable sysout logging from cli Enables disabling of sysout messages on cli via a flag `q`. You can merge this pull request into a Git repository by running

[GitHub] flink pull request: [FLINK-2425]Provide access to task manager con...

2015-07-29 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/952 [FLINK-2425]Provide access to task manager configuration from RuntimeEnvironment Also fixes [FLINK-2426]: Define an UnmodifiableConfiguration class which doesn't allow modifications

[GitHub] flink pull request: [FLINK-2433][docs]Add script to build local do...

2015-07-29 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/954 [FLINK-2433][docs]Add script to build local documentation on windows You can merge this pull request into a Git repository by running: $ git pull https://github.com/sachingoel0101/flink

[GitHub] flink pull request: [FLINK-2404]Primitive add methods for Accumula...

2015-07-28 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/942#issuecomment-125604252 No one will actually find out unless they specifically went through the documentation for these. Most people would only ever see the docs for Accumulator

[GitHub] flink pull request: [FLINK-2404]Primitive add methods for Accumula...

2015-07-28 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/942#issuecomment-125654972 Sure. :)

[GitHub] flink pull request: [FLINK-2404]Primitive add methods for Accumula...

2015-07-28 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/942#issuecomment-125597443 Should we add some kind of annotation to suggest the usage of the primitive functions? We probably can't deprecate the older ones.

[GitHub] flink pull request: [FLINK-2399] Version checks for Job Manager an...

2015-07-28 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/945#issuecomment-125734265 I had decided to work with my own understanding of what version means since nobody replied to the JIRA comment. getClass.getPackage.getImplementationVersion

[GitHub] flink pull request: [FLINK-2399] Version checks for Job Manager an...

2015-07-27 Thread sachingoel0101
Github user sachingoel0101 closed the pull request at: https://github.com/apache/flink/pull/944

[GitHub] flink pull request: [FLINK-2312][utils] Randomly Splitting a Data ...

2015-07-21 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/921#issuecomment-123390786 This leads to non-mutually exclusive splits. I tracked down the reason for this: The input data is parallelized differently while performing the splits for every

[GitHub] flink pull request: [FLINK-1723] [ml] [WIP] Add cross validation f...

2015-07-20 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/891#discussion_r34973783 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/evaluation/CrossValidation.scala --- @@ -0,0 +1,97 @@ +/* + * Licensed

[GitHub] flink pull request: [FLINK-1723] [ml] [WIP] Add cross validation f...

2015-07-20 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/891#discussion_r34975724 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/evaluation/CrossValidation.scala --- @@ -0,0 +1,97 @@ +/* + * Licensed

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-07-19 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-122705520 This now also incorporates [Flink-2379].

[GitHub] flink pull request: [FLINK-2312][ml][WIP] Randomly Splitting a Dat...

2015-07-17 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/921 [FLINK-2312][ml][WIP] Randomly Splitting a Data Set according to weights given Adds a method for randomly splitting a data set. However, there are a few problems. We're effectively

[GitHub] flink pull request: [Flink-1727][ml]Decision tree[WIP]

2015-07-17 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/710#issuecomment-122379328 The changes proposed by Theodore in the PR #861 have been incorporated here too. This can be reviewed now, and merging this will also close #861.

[GitHub] flink pull request: [FLINK-2368][ml][WIP]Adds convergence criteria

2015-07-17 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/918#issuecomment-122382475 Ah. Okay. No problem. :)

[GitHub] flink pull request: [FLINK-1723] [ml] [WIP] Add cross validation f...

2015-07-17 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/891#discussion_r34923042 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/evaluation/CrossValidation.scala --- @@ -0,0 +1,97 @@ +/* + * Licensed

[GitHub] flink pull request: [FLINK-2131][ml]: Initialization schemes for k...

2015-07-16 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/757#issuecomment-121924666 Okay. @tillrohrmann, can you review this?

[GitHub] flink pull request: [FLINK-2368][ml]Adds convergence criteria [WIP...

2015-07-16 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/918#discussion_r34788837 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/classification/SVM.scala --- @@ -382,7 +402,11 @@ object SVM

[GitHub] flink pull request: [FLINK-2368][ml]Adds convergence criteria [WIP...

2015-07-16 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/918 [FLINK-2368][ml]Adds convergence criteria [WIP] Adds a convergence criteria class which allows the user to decide whether they want to terminate training at any point, based on the solutions

[GitHub] flink pull request: [FLINK-2131][ml]: Initialization schemes for k...

2015-07-15 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/757#issuecomment-121605233 @thvasilo I've incorporated different initialization strategies in the KMeans algorithm itself. Please review.

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-07-15 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-121605463 @thvasilo, @tillrohrmann, I'm still waiting for a decision on this. It would be impossible to work further on the decision tree PR until this is merged

[GitHub] flink pull request: [FLINK-2131][ml]: Initialization schemes for k...

2015-07-01 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/757#issuecomment-117722491 @thvasilo , right now, there aren't other features in the library which need sampling. Perhaps it isn't a good idea to file a separate feature request

[GitHub] flink pull request: [FLINK-1731] [ml] Implementation of Feature K-...

2015-07-01 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/700#issuecomment-117731195 @peedeeX21 , try this link: https://github.com/sachingoel0101/flink/compare/clustering_initializations...peedeeX21:feature_kmeans I had a lot of trouble

[GitHub] flink pull request: [FLINK-1731] [ml] Implementation of Feature K-...

2015-07-01 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/700#issuecomment-117723450 @thvasilo , how do I merge this PR into mine? Maybe @peedeeX21 can create a pull request to my branch at https://github.com/sachingoel0101/flink/tree

[GitHub] flink pull request: [FLINK-2131]: Initialization schemes for k-mea...

2015-06-30 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/757#issuecomment-117049680 Further, the probability distribution doesn't need to be scaled down to [0,1]. We just take care of that while building the cumulative distribution

[GitHub] flink pull request: [FLINK-2131]: Initialization schemes for k-mea...

2015-06-30 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/757#issuecomment-117047575 Hi @thvasilo, thanks for taking the time to go through it. Consider for example a probability distribution P(X_0) = 0.2, P(X_1) = 0.3, P(X_2) = 0.5
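
The comment is cut off, but one common way to sample from such a discrete distribution is to build a running cumulative sum and scale a uniform draw by the total, which also means the weights need not sum to 1. The sketch below is an assumption about the general technique, not the code from the pull request; `WeightedSampler`, `pick`, and `sample` are hypothetical names.

```java
import java.util.Random;

// Hypothetical helper illustrating cumulative-distribution sampling.
class WeightedSampler {
    private final double[] cumulative;

    WeightedSampler(double[] weights) {
        // cumulative[i] holds weights[0] + ... + weights[i]; no normalization needed.
        cumulative = new double[weights.length];
        double running = 0.0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            cumulative[i] = running;
        }
    }

    // Maps a uniform value u in [0, 1) to an index, scaling by the total weight.
    int pick(double u) {
        double target = u * cumulative[cumulative.length - 1];
        for (int i = 0; i < cumulative.length; i++) {
            if (target < cumulative[i]) {
                return i;
            }
        }
        return cumulative.length - 1; // guard against rounding at the upper edge
    }

    int sample(Random rng) {
        return pick(rng.nextDouble());
    }
}
```

With the weights 0.2, 0.3, 0.5 from the comment, uniform draws below 0.2 select X_0, draws between 0.2 and 0.5 select X_1, and the rest select X_2; passing 2, 3, 5 instead gives the same mapping.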

[GitHub] flink pull request: [FLINK-2131]: Initialization schemes for k-mea...

2015-06-30 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/757#issuecomment-117051023 Sorry about the formatting though. I'll fix it. I haven't worked on this in a while. I'll incorporate your suggestions from the previous PR.

[GitHub] flink pull request: [FLINK-2131]: Initialization schemes for k-mea...

2015-06-30 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/757#issuecomment-117055846 Okay. I'll update it today itself with a few trivial fixes.

[GitHub] flink pull request: [FLINK-1731] [ml] Implementation of Feature K-...

2015-06-30 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/700#issuecomment-117060737 Hi. IMO, the purpose of learning is to develop a model which compactly represents the data somehow. Thus, having a distributed model doesn't make sense. Besides

[GitHub] flink pull request: [FLINK-2131][ml]: Initialization schemes for k...

2015-06-30 Thread sachingoel0101
Github user sachingoel0101 closed the pull request at: https://github.com/apache/flink/pull/757

[GitHub] flink pull request: [FLINK-2131][ml]: Initialization schemes for k...

2015-06-30 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/757#issuecomment-117220314 Hey @thvasilo , I'm going to break up this PR further. The motivation is that, the Sampling code should be available as a general feature. Given a probability

[GitHub] flink pull request: [FLINK-2131][ml]: Initialization schemes for k...

2015-06-30 Thread sachingoel0101
GitHub user sachingoel0101 reopened a pull request: https://github.com/apache/flink/pull/757 [FLINK-2131][ml]: Initialization schemes for k-means clustering This adds the two most common initialization strategies for the k-means clustering algorithm, namely, Random initialization

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-26 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r33358232 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/ContinuousHistogram.scala --- @@ -0,0 +1,337 @@ +/* + * Licensed

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-115210476 Okay. So I guess we can leave adding a createHistogram function to DataSetUtils for now [It would also require utilizing the FlinkMLTools.block for an efficient

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-115204540 How should I import a class in flink.ml.math from say, flink-java? I tried adding flink-staging as a dependency to pom.xml of flink-java but to no avail. I'm

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-25 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-115199291 Where should I place the Histogram implementations? Currently, they are in {{org.apache.flink.ml.math}}, but I can't import them from the flink-core where

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-24 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r33151075 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/ContinuousHistogram.scala --- @@ -0,0 +1,325 @@ +/* + * Licensed

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-24 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r33150869 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/CategoricalHistogram.scala --- @@ -0,0 +1,167 @@ +/* + * Licensed

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-24 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-114895657 Adding a utility method certainly makes sense. The user will be expected to provide an argument indicating whether the values in DataSet[Double] are continuous

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-24 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-114880501 Hello Theodore, the semantics for Discrete Histogram is such that you have to specify what classes or discrete values are going to arrive. Once you fix

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-24 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-114899521 Changing the current discrete histogram implementation would not break the decision tree functionality. Although I might have to review the code for any potential

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-24 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/861#issuecomment-114901644 Okay. Sure. I will update to make the Discrete version online first. Should I try to explicitly use Scala library data structures instead of Java

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-23 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r33050123 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/math/OnlineHistogram.scala --- @@ -0,0 +1,81 @@ +/* + * Licensed

[GitHub] flink pull request: [Flink-1727][ml]Decision tree[WIP]

2015-06-23 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/710#discussion_r33046001 --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/classification/DecisionTree.scala --- @@ -0,0 +1,490 @@ +/* + * Licensed

[GitHub] flink pull request: [Flink-2030][ml]Online Histogram: Discrete and...

2015-06-23 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/861 [Flink-2030][ml]Online Histogram: Discrete and Categorical This implements the Online Histograms for both categorical and continuous data. For continuous data, we emulate a continuous

[GitHub] flink pull request: [Flink-1727][ml]Decision tree[WIP]

2015-06-22 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/710#issuecomment-114173765 The fundamental idea for a scalable decision tree algorithm is to reduce the number of splits required to be checked at every node. Ideally, we'd check for every

[GitHub] flink pull request: [FLINK-2116] [ml] Reusing predict operation fo...

2015-06-03 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/772#issuecomment-108482112 Great. This is exactly what I had in mind. There is perhaps another feature we could incorporate. Every algorithm has some performance measure so it can

[GitHub] flink pull request: [FLINK-1731] [ml] Implementation of Feature K-...

2015-06-02 Thread sachingoel0101
Github user sachingoel0101 commented on the pull request: https://github.com/apache/flink/pull/700#issuecomment-107831123 Hey guys. You might wanna look at the initialization schemes here: https://github.com/apache/flink/pull/757

[GitHub] flink pull request: [FLINK-2131]: Initialization schemes for k-mea...

2015-06-02 Thread sachingoel0101
Github user sachingoel0101 closed the pull request at: https://github.com/apache/flink/pull/756

[GitHub] flink pull request: [FLINK-2131]: Initialization schemes for k-mea...

2015-06-02 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/757 [FLINK-2131]: Initialization schemes for k-means clustering This adds the two most common initialization strategies for the k-means clustering algorithm, namely, Random initialization and kmeans

[GitHub] flink pull request: [FLINK-2131]: Initialization schemes for k-mea...

2015-06-02 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/756 [FLINK-2131]: Initialization schemes for k-means clustering This adds the two most common initialization strategies for the k-means clustering algorithm, namely, Random initialization and kmeans

[GitHub] flink pull request: Decision tree [Flink-1727]

2015-05-21 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/708 Decision tree [Flink-1727] This implements a part of the Decision Tree Algorithm. As of now, only continuous valued fields are implemented. Also, Gini index based splitting only. Entropy

[GitHub] flink pull request: Decision tree [Flink-1727]

2015-05-21 Thread sachingoel0101
Github user sachingoel0101 closed the pull request at: https://github.com/apache/flink/pull/708

[GitHub] flink pull request: Decision tree [Flink-1727]

2015-05-21 Thread sachingoel0101
GitHub user sachingoel0101 opened a pull request: https://github.com/apache/flink/pull/710 Decision tree [Flink-1727] This implements a part of the Decision Tree Algorithm. As of now, only continuous valued fields are implemented. Also, Gini index based splitting only. Entropy
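
For readers unfamiliar with Gini-index-based splitting, the standard impurity measure is 1 minus the sum of squared class proportions, and a candidate split is scored by the size-weighted impurity of its two children. The following is a generic sketch of that textbook definition, not the implementation from the pull request; `GiniSketch` and its method names are hypothetical.

```java
// Hypothetical helper showing the textbook Gini computation.
final class GiniSketch {
    // Gini impurity of a node from per-class sample counts: 1 - sum(p_i^2).
    static double impurity(long[] classCounts) {
        long total = 0;
        for (long c : classCounts) {
            total += c;
        }
        if (total == 0) {
            return 0.0; // an empty node is treated as pure
        }
        double sumSquares = 0.0;
        for (long c : classCounts) {
            double p = (double) c / total;
            sumSquares += p * p;
        }
        return 1.0 - sumSquares;
    }

    // Size-weighted impurity of a binary split; lower values mean a better split.
    static double splitImpurity(long[] left, long[] right) {
        long nLeft = 0;
        long nRight = 0;
        for (long c : left) {
            nLeft += c;
        }
        for (long c : right) {
            nRight += c;
        }
        long n = nLeft + nRight;
        if (n == 0) {
            return 0.0;
        }
        return ((double) nLeft / n) * impurity(left)
                + ((double) nRight / n) * impurity(right);
    }
}
```

A split that separates two classes perfectly scores 0, while one that leaves both children evenly mixed scores 0.5 for a two-class problem.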
