[jira] [Resolved] (FLINK-2043) Change the KMeansDataGenerator to allow passing a custom path
[ https://issues.apache.org/jira/browse/FLINK-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabian Hueske resolved FLINK-2043.
Resolution: Fixed
Fix Version/s: 0.9

Fixed with 3586ced3550ac036638a8dff011c01de99f9ed5e

> Change the KMeansDataGenerator to allow passing a custom path
>
> Key: FLINK-2043
> URL: https://issues.apache.org/jira/browse/FLINK-2043
> Project: Flink
> Issue Type: Improvement
> Components: Examples
> Reporter: Robert Metzger
> Assignee: pietro pinoli
> Priority: Trivial
> Labels: starter
> Fix For: 0.9
>
> It would be nice to allow the user to specify a target path for the generated data.
> Right now, one has to pass the path by changing Java's temporary directory (the java.io.tmpdir system property):
> {code}
> java -Djava.io.tmpdir=`pwd` -cp /home/robert/flink/build-target/examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar org.apache.flink.examples.java.clustering.util.KMeansDataGenerator
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
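Before this change, the generator wrote its output under `java.io.tmpdir`, so redirecting it meant overriding that JVM property as in the issue. The following is a minimal sketch of the requested behavior. It assumes a hypothetical `--output <dir>` argument that falls back to the temporary directory when it is absent. The class, method, and flag names are illustrative only and are not the actual KMeansDataGenerator interface introduced by the fix.

```java
import java.io.File;

public class OutputDirExample {

    /**
     * Hypothetical helper: picks the directory the generated data is written to.
     * If "--output <dir>" appears in args, that directory is used; otherwise we
     * fall back to java.io.tmpdir, i.e. the behavior described in FLINK-2043.
     */
    static File resolveOutputDir(String[] args) {
        // Stop one element early so a trailing "--output" without a value is ignored.
        for (int i = 0; i < args.length - 1; i++) {
            if ("--output".equals(args[i])) {
                return new File(args[i + 1]);
            }
        }
        return new File(System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) {
        File dir = resolveOutputDir(args);
        // A real generator would write its points/centers files below this directory.
        System.out.println("Writing generated data to " + dir.getAbsolutePath());
    }
}
```

Running `java OutputDirExample --output ./kmeans-data` resolves to `kmeans-data` under the working directory. Without the flag, the sketch falls back to the JVM's temporary directory, so the old `-Djava.io.tmpdir` trick keeps working.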
[jira] [Resolved] (FLINK-1848) Paths containing a Windows drive letter cannot be used in FileOutputFormats
[ https://issues.apache.org/jira/browse/FLINK-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabian Hueske resolved FLINK-1848.
Resolution: Fixed

Fixed with 7164b2b643985b99c6688b62174de42a71deb71b

> Paths containing a Windows drive letter cannot be used in FileOutputFormats
>
> Key: FLINK-1848
> URL: https://issues.apache.org/jira/browse/FLINK-1848
> Project: Flink
> Issue Type: Bug
> Affects Versions: 0.9
> Environment: Windows (Cygwin and native)
> Reporter: Fabian Hueske
> Assignee: Fabian Hueske
> Priority: Critical
> Fix For: 0.9
>
> Paths that contain a Windows drive letter, such as {{file:///c:/my/directory}}, cannot be used as the output path of a {{FileOutputFormat}}.
> If such a path is used, the following exception is thrown:
> {code}
> Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:c:
>   at org.apache.flink.core.fs.Path.initialize(Path.java:242)
>   at org.apache.flink.core.fs.Path.<init>(Path.java:225)
>   at org.apache.flink.core.fs.Path.<init>(Path.java:138)
>   at org.apache.flink.core.fs.local.LocalFileSystem.pathToFile(LocalFileSystem.java:147)
>   at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:232)
>   at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
>   at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
>   at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
>   at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
>   at org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:603)
>   at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:233)
>   at org.apache.flink.api.java.io.CsvOutputFormat.open(CsvOutputFormat.java:158)
>   at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:183)
>   at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
>   at java.lang.Thread.run(Unknown Source)
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:c:
>   at java.net.URI.checkPath(Unknown Source)
>   at java.net.URI.<init>(Unknown Source)
>   at org.apache.flink.core.fs.Path.initialize(Path.java:240)
>   ... 14 more
> {code}
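The innermost frame shows the cause. `java.net.URI` rejects a URI that has a scheme but whose path does not start with `/`, and a bare drive letter such as `c:` is exactly such a path. The sketch below reproduces the message with the JDK's five-argument `URI` constructor. It also shows that putting a slash in front of the drive letter avoids the error. It only illustrates the JDK behavior: `toFileUri` and `startsWithDriveLetter` are hypothetical helpers and not the change made in commit 7164b2b.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class DriveLetterUri {

    /** True if the path starts with a Windows drive letter, e.g. "c:" or "C:/dir". */
    static boolean startsWithDriveLetter(String path) {
        return path.length() >= 2
                && Character.isLetter(path.charAt(0))
                && path.charAt(1) == ':';
    }

    /**
     * Hypothetical helper: builds a "file" URI, prefixing a leading drive letter
     * with '/' so that java.net.URI treats the path as absolute.
     */
    static URI toFileUri(String path) {
        String p = startsWithDriveLetter(path) ? "/" + path : path;
        try {
            return new URI("file", null, p, null, null);
        } catch (URISyntaxException e) {
            // Like the trace above, surface the checked exception as IllegalArgumentException.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        try {
            // Scheme set, path "c:" without a leading slash: rejected by URI.checkPath.
            new URI("file", null, "c:", null, null);
        } catch (URISyntaxException e) {
            System.out.println(e.getMessage()); // Relative path in absolute URI: file:c:
        }
        System.out.println(toFileUri("c:/my/directory")); // file:/c:/my/directory
    }
}
```

The sketch handles only forward-slash paths. Backslash-separated Windows paths would need normalizing first.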
[jira] [Commented] (FLINK-2043) Change the KMeansDataGenerator to allow passing a custom path
[ https://issues.apache.org/jira/browse/FLINK-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560022#comment-14560022 ] ASF GitHub Bot commented on FLINK-2043: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/721
[jira] [Commented] (FLINK-1848) Paths containing a Windows drive letter cannot be used in FileOutputFormats
[ https://issues.apache.org/jira/browse/FLINK-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560024#comment-14560024 ] ASF GitHub Bot commented on FLINK-1848: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/712
[GitHub] flink pull request: [FLINK-1848] Fix for file paths with Windows d...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/712 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2043] Change the KMeansDataGenerator to...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/721
[jira] [Commented] (FLINK-1952) Cannot run ConnectedComponents example: Could not allocate a slot on instance
[ https://issues.apache.org/jira/browse/FLINK-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559963#comment-14559963 ]

ASF GitHub Bot commented on FLINK-1952:
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/731#issuecomment-105679990

Sorry for the whitespace re-formatting. My IntelliJ settings got reverted somehow, so it started auto-reformatting code :-/

> Cannot run ConnectedComponents example: Could not allocate a slot on instance
>
> Key: FLINK-1952
> URL: https://issues.apache.org/jira/browse/FLINK-1952
> Project: Flink
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 0.9
> Reporter: Robert Metzger
> Priority: Blocker
>
> Steps to reproduce
> {code}
> ./bin/yarn-session.sh -n 350
> {code}
> ... wait until they are connected ...
> {code}
> Number of connected TaskManagers changed to 266. Slots available: 266
> Number of connected TaskManagers changed to 323. Slots available: 323
> Number of connected TaskManagers changed to 334. Slots available: 334
> Number of connected TaskManagers changed to 343. Slots available: 343
> Number of connected TaskManagers changed to 350. Slots available: 350
> {code}
> Start CC
> {code}
> ./bin/flink run -p 350 ./examples/flink-java-examples-0.9-SNAPSHOT-ConnectedComponents.jar
> {code}
> ---> it runs
> Run KMeans, let it fail with
> {code}
> Failed to deploy the task Map (Map at main(KMeans.java:100)) (1/350) - execution #0 to slot SimpleSlot (2)(2)(0) - 182b7661ca9547a84591de940c47a200 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 350, but only 254 available. The total number of network buffers is currently set to 2048. You can increase this number by setting the configuration key 'taskmanager.network.numberOfBuffers'.
> {code}
> ... as expected.
> (I've waited for 10 minutes between the two submissions)
> Starting CC now will fail:
> {code}
> ./bin/flink run -p 350 ./examples/flink-java-examples-0.9-SNAPSHOT-ConnectedComponents.jar
> {code}
> Error message(s):
> {code}
> Caused by: java.lang.IllegalStateException: Could not schedule consumer vertex IterationHead(WorksetIteration (Unnamed Delta Iteration)) (19/350)
>   at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:479)
>   at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:469)
>   at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94)
>   at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
>   ... 4 more
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate a slot on instance 4a6d761cb084c32310ece1f849556faf @ cloud-19 - 1 slots - URL: akka.tcp://flink@130.149.21.23:51400/user/taskmanager, as required by the co-location constraint.
>   at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:247)
>   at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:110)
>   at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:262)
>   at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:436)
>   at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:475)
>   ... 9 more
> {code}
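The KMeans failure in the report is ordinary resource exhaustion. Deploying the task required 350 network buffers, but only 254 of the 2048 configured buffers were still free. The figure 350 matches the job's parallelism, but the report does not say how the requirement is derived, so the sketch takes it as a given input. The sketch below shows that comparison. `BufferCheck.checkBuffers` is a hypothetical helper that only mirrors the wording of the logged message. It is not Flink's network-stack code.

```java
import java.io.IOException;

public class BufferCheck {

    /**
     * Hypothetical check mirroring the logged error: a task that needs
     * `required` buffers can only be deployed if at least that many are free.
     */
    static void checkBuffers(int required, int available, int total) throws IOException {
        if (required > available) {
            throw new IOException("Insufficient number of network buffers: required "
                    + required + ", but only " + available + " available. "
                    + "The total number of network buffers is currently set to " + total + ". "
                    + "You can increase this number by setting the configuration key "
                    + "'taskmanager.network.numberOfBuffers'.");
        }
    }

    public static void main(String[] args) {
        try {
            // The numbers reported for the KMeans run in FLINK-1952.
            checkBuffers(350, 254, 2048);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

As the message itself suggests, the remedy is to raise `taskmanager.network.numberOfBuffers`. The follow-up ConnectedComponents failure is a separate scheduler problem (the co-location constraint) and is not modeled here.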
[GitHub] flink pull request: [FLINK-1952] [jobmanager] Rework and fix slot ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/731#issuecomment-105679811 The latest commit "Add big not so mini cluster test for CC to provoke scheduler problem" is not going to be merged. It is solely for verifying that the scheduler now correctly handles jobs with iterations and higher parallelism and many TaskManagers. You can start a 100 TaskManager cluster with that test setup in the commit and run connected components. (Give the VM 5GB heap space, then it works smoothly).
[GitHub] flink pull request: [FLINK-1952] [jobmanager] Rework and fix slot ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/731#issuecomment-105679990 Sorry for the whitespace re-formatting. My IntelliJ settings got reverted somehow, so it started auto-reformatting code :-/
[jira] [Commented] (FLINK-1952) Cannot run ConnectedComponents example: Could not allocate a slot on instance
[ https://issues.apache.org/jira/browse/FLINK-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559961#comment-14559961 ] ASF GitHub Bot commented on FLINK-1952: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/731#issuecomment-105679811 The latest commit "Add big not so mini cluster test for CC to provoke scheduler problem" is not going to be merged. It is solely for verifying that the scheduler now correctly handles jobs with iterations and higher parallelism and many TaskManagers. You can start a 100 TaskManager cluster with that test setup in the commit and run connected components. (Give the VM 5GB heap space, then it works smoothly).
[GitHub] flink pull request: [FLINK-1952] [jobmanager] Rework and fix slot ...
GitHub user StephanEwen opened a pull request:
https://github.com/apache/flink/pull/731

[FLINK-1952] [jobmanager] Rework and fix slot sharing scheduler

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/StephanEwen/incubator-flink slots_fix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/731.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #731

commit 771437360662bcf105c72d95924a5ce3c69f1585
Author: Stephan Ewen
Date: 2015-05-19T17:08:25Z
Add big not so mini cluster test for CC to provoke scheduler problem

commit 067c3868c07ea125d8f429e38476d3e8edfbad08
Author: Stephan Ewen
Date: 2015-05-20T09:37:56Z
[FLINK-1952] [jobmanager] Rework and fix slot sharing scheduler

commit 88074eda5c99945d0c0f106240010a451ba41658
Author: Stephan Ewen
Date: 2015-05-26T20:56:50Z
[tests] Fix AvroExternalJarProgramITCase logging
[jira] [Commented] (FLINK-1952) Cannot run ConnectedComponents example: Could not allocate a slot on instance
[ https://issues.apache.org/jira/browse/FLINK-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559951#comment-14559951 ] ASF GitHub Bot commented on FLINK-1952: --- GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/731 [FLINK-1952] [jobmanager] Rework and fix slot sharing scheduler
[GitHub] flink pull request: basic TfidfTransformer
GitHub user rbraeunlich opened a pull request:
https://github.com/apache/flink/pull/730

basic TfidfTransformer

Hi everybody,
for [FLINK-1999](https://issues.apache.org/jira/browse/FLINK-1999) we created a first implementation of a TfIdfTransformer. One problem remains: applying modulo after hashing causes collisions. Nevertheless, we would be glad to receive comments on our implementation.
Cheers,
Ronny

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rbraeunlich/flink tfidf
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/730.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #730

commit 9e9ac219b619ddfbab4f616165d038900b7726db
Author: Ronny Bräunlich
Date: 2015-05-15T09:18:00Z
create TfIdfTransformer

commit 42ef7c00a832e21d7391e1011031bda162d930f1
Author: Ronny Bräunlich
Date: 2015-05-16T14:38:28Z
fix import in TfIdfTranformer and add first basic test case

commit 82385b764f45f955cd88590b7657467689d096ed
Author: Ronny Bräunlich
Date: 2015-05-15T09:18:00Z
create TfIdfTransformer and add first basic test case

commit 7242728b1c24027203f1ff91476de9acb9bbf3a7
Author: diva1012
Date: 2015-05-17T11:42:40Z
Changes merged
Merge remote-tracking branch 'rbraeunlich/tfidf' into tfidf
Conflicts:
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/feature/TfIdfTransformer.scala

commit 9c2c181624bb81f3ed83a4a774339251508644f1
Author: diva1012
Date: 2015-05-17T17:40:00Z
Small fix of the test class. (The Sparse vector contains index -> value tuples, so we have to take only the value and not the whole tuple for the comparisson)

commit 8b17385e34b7f139a2649f80edc81744277fcfae
Author: diva1012
Date: 2015-05-18T06:41:58Z
Word count implementation simplified.

commit 229fac5f835ce05dd03544f7dd7c0df7952f18e9
Author: diva1012
Date: 2015-05-18T11:35:43Z
TF calculation fixed

commit e1ea4437e42860d8ed7820c32e08d7a2d1152b08
Author: diva1012
Date: 2015-05-19T20:44:31Z
Transformer improved: now we get SparseVector for each document that contains all words.
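The collision problem mentioned in the pull request is inherent to the hashing trick. Mapping each word to `hash(word) mod numFeatures` sends two distinct words to the same index whenever their hashes agree modulo `numFeatures`. The following is a minimal sketch with plain Java collections. It uses raw term counts for TF and `ln(N / df)` for IDF, which is one common variant. The pull request does not state which formula it uses, and this is not the FLINK-1999 Scala implementation. All names are illustrative.

```java
import java.util.*;

public class HashedTfIdf {

    /** Feature index of a word under the hashing trick (always non-negative). */
    static int index(String word, int numFeatures) {
        return Math.floorMod(word.hashCode(), numFeatures);
    }

    /**
     * One sparse vector (index -> weight) per document, with
     * tf = raw count in the document and idf = ln(N / df).
     * Words that collide on an index have their weights summed.
     */
    static List<Map<Integer, Double>> tfIdf(List<List<String>> docs, int numFeatures) {
        // Document frequency: number of documents containing each word.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String w : new HashSet<>(doc)) {
                df.merge(w, 1, Integer::sum);
            }
        }
        int n = docs.size();
        List<Map<Integer, Double>> result = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String w : doc) {
                tf.merge(w, 1, Integer::sum);
            }
            Map<Integer, Double> vec = new TreeMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) n / df.get(e.getKey()));
                vec.merge(index(e.getKey(), numFeatures), e.getValue() * idf, Double::sum);
            }
            result.add(vec);
        }
        return result;
    }

    /** Indices shared by two or more words, i.e. the collisions caused by the modulo. */
    static Map<Integer, Set<String>> collisions(Collection<String> vocab, int numFeatures) {
        Map<Integer, Set<String>> byIndex = new TreeMap<>();
        for (String w : vocab) {
            byIndex.computeIfAbsent(index(w, numFeatures), k -> new TreeSet<>()).add(w);
        }
        byIndex.values().removeIf(words -> words.size() < 2);
        return byIndex;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("flink", "stream", "flink"),
                Arrays.asList("flink", "batch"));
        System.out.println(tfIdf(docs, 1 << 10));
        // With only 2 buckets, 3 distinct words must share an index (pigeonhole).
        System.out.println(collisions(Arrays.asList("flink", "stream", "batch"), 2));
    }
}
```

A word that occurs in every document gets an IDF of zero, so "flink" contributes nothing here. Collisions can be made rarer with a larger `numFeatures`, but they cannot be ruled out, which is why the problem remains open in the pull request.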
[jira] [Commented] (FLINK-1999) TF-IDF transformer
[ https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559938#comment-14559938 ]

ASF GitHub Bot commented on FLINK-1999:
GitHub user rbraeunlich opened a pull request: https://github.com/apache/flink/pull/730 basic TfidfTransformer

> TF-IDF transformer
>
> Key: FLINK-1999
> URL: https://issues.apache.org/jira/browse/FLINK-1999
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Ronny Bräunlich
> Assignee: Alexander Alexandrov
> Priority: Minor
> Labels: ML
>
> Hello everybody,
> we are a group of three students from TU Berlin (I guess we're not the first group creating an issue), and we want (and have) to implement a TF-IDF transformer for Flink.
> Our lecturer Alexander told us that we could get some guidance here and that you could point us to an old version of a similar transformer.
[jira] [Resolved] (FLINK-2012) addVertices, addEdges, removeVertices, removeEdges methods
[ https://issues.apache.org/jira/browse/FLINK-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vasia Kalavri resolved FLINK-2012.
Resolution: Implemented
Fix Version/s: 0.9

> addVertices, addEdges, removeVertices, removeEdges methods
>
> Key: FLINK-2012
> URL: https://issues.apache.org/jira/browse/FLINK-2012
> Project: Flink
> Issue Type: New Feature
> Components: Gelly
> Affects Versions: 0.9
> Reporter: Andra Lungu
> Assignee: Andra Lungu
> Priority: Minor
> Fix For: 0.9
>
> Currently, Gelly only allows adding or deleting one vertex/edge at a time. To add two (or more) vertices, a user has to add one vertex, which creates a new graph, then add another vertex, which creates yet another graph, and so on.
> It would be nice to also have addVertices, addEdges, removeVertices, and removeEdges methods.
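The issue rests on graph mutations being immutable: each single-vertex operation returns a new graph, so adding k vertices one at a time produces k intermediate graphs. The toy sketch below contrasts the two styles with a small immutable graph built on Java collections. `ToyGraph` and its methods are hypothetical stand-ins. They are not Gelly's `Graph` API, and the signatures of Gelly's batch methods are not given in the issue.

```java
import java.util.*;

/** Minimal immutable graph: every mutation returns a new instance. */
public final class ToyGraph {
    private final Set<Long> vertices;
    private final Set<List<Long>> edges; // each edge is [source, target]

    ToyGraph(Set<Long> vertices, Set<List<Long>> edges) {
        this.vertices = Collections.unmodifiableSet(new HashSet<>(vertices));
        this.edges = Collections.unmodifiableSet(new HashSet<>(edges));
    }

    /** Single-vertex variant: every call builds a whole new graph. */
    ToyGraph addVertex(long v) {
        return addVertices(Collections.singletonList(v));
    }

    /** Batch variant: all vertices are added in one step, producing one new graph. */
    ToyGraph addVertices(Collection<Long> vs) {
        Set<Long> newVertices = new HashSet<>(vertices);
        newVertices.addAll(vs);
        return new ToyGraph(newVertices, edges);
    }

    /** Removes the given vertices together with every edge that touches one of them. */
    ToyGraph removeVertices(Collection<Long> vs) {
        Set<Long> newVertices = new HashSet<>(vertices);
        newVertices.removeAll(vs);
        Set<List<Long>> newEdges = new HashSet<>();
        for (List<Long> e : edges) {
            if (!vs.contains(e.get(0)) && !vs.contains(e.get(1))) {
                newEdges.add(e);
            }
        }
        return new ToyGraph(newVertices, newEdges);
    }

    int numVertices() { return vertices.size(); }

    int numEdges() { return edges.size(); }

    public static void main(String[] args) {
        ToyGraph g = new ToyGraph(new HashSet<>(Arrays.asList(1L, 2L)),
                Collections.singleton(Arrays.asList(1L, 2L)));
        // One at a time: an intermediate graph is created after adding vertex 3.
        ToyGraph oneByOne = g.addVertex(3L).addVertex(4L);
        // Batched: a single new graph.
        ToyGraph batched = g.addVertices(Arrays.asList(3L, 4L));
        System.out.println(oneByOne.numVertices() + " " + batched.numVertices()); // 4 4
        System.out.println(batched.removeVertices(Arrays.asList(2L)).numEdges()); // 0
    }
}
```

Both styles end in the same graph. The batch method simply avoids materializing one intermediate graph per element, which is the cost the issue describes.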
[jira] [Resolved] (FLINK-1941) Add documentation for Gelly-GSA
[ https://issues.apache.org/jira/browse/FLINK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vasia Kalavri resolved FLINK-1941.
Resolution: Fixed

> Add documentation for Gelly-GSA
>
> Key: FLINK-1941
> URL: https://issues.apache.org/jira/browse/FLINK-1941
> Project: Flink
> Issue Type: Task
> Components: Gelly
> Affects Versions: 0.9
> Reporter: Vasia Kalavri
> Assignee: Vasia Kalavri
> Labels: docs, gelly
>
> Add a section to the Gelly guide that describes the newly introduced Gather-Sum-Apply (GSA) iteration method. Show how GSA uses delta iterations internally, and explain how this model differs from the vertex-centric model.
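In the Gather-Sum-Apply model, each superstep has three phases. It gathers a value along each edge from the neighbor's current state. It then combines ("sums") those values per vertex with an associative function. Finally, it applies the combined value to update the vertex. The following plain-Java sketch computes single-source shortest paths in that style over an in-memory edge list. It keeps the set of vertices that changed in the previous superstep and gathers only from those, which plays roughly the role of the workset in a delta iteration. Whether Gelly restricts gathering in exactly this way is an assumption here. The sketch does not use Gelly's GSA API.

```java
import java.util.*;

public class GsaShortestPaths {

    /** Directed, weighted edge. */
    static final class Edge {
        final int src;
        final int dst;
        final double weight;

        Edge(int src, int dst, double weight) {
            this.src = src;
            this.dst = dst;
            this.weight = weight;
        }
    }

    /**
     * Single-source shortest paths in Gather-Sum-Apply style:
     * gather: candidate = dist[src] + weight, for edges whose source changed last superstep
     * sum:    minimum candidate per target vertex
     * apply:  update the target only if the minimum improves its distance
     */
    static double[] sssp(int numVertices, List<Edge> edges, int source, int maxIterations) {
        double[] dist = new double[numVertices];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[source] = 0.0;
        Set<Integer> changed = new HashSet<>(Collections.singleton(source));

        for (int it = 0; it < maxIterations && !changed.isEmpty(); it++) {
            // Gather + Sum: combine candidates per target with min (associative and commutative).
            Map<Integer, Double> summed = new HashMap<>();
            for (Edge e : edges) {
                if (changed.contains(e.src)) {
                    summed.merge(e.dst, dist[e.src] + e.weight, Math::min);
                }
            }
            // Apply: only improved vertices stay active for the next superstep.
            Set<Integer> next = new HashSet<>();
            for (Map.Entry<Integer, Double> m : summed.entrySet()) {
                if (m.getValue() < dist[m.getKey()]) {
                    dist[m.getKey()] = m.getValue();
                    next.add(m.getKey());
                }
            }
            changed = next;
        }
        return dist;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
                new Edge(0, 1, 4.0), new Edge(0, 2, 1.0),
                new Edge(2, 1, 2.0), new Edge(1, 3, 1.0));
        System.out.println(Arrays.toString(sssp(4, edges, 0, 10))); // [0.0, 3.0, 1.0, 4.0]
    }
}
```

The iteration stops once no vertex improves, mirroring a delta iteration that ends when its workset is empty. `maxIterations` is only a safety bound.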
[GitHub] flink pull request: [FLINK-1941] documentation for the Gather-Sum-...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/722
[GitHub] flink pull request: [FLINK-2012][gelly] Added methods to remove/ad...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/678
[jira] [Commented] (FLINK-1941) Add documentation for Gelly-GSA
[ https://issues.apache.org/jira/browse/FLINK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559836#comment-14559836 ] ASF GitHub Bot commented on FLINK-1941: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/722
[jira] [Commented] (FLINK-2012) addVertices, addEdges, removeVertices, removeEdges methods
[ https://issues.apache.org/jira/browse/FLINK-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559835#comment-14559835 ] ASF GitHub Bot commented on FLINK-2012: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/678 > addVertices, addEdges, removeVertices, removeEdges methods > -- > > Key: FLINK-2012 > URL: https://issues.apache.org/jira/browse/FLINK-2012 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 0.9 >Reporter: Andra Lungu >Assignee: Andra Lungu >Priority: Minor > > Currently, Gelly only allows the addition/deletion of one vertex/edge at a > time. If a user wanted to add two (or more) vertices, he/she would need > to add a vertex -> create a new graph; then add another vertex -> another > graph, etc. > It would be nice to also have addVertices, addEdges, removeVertices, > removeEdges methods.
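The motivation for batch methods can be shown with a toy immutable graph in which every mutation returns a new graph object: adding vertices one at a time builds one intermediate graph per vertex, while a batch call builds a single one. `ImmutableGraph` and its methods are hypothetical and only track vertex IDs; the names echo the methods proposed in the issue but are not Gelly's actual signatures, and edge methods would follow the same pattern.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical immutable graph that only tracks vertex IDs; every mutation returns a new instance.
public final class ImmutableGraph {
    private final Set<Long> vertices;

    public ImmutableGraph(Collection<Long> vertices) {
        this.vertices = Collections.unmodifiableSet(new HashSet<>(vertices));
    }

    // Single-vertex variant: adding n vertices this way creates n intermediate graphs.
    public ImmutableGraph addVertex(long vertexId) {
        return addVertices(Collections.singleton(vertexId));
    }

    // Batch variant: one new graph, however many vertices are added.
    public ImmutableGraph addVertices(Collection<Long> toAdd) {
        Set<Long> copy = new HashSet<>(vertices);
        copy.addAll(toAdd);
        return new ImmutableGraph(copy);
    }

    // Batch removal: one new graph without the given vertices.
    public ImmutableGraph removeVertices(Collection<Long> toRemove) {
        Set<Long> copy = new HashSet<>(vertices);
        copy.removeAll(toRemove);
        return new ImmutableGraph(copy);
    }

    public int numberOfVertices() {
        return vertices.size();
    }

    public static void main(String[] args) {
        ImmutableGraph g = new ImmutableGraph(Arrays.asList(1L, 2L));
        ImmutableGraph batched = g.addVertices(Arrays.asList(3L, 4L, 5L));
        System.out.println(batched.numberOfVertices()); // 5
    }
}
```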
[GitHub] flink pull request: [tez] Fix initialization of MemoryManager.
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/728
[jira] [Issue Comment Deleted] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Faye Beligianni updated FLINK-2069: --- Comment: was deleted (was: I am not sure if this will help you, but I noticed the following: If the job switches from running to canceled immediately, then the csv file is created, and since the job is canceled the file remains and is not deleted. I tried this 5-6 times and every time the same happened every time. ) > writeAsCSV function in DataStream Scala API creates no file > --- > > Key: FLINK-2069 > URL: https://issues.apache.org/jira/browse/FLINK-2069 > Project: Flink > Issue Type: Bug > Components: Streaming >Reporter: Faye Beligianni >Priority: Blocker > Labels: Streaming > Fix For: 0.9 > > > When the {{writeAsCSV}} function is used in the DataStream Scala API, no file > is created in the specified path.
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559772#comment-14559772 ] Faye Beligianni commented on FLINK-2069: Of course the file is empty but at least it is created. > writeAsCSV function in DataStream Scala API creates no file > --- > > Key: FLINK-2069 > URL: https://issues.apache.org/jira/browse/FLINK-2069 > Project: Flink > Issue Type: Bug > Components: Streaming >Reporter: Faye Beligianni >Priority: Blocker > Labels: Streaming > Fix For: 0.9 > > > When the {{writeAsCSV}} function is used in the DataStream Scala API, no file > is created in the specified path.
[jira] [Comment Edited] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559767#comment-14559767 ] Faye Beligianni edited comment on FLINK-2069 at 5/26/15 8:11 PM: - I am not sure if this will help you, but I noticed the following: If the job switches from running to canceled immediately, then the csv file is created, and since the job is canceled the file remains and is not deleted. I tried this 5-6 times and every time the same happened. was (Author: fobeligi): I am not sure if this will help you, but I noticed the following: If the job switches from running to canceled immediately, then the csv file is created, and since the job is canceled the file remains and is not deleted. I tried this 5-6 times and every time the same happened every time. > writeAsCSV function in DataStream Scala API creates no file > --- > > Key: FLINK-2069 > URL: https://issues.apache.org/jira/browse/FLINK-2069 > Project: Flink > Issue Type: Bug > Components: Streaming >Reporter: Faye Beligianni >Priority: Blocker > Labels: Streaming > Fix For: 0.9 > > > When the {{writeAsCSV}} function is used in the DataStream Scala API, no file > is created in the specified path.
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559768#comment-14559768 ] Faye Beligianni commented on FLINK-2069: I am not sure if this will help you, but I noticed the following: If the job switches from running to canceled immediately, then the csv file is created, and since the job is canceled the file remains and is not deleted. I tried this 5-6 times and every time the same happened every time. > writeAsCSV function in DataStream Scala API creates no file > --- > > Key: FLINK-2069 > URL: https://issues.apache.org/jira/browse/FLINK-2069 > Project: Flink > Issue Type: Bug > Components: Streaming >Reporter: Faye Beligianni >Priority: Blocker > Labels: Streaming > Fix For: 0.9 > > > When the {{writeAsCSV}} function is used in the DataStream Scala API, no file > is created in the specified path.
[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file
[ https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559767#comment-14559767 ] Faye Beligianni commented on FLINK-2069: I am not sure if this will help you, but I noticed the following: If the job switches from running to canceled immediately, then the csv file is created, and since the job is canceled the file remains and is not deleted. I tried this 5-6 times and every time the same happened every time. > writeAsCSV function in DataStream Scala API creates no file > --- > > Key: FLINK-2069 > URL: https://issues.apache.org/jira/browse/FLINK-2069 > Project: Flink > Issue Type: Bug > Components: Streaming >Reporter: Faye Beligianni >Priority: Blocker > Labels: Streaming > Fix For: 0.9 > > > When the {{writeAsCSV}} function is used in the DataStream Scala API, no file > is created in the specified path.
[jira] [Commented] (FLINK-2043) Change the KMeansDataGenerator to allow passing a custom path
[ https://issues.apache.org/jira/browse/FLINK-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559714#comment-14559714 ] ASF GitHub Bot commented on FLINK-2043: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/721#issuecomment-105644179 Thanks! Looks good. Will merge > Change the KMeansDataGenerator to allow passing a custom path > - > > Key: FLINK-2043 > URL: https://issues.apache.org/jira/browse/FLINK-2043 > Project: Flink > Issue Type: Improvement > Components: Examples >Reporter: Robert Metzger >Assignee: pietro pinoli >Priority: Trivial > Labels: starter > > It would be nice to allow the user to specify a target path for the generated > data. > Right now, one has to pass the path by changing the tmp directory of java > {code} > java -Djava.io.tmpdir=`pwd` -cp > /home/robert/flink/build-target/examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar > org.apache.flink.examples.java.clustering.util.KMeansDataGenerator > {code}
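Instead of overriding `java.io.tmpdir` for the whole JVM, a generator can take the target directory as an argument and fall back to the temp directory only when none is given. The sketch below assumes a hypothetical `-output <path>` flag and a hypothetical helper `resolveOutputDir`; the argument actually added by the fix may look different, and the file names `points` and `centers` are illustrative only.

```java
import java.io.File;

// Hypothetical helper for a data generator: honor an explicit "-output <path>" argument
// and fall back to java.io.tmpdir, which is where the issue says data is written today.
public class OutputPathResolver {

    static String resolveOutputDir(String[] args) {
        // Scan for the (hypothetical) flag; it must be followed by a value.
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-output".equals(args[i])) {
                return args[i + 1];
            }
        }
        return System.getProperty("java.io.tmpdir");
    }

    public static void main(String[] args) {
        String outputDir = resolveOutputDir(args);
        // Illustrative file names (assumption), resolved against the chosen directory.
        File points = new File(outputDir, "points");
        File centers = new File(outputDir, "centers");
        System.out.println("Writing " + points.getPath() + " and " + centers.getPath());
    }
}
```

Keeping the temp-directory fallback means existing invocations such as the `-Djava.io.tmpdir` workaround above would keep working.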
[GitHub] flink pull request: [FLINK-1848] Fix for file paths with Windows d...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/712#issuecomment-105644103 Thanks for the review. Will merge
[GitHub] flink pull request: [FLINK-2043] Change the KMeansDataGenerator to...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/721#issuecomment-105644179 Thanks! Looks good. Will merge
[jira] [Commented] (FLINK-1848) Paths containing a Windows drive letter cannot be used in FileOutputFormats
[ https://issues.apache.org/jira/browse/FLINK-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559712#comment-14559712 ] ASF GitHub Bot commented on FLINK-1848: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/712#issuecomment-105644103 Thanks for the review. Will merge > Paths containing a Windows drive letter cannot be used in FileOutputFormats > --- > > Key: FLINK-1848 > URL: https://issues.apache.org/jira/browse/FLINK-1848 > Project: Flink > Issue Type: Bug >Affects Versions: 0.9 > Environment: Windows (Cygwin and native) >Reporter: Fabian Hueske >Assignee: Fabian Hueske >Priority: Critical > Fix For: 0.9 > > > Paths that contain a Windows drive letter such as {{file:///c:/my/directory}} > cannot be used as output path for {{FileOutputFormat}}. > If done, the following exception is thrown: > {code} > Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: > Relative path in absolute URI: file:c: > at org.apache.flink.core.fs.Path.initialize(Path.java:242) > at org.apache.flink.core.fs.Path.<init>(Path.java:225) > at org.apache.flink.core.fs.Path.<init>(Path.java:138) > at > org.apache.flink.core.fs.local.LocalFileSystem.pathToFile(LocalFileSystem.java:147) > at > org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:232) > at > org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233) > at > org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233) > at > org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233) > at > org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233) > at > org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:603) > at > org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:233) > at > org.apache.flink.api.java.io.CsvOutputFormat.open(CsvOutputFormat.java:158) > at > 
org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:183) > at > org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217) > at java.lang.Thread.run(Unknown Source) > Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:c: > at java.net.URI.checkPath(Unknown Source) > at java.net.URI.<init>(Unknown Source) > at org.apache.flink.core.fs.Path.initialize(Path.java:240) > ... 14 more > {code}
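The root cause in this trace can be reproduced with `java.net.URI` alone: when a URI has a scheme, its path must be empty or start with `/`, so the path `c:` is rejected with exactly the message shown. The sketch contrasts that with a hypothetical normalization that converts backslashes and prefixes a drive-letter path with `/`; the class `WindowsPathUri` and its methods are illustrative and are not the code of the actual fix. The wrapping in `IllegalArgumentException` mirrors the `Caused by` chain in the report.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: why "c:" fails as a URI path and one way to normalize it.
public class WindowsPathUri {

    // True for paths such as "c:" or "c:/my/directory" that begin with a drive letter.
    static boolean hasDriveLetter(String path) {
        return path.length() >= 2 && Character.isLetter(path.charAt(0)) && path.charAt(1) == ':';
    }

    // Builds a file URI from the given path, wrapping the checked exception the same way
    // the reported stack trace does (IllegalArgumentException caused by URISyntaxException).
    static URI createFileUri(String path) {
        try {
            return new URI("file", null, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Normalizes separators and prefixes drive-letter paths with '/', so "c:/x" becomes "/c:/x".
    static URI toFileUri(String path) {
        String normalized = path.replace('\\', '/');
        if (hasDriveLetter(normalized)) {
            normalized = "/" + normalized;
        }
        return createFileUri(normalized);
    }

    public static void main(String[] args) {
        try {
            createFileUri("c:");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getCause().getMessage()); // Relative path in absolute URI: file:c:
        }
        System.out.println(toFileUri("c:\\my\\directory")); // file:/c:/my/directory
    }
}
```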
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
GitHub user twalthr opened a pull request: https://github.com/apache/flink/pull/729 [FLINK-1319][core] Add static code analysis for UDFs This PR implements a Static Code Analyzer (SCA) that uses the ASM framework for interpreting Java bytecode of Flink UDFs. The analyzer is built on top of ASM's `BasicInterpreter`. Instead of ASM's `BasicValue`s, I introduced `TaggedValue`s, which extend `BasicValue` and allow appending interesting information to values. Interesting values such as inputs, collectors, or constants are tagged so that atomic input fields can be tracked through the entire UDF (until the function returns or calls `collect()`). The implementation is as conservative as possible, meaning that for cases or bytecode instructions that haven't been considered, the analyzer falls back to the ASM library (which removes TaggedValues). 61 JUnit tests cover the basic functionality. 18 JUnit tests with code examples from the "real world" test the analyzer further. The analyzer has 3 modes: DISABLED, OPTIMIZE, HINTS. The interpretation takes some time; the analysis of a single UDF can take up to 1 second. Therefore, I didn't enable the analyzer in TestEnvironment by default, to keep build times down, but if you uncomment the corresponding lines, the analyzer supports all 280 UDFs within the entire Flink code base.
The analyzer gives hints about: - Main feature: ForwardedFields semantic properties for all types of Functions except for MapPartition and Combine - Warnings if static fields are modified by a Function - Warnings if a FilterFunction modifies its input objects - Warnings if a Function returns `null` - Warnings if a tuple access uses a wrong index - Information about the number of object creations within a UDF (for manual optimization) You can merge this pull request into a Git repository by running: $ git pull https://github.com/twalthr/flink sca Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/729.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #729 commit c384fc9740013ec1ae89a2817695078542c47dfe Author: twalthr Date: 2015-05-26T18:22:03Z [FLINK-1319][core] Add static code analysis for UDFs
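The ForwardedFields output listed above records which input fields a function copies unchanged, and to which output positions. As a hypothetical illustration that skips the bytecode analysis entirely, the following assumes the field mapping is already known (the analyzer would infer it by tracking tagged values) and renders it as field expressions in the `f<in>->f<out>` style used by Flink's forwarded-fields annotations. `ForwardedFieldsSketch` and its method are not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: turn a known output-to-input field mapping into a
// forwarded-fields string. sourceOfOutputField[out] is the input field copied
// unchanged into output position 'out', or -1 if that output value is computed.
public class ForwardedFieldsSketch {

    static String forwardedFields(int[] sourceOfOutputField, int inputArity) {
        List<String> parts = new ArrayList<>();
        for (int out = 0; out < sourceOfOutputField.length; out++) {
            int in = sourceOfOutputField[out];
            if (in < 0) {
                continue; // computed value, not forwarded
            }
            if (in >= inputArity) {
                // Comparable to the "wrong tuple index" warning: such an access cannot be valid.
                throw new IllegalArgumentException("Input has no field f" + in);
            }
            // Same position: "f2"; moved field: "f1->f0" (input field -> output field).
            parts.add(in == out ? "f" + in : "f" + in + "->f" + out);
        }
        return String.join("; ", parts);
    }

    public static void main(String[] args) {
        // E.g. a map returning (t.f1, t.f0, t.f2 * 2): f1 -> f0, f0 -> f1, f2 recomputed.
        System.out.println(forwardedFields(new int[] {1, 0, -1}, 3)); // f1->f0; f0->f1
    }
}
```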
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559700#comment-14559700 ] ASF GitHub Bot commented on FLINK-1319: --- GitHub user twalthr opened a pull request: https://github.com/apache/flink/pull/729 [FLINK-1319][core] Add static code analysis for UDFs This PR implements a Static Code Analyzer (SCA) that uses the ASM framework for interpreting Java bytecode of Flink UDFs. The analyzer is built on top of ASM's `BasicInterpreter`. Instead of ASM's `BasicValue`s, I introduced `TaggedValue`s, which extend `BasicValue` and allow appending interesting information to values. Interesting values such as inputs, collectors, or constants are tagged so that atomic input fields can be tracked through the entire UDF (until the function returns or calls `collect()`). The implementation is as conservative as possible, meaning that for cases or bytecode instructions that haven't been considered, the analyzer falls back to the ASM library (which removes TaggedValues). 61 JUnit tests cover the basic functionality. 18 JUnit tests with code examples from the "real world" test the analyzer further. The analyzer has 3 modes: DISABLED, OPTIMIZE, HINTS. The interpretation takes some time; the analysis of a single UDF can take up to 1 second. Therefore, I didn't enable the analyzer in TestEnvironment by default, to keep build times down, but if you uncomment the corresponding lines, the analyzer supports all 280 UDFs within the entire Flink code base.
The analyzer gives hints about: - Main feature: ForwardedFields semantic properties for all types of Functions except for MapPartition and Combine - Warnings if static fields are modified by a Function - Warnings if a FilterFunction modifies its input objects - Warnings if a Function returns `null` - Warnings if a tuple access uses a wrong index - Information about the number of object creations within a UDF (for manual optimization) You can merge this pull request into a Git repository by running: $ git pull https://github.com/twalthr/flink sca Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/729.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #729 commit c384fc9740013ec1ae89a2817695078542c47dfe Author: twalthr Date: 2015-05-26T18:22:03Z [FLINK-1319][core] Add static code analysis for UDFs > Add static code analysis for UDFs > - > > Key: FLINK-1319 > URL: https://issues.apache.org/jira/browse/FLINK-1319 > Project: Flink > Issue Type: New Feature > Components: Java API, Scala API >Reporter: Stephan Ewen >Assignee: Timo Walther >Priority: Minor > > Flink's Optimizer uses information about which fields of the input elements > a UDF accesses, modifies, or forwards/copies. This > information frequently helps to reuse partitionings, sorts, etc. It may speed > up programs significantly, as it can frequently eliminate sorts and shuffles, > which are costly. > Right now, users can add lightweight annotations to UDFs to provide this > information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}). > We worked with static code analysis of UDFs before, to determine this > information automatically. This is an incredible feature, as it "magically" > makes programs faster. > For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this > works surprisingly well in many cases.
We used the "Soot" toolkit for the > static code analysis. Unfortunately, Soot is LGPL licensed and thus we did > not include any of the code so far. > I propose to add this functionality to Flink, in the form of a drop-in > addition, to work around the LGPL incompatibility with ASL 2.0. Users could > simply download a special "flink-code-analysis.jar" and drop it into the > "lib" folder to enable this functionality. We may even add a script to > "tools" that downloads that library automatically into the lib folder. This > should be legally fine, since we do not redistribute LGPL code and only > dynamically link it (the incompatibility with ASL 2.0 is mainly in the > patentability, if I remember correctly). > Prior work on this has been done by [~aljoscha] and [~skunert], which could > provide a code base to start with. > *Appendix* > Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ > Papers on static analysis for optimization: > http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and > http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
[jira] [Commented] (FLINK-1941) Add documentation for Gelly-GSA
[ https://issues.apache.org/jira/browse/FLINK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559688#comment-14559688 ] ASF GitHub Bot commented on FLINK-1941: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/722#issuecomment-105642132 and will merge :) > Add documentation for Gelly-GSA > --- > > Key: FLINK-1941 > URL: https://issues.apache.org/jira/browse/FLINK-1941 > Project: Flink > Issue Type: Task > Components: Gelly >Affects Versions: 0.9 >Reporter: Vasia Kalavri >Assignee: Vasia Kalavri > Labels: docs, gelly > > Add a section in the Gelly guide to describe the newly introduced > Gather-Sum-Apply iteration method. Show how GSA uses delta iterations > internally and explain the differences of this model as compared to > vertex-centric.
[GitHub] flink pull request: [FLINK-1941] documentation for the Gather-Sum-...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/722#issuecomment-105642132 and will merge :)
[GitHub] flink pull request: [FLINK-2012][gelly] Added methods to remove/ad...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/678#issuecomment-105641949 will merge!
[jira] [Commented] (FLINK-2012) addVertices, addEdges, removeVertices, removeEdges methods
[ https://issues.apache.org/jira/browse/FLINK-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559686#comment-14559686 ] ASF GitHub Bot commented on FLINK-2012: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/678#issuecomment-105641949 will merge! > addVertices, addEdges, removeVertices, removeEdges methods > -- > > Key: FLINK-2012 > URL: https://issues.apache.org/jira/browse/FLINK-2012 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 0.9 >Reporter: Andra Lungu >Assignee: Andra Lungu >Priority: Minor > > Currently, Gelly only allows the addition/deletion of one vertex/edge at a > time. If a user wanted to add two (or more) vertices, he/she would need > to add a vertex -> create a new graph; then add another vertex -> another > graph, etc. > It would be nice to also have addVertices, addEdges, removeVertices, > removeEdges methods.
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559564#comment-14559564 ] ASF GitHub Bot commented on FLINK-2073: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31064251 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning the opportunity to work on a highly active open source project that makes scalable ML a reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +To get started, first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. +The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during Maven's test phase, and integration tests, which are executed during Maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix matching the regular expression `(IT|Integration)(Test|Suite|Case)` are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look like the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other ScalaTest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing how the algorithm works and the parameters with which the user can control its behavior. +Additionally, we would like to encourage contributors to add this information to the online documentation. +The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`. + +Every new algorithm is described by a single markdown file.
+This file should contain at least the following points: + +1. What does the algorithm do +2. How does the algorithm work (or reference to description) +3. Parameter description with default values +4. Code snippet showing how the algorithm is used + +In order to use LaTeX syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter. + +{% highlight yaml %} +--- +mathjax: include +title: Example title +--- +{% endhighlight %} + +In order to use displayed mathematics, you have to put your LaTeX code in `$$ ... $$`. +For in-line mathematics, use `$ ... $`. +Additionally, some predefined LaTeX commands are included in the scope of your markdown file. +See
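The integration-test naming rule quoted in this diff can be checked with a plain regular expression. This sketch only mirrors the rule as the guide states it; the class `TestClassifier` is hypothetical, and the actual selection of tests is done by the Maven test plugin configuration in the build, which is not reproduced here.

```java
import java.util.regex.Pattern;

// Hypothetical classifier mirroring the guide's rule: class names ending in
// (IT|Integration)(Test|Suite|Case) are integration tests (run in Maven's verify phase),
// everything else is a unit test (run in Maven's test phase).
public class TestClassifier {

    private static final Pattern INTEGRATION_SUFFIX =
            Pattern.compile(".*(IT|Integration)(Test|Suite|Case)");

    static boolean isIntegrationTest(String simpleClassName) {
        // matches() requires the whole name to match, so the pattern acts as a suffix check.
        return INTEGRATION_SUFFIX.matcher(simpleClassName).matches();
    }

    public static void main(String[] args) {
        String[] names = {"ExampleITSuite", "KMeansIntegrationTest", "WordCountITCase", "ExampleSuite"};
        for (String name : names) {
            String kind = isIntegrationTest(name) ? "integration test (verify)" : "unit test (test)";
            System.out.println(name + " -> " + kind);
        }
    }
}
```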
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31064251 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning the opportunity to work on a highly active open source project that makes scalable ML a reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +To get started, first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML, all the better. +It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm.
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during Maven's test phase, and integration tests, which are executed during Maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix matching the regular expression `(IT|Integration)(Test|Suite|Case)` are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look like the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other ScalaTest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing how the algorithm works and the parameters with which the user can control its behavior. +Additionally, we would like to encourage contributors to add this information to the online documentation. +The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`. + +Every new algorithm is described by a single markdown file. +This file should contain at least the following points: + +1. What does the algorithm do +2. How does the algorithm work (or reference to description) +3. Parameter description with default values +4.
Code snippet showing how the algorithm is used + +In order to use LaTeX syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter. + +{% highlight yaml %} +--- +mathjax: include +title: Example title +--- +{% endhighlight %} + +In order to use displayed mathematics, you have to put your LaTeX code in `$$ ... $$`. +For in-line mathematics, use `$ ... $`. +Additionally, some predefined LaTeX commands are included in the scope of your markdown file. +See `docs/_include/latex_commands.html` for the complete list of predefined LaTeX commands. + +## Contributing + +Once you have implemented the algorithm with adequate test coverage and added documentation, you are ready to open a pull reque
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063509 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning the opportunity to work on a highly active open source project that makes scalable ML a reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +To get started, first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML, all the better. +It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm.
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during Maven's test phase, and integration tests, which are executed during Maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix matching the regular expression `(IT|Integration)(Test|Suite|Case)` are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look like the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other ScalaTest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing how the algorithm works and the parameters with which the user can control its behavior. +Additionally, we would like to encourage contributors to add this information to the online documentation. +The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`. + +Every new algorithm is described by a single markdown file. +This file should contain at least the following points: + +1. What does the algorithm do +2. How does the algorithm work (or reference to description) +3. Parameter description with default values +4.
Code snippet showing how the algorithm is used + +In order to use LaTeX syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter. + +{% highlight yaml %} +--- +mathjax: include +title: Example title +--- +{% endhighlight %} + +In order to use displayed mathematics, you have to put your LaTeX code in `$$ ... $$`. +For in-line mathematics, use `$ ... $`. +Additionally, some predefined LaTeX commands are included in the scope of your markdown file. +See `docs/_include/latex_commands.html` for the complete list of predefined LaTeX commands. + +## Contributing + +Once you have implemented the algorithm with adequate test coverage and added documentation, you are ready to open a pull reque
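The suffix rule quoted in the guide can be illustrated with a small Scala sketch. It only mirrors the regular expression `(IT|Integration)(Test|Suite|Case)` as stated in the diff, anchored to the end of the class name; it is not Flink's actual Maven Surefire/Failsafe configuration, and the class names used below (`ExampleITSuite`, `KMeansIntegrationTest`, `ExampleSuite`, `ITSuiteHelper`) are hypothetical examples.

```scala
import scala.util.matching.Regex

object TestClassifier {
  // Suffix rule as quoted in the guide: (IT|Integration)(Test|Suite|Case).
  // The trailing `$` requires the suffix to appear at the very end of the class name.
  val IntegrationSuffix: Regex = "(IT|Integration)(Test|Suite|Case)$".r

  /** True if the class name would count as an integration test under the quoted rule. */
  def isIntegrationTest(className: String): Boolean =
    IntegrationSuffix.findFirstIn(className).isDefined

  def main(args: Array[String]): Unit = {
    // Hypothetical class names, used only for illustration.
    val names = Seq("ExampleITSuite", "KMeansIntegrationTest", "ExampleSuite", "ITSuiteHelper")
    names.foreach { name =>
      val kind =
        if (isIntegrationTest(name)) "integration test (verify phase)"
        else "unit test (test phase)"
      println(s"$name -> $kind")
    }
  }
}
```

Under this reading of the rule, `ITSuiteHelper` is classified as a unit test: it contains `ITSuite`, but not as a suffix.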
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559558#comment-14559558 ] ASF GitHub Bot commented on FLINK-2073: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063509 --- Diff: docs/libs/ml/contribution_guide.md ---
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063498 --- Diff: docs/libs/ml/contribution_guide.md ---
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559557#comment-14559557 ] ASF GitHub Bot commented on FLINK-2073: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063498 --- Diff: docs/libs/ml/contribution_guide.md ---
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063481 --- Diff: docs/libs/ml/contribution_guide.md ---
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063469 --- Diff: docs/libs/ml/contribution_guide.md ---
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063476 --- Diff: docs/libs/ml/contribution_guide.md ---
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063448
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@
 specific language governing permissions and limitations
 under the License.
 -->
+The Flink community highly appreciates all sorts of contributions to FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly active open source project that makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+To get started, first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html).
+Everything in this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for new ideas, check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, take ownership of it and track your progress in that issue.
+That way, the other contributors know the state of the different issues, and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests that verify the correct behavior of the algorithm.
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during Maven's test phase, and integration tests, which are executed during Maven's verify phase.
+Maven automatically makes this distinction using the following naming rules:
+All test cases whose class name ends with a suffix matching the regular expression `(IT|Integration)(Test|Suite|Case)` are considered integration tests.
+The rest are considered unit tests and should only test behavior that is local to the component under test.
+
+An integration test is a test which requires the full Flink system to be started.
+To do that properly, all integration test cases have to mix in the trait `FlinkTestBase`.
+This trait sets the right `ExecutionEnvironment` so that the test is executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec`; it can be any other ScalaTest `Suite` subclass.
+
+## Documentation
+
+When contributing new algorithms, you are required to add code comments describing how the algorithm works and which parameters the user can set to control its behavior.
+Additionally, we would like to encourage contributors to add this information to the online documentation.
+The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`.
+
+Every new algorithm is described by a single Markdown file.
+This file should contain at least the following points:
+
+1. What the algorithm does
+2. How the algorithm works (or a reference to a description)
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+To use LaTeX syntax in the Markdown file, you have to include `mathjax: include` in the YAML front matter.
+
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+For displayed mathematics, put your LaTeX code in `$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined LaTeX commands are available in the scope of your Markdown file.
+See `docs/_include/latex_commands.html` for the complete list of predefined LaTeX commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and added documentation, you are ready to open a pull reque
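The naming rule quoted in the Testing section can be illustrated with a small, self-contained Scala sketch. It only mirrors the suffix pattern `(IT|Integration)(Test|Suite|Case)` from the guide; the object and method names (`TestClassifier`, `isIntegrationTest`) are hypothetical, and the real unit/integration split is presumably done by the test plugin configuration in Flink's Maven build, not by code like this.

```scala
// Hypothetical helper, NOT part of Flink's build: it only mirrors the
// suffix rule from the contribution guide to show which names qualify.
object TestClassifier {
  // The trailing `$` anchors the pattern, so it must match a suffix of the name.
  private val IntegrationSuffix = "(IT|Integration)(Test|Suite|Case)$".r

  /** Returns true if the class name ends with an integration-test suffix. */
  def isIntegrationTest(className: String): Boolean =
    IntegrationSuffix.findFirstIn(className).isDefined

  def main(args: Array[String]): Unit = {
    val names = Seq("ExampleITSuite", "KMeansIntegrationTest", "ALSITCase", "LinearRegressionSuite")
    names.foreach { name =>
      val kind =
        if (isIntegrationTest(name)) "integration test (verify phase)"
        else "unit test (test phase)"
      println(s"$name -> $kind")
    }
  }
}
```

In this sketch, `LinearRegressionSuite` is classified as a unit test because `Suite` is not directly preceded by `IT` or `Integration`, while a name such as `ITSuiteHelper` does not match because the pattern only applies at the end of the name.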
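To illustrate the two math delimiters described in the Documentation section, a page for a hypothetical least-squares regression algorithm might contain a fragment like the one below. It is only a sketch: it uses standard LaTeX commands (`\mathbf`, `\frac`, `\sum`) rather than the predefined macros from `docs/_include/latex_commands.html`, whose contents are not listed in this thread, and the page would still need `mathjax: include` in its front matter.

```latex
The model predicts $\hat{y} = \mathbf{w}^T \mathbf{x}$ for a feature vector $\mathbf{x}$.
The weight vector $\mathbf{w}$ is chosen to minimize the mean squared error:

$$
\min_{\mathbf{w}} \frac{1}{2n} \sum_{i=1}^{n} \left( \mathbf{w}^T \mathbf{x}_i - y_i \right)^2
$$
```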
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559556#comment-14559556 ] ASF GitHub Bot commented on FLINK-2073: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063481 --- Diff: docs/libs/ml/contribution_guide.md ---
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559555#comment-14559555 ] ASF GitHub Bot commented on FLINK-2073: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063476 --- Diff: docs/libs/ml/contribution_guide.md ---
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559553#comment-14559553 ] ASF GitHub Bot commented on FLINK-2073: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063448 --- Diff: docs/libs/ml/contribution_guide.md ---
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559554#comment-14559554 ] ASF GitHub Bot commented on FLINK-2073: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063469 --- Diff: docs/libs/ml/contribution_guide.md ---
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063043 --- Diff: docs/libs/ml/contribution_guide.md ---
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559542#comment-14559542 ] ASF GitHub Bot commented on FLINK-2073: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31063043 --- Diff: docs/libs/ml/contribution_guide.md ---
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31062899
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@
 specific language governing permissions and
 limitations under the License.
 -->
+The Flink community highly appreciates all sorts of contributions to FlinkML.
+FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
-Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of the algorithm.
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming rules:
+All test cases whose class name ends with a suffix fulfilling the regular expression `(IT|Integration)(Test|Suite|Case)`, are considered integration tests.
+The rest are considered unit tests and should only test behavior which is local to the component under test.
+
+An integration test is a test which requires the full Flink system to be started.
+In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other scalatest `Suite` subclass.
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior.
+Additionally, we would like to encourage contributors to add this information to the online documentation.
+The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description)
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter.
+
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in `$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and added documentation, you are ready to open a pull reque
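The naming rule quoted in the diff (integration tests are the classes whose name ends with `(IT|Integration)(Test|Suite|Case)`) can be made concrete with a small Scala sketch. `TestClassifier` and `phaseFor` are hypothetical names, not part of Flink; the sketch only mirrors the quoted regular expression and assumes, as the guide states, that matching classes run in Maven's verify phase and all others in the test phase. In a real build that split is presumably configured in the project's POM (for example via the test plugins' include patterns) rather than inferred by Maven on its own.

```scala
// Hypothetical illustration, not Flink code: classifies a test class name using
// the suffix rule (IT|Integration)(Test|Suite|Case) quoted in the contribution guide.
object TestClassifier {

  // Scala's Regex extractor requires a full match, so the leading `.*` absorbs
  // the start of the name (including any package prefix) and the two groups
  // must sit at the very end of the name.
  private val IntegrationName = """.*(IT|Integration)(Test|Suite|Case)""".r

  /** Maven phase in which, per the guide, a test with this class name would run. */
  def phaseFor(className: String): String = className match {
    case IntegrationName(_, _) => "verify" // integration test
    case _                     => "test"   // unit test
  }

  def main(args: Array[String]): Unit = {
    val names = Seq(
      "ExampleITSuite",
      "org.apache.flink.ml.KMeansIntegrationTest", // hypothetical class name
      "ExampleSuite",
      "ITExampleSuite" // "IT" is a prefix here, not part of the suffix
    )
    names.foreach(name => println(s"$name -> ${phaseFor(name)}"))
  }
}
```

Running `main` prints `verify` for the first two names and `test` for the last two. A full match with a leading `.*` is used instead of searching anywhere in the string so that `IT` or `Integration` only counts when it is part of the trailing suffix.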
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31062844
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559536#comment-14559536 ] ASF GitHub Bot commented on FLINK-2073: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31062899
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559535#comment-14559535 ] ASF GitHub Bot commented on FLINK-2073: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31062844
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559526#comment-14559526 ] ASF GitHub Bot commented on FLINK-2073: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31062590
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31062590
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559523#comment-14559523 ] ASF GitHub Bot commented on FLINK-2073: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31062513
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@
[...]
+When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior.
--- End diff --

That's German ;-)

> Add contribution guide for FlinkML
> --
>
> Key: FLINK-2073
> URL: https://issues.apache.org/jira/browse/FLINK-2073
> Project: Flink
> Issue Type: New Feature
> Components: Documentation, Machine Learning Library
> Reporter: Theodore Vasiloudis
> Assignee: Till Rohrmann
> Fix For: 0.9
>
>
> We need a guide for contributions to FlinkML in order to encourage the
> extension of the library, and provide guidelines for developers.
> One thing that should be included is a step-by-step guide to create a
> transformer, or other Estimator

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31062513 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. +It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. 
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix fulfilling the regular expression `(IT|Integration)(Test|Suite|Case)`, are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other scalatest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior. --- End diff -- That's German ;-) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
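The test naming rule quoted in the diff above can be illustrated with a small, self-contained Scala sketch. It only checks whether a class name ends with the suffix `(IT|Integration)(Test|Suite|Case)`. The object `TestClassifier` and its method `isIntegrationTest` are hypothetical names introduced here for illustration. The sketch does not reproduce the project's Maven build configuration, which is what actually decides whether a test runs in the test phase or the verify phase.

```scala
import scala.util.matching.Regex

// Hypothetical helper (not part of Flink): applies the suffix rule from the
// FlinkML contribution guide draft to a (possibly fully qualified) class name.
object TestClassifier {
  // The suffix must appear at the very end of the simple class name.
  private val IntegrationSuffix: Regex = "(IT|Integration)(Test|Suite|Case)$".r

  def isIntegrationTest(className: String): Boolean = {
    // Drop the package prefix, e.g. "com.example.FooITCase" -> "FooITCase".
    val simpleName = className.split('.').last
    IntegrationSuffix.findFirstIn(simpleName).isDefined
  }

  def main(args: Array[String]): Unit = {
    val names = Seq("ExampleITSuite", "KMeansIntegrationTest", "ExampleSuite", "ITHelper")
    for (name <- names) {
      val kind =
        if (isIntegrationTest(name)) "integration test (verify phase)"
        else "unit test (test phase)"
      println(s"$name -> $kind")
    }
  }
}
```

Anchoring the pattern with `$` follows the guide's wording that the class name *ends with* the suffix, so a name such as `ITHelper` is still treated as a unit test.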
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559519#comment-14559519 ] ASF GitHub Bot commented on FLINK-2073: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31062344 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. 
+It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. +The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. --- End diff -- Good catch. > Add contribution guide for FlinkML > -- > > Key: FLINK-2073 > URL: https://issues.apache.org/jira/browse/FLINK-2073 > Project: Flink > Issue Type: New Feature > Components: Documentation, Machine Learning Library >Reporter: Theodore Vasiloudis >Assignee: Till Rohrmann > Fix For: 0.9 > > > We need a guide for contributions to FlinkML in order to encourage the > extension of the library, and provide guidelines for developers. > One thing that should be included is a step-by-step guide to create a > transformer, or other Estimator
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31062344 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. +It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. 
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. --- End diff -- Good catch.
[jira] [Resolved] (FLINK-2083) Ensure high quality docs for FlinkML in 0.9
[ https://issues.apache.org/jira/browse/FLINK-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann resolved FLINK-2083. -- Resolution: Fixed Improved with b015a32f6126d759fc6dee90b78f90f7ff8dfbac > Ensure high quality docs for FlinkML in 0.9 > --- > > Key: FLINK-2083 > URL: https://issues.apache.org/jira/browse/FLINK-2083 > Project: Flink > Issue Type: Improvement > Components: Machine Learning Library >Reporter: Theodore Vasiloudis >Assignee: Theodore Vasiloudis > Labels: ML > Fix For: 0.9 > > > As defined in our vision for FlinkML, providing high-quality documentation is > a primary goal for us. > This issue concerns the docs that will be included in 0.9, and will track > improvements and additions for the release.
[GitHub] flink pull request: [FLINK-2083] [docs] [ml] Ensure high quality d...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/715
[jira] [Commented] (FLINK-2083) Ensure high quality docs for FlinkML in 0.9
[ https://issues.apache.org/jira/browse/FLINK-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559509#comment-14559509 ] ASF GitHub Bot commented on FLINK-2083: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/715 > Ensure high quality docs for FlinkML in 0.9 > --- > > Key: FLINK-2083 > URL: https://issues.apache.org/jira/browse/FLINK-2083 > Project: Flink > Issue Type: Improvement > Components: Machine Learning Library >Reporter: Theodore Vasiloudis >Assignee: Theodore Vasiloudis > Labels: ML > Fix For: 0.9 > > > As defined in our vision for FlinkML, providing high-quality documentation is > a primary goal for us. > This issue concerns the docs that will be included in 0.9, and will track > improvements and additions for the release.
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/727#issuecomment-105616604 Looks good, just minor comments, plus the HTML code on the title fix.
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559477#comment-14559477 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/727#issuecomment-105616604 Looks good, just minor comments, plus the HTML code on the title fix. > Add contribution guide for FlinkML > -- > > Key: FLINK-2073 > URL: https://issues.apache.org/jira/browse/FLINK-2073 > Project: Flink > Issue Type: New Feature > Components: Documentation, Machine Learning Library >Reporter: Theodore Vasiloudis >Assignee: Till Rohrmann > Fix For: 0.9 > > > We need a guide for contributions to FlinkML in order to encourage the > extension of the library, and provide guidelines for developers. > One thing that should be included is a step-by-step guide to create a > transformer, or other Estimator
[GitHub] flink pull request: Change FlinkMiniCluster#HOSTNAME to FlinkMiniC...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/711
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31058920 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. +It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. 
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix fulfilling the regular expression `(IT|Integration)(Test|Suite|Case)`, are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other scalatest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior. +Additionally, we would like to encourage contributors to add this information to the online documentation. +The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`. + +Every new algorithm is described by a single markdown file. +This file should contain at least the following points: + +1. What does the algorithm do +2. How does the algorithm work (or reference to description) +3. Parameter description with default values +4. 
Code snippet showing how the algorithm is used + +In order to use latex syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter. + +{% highlight java %} +--- +mathjax: include +title: Example title +--- +{% endhighlight %} + +In order to use displayed mathematics, you have to put your latex code in `$$ ... $$`. +For in-line mathematics, use `$ ... $`. +Additionally some predefined latex commands are included into the scope of your markdown file. +See `docs/_include/latex_commands.html` for the complete list of predefined latex commands. + +## Contributing + +Once you have implemented the algorithm with adequate test coverage and added documentation, you are ready to open a pull request.
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559466#comment-14559466 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31058920 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. 
+It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. +The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix fulfilling the regular expression `(IT|Integration)(Test|Suite|Case)`, are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other scalatest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior. +Additionally, we would like to encourage contributors to add this information to the online documentation. +The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`. + +Every new algorithm is described by a single markdown file. 
+This file should contain at least the following points: + +1. What does the algorithm do +2. How does the algorithm work (or reference to description) +3. Parameter description with default values +4. Code snippet showing how the algorithm is used + +In order to use latex syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter. + +{% highlight java %} +--- +mathjax: include +title: Example title +--- +{% endhighlight %} + +In order to use displayed mathematics, you have to put your latex code in `$$ ... $$`. +For in-line mathematics, use `$ ... $`. +Additionally some predefined latex commands are included into the scope of your markdown file. +See `do
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31058631 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. +It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. 
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix fulfilling the regular expression `(IT|Integration)(Test|Suite|Case)`, are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other scalatest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior. +Additionally, we would like to encourage contributors to add this information to the online documentation. +The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`. + +Every new algorithm is described by a single markdown file. +This file should contain at least the following points: + +1. What does the algorithm do +2. How does the algorithm work (or reference to description) +3. Parameter description with default values +4. 
Code snippet showing how the algorithm is used + +In order to use latex syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter. + +{% highlight java %} +--- +mathjax: include +title: Example title +--- +{% endhighlight %} + +In order to use displayed mathematics, you have to put your latex code in `$$ ... $$`. +For in-line mathematics, use `$ ... $`. +Additionally some predefined latex commands are included into the scope of your markdown file. +See `docs/_include/latex_commands.html` for the complete list of predefined latex commands. + +## Contributing + +Once you have implemented the algorithm with adequate test coverage and added documentation, you are ready to open a pull request.
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559462#comment-14559462 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31058631 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. 
+It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. +The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix fulfilling the regular expression `(IT|Integration)(Test|Suite|Case)`, are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other scalatest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior. +Additionally, we would like to encourage contributors to add this information to the online documentation. +The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`. + +Every new algorithm is described by a single markdown file. 
+This file should contain at least the following points: + +1. What does the algorithm do +2. How does the algorithm work (or reference to description) +3. Parameter description with default values +4. Code snippet showing how the algorithm is used + +In order to use latex syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter. + +{% highlight java %} +--- +mathjax: include +title: Example title +--- +{% endhighlight %} + +In order to use displayed mathematics, you have to put your latex code in `$$ ... $$`. +For in-line mathematics, use `$ ... $`. +Additionally some predefined latex commands are included into the scope of your markdown file. +See `do
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559452#comment-14559452 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31058009 --- Diff: docs/libs/ml/contribution_guide.md ---
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31057983 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML gives people interested in machine learning the opportunity to work on a highly active open source project that makes scalable ML a reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +To get started, first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything in this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for new ideas, check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to work on one of these issues, take ownership of it and track your progress in it. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML, all the better. +It is still advisable to create a JIRA issue for your idea, though, to tell the Flink community what you want to do. + +## Testing + +New contributions should come with tests that verify the correct behavior of the algorithm.
+The tests help to maintain the algorithm's correctness across code changes, e.g., refactorings. + +We distinguish between unit tests, which are executed during Maven's test phase, and integration tests, which are executed during Maven's verify phase. +Maven automatically makes this distinction based on the following naming rule: +All test cases whose class name ends with a suffix matching the regular expression `(IT|Integration)(Test|Suite|Case)` are considered integration tests. +The rest are considered unit tests and should only test behavior that is local to the component under test. + +An integration test is a test that requires the full Flink system to be started. +To start it properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait sets the right `ExecutionEnvironment` so that the test is executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look like the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { + // ... test logic ... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec`; it can be any other ScalaTest `Suite` subclass. + +## Documentation + +When contributing new algorithms, you are required to add code comments that describe how the algorithm works and which parameters the user can set to control its behavior. +Additionally, we encourage contributors to add this information to the online documentation. +The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`. + +Every new algorithm is described by a single markdown file. +This file should contain at least the following points: + +1. What the algorithm does +2. How the algorithm works (or a reference to a description) +3. Parameter description with default values +4. Code snippet showing how the algorithm is used + +To use LaTeX syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter. + +{% highlight yaml %} +--- +mathjax: include +title: Example title +--- +{% endhighlight %} + +For displayed mathematics, put your LaTeX code in `$$ ... $$`. +For in-line mathematics, use `$ ... $`. +Additionally, some predefined LaTeX commands are available in your markdown file. +See `docs/_include/latex_commands.html` for the complete list of predefined LaTeX commands. + +## Contributing + +Once you have implemented the algorithm with adequate test coverage and added documentation, you are ready to open a pull request.
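The naming rule described in the guide can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: the pattern is assumed to be matched against the simple class name and to have to appear at its very end, and the names `TestNamingRule`, `isIntegrationTest` and `isUnitTest` are hypothetical; they are not part of Flink or its build. The actual matching is done by the Maven plugin configuration in the Flink build, which may differ in detail.

```scala
// Hypothetical sketch of the test naming rule; not Flink code.
// Assumption: the rule is applied to the simple class name, and the
// (IT|Integration)(Test|Suite|Case) suffix must end the name.
object TestNamingRule {

  // String.matches must match the entire string, so the leading ".*"
  // allows any prefix while forcing the suffix to sit at the very end.
  private val IntegrationTestPattern = ".*(IT|Integration)(Test|Suite|Case)"

  /** True if a test class with this name would run as an integration test. */
  def isIntegrationTest(simpleClassName: String): Boolean =
    simpleClassName.matches(IntegrationTestPattern)

  /** True for every other test class, which would run as a unit test. */
  def isUnitTest(simpleClassName: String): Boolean =
    !isIntegrationTest(simpleClassName)
}
```

With this sketch, `ExampleITSuite` and `VectorITCase` are classified as integration tests, while `ExampleSuite` is a unit test. A name such as `ITSuiteHelper` is not matched either, because the matching part does not sit at the end of the name.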
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31058009 --- Diff: docs/libs/ml/contribution_guide.md ---
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559450#comment-14559450 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31057983 --- Diff: docs/libs/ml/contribution_guide.md ---
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31057687 --- Diff: docs/libs/ml/contribution_guide.md ---
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559445#comment-14559445 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31057687 --- Diff: docs/libs/ml/contribution_guide.md ---
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559421#comment-14559421 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31056770 --- Diff: docs/libs/ml/contribution_guide.md ---
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request:
https://github.com/apache/flink/pull/727#discussion_r31056770

--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@
 specific language governing permissions and limitations under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly active open source project that makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+To get started, first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html).
+Everything in this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for new ideas, check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, take ownership of it and track your progress in that issue.
+That way, the other contributors know the state of the different issues and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of the algorithm.
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during Maven's test phase, and integration tests, which are executed during Maven's verify phase.
+Maven automatically makes this distinction based on the following naming rule:
+All test cases whose class name ends with a suffix matching the regular expression `(IT|Integration)(Test|Suite|Case)` are considered integration tests.
+The rest are considered unit tests and should only test behavior that is local to the component under test.
+
+An integration test is a test that requires the full Flink system to be started.
+To do that properly, all integration test cases have to mix in the trait `FlinkTestBase`.
+This trait sets the right `ExecutionEnvironment` so that the test is executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+
+  it should "do something" in {
+    // test logic goes here
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other ScalaTest `Suite` subclass.
+
+## Documentation
+
+When contributing new algorithms, you are required to add code comments describing how the algorithm works and which parameters the user can set to control its behavior.
+Additionally, we encourage contributors to add this information to the online documentation.
+The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What the algorithm does
+2. How the algorithm works (or a reference to a description)
+3. Parameter descriptions with default values
+4. A code snippet showing how the algorithm is used
+
+To use LaTeX syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter.
+
+{% highlight yaml %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+To use displayed mathematics, put your LaTeX code in `$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined LaTeX commands are included in the scope of your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of predefined LaTeX commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and added documentation, you are ready to open a pull request.
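The integration-test naming rule quoted in the diff above can be illustrated with a small Scala sketch. It assumes the classification depends only on whether the simple class name ends with a match for `(IT|Integration)(Test|Suite|Case)`; the names `TestClassifier` and `isIntegrationTest` are hypothetical, and the real split is defined by the plugin includes/excludes in the Maven POM, which may use file patterns rather than this exact regex.

```scala
import scala.util.matching.Regex

/** Hypothetical helper mirroring the naming rule from the contribution guide. */
object TestClassifier {
  // Assumption: the suffix must appear at the very end of the simple class name.
  private val IntegrationSuffix: Regex = "(IT|Integration)(Test|Suite|Case)$".r

  /** Returns true if a class with this simple name would count as an integration test. */
  def isIntegrationTest(simpleClassName: String): Boolean =
    IntegrationSuffix.findFirstIn(simpleClassName).isDefined

  def main(args: Array[String]): Unit = {
    val names = Seq("ExampleITSuite", "KMeansIntegrationTest", "ExampleSuite", "VectorTest")
    for (name <- names) {
      val kind = if (isIntegrationTest(name)) "integration test" else "unit test"
      println(s"$name -> $kind")
    }
  }
}
```

Under this rule, `ExampleITSuite` from the guide would run in the verify phase, while a class named like `VectorTest` would run as a unit test.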
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31056574 --- Diff: docs/libs/ml/contribution_guide.md ---
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559415#comment-14559415 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31056529 --- Diff: docs/libs/ml/contribution_guide.md ---
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559418#comment-14559418 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31056574 --- Diff: docs/libs/ml/contribution_guide.md ---
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31056529 --- Diff: docs/libs/ml/contribution_guide.md ---
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559410#comment-14559410 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31056427 --- Diff: docs/libs/ml/contribution_guide.md ---
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31056388 --- Diff: docs/libs/ml/contribution_guide.md ---
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559408#comment-14559408 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31056388 --- Diff: docs/libs/ml/contribution_guide.md ---
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31056427 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. +It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. 
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix fulfilling the regular expression `(IT|Integration)(Test|Suite|Case)`, are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other scalatest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior. +Additionally, we would like to encourage contributors to add this information to the online documentation. +The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`. + +Every new algorithm is described by a single markdown file. +This file should contain at least the following points: + +1. What does the algorithm do +2. How does the algorithm work (or reference to description) +3. Parameter description with default values +4. 
Code snippet showing how the algorithm is used + +In order to use latex syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter. + +{% highlight java %} +--- +mathjax: include +title: Example title +--- +{% endhighlight %} + +In order to use displayed mathematics, you have to put your latex code in `$$ ... $$`. +For in-line mathematics, use `$ ... $`. +Additionally some predefined latex commands are included into the scope of your markdown file. +See `docs/_include/latex_commands.html` for the complete list of predefined latex commands. + +## Contributing + +Once you have implemented the algorithm with adequate test coverage and added documentation, you are ready to open a pull request.
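The suffix rule quoted in the Testing section can be illustrated with a small, self-contained Java sketch. It only mirrors the naming rule as the guide states it. The class `TestNamingRule` and its method are hypothetical and are not part of Flink or Maven. Which tests actually run in which Maven phase is decided by the build's plugin configuration, not by this code.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical illustration (not Flink or Maven code): classifies a test class
 * name with the suffix rule (IT|Integration)(Test|Suite|Case) from the guide.
 */
public class TestNamingRule {

    // The whole simple class name must end with one of the integration suffixes.
    private static final Pattern INTEGRATION_SUFFIX =
            Pattern.compile(".*(IT|Integration)(Test|Suite|Case)");

    /** True if the name marks an integration test (run in Maven's verify phase). */
    public static boolean isIntegrationTest(String simpleClassName) {
        return INTEGRATION_SUFFIX.matcher(simpleClassName).matches();
    }

    public static void main(String[] args) {
        String[] names = {"ExampleITSuite", "KMeansIntegrationTest", "VectorSuite", "ExampleTest"};
        for (String name : names) {
            String kind = isIntegrationTest(name)
                    ? "integration test (verify phase)"
                    : "unit test (test phase)";
            System.out.println(name + " -> " + kind);
        }
    }
}
```

The match is case-sensitive. For example, a class named `SplitSuite` stays a unit test even though "Split" ends in "it", because only the uppercase `IT` suffix counts.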
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559405#comment-14559405 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31056316 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. 
+It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. +The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix fulfilling the regular expression `(IT|Integration)(Test|Suite|Case)`, are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other scalatest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior. +Additionally, we would like to encourage contributors to add this information to the online documentation. +The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`. + +Every new algorithm is described by a single markdown file. 
+This file should contain at least the following points: + +1. What does the algorithm do +2. How does the algorithm work (or reference to description) +3. Parameter description with default values +4. Code snippet showing how the algorithm is used + +In order to use latex syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter. + +{% highlight java %} +--- +mathjax: include +title: Example title +--- +{% endhighlight %} + +In order to use displayed mathematics, you have to put your latex code in `$$ ... $$`. +For in-line mathematics, use `$ ... $`. +Additionally some predefined latex commands are included into the scope of your markdown file. +See `do
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31056316 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. +It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. 
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix fulfilling the regular expression `(IT|Integration)(Test|Suite|Case)`, are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other scalatest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior. +Additionally, we would like to encourage contributors to add this information to the online documentation. +The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`. + +Every new algorithm is described by a single markdown file. +This file should contain at least the following points: + +1. What does the algorithm do +2. How does the algorithm work (or reference to description) +3. Parameter description with default values +4. 
Code snippet showing how the algorithm is used + +In order to use latex syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter. + +{% highlight java %} +--- +mathjax: include +title: Example title +--- +{% endhighlight %} + +In order to use displayed mathematics, you have to put your latex code in `$$ ... $$`. +For in-line mathematics, use `$ ... $`. +Additionally some predefined latex commands are included into the scope of your markdown file. +See `docs/_include/latex_commands.html` for the complete list of predefined latex commands. + +## Contributing + +Once you have implemented the algorithm with adequate test coverage and added documentation, you are ready to open a pull request.
[GitHub] flink pull request: [tez] Fix initialization of MemoryManager.
GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/728 [tez] Fix initialization of MemoryManager. The memory manager in Tez is currently initialized as if it were shared between 10 concurrently executing tasks. This reduces the amount of memory available to a single task. Because the memory manager is only used by one task at a time, we should configure it like that to avoid wasting memory. You can merge this pull request into a Git repository by running: $ git pull https://github.com/StephanEwen/incubator-flink tez_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/728.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #728 commit 849926a1d1a3e1ee6379457778fab5020efbffb5 Author: Stephan Ewen Date: 2015-05-26T14:45:22Z [tez] Fix initialization of MemoryManager. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
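The following back-of-the-envelope sketch makes the effect described in this pull request concrete. It assumes that the memory manager splits its pool evenly across the number of concurrent tasks it is configured for. The class `SlotMemoryExample`, its method, and the 1 GiB figure are hypothetical and are not Flink's `MemoryManager` API or a measured value.

```java
/**
 * Hypothetical illustration (not Flink's MemoryManager API): how much memory a
 * single task can obtain if a pool is divided evenly across a configured
 * number of concurrent tasks.
 */
public class SlotMemoryExample {

    /** Share of the pool available to one task when split across numberOfTasks. */
    static long memoryPerTask(long totalManagedMemoryBytes, int numberOfTasks) {
        if (numberOfTasks <= 0) {
            throw new IllegalArgumentException("numberOfTasks must be positive");
        }
        return totalManagedMemoryBytes / numberOfTasks;
    }

    public static void main(String[] args) {
        long total = 1024L * 1024 * 1024; // assumed pool size: 1 GiB
        // Configured for 10 concurrent tasks, although only one task ever uses it.
        System.out.println("10 tasks: " + memoryPerTask(total, 10) + " bytes per task");
        // Configured for the single task that actually runs.
        System.out.println(" 1 task : " + memoryPerTask(total, 1) + " bytes per task");
    }
}
```

Under this assumption, configuring for 10 tasks leaves the one running task with a tenth of the pool. The other nine shares sit idle, which is the waste the pull request describes.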
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559394#comment-14559394 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31055697 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. 
+It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. +The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix fulfilling the regular expression `(IT|Integration)(Test|Suite|Case)`, are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other scalatest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior. +Additionally, we would like to encourage contributors to add this information to the online documentation. +The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`. + +Every new algorithm is described by a single markdown file. 
+This file should contain at least the following points: + +1. What does the algorithm do +2. How does the algorithm work (or reference to description) +3. Parameter description with default values +4. Code snippet showing how the algorithm is used + +In order to use latex syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter. + +{% highlight java %} +--- +mathjax: include +title: Example title +--- +{% endhighlight %} + +In order to use displayed mathematics, you have to put your latex code in `$$ ... $$`. +For in-line mathematics, use `$ ... $`. +Additionally some predefined latex commands are included into the scope of your markdown file. +See `do
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31055697 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. +It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. 
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix fulfilling the regular expression `(IT|Integration)(Test|Suite|Case)`, are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other scalatest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior. +Additionally, we would like to encourage contributors to add this information to the online documentation. +The online documentation for FlinkML's components can be found in the directory `docs/libs/ml`. + +Every new algorithm is described by a single markdown file. +This file should contain at least the following points: + +1. What does the algorithm do +2. How does the algorithm work (or reference to description) +3. Parameter description with default values +4. 
Code snippet showing how the algorithm is used + +In order to use latex syntax in the markdown file, you have to include `mathjax: include` in the YAML front matter. + +{% highlight java %} +--- +mathjax: include +title: Example title +--- +{% endhighlight %} + +In order to use displayed mathematics, you have to put your latex code in `$$ ... $$`. +For in-line mathematics, use `$ ... $`. +Additionally some predefined latex commands are included into the scope of your markdown file. +See `docs/_include/latex_commands.html` for the complete list of predefined latex commands. + +## Contributing + +Once you have implemented the algorithm with adequate test coverage and added documentation, you are ready to open a pull request.
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559388#comment-14559388 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31055461 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. 
+It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. +The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix fulfilling the regular expression `(IT|Integration)(Test|Suite|Case)`, are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other scalatest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior. 
--- End diff -- functioning -> functionality > Add contribution guide for FlinkML > -- > > Key: FLINK-2073 > URL: https://issues.apache.org/jira/browse/FLINK-2073 > Project: Flink > Issue Type: New Feature > Components: Documentation, Machine Learning Library >Reporter: Theodore Vasiloudis >Assignee: Till Rohrmann > Fix For: 0.9 > > > We need a guide for contributions to FlinkML in order to encourage the > extension of the library, and provide guidelines for developers. > One thing that should be included is a step-by-step guide to create a > transformer, or other Estimator
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31055461 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. +It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. 
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. +Maven automatically makes this distinction by using the following naming rules: +All test cases whose class name ends with a suffix fulfilling the regular expression `(IT|Integration)(Test|Suite|Case)`, are considered integration tests. +The rest are considered unit tests and should only test behavior which is local to the component under test. + +An integration test is a test which requires the full Flink system to be started. +In order to do that properly, all integration test cases have to mix in the trait `FlinkTestBase`. +This trait will set the right `ExecutionEnvironment` so that the test will be executed on a special `FlinkMiniCluster` designated for testing purposes. +Thus, an integration test could look the following: + +{% highlight scala %} +class ExampleITSuite extends FlatSpec with FlinkTestBase { + behavior of "An example algorithm" + + it should "do something" in { +... + } +} +{% endhighlight %} + +The test style does not have to be `FlatSpec` but can be any other scalatest `Suite` subclass. + +## Documentation + +When contributing new algorithms, it is required to add code comments describing the functioning of the algorithm and its parameters with which the user can control its behavior. --- End diff -- functioning -> functionality --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
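The naming rule quoted in the diff can be checked mechanically. This Scala sketch only mirrors the rule as stated in the guide: a class name ending in a suffix that matches `(IT|Integration)(Test|Suite|Case)` is an integration test (Maven's verify phase), and every other test class is a unit test (Maven's test phase). It is not Flink's actual build configuration, and the object name `TestClassifier` is hypothetical.

```scala
/** Hypothetical helper that applies the guide's test naming rule to a class name. */
object TestClassifier {
  // Leading ".*" plus a full-string match means "the name ends with the suffix"
  private val IntegrationSuffix = ".*(IT|Integration)(Test|Suite|Case)".r

  /** True if the class name ends with a suffix such as ITSuite or IntegrationTest. */
  def isIntegrationTest(className: String): Boolean =
    IntegrationSuffix.pattern.matcher(className).matches()

  /** The Maven phase in which the rule says the test class is executed. */
  def mavenPhase(className: String): String =
    if (isIntegrationTest(className)) "verify" else "test"

  def main(args: Array[String]): Unit =
    Seq("ExampleITSuite", "KMeansIntegrationTest", "ParameterMapSuite", "ITSuiteHelper")
      .foreach(name => println(s"$name -> ${mavenPhase(name)}"))
}
```

`ITSuiteHelper` is a deliberate counterexample: `IT` and `Suite` appear in the name, but not as the required suffix, so the rule treats it as a unit test.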
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559385#comment-14559385 ] ASF GitHub Bot commented on FLINK-2073: --- Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31055321 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. 
+It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. +The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. --- End diff -- Capitalization: maven -> Maven
[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...
Github user thvasilo commented on a diff in the pull request: https://github.com/apache/flink/pull/727#discussion_r31055321 --- Diff: docs/libs/ml/contribution_guide.md --- @@ -20,7 +21,329 @@ specific language governing permissions and limitations under the License. --> +The Flink community highly appreciates all sorts of contributions to FlinkML. +FlinkML offers people interested in machine learning to work on a highly active open source project which makes scalable ML reality. +The following document describes how to contribute to FlinkML. + * This will be replaced by the TOC {:toc} -Coming soon. In the meantime, check our list of [open issues on JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC) +## Getting Started + +In order to get started first read Flink's [contribution guide](http://flink.apache.org/how-to-contribute.html). +Everything from this guide also applies to FlinkML. + +## Pick a Topic + +If you are looking for some new ideas, then you should check out the list of [unresolved issues on JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC). +Once you decide to contribute to one of these issues, you should take ownership of it and track your progress with this issue. +That way, the other contributors know the state of the different issues and redundant work is avoided. + +If you already know what you want to contribute to FlinkML all the better. +It is still advisable to create a JIRA issue for your idea to tell the Flink community what you want to do, though. + +## Testing + +New contributions should come with tests to verify the correct behavior of the algorithm. 
+The tests help to maintain the algorithm's correctness throughout code changes, e.g. refactorings. + +We distinguish between unit tests, which are executed during maven's test phase, and integration tests, which are executed during maven's verify phase. --- End diff -- Capitalization: maven -> Maven
[jira] [Commented] (FLINK-2083) Ensure high quality docs for FlinkML in 0.9
[ https://issues.apache.org/jira/browse/FLINK-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559369#comment-14559369 ] ASF GitHub Bot commented on FLINK-2083: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/715#issuecomment-105600306 LGTM. Will merge it. > Ensure high quality docs for FlinkML in 0.9 > --- > > Key: FLINK-2083 > URL: https://issues.apache.org/jira/browse/FLINK-2083 > Project: Flink > Issue Type: Improvement > Components: Machine Learning Library >Reporter: Theodore Vasiloudis >Assignee: Theodore Vasiloudis > Labels: ML > Fix For: 0.9 > > > As defined in our vision for FlinkML, providing high-quality documentation is > a primary goal for us. > This issue concerns the docs that will be included in 0.9, and will track > improvements and additions for the release.
[GitHub] flink pull request: [FLINK-2083] [docs] [ml] Ensure high quality d...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/715#issuecomment-105600306 LGTM. Will merge it.
[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML
[ https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559361#comment-14559361 ] ASF GitHub Bot commented on FLINK-2073: --- GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/727 [FLINK-2073] [ml] [docs] Adds contribution guide Adds contribution guide and implementation tutorial for new pipeline operators. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink mlContributionGuide Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/727.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #727 commit a3c912e5cf7b310871a78f72fce3136bf63c85dd Author: Till Rohrmann Date: 2015-05-26T16:45:01Z [FLINK-2073] [ml] [docs] Adds contribution guide. Adds links to FlinkML main site.
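FLINK-2073 asks for a step-by-step guide to creating a transformer or other Estimator. Here is a rough, self-contained Scala sketch of the general idea: an estimator learns state from training data in `fit`, and a transformer applies that state in `transform`. The traits `SimpleEstimator` and `SimpleTransformer` and the class `MeanCenterer` are hypothetical. FlinkML's actual pipeline traits operate on Flink `DataSet`s and are not reproduced here.

```scala
/** Hypothetical estimator interface: learns internal state from training data. */
trait SimpleEstimator[In] {
  def fit(training: Seq[In]): Unit
}

/** Hypothetical transformer interface: maps input data to output data. */
trait SimpleTransformer[In, Out] {
  def transform(input: Seq[In]): Seq[Out]
}

/** Hypothetical operator that learns the mean in fit() and subtracts it in transform(). */
class MeanCenterer extends SimpleEstimator[Double] with SimpleTransformer[Double, Double] {
  // Learned state; stays None until fit() has been called
  private var mean: Option[Double] = None

  override def fit(training: Seq[Double]): Unit = {
    require(training.nonEmpty, "training data must not be empty")
    mean = Some(training.sum / training.size)
  }

  override def transform(input: Seq[Double]): Seq[Double] = mean match {
    case Some(m) => input.map(_ - m)
    case None    => throw new IllegalStateException("fit() must be called before transform()")
  }
}

object MeanCentererExample {
  def main(args: Array[String]): Unit = {
    val centerer = new MeanCenterer
    centerer.fit(Seq(1.0, 2.0, 3.0))
    println(centerer.transform(Seq(2.0, 5.0)))
  }
}
```

Keeping `fit` and `transform` separate lets state learned once be reused on new data. Storing that state as a mutable field inside the operator is just one simple design choice for this sketch.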