[jira] [Resolved] (FLINK-2043) Change the KMeansDataGenerator to allow passing a custom path

2015-05-26 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske resolved FLINK-2043.
--
   Resolution: Fixed
Fix Version/s: 0.9

Fixed with 3586ced3550ac036638a8dff011c01de99f9ed5e

> Change the KMeansDataGenerator to allow passing a custom path
> -
>
> Key: FLINK-2043
> URL: https://issues.apache.org/jira/browse/FLINK-2043
> Project: Flink
> Issue Type: Improvement
> Components: Examples
> Reporter: Robert Metzger
> Assignee: pietro pinoli
> Priority: Trivial
> Labels: starter
> Fix For: 0.9
>
>
> It would be nice to allow the user to specify a target path for the generated 
> data.
> Right now, the only way to set the path is to change Java's temp directory:
> {code}
> java -Djava.io.tmpdir=`pwd` -cp /home/robert/flink/build-target/examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar org.apache.flink.examples.java.clustering.util.KMeansDataGenerator
> {code}
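
The workaround above suggests that the generator writes into the directory named by the `java.io.tmpdir` system property. A minimal sketch of the requested behaviour, assuming an optional first command-line argument as the target directory; the class `OutputPathSketch` and the method `resolveOutputDir` are hypothetical names, not taken from the Flink code base or from the actual fix:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class OutputPathSketch {

    // Hypothetical helper: prefer an explicit target directory from the
    // command line, otherwise fall back to the JVM temp directory.
    static Path resolveOutputDir(String[] args) {
        if (args.length > 0) {
            return Paths.get(args[0]);
        }
        // java.io.tmpdir is a standard JVM system property.
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) {
        System.out.println("Writing generated data to " + resolveOutputDir(args));
    }
}
```

With a fallback like this, the `-Djava.io.tmpdir` workaround would keep working for callers who pass no path.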



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1848) Paths containing a Windows drive letter cannot be used in FileOutputFormats

2015-05-26 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske resolved FLINK-1848.
--
Resolution: Fixed

Fixed with 7164b2b643985b99c6688b62174de42a71deb71b

> Paths containing a Windows drive letter cannot be used in FileOutputFormats
> ---
>
> Key: FLINK-1848
> URL: https://issues.apache.org/jira/browse/FLINK-1848
> Project: Flink
> Issue Type: Bug
> Affects Versions: 0.9
> Environment: Windows (Cygwin and native)
> Reporter: Fabian Hueske
> Assignee: Fabian Hueske
> Priority: Critical
> Fix For: 0.9
>
>
> Paths that contain a Windows drive letter, such as {{file:///c:/my/directory}}, 
> cannot be used as the output path for a {{FileOutputFormat}}.
> Attempting to do so throws the following exception:
> {code}
> Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:c:
>   at org.apache.flink.core.fs.Path.initialize(Path.java:242)
>   at org.apache.flink.core.fs.Path.<init>(Path.java:225)
>   at org.apache.flink.core.fs.Path.<init>(Path.java:138)
>   at org.apache.flink.core.fs.local.LocalFileSystem.pathToFile(LocalFileSystem.java:147)
>   at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:232)
>   at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
>   at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
>   at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
>   at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
>   at org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:603)
>   at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:233)
>   at org.apache.flink.api.java.io.CsvOutputFormat.open(CsvOutputFormat.java:158)
>   at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:183)
>   at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
>   at java.lang.Thread.run(Unknown Source)
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:c:
>   at java.net.URI.checkPath(Unknown Source)
>   at java.net.URI.<init>(Unknown Source)
>   at org.apache.flink.core.fs.Path.initialize(Path.java:240)
>   ... 14 more
> {code}
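
The innermost exception in the trace above comes from `java.net.URI`, which rejects a URI that has a scheme but whose path does not start with `/`. A minimal sketch that reproduces this kind of failure with the JDK alone; it assumes (the trace suggests it, but the message does not say so) that a path component of `c:` reached the multi-argument `URI` constructor. `DriveLetterUriSketch` and `tryBuild` are hypothetical names:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class DriveLetterUriSketch {

    // Returns the exception message, or null if the URI could be built.
    static String tryBuild(String scheme, String path) {
        try {
            // With a non-null scheme, java.net.URI requires the path
            // to be empty or to start with '/'.
            new URI(scheme, null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A bare drive letter is treated as a relative path and rejected.
        System.out.println(tryBuild("file", "c:"));
        // Prefixing the drive letter with '/' makes the path absolute.
        System.out.println(tryBuild("file", "/c:/my/directory"));
    }
}
```

This only illustrates the JDK behaviour behind the error message; it is not the change made in the fix commit.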





[jira] [Commented] (FLINK-2043) Change the KMeansDataGenerator to allow passing a custom path

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560022#comment-14560022
 ] 

ASF GitHub Bot commented on FLINK-2043:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/721




[jira] [Commented] (FLINK-1848) Paths containing a Windows drive letter cannot be used in FileOutputFormats

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560024#comment-14560024
 ] 

ASF GitHub Bot commented on FLINK-1848:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/712




[GitHub] flink pull request: [FLINK-1848] Fix for file paths with Windows d...

2015-05-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/712


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2043] Change the KMeansDataGenerator to...

2015-05-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/721




[jira] [Commented] (FLINK-1952) Cannot run ConnectedComponents example: Could not allocate a slot on instance

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559963#comment-14559963
 ] 

ASF GitHub Bot commented on FLINK-1952:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/731#issuecomment-105679990
  
Sorry for the whitespace re-formatting. My IntelliJ settings got reverted 
somehow, so it started auto-reformatting code :-/


> Cannot run ConnectedComponents example: Could not allocate a slot on instance
> -
>
> Key: FLINK-1952
> URL: https://issues.apache.org/jira/browse/FLINK-1952
> Project: Flink
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 0.9
> Reporter: Robert Metzger
> Priority: Blocker
>
> Steps to reproduce
> {code}
> ./bin/yarn-session.sh -n 350 
> {code}
> ... wait until they are connected ...
> {code}
> Number of connected TaskManagers changed to 266. Slots available: 266
> Number of connected TaskManagers changed to 323. Slots available: 323
> Number of connected TaskManagers changed to 334. Slots available: 334
> Number of connected TaskManagers changed to 343. Slots available: 343
> Number of connected TaskManagers changed to 350. Slots available: 350
> {code}
> Start CC
> {code}
> ./bin/flink run -p 350 ./examples/flink-java-examples-0.9-SNAPSHOT-ConnectedComponents.jar
> {code}
> ---> it runs
> Run KMeans, let it fail with 
> {code}
> Failed to deploy the task Map (Map at main(KMeans.java:100)) (1/350) - execution #0 to slot SimpleSlot (2)(2)(0) - 182b7661ca9547a84591de940c47a200 - ALLOCATED/ALIVE: java.io.IOException: Insufficient number of network buffers: required 350, but only 254 available. The total number of network buffers is currently set to 2048. You can increase this number by setting the configuration key 'taskmanager.network.numberOfBuffers'.
> {code}
> ... as expected.
> (I've waited for 10 minutes between the two submissions)
> Starting CC now will fail:
> {code}
> ./bin/flink run -p 350 ./examples/flink-java-examples-0.9-SNAPSHOT-ConnectedComponents.jar
> {code}
> Error message(s):
> {code}
> Caused by: java.lang.IllegalStateException: Could not schedule consumer vertex IterationHead(WorksetIteration (Unnamed Delta Iteration)) (19/350)
>   at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:479)
>   at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:469)
>   at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94)
>   at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
>   ... 4 more
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate a slot on instance 4a6d761cb084c32310ece1f849556faf @ cloud-19 - 1 slots - URL: akka.tcp://flink@130.149.21.23:51400/user/taskmanager, as required by the co-location constraint.
>   at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:247)
>   at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:110)
>   at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:262)
>   at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:436)
>   at org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:475)
>   ... 9 more
> {code}





[GitHub] flink pull request: [FLINK-1952] [jobmanager] Rework and fix slot ...

2015-05-26 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/731#issuecomment-105679811
  
The latest commit "Add big not so mini cluster test for CC to provoke 
scheduler problem" is not going to be merged. It is solely for verifying that 
the scheduler now correctly handles jobs with iterations and higher parallelism 
and many TaskManagers.

You can start a 100 TaskManager cluster with that test setup in the commit 
and run connected components. (Give the VM 5GB heap space, then it works 
smoothly).




[GitHub] flink pull request: [FLINK-1952] [jobmanager] Rework and fix slot ...

2015-05-26 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/731#issuecomment-105679990
  
Sorry for the whitespace re-formatting. My IntelliJ settings got reverted 
somehow, so it started auto-reformatting code :-/




[jira] [Commented] (FLINK-1952) Cannot run ConnectedComponents example: Could not allocate a slot on instance

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559961#comment-14559961
 ] 

ASF GitHub Bot commented on FLINK-1952:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/731#issuecomment-105679811
  
The latest commit "Add big not so mini cluster test for CC to provoke 
scheduler problem" is not going to be merged. It is solely for verifying that 
the scheduler now correctly handles jobs with iterations and higher parallelism 
and many TaskManagers.

You can start a 100 TaskManager cluster with that test setup in the commit 
and run connected components. (Give the VM 5GB heap space, then it works 
smoothly).




[GitHub] flink pull request: [FLINK-1952] [jobmanager] Rework and fix slot ...

2015-05-26 Thread StephanEwen
GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/731

[FLINK-1952] [jobmanager] Rework and fix slot sharing scheduler



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink slots_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/731.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #731


commit 771437360662bcf105c72d95924a5ce3c69f1585
Author: Stephan Ewen 
Date:   2015-05-19T17:08:25Z

Add big not so mini cluster test for CC to provoke scheduler problem

commit 067c3868c07ea125d8f429e38476d3e8edfbad08
Author: Stephan Ewen 
Date:   2015-05-20T09:37:56Z

[FLINK-1952] [jobmanager] Rework and fix slot sharing scheduler

commit 88074eda5c99945d0c0f106240010a451ba41658
Author: Stephan Ewen 
Date:   2015-05-26T20:56:50Z

[tests] Fix AvroExternalJarProgramITCase logging






[jira] [Commented] (FLINK-1952) Cannot run ConnectedComponents example: Could not allocate a slot on instance

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559951#comment-14559951
 ] 

ASF GitHub Bot commented on FLINK-1952:
---

GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/731

[FLINK-1952] [jobmanager] Rework and fix slot sharing scheduler



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink slots_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/731.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #731


commit 771437360662bcf105c72d95924a5ce3c69f1585
Author: Stephan Ewen 
Date:   2015-05-19T17:08:25Z

Add big not so mini cluster test for CC to provoke scheduler problem

commit 067c3868c07ea125d8f429e38476d3e8edfbad08
Author: Stephan Ewen 
Date:   2015-05-20T09:37:56Z

[FLINK-1952] [jobmanager] Rework and fix slot sharing scheduler

commit 88074eda5c99945d0c0f106240010a451ba41658
Author: Stephan Ewen 
Date:   2015-05-26T20:56:50Z

[tests] Fix AvroExternalJarProgramITCase logging






[GitHub] flink pull request: basic TfidfTransformer

2015-05-26 Thread rbraeunlich
GitHub user rbraeunlich opened a pull request:

https://github.com/apache/flink/pull/730

basic TfidfTransformer

Hi everybody,

for [FLINK-1999](https://issues.apache.org/jira/browse/FLINK-1999) we 
created a first implementation of a TfIdfTransformer.
One problem remains: applying modulo after the hashing causes collisions.
Nevertheless, we would be glad to receive comments on our 
implementation.

Cheers,
Ronny
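
The collision problem mentioned above can be illustrated with a minimal sketch of hashing terms into a fixed number of buckets. It assumes `String.hashCode` as the hash function and a deliberately tiny feature space; the pull request does not say which hash or vector size it uses, and `HashingTrickSketch` and `bucket` are hypothetical names:

```java
public class HashingTrickSketch {

    // Maps a term to a feature index in [0, numFeatures).
    // floorMod keeps the index non-negative even for negative hash codes.
    static int bucket(String term, int numFeatures) {
        return Math.floorMod(term.hashCode(), numFeatures);
    }

    public static void main(String[] args) {
        int numFeatures = 2; // tiny on purpose, so collisions are easy to see
        for (String term : new String[] {"a", "b", "c"}) {
            System.out.println(term + " -> hash " + term.hashCode()
                    + ", bucket " + bucket(term, numFeatures));
        }
    }
}
```

Here "a" and "c" have different hash codes but land in the same bucket, so their counts would be added together. A larger feature space makes such collisions rarer but cannot rule them out.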

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rbraeunlich/flink tfidf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/730.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #730


commit 9e9ac219b619ddfbab4f616165d038900b7726db
Author: Ronny Bräunlich 
Date:   2015-05-15T09:18:00Z

create TfIdfTransformer

commit 42ef7c00a832e21d7391e1011031bda162d930f1
Author: Ronny Bräunlich 
Date:   2015-05-16T14:38:28Z

fix import in TfIdfTransformer and add first basic test case

commit 82385b764f45f955cd88590b7657467689d096ed
Author: Ronny Bräunlich 
Date:   2015-05-15T09:18:00Z

create TfIdfTransformer and add first basic test case

commit 7242728b1c24027203f1ff91476de9acb9bbf3a7
Author: diva1012 
Date:   2015-05-17T11:42:40Z

Changes merged

Merge remote-tracking branch 'rbraeunlich/tfidf' into tfidf

Conflicts:

flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/feature/TfIdfTransformer.scala

commit 9c2c181624bb81f3ed83a4a774339251508644f1
Author: diva1012 
Date:   2015-05-17T17:40:00Z

Small fix of the test class. (The Sparse vector contains index -> value 
tuples, so we have to take only the value and not the whole tuple for the 
comparison)

commit 8b17385e34b7f139a2649f80edc81744277fcfae
Author: diva1012 
Date:   2015-05-18T06:41:58Z

Word count implementation simplified.

commit 229fac5f835ce05dd03544f7dd7c0df7952f18e9
Author: diva1012 
Date:   2015-05-18T11:35:43Z

TF calculation fixed

commit e1ea4437e42860d8ed7820c32e08d7a2d1152b08
Author: diva1012 
Date:   2015-05-19T20:44:31Z

Transformer improved: now we get SparseVector for each document that 
contains all words.






[jira] [Commented] (FLINK-1999) TF-IDF transformer

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559938#comment-14559938
 ] 

ASF GitHub Bot commented on FLINK-1999:
---

GitHub user rbraeunlich opened a pull request:

https://github.com/apache/flink/pull/730

basic TfidfTransformer

Hi everybody,

for [FLINK-1999](https://issues.apache.org/jira/browse/FLINK-1999) we 
created a first implementation of a TfIdfTransformer.
One problem remains: applying modulo after the hashing causes collisions.
Nevertheless, we would be glad to receive comments on our 
implementation.

Cheers,
Ronny

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rbraeunlich/flink tfidf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/730.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #730


commit 9e9ac219b619ddfbab4f616165d038900b7726db
Author: Ronny Bräunlich 
Date:   2015-05-15T09:18:00Z

create TfIdfTransformer

commit 42ef7c00a832e21d7391e1011031bda162d930f1
Author: Ronny Bräunlich 
Date:   2015-05-16T14:38:28Z

fix import in TfIdfTransformer and add first basic test case

commit 82385b764f45f955cd88590b7657467689d096ed
Author: Ronny Bräunlich 
Date:   2015-05-15T09:18:00Z

create TfIdfTransformer and add first basic test case

commit 7242728b1c24027203f1ff91476de9acb9bbf3a7
Author: diva1012 
Date:   2015-05-17T11:42:40Z

Changes merged

Merge remote-tracking branch 'rbraeunlich/tfidf' into tfidf

Conflicts:

flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/feature/TfIdfTransformer.scala

commit 9c2c181624bb81f3ed83a4a774339251508644f1
Author: diva1012 
Date:   2015-05-17T17:40:00Z

Small fix of the test class. (The Sparse vector contains index -> value 
tuples, so we have to take only the value and not the whole tuple for the 
comparison)

commit 8b17385e34b7f139a2649f80edc81744277fcfae
Author: diva1012 
Date:   2015-05-18T06:41:58Z

Word count implementation simplified.

commit 229fac5f835ce05dd03544f7dd7c0df7952f18e9
Author: diva1012 
Date:   2015-05-18T11:35:43Z

TF calculation fixed

commit e1ea4437e42860d8ed7820c32e08d7a2d1152b08
Author: diva1012 
Date:   2015-05-19T20:44:31Z

Transformer improved: now we get SparseVector for each document that 
contains all words.




> TF-IDF transformer
> --
>
> Key: FLINK-1999
> URL: https://issues.apache.org/jira/browse/FLINK-1999
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Ronny Bräunlich
> Assignee: Alexander Alexandrov
> Priority: Minor
> Labels: ML
>
> Hello everybody,
> we are a group of three students from TU Berlin (I guess we're not the first 
> group creating an issue) and we want to/have to implement a tf-idf transformer 
> for Flink.
> Our lecturer Alexander told us that we could get some guidance here and that 
> you could point us to an old version of a similar transformer.





[jira] [Resolved] (FLINK-2012) addVertices, addEdges, removeVertices, removeEdges methods

2015-05-26 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-2012.
--
   Resolution: Implemented
Fix Version/s: 0.9

> addVertices, addEdges, removeVertices, removeEdges methods
> --
>
> Key: FLINK-2012
> URL: https://issues.apache.org/jira/browse/FLINK-2012
> Project: Flink
> Issue Type: New Feature
> Components: Gelly
> Affects Versions: 0.9
> Reporter: Andra Lungu
> Assignee: Andra Lungu
> Priority: Minor
> Fix For: 0.9
>
>
> Currently, Gelly only allows the addition/deletion of one vertex/edge at a 
> time. A user who wants to add two (or more) vertices has to add one vertex, 
> which creates a new graph, then add another vertex, which creates another 
> graph, and so on.
> It would be nice to also have addVertices, addEdges, removeVertices, 
> removeEdges methods. 
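
The difference between one-at-a-time and batch updates can be sketched with a toy immutable graph. This is a plain-Java illustration only, not the Gelly API: the class `ImmutableGraphSketch`, its vertex-only representation, and its method signatures are assumptions made for the example.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class ImmutableGraphSketch {

    private final Set<Long> vertices;

    ImmutableGraphSketch(Set<Long> vertices) {
        // Defensive copy, so that later changes to the argument have no effect.
        this.vertices = Collections.unmodifiableSet(new HashSet<>(vertices));
    }

    // One vertex at a time: every call produces a new graph object.
    ImmutableGraphSketch addVertex(long id) {
        return addVertices(Collections.singletonList(id));
    }

    // Batch variant: one new graph object for any number of vertices.
    ImmutableGraphSketch addVertices(Collection<Long> ids) {
        Set<Long> copy = new HashSet<>(vertices);
        copy.addAll(ids);
        return new ImmutableGraphSketch(copy);
    }

    int size() {
        return vertices.size();
    }
}
```

In this sketch, `g.addVertex(1L).addVertex(2L)` creates two intermediate graphs, while `g.addVertices(Arrays.asList(1L, 2L))` creates one.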





[jira] [Resolved] (FLINK-1941) Add documentation for Gelly-GSA

2015-05-26 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-1941.
--
Resolution: Fixed

> Add documentation for Gelly-GSA
> ---
>
> Key: FLINK-1941
> URL: https://issues.apache.org/jira/browse/FLINK-1941
> Project: Flink
> Issue Type: Task
> Components: Gelly
> Affects Versions: 0.9
> Reporter: Vasia Kalavri
> Assignee: Vasia Kalavri
> Labels: docs, gelly
>
> Add a section to the Gelly guide that describes the newly introduced 
> Gather-Sum-Apply iteration method. Show how GSA uses delta iterations 
> internally and explain how this model differs from the vertex-centric 
> model.





[GitHub] flink pull request: [FLINK-1941] documentation for the Gather-Sum-...

2015-05-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/722


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2012][gelly] Added methods to remove/ad...

2015-05-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/678




[jira] [Commented] (FLINK-1941) Add documentation for Gelly-GSA

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559836#comment-14559836
 ] 

ASF GitHub Bot commented on FLINK-1941:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/722


> Add documentation for Gelly-GSA
> ---
>
> Key: FLINK-1941
> URL: https://issues.apache.org/jira/browse/FLINK-1941
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>  Labels: docs, gelly
>
> Add a section in the Gelly guide to describe the newly introduced 
> Gather-Sum-Apply iteration method. Show how GSA uses delta iterations 
> internally and explain the differences of this model as compared to 
> vertex-centric.





[jira] [Commented] (FLINK-2012) addVertices, addEdges, removeVertices, removeEdges methods

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559835#comment-14559835
 ] 

ASF GitHub Bot commented on FLINK-2012:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/678


> addVertices, addEdges, removeVertices, removeEdges methods
> --
>
> Key: FLINK-2012
> URL: https://issues.apache.org/jira/browse/FLINK-2012
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Andra Lungu
>Assignee: Andra Lungu
>Priority: Minor
>
> Currently, Gelly only allows the addition/deletion of one vertex/edge at a 
> time. A user who wants to add two (or more) vertices has to add one vertex, 
> which creates a new graph, then add another vertex, which creates yet another 
> graph, and so on.
> It would be nice to also have addVertices, addEdges, removeVertices, 
> removeEdges methods. 





[GitHub] flink pull request: [tez] Fix initialization of MemoryManager.

2015-05-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/728




[jira] [Issue Comment Deleted] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file

2015-05-26 Thread Faye Beligianni (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Faye Beligianni updated FLINK-2069:
---
Comment: was deleted

(was: I am not sure if this will help you, but I noticed the following:

If the job switches from running to canceled immediately, then the csv file is 
created, and since the job is canceled the file remains and is not deleted. 
I tried this 5-6 times and every time the same happened every time. )

> writeAsCSV function in DataStream Scala API creates no file
> ---
>
> Key: FLINK-2069
> URL: https://issues.apache.org/jira/browse/FLINK-2069
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Faye Beligianni
>Priority: Blocker
>  Labels: Streaming
> Fix For: 0.9
>
>
> When the {{writeAsCSV}} function is used in the DataStream Scala API, no file 
> is created in the specified path.





[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file

2015-05-26 Thread Faye Beligianni (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559772#comment-14559772
 ] 

Faye Beligianni commented on FLINK-2069:


Of course the file is empty but at least it is created.

> writeAsCSV function in DataStream Scala API creates no file
> ---
>
> Key: FLINK-2069
> URL: https://issues.apache.org/jira/browse/FLINK-2069
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Faye Beligianni
>Priority: Blocker
>  Labels: Streaming
> Fix For: 0.9
>
>
> When the {{writeAsCSV}} function is used in the DataStream Scala API, no file 
> is created in the specified path.





[jira] [Comment Edited] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file

2015-05-26 Thread Faye Beligianni (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559767#comment-14559767
 ] 

Faye Beligianni edited comment on FLINK-2069 at 5/26/15 8:11 PM:
-

I am not sure if this will help you, but I noticed the following:

If the job switches from running to canceled immediately, then the csv file is 
created, and since the job is canceled the file remains and is not deleted. 
I tried this 5-6 times and every time the same happened. 


was (Author: fobeligi):
I am not sure if this will help you, but I noticed the following:

If the job switches from running to canceled immediately, then the csv file is 
created, and since the job is canceled the file remains and is not deleted. 
I tried this 5-6 times and every time the same happened every time. 

> writeAsCSV function in DataStream Scala API creates no file
> ---
>
> Key: FLINK-2069
> URL: https://issues.apache.org/jira/browse/FLINK-2069
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Faye Beligianni
>Priority: Blocker
>  Labels: Streaming
> Fix For: 0.9
>
>
> When the {{writeAsCSV}} function is used in the DataStream Scala API, no file 
> is created in the specified path.





[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file

2015-05-26 Thread Faye Beligianni (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559768#comment-14559768
 ] 

Faye Beligianni commented on FLINK-2069:


I am not sure if this will help you, but I noticed the following:

If the job switches from running to canceled immediately, then the csv file is 
created, and since the job is canceled the file remains and is not deleted. 
I tried this 5-6 times and every time the same happened. 

> writeAsCSV function in DataStream Scala API creates no file
> ---
>
> Key: FLINK-2069
> URL: https://issues.apache.org/jira/browse/FLINK-2069
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Faye Beligianni
>Priority: Blocker
>  Labels: Streaming
> Fix For: 0.9
>
>
> When the {{writeAsCSV}} function is used in the DataStream Scala API, no file 
> is created in the specified path.





[jira] [Commented] (FLINK-2069) writeAsCSV function in DataStream Scala API creates no file

2015-05-26 Thread Faye Beligianni (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559767#comment-14559767
 ] 

Faye Beligianni commented on FLINK-2069:


I am not sure if this will help you, but I noticed the following:

If the job switches from running to canceled immediately, then the csv file is 
created, and since the job is canceled the file remains and is not deleted. 
I tried this 5-6 times and every time the same happened. 

> writeAsCSV function in DataStream Scala API creates no file
> ---
>
> Key: FLINK-2069
> URL: https://issues.apache.org/jira/browse/FLINK-2069
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Faye Beligianni
>Priority: Blocker
>  Labels: Streaming
> Fix For: 0.9
>
>
> When the {{writeAsCSV}} function is used in the DataStream Scala API, no file 
> is created in the specified path.





[jira] [Commented] (FLINK-2043) Change the KMeansDataGenerator to allow passing a custom path

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559714#comment-14559714
 ] 

ASF GitHub Bot commented on FLINK-2043:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/721#issuecomment-105644179
  
Thanks! Looks good. Will merge


> Change the KMeansDataGenerator to allow passing a custom path
> -
>
> Key: FLINK-2043
> URL: https://issues.apache.org/jira/browse/FLINK-2043
> Project: Flink
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Robert Metzger
>Assignee: pietro pinoli
>Priority: Trivial
>  Labels: starter
>
> It would be nice to allow the user to specify a target path for the generated 
> data.
> Right now, one has to pass the path by changing the tmp directory of java
> {code}
> java -Djava.io.tmpdir=`pwd` -cp 
> /home/robert/flink/build-target/examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar
>  org.apache.flink.examples.java.clustering.util.KMeansDataGenerator
> {code}
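
As a rough illustration of the requested behaviour, the sketch below resolves the output directory from an optional first argument and falls back to `java.io.tmpdir` otherwise. `OutputDirResolver`, its methods, and the file name `points` are hypothetical and are not taken from `KMeansDataGenerator` or from the merged pull request.

```java
import java.io.File;

// Hypothetical helper; it does not reproduce the actual KMeansDataGenerator code.
public final class OutputDirResolver {

    // An explicit, non-empty first argument wins; otherwise use the JVM temp directory.
    public static File resolve(String[] args) {
        String dir = (args.length > 0 && !args[0].isEmpty())
                ? args[0]
                : System.getProperty("java.io.tmpdir");
        return new File(dir);
    }

    // Location of a generated data file inside the resolved directory.
    // The file name "points" is an assumption made for this example.
    public static File pointsFile(String[] args) {
        return new File(resolve(args), "points");
    }

    public static void main(String[] args) {
        System.out.println("Writing generated points to " + pointsFile(args));
    }
}
```

With such a fallback, the `-Djava.io.tmpdir` workaround quoted above would keep working, while a path passed on the command line would take precedence.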





[GitHub] flink pull request: [FLINK-1848] Fix for file paths with Windows d...

2015-05-26 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/712#issuecomment-105644103
  
Thanks for the review. Will merge




[GitHub] flink pull request: [FLINK-2043] Change the KMeansDataGenerator to...

2015-05-26 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/721#issuecomment-105644179
  
Thanks! Looks good. Will merge




[jira] [Commented] (FLINK-1848) Paths containing a Windows drive letter cannot be used in FileOutputFormats

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559712#comment-14559712
 ] 

ASF GitHub Bot commented on FLINK-1848:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/712#issuecomment-105644103
  
Thanks for the review. Will merge


> Paths containing a Windows drive letter cannot be used in FileOutputFormats
> ---
>
> Key: FLINK-1848
> URL: https://issues.apache.org/jira/browse/FLINK-1848
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.9
> Environment: Windows (Cygwin and native)
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>Priority: Critical
> Fix For: 0.9
>
>
> Paths that contain a Windows drive letter such as {{file:///c:/my/directory}} 
> cannot be used as output path for {{FileOutputFormat}}.
> If done, the following exception is thrown:
> {code}
> Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:c:
> at org.apache.flink.core.fs.Path.initialize(Path.java:242)
> at org.apache.flink.core.fs.Path.<init>(Path.java:225)
> at org.apache.flink.core.fs.Path.<init>(Path.java:138)
> at org.apache.flink.core.fs.local.LocalFileSystem.pathToFile(LocalFileSystem.java:147)
> at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:232)
> at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
> at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
> at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
> at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
> at org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:603)
> at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:233)
> at org.apache.flink.api.java.io.CsvOutputFormat.open(CsvOutputFormat.java:158)
> at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:183)
> at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
> at java.lang.Thread.run(Unknown Source)
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:c:
> at java.net.URI.checkPath(Unknown Source)
> at java.net.URI.<init>(Unknown Source)
> at org.apache.flink.core.fs.Path.initialize(Path.java:240)
> ... 14 more
> {code}
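
The innermost exception in the trace can be reproduced without Flink. The sketch below uses only `java.net.URI`; the class and method names are made up for this example, and it shows the JDK behaviour only, not the fix that was merged for this issue. `java.net.URI` rejects a non-empty path that does not start with `/` whenever a scheme is given, which is what happens when the path component is reduced to the bare drive letter `c:`.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative helper class; the names are not part of Flink.
public final class DriveLetterUriDemo {

    // Returns null if the URI can be constructed, otherwise the reason given by java.net.URI.
    public static String failureReason(String scheme, String path) {
        try {
            new URI(scheme, null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        // A bare drive letter is a relative path, so the constructor throws.
        System.out.println(failureReason("file", "c:"));
        // With a leading slash the same drive letter is accepted.
        System.out.println(failureReason("file", "/c:/my/directory"));
    }
}
```

The first call reports the same reason as the trace above; the second succeeds, which suggests that keeping the leading slash in front of the drive letter is one way to stay within what `java.net.URI` accepts.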





[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...

2015-05-26 Thread twalthr
GitHub user twalthr opened a pull request:

https://github.com/apache/flink/pull/729

[FLINK-1319][core] Add static code analysis for UDFs

This PR implements a Static Code Analyzer (SCA) that uses the ASM framework 
for interpreting Java bytecode of Flink UDFs. The analyzer is built on top of 
ASM's `BasicInterpreter`. Instead of ASM's `BasicValue`s, I introduced 
`TaggedValue`s which extend `BasicValue` and allow for appending interesting 
information to values. Interesting values such as inputs, collectors, or 
constants are tagged such that a tracking of atomic input fields through the 
entire UDF (until the function returns or calls `collect()`) is possible.

The implementation is as conservative as possible, meaning that for cases or 
bytecode instructions that haven't been considered, the analyzer will fall back 
to the ASM library (which removes TaggedValues).

61 JUnit tests are testing the basic functionality. 18 JUnit tests with 
code examples from the "real world" are testing the analyzer even more.

The analyzer has 3 modes: DISABLED, OPTIMIZE, HINTS

The interpretation takes some time. It is possible that an analysis of a 
UDF takes up to 1 second. Therefore, I didn't enable the analyzer in 
TestEnvironment by default to reduce the build times, but if you uncomment the 
lines the analyzer supports all 280 UDFs within the entire Flink code. 

The analyzer gives hints about:
- Main feature: ForwardedFields semantic properties for all types of 
Functions except for MapPartition and Combine
- Warnings if static fields are modified by a Function
- Warnings if a FilterFunction modifies its input objects
- Warnings if a Function returns `null`
- Warnings if a tuple access uses a wrong index
- Information about the number of object creations within a UDF (for manual 
optimization)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/twalthr/flink sca

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/729.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #729


commit c384fc9740013ec1ae89a2817695078542c47dfe
Author: twalthr 
Date:   2015-05-26T18:22:03Z

[FLINK-1319][core] Add static code analysis for UDFs






[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559700#comment-14559700
 ] 

ASF GitHub Bot commented on FLINK-1319:
---

GitHub user twalthr opened a pull request:

https://github.com/apache/flink/pull/729

[FLINK-1319][core] Add static code analysis for UDFs

This PR implements a Static Code Analyzer (SCA) that uses the ASM framework 
for interpreting Java bytecode of Flink UDFs. The analyzer is built on top of 
ASM's `BasicInterpreter`. Instead of ASM's `BasicValue`s, I introduced 
`TaggedValue`s which extend `BasicValue` and allow for appending interesting 
information to values. Interesting values such as inputs, collectors, or 
constants are tagged such that a tracking of atomic input fields through the 
entire UDF (until the function returns or calls `collect()`) is possible.

The implementation is as conservative as possible, meaning that for cases or 
bytecode instructions that haven't been considered, the analyzer will fall back 
to the ASM library (which removes TaggedValues).

61 JUnit tests are testing the basic functionality. 18 JUnit tests with 
code examples from the "real world" are testing the analyzer even more.

The analyzer has 3 modes: DISABLED, OPTIMIZE, HINTS

The interpretation takes some time. It is possible that an analysis of a 
UDF takes up to 1 second. Therefore, I didn't enable the analyzer in 
TestEnvironment by default to reduce the build times, but if you uncomment the 
lines the analyzer supports all 280 UDFs within the entire Flink code. 

The analyzer gives hints about:
- Main feature: ForwardedFields semantic properties for all types of 
Functions except for MapPartition and Combine
- Warnings if static fields are modified by a Function
- Warnings if a FilterFunction modifies its input objects
- Warnings if a Function returns `null`
- Warnings if a tuple access uses a wrong index
- Information about the number of object creations within a UDF (for manual 
optimization)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/twalthr/flink sca

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/729.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #729


commit c384fc9740013ec1ae89a2817695078542c47dfe
Author: twalthr 
Date:   2015-05-26T18:22:03Z

[FLINK-1319][core] Add static code analysis for UDFs




> Add static code analysis for UDFs
> -
>
> Key: FLINK-1319
> URL: https://issues.apache.org/jira/browse/FLINK-1319
> Project: Flink
>  Issue Type: New Feature
>  Components: Java API, Scala API
>Reporter: Stephan Ewen
>Assignee: Timo Walther
>Priority: Minor
>
> Flink's Optimizer takes information that tells it for UDFs which fields of 
> the input elements are accessed, modified, or forwarded/copied. This 
> information frequently helps to reuse partitionings, sorts, etc. It may speed 
> up programs significantly, as it can frequently eliminate sorts and shuffles, 
> which are costly.
> Right now, users can add lightweight annotations to UDFs to provide this 
> information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
> We worked with static code analysis of UDFs before, to determine this 
> information automatically. This is an incredible feature, as it "magically" 
> makes programs faster.
> For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this 
> works surprisingly well in many cases. We used the "Soot" toolkit for the 
> static code analysis. Unfortunately, Soot is LGPL licensed and thus we did 
> not include any of the code so far.
> I propose to add this functionality to Flink, in the form of a drop-in 
> addition, to work around the LGPL incompatibility with ASL 2.0. Users could 
> simply download a special "flink-code-analysis.jar" and drop it into the 
> "lib" folder to enable this functionality. We may even add a script to 
> "tools" that downloads that library automatically into the lib folder. This 
> should be legally fine, since we do not redistribute LGPL code and only 
> dynamically link it (the incompatibility with ASL 2.0 is mainly in the 
> patentability, if I remember correctly).
> Prior work on this has been done by [~aljoscha] and [~skunert], which could 
> provide a code base to start with.
> *Appendix*
> Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
> Papers on static analysis for optimization: 
> http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and 
> http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf

[jira] [Commented] (FLINK-1941) Add documentation for Gelly-GSA

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559688#comment-14559688
 ] 

ASF GitHub Bot commented on FLINK-1941:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/722#issuecomment-105642132
  
and will merge :)


> Add documentation for Gelly-GSA
> ---
>
> Key: FLINK-1941
> URL: https://issues.apache.org/jira/browse/FLINK-1941
> Project: Flink
>  Issue Type: Task
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>Assignee: Vasia Kalavri
>  Labels: docs, gelly
>
> Add a section in the Gelly guide to describe the newly introduced 
> Gather-Sum-Apply iteration method. Show how GSA uses delta iterations 
> internally and explain the differences of this model as compared to 
> vertex-centric.





[GitHub] flink pull request: [FLINK-1941] documentation for the Gather-Sum-...

2015-05-26 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/722#issuecomment-105642132
  
and will merge :)




[GitHub] flink pull request: [FLINK-2012][gelly] Added methods to remove/ad...

2015-05-26 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/678#issuecomment-105641949
  
will merge!




[jira] [Commented] (FLINK-2012) addVertices, addEdges, removeVertices, removeEdges methods

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559686#comment-14559686
 ] 

ASF GitHub Bot commented on FLINK-2012:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/678#issuecomment-105641949
  
will merge!


> addVertices, addEdges, removeVertices, removeEdges methods
> --
>
> Key: FLINK-2012
> URL: https://issues.apache.org/jira/browse/FLINK-2012
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.9
>Reporter: Andra Lungu
>Assignee: Andra Lungu
>Priority: Minor
>
> Currently, Gelly only allows the addition/deletion of one vertex/edge at a 
> time. A user who wants to add two (or more) vertices has to add one vertex, 
> which creates a new graph, then add another vertex, which creates yet another 
> graph, and so on.
> It would be nice to also have addVertices, addEdges, removeVertices, 
> removeEdges methods. 





[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559564#comment-14559564
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31064251
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning to work on a highly 
active open source project which makes scalable ML reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of 
your markdown file.
+See
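
The naming rule quoted in the diff above (class names ending in `(IT|Integration)(Test|Suite|Case)` are integration tests, everything else is a unit test) can be checked with a few lines of plain Java. The class `TestKindClassifier` is hypothetical, and the assumption that the expression is matched against the end of the simple class name is taken from the wording of the guide, not from the Maven configuration itself.

```java
import java.util.regex.Pattern;

// Hypothetical helper that applies the suffix rule from the contribution guide.
public final class TestKindClassifier {

    // ".*" in front plus matches() means the suffix must sit at the end of the name.
    private static final Pattern INTEGRATION_SUFFIX =
            Pattern.compile(".*(IT|Integration)(Test|Suite|Case)");

    public static boolean isIntegrationTest(String simpleClassName) {
        return INTEGRATION_SUFFIX.matcher(simpleClassName).matches();
    }

    public static void main(String[] args) {
        String[] names = {"ExampleITSuite", "KMeansITCase", "PathTest", "UnitTest"};
        for (String name : names) {
            String kind = isIntegrationTest(name) ? "integration test" : "unit test";
            System.out.println(name + " -> " + kind);
        }
    }
}
```

The match is case-sensitive, so a name such as `UnitTest` is not mistaken for an integration test even though it contains the letters "it".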

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31064251
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning to work on a highly 
active open source project which makes scalable ML reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight yaml %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull reque

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063509
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight yaml %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull reque

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559558#comment-14559558
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063509
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight yaml %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of 
your markdown file.
+See

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063498
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight yaml %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull reque

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559557#comment-14559557
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063498
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight yaml %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of 
your markdown file.
+See

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063481
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight yaml %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull reque

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063469
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight yaml %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull reque

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063476
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea so that the 
Flink community knows what you plan to do.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix matching the regular 
expression `(IT|Integration)(Test|Suite|Case)` are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
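
To make the naming rule concrete, the following sketch classifies class names with the suffix pattern quoted above. `TestNaming`, `isIntegrationTest` and the example class names are hypothetical and used only for illustration; the actual distinction is made by the Maven build, not by code like this.

```scala
// Hypothetical helper for illustration; not part of Flink or its build.
object TestNaming {
  // Suffix pattern from the guide, anchored to the end of the class name.
  private val IntegrationSuffix = "(IT|Integration)(Test|Suite|Case)$".r

  // True if the class name would be treated as an integration test.
  def isIntegrationTest(className: String): Boolean =
    IntegrationSuffix.findFirstIn(className).isDefined
}

object TestNamingExample extends App {
  // Example class names, chosen for this sketch.
  Seq("ExampleITSuite", "ExampleIntegrationTest", "ExampleSuite", "ITSuiteHelper")
    .foreach { name =>
      val kind = if (TestNaming.isIntegrationTest(name)) "integration test" else "unit test"
      println(s"$name -> $kind")
    }
}
```

Under this reading, `ExampleITSuite` counts as an integration test, while `ExampleSuite` counts as a unit test because it lacks the `IT` or `Integration` part of the suffix.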
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, you are required to add code comments 
describing how the algorithm works and the parameters with which the user 
can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What the algorithm does
+2. How the algorithm works (or a reference to a description)
+3. A description of the parameters with their default values
+4. A code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined LaTeX commands are available within your 
markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull reque

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063448
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work 
on a highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea so that the 
Flink community knows what you plan to do.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix matching the regular 
expression `(IT|Integration)(Test|Suite|Case)` are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, you are required to add code comments 
describing how the algorithm works and the parameters with which the user 
can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What the algorithm does
+2. How the algorithm works (or a reference to a description)
+3. A description of the parameters with their default values
+4. A code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined LaTeX commands are available within your 
markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull reque

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559556#comment-14559556
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063481
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work 
on a highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea so that the 
Flink community knows what you plan to do.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix matching the regular 
expression `(IT|Integration)(Test|Suite|Case)` are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, you are required to add code comments 
describing how the algorithm works and the parameters with which the user 
can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What the algorithm does
+2. How the algorithm works (or a reference to a description)
+3. A description of the parameters with their default values
+4. A code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined LaTeX commands are available within your 
markdown file.
+See

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559555#comment-14559555
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063476
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work 
on a highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea so that the 
Flink community knows what you plan to do.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix matching the regular 
expression `(IT|Integration)(Test|Suite|Case)` are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, you are required to add code comments 
describing how the algorithm works and the parameters with which the user 
can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What the algorithm does
+2. How the algorithm works (or a reference to a description)
+3. A description of the parameters with their default values
+4. A code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined LaTeX commands are available within your 
markdown file.
+See

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559553#comment-14559553
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063448
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work 
on a highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea so that the 
Flink community knows what you plan to do.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix matching the regular 
expression `(IT|Integration)(Test|Suite|Case)` are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, you are required to add code comments 
describing how the algorithm works and the parameters with which the user 
can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What the algorithm does
+2. How the algorithm works (or a reference to a description)
+3. A description of the parameters with their default values
+4. A code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined LaTeX commands are available within your 
markdown file.
+See

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559554#comment-14559554
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063469
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work 
on a highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea so that the 
Flink community knows what you plan to do.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix matching the regular 
expression `(IT|Integration)(Test|Suite|Case)` are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, you are required to add code comments 
describing how the algorithm works and the parameters with which the user 
can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What the algorithm does
+2. How the algorithm works (or a reference to a description)
+3. A description of the parameters with their default values
+4. A code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined LaTeX commands are available within your 
markdown file.
+See

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063043
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work 
on a highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea so that the 
Flink community knows what you plan to do.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix matching the regular 
expression `(IT|Integration)(Test|Suite|Case)` are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, you are required to add code comments 
describing how the algorithm works and the parameters with which the user 
can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What the algorithm does
+2. How the algorithm works (or a reference to a description)
+3. A description of the parameters with their default values
+4. A code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
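
As a small illustration of the two delimiters described above, a documentation page might typeset an objective function as follows. The least-squares formula is only a placeholder chosen for this sketch; it is not taken from the guide, and it uses standard LaTeX commands rather than any of the predefined ones.

```latex
The model minimizes the squared error over the weight vector $\mathbf{w}$:

$$ \min_{\mathbf{w}} \frac{1}{n} \sum_{i=1}^{n} \left( \mathbf{w}^T \mathbf{x}_i - y_i \right)^2 $$
```

The first line uses the in-line form `$ ... $`, and the second uses the displayed form `$$ ... $$`. Both assume that `mathjax: include` is set in the front matter.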
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull reque

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559542#comment-14559542
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31063043
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31062899
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull reque

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31062844
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull reque

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559536#comment-14559536
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31062899
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559535#comment-14559535
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31062844
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559526#comment-14559526
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31062590
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31062590
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a 
highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull reque

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559523#comment-14559523
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31062513
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning to work on a highly 
active open source project which makes scalable ML reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
--- End diff --

That's German ;-)


> Add contribution guide for FlinkML
> --
>
> Key: FLINK-2073
> URL: https://issues.apache.org/jira/browse/FLINK-2073
> Project: Flink
> Issue Type: New Feature
> Components: Documentation, Machine Learning Library
> Reporter: Theodore Vasiloudis
> Assignee: Till Rohrmann
> Fix For: 0.9
>
>
> We need a guide for contributions to FlinkML in order to encourage the 
> extension of the library, and provide guidelines for developers.
> One thing that should be included is a step-by-step guide to create a 
> transformer, or other Estimator





[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31062513
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning to work on a highly 
active open source project which makes scalable ML reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
--- End diff --

That's German ;-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559519#comment-14559519
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31062344
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning to work on a highly 
active open source project which makes scalable ML reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
--- End diff --

Good catch.


> Add contribution guide for FlinkML
> --
>
> Key: FLINK-2073
> URL: https://issues.apache.org/jira/browse/FLINK-2073
> Project: Flink
> Issue Type: New Feature
> Components: Documentation, Machine Learning Library
> Reporter: Theodore Vasiloudis
> Assignee: Till Rohrmann
> Fix For: 0.9
>
>
> We need a guide for contributions to FlinkML in order to encourage the 
> extension of the library, and provide guidelines for developers.
> One thing that should be included is a step-by-step guide to create a 
> transformer, or other Estimator





[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31062344
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning to work on a highly 
active open source project which makes scalable ML reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
--- End diff --

Good catch.
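
For readers following the thread, the naming rule from the Testing section of this diff can be sketched in plain Scala. This is only an illustration of how the suffix pattern `(IT|Integration)(Test|Suite|Case)` separates integration tests from unit tests. The object `TestNaming` and its method are hypothetical names introduced here; in the project the distinction is made by the Maven build configuration, not by code like this.

```scala
object TestNaming {
  // Suffix pattern quoted in the guide, anchored with `$` so that
  // only the end of the class name is checked.
  private val integrationSuffix = "(IT|Integration)(Test|Suite|Case)$".r

  /** Returns true if the simple class name ends with an integration-test suffix. */
  def isIntegrationTest(className: String): Boolean =
    integrationSuffix.findFirstIn(className).isDefined
}
```

Under this sketch, a class named `ExampleITSuite` would count as an integration test, while `ExampleSuite` would be treated as a unit test. The match is case-sensitive, so a name ending in `UnitTest` is not mistaken for an `IT` suffix.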




[jira] [Resolved] (FLINK-2083) Ensure high quality docs for FlinkML in 0.9

2015-05-26 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann resolved FLINK-2083.
--
Resolution: Fixed

Improved with b015a32f6126d759fc6dee90b78f90f7ff8dfbac

> Ensure high quality docs for FlinkML in 0.9
> ---
>
> Key: FLINK-2083
> URL: https://issues.apache.org/jira/browse/FLINK-2083
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Theodore Vasiloudis
> Assignee: Theodore Vasiloudis
> Labels: ML
> Fix For: 0.9
>
>
> As defined in our vision for FlinkML, providing high-quality documentation is 
> a primary goal for us.
> This issue concerns the docs that will be included in 0.9, and will track 
> improvements and additions for the release.





[GitHub] flink pull request: [FLINK-2083] [docs] [ml] Ensure high quality d...

2015-05-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/715




[jira] [Commented] (FLINK-2083) Ensure high quality docs for FlinkML in 0.9

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559509#comment-14559509
 ] 

ASF GitHub Bot commented on FLINK-2083:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/715


> Ensure high quality docs for FlinkML in 0.9
> ---
>
> Key: FLINK-2083
> URL: https://issues.apache.org/jira/browse/FLINK-2083
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Theodore Vasiloudis
> Assignee: Theodore Vasiloudis
> Labels: ML
> Fix For: 0.9
>
>
> As defined in our vision for FlinkML, providing high-quality documentation is 
> a primary goal for us.
> This issue concerns the docs that will be included in 0.9, and will track 
> improvements and additions for the release.





[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/727#issuecomment-105616604
  
Looks good, just minor comments, plus the HTML code on the title fix.




[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559477#comment-14559477
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/727#issuecomment-105616604
  
Looks good, just minor comments, plus the HTML code on the title fix.


> Add contribution guide for FlinkML
> --
>
> Key: FLINK-2073
> URL: https://issues.apache.org/jira/browse/FLINK-2073
> Project: Flink
> Issue Type: New Feature
> Components: Documentation, Machine Learning Library
> Reporter: Theodore Vasiloudis
> Assignee: Till Rohrmann
> Fix For: 0.9
>
>
> We need a guide for contributions to FlinkML in order to encourage the 
> extension of the library, and provide guidelines for developers.
> One thing that should be included is a step-by-step guide to create a 
> transformer, or other Estimator





[GitHub] flink pull request: Change FlinkMiniCluster#HOSTNAME to FlinkMiniC...

2015-05-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/711




[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31058920
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning to work on a highly 
active open source project which makes scalable ML reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull request.
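
As an illustration of the math syntax described in the Documentation section of this diff, a documentation file with `mathjax: include` in its front matter might contain something like the following. The formula is an arbitrary example chosen here, not one taken from the FlinkML docs, and it uses only standard LaTeX rather than the predefined commands the diff mentions.

```latex
The squared loss for a prediction $\hat{y}$ and a label $y$ is

$$ \ell(\hat{y}, y) = \frac{1}{2} (\hat{y} - y)^2 $$
```

Here `$ ... $` produces in-line mathematics inside the sentence, while `$$ ... $$` sets the loss as a displayed equation on its own line.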

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559466#comment-14559466
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31058920
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning to work on a highly 
active open source project which makes scalable ML reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of 
your markdown file.
+See `do

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31058631
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning to work on a highly 
active open source project which makes scalable ML reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during Maven's test 
phase, and integration tests, which are executed during Maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rule:
+All test cases whose class name ends with a suffix matching the regular 
expression `(IT|Integration)(Test|Suite|Case)` are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
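
The naming rule above can be tried out in isolation. The following is only a sketch: the object `TestNaming` and the method `isIntegrationTest` are hypothetical names made up for illustration, and the quoted diff does not show how the build itself applies the pattern. It assumes the expression is matched against the end of the class name, as the text describes.

```scala
// Sketch only: TestNaming and isIntegrationTest are hypothetical helpers,
// not part of Flink or of its Maven configuration.
object TestNaming {
  // Suffix from the guide, anchored to the end of the class name.
  private val IntegrationSuffix = "(IT|Integration)(Test|Suite|Case)$".r

  // True if the class name ends with an integration-test suffix.
  def isIntegrationTest(className: String): Boolean =
    IntegrationSuffix.findFirstIn(className).isDefined

  def main(args: Array[String]): Unit = {
    val names = Seq("ExampleITSuite", "KMeansIntegrationTest", "VectorSuite", "ITSuiteHelper")
    names.foreach { name =>
      val kind = if (isIntegrationTest(name)) "integration test" else "unit test"
      println(s"$name -> $kind")
    }
  }
}
```

Under this reading, `ExampleITSuite` and `KMeansIntegrationTest` count as integration tests, while `VectorSuite` and `ITSuiteHelper` fall into the unit-test group because they do not end with a matching suffix.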
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec`; any other 
ScalaTest `Suite` subclass can be used as well.
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing how the algorithm works and the parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+To use LaTeX syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight yaml %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+To use displayed mathematics, put your LaTeX code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined LaTeX commands are available in the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined LaTeX commands.
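
As a small illustration of the two math modes described above, a documentation page with `mathjax: include` in its front matter might contain something like the following. The sentence and the formula are made up for this sketch and are not taken from any FlinkML page; only standard LaTeX commands are used, none of the predefined ones.

```latex
The regularization constant $\lambda$ controls the weight of the penalty term.

$$ \min_{\mathbf{w}} \frac{1}{n} \sum_{i=1}^{n} \ell(\mathbf{w}^\top \mathbf{x}_i, y_i) + \lambda \lVert \mathbf{w} \rVert_2^2 $$
```

The first line uses in-line mathematics inside running text, while the second is set as a displayed equation on its own line.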
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull request.

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559462#comment-14559462
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31058631
  
--- Diff: docs/libs/ml/contribution_guide.md ---

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559452#comment-14559452
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31058009
  
--- Diff: docs/libs/ml/contribution_guide.md ---

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31057983
  
--- Diff: docs/libs/ml/contribution_guide.md ---

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31058009
  
--- Diff: docs/libs/ml/contribution_guide.md ---

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559450#comment-14559450
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31057983
  
--- Diff: docs/libs/ml/contribution_guide.md ---

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31057687
  
--- Diff: docs/libs/ml/contribution_guide.md ---

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559445#comment-14559445
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31057687
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work 
on a highly active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started, first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress in this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix matching the regular 
expression `(IT|Integration)(Test|Suite|Case)` are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
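
The naming rule above can be sketched in a few lines of Scala. This is only an illustration of how the suffix pattern classifies class names, not how the Maven build itself is configured; the object `TestNaming`, the method `isIntegrationTest`, and the trailing `$` anchor (our reading of "ends with a suffix") are assumptions.

```scala
object TestNaming {
  // Suffix pattern quoted in the guide. The trailing `$` is our assumption:
  // it requires the suffix to sit at the very end of the class name.
  private val integrationSuffix = "(IT|Integration)(Test|Suite|Case)$".r

  /** Returns true if the class name would be treated as an integration test. */
  def isIntegrationTest(className: String): Boolean =
    integrationSuffix.findFirstIn(className).isDefined
}
```

Under this sketch, `ExampleITSuite` counts as an integration test, while `ExampleSuite` falls into the unit-test group because it lacks the `IT` or `Integration` part.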
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+    ...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, you are required to add code comments 
describing how the algorithm works and the parameters with which the user can 
control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
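
To make the commenting requirement more concrete, here is a minimal sketch of what such code comments could look like. `ExampleScaler` is a hypothetical, stand-alone class invented for this illustration; it is not part of FlinkML and does not use the FlinkML API.

```scala
/** Hypothetical example, not a FlinkML class.
  *
  * Rescales a sequence of values so that the result has the requested mean
  * and (population) standard deviation. The input statistics are computed
  * from the values themselves.
  *
  * @param targetMean mean of the scaled values (default: 0.0)
  * @param targetStd  standard deviation of the scaled values (default: 1.0)
  */
final case class ExampleScaler(targetMean: Double = 0.0, targetStd: Double = 1.0) {
  require(targetStd > 0.0, "targetStd must be positive")

  /** Scales the given values; an empty input is returned unchanged. */
  def scale(values: Seq[Double]): Seq[Double] =
    if (values.isEmpty) values
    else {
      val mean = values.sum / values.size
      val variance = values.map(v => (v - mean) * (v - mean)).sum / values.size
      val std = math.sqrt(variance)
      // A constant input has zero spread, so every value maps to the target mean.
      if (std == 0.0) values.map(_ => targetMean)
      else values.map(v => (v - mean) / std * targetStd + targetMean)
    }
}
```

The comment states what the algorithm does, how it works, and which parameters control it together with their default values, which mirrors the points the guide asks for in the online documentation.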
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight yaml %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See `do

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559421#comment-14559421
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31056770
  
--- Diff: docs/libs/ml/contribution_guide.md ---

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31056770
  
--- Diff: docs/libs/ml/contribution_guide.md ---

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31056574
  
--- Diff: docs/libs/ml/contribution_guide.md ---

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559415#comment-14559415
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31056529
  
--- Diff: docs/libs/ml/contribution_guide.md ---

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559418#comment-14559418
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31056574
  
--- Diff: docs/libs/ml/contribution_guide.md ---

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31056529
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning to work on a highly 
active open source project which makes scalable ML reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull request.
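
As an illustration of the naming rule quoted in the diff above, here is a minimal Java sketch that classifies a test class name with the regular expression `(IT|Integration)(Test|Suite|Case)`. The class `TestNameClassifier` and its method are hypothetical names made up for this example; they are not part of Flink or Maven. Anchoring the pattern at the end of the name with `$` is an assumption based on the guide's wording "ends with a suffix".

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Flink or Maven.
public class TestNameClassifier {

    // The expression from the guide, anchored at the end of the class name
    // (assumption: "ends with a suffix" means the match must be a suffix).
    private static final Pattern INTEGRATION_SUFFIX =
            Pattern.compile("(IT|Integration)(Test|Suite|Case)$");

    // Returns true if the class name would count as an integration test.
    public static boolean isIntegrationTest(String className) {
        return INTEGRATION_SUFFIX.matcher(className).find();
    }

    public static void main(String[] args) {
        String[] names = {
            "ExampleITSuite", "KMeansIntegrationTest", "VectorSuite", "ITSuiteHelper"
        };
        for (String name : names) {
            String kind = isIntegrationTest(name) ? "integration" : "unit";
            System.out.println(name + " -> " + kind);
        }
    }
}
```

Under this reading, `ITSuiteHelper` counts as a unit test, because `ITSuite` is not at the end of the name.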

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559410#comment-14559410
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31056427
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly 
active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look as follows:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See `do

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31056388
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly 
active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look as follows:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull request.

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559408#comment-14559408
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31056388
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly 
active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look as follows:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See `do

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31056427
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly 
active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look as follows:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull request.

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559405#comment-14559405
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31056316
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly 
active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look as follows:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See `do

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31056316
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly 
active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML, all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look as follows:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally, some predefined latex commands are included in the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull request.

[GitHub] flink pull request: [tez] Fix initialization of MemoryManager.

2015-05-26 Thread StephanEwen
GitHub user StephanEwen opened a pull request:

https://github.com/apache/flink/pull/728

[tez] Fix initialization of MemoryManager.

The memory manager in Tez is currently initialized as if it were shared 
between 10 concurrently executing tasks. This reduces the amount of memory 
available to a single task.

Because the memory manager is only used by one task at a time, we should 
configure it like that to avoid wasting memory.
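
To make the effect concrete, here is a rough Java sketch of the arithmetic, assuming the manager's memory budget is divided evenly among the number of tasks it is configured for. `MemoryShareSketch`, `perTaskBytes`, and the 512 MiB budget are hypothetical and chosen only for illustration; this is not the actual `MemoryManager` API.

```java
// Hypothetical illustration; not Flink's MemoryManager API.
public class MemoryShareSketch {

    // Memory available to one task if the budget is split evenly
    // among the configured number of concurrently executing tasks.
    static long perTaskBytes(long totalBytes, int numConcurrentTasks) {
        if (numConcurrentTasks < 1) {
            throw new IllegalArgumentException("need at least one task");
        }
        return totalBytes / numConcurrentTasks;
    }

    public static void main(String[] args) {
        long total = 512L * 1024 * 1024; // assumed budget of 512 MiB

        // Configured as if shared between 10 tasks: each gets a tenth.
        System.out.println("10 tasks: " + perTaskBytes(total, 10) + " bytes per task");

        // Configured for a single task: the whole budget is usable.
        System.out.println(" 1 task:  " + perTaskBytes(total, 1) + " bytes per task");
    }
}
```

Under that assumption, configuring the manager for one task instead of ten makes the full budget available to the task that actually runs.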

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StephanEwen/incubator-flink tez_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/728.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #728


commit 849926a1d1a3e1ee6379457778fab5020efbffb5
Author: Stephan Ewen 
Date:   2015-05-26T14:45:22Z

[tez] Fix initialization of MemoryManager.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559394#comment-14559394
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31055697
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly 
active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of 
your markdown file.
+See `do

[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31055697
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly 
active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
+Additionally, we would like to encourage contributors to add this 
information to the online documentation.
+The online documentation for FlinkML's components can be found in the 
directory `docs/libs/ml`.
+
+Every new algorithm is described by a single markdown file.
+This file should contain at least the following points:
+
+1. What does the algorithm do
+2. How does the algorithm work (or reference to description) 
+3. Parameter description with default values
+4. Code snippet showing how the algorithm is used
+
+In order to use latex syntax in the markdown file, you have to include 
`mathjax: include` in the YAML front matter.
+ 
+{% highlight java %}
+---
+mathjax: include
+title: Example title
+---
+{% endhighlight %}
+
+In order to use displayed mathematics, you have to put your latex code in 
`$$ ... $$`.
+For in-line mathematics, use `$ ... $`.
+Additionally some predefined latex commands are included into the scope of 
your markdown file.
+See `docs/_include/latex_commands.html` for the complete list of 
predefined latex commands.
+
+## Contributing
+
+Once you have implemented the algorithm with adequate test coverage and 
added documentation, you are ready to open a pull request.
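
As an illustration of the naming rule quoted in the Testing section of the diff above, the following is a minimal Java sketch of how a class name could be sorted into the unit or integration category. The class `NamingRule` and the method `isIntegrationTest` are hypothetical names made up for this example and are not part of Flink or Maven. Only the regular expression is taken from the guide, and matching it against the end of the class name is an assumption about how the build applies it.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flink or Maven: it only mirrors the
// suffix rule quoted in the contribution guide.
final class NamingRule {
    // Regular expression from the guide; the leading ".*" (any prefix) is an
    // assumption so that the expression is matched as a class-name suffix.
    private static final Pattern INTEGRATION_SUFFIX =
            Pattern.compile(".*(IT|Integration)(Test|Suite|Case)");

    // Returns true if the class name ends with an integration-test suffix.
    static boolean isIntegrationTest(String className) {
        return INTEGRATION_SUFFIX.matcher(className).matches();
    }

    public static void main(String[] args) {
        String[] names = {"ExampleITSuite", "ExampleSuite", "KMeansIntegrationTest"};
        for (String name : names) {
            String kind = isIntegrationTest(name) ? "integration test" : "unit test";
            System.out.println(name + " -> " + kind);
        }
    }
}
```

Under this sketch, `ExampleITSuite` from the guide's example is classified as an integration test, while a class named `ExampleSuite` would be treated as a unit test.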

[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559388#comment-14559388
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31055461
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly 
active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
--- End diff --

functioning -> functionality


> Add contribution guide for FlinkML
> --
>
> Key: FLINK-2073
> URL: https://issues.apache.org/jira/browse/FLINK-2073
> Project: Flink
>  Issue Type: New Feature
>  Components: Documentation, Machine Learning Library
>Reporter: Theodore Vasiloudis
>Assignee: Till Rohrmann
> Fix For: 0.9
>
>
> We need a guide for contributions to FlinkML in order to encourage the 
> extension of the library, and provide guidelines for developers.
> One thing that should be included is a step-by-step guide to create a 
> transformer, or other Estimator



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31055461
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly 
active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
+Maven automatically makes this distinction by using the following naming 
rules:
+All test cases whose class name ends with a suffix fulfilling the regular 
expression `(IT|Integration)(Test|Suite|Case)`, are considered integration 
tests.
+The rest are considered unit tests and should only test behavior which is 
local to the component under test.
+
+An integration test is a test which requires the full Flink system to be 
started.
+In order to do that properly, all integration test cases have to mix in 
the trait `FlinkTestBase`.
+This trait will set the right `ExecutionEnvironment` so that the test will 
be executed on a special `FlinkMiniCluster` designated for testing purposes.
+Thus, an integration test could look like the following:
+
+{% highlight scala %}
+class ExampleITSuite extends FlatSpec with FlinkTestBase {
+  behavior of "An example algorithm"
+  
+  it should "do something" in {
+...
+  }
+}
+{% endhighlight %}
+
+The test style does not have to be `FlatSpec` but can be any other 
scalatest `Suite` subclass. 
+
+## Documentation
+
+When contributing new algorithms, it is required to add code comments 
describing the functioning of the algorithm and its parameters with which the 
user can control its behavior.
--- End diff --

functioning -> functionality




[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559385#comment-14559385
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31055321
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly 
active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
--- End diff --

Capitalization: maven -> Maven 


> Add contribution guide for FlinkML
> --
>
> Key: FLINK-2073
> URL: https://issues.apache.org/jira/browse/FLINK-2073
> Project: Flink
>  Issue Type: New Feature
>  Components: Documentation, Machine Learning Library
>Reporter: Theodore Vasiloudis
>Assignee: Till Rohrmann
> Fix For: 0.9
>
>
> We need a guide for contributions to FlinkML in order to encourage the 
> extension of the library, and provide guidelines for developers.
> One thing that should be included is a step-by-step guide to create a 
> transformer, or other Estimator





[GitHub] flink pull request: [FLINK-2073] [ml] [docs] Adds contribution gui...

2015-05-26 Thread thvasilo
Github user thvasilo commented on a diff in the pull request:

https://github.com/apache/flink/pull/727#discussion_r31055321
  
--- Diff: docs/libs/ml/contribution_guide.md ---
@@ -20,7 +21,329 @@ specific language governing permissions and limitations
 under the License.
 -->
 
+The Flink community highly appreciates all sorts of contributions to 
FlinkML.
+FlinkML offers people interested in machine learning the opportunity to work on a highly 
active open source project which makes scalable ML a reality.
+The following document describes how to contribute to FlinkML.
+
 * This will be replaced by the TOC
 {:toc}
 
-Coming soon. In the meantime, check our list of [open issues on 
JIRA](https://issues.apache.org/jira/browse/FLINK-1748?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC)
+## Getting Started
+
+In order to get started first read Flink's [contribution 
guide](http://flink.apache.org/how-to-contribute.html).
+Everything from this guide also applies to FlinkML.
+
+## Pick a Topic
+
+If you are looking for some new ideas, then you should check out the list 
of [unresolved issues on 
JIRA](https://issues.apache.org/jira/issues/?jql=component%20%3D%20%22Machine%20Learning%20Library%22%20AND%20project%20%3D%20FLINK%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC).
+Once you decide to contribute to one of these issues, you should take 
ownership of it and track your progress with this issue.
+That way, the other contributors know the state of the different issues 
and redundant work is avoided.
+
+If you already know what you want to contribute to FlinkML all the better.
+It is still advisable to create a JIRA issue for your idea to tell the 
Flink community what you want to do, though.
+
+## Testing
+
+New contributions should come with tests to verify the correct behavior of 
the algorithm.
+The tests help to maintain the algorithm's correctness throughout code 
changes, e.g. refactorings.
+
+We distinguish between unit tests, which are executed during maven's test 
phase, and integration tests, which are executed during maven's verify phase.
--- End diff --

Capitalization: maven -> Maven 




[jira] [Commented] (FLINK-2083) Ensure high quality docs for FlinkML in 0.9

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559369#comment-14559369
 ] 

ASF GitHub Bot commented on FLINK-2083:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/715#issuecomment-105600306
  
LGTM. Will merge it.


> Ensure high quality docs for FlinkML in 0.9
> ---
>
> Key: FLINK-2083
> URL: https://issues.apache.org/jira/browse/FLINK-2083
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Theodore Vasiloudis
>Assignee: Theodore Vasiloudis
>  Labels: ML
> Fix For: 0.9
>
>
> As defined in our vision for FlinkML, providing high-quality documentation is 
> a primary goal for us.
> This issue concerns the docs that will be included in 0.9, and will track 
> improvements and additions for the release.





[GitHub] flink pull request: [FLINK-2083] [docs] [ml] Ensure high quality d...

2015-05-26 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/715#issuecomment-105600306
  
LGTM. Will merge it.




[jira] [Commented] (FLINK-2073) Add contribution guide for FlinkML

2015-05-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559361#comment-14559361
 ] 

ASF GitHub Bot commented on FLINK-2073:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/727

[FLINK-2073] [ml] [docs] Adds contribution guide

Adds contribution guide and implementation tutorial for new pipeline 
operators.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink mlContributionGuide

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/727.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #727


commit a3c912e5cf7b310871a78f72fce3136bf63c85dd
Author: Till Rohrmann 
Date:   2015-05-26T16:45:01Z

[FLINK-2073] [ml] [docs] Adds contribution guide. Adds links to FlinkML 
main site.




> Add contribution guide for FlinkML
> --
>
> Key: FLINK-2073
> URL: https://issues.apache.org/jira/browse/FLINK-2073
> Project: Flink
>  Issue Type: New Feature
>  Components: Documentation, Machine Learning Library
>Reporter: Theodore Vasiloudis
>Assignee: Till Rohrmann
> Fix For: 0.9
>
>
> We need a guide for contributions to FlinkML in order to encourage the 
> extension of the library, and provide guidelines for developers.
> One thing that should be included is a step-by-step guide to create a 
> transformer, or other Estimator




