[jira] [Commented] (FLINK-3920) Distributed Linear Algebra: block-based matrix

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354542#comment-15354542
 ] 

ASF GitHub Bot commented on FLINK-3920:
---

Github user chiwanpark commented on the issue:

https://github.com/apache/flink/pull/2152
  
This PR contains unnecessary code changes related to `DistributedRowMatrix` 
(`DistributedRowMatrix.scala` and `DistributedRowMatrixSuite.scala`). Please 
sync this PR with current master.


> Distributed Linear Algebra: block-based matrix
> --
>
> Key: FLINK-3920
> URL: https://issues.apache.org/jira/browse/FLINK-3920
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Simone Robutti
>Assignee: Simone Robutti
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2152: [FLINK-3920] Distributed Linear Algebra: block-based matr...

2016-06-28 Thread chiwanpark
Github user chiwanpark commented on the issue:

https://github.com/apache/flink/pull/2152
  
This PR contains unnecessary code changes related to `DistributedRowMatrix` 
(`DistributedRowMatrix.scala` and `DistributedRowMatrixSuite.scala`). Please 
sync this PR with current master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2152: [FLINK-3920] Distributed Linear Algebra: block-bas...

2016-06-28 Thread chiwanpark
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/2152#discussion_r68885212
  
--- Diff: 
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/math/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,287 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.math.distributed
+
+import java.lang
--- End diff --

This line still imports `java.lang`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3920) Distributed Linear Algebra: block-based matrix

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354539#comment-15354539
 ] 

ASF GitHub Bot commented on FLINK-3920:
---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/2152#discussion_r68885212
  
--- Diff: 
flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/math/distributed/BlockMatrix.scala
 ---
@@ -0,0 +1,287 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.math.distributed
+
+import java.lang
--- End diff --

This line still imports `java.lang`.


> Distributed Linear Algebra: block-based matrix
> --
>
> Key: FLINK-3920
> URL: https://issues.apache.org/jira/browse/FLINK-3920
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Simone Robutti
>Assignee: Simone Robutti
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4128) compile error about git-commit-id-plugin

2016-06-28 Thread Mao, Wei (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354385#comment-15354385
 ] 

Mao, Wei commented on FLINK-4128:
-

Thanks for the info. Then I think it makes sense to upgrade to the latest stable 
version.

> compile error about git-commit-id-plugin
> 
>
> Key: FLINK-4128
> URL: https://issues.apache.org/jira/browse/FLINK-4128
> Project: Flink
>  Issue Type: Bug
>Reporter: Mao, Wei
>
> When I built with the latest Flink code, I got the following error:
> {quote}
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 01:06 h
> [INFO] Finished at: 2016-06-28T22:11:58+08:00
> [INFO] Final Memory: 104M/3186M
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> pl.project13.maven:git-commit-id-plugin:2.1.5:revision (default) on project 
> flink-runtime_2.11: Execution default of goal 
> pl.project13.maven:git-commit-id-plugin:2.1.5:revision failed. 
> NullPointerException -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :flink-runtime_2.11
> {quote}
> I think it's because a wrong `dotGitDirectory` value is provided.
> And another question is whether we should upgrade the version of this plugin, so 
> that we can get a more meaningful error message instead of an NPE, e.g.:
> {quote}
> Could not get HEAD Ref, are you sure you have some commits in the 
> dotGitDirectory?
> {quote}
> The current stable version is 2.2.1, but the disadvantage is that Java 1.6 is no 
> longer supported by the new version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4128) compile error about git-commit-id-plugin

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354215#comment-15354215
 ] 

ASF GitHub Bot commented on FLINK-4128:
---

Github user mwws commented on the issue:

https://github.com/apache/flink/pull/2179
  
Yes, I am also confused about why it compiled before. But according to an 
[archived flink-dev 
mail](http://mail-archives.apache.org/mod_mbox/flink-dev/201506.mbox/%3CCAJLORfdqMsJdNricb8WDSBrNWPzgAD=oqdWned1cL3KFB+i+=g...@mail.gmail.com%3E)
 some people have had the same issue as me.



> compile error about git-commit-id-plugin
> 
>
> Key: FLINK-4128
> URL: https://issues.apache.org/jira/browse/FLINK-4128
> Project: Flink
>  Issue Type: Bug
>Reporter: Mao, Wei
>
> When I built with the latest Flink code, I got the following error:
> {quote}
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 01:06 h
> [INFO] Finished at: 2016-06-28T22:11:58+08:00
> [INFO] Final Memory: 104M/3186M
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> pl.project13.maven:git-commit-id-plugin:2.1.5:revision (default) on project 
> flink-runtime_2.11: Execution default of goal 
> pl.project13.maven:git-commit-id-plugin:2.1.5:revision failed. 
> NullPointerException -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :flink-runtime_2.11
> {quote}
> I think it's because a wrong `dotGitDirectory` value is provided.
> And another question is whether we should upgrade the version of this plugin, so 
> that we can get a more meaningful error message instead of an NPE, e.g.:
> {quote}
> Could not get HEAD Ref, are you sure you have some commits in the 
> dotGitDirectory?
> {quote}
> The current stable version is 2.2.1, but the disadvantage is that Java 1.6 is no 
> longer supported by the new version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2179: [FLINK-4128] compile error about git-commit-id-plugin

2016-06-28 Thread mwws
Github user mwws commented on the issue:

https://github.com/apache/flink/pull/2179
  
Yes, I am also confused about why it compiled before. But according to an 
[archived flink-dev 
mail](http://mail-archives.apache.org/mod_mbox/flink-dev/201506.mbox/%3CCAJLORfdqMsJdNricb8WDSBrNWPzgAD=oqdWned1cL3KFB+i+=g...@mail.gmail.com%3E)
 some people have had the same issue as me.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

2016-06-28 Thread fobeligi
Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/2178#discussion_r68848969
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
@@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String 
edgesPath, ExecutionEnvironmen
}
 
/**
+* Creates a graph from a Adjacency List text file  with Vertex Key 
values. Edges will be created automatically.
+*
+* @param filePath a path to an Adjacency List text file with the 
Vertex data
+* @param context  the execution environment.
+* @return An instance of {@link 
org.apache.flink.graph.GraphAdjacencyListReader},
+* on which calling methods to specify types of the Vertex ID, Vertex 
value and Edge value returns a Graph.
+*/
+   public static GraphAdjacencyListReader fromAdjacencyListFile(String 
filePath, ExecutionEnvironment context) {
+   return new GraphAdjacencyListReader(filePath, context);
+   }
+
+   /**
+* Writes a graph as an Adjacency List formatted text file in a user 
specified folder.
+*
+* @param filePath   the path that the Adjacency List formatted text 
file should be written in
+* @param delimiters the delimiters that separate the different value 
types in the Adjacency List formatted text
+*   file. Delimiters should be provided with the 
following order:
+*   NEIGHBOR_DELIMITER : separating source from its 
neighbors
+*   VERTICES_DELIMITER : separating the different 
neighbors of a source vertex
+*   VERTEX_VALUE_DELIMITER: separating the source 
vertex-id from the vertex value, as well as the
+*   target vertex-ids from the edge value.
+*/
+   public void writeAsAdjacencyList(String filePath, String... delimiters) 
{
+
+   final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? 
delimiters[0] : "\t";
+
+   final String VERTICES_DELIMITER = delimiters.length > 1 ? 
delimiters[1] : ",";
+
+   final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? 
delimiters[2] : "-";
--- End diff --

You mean the error in this declaration: 
```java
final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? delimiters[2] : "-";
```
and not a direct check for a length greater than two, because that way the 
user would have to provide either all three delimiters or none.
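    
For illustration, a minimal, self-contained sketch of a per-index guard that 
fixes the out-of-bounds access while still allowing partial delimiter lists 
(the class and method names here are invented):
```java
public class DelimiterGuardSketch {

    // Guarding each delimiter by its own index check means callers may pass
    // zero, one, two, or three delimiters and still get defaults for the rest.
    static String vertexValueDelimiter(String... delimiters) {
        // Testing length > 2 (rather than > 1) prevents an
        // ArrayIndexOutOfBoundsException when only two delimiters are given.
        return delimiters.length > 2 ? delimiters[2] : "-";
    }

    public static void main(String[] args) {
        System.out.println(vertexValueDelimiter("\t", ","));      // "-" (default)
        System.out.println(vertexValueDelimiter("\t", ",", "|")); // "|"
    }
}
```
With the default delimiters ("\t", ",", "-"), a vertex with id 1, value 42 and 
two weighted neighbors would presumably be written as: 1-42\t2-0.5,3-1.3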


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

2016-06-28 Thread fobeligi
Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/2178#discussion_r68848469
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
@@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String 
edgesPath, ExecutionEnvironmen
}
 
/**
+* Creates a graph from a Adjacency List text file  with Vertex Key 
values. Edges will be created automatically.
+*
+* @param filePath a path to an Adjacency List text file with the 
Vertex data
+* @param context  the execution environment.
+* @return An instance of {@link 
org.apache.flink.graph.GraphAdjacencyListReader},
+* on which calling methods to specify types of the Vertex ID, Vertex 
value and Edge value returns a Graph.
+*/
+   public static GraphAdjacencyListReader fromAdjacencyListFile(String 
filePath, ExecutionEnvironment context) {
+   return new GraphAdjacencyListReader(filePath, context);
+   }
+
+   /**
+* Writes a graph as an Adjacency List formatted text file in a user 
specified folder.
+*
+* @param filePath   the path that the Adjacency List formatted text 
file should be written in
+* @param delimiters the delimiters that separate the different value 
types in the Adjacency List formatted text
+*   file. Delimiters should be provided with the 
following order:
+*   NEIGHBOR_DELIMITER : separating source from its 
neighbors
+*   VERTICES_DELIMITER : separating the different 
neighbors of a source vertex
+*   VERTEX_VALUE_DELIMITER: separating the source 
vertex-id from the vertex value, as well as the
+*   target vertex-ids from the edge value.
+*/
+   public void writeAsAdjacencyList(String filePath, String... delimiters) 
{
+
+   final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? 
delimiters[0] : "\t";
+
+   final String VERTICES_DELIMITER = delimiters.length > 1 ? 
delimiters[1] : ",";
+
+   final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? 
delimiters[2] : "-";
+
+
+   DataSet<Tuple2<K, VV>> vertices = this.getVerticesAsTuple2();
+
+   DataSet<Tuple3<K, K, EV>> edgesNValues = 
this.getEdgesAsTuple3();
--- End diff --

As I see now, we don't have to convert the vertex set to a Tuple2 set, so I 
have already changed that.

Regarding the edges dataset: in order to write the Adjacency List file, I 
apply the coGroup transformation to the Vertex dataset and the edges-as-Tuple3 
dataset, matching the vertexId with the source of each edge.

That way, even when a Vertex is the source of no edges (e.g. it has only 
incoming edges), I still get its vertexId in the "coGrouped" dataset (I 
couldn't do that with a join). See the sketch below.

I can't see how I could use the Edge dataset directly in a coGroup or similar 
transformation.
Please let me know if you have any suggestions.
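    
A hedged sketch of that coGroup, with the types simplified to 
Long/String/Double (this is not the PR's actual code):
```java
import org.apache.flink.api.common.functions.CoGroupFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.graph.Edge;
import org.apache.flink.graph.Vertex;
import org.apache.flink.util.Collector;

public class AdjacencyListCoGroupSketch {
    public static DataSet<String> toAdjacencyLines(
            DataSet<Vertex<Long, String>> vertices,
            DataSet<Edge<Long, Double>> edges) {
        // Vertex extends Tuple2 and Edge extends Tuple3, so field 0 is the
        // vertex id on the left side and the edge source on the right side.
        return vertices.coGroup(edges).where(0).equalTo(0)
            .with(new CoGroupFunction<Vertex<Long, String>, Edge<Long, Double>, String>() {
                @Override
                public void coGroup(Iterable<Vertex<Long, String>> vertexGroup,
                                    Iterable<Edge<Long, Double>> outgoingEdges,
                                    Collector<String> out) {
                    // A vertex with no outgoing edges still produces a line,
                    // because coGroup emits groups that are empty on one side.
                    for (Vertex<Long, String> v : vertexGroup) {
                        StringBuilder line = new StringBuilder(v.f0 + "-" + v.f1);
                        String delimiter = "\t"; // source/neighbor delimiter first
                        for (Edge<Long, Double> e : outgoingEdges) {
                            line.append(delimiter).append(e.f1).append('-').append(e.f2);
                            delimiter = ","; // then the neighbor/neighbor delimiter
                        }
                        out.collect(line.toString());
                    }
                }
            });
    }
}
```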


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

2016-06-28 Thread fobeligi
Github user fobeligi commented on a diff in the pull request:

https://github.com/apache/flink/pull/2178#discussion_r68846112
  
--- Diff: 
flink-libraries/flink-gelly-scala/src/main/scala/org/apache/flink/graph/scala/Graph.scala
 ---
@@ -1127,8 +1194,7 @@ TypeInformation : ClassTag](jgraph: jg.Graph[K, VV, 
EV]) {
*
* @param analytic the analytic to run on the Graph
*/
-  def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, 
EV, T]):
-  GraphAnalytic[K, VV, EV, T] = {
+  def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, 
EV, T])= {
--- End diff --

No, I will revert the change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353687#comment-15353687
 ] 

ASF GitHub Bot commented on FLINK-3879:
---

Github user vasia commented on the issue:

https://github.com/apache/flink/pull/1967
  
Hey @greghogan,
was there consensus regarding this change? I see the numbers, but did 
anyone review this PR?
I've been offline for the past few days, and I now see that nobody reviewed 
#2160, #2079, #2067, #1997 either...
I don't doubt that you have done a great job, but it is _always_ better to 
let someone review your code before you merge. We don't usually merge PRs 
without a +1 unless it is something trivial. I understand things move faster 
this way, but we are in a community and we should try to collaborate.
Please leave a comment next time you think a PR has stayed unreviewed 
for a long time, or ping me personally if you want a second pair of eyes on 
gelly stuff :)
Thanks!


> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm
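    
For reference, a compact, self-contained sketch of the textbook HITS update 
rules (an illustration only, not the Gelly implementation in this PR):
{code}
import java.util.Arrays;

public class HitsSketch {
    public static void main(String[] args) {
        // Tiny hard-coded graph: edge {u, v} means document u points to v.
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {2, 0}};
        int n = 3;
        double[] hub = new double[n];
        double[] auth = new double[n];
        Arrays.fill(hub, 1.0);
        for (int i = 0; i < 20; i++) {
            // auth(v) = sum of hub(u) over edges u -> v
            Arrays.fill(auth, 0.0);
            for (int[] e : edges) auth[e[1]] += hub[e[0]];
            // hub(u) = sum of auth(v) over edges u -> v
            Arrays.fill(hub, 0.0);
            for (int[] e : edges) hub[e[0]] += auth[e[1]];
            normalize(auth);
            normalize(hub);
        }
        System.out.println("auth = " + Arrays.toString(auth));
        System.out.println("hub  = " + Arrays.toString(hub));
    }

    // Scale the scores so their Euclidean norm is 1.
    static void normalize(double[] v) {
        double norm = 0.0;
        for (double x : v) norm += x * x;
        norm = Math.sqrt(norm);
        for (int i = 0; i < v.length; i++) v[i] /= norm;
    }
}
{code}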



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #1967: [FLINK-3879] [gelly] Native implementation of HITS algori...

2016-06-28 Thread vasia
Github user vasia commented on the issue:

https://github.com/apache/flink/pull/1967
  
Hey @greghogan,
was there consensus regarding this change? I see the numbers, but did 
anyone review this PR?
I've been offline for the past few days, and I now see that nobody reviewed 
#2160, #2079, #2067, #1997 either...
I don't doubt that you have done a great job, but it is _always_ better to 
let someone review your code before you merge. We don't usually merge PRs 
without a +1 unless it is something trivial. I understand things move faster 
this way, but we are in a community and we should try to collaborate.
Please leave a comment next time you think a PR has stayed unreviewed 
for a long time, or ping me personally if you want a second pair of eyes on 
gelly stuff :)
Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353650#comment-15353650
 ] 

ASF GitHub Bot commented on FLINK-3879:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1967


> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3879) Native implementation of HITS algorithm

2016-06-28 Thread Greg Hogan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Hogan closed FLINK-3879.
-
Resolution: Fixed

Implemented in 40749ddcd73c4634d81c2153f64e8934d519be3d

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #1967: [FLINK-3879] [gelly] Native implementation of HITS...

2016-06-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1967


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

2016-06-28 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/2178#discussion_r68832940
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
@@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String 
edgesPath, ExecutionEnvironmen
}
 
/**
+* Creates a graph from a Adjacency List text file  with Vertex Key 
values. Edges will be created automatically.
+*
+* @param filePath a path to an Adjacency List text file with the 
Vertex data
+* @param context  the execution environment.
+* @return An instance of {@link 
org.apache.flink.graph.GraphAdjacencyListReader},
+* on which calling methods to specify types of the Vertex ID, Vertex 
value and Edge value returns a Graph.
+*/
+   public static GraphAdjacencyListReader fromAdjacencyListFile(String 
filePath, ExecutionEnvironment context) {
+   return new GraphAdjacencyListReader(filePath, context);
+   }
+
+   /**
+* Writes a graph as an Adjacency List formatted text file in a user 
specified folder.
+*
+* @param filePath   the path that the Adjacency List formatted text 
file should be written in
+* @param delimiters the delimiters that separate the different value 
types in the Adjacency List formatted text
+*   file. Delimiters should be provided with the 
following order:
+*   NEIGHBOR_DELIMITER : separating source from its 
neighbors
+*   VERTICES_DELIMITER : separating the different 
neighbors of a source vertex
+*   VERTEX_VALUE_DELIMITER: separating the source 
vertex-id from the vertex value, as well as the
+*   target vertex-ids from the edge value.
+*/
+   public void writeAsAdjacencyList(String filePath, String... delimiters) 
{
+
+   final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? 
delimiters[0] : "\t";
+
+   final String VERTICES_DELIMITER = delimiters.length > 1 ? 
delimiters[1] : ",";
+
+   final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? 
delimiters[2] : "-";
+
+
+   DataSet<Tuple2<K, VV>> vertices = this.getVerticesAsTuple2();
+
+   DataSet<Tuple3<K, K, EV>> edgesNValues = 
this.getEdgesAsTuple3();
--- End diff --

Do we need to convert the vertex and edge sets to tuples?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

2016-06-28 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/2178#discussion_r68832549
  
--- Diff: 
flink-libraries/flink-gelly-scala/src/main/scala/org/apache/flink/graph/scala/Graph.scala
 ---
@@ -1127,8 +1194,7 @@ TypeInformation : ClassTag](jgraph: jg.Graph[K, VV, 
EV]) {
*
* @param analytic the analytic to run on the Graph
*/
-  def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, 
EV, T]):
-  GraphAnalytic[K, VV, EV, T] = {
+  def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, 
EV, T])= {
--- End diff --

Was this change intended?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

2016-06-28 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/2178#discussion_r68832491
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java ---
@@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String 
edgesPath, ExecutionEnvironmen
}
 
/**
+* Creates a graph from a Adjacency List text file  with Vertex Key 
values. Edges will be created automatically.
+*
+* @param filePath a path to an Adjacency List text file with the 
Vertex data
+* @param context  the execution environment.
+* @return An instance of {@link 
org.apache.flink.graph.GraphAdjacencyListReader},
+* on which calling methods to specify types of the Vertex ID, Vertex 
value and Edge value returns a Graph.
+*/
+   public static GraphAdjacencyListReader fromAdjacencyListFile(String 
filePath, ExecutionEnvironment context) {
+   return new GraphAdjacencyListReader(filePath, context);
+   }
+
+   /**
+* Writes a graph as an Adjacency List formatted text file in a user 
specified folder.
+*
+* @param filePath   the path that the Adjacency List formatted text 
file should be written in
+* @param delimiters the delimiters that separate the different value 
types in the Adjacency List formatted text
+*   file. Delimiters should be provided with the 
following order:
+*   NEIGHBOR_DELIMITER : separating source from its 
neighbors
+*   VERTICES_DELIMITER : separating the different 
neighbors of a source vertex
+*   VERTEX_VALUE_DELIMITER: separating the source 
vertex-id from the vertex value, as well as the
+*   target vertex-ids from the edge value.
+*/
+   public void writeAsAdjacencyList(String filePath, String... delimiters) 
{
+
+   final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? 
delimiters[0] : "\t";
+
+   final String VERTICES_DELIMITER = delimiters.length > 1 ? 
delimiters[1] : ",";
+
+   final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? 
delimiters[2] : "-";
--- End diff --

Test length against "2".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Closed] (FLINK-3277) Use Value types in Gelly API

2016-06-28 Thread Greg Hogan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Hogan closed FLINK-3277.
-
Resolution: Fixed

Implemented in 40749ddcd73c4634d81c2153f64e8934d519be3d

> Use Value types in Gelly API
> 
>
> Key: FLINK-3277
> URL: https://issues.apache.org/jira/browse/FLINK-3277
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> This would be a breaking change so the discussion needs to happen before the 
> 1.0.0 release.
> I think it would benefit Flink to use {{Value}} types wherever possible. The 
> {{Graph}} functions {{inDegrees}}, {{outDegrees}}, and {{getDegrees}} each 
> return {{DataSet<Tuple2<K, Long>>}}. Using {{Long}} creates a new heap object 
> for every serialization and deserialization. The mutable {{Value}} types do 
> not suffer from this issue when object reuse is enabled.
> I lean towards a preference for conciseness in documentation and performance 
> in examples and APIs.
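    
A minimal sketch of the pattern the ticket describes (assumed usage, not 
Gelly code): a mutable {{LongValue}} is allocated once and reused, so no new 
{{Long}} is boxed per record when object reuse is enabled.
{code}
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.types.LongValue;

public class DegreeDoubler implements MapFunction<LongValue, LongValue> {
    // One reusable output instance instead of a fresh Long per element.
    private final LongValue result = new LongValue();

    @Override
    public LongValue map(LongValue degree) {
        result.setValue(degree.getValue() * 2);
        return result;
    }
}
{code}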



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3277) Use Value types in Gelly API

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353602#comment-15353602
 ] 

ASF GitHub Bot commented on FLINK-3277:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1671


> Use Value types in Gelly API
> 
>
> Key: FLINK-3277
> URL: https://issues.apache.org/jira/browse/FLINK-3277
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Affects Versions: 1.0.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> This would be a breaking change so the discussion needs to happen before the 
> 1.0.0 release.
> I think it would benefit Flink to use {{Value}} types wherever possible. The 
> {{Graph}} functions {{inDegrees}}, {{outDegrees}}, and {{getDegrees}} each 
> return {{DataSet<Tuple2<K, Long>>}}. Using {{Long}} creates a new heap object 
> for every serialization and deserialization. The mutable {{Value}} types do 
> not suffer from this issue when object reuse is enabled.
> I lean towards a preference for conciseness in documentation and performance 
> in examples and APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #1671: [FLINK-3277] Use Value types in Gelly API

2016-06-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1671


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-4129) HITSAlgorithm should test for element-wise convergence

2016-06-28 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-4129:
-

 Summary: HITSAlgorithm should test for element-wise convergence
 Key: FLINK-4129
 URL: https://issues.apache.org/jira/browse/FLINK-4129
 Project: Flink
  Issue Type: Bug
  Components: Gelly
Affects Versions: 1.1.0
Reporter: Greg Hogan
Priority: Minor


{{HITSAlgorithm}} tests for convergence by summing the difference of each 
authority score minus the average score. This simply compares the sum of 
scores against the previous sum of scores, which is not a good test for 
convergence.

{code}
// count the diff value of sum of authority scores
diffSumAggregator.aggregate(previousAuthAverage - newAuthorityValue.getValue());
{code}
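
A hedged sketch of the element-wise test this ticket asks for (the variable 
name {{previousAuthorityValue}} is assumed, by analogy with the snippet above):
{code}
// Aggregate the absolute per-element change so that positive and negative
// differences cannot cancel each other out.
diffSumAggregator.aggregate(Math.abs(previousAuthorityValue - newAuthorityValue.getValue()));
{code}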



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4127) Clean up configuration and check breaking API changes

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353525#comment-15353525
 ] 

ASF GitHub Bot commented on FLINK-4127:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/2177#discussion_r68819516
  
--- Diff: docs/setup/config.md ---
@@ -85,6 +85,8 @@ The default fraction for managed memory can be adjusted 
using the `taskmanager.m
 
 - `taskmanager.memory.preallocate`: Can be either of `true` or `false`. 
Specifies whether task managers should allocate all managed memory when 
starting up. (DEFAULT: false)
 
+- `taskmanager.runtime.large-record-handler`: Whether to use the 
LargeRecordHandler when spilling. This feature is experimental. (DEFAULT: false)
--- End diff --

Have we documented when it would be useful to enable `LargeRecordHandler`?
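    
For what it's worth, a minimal sketch of flipping the flag from code, assuming 
the key shown in the diff (the same line can equally go into flink-conf.yaml):
{code}
import org.apache.flink.configuration.Configuration;

public class LargeRecordHandlerConfigSketch {
    public static Configuration withLargeRecordHandler() {
        Configuration config = new Configuration();
        // Key taken from the diff above; the documented default is false.
        config.setBoolean("taskmanager.runtime.large-record-handler", true);
        return config;
    }
}
{code}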


> Clean up configuration and check breaking API changes
> -
>
> Key: FLINK-4127
> URL: https://issues.apache.org/jira/browse/FLINK-4127
> Project: Flink
>  Issue Type: Improvement
>Reporter: Robert Metzger
> Fix For: 1.1.0
>
> Attachments: flink-core.html, flink-java.html, flink-scala.html, 
> flink-streaming-java.html, flink-streaming-scala.html
>
>
> For the upcoming 1.1 release, I'll check whether there are any breaking API 
> changes and whether the documentation is up to date with the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2177: [FLINK-4127] Check API compatbility for 1.1 in fli...

2016-06-28 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/2177#discussion_r68819516
  
--- Diff: docs/setup/config.md ---
@@ -85,6 +85,8 @@ The default fraction for managed memory can be adjusted 
using the `taskmanager.m
 
 - `taskmanager.memory.preallocate`: Can be either of `true` or `false`. 
Specifies whether task managers should allocate all managed memory when 
starting up. (DEFAULT: false)
 
+- `taskmanager.runtime.large-record-handler`: Whether to use the 
LargeRecordHandler when spilling. This feature is experimental. (DEFAULT: false)
--- End diff --

Have we documented when it would be useful to enable `LargeRecordHandler`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2179: [Flink-4128] fix flink-runtime compile error about git-co...

2016-06-28 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/2179
  
The modified line is ancient code. It's not clear why this is necessary.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3477) Add hash-based combine strategy for ReduceFunction

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353454#comment-15353454
 ] 

ASF GitHub Bot commented on FLINK-3477:
---

Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/1517
  
+1 with just a few superficial comments.

Reading back through the discussion I see that there are many ideas for 
future performance enhancements. If not already suggested, I'd like to consider 
skipping staging for fixed-length records.

I'm missing why we can't update in place with smaller records. The 
deserializer is responsible for detecting the end of the record and we wouldn't 
need to change the pointer value when replacing with a smaller record.

CombineHint.NONE can be implemented in a new PR since this looks to be 
ready as-is.


> Add hash-based combine strategy for ReduceFunction
> --
>
> Key: FLINK-3477
> URL: https://issues.apache.org/jira/browse/FLINK-3477
> Project: Flink
>  Issue Type: Sub-task
>  Components: Local Runtime
>Reporter: Fabian Hueske
>Assignee: Gabor Gevay
>
> This issue is about adding a hash-based combine strategy for ReduceFunctions.
> The interface of the {{reduce()}} method is as follows:
> {code}
> public T reduce(T v1, T v2)
> {code}
> Input type and output type are identical and the function returns only a 
> single value. A Reduce function is incrementally applied to compute a final 
> aggregated value. This allows to hold the preaggregated value in a hash-table 
> and update it with each function call. 
> The hash-based strategy requires special implementation of an in-memory hash 
> table. The hash table should support in place updates of elements (if the 
> updated value has the same size as the new value) but also appending updates 
> with invalidation of the old value (if the binary length of the new value 
> differs). The hash table needs to be able to evict and emit all elements if 
> it runs out-of-memory.
> We should also add {{HASH}} and {{SORT}} compiler hints to 
> {{DataSet.reduce()}} and {{Grouping.reduce()}} to allow users to pick the 
> execution strategy.
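    
A sketch of picking the strategy on a reduce, assuming the {{CombineHint}} enum 
and the {{setCombineHint}} method added by the linked pull request:
{code}
import org.apache.flink.api.common.operators.base.ReduceOperatorBase.CombineHint;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;

public class CombineHintSketch {
    static DataSet<Tuple2<String, Integer>> sumPerKey(DataSet<Tuple2<String, Integer>> pairs) {
        return pairs
            .groupBy(0)
            .reduce((a, b) -> new Tuple2<>(a.f0, a.f1 + b.f1))
            .setCombineHint(CombineHint.HASH); // or CombineHint.SORT
    }
}
{code}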



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #1517: [FLINK-3477] [runtime] Add hash-based combine strategy fo...

2016-06-28 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/1517
  
+1 with just a few superficial comments.

Reading back through the discussion I see that there are many ideas for 
future performance enhancements. If not already suggested, I'd like to consider 
skipping staging for fixed-length records.

I'm missing why we can't update in place with smaller records. The 
deserializer is responsible for detecting the end of the record and we wouldn't 
need to change the pointer value when replacing with a smaller record.

CombineHint.NONE can be implemented in a new PR since this looks to be 
ready as-is.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3674) Add an interface for EventTime aware User Function

2016-06-28 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353430#comment-15353430
 ] 

ramkrishna.s.vasudevan commented on FLINK-3674:
---

So, in all the stream UDF implementations, if the userFunction is an instance 
of the new 'EventTimeFunction' interface, we call the new API on that 
interface? And we would call the new API in the #processWatermark(Watermark) flow.
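
Something like this hedged sketch of the dispatch (the operator and field 
names are assumed for illustration):
{code}
public void processWatermark(Watermark mark) throws Exception {
    // userFunction is the wrapped UDF held by the stream operator.
    if (userFunction instanceof EventTimeFunction) {
        ((EventTimeFunction) userFunction).onWatermark(mark);
    }
    output.emitWatermark(mark); // the existing forwarding stays unchanged
}
{code}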

> Add an interface for EventTime aware User Function
> --
>
> Key: FLINK-3674
> URL: https://issues.apache.org/jira/browse/FLINK-3674
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming
>Affects Versions: 1.0.0
>Reporter: Stephan Ewen
>
> I suggest to add an interface that UDFs can implement, which will let them be 
> notified upon watermark updates.
> Example usage:
> {code}
> public interface EventTimeFunction {
> void onWatermark(Watermark watermark);
> }
> public class MyMapper implements MapFunction<String, String>, 
> EventTimeFunction {
> private long currentEventTime = Long.MIN_VALUE;
> public String map(String value) {
> return value + " @ " + currentEventTime;
> }
> public void onWatermark(Watermark watermark) {
> currentEventTime = watermark.getTimestamp();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-06-28 Thread Greg Hogan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353421#comment-15353421
 ] 

Greg Hogan commented on FLINK-3879:
---

Using CombineHint.HASH, the HITS timings went from/to:

scale 10: 1,034 ms -> 1,106 ms
scale 12: 1,115 ms -> 1,090 ms
scale 14: 1,974 ms -> 1,608 ms
scale 16: 5,843 ms -> 3,841 ms
scale 18: 21,927 ms -> 13,345 ms
scale 20: 93,488 ms -> 54,570 ms

Impressive [~ggevay]!

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-4128) compile error about git-commit-id-plugin

2016-06-28 Thread Mao, Wei (JIRA)
Mao, Wei created FLINK-4128:
---

 Summary: compile error about git-commit-id-plugin
 Key: FLINK-4128
 URL: https://issues.apache.org/jira/browse/FLINK-4128
 Project: Flink
  Issue Type: Bug
Reporter: Mao, Wei


When I built with the latest Flink code, I got the following error:
{quote}
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 01:06 h
[INFO] Finished at: 2016-06-28T22:11:58+08:00
[INFO] Final Memory: 104M/3186M
[INFO] 
[ERROR] Failed to execute goal 
pl.project13.maven:git-commit-id-plugin:2.1.5:revision (default) on project 
flink-runtime_2.11: Execution default of goal 
pl.project13.maven:git-commit-id-plugin:2.1.5:revision failed. 
NullPointerException -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :flink-runtime_2.11
{quote}

I think it's because a wrong `dotGitDirectory` value is provided.

And another question is whether we should upgrade the version of this plugin, so 
that we can get a more meaningful error message instead of an NPE, e.g.:
{quote}
Could not get HEAD Ref, are you sure you have some commits in the 
dotGitDirectory?
{quote}

The current stable version is 2.2.1, but the disadvantage is that Java 1.6 is no 
longer supported by the new version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2179: [Flink-4128]fix flink-runtime compile error about ...

2016-06-28 Thread mwws
GitHub user mwws opened a pull request:

https://github.com/apache/flink/pull/2179

[Flink-4128]fix flink-runtime compile error about git-commit-id-plugin

When I built with the latest Flink code, I got the following error:

> [INFO] 

> [INFO] BUILD FAILURE
> [INFO] 

> [INFO] Total time: 01:06 h
> [INFO] Finished at: 2016-06-28T22:11:58+08:00
> [INFO] Final Memory: 104M/3186M
> [INFO] 

> [ERROR] Failed to execute goal 
pl.project13.maven:git-commit-id-plugin:2.1.5:revision (default) on project 
flink-runtime_2.11: Execution default of goal 
pl.project13.maven:git-commit-id-plugin:2.1.5:revision failed. 
NullPointerException -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the 
-e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, 
please read the following articles:
> [ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
command
> [ERROR]   mvn  -rf :flink-runtime_2.11

I think it's because a wrong `dotGitDirectory` value is provided. I have fixed 
it and the compile is fine for me now.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mwws/flink git-plugin

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2179.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2179


commit ac2bccfd416e5ce5343c27d9b8c33959271977b3
Author: unknown 
Date:   2016-06-28T16:50:26Z

fix flink runtime compile error about git-commit-id-plugin




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm

2016-06-28 Thread Greg Hogan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353348#comment-15353348
 ] 

Greg Hogan commented on FLINK-3879:
---

I had not realized PR #1517 was defaulting to the old sort-based combine rather 
than the new hash-based combine.

> Native implementation of HITS algorithm
> ---
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is 
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on 
> the link information among a set of documents. The algorithm presumes that a 
> good hub is a document that points to many others, and a good authority is a 
> document that many documents point to." 
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence, 
> outputting both hub and authority scores, and completing in half the number 
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...

2016-06-28 Thread fobeligi
GitHub user fobeligi opened a pull request:

https://github.com/apache/flink/pull/2178

[Flink-1815] Add methods to read and write a Graph as adjacency list

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [ ] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fobeligi/incubator-flink FLINK-1815

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2178.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2178


commit 3a9502da61b7758e1383803d5141a16fe3a5777a
Author: fobeligi 
Date:   2016-06-22T16:11:23Z

[FLINK-1815] Add GraphAdjacencyListReader class to read an Adjacency List 
formatted text file. Moreover, add writeAsAdjacencyList method to Graph. Test 
cases are also added for each new method.

commit 8aab5b40e031b132c46782a5908d58cc6290892f
Author: fobeligi 
Date:   2016-06-28T08:49:03Z

[FLINK-1815] Add fromAdjacencyListFile and writeAsAdjacencyList methods to 
Graph scala API. Tests are also added.
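
A hedged usage sketch of the two new methods (the `types(...)` call is assumed 
by analogy with `GraphCsvReader`, and the file paths are placeholders):
```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.graph.Graph;

public class AdjacencyListIOSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Read: fix the vertex id, vertex value and edge value types to get a Graph.
        Graph<Long, String, Double> graph = Graph
            .fromAdjacencyListFile("/tmp/graph-in.adj", env)
            .types(Long.class, String.class, Double.class);

        // Write with the default delimiters ("\t", ",", "-").
        graph.writeAsAdjacencyList("/tmp/graph-out.adj");
        env.execute();
    }
}
```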




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2146: [FLINK-1550/FLINK-4057] Add JobManager Metrics

2016-06-28 Thread zentol
Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2146
  
I moved the checkpoint metrics into the Tracker (and reverted the changes 
to ExecutionGraph). Currently trying it out locally.

Regarding the exception catching in the metrics: I can't decide whether we 
should try to write all metrics in such a way that they can't throw exceptions, 
or write the reporters in such a way that they can deal with it (usually by 
logging the exception). The first option is safer considering custom reporters, 
but the second allows us to properly log them.

Regarding a test: While I agree that such a test would be nice, I can only 
come up with icky ways to test it. You have to access the metric _while a job 
is running_, as the metrics are removed afterwards. So you either have to submit 
a job that blocks until _something_ happens, or you add a reporter that feeds 
that information back to the test _somehow_.
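    
A sketch of the second option, simplified to the essentials (only 
`Gauge#getValue` is from the metrics API; the report loop is invented):
```java
import org.apache.flink.metrics.Gauge;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Map;

public class GuardedReporterSketch {
    private static final Logger LOG = LoggerFactory.getLogger(GuardedReporterSketch.class);

    static void report(Map<String, Gauge<?>> gauges) {
        for (Map.Entry<String, Gauge<?>> entry : gauges.entrySet()) {
            try {
                LOG.info("{} = {}", entry.getKey(), entry.getValue().getValue());
            } catch (RuntimeException e) {
                // One throwing gauge cannot break the whole report run.
                LOG.warn("Could not report metric {}", entry.getKey(), e);
            }
        }
    }
}
```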


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1550) Show JVM Metrics for JobManager

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353280#comment-15353280
 ] 

ASF GitHub Bot commented on FLINK-1550:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/2146
  
I moved the checkpoint metrics into the Tracker (and reverted the changes 
to ExecutionGraph). Currently trying it out locally.

Regarding the exception catching in the metrics: I can't decide whether we 
should try to write all metrics in such a way that they can't throw exceptions, 
or write the reporters in such a way that they can deal with it (usually by 
logging the exception). The first option is safer considering custom reporters, 
but the second allows us to properly log them.

Regarding a test: While I agree that such a test would be nice, I can only 
come up with icky ways to test it. You have to access the metric _while a job 
is running_, as the metrics are removed afterwards. So you either have to submit 
a job that blocks until _something_ happens, or you add a reporter that feeds 
that information back to the test _somehow_.


> Show JVM Metrics for JobManager
> ---
>
> Key: FLINK-1550
> URL: https://issues.apache.org/jira/browse/FLINK-1550
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, Metrics
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353092#comment-15353092
 ] 

ASF GitHub Bot commented on FLINK-4085:
---

Github user tzulitai commented on the issue:

https://github.com/apache/flink/pull/2175
  
LGTM ;)


> Set Kinesis Consumer Agent to Flink
> ---
>
> Key: FLINK-4085
> URL: https://issues.apache.org/jira/browse/FLINK-4085
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 1.1.0
>
>
> Currently, we use the default Kinesis Agent name.
> I was asked by Amazon to set it to something containing Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4127) Clean up configuration and check breaking API changes

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353027#comment-15353027
 ] 

ASF GitHub Bot commented on FLINK-4127:
---

GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/2177

[FLINK-4127] Check API compatbility for 1.1 in flink-core

I checked all the newly introduced methods in public APIs by going through 
the reports generated from japicmp.
I've also put the reports (before my PR) into the JIRA: 
https://issues.apache.org/jira/browse/FLINK-4127

I added the new configuration parameters to the documentation, and renamed 
some new configuration keys.

@uce @tillrohrmann @mxm: What do you think about the renaming of the 
configuration keys?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink4127

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2177.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2177


commit b4073b93c9be271068faa97a139b20b9e9d6e356
Author: Robert Metzger 
Date:   2016-06-28T13:12:05Z

[FLINK-4127] Check API compatibility for 1.1 in flink-core




> Clean up configuration and check breaking API changes
> -
>
> Key: FLINK-4127
> URL: https://issues.apache.org/jira/browse/FLINK-4127
> Project: Flink
>  Issue Type: Improvement
>Reporter: Robert Metzger
> Fix For: 1.1.0
>
> Attachments: flink-core.html, flink-java.html, flink-scala.html, 
> flink-streaming-java.html, flink-streaming-scala.html
>
>
> For the upcoming 1.1 release, I'll check if there are any breaking API 
> changes and if the documentation is up to date with the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2177: [FLINK-4127] Check API compatibility for 1.1 in fli...

2016-06-28 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/2177

[FLINK-4127] Check API compatibility for 1.1 in flink-core

I checked all the newly introduced methods in public APIs by going through 
the reports generated from japicmp.
I've also put the reports (before my PR) into the JIRA: 
https://issues.apache.org/jira/browse/FLINK-4127

I added the new configuration parameters to the documentation, and renamed 
some new configuration keys.

@uce @tillrohrmann @mxm: What do you think about the renaming of the 
configuration keys?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink4127

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2177.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2177


commit b4073b93c9be271068faa97a139b20b9e9d6e356
Author: Robert Metzger 
Date:   2016-06-28T13:12:05Z

[FLINK-4127] Check API compatibility for 1.1 in flink-core




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2352) [Graph Visualization] Integrate Gelly with Gephi

2016-06-28 Thread Greg Hogan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352996#comment-15352996
 ] 

Greg Hogan commented on FLINK-2352:
---

I do not think that this ticket is a good fit for Flink. Visualization of 
results is better left to external projects such as Apache Zeppelin, which can 
process output from multiple data engines.

> [Graph Visualization] Integrate Gelly with Gephi
> 
>
> Key: FLINK-2352
> URL: https://issues.apache.org/jira/browse/FLINK-2352
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10.0
>Reporter: Andra Lungu
>
> This integration will allow users to see the real-time progress of their 
> graph. They could also visually verify results for clustering algorithms, for 
> example. Gephi is free/open-source and provides support for all types of 
> networks, including dynamic and hierarchical graphs. 
> A first step would be to add the Gephi Toolkit to the pom.xml.
> https://github.com/gephi/gephi-toolkit
> Afterwards, a GraphBuilder similar to this one
> https://github.com/palmerabollo/test-twitter-graph/blob/master/src/main/java/es/guido/twitter/graph/GraphBuilder.java
> can be implemented. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352988#comment-15352988
 ] 

ASF GitHub Bot commented on FLINK-4085:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2175#discussion_r68757090
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java
 ---
@@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) {
this.configProps = checkNotNull(configProps);
 
this.regionId = 
configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION);
+   ClientConfigurationFactory configurationFactory = new 
ClientConfigurationFactory();
+   ClientConfiguration config = configurationFactory.getConfig();
+   
if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) {
--- End diff --

Will undo this as well


> Set Kinesis Consumer Agent to Flink
> ---
>
> Key: FLINK-4085
> URL: https://issues.apache.org/jira/browse/FLINK-4085
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 1.1.0
>
>
> Currently, we use the default Kinesis Agent name.
> I was asked by Amazon to set it to something containing Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352989#comment-15352989
 ] 

ASF GitHub Bot commented on FLINK-4085:
---

Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2175
  
Thank you for the review. It seems that I'm not that focused today ;)


> Set Kinesis Consumer Agent to Flink
> ---
>
> Key: FLINK-4085
> URL: https://issues.apache.org/jira/browse/FLINK-4085
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 1.1.0
>
>
> Currently, we use the default Kinesis Agent name.
> I was asked by Amazon to set it to something containing Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2175: [FLINK-4085][Kinesis] Set Flink-specific user agent

2016-06-28 Thread rmetzger
Github user rmetzger commented on the issue:

https://github.com/apache/flink/pull/2175
  
Thank you for the review. It seems that I'm not that focused today ;)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352986#comment-15352986
 ] 

ASF GitHub Bot commented on FLINK-4085:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2175#discussion_r68757009
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java
 ---
@@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) {
this.configProps = checkNotNull(configProps);
 
this.regionId = 
configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION);
+   ClientConfigurationFactory configurationFactory = new 
ClientConfigurationFactory();
+   ClientConfiguration config = configurationFactory.getConfig();
+   
if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) {
+   // set specific user agent
+   config.setUserAgent("Apache Flink " + 
EnvironmentInformation.getVersion() + " (" + 
EnvironmentInformation.getRevisionInformation().commitId + ") Kinesis 
Connector");
+   }
AmazonKinesisClient client = new 
AmazonKinesisClient(AWSUtil.getCredentialsProvider(configProps).getCredentials());
--- End diff --

oh, indeed ;)


> Set Kinesis Consumer Agent to Flink
> ---
>
> Key: FLINK-4085
> URL: https://issues.apache.org/jira/browse/FLINK-4085
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 1.1.0
>
>
> Currently, we use the default Kinesis Agent name.
> I was asked by Amazon to set it to something containing Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2175: [FLINK-4085][Kinesis] Set Flink-specific user agen...

2016-06-28 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2175#discussion_r68757090
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java
 ---
@@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) {
this.configProps = checkNotNull(configProps);
 
this.regionId = 
configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION);
+   ClientConfigurationFactory configurationFactory = new 
ClientConfigurationFactory();
+   ClientConfiguration config = configurationFactory.getConfig();
+   
if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) {
--- End diff --

Will undo this as well


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2175: [FLINK-4085][Kinesis] Set Flink-specific user agen...

2016-06-28 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2175#discussion_r68757009
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java
 ---
@@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) {
this.configProps = checkNotNull(configProps);
 
this.regionId = 
configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION);
+   ClientConfigurationFactory configurationFactory = new 
ClientConfigurationFactory();
+   ClientConfiguration config = configurationFactory.getConfig();
+   
if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) {
+   // set specific user agent
+   config.setUserAgent("Apache Flink " + 
EnvironmentInformation.getVersion() + " (" + 
EnvironmentInformation.getRevisionInformation().commitId + ") Kinesis 
Connector");
+   }
AmazonKinesisClient client = new 
AmazonKinesisClient(AWSUtil.getCredentialsProvider(configProps).getCredentials());
--- End diff --

oh, indeed ;)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2176: [FLINK-4118] The docker-flink image is outdated (1.0.2) a...

2016-06-28 Thread iemejia
Github user iemejia commented on the issue:

https://github.com/apache/flink/pull/2176
  
Sorry I had to rebase my previous PR but this is the definitive one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4118) The docker-flink image is outdated (1.0.2) and can be slimmed down

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352966#comment-15352966
 ] 

ASF GitHub Bot commented on FLINK-4118:
---

Github user iemejia commented on the issue:

https://github.com/apache/flink/pull/2176
  
Sorry I had to rebase my previous PR but this is the definitive one.


> The docker-flink image is outdated (1.0.2) and can be slimmed down
> --
>
> Key: FLINK-4118
> URL: https://issues.apache.org/jira/browse/FLINK-4118
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ismaël Mejía
>Priority: Minor
>
> This issue is to upgrade the docker image and polish some details in it (e.g. 
> it can be slimmed down if we remove some unneeded dependencies, and the code 
> can be polished).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352961#comment-15352961
 ] 

ASF GitHub Bot commented on FLINK-4085:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2175#discussion_r68752991
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java
 ---
@@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) {
this.configProps = checkNotNull(configProps);
 
this.regionId = 
configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION);
+   ClientConfigurationFactory configurationFactory = new 
ClientConfigurationFactory();
+   ClientConfiguration config = configurationFactory.getConfig();
+   
if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) {
--- End diff --

Why do we have to check if it equals the default user agent before setting 
it to Flink?


> Set Kinesis Consumer Agent to Flink
> ---
>
> Key: FLINK-4085
> URL: https://issues.apache.org/jira/browse/FLINK-4085
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 1.1.0
>
>
> Currently, we use the default Kinesis Agent name.
> I was asked by Amazon to set it to something containing Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2175: [FLINK-4085][Kinesis] Set Flink-specific user agen...

2016-06-28 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2175#discussion_r68752991
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java
 ---
@@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) {
this.configProps = checkNotNull(configProps);
 
this.regionId = 
configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION);
+   ClientConfigurationFactory configurationFactory = new 
ClientConfigurationFactory();
+   ClientConfiguration config = configurationFactory.getConfig();
+   
if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) {
--- End diff --

Why do we have to check if it equals the default user agent before setting 
it to Flink?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352957#comment-15352957
 ] 

ASF GitHub Bot commented on FLINK-4085:
---

Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2175#discussion_r68752667
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java
 ---
@@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) {
this.configProps = checkNotNull(configProps);
 
this.regionId = 
configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION);
+   ClientConfigurationFactory configurationFactory = new 
ClientConfigurationFactory();
+   ClientConfiguration config = configurationFactory.getConfig();
+   
if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) {
+   // set specific user agent
+   config.setUserAgent("Apache Flink " + 
EnvironmentInformation.getVersion() + " (" + 
EnvironmentInformation.getRevisionInformation().commitId + ") Kinesis 
Connector");
+   }
AmazonKinesisClient client = new 
AmazonKinesisClient(AWSUtil.getCredentialsProvider(configProps).getCredentials());
--- End diff --

The client isn't using the new `ClientConfiguration`.
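
For reference, the AWS SDK v1 client offers a constructor that also takes a 
`ClientConfiguration`; a minimal sketch of the presumably intended wiring 
(identifiers such as `AWSUtil` and `configProps` are taken from the diff 
above, and the agent string is shortened here):

    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.ClientConfigurationFactory;
    import com.amazonaws.services.kinesis.AmazonKinesisClient;

    ClientConfiguration config = new ClientConfigurationFactory().getConfig();
    if (ClientConfiguration.DEFAULT_USER_AGENT.equals(config.getUserAgent())) {
        // only override the default, so a user-customized agent is preserved
        config.setUserAgent("Apache Flink Kinesis Connector");
    }
    // pass the configuration to the client so the user agent is actually sent
    AmazonKinesisClient client = new AmazonKinesisClient(
        AWSUtil.getCredentialsProvider(configProps).getCredentials(), config);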


> Set Kinesis Consumer Agent to Flink
> ---
>
> Key: FLINK-4085
> URL: https://issues.apache.org/jira/browse/FLINK-4085
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 1.1.0
>
>
> Currently, we use the default Kinesis Agent name.
> I was asked by Amazon to set it to something containing Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2175: [FLINK-4085][Kinesis] Set Flink-specific user agen...

2016-06-28 Thread tzulitai
Github user tzulitai commented on a diff in the pull request:

https://github.com/apache/flink/pull/2175#discussion_r68752667
  
--- Diff: 
flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java
 ---
@@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) {
this.configProps = checkNotNull(configProps);
 
this.regionId = 
configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION);
+   ClientConfigurationFactory configurationFactory = new 
ClientConfigurationFactory();
+   ClientConfiguration config = configurationFactory.getConfig();
+   
if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) {
+   // set specific user agent
+   config.setUserAgent("Apache Flink " + 
EnvironmentInformation.getVersion() + " (" + 
EnvironmentInformation.getRevisionInformation().commitId + ") Kinesis 
Connector");
+   }
AmazonKinesisClient client = new 
AmazonKinesisClient(AWSUtil.getCredentialsProvider(configProps).getCredentials());
--- End diff --

The client isn't using the new `ClientConfiguration`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4118) The docker-flink image is outdated (1.0.2) and can be slimmed down

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352950#comment-15352950
 ] 

ASF GitHub Bot commented on FLINK-4118:
---

Github user iemejia commented on the issue:

https://github.com/apache/flink/pull/2176
  
The docker image script was simplified and the image size was reduced.

Previous image:
flink latest 6475add651c7 24 minutes ago 711.6 MB

Image after FLINK-4118
flink latest 555e60f24c10 20 seconds ago 252.5 MB



> The docker-flink image is outdated (1.0.2) and can be slimmed down
> --
>
> Key: FLINK-4118
> URL: https://issues.apache.org/jira/browse/FLINK-4118
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ismaël Mejía
>Priority: Minor
>
> This issue is to upgrade the docker image and polish some details in it (e.g. 
> it can be slimmed down if we remove some unneeded dependencies, and the code 
> can be polished).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2176: Flink 4118

2016-06-28 Thread iemejia
Github user iemejia commented on the issue:

https://github.com/apache/flink/pull/2176
  
The docker image script was simplified and the image size was reduced.

Previous image:
flink latest 6475add651c7 24 minutes ago 711.6 MB

Image after FLINK-4118
flink latest 555e60f24c10 20 seconds ago 252.5 MB



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2176: Flink 4118

2016-06-28 Thread iemejia
GitHub user iemejia opened a pull request:

https://github.com/apache/flink/pull/2176

Flink 4118

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/iemejia/flink FLINK-4118

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2176.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2176


commit 7f7f24a1c31aa3e9b011fb492280389ac500fff4
Author: Ismaël Mejía 
Date:   2016-06-23T16:39:50Z

[FLINK-4118] Update docker image to 1.0.3 and remove unneeded deps

Some of the changes include:

- Remove unneeded dependencies (nano, wget)
- Remove apt lists to reduce image size
- Reduce number of layers on the docker image (best docker practice)
- Remove useless variables and base the code on generic ones, e.g.
FLINK_HOME
- Change the default JDK from oracle to openjdk-8-jre-headless, for
two reasons:

1. You cannot legally repackage the oracle jdk in docker images
2. The open-jdk headless is more appropriate for a server image (no GUI 
stuff)

- Return port assignment to the standard Flink ones:

Variable: docker-flink -> flink

taskmanager.rpc.port: 6121 -> 6122
taskmanager.data.port: 6122 -> 6121
jobmanager.web.port: 8080 -> 8081

commit 04c18edbbb6b109c1b23c28c17f82bde080b8686
Author: Ismaël Mejía 
Date:   2016-06-24T14:52:22Z

[FLINK-4118] Base the image on the official java alpine and remove ssh

commit eff306e80ffe605f6f94d2b5764520fe6462b1f7
Author: Ismaël Mejía 
Date:   2016-06-27T11:23:10Z

[FLINK-4118] Remove unused configuration files and fix README

commit e386b5e99c086e3279454a3e574edc7a02c54cdb
Author: Ismaël Mejía 
Date:   2016-06-27T15:01:13Z

[FLINK-4118] Update compose to v2 + add new entrypoint for direct execution

commit 34f18849cd610199805bc396b2ac850242f5ab6d
Author: Ismaël Mejía 
Date:   2016-06-27T16:52:48Z

[FLINK-4118] Change default entrypoint so users can run flink as a client 
too




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352937#comment-15352937
 ] 

ASF GitHub Bot commented on FLINK-4084:
---

Github user alkagin commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r68750010
  
--- Diff: flink-dist/src/main/flink-bin/bin/flink ---
@@ -17,20 +17,41 @@
 # limitations under the License.
 

 
-target="$0"
 # For the case, the executable has been directly symlinked, figure out
 # the correct bin path by following its symlink up to an upper bound.
 # Note: we can't use the readlink utility here if we want to be POSIX
 # compatible.
-iteration=0
-while [ -L "$target" ]; do
-if [ "$iteration" -gt 100 ]; then
-echo "Cannot resolve path: You have a cyclic symlink in $target."
+followSymLink() {
+local iteration=0
+local bar=$1
+while [ -L "$bar" ]; do
+if [ "$iteration" -gt 100 ]; then
+echo "Cannot resolve path: You have a cyclic symlink in $bar."
+break
+fi
+ls=`ls -ld -- "$bar"`
+bar=`expr "$ls" : '.* -> \(.*\)$'`
+iteration=$((iteration + 1))
+done
+
+echo "$bar"
+}
+
+target=$(followSymLink "$0")
+
+# Search for --configDir in the parameters and set it as FLINK_CONF_DIR
+args=("$@")
+for i in "${!args[@]}"; do
+if [ ${args[$i]} = "--configDir" ]; then
+dir=$(followSymLink "${args[$(($i+1))]}" )
+if [ -d "$dir" ]; then
+FLINK_CONF_DIR=`cd "${dir}"; pwd -P`
--- End diff --

Added the chained version; the second command runs only if the first was 
successful. I think it is more correct behaviour.


> Add configDir parameter to CliFrontend and flink shell script
> -
>
> Key: FLINK-4084
> URL: https://issues.apache.org/jira/browse/FLINK-4084
> Project: Flink
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Andrea Sella
>Priority: Minor
>
> At the moment there is no other way than the environment variable 
> FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if 
> it is started via the flink shell script. In order to improve the user 
> experience, I would propose to introduce a {{--configDir}} parameter which the 
> user can use to specify a configuration directory more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2149: [FLINK-4084] Add configDir parameter to CliFronten...

2016-06-28 Thread alkagin
Github user alkagin commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r68750010
  
--- Diff: flink-dist/src/main/flink-bin/bin/flink ---
@@ -17,20 +17,41 @@
 # limitations under the License.
 

 
-target="$0"
 # For the case, the executable has been directly symlinked, figure out
 # the correct bin path by following its symlink up to an upper bound.
 # Note: we can't use the readlink utility here if we want to be POSIX
 # compatible.
-iteration=0
-while [ -L "$target" ]; do
-if [ "$iteration" -gt 100 ]; then
-echo "Cannot resolve path: You have a cyclic symlink in $target."
+followSymLink() {
+local iteration=0
+local bar=$1
+while [ -L "$bar" ]; do
+if [ "$iteration" -gt 100 ]; then
+echo "Cannot resolve path: You have a cyclic symlink in $bar."
+break
+fi
+ls=`ls -ld -- "$bar"`
+bar=`expr "$ls" : '.* -> \(.*\)$'`
+iteration=$((iteration + 1))
+done
+
+echo "$bar"
+}
+
+target=$(followSymLink "$0")
+
+# Search for --configDir in the parameters and set it as FLINK_CONF_DIR
+args=("$@")
+for i in "${!args[@]}"; do
+if [ ${args[$i]} = "--configDir" ]; then
+dir=$(followSymLink "${args[$(($i+1))]}" )
+if [ -d "$dir" ]; then
+FLINK_CONF_DIR=`cd "${dir}"; pwd -P`
--- End diff --

Added the chained version; the second command runs only if the first was 
successful. I think it is more correct behaviour.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2154: [FLINK-4062] Update Windowing Documentation

2016-06-28 Thread kl0u
Github user kl0u commented on the issue:

https://github.com/apache/flink/pull/2154
  
Thanks for updating/fixing the documentation @aljoscha ! 
I had some comments that I posted here. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4062) Update Windowing Documentation

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352922#comment-15352922
 ] 

ASF GitHub Bot commented on FLINK-4062:
---

Github user kl0u commented on the issue:

https://github.com/apache/flink/pull/2154
  
Thanks for updating/fixing the documentation @aljoscha ! 
I had some comments that I posted here. 


> Update Windowing Documentation
> --
>
> Key: FLINK-4062
> URL: https://issues.apache.org/jira/browse/FLINK-4062
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.1.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> The window documentation could be a bit more principled and also needs 
> updating with the new allowed lateness setting.
> There is also essentially no documentation about how to write a custom 
> trigger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3034) Redis SInk Connector

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352920#comment-15352920
 ] 

ASF GitHub Bot commented on FLINK-3034:
---

Github user subhankarb commented on the issue:

https://github.com/apache/flink/pull/1813
  
@mjsax, @rmetzger please review. The changed model is described in the PR 
description.

thanks,
subhankar


> Redis SInk Connector
> 
>
> Key: FLINK-3034
> URL: https://issues.apache.org/jira/browse/FLINK-3034
> Project: Flink
>  Issue Type: New Feature
>  Components: Streaming Connectors
>Reporter: Matthias J. Sax
>Assignee: Subhankar Biswas
>Priority: Minor
>
> Flink does not provide a sink connector for Redis.
> See FLINK-3033



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #1813: [FLINK-3034] Redis Sink Connector

2016-06-28 Thread subhankarb
Github user subhankarb commented on the issue:

https://github.com/apache/flink/pull/1813
  
@mjsax, @rmetzger please review. The changed model is described in the PR 
description.

thanks,
subhankar


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352914#comment-15352914
 ] 

ASF GitHub Bot commented on FLINK-4084:
---

Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r68746527
  
--- Diff: flink-dist/src/main/flink-bin/bin/flink ---
@@ -17,20 +17,41 @@
 # limitations under the License.
 

 
-target="$0"
 # For the case, the executable has been directly symlinked, figure out
 # the correct bin path by following its symlink up to an upper bound.
 # Note: we can't use the readlink utility here if we want to be POSIX
 # compatible.
-iteration=0
-while [ -L "$target" ]; do
-if [ "$iteration" -gt 100 ]; then
-echo "Cannot resolve path: You have a cyclic symlink in $target."
+followSymLink() {
+local iteration=0
+local bar=$1
+while [ -L "$bar" ]; do
+if [ "$iteration" -gt 100 ]; then
+echo "Cannot resolve path: You have a cyclic symlink in $bar."
+break
+fi
+ls=`ls -ld -- "$bar"`
+bar=`expr "$ls" : '.* -> \(.*\)$'`
+iteration=$((iteration + 1))
+done
+
+echo "$bar"
+}
+
+target=$(followSymLink "$0")
+
+# Search for --configDir in the parameters and set it as FLINK_CONF_DIR
+args=("$@")
+for i in "${!args[@]}"; do
+if [ ${args[$i]} = "--configDir" ]; then
+dir=$(followSymLink "${args[$(($i+1))]}" )
+if [ -d "$dir" ]; then
+FLINK_CONF_DIR=`cd "${dir}"; pwd -P`
--- End diff --

I suppose it doesn't matter much but I like the chained version better.


> Add configDir parameter to CliFrontend and flink shell script
> -
>
> Key: FLINK-4084
> URL: https://issues.apache.org/jira/browse/FLINK-4084
> Project: Flink
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Andrea Sella
>Priority: Minor
>
> At the moment there is no other way than the environment variable 
> FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if 
> it is started via the flink shell script. In order to improve the user 
> experience, I would propose to introduce a {{--configDir}} parameter which the 
> user can use to specify a configuration directory more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2149: [FLINK-4084] Add configDir parameter to CliFronten...

2016-06-28 Thread mxm
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r68746527
  
--- Diff: flink-dist/src/main/flink-bin/bin/flink ---
@@ -17,20 +17,41 @@
 # limitations under the License.
 

 
-target="$0"
 # For the case, the executable has been directly symlinked, figure out
 # the correct bin path by following its symlink up to an upper bound.
 # Note: we can't use the readlink utility here if we want to be POSIX
 # compatible.
-iteration=0
-while [ -L "$target" ]; do
-if [ "$iteration" -gt 100 ]; then
-echo "Cannot resolve path: You have a cyclic symlink in $target."
+followSymLink() {
+local iteration=0
+local bar=$1
+while [ -L "$bar" ]; do
+if [ "$iteration" -gt 100 ]; then
+echo "Cannot resolve path: You have a cyclic symlink in $bar."
+break
+fi
+ls=`ls -ld -- "$bar"`
+bar=`expr "$ls" : '.* -> \(.*\)$'`
+iteration=$((iteration + 1))
+done
+
+echo "$bar"
+}
+
+target=$(followSymLink "$0")
+
+# Search for --configDir in the parameters and set it as FLINK_CONF_DIR
+args=("$@")
+for i in "${!args[@]}"; do
+if [ ${args[$i]} = "--configDir" ]; then
+dir=$(followSymLink "${args[$(($i+1))]}" )
+if [ -d "$dir" ]; then
+FLINK_CONF_DIR=`cd "${dir}"; pwd -P`
--- End diff --

I suppose it doesn't matter much but I like the chained version better.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-4127) Clean up configuration and check breaking API changes

2016-06-28 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-4127:
--
Attachment: flink-streaming-scala.html
flink-streaming-java.html
flink-scala.html

I added the japicmp reports for the covered modules.

> Clean up configuration and check breaking API changes
> -
>
> Key: FLINK-4127
> URL: https://issues.apache.org/jira/browse/FLINK-4127
> Project: Flink
>  Issue Type: Improvement
>Reporter: Robert Metzger
> Fix For: 1.1.0
>
> Attachments: flink-core.html, flink-java.html, flink-scala.html, 
> flink-streaming-java.html, flink-streaming-scala.html
>
>
> For the upcoming 1.1 release, I'll check if there are any breaking API 
> changes and if the documentation is up to date with the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4062) Update Windowing Documentation

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352896#comment-15352896
 ] 

ASF GitHub Bot commented on FLINK-4062:
---

Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/2154#discussion_r68744858
  
--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@ specific language governing permissions and 
limitations
 under the License.
 -->
 
+Flink uses a concept called *windows* to divide a (potentially) infinite 
`DataStream` into slices
+based on the timestamps of elements or other criteria. This division is 
required when working
+with infinite streams of data and performing transformations that 
aggregate elements.
+
+Info: We will mostly talk about *keyed windowing* here; this means that the
+elements are subdivided based on both window and key before being given to a
+user function. Keyed windows have the advantage that work can be distributed
+across the cluster because the elements for different keys can be processed
+in isolation. If you absolutely must, you can check out
+[non-keyed windowing](#non-keyed-windowing) where we describe how non-keyed
+windows work.
+
 * This will be replaced by the TOC
 {:toc}
 
-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. 
All of these group elements *per key*,
-i.e., each window will contain elements with the same key value.
+## Basics
 
-### Basic Window Constructs
+For a windowed transformation you must at least specify a *key* (usually in
+the form of a `KeySelector`), a *window assigner*, and a *window function*.
+The *key* specifies how elements are put into groups. The *window assigner*
+specifies how the infinite stream is divided into windows. Finally, the
+*window function* is used to process the elements of each window.
 
-Flink offers a general window mechanism that provides flexibility, as well 
as a number of pre-defined windows
-for common use cases. See first if your use case can be served by the 
pre-defined windows below before moving
-to defining your own windows.
+The basic structure of a windowed transformation is thus as follows:
 
 
 
+{% highlight java %}
+DataStream<T> input = ...;
 
-  Pre-defined windows on a KeyedStream, each producing a WindowedStream
-  (Transformation / Description):
-
-  * Tumbling time window: defines a window of 5 seconds, that "tumbles".
-    Elements are grouped according to their timestamp in groups of 5 second
-    duration, and every element belongs to exactly one window. The notion of
-    time is specified by the selected TimeCharacteristic (see time).
-        keyedStream.timeWindow(Time.seconds(5));
-
-  * Sliding time window: defines a window of 5 seconds, that "slides" by
-    1 second. Elements are grouped according to their timestamp in groups of
-    5 second duration, and elements can belong to more than one window
-    (since windows overlap by at most 4 seconds). The notion of time is
-    specified by the selected TimeCharacteristic (see time).
-        keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
-
-  * Tumbling count window: defines a window of 1000 elements, that "tumbles".
-    Elements are grouped according to their arrival time (equivalent to
-    processing time) in groups of 1000 elements, and every element belongs
-    to exactly one window.
-        keyedStream.countWindow(1000);
-
-  * Sliding count window: defines a window of 1000 elements, that "slides"
-    every 100 elements. Elements are grouped according to their arrival time
-    (equivalent to processing time) in groups of 1000 elements, and every
-    element can belong to more than one window (as windows overlap by at
-    most 900 elements).
-        keyedStream.countWindow(1000, 100)
+input
+    .keyBy(<key selector>)
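
A concrete instance of this skeleton, assuming the Flink 1.1 DataStream API 
(the input data and job name here are purely illustrative), might look like:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class WindowSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // key: tuple field 0; window assigner: 5 second tumbling
            // processing-time windows; window function: a sum over field 1
            DataStream<Tuple2<String, Integer>> counts = env
                .fromElements(Tuple2.of("a", 1), Tuple2.of("b", 1), Tuple2.of("a", 1))
                .keyBy(0)
                .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
                .sum(1);

            counts.print();
            env.execute("window sketch");
        }
    }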

[jira] [Commented] (FLINK-4075) ContinuousFileProcessingCheckpointITCase failed on Travis

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352897#comment-15352897
 ] 

ASF GitHub Bot commented on FLINK-4075:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/2174
  
Thanks for fixing this! I had some comments but they should be easy to 
address.


> ContinuousFileProcessingCheckpointITCase failed on Travis
> -
>
> Key: FLINK-4075
> URL: https://issues.apache.org/jira/browse/FLINK-4075
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Kostas Kloudas
>Priority: Critical
>  Labels: test-stability
>
> The test case {{ContinuousFileProcessingCheckpointITCase}} failed on Travis.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/137748004/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2174: [FLINK-4075] ContinuousFileProcessingCheckpointITCase fai...

2016-06-28 Thread aljoscha
Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/2174
  
Thanks for fixing this! I had some comments but they should be easy to 
address.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2154: [FLINK-4062] Update Windowing Documentation

2016-06-28 Thread kl0u
Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/2154#discussion_r68744858
  
--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@ specific language governing permissions and 
limitations
 under the License.
 -->
 
+Flink uses a concept called *windows* to divide a (potentially) infinite 
`DataStream` into slices
+based on the timestamps of elements or other criteria. This division is 
required when working
+with infinite streams of data and performing transformations that 
aggregate elements.
+
+Info: We will mostly talk about *keyed windowing* here; this means that the
+elements are subdivided based on both window and key before being given to a
+user function. Keyed windows have the advantage that work can be distributed
+across the cluster because the elements for different keys can be processed
+in isolation. If you absolutely must, you can check out
+[non-keyed windowing](#non-keyed-windowing) where we describe how non-keyed
+windows work.
+
 * This will be replaced by the TOC
 {:toc}
 
-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. 
All of these group elements *per key*,
-i.e., each window will contain elements with the same key value.
+## Basics
 
-### Basic Window Constructs
+For a windowed transformation you must at least specify a *key* (usually in
+the form of a `KeySelector`), a *window assigner*, and a *window function*.
+The *key* specifies how elements are put into groups. The *window assigner*
+specifies how the infinite stream is divided into windows. Finally, the
+*window function* is used to process the elements of each window.
 
-Flink offers a general window mechanism that provides flexibility, as well 
as a number of pre-defined windows
-for common use cases. See first if your use case can be served by the 
pre-defined windows below before moving
-to defining your own windows.
+The basic structure of a windowed transformation is thus as follows:
 
 
 
+{% highlight java %}
+DataStream<T> input = ...;
 
-  Pre-defined windows on a KeyedStream, each producing a WindowedStream
-  (Transformation / Description):
-
-  * Tumbling time window: defines a window of 5 seconds, that "tumbles".
-    Elements are grouped according to their timestamp in groups of 5 second
-    duration, and every element belongs to exactly one window. The notion of
-    time is specified by the selected TimeCharacteristic (see time).
-        keyedStream.timeWindow(Time.seconds(5));
-
-  * Sliding time window: defines a window of 5 seconds, that "slides" by
-    1 second. Elements are grouped according to their timestamp in groups of
-    5 second duration, and elements can belong to more than one window
-    (since windows overlap by at most 4 seconds). The notion of time is
-    specified by the selected TimeCharacteristic (see time).
-        keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
-
-  * Tumbling count window: defines a window of 1000 elements, that "tumbles".
-    Elements are grouped according to their arrival time (equivalent to
-    processing time) in groups of 1000 elements, and every element belongs
-    to exactly one window.
-        keyedStream.countWindow(1000);
-
-  * Sliding count window: defines a window of 1000 elements, that "slides"
-    every 100 elements. Elements are grouped according to their arrival time
-    (equivalent to processing time) in groups of 1000 elements, and every
-    element can belong to more than one window (as windows overlap by at
-    most 900 elements).
-        keyedStream.countWindow(1000, 100)
+input
+    .keyBy(<key selector>)
+    .window(<window assigner>)
+    .<windowed transformation>(<window function>);
+{% endhighlight %}
 
 
 
+{% highlight scala %}
+val input: DataStream[T] = ...
 
-  Transformation / Description (header of the corresponding Scala table)

[jira] [Commented] (FLINK-4062) Update Windowing Documentation

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352894#comment-15352894
 ] 

ASF GitHub Bot commented on FLINK-4062:
---

Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/2154#discussion_r68744802
  
--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@ specific language governing permissions and 
limitations
 under the License.
 -->
 
+Flink uses a concept called *windows* to divide a (potentially) infinite 
`DataStream` into slices
+based on the timestamps of elements or other criteria. This division is 
required when working
+with infinite streams of data and performing transformations that 
aggregate elements.
+
+Info: We will mostly talk about *keyed windowing* here; this means that the
+elements are subdivided based on both window and key before being given to a
+user function. Keyed windows have the advantage that work can be distributed
+across the cluster because the elements for different keys can be processed
+in isolation. If you absolutely must, you can check out
+[non-keyed windowing](#non-keyed-windowing) where we describe how non-keyed
+windows work.
+
 * This will be replaced by the TOC
 {:toc}
 
-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. 
All of these group elements *per key*,
-i.e., each window will contain elements with the same key value.
+## Basics
 
-### Basic Window Constructs
+For a windowed transformation you must at least specify a *key* (usually in
+the form of a `KeySelector`), a *window assigner*, and a *window function*.
+The *key* specifies how elements are put into groups. The *window assigner*
+specifies how the infinite stream is divided into windows. Finally, the
+*window function* is used to process the elements of each window.
 
-Flink offers a general window mechanism that provides flexibility, as well 
as a number of pre-defined windows
-for common use cases. See first if your use case can be served by the 
pre-defined windows below before moving
-to defining your own windows.
+The basic structure of a windowed transformation is thus as follows:
 
 
 
+{% highlight java %}
+DataStream<T> input = ...;
 
-  Pre-defined windows on a KeyedStream, each producing a WindowedStream
-  (Transformation / Description):
-
-  * Tumbling time window: defines a window of 5 seconds, that "tumbles".
-    Elements are grouped according to their timestamp in groups of 5 second
-    duration, and every element belongs to exactly one window. The notion of
-    time is specified by the selected TimeCharacteristic (see time).
-        keyedStream.timeWindow(Time.seconds(5));
-
-  * Sliding time window: defines a window of 5 seconds, that "slides" by
-    1 second. Elements are grouped according to their timestamp in groups of
-    5 second duration, and elements can belong to more than one window
-    (since windows overlap by at most 4 seconds). The notion of time is
-    specified by the selected TimeCharacteristic (see time).
-        keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
-
-  * Tumbling count window: defines a window of 1000 elements, that "tumbles".
-    Elements are grouped according to their arrival time (equivalent to
-    processing time) in groups of 1000 elements, and every element belongs
-    to exactly one window.
-        keyedStream.countWindow(1000);
-
-  * Sliding count window: defines a window of 1000 elements, that "slides"
-    every 100 elements. Elements are grouped according to their arrival time
-    (equivalent to processing time) in groups of 1000 elements, and every
-    element can belong to more than one window (as windows overlap by at
-    most 900 elements).
-        keyedStream.countWindow(1000, 100)
+input
+    .keyBy(<key selector>)

[GitHub] flink issue #2146: [FLINK-1550/FLINK-4057] Add JobManager Metrics

2016-06-28 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2146
  
I've tested the PR locally and the JobManager metrics were shown properly. 
Good job @zentol :-)

So, what is left is a test case for the job manager metrics, the question 
whether to use the `CheckpointStatsTracker` for the checkpoint metrics, and 
the try-catch blocks around the `con.getAttribute` calls.
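
For illustration, a sketch of the try-catch variant around a JMX read (the 
helper class, `con`, and the attribute names are hypothetical stand-ins for 
the code under review):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;

    // Sketch: one broken or missing JMX attribute should not abort reading
    // the remaining metrics.
    final class JmxReadSketch {
        static Object readAttribute(MBeanServerConnection con, ObjectName bean,
                String attribute) {
            try {
                return con.getAttribute(bean, attribute);
            } catch (Exception e) {
                // either option from the discussion ends up here: swallow in
                // the metric itself, or let the reporter log the exception
                return null;
            }
        }
    }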


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-4127) Clean up configuration and check breaking API changes

2016-06-28 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-4127:
--
Attachment: flink-java.html
flink-core.html

> Clean up configuration and check breaking API changes
> -
>
> Key: FLINK-4127
> URL: https://issues.apache.org/jira/browse/FLINK-4127
> Project: Flink
>  Issue Type: Improvement
>Reporter: Robert Metzger
> Fix For: 1.1.0
>
> Attachments: flink-core.html, flink-java.html
>
>
> For the upcoming 1.1 release, I'll check if there are any breaking API 
> changes and if the documentation is up to date with the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1550) Show JVM Metrics for JobManager

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352895#comment-15352895
 ] 

ASF GitHub Bot commented on FLINK-1550:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2146
  
I've tested the PR locally and the JobManager metrics were shown properly. 
Good job @zentol :-)

So, what is left is a test case for the job manager metrics, the question 
whether to use the `CheckpointStatsTracker` for the checkpoint metrics, and 
the try-catch blocks around the `con.getAttribute` calls.


> Show JVM Metrics for JobManager
> ---
>
> Key: FLINK-1550
> URL: https://issues.apache.org/jira/browse/FLINK-1550
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, Metrics
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2154: [FLINK-4062] Update Windowing Documentation

2016-06-28 Thread kl0u
Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/2154#discussion_r68744802
  
--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@ specific language governing permissions and 
limitations
 under the License.
 -->
 
+Flink uses a concept called *windows* to divide a (potentially) infinite 
`DataStream` into slices
+based on the timestamps of elements or other criteria. This division is 
required when working
+with infinite streams of data and performing transformations that 
aggregate elements.
+
+Info: We will mostly talk about *keyed windowing* here; this means that the
+elements are subdivided based on both window and key before being given to a
+user function. Keyed windows have the advantage that work can be distributed
+across the cluster because the elements for different keys can be processed
+in isolation. If you absolutely must, you can check out
+[non-keyed windowing](#non-keyed-windowing) where we describe how non-keyed
+windows work.
+
 * This will be replaced by the TOC
 {:toc}
 
-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. 
All of these group elements *per key*,
-i.e., each window will contain elements with the same key value.
+## Basics
 
-### Basic Window Constructs
+For a windowed transformation you must at least specify a *key* (usually in
+the form of a `KeySelector`), a *window assigner*, and a *window function*.
+The *key* specifies how elements are put into groups. The *window assigner*
+specifies how the infinite stream is divided into windows. Finally, the
+*window function* is used to process the elements of each window.
 
-Flink offers a general window mechanism that provides flexibility, as well 
as a number of pre-defined windows
-for common use cases. See first if your use case can be served by the 
pre-defined windows below before moving
-to defining your own windows.
+The basic structure of a windowed transformation is thus as follows:
 
 
 
+{% highlight java %}
+DataStream<T> input = ...;
 
-  Pre-defined windows on a KeyedStream, each producing a WindowedStream
-  (Transformation / Description):
-
-  * Tumbling time window: defines a window of 5 seconds, that "tumbles".
-    Elements are grouped according to their timestamp in groups of 5 second
-    duration, and every element belongs to exactly one window. The notion of
-    time is specified by the selected TimeCharacteristic (see time).
-        keyedStream.timeWindow(Time.seconds(5));
-
-  * Sliding time window: defines a window of 5 seconds, that "slides" by
-    1 second. Elements are grouped according to their timestamp in groups of
-    5 second duration, and elements can belong to more than one window
-    (since windows overlap by at most 4 seconds). The notion of time is
-    specified by the selected TimeCharacteristic (see time).
-        keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
-
-  * Tumbling count window: defines a window of 1000 elements, that "tumbles".
-    Elements are grouped according to their arrival time (equivalent to
-    processing time) in groups of 1000 elements, and every element belongs
-    to exactly one window.
-        keyedStream.countWindow(1000);
-
-  * Sliding count window: defines a window of 1000 elements, that "slides"
-    every 100 elements. Elements are grouped according to their arrival time
-    (equivalent to processing time) in groups of 1000 elements, and every
-    element can belong to more than one window (as windows overlap by at
-    most 900 elements).
-        keyedStream.countWindow(1000, 100)
+input
+    .keyBy(<key selector>)
+    .window(<window assigner>)
+    .<windowed transformation>(<window function>);
+{% endhighlight %}
 
 
 
+{% highlight scala %}
+val input: DataStream[T] = ...
 
-(removed: start of the matching Scala table of basic window constructs; columns: Transformation, Description)
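
To make the skeleton above concrete: a minimal, self-contained sketch (not part 
of the PR) that fills in the three pieces with the pre-defined `timeWindow` 
shorthand from the removed table. The socket source, host, and port are 
illustrative assumptions.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowedWordCountSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // hypothetical input: one word per line from a socket
        DataStream<Tuple2<String, Integer>> input = env
            .socketTextStream("localhost", 9999)
            .map(new MapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public Tuple2<String, Integer> map(String word) {
                    return new Tuple2<>(word, 1);
                }
            });

        input
            .keyBy(0)                    // key: first tuple field (the word)
            .timeWindow(Time.seconds(5)) // window assigner: tumbling 5 s
            .sum(1)                      // windowed transformation + function
            .print();

        env.execute("Windowed word count (sketch)");
    }
}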

[jira] [Commented] (FLINK-1550) Show JVM Metrics for JobManager

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352889#comment-15352889
 ] 

ASF GitHub Bot commented on FLINK-1550:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2146#discussion_r68744357
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
 ---
@@ -663,6 +677,9 @@ public boolean 
receiveAcknowledgeMessage(AcknowledgeCheckpoint message) throws E
LOG.info("Completed checkpoint 
" + checkpointId + " (in " +

completed.getDuration() + " ms)");
 
+   lastCheckpointSize = 
completed.getStateSize();
+   lastCheckpointDuration = 
completed.getDuration();
--- End diff --

Would it make sense to use the `CheckpointStatsTracker` to obtain the 
values for the last checkpoint, given that the JobManager is not so 
performance-critical with respect to the calculation of the metrics?


> Show JVM Metrics for JobManager
> ---
>
> Key: FLINK-1550
> URL: https://issues.apache.org/jira/browse/FLINK-1550
> Project: Flink
>  Issue Type: Sub-task
>  Components: JobManager, Metrics
>Reporter: Robert Metzger
>Assignee: Chesnay Schepler
> Fix For: pre-apache
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
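
For reference, exposing the two values discussed above as metrics could look 
like the following: a minimal sketch, not the PR's actual code, written against 
the `Gauge` and `MetricGroup` interfaces of the new metrics system. The class 
and metric names are illustrative assumptions.

import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.MetricGroup;

// Hypothetical holder mirroring the two fields from the review; in the PR
// the values come from the completed checkpoint in the CheckpointCoordinator.
class LastCheckpointMetrics {
    private volatile long lastCheckpointSize = -1L;
    private volatile long lastCheckpointDuration = -1L;

    void onCompletedCheckpoint(long stateSize, long durationMillis) {
        lastCheckpointSize = stateSize;
        lastCheckpointDuration = durationMillis;
    }

    // register read-only gauges; the metric system polls getValue()
    void registerAt(MetricGroup group) {
        group.gauge("lastCheckpointSize", new Gauge<Long>() {
            @Override
            public Long getValue() {
                return lastCheckpointSize;
            }
        });
        group.gauge("lastCheckpointDuration", new Gauge<Long>() {
            @Override
            public Long getValue() {
                return lastCheckpointDuration;
            }
        });
    }
}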


[jira] [Closed] (FLINK-3898) Adamic-Adar Similarity

2016-06-28 Thread Greg Hogan (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Hogan closed FLINK-3898.
-
Resolution: Fixed

Implemented in f9552d8dc25564405b37f3f0b6ac9addf497b722

> Adamic-Adar Similarity
> --
>
> Key: FLINK-3898
> URL: https://issues.apache.org/jira/browse/FLINK-3898
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> The implementation of Adamic-Adar Similarity [0] is very close to Jaccard 
> Similarity. Whereas Jaccard Similarity counts common neighbors, Adamic-Adar 
> Similarity sums the inverse logarithm of the degree of common neighbors.
> Consideration will be given to the computation of the inverse logarithm, in 
> particular whether to pre-compute a small array of values.
> [0] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
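
In code, the score described in the issue is AA(u, v) = sum, over all common 
neighbors w of u and v, of 1 / log(degree(w)). A minimal standalone sketch 
(hypothetical names, not the Gelly implementation):

import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class AdamicAdarSketch {
    // score(u, v) given the neighbor sets of u and v and a degree lookup
    static double score(Set<Long> neighborsOfU, Set<Long> neighborsOfV,
                        Map<Long, Integer> degree) {
        Set<Long> common = new HashSet<>(neighborsOfU);
        common.retainAll(neighborsOfV); // common neighbors of u and v
        double sum = 0.0;
        for (long w : common) {
            // each common neighbor contributes the inverse logarithm of its
            // degree, so high-degree (unspecific) neighbors count for less
            sum += 1.0 / Math.log(degree.get(w));
        }
        return sum;
    }
}

As the issue notes, 1 / log(d) takes only a small number of distinct values for 
small d, so an implementation may precompute them into a lookup array.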


[jira] [Commented] (FLINK-4062) Update Windowing Documentation

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352887#comment-15352887
 ] 

ASF GitHub Bot commented on FLINK-4062:
---

Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/2154#discussion_r68744340
  
--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@ specific language governing permissions and 
limitations
 under the License.
 -->
 
+Flink uses a concept called *windows* to divide a (potentially) infinite 
`DataStream` into slices
+based on the timestamps of elements or other criteria. This division is 
required when working
+with infinite streams of data and performing transformations that 
aggregate elements.
+
+Info: We will mostly talk about 
*keyed windowing* here; this
+means that the elements are subdivided based on both window and key before 
being given to
+a user function. Keyed windows have the advantage that work can be 
distributed across the cluster
+because the elements for different keys can be processed in isolation. If 
you absolutely must,
+you can check out [non-keyed windowing](#non-keyed-windowing) where we 
describe how non-keyed
+windows work.
+
 * This will be replaced by the TOC
 {:toc}
 
-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. 
All of these group elements *per key*,
-i.e., each window will contain elements with the same key value.
+## Basics
 
-### Basic Window Constructs
+For a windowed transformation you must at least specify a *key* (usually 
in the form of a
+`KeySelector`), a *window assigner*, and a *window function*. The *key* 
specifies how elements are
+put into groups. The *window assigner* specifies how the infinite stream 
is divided into windows.
+Finally, the *window function* is used to process the elements of each 
window.
 
-Flink offers a general window mechanism that provides flexibility, as well 
as a number of pre-defined windows
-for common use cases. See first if your use case can be served by the 
pre-defined windows below before moving
-to defining your own windows.
+The basic structure of a windowed transformation is thus as follows:
 
 
 
+{% highlight java %}
+DataStream<T> input = ...;
 
-(removed table of basic window constructs; columns: Transformation, Description)
-
-Tumbling time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "tumbles". This means that elements are
-  grouped according to their timestamp in groups of 5 second duration, and
-  every element belongs to exactly one window. The notion of time is specified
-  by the selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5));
-{% endhighlight %}
-
-Sliding time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "slides" by 1 second. This means that
-  elements are grouped according to their timestamp in groups of 5 second
-  duration, and elements can belong to more than one window (since windows
-  overlap by at most 4 seconds). The notion of time is specified by the
-  selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
-{% endhighlight %}
-
-Tumbling count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "tumbles". This means that elements
-  are grouped according to their arrival time (equivalent to processing time)
-  in groups of 1000 elements, and every element belongs to exactly one window.
-{% highlight java %}
-keyedStream.countWindow(1000);
-{% endhighlight %}
-
-Sliding count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "slides" every 100 elements. This
-  means that elements are grouped according to their arrival time (equivalent
-  to processing time) in groups of 1000 elements, and every element can belong
-  to more than one window (as windows overlap by at most 900 elements).
-{% highlight java %}
-keyedStream.countWindow(1000, 100)
-{% endhighlight %}
+input
+    .keyBy(<key selector>)

[jira] [Commented] (FLINK-4075) ContinuousFileProcessingCheckpointITCase failed on Travis

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352886#comment-15352886
 ] 

ASF GitHub Bot commented on FLINK-4075:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2174#discussion_r68744278
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileReaderOperator.java
 ---
@@ -191,11 +197,11 @@ public void close() throws Exception {
 
private volatile boolean isSplitOpen = false;
 
-   SplitReader(FileInputFormat format,
+   private SplitReader(FileInputFormat format,
TypeSerializer serializer,
TimestampedCollector collector,
Object checkpointLock,
-   Tuple3 restoredState) {
+   Tuple3 recoveredState) {
--- End diff --

I think the old name was fine. In other parts of Flink it's also called 
restored state.


> ContinuousFileProcessingCheckpointITCase failed on Travis
> -
>
> Key: FLINK-4075
> URL: https://issues.apache.org/jira/browse/FLINK-4075
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Kostas Kloudas
>Priority: Critical
>  Labels: test-stability
>
> The test case {{ContinuousFileProcessingCheckpointITCase}} failed on Travis.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/137748004/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2146: [FLINK-1550/FLINK-4057] Add JobManager Metrics

2016-06-28 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/2146#discussion_r68744357
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java
 ---
@@ -663,6 +677,9 @@ public boolean 
receiveAcknowledgeMessage(AcknowledgeCheckpoint message) throws E
LOG.info("Completed checkpoint 
" + checkpointId + " (in " +

completed.getDuration() + " ms)");
 
+   lastCheckpointSize = 
completed.getStateSize();
+   lastCheckpointDuration = 
completed.getDuration();
--- End diff --

Would it make sense to use the `CheckpointStatsTracker` to obtain the 
values for the last checkpoint, given that the JobManager is not so 
performance-critical with respect to the calculation of the metrics?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2154: [FLINK-4062] Update Windowing Documentation

2016-06-28 Thread kl0u
Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/2154#discussion_r68744340
  
--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@ specific language governing permissions and 
limitations
 under the License.
 -->
 
+Flink uses a concept called *windows* to divide a (potentially) infinite 
`DataStream` into slices
+based on the timestamps of elements or other criteria. This division is 
required when working
+with infinite streams of data and performing transformations that 
aggregate elements.
+
+Info: We will mostly talk about 
*keyed windowing* here; this
+means that the elements are subdivided based on both window and key before 
being given to
+a user function. Keyed windows have the advantage that work can be 
distributed across the cluster
+because the elements for different keys can be processed in isolation. If 
you absolutely must,
+you can check out [non-keyed windowing](#non-keyed-windowing) where we 
describe how non-keyed
+windows work.
+
 * This will be replaced by the TOC
 {:toc}
 
-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. 
All of these group elements *per key*,
-i.e., each window will contain elements with the same key value.
+## Basics
 
-### Basic Window Constructs
+For a windowed transformation you must at least specify a *key* (usually 
in the form of a
+`KeySelector`), a *window assigner*, and a *window function*. The *key* 
specifies how elements are
+put into groups. The *window assigner* specifies how the infinite stream 
is divided into windows.
+Finally, the *window function* is used to process the elements of each 
window.
 
-Flink offers a general window mechanism that provides flexibility, as well 
as a number of pre-defined windows
-for common use cases. See first if your use case can be served by the 
pre-defined windows below before moving
-to defining your own windows.
+The basic structure of a windowed transformation is thus as follows:
 
 
 
+{% highlight java %}
+DataStream<T> input = ...;
 
-(removed table of basic window constructs; columns: Transformation, Description)
-
-Tumbling time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "tumbles". This means that elements are
-  grouped according to their timestamp in groups of 5 second duration, and
-  every element belongs to exactly one window. The notion of time is specified
-  by the selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5));
-{% endhighlight %}
-
-Sliding time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "slides" by 1 second. This means that
-  elements are grouped according to their timestamp in groups of 5 second
-  duration, and elements can belong to more than one window (since windows
-  overlap by at most 4 seconds). The notion of time is specified by the
-  selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
-{% endhighlight %}
-
-Tumbling count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "tumbles". This means that elements
-  are grouped according to their arrival time (equivalent to processing time)
-  in groups of 1000 elements, and every element belongs to exactly one window.
-{% highlight java %}
-keyedStream.countWindow(1000);
-{% endhighlight %}
-
-Sliding count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "slides" every 100 elements. This
-  means that elements are grouped according to their arrival time (equivalent
-  to processing time) in groups of 1000 elements, and every element can belong
-  to more than one window (as windows overlap by at most 900 elements).
-{% highlight java %}
-keyedStream.countWindow(1000, 100)
-{% endhighlight %}
+input
+    .keyBy(<key selector>)
+    .window(<window assigner>)
+    .<windowed transformation>(<window function>);
+{% endhighlight %}
 
 
 
+{% highlight scala %}
+val input: DataStream[T] = ...
 
-(removed: start of the matching Scala table of basic window constructs; columns: Transformation, Description)

[GitHub] flink pull request #2174: [FLINK-4075] ContinuousFileProcessingCheckpointITC...

2016-06-28 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2174#discussion_r68744213
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileReaderOperator.java
 ---
@@ -405,6 +414,9 @@ public void restoreState(StreamTaskState state, long 
recoveryTimestamp) throws E
S formatState = (S) ois.readObject();
 
// set the whole reader state for the open() to find.
+   Preconditions.checkArgument(this.readerState == null,
--- End diff --

This should also be `Preconditions.checkState`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2174: [FLINK-4075] ContinuousFileProcessingCheckpointITC...

2016-06-28 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2174#discussion_r68744278
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileReaderOperator.java
 ---
@@ -191,11 +197,11 @@ public void close() throws Exception {
 
private volatile boolean isSplitOpen = false;
 
-   SplitReader(FileInputFormat format,
+   private SplitReader(FileInputFormat format,
TypeSerializer serializer,
TimestampedCollector collector,
Object checkpointLock,
-   Tuple3 restoredState) {
+   Tuple3 recoveredState) {
--- End diff --

I think the old name was fine. In other parts of Flink it's also called 
restored state.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4075) ContinuousFileProcessingCheckpointITCase failed on Travis

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352884#comment-15352884
 ] 

ASF GitHub Bot commented on FLINK-4075:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2174#discussion_r68744213
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileReaderOperator.java
 ---
@@ -405,6 +414,9 @@ public void restoreState(StreamTaskState state, long 
recoveryTimestamp) throws E
S formatState = (S) ois.readObject();
 
// set the whole reader state for the open() to find.
+   Preconditions.checkArgument(this.readerState == null,
--- End diff --

This should also be `Preconditions.checkState`


> ContinuousFileProcessingCheckpointITCase failed on Travis
> -
>
> Key: FLINK-4075
> URL: https://issues.apache.org/jira/browse/FLINK-4075
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Kostas Kloudas
>Priority: Critical
>  Labels: test-stability
>
> The test case {{ContinuousFileProcessingCheckpointITCase}} failed on Travis.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/137748004/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4075) ContinuousFileProcessingCheckpointITCase failed on Travis

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352881#comment-15352881
 ] 

ASF GitHub Bot commented on FLINK-4075:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2174#discussion_r68744158
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileReaderOperator.java
 ---
@@ -104,7 +104,13 @@ public void open() throws Exception {
this.collector = new TimestampedCollector<>(output);
this.checkpointLock = getContainingTask().getCheckpointLock();
 
+   Preconditions.checkArgument(reader == null, "The reader is 
already initialized.");
+
this.reader = new SplitReader<>(format, serializer, collector, 
checkpointLock, readerState);
+
+   // after initializing the reader, set the state to recovered 
state
--- End diff --

Could you please clarify this sentence. I think something might be missing.


> ContinuousFileProcessingCheckpointITCase failed on Travis
> -
>
> Key: FLINK-4075
> URL: https://issues.apache.org/jira/browse/FLINK-4075
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Kostas Kloudas
>Priority: Critical
>  Labels: test-stability
>
> The test case {{ContinuousFileProcessingCheckpointITCase}} failed on Travis.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/137748004/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2174: [FLINK-4075] ContinuousFileProcessingCheckpointITC...

2016-06-28 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2174#discussion_r68744158
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileReaderOperator.java
 ---
@@ -104,7 +104,13 @@ public void open() throws Exception {
this.collector = new TimestampedCollector<>(output);
this.checkpointLock = getContainingTask().getCheckpointLock();
 
+   Preconditions.checkArgument(reader == null, "The reader is 
already initialized.");
+
this.reader = new SplitReader<>(format, serializer, collector, 
checkpointLock, readerState);
+
+   // after initializing the reader, set the state to recovered 
state
--- End diff --

Could you please clarify this sentence. I think something might be missing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2174: [FLINK-4075] ContinuousFileProcessingCheckpointITC...

2016-06-28 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2174#discussion_r68743982
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileReaderOperator.java
 ---
@@ -104,7 +104,13 @@ public void open() throws Exception {
this.collector = new TimestampedCollector<>(output);
this.checkpointLock = getContainingTask().getCheckpointLock();
 
+   Preconditions.checkArgument(reader == null, "The reader is 
already initialized.");
--- End diff --

I think this should be `Preconditions.checkState()`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
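
The distinction behind this recurring remark: in Guava-style precondition 
utilities (Flink's `org.apache.flink.util.Preconditions` follows the same 
convention), `checkArgument` rejects a bad caller-supplied value and throws 
IllegalArgumentException, while `checkState` rejects a bad object lifecycle and 
throws IllegalStateException. A reader that is "already initialized" is a state 
violation, not a bad argument. A small sketch with hypothetical checks:

import static org.apache.flink.util.Preconditions.checkArgument;
import static org.apache.flink.util.Preconditions.checkState;

class ReaderSketch {
    private Object readerState; // hypothetical field

    // validates a caller-supplied value -> IllegalArgumentException
    void setBufferSize(int size) {
        checkArgument(size > 0, "Buffer size must be positive, got %s.", size);
    }

    // validates the object's own lifecycle -> IllegalStateException
    void restore(Object state) {
        checkState(readerState == null, "The reader state was already set.");
        readerState = state;
    }
}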


[jira] [Commented] (FLINK-4075) ContinuousFileProcessingCheckpointITCase failed on Travis

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352878#comment-15352878
 ] 

ASF GitHub Bot commented on FLINK-4075:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/2174#discussion_r68743982
  
--- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileReaderOperator.java
 ---
@@ -104,7 +104,13 @@ public void open() throws Exception {
this.collector = new TimestampedCollector<>(output);
this.checkpointLock = getContainingTask().getCheckpointLock();
 
+   Preconditions.checkArgument(reader == null, "The reader is 
already initialized.");
--- End diff --

I think this should be `Preconditions.checkState()`.


> ContinuousFileProcessingCheckpointITCase failed on Travis
> -
>
> Key: FLINK-4075
> URL: https://issues.apache.org/jira/browse/FLINK-4075
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Kostas Kloudas
>Priority: Critical
>  Labels: test-stability
>
> The test case {{ContinuousFileProcessingCheckpointITCase}} failed on Travis.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/137748004/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4075) ContinuousFileProcessingCheckpointITCase failed on Travis

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352877#comment-15352877
 ] 

ASF GitHub Bot commented on FLINK-4075:
---

Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/2174
  
I think having `splitsToFwdOrderedAscByModTime` and `currentSplitsToFwd` as 
fields that are checkpointed is no longer necessary since 
`monitorDirAndForwardSplits()` is called in lock scope and those two fields are 
always set to `null` at the end of the method.




> ContinuousFileProcessingCheckpointITCase failed on Travis
> -
>
> Key: FLINK-4075
> URL: https://issues.apache.org/jira/browse/FLINK-4075
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Kostas Kloudas
>Priority: Critical
>  Labels: test-stability
>
> The test case {{ContinuousFileProcessingCheckpointITCase}} failed on Travis.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/137748004/log.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink issue #2174: [FLINK-4075] ContinuousFileProcessingCheckpointITCase fai...

2016-06-28 Thread aljoscha
Github user aljoscha commented on the issue:

https://github.com/apache/flink/pull/2174
  
I think having `splitsToFwdOrderedAscByModTime` and `currentSplitsToFwd` as 
fields that are checkpointed is no longer necessary since 
`monitorDirAndForwardSplits()` is called in lock scope and those two fields are 
always set to `null` at the end of the method.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
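
The invariant behind this observation, sketched with hypothetical names: a 
field that is only ever non-null while a method runs under the checkpoint lock 
can never be observed by a snapshot, because snapshots are taken under the same 
lock.

import java.util.Arrays;
import java.util.List;

class MonitorSketch {
    private final Object checkpointLock = new Object();
    private List<String> currentSplitsToFwd; // transient scratch field

    void monitorDirAndForwardSplits() {
        synchronized (checkpointLock) {
            currentSplitsToFwd = Arrays.asList("split-1", "split-2");
            // ... forward the splits downstream ...
            currentSplitsToFwd = null; // always cleared before releasing the lock
        }
    }

    void snapshotState() {
        synchronized (checkpointLock) {
            // runs under the same lock, so the field is always null here and
            // never needs to be written into the checkpoint
            assert currentSplitsToFwd == null;
        }
    }
}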


[jira] [Created] (FLINK-4127) Clean up configuration and check breaking API changes

2016-06-28 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-4127:
-

 Summary: Clean up configuration and check breaking API changes
 Key: FLINK-4127
 URL: https://issues.apache.org/jira/browse/FLINK-4127
 Project: Flink
  Issue Type: Improvement
Reporter: Robert Metzger
 Fix For: 1.1.0


For the upcoming 1.1 release, I'll check if there are any breaking API changes 
and if the documentation is up to date with the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4062) Update Windowing Documentation

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352872#comment-15352872
 ] 

ASF GitHub Bot commented on FLINK-4062:
---

Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/2154#discussion_r68743505
  
--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@ specific language governing permissions and 
limitations
 under the License.
 -->
 
+Flink uses a concept called *windows* to divide a (potentially) infinite 
`DataStream` into slices
+based on the timestamps of elements or other criteria. This division is 
required when working
+with infinite streams of data and performing transformations that 
aggregate elements.
+
+Info: We will mostly talk about 
*keyed windowing* here; this
+means that the elements are subdivided based on both window and key before 
being given to
+a user function. Keyed windows have the advantage that work can be 
distributed across the cluster
+because the elements for different keys can be processed in isolation. If 
you absolutely must,
+you can check out [non-keyed windowing](#non-keyed-windowing) where we 
describe how non-keyed
+windows work.
+
 * This will be replaced by the TOC
 {:toc}
 
-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. 
All of these group elements *per key*,
-i.e., each window will contain elements with the same key value.
+## Basics
 
-### Basic Window Constructs
+For a windowed transformation you must at least specify a *key* (usually 
in the form of a
+`KeySelector`), a *window assigner*, and a *window function*. The *key* 
specifies how elements are
+put into groups. The *window assigner* specifies how the infinite stream 
is divided into windows.
+Finally, the *window function* is used to process the elements of each 
window.
 
-Flink offers a general window mechanism that provides flexibility, as well 
as a number of pre-defined windows
-for common use cases. See first if your use case can be served by the 
pre-defined windows below before moving
-to defining your own windows.
+The basic structure of a windowed transformation is thus as follows:
 
 
 
+{% highlight java %}
+DataStream<T> input = ...;
 
-(removed table of basic window constructs; columns: Transformation, Description)
-
-Tumbling time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "tumbles". This means that elements are
-  grouped according to their timestamp in groups of 5 second duration, and
-  every element belongs to exactly one window. The notion of time is specified
-  by the selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5));
-{% endhighlight %}
-
-Sliding time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "slides" by 1 second. This means that
-  elements are grouped according to their timestamp in groups of 5 second
-  duration, and elements can belong to more than one window (since windows
-  overlap by at most 4 seconds). The notion of time is specified by the
-  selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
-{% endhighlight %}
-
-Tumbling count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "tumbles". This means that elements
-  are grouped according to their arrival time (equivalent to processing time)
-  in groups of 1000 elements, and every element belongs to exactly one window.
-{% highlight java %}
-keyedStream.countWindow(1000);
-{% endhighlight %}
-
-Sliding count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "slides" every 100 elements. This
-  means that elements are grouped according to their arrival time (equivalent
-  to processing time) in groups of 1000 elements, and every element can belong
-  to more than one window (as windows overlap by at most 900 elements).
-{% highlight java %}
-keyedStream.countWindow(1000, 100)
-{% endhighlight %}
+input
+    .keyBy(<key selector>)

[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352871#comment-15352871
 ] 

ASF GitHub Bot commented on FLINK-4084:
---

Github user alkagin commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r68743480
  
--- Diff: flink-dist/src/main/flink-bin/bin/flink ---
@@ -17,20 +17,41 @@
 # limitations under the License.
 

 
-target="$0"
 # For the case, the executable has been directly symlinked, figure out
 # the correct bin path by following its symlink up to an upper bound.
 # Note: we can't use the readlink utility here if we want to be POSIX
 # compatible.
-iteration=0
-while [ -L "$target" ]; do
-if [ "$iteration" -gt 100 ]; then
-echo "Cannot resolve path: You have a cyclic symlink in $target."
+followSymLink() {
+local iteration=0
+local bar=$1
+while [ -L "$bar" ]; do
+if [ "$iteration" -gt 100 ]; then
+echo "Cannot resolve path: You have a cyclic symlink in $bar."
+break
+fi
+ls=`ls -ld -- "$bar"`
+bar=`expr "$ls" : '.* -> \(.*\)$'`
+iteration=$((iteration + 1))
+done
+
+echo "$bar"
+}
+
+target=$(followSymLink "$0")
+
+# Search for --configDir among the parameters and set it as FLINK_CONF_DIR
+args=("$@")
+for i in "${!args[@]}"; do
+if [ ${args[$i]} = "--configDir" ]; then
--- End diff --

There is already one, we need two? 


> Add configDir parameter to CliFrontend and flink shell script
> -
>
> Key: FLINK-4084
> URL: https://issues.apache.org/jira/browse/FLINK-4084
> Project: Flink
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Andrea Sella
>Priority: Minor
>
> At the moment there is no other way than the environment variable 
> FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if 
> it is started via the flink shell script. In order to improve the user 
> experience, I would propose to introduce a {{--configDir}} parameter which the 
> user can use to specify a configuration directory more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2154: [FLINK-4062] Update Windowing Documentation

2016-06-28 Thread kl0u
Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/2154#discussion_r68743505
  
--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@ specific language governing permissions and 
limitations
 under the License.
 -->
 
+Flink uses a concept called *windows* to divide a (potentially) infinite 
`DataStream` into slices
+based on the timestamps of elements or other criteria. This division is 
required when working
+with infinite streams of data and performing transformations that 
aggregate elements.
+
+Info: We will mostly talk about 
*keyed windowing* here; this
+means that the elements are subdivided based on both window and key before 
being given to
+a user function. Keyed windows have the advantage that work can be 
distributed across the cluster
+because the elements for different keys can be processed in isolation. If 
you absolutely must,
+you can check out [non-keyed windowing](#non-keyed-windowing) where we 
describe how non-keyed
+windows work.
+
 * This will be replaced by the TOC
 {:toc}
 
-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. 
All of these group elements *per key*,
-i.e., each window will contain elements with the same key value.
+## Basics
 
-### Basic Window Constructs
+For a windowed transformation you must at least specify a *key* (usually 
in the form of a
+`KeySelector`), a *window assigner*, and a *window function*. The *key* 
specifies how elements are
+put into groups. The *window assigner* specifies how the infinite stream 
is divided into windows.
+Finally, the *window function* is used to process the elements of each 
window.
 
-Flink offers a general window mechanism that provides flexibility, as well 
as a number of pre-defined windows
-for common use cases. See first if your use case can be served by the 
pre-defined windows below before moving
-to defining your own windows.
+The basic structure of a windowed transformation is thus as follows:
 
 
 
+{% highlight java %}
+DataStream<T> input = ...;
 
-(removed table of basic window constructs; columns: Transformation, Description)
-
-Tumbling time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "tumbles". This means that elements are
-  grouped according to their timestamp in groups of 5 second duration, and
-  every element belongs to exactly one window. The notion of time is specified
-  by the selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5));
-{% endhighlight %}
-
-Sliding time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "slides" by 1 second. This means that
-  elements are grouped according to their timestamp in groups of 5 second
-  duration, and elements can belong to more than one window (since windows
-  overlap by at most 4 seconds). The notion of time is specified by the
-  selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
-{% endhighlight %}
-
-Tumbling count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "tumbles". This means that elements
-  are grouped according to their arrival time (equivalent to processing time)
-  in groups of 1000 elements, and every element belongs to exactly one window.
-{% highlight java %}
-keyedStream.countWindow(1000);
-{% endhighlight %}
-
-Sliding count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "slides" every 100 elements. This
-  means that elements are grouped according to their arrival time (equivalent
-  to processing time) in groups of 1000 elements, and every element can belong
-  to more than one window (as windows overlap by at most 900 elements).
-{% highlight java %}
-keyedStream.countWindow(1000, 100)
-{% endhighlight %}
+input
+    .keyBy(<key selector>)
+    .window(<window assigner>)
+    .<windowed transformation>(<window function>);
+{% endhighlight %}
 
 
 
+{% highlight scala %}
+val input: DataStream[T] = ...
 
-(removed: start of the matching Scala table of basic window constructs; columns: Transformation, Description)
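
For the count-based constructs quoted above, a minimal runnable sketch (not 
part of the PR; the generated sequence and parity key are illustrative 
assumptions): a window of 1000 elements sliding by 100, reduced to its maximum 
per key.

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlidingCountWindowSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Long> input = env.generateSequence(0, 10000);

        input
            .keyBy(new KeySelector<Long, Long>() {
                @Override
                public Long getKey(Long value) {
                    return value % 2; // key by parity
                }
            })
            .countWindow(1000, 100) // 1000 elements, sliding every 100
            .reduce(new ReduceFunction<Long>() {
                @Override
                public Long reduce(Long a, Long b) {
                    return Math.max(a, b); // maximum within each window
                }
            })
            .print();

        env.execute("Sliding count window (sketch)");
    }
}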

[GitHub] flink pull request #2149: [FLINK-4084] Add configDir parameter to CliFronten...

2016-06-28 Thread alkagin
Github user alkagin commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r68743480
  
--- Diff: flink-dist/src/main/flink-bin/bin/flink ---
@@ -17,20 +17,41 @@
 # limitations under the License.
 

 
-target="$0"
 # For the case, the executable has been directly symlinked, figure out
 # the correct bin path by following its symlink up to an upper bound.
 # Note: we can't use the readlink utility here if we want to be POSIX
 # compatible.
-iteration=0
-while [ -L "$target" ]; do
-if [ "$iteration" -gt 100 ]; then
-echo "Cannot resolve path: You have a cyclic symlink in $target."
+followSymLink() {
+local iteration=0
+local bar=$1
+while [ -L "$bar" ]; do
+if [ "$iteration" -gt 100 ]; then
+echo "Cannot resolve path: You have a cyclic symlink in $bar."
+break
+fi
+ls=`ls -ld -- "$bar"`
+bar=`expr "$ls" : '.* -> \(.*\)$'`
+iteration=$((iteration + 1))
+done
+
+echo "$bar"
+}
+
+target=$(followSymLink "$0")
+
+# Search for --configDir among the parameters and set it as FLINK_CONF_DIR
+args=("$@")
+for i in "${!args[@]}"; do
+if [ ${args[$i]} = "--configDir" ]; then
--- End diff --

There is already one, we need two? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4062) Update Windowing Documentation

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352865#comment-15352865
 ] 

ASF GitHub Bot commented on FLINK-4062:
---

Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/2154#discussion_r68743168
  
--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@ specific language governing permissions and 
limitations
 under the License.
 -->
 
+Flink uses a concept called *windows* to divide a (potentially) infinite 
`DataStream` into slices
+based on the timestamps of elements or other criteria. This division is 
required when working
+with infinite streams of data and performing transformations that 
aggregate elements.
+
+Info: We will mostly talk about 
*keyed windowing* here; this
+means that the elements are subdivided based on both window and key before 
being given to
+a user function. Keyed windows have the advantage that work can be 
distributed across the cluster
+because the elements for different keys can be processed in isolation. If 
you absolutely must,
+you can check out [non-keyed windowing](#non-keyed-windowing) where we 
describe how non-keyed
+windows work.
+
 * This will be replaced by the TOC
 {:toc}
 
-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. 
All of these group elements *per key*,
-i.e., each window will contain elements with the same key value.
+## Basics
 
-### Basic Window Constructs
+For a windowed transformation you must at least specify a *key* (usually 
in the form of a
+`KeySelector`), a *window assigner*, and a *window function*. The *key* 
specifies how elements are
+put into groups. The *window assigner* specifies how the infinite stream 
is divided into windows.
+Finally, the *window function* is used to process the elements of each 
window.
 
-Flink offers a general window mechanism that provides flexibility, as well 
as a number of pre-defined windows
-for common use cases. See first if your use case can be served by the 
pre-defined windows below before moving
-to defining your own windows.
+The basic structure of a windowed transformation is thus as follows:
 
 
 
+{% highlight java %}
+DataStream<T> input = ...;
 
-(removed table of basic window constructs; columns: Transformation, Description)
-
-Tumbling time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "tumbles". This means that elements are
-  grouped according to their timestamp in groups of 5 second duration, and
-  every element belongs to exactly one window. The notion of time is specified
-  by the selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5));
-{% endhighlight %}
-
-Sliding time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "slides" by 1 second. This means that
-  elements are grouped according to their timestamp in groups of 5 second
-  duration, and elements can belong to more than one window (since windows
-  overlap by at most 4 seconds). The notion of time is specified by the
-  selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
-{% endhighlight %}
-
-Tumbling count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "tumbles". This means that elements
-  are grouped according to their arrival time (equivalent to processing time)
-  in groups of 1000 elements, and every element belongs to exactly one window.
-{% highlight java %}
-keyedStream.countWindow(1000);
-{% endhighlight %}
-
-Sliding count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "slides" every 100 elements. This
-  means that elements are grouped according to their arrival time (equivalent
-  to processing time) in groups of 1000 elements, and every element can belong
-  to more than one window (as windows overlap by at most 900 elements).
-{% highlight java %}
-keyedStream.countWindow(1000, 100)
-{% endhighlight %}
+input
+    .keyBy(<key selector>)

[GitHub] flink pull request #2149: [FLINK-4084] Add configDir parameter to CliFronten...

2016-06-28 Thread alkagin
Github user alkagin commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r68743195
  
--- Diff: flink-dist/src/main/flink-bin/bin/flink ---
@@ -17,20 +17,41 @@
 # limitations under the License.
 

 
-target="$0"
 # For the case, the executable has been directly symlinked, figure out
 # the correct bin path by following its symlink up to an upper bound.
 # Note: we can't use the readlink utility here if we want to be POSIX
 # compatible.
-iteration=0
-while [ -L "$target" ]; do
-if [ "$iteration" -gt 100 ]; then
-echo "Cannot resolve path: You have a cyclic symlink in $target."
+followSymLink() {
+local iteration=0
+local bar=$1
+while [ -L "$bar" ]; do
+if [ "$iteration" -gt 100 ]; then
+echo "Cannot resolve path: You have a cyclic symlink in $bar."
+break
+fi
+ls=`ls -ld -- "$bar"`
+bar=`expr "$ls" : '.* -> \(.*\)$'`
+iteration=$((iteration + 1))
+done
+
+echo "$bar"
+}
+
+target=$(followSymLink "$0")
+
+# Search for --configDir among the parameters and set it as FLINK_CONF_DIR
+args=("$@")
+for i in "${!args[@]}"; do
+if [ ${args[$i]} = "--configDir" ]; then
+dir=$(followSymLink "${args[$(($i+1))]}" )
+if [ -d "$dir" ]; then
+FLINK_CONF_DIR=`cd "${dir}"; pwd -P`
--- End diff --

I found a similar command 
[here](https://github.com/alkagin/flink/blob/76283d3f0fbf5454b208ba1887d847acea09640d/flink-dist/src/main/flink-bin/bin/config.sh#L132).
 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352867#comment-15352867
 ] 

ASF GitHub Bot commented on FLINK-4084:
---

Github user alkagin commented on a diff in the pull request:

https://github.com/apache/flink/pull/2149#discussion_r68743195
  
--- Diff: flink-dist/src/main/flink-bin/bin/flink ---
@@ -17,20 +17,41 @@
 # limitations under the License.
 

 
-target="$0"
 # For the case, the executable has been directly symlinked, figure out
 # the correct bin path by following its symlink up to an upper bound.
 # Note: we can't use the readlink utility here if we want to be POSIX
 # compatible.
-iteration=0
-while [ -L "$target" ]; do
-if [ "$iteration" -gt 100 ]; then
-echo "Cannot resolve path: You have a cyclic symlink in $target."
+followSymLink() {
+local iteration=0
+local bar=$1
+while [ -L "$bar" ]; do
+if [ "$iteration" -gt 100 ]; then
+echo "Cannot resolve path: You have a cyclic symlink in $bar."
+break
+fi
+ls=`ls -ld -- "$bar"`
+bar=`expr "$ls" : '.* -> \(.*\)$'`
+iteration=$((iteration + 1))
+done
+
+echo "$bar"
+}
+
+target=$(followSymLink "$0")
+
+# Search for --configDir among the parameters and set it as FLINK_CONF_DIR
+args=("$@")
+for i in "${!args[@]}"; do
+if [ ${args[$i]} = "--configDir" ]; then
+dir=$(followSymLink "${args[$(($i+1))]}" )
+if [ -d "$dir" ]; then
+FLINK_CONF_DIR=`cd "${dir}"; pwd -P`
--- End diff --

I found a similar command 
[here](https://github.com/alkagin/flink/blob/76283d3f0fbf5454b208ba1887d847acea09640d/flink-dist/src/main/flink-bin/bin/config.sh#L132).
 


> Add configDir parameter to CliFrontend and flink shell script
> -
>
> Key: FLINK-4084
> URL: https://issues.apache.org/jira/browse/FLINK-4084
> Project: Flink
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Andrea Sella
>Priority: Minor
>
> At the moment there is no other way than the environment variable 
> FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if 
> it is started via the flink shell script. In order to improve the user 
> experience, I would propose to introduce a {{--configDir}} parameter which the 
> user can use to specify a configuration directory more easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2154: [FLINK-4062] Update Windowing Documentation

2016-06-28 Thread kl0u
Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/2154#discussion_r68743168
  
--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@ specific language governing permissions and 
limitations
 under the License.
 -->
 
+Flink uses a concept called *windows* to divide a (potentially) infinite 
`DataStream` into slices
+based on the timestamps of elements or other criteria. This division is 
required when working
+with infinite streams of data and performing transformations that 
aggregate elements.
+
+Info: We will mostly talk about 
*keyed windowing* here; this
+means that the elements are subdivided based on both window and key before 
being given to
+a user function. Keyed windows have the advantage that work can be 
distributed across the cluster
+because the elements for different keys can be processed in isolation. If 
you absolutely must,
+you can check out [non-keyed windowing](#non-keyed-windowing) where we 
describe how non-keyed
+windows work.
+
 * This will be replaced by the TOC
 {:toc}
 
-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. 
All of these group elements *per key*,
-i.e., each window will contain elements with the same key value.
+## Basics
 
-### Basic Window Constructs
+For a windowed transformation you must at least specify a *key* (usually 
in the form of a
+`KeySelector`), a *window assigner*, and a *window function*. The *key* 
specifies how elements are
+put into groups. The *window assigner* specifies how the infinite stream 
is divided into windows.
+Finally, the *window function* is used to process the elements of each 
window.
 
-Flink offers a general window mechanism that provides flexibility, as well 
as a number of pre-defined windows
-for common use cases. See first if your use case can be served by the 
pre-defined windows below before moving
-to defining your own windows.
+The basic structure of a windowed transformation is thus as follows:
 
 
 
+{% highlight java %}
+DataStream<T> input = ...;
 
-(removed table of basic window constructs; columns: Transformation, Description)
-
-Tumbling time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "tumbles". This means that elements are
-  grouped according to their timestamp in groups of 5 second duration, and
-  every element belongs to exactly one window. The notion of time is specified
-  by the selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5));
-{% endhighlight %}
-
-Sliding time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "slides" by 1 second. This means that
-  elements are grouped according to their timestamp in groups of 5 second
-  duration, and elements can belong to more than one window (since windows
-  overlap by at most 4 seconds). The notion of time is specified by the
-  selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
-{% endhighlight %}
-
-Tumbling count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "tumbles". This means that elements
-  are grouped according to their arrival time (equivalent to processing time)
-  in groups of 1000 elements, and every element belongs to exactly one window.
-{% highlight java %}
-keyedStream.countWindow(1000);
-{% endhighlight %}
-
-Sliding count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "slides" every 100 elements. This
-  means that elements are grouped according to their arrival time (equivalent
-  to processing time) in groups of 1000 elements, and every element can belong
-  to more than one window (as windows overlap by at most 900 elements).
-{% highlight java %}
-keyedStream.countWindow(1000, 100)
-{% endhighlight %}
+input
+    .keyBy(<key selector>)
+    .window(<window assigner>)
+    .<windowed transformation>(<window function>);
+{% endhighlight %}
 
 
 
+{% highlight scala %}
+val input: DataStream[T] = ...
 
-(removed: start of the matching Scala table of basic window constructs; columns: Transformation, Description)

[jira] [Commented] (FLINK-4062) Update Windowing Documentation

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352859#comment-15352859
 ] 

ASF GitHub Bot commented on FLINK-4062:
---

Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/2154#discussion_r68742875
  
--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@ specific language governing permissions and 
limitations
 under the License.
 -->
 
+Flink uses a concept called *windows* to divide a (potentially) infinite 
`DataStream` into slices
+based on the timestamps of elements or other criteria. This division is 
required when working
+with infinite streams of data and performing transformations that 
aggregate elements.
+
+Info: We will mostly talk about 
*keyed windowing* here; this
+means that the elements are subdivided based on both window and key before 
being given to
+a user function. Keyed windows have the advantage that work can be 
distributed across the cluster
+because the elements for different keys can be processed in isolation. If 
you absolutely must,
+you can check out [non-keyed windowing](#non-keyed-windowing) where we 
describe how non-keyed
+windows work.
+
 * This will be replaced by the TOC
 {:toc}
 
-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. 
All of these group elements *per key*,
-i.e., each window will contain elements with the same key value.
+## Basics
 
-### Basic Window Constructs
+For a windowed transformation you must at least specify a *key* (usually 
in the form of a
+`KeySelector`), a *window assigner*, and a *window function*. The *key* 
specifies how elements are
+put into groups. The *window assigner* specifies how the infinite stream 
is divided into windows.
+Finally, the *window function* is used to process the elements of each 
window.
 
-Flink offers a general window mechanism that provides flexibility, as well 
as a number of pre-defined windows
-for common use cases. See first if your use case can be served by the 
pre-defined windows below before moving
-to defining your own windows.
+The basic structure of a windowed transformation is thus as follows:
 
 
 
+{% highlight java %}
+DataStream<T> input = ...;
 
-(removed table of basic window constructs; columns: Transformation, Description)
-
-Tumbling time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "tumbles". This means that elements are
-  grouped according to their timestamp in groups of 5 second duration, and
-  every element belongs to exactly one window. The notion of time is specified
-  by the selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5));
-{% endhighlight %}
-
-Sliding time window (KeyedStream → WindowedStream):
-  Defines a window of 5 seconds, that "slides" by 1 second. This means that
-  elements are grouped according to their timestamp in groups of 5 second
-  duration, and elements can belong to more than one window (since windows
-  overlap by at most 4 seconds). The notion of time is specified by the
-  selected TimeCharacteristic (see time).
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
-{% endhighlight %}
-
-Tumbling count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "tumbles". This means that elements
-  are grouped according to their arrival time (equivalent to processing time)
-  in groups of 1000 elements, and every element belongs to exactly one window.
-{% highlight java %}
-keyedStream.countWindow(1000);
-{% endhighlight %}
-
-Sliding count window (KeyedStream → WindowedStream):
-  Defines a window of 1000 elements, that "slides" every 100 elements. This
-  means that elements are grouped according to their arrival time (equivalent
-  to processing time) in groups of 1000 elements, and every element can belong
-  to more than one window (as windows overlap by at most 900 elements).
-{% highlight java %}
-keyedStream.countWindow(1000, 100)
-{% endhighlight %}
+input
+    .keyBy(<key selector>)

[GitHub] flink pull request #2154: [FLINK-4062] Update Windowing Documentation

2016-06-28 Thread kl0u
Github user kl0u commented on a diff in the pull request:

https://github.com/apache/flink/pull/2154#discussion_r68742875
  
--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@ specific language governing permissions and 
limitations
 under the License.
 -->
 
+Flink uses a concept called *windows* to divide a (potentially) infinite `DataStream` into slices
+based on the timestamps of elements or other criteria. This division is required when working
+with infinite streams of data and performing transformations that aggregate elements.
+
+**Info** We will mostly talk about *keyed windowing* here; this
+means that the elements are subdivided based on both window and key before being given to
+a user function. Keyed windows have the advantage that work can be distributed across the cluster
+because the elements for different keys can be processed in isolation. If you absolutely must,
+you can check out [non-keyed windowing](#non-keyed-windowing) where we describe how non-keyed
+windows work.
+
 * This will be replaced by the TOC
 {:toc}
 
-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. All of these group
-elements *per key*, i.e., each window will contain elements with the same key value.
+## Basics
 
-### Basic Window Constructs
+For a windowed transformation you must at least specify a *key* (usually in the form of a
+`KeySelector`), a *window assigner*, and a *window function*. The *key* specifies how elements are
+put into groups. The *window assigner* specifies how the infinite stream is divided into windows.
+Finally, the *window function* is used to process the elements of each window.
 
-Flink offers a general window mechanism that provides flexibility, as well as a number of
-pre-defined windows for common use cases. See first if your use case can be served by the
-pre-defined windows below before moving to defining your own windows.
+The basic structure of a windowed transformation is thus as follows:
 
 
 
+{% highlight java %}
+DataStream<T> input = ...;
 
-<table>
-  <thead>
-    <tr>
-      <th>Transformation</th>
-      <th>Description</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td><strong>Tumbling time window</strong><br/>KeyedStream &rarr; WindowedStream</td>
-      <td>
-        <p>Defines a window of 5 seconds, that "tumbles". This means that elements are
-        grouped according to their timestamp in groups of 5 second duration, and every
-        element belongs to exactly one window. The notion of time is specified by the
-        selected TimeCharacteristic (see time).</p>
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5));
-{% endhighlight %}
-      </td>
-    </tr>
-    <tr>
-      <td><strong>Sliding time window</strong><br/>KeyedStream &rarr; WindowedStream</td>
-      <td>
-        <p>Defines a window of 5 seconds, that "slides" by 1 second. This means that
-        elements are grouped according to their timestamp in groups of 5 second duration,
-        and elements can belong to more than one window (since windows overlap by at most
-        4 seconds). The notion of time is specified by the selected TimeCharacteristic
-        (see time).</p>
-{% highlight java %}
-keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
-{% endhighlight %}
-      </td>
-    </tr>
-    <tr>
-      <td><strong>Tumbling count window</strong><br/>KeyedStream &rarr; WindowedStream</td>
-      <td>
-        <p>Defines a window of 1000 elements, that "tumbles". This means that elements are
-        grouped according to their arrival time (equivalent to processing time) in groups
-        of 1000 elements, and every element belongs to exactly one window.</p>
-{% highlight java %}
-keyedStream.countWindow(1000);
-{% endhighlight %}
-      </td>
-    </tr>
-    <tr>
-      <td><strong>Sliding count window</strong><br/>KeyedStream &rarr; WindowedStream</td>
-      <td>
-        <p>Defines a window of 1000 elements, that "slides" every 100 elements. This means
-        that elements are grouped according to their arrival time (equivalent to processing
-        time) in groups of 1000 elements, and every element can belong to more than one
-        window (as windows overlap by at most 900 elements).</p>
-{% highlight java %}
-keyedStream.countWindow(1000, 100);
-{% endhighlight %}
-      </td>
-    </tr>
-  </tbody>
-</table>
+input
+    .keyBy(<key selector>)
+    .window(<window assigner>)
+    .<windowed transformation>(<window function>);
+{% endhighlight %}
 
 
 
+{% highlight scala %}
+val input: DataStream[T] = ...
 
-<table>
-  <thead>
-    <tr>
-      <th>Transformation</th>
-      <th>Description</th>
-    </tr>

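The tables being removed above boil down to four pre-defined constructs on a `KeyedStream`. For reference, here is a small sketch showing all four side by side, assuming the same Flink 1.x Java API; the stream, key, and data are illustrative and not part of the PR:

{% highlight java %}
// Sketch of the four pre-defined window constructs from the removed table,
// applied to a hypothetical KeyedStream; names and data are illustrative.
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class PredefinedWindows {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        KeyedStream<Tuple2<String, Integer>, Tuple> keyedStream = env
                .fromElements(new Tuple2<>("a", 1), new Tuple2<>("b", 2))
                .keyBy(0); // group by the String field

        keyedStream.timeWindow(Time.seconds(5)).sum(1).print();           // tumbling time window
        keyedStream.timeWindow(Time.seconds(5), Time.seconds(1)).sum(1);  // sliding time window
        keyedStream.countWindow(1000).sum(1);                             // tumbling count window
        keyedStream.countWindow(1000, 100).sum(1);                        // sliding count window

        env.execute("Pre-defined windows");
    }
}
{% endhighlight %}
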
[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink

2016-06-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352858#comment-15352858
 ] 

ASF GitHub Bot commented on FLINK-4085:
---

GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/2175

[FLINK-4085][Kinesis] Set Flink-specific user agent

I was asked by Amazon to set a Flink-specific user agent when accessing the AWS APIs.
I've set an agent for the consumer; for the producer I could not find any setting.
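
For readers unfamiliar with the mechanism: a minimal sketch of how a custom user agent can be set through the AWS SDK's `ClientConfiguration`, assuming the AWS SDK for Java v1. This is not the actual PR code, and the agent string is hypothetical:

{% highlight java %}
// Minimal sketch, assuming AWS SDK for Java v1; not the code from the PR.
import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.kinesis.AmazonKinesisClient;

public class KinesisUserAgentExample {
    public static void main(String[] args) {
        ClientConfiguration config = new ClientConfiguration();
        // Hypothetical agent string; the exact value used by the PR is not shown here.
        config.setUserAgent("Apache Flink Kinesis Consumer");
        // The consumer would pass this configuration when constructing its Kinesis client.
        AmazonKinesisClient kinesis = new AmazonKinesisClient(config);
        System.out.println("User agent: " + config.getUserAgent());
    }
}
{% endhighlight %}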

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink4085

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2175


commit b65abd448ca087c4b49bb841ff4a95efaeebdb36
Author: Robert Metzger 
Date:   2016-06-28T11:54:12Z

[FLINK-4085][Kinesis] Set Flink-specific user agent




> Set Kinesis Consumer Agent to Flink
> ---
>
> Key: FLINK-4085
> URL: https://issues.apache.org/jira/browse/FLINK-4085
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 1.1.0
>
>
> Currently, we use the default Kinesis Agent name.
> I was asked by Amazon to set it to something containing Flink.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request #2175: [FLINK-4085][Kinesis] Set Flink-specific user agen...

2016-06-28 Thread rmetzger
GitHub user rmetzger opened a pull request:

https://github.com/apache/flink/pull/2175

[FLINK-4085][Kinesis] Set Flink-specific user agent

I was asked by Amazon to set a Flink-specific user agent when accessing the AWS APIs.
I've set an agent for the consumer; for the producer I could not find any setting.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rmetzger/flink flink4085

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2175


commit b65abd448ca087c4b49bb841ff4a95efaeebdb36
Author: Robert Metzger 
Date:   2016-06-28T11:54:12Z

[FLINK-4085][Kinesis] Set Flink-specific user agent




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

