[jira] [Commented] (FLINK-3920) Distributed Linear Algebra: block-based matrix
[ https://issues.apache.org/jira/browse/FLINK-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354542#comment-15354542 ] ASF GitHub Bot commented on FLINK-3920: --- Github user chiwanpark commented on the issue: https://github.com/apache/flink/pull/2152 This PR contains unnecessary code changes related to `DistributedRowMatrix` (`DistributedRowMatrix.scala` and `DistributedRowMatrixSuite.scala`). Please sync this PR with current master. > Distributed Linear Algebra: block-based matrix > -- > > Key: FLINK-3920 > URL: https://issues.apache.org/jira/browse/FLINK-3920 > Project: Flink > Issue Type: New Feature > Components: Machine Learning Library >Reporter: Simone Robutti >Assignee: Simone Robutti > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request #2152: [FLINK-3920] Distributed Linear Algebra: block-bas...
Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/2152#discussion_r68885212 --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/math/distributed/BlockMatrix.scala --- @@ -0,0 +1,287 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.ml.math.distributed + +import java.lang --- End diff -- This line still imports `java.lang`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-4128) compile error about git-commit-id-plugin
[ https://issues.apache.org/jira/browse/FLINK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354385#comment-15354385 ] Mao, Wei commented on FLINK-4128: - Thanks for the info. Then I think it makes sense to upgrade to the latest stable version. > compile error about git-commit-id-plugin > > > Key: FLINK-4128 > URL: https://issues.apache.org/jira/browse/FLINK-4128 > Project: Flink > Issue Type: Bug >Reporter: Mao, Wei > > When I build with the latest Flink code, I get the following error: > {quote} > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 01:06 h > [INFO] Finished at: 2016-06-28T22:11:58+08:00 > [INFO] Final Memory: 104M/3186M > [INFO] > > [ERROR] Failed to execute goal > pl.project13.maven:git-commit-id-plugin:2.1.5:revision (default) on project > flink-runtime_2.11: Execution default of goal > pl.project13.maven:git-commit-id-plugin:2.1.5:revision failed. > NullPointerException -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :flink-runtime_2.11 > {quote} > I think it's because a wrong `doGetDirectory` value is provided. > Another question is whether we should upgrade the version of this plugin, so > that we get a more meaningful error message instead of an NPE, e.g.: > {quote} > Could not get HEAD Ref, are you sure you have some commits in the > dotGitDirectory? > {quote} > The current stable version is 2.2.1, but the disadvantage is that Java 1.6 is no > longer supported by the new version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4128) compile error about git-commit-id-plugin
[ https://issues.apache.org/jira/browse/FLINK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354215#comment-15354215 ] ASF GitHub Bot commented on FLINK-4128: --- Github user mwws commented on the issue: https://github.com/apache/flink/pull/2179 Yes, I am also confused about why it compiled before. But according to [an archived flink-dev mail](http://mail-archives.apache.org/mod_mbox/flink-dev/201506.mbox/%3CCAJLORfdqMsJdNricb8WDSBrNWPzgAD=oqdWned1cL3KFB+i+=g...@mail.gmail.com%3E), some people have hit the same issue as me. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...
Github user fobeligi commented on a diff in the pull request: https://github.com/apache/flink/pull/2178#discussion_r68848969 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java --- @@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String edgesPath, ExecutionEnvironmen } /** +* Creates a graph from a Adjacency List text file with Vertex Key values. Edges will be created automatically. +* +* @param filePath a path to an Adjacency List text file with the Vertex data +* @param context the execution environment. +* @return An instance of {@link org.apache.flink.graph.GraphAdjacencyListReader}, +* on which calling methods to specify types of the Vertex ID, Vertex value and Edge value returns a Graph. +*/ + public static GraphAdjacencyListReader fromAdjacencyListFile(String filePath, ExecutionEnvironment context) { + return new GraphAdjacencyListReader(filePath, context); + } + + /** +* Writes a graph as an Adjacency List formatted text file in a user specified folder. +* +* @param filePath the path that the Adjacency List formatted text file should be written in +* @param delimiters the delimiters that separate the different value types in the Adjacency List formatted text +* file. Delimiters should be provided with the following order: +* NEIGHBOR_DELIMITER : separating source from its neighbors +* VERTICES_DELIMITER : separating the different neighbors of a source vertex +* VERTEX_VALUE_DELIMITER: separating the source vertex-id from the vertex value, as well as the +* target vertex-ids from the edge value. +*/ + public void writeAsAdjacencyList(String filePath, String... delimiters) { + + final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? delimiters[0] : "\t"; + + final String VERTICES_DELIMITER = delimiters.length > 1 ? delimiters[1] : ","; + + final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? 
delimiters[2] : "-"; --- End diff -- You mean the error in this declaration: ```java final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? delimiters[2] : "-"; ``` and not to check directly for length greater than two, because in that way the user will have to provide all three delimiters or none.
[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...
Github user fobeligi commented on a diff in the pull request: https://github.com/apache/flink/pull/2178#discussion_r68848469 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java --- @@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String edgesPath, ExecutionEnvironmen } /** +* Creates a graph from a Adjacency List text file with Vertex Key values. Edges will be created automatically. +* +* @param filePath a path to an Adjacency List text file with the Vertex data +* @param context the execution environment. +* @return An instance of {@link org.apache.flink.graph.GraphAdjacencyListReader}, +* on which calling methods to specify types of the Vertex ID, Vertex value and Edge value returns a Graph. +*/ + public static GraphAdjacencyListReader fromAdjacencyListFile(String filePath, ExecutionEnvironment context) { + return new GraphAdjacencyListReader(filePath, context); + } + + /** +* Writes a graph as an Adjacency List formatted text file in a user specified folder. +* +* @param filePath the path that the Adjacency List formatted text file should be written in +* @param delimiters the delimiters that separate the different value types in the Adjacency List formatted text +* file. Delimiters should be provided with the following order: +* NEIGHBOR_DELIMITER : separating source from its neighbors +* VERTICES_DELIMITER : separating the different neighbors of a source vertex +* VERTEX_VALUE_DELIMITER: separating the source vertex-id from the vertex value, as well as the +* target vertex-ids from the edge value. +*/ + public void writeAsAdjacencyList(String filePath, String... delimiters) { + + final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? delimiters[0] : "\t"; + + final String VERTICES_DELIMITER = delimiters.length > 1 ? delimiters[1] : ","; + + final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? 
delimiters[2] : "-"; + + + DataSet> vertices = this.getVerticesAsTuple2(); + + DataSet > edgesNValues = this.getEdgesAsTuple3(); --- End diff -- As I see now, we don't have to convert the vertex set to tuple2 set, so I already changed that. Regarding the edges dataset, in order to write the Adjacency List file, I use the coGroup transformation to the Vertex dataset and EdgesAsTuple3 dataset, where the vertexId equals the source of the edge. In that case, even when a Vertex is source to no edges (e.g. has only incoming edges), I can still have the vertexId in the "coGrouped" dataset (I couldn't do that with a join). I can't think how I could use the Edge dataset in a coGroup or similar transformation. Please let me know if you have any suggestions.
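The coGroup-versus-join reasoning in the comment above can be illustrated with a plain-Java sketch. This is a hypothetical stand-in for the Flink DataSet API, not the actual Gelly code: an inner join emits a vertex only when it matches an edge source, while a coGroup-style pairing emits every vertex, including those with only incoming edges.

```java
import java.util.*;

// Plain-Java sketch of join vs. coGroup semantics for the adjacency-list
// writer discussion above; an illustrative stand-in, not the Flink API.
public class JoinVsCoGroup {

    // inner-join style: a vertex survives only if it is the source of some edge
    static Set<Integer> joinSourceIds(Set<Integer> vertexIds,
                                      Map<Integer, List<Integer>> edgesBySource) {
        Set<Integer> result = new TreeSet<>();
        for (Integer v : vertexIds) {
            if (edgesBySource.containsKey(v)) {
                result.add(v); // a match is required, so sink-only vertices vanish
            }
        }
        return result;
    }

    // coGroup style: every vertex id is emitted with its (possibly empty) edge group
    static Map<Integer, List<Integer>> coGroupBySource(Set<Integer> vertexIds,
                                                       Map<Integer, List<Integer>> edgesBySource) {
        Map<Integer, List<Integer>> result = new TreeMap<>();
        for (Integer v : vertexIds) {
            result.put(v, edgesBySource.getOrDefault(v, Collections.emptyList()));
        }
        return result;
    }
}
```

With vertices {1, 2, 3} and a single edge 1→2, the join keeps only vertex 1, while the coGroup keeps all three vertices, which is why a vertex with only incoming edges still appears in the written adjacency list.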
[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...
Github user fobeligi commented on a diff in the pull request: https://github.com/apache/flink/pull/2178#discussion_r68846112 --- Diff: flink-libraries/flink-gelly-scala/src/main/scala/org/apache/flink/graph/scala/Graph.scala --- @@ -1127,8 +1194,7 @@ TypeInformation : ClassTag](jgraph: jg.Graph[K, VV, EV]) { * * @param analytic the analytic to run on the Graph */ - def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, EV, T]): - GraphAnalytic[K, VV, EV, T] = { + def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, EV, T])= { --- End diff -- No, I will revert the change.
[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm
[ https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353687#comment-15353687 ] ASF GitHub Bot commented on FLINK-3879: --- Github user vasia commented on the issue: https://github.com/apache/flink/pull/1967 Hey @greghogan, was there consensus regarding this change? I see the numbers, but did anyone review this PR? I've been offline for the past few days, and I now see that nobody reviewed #2160, #2079, #2067, #1997 either... I don't doubt that you have done a great job, but it is _always_ better to let someone review your code before you merge. We don't usually merge PRs without a +1 unless it is something trivial. I understand things move faster this way, but we are in a community and we should try to collaborate. Please, leave a comment next time you think a PR has stayed with no review for a long time or ping me personally if you want a 2nd pair of eyes on gelly stuff :) Thanks! > Native implementation of HITS algorithm > --- > > Key: FLINK-3879 > URL: https://issues.apache.org/jira/browse/FLINK-3879 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.1.0 > > > Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is > presented in [0] and described in [1]. > "[HITS] is a very popular and effective algorithm to rank documents based on > the link information among a set of documents. The algorithm presumes that a > good hub is a document that points to many others, and a good authority is a > document that many documents point to." > [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf] > This implementation differs from FLINK-2044 by providing for convergence, > outputting both hub and authority scores, and completing in half the number > of iterations. 
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf > [1] https://en.wikipedia.org/wiki/HITS_algorithm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm
[ https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353650#comment-15353650 ] ASF GitHub Bot commented on FLINK-3879: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1967 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-3879) Native implementation of HITS algorithm
[ https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-3879. - Resolution: Fixed Implemented in 40749ddcd73c4634d81c2153f64e8934d519be3d -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...
Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/2178#discussion_r68832940 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java --- @@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String edgesPath, ExecutionEnvironmen } /** +* Creates a graph from a Adjacency List text file with Vertex Key values. Edges will be created automatically. +* +* @param filePath a path to an Adjacency List text file with the Vertex data +* @param context the execution environment. +* @return An instance of {@link org.apache.flink.graph.GraphAdjacencyListReader}, +* on which calling methods to specify types of the Vertex ID, Vertex value and Edge value returns a Graph. +*/ + public static GraphAdjacencyListReader fromAdjacencyListFile(String filePath, ExecutionEnvironment context) { + return new GraphAdjacencyListReader(filePath, context); + } + + /** +* Writes a graph as an Adjacency List formatted text file in a user specified folder. +* +* @param filePath the path that the Adjacency List formatted text file should be written in +* @param delimiters the delimiters that separate the different value types in the Adjacency List formatted text +* file. Delimiters should be provided with the following order: +* NEIGHBOR_DELIMITER : separating source from its neighbors +* VERTICES_DELIMITER : separating the different neighbors of a source vertex +* VERTEX_VALUE_DELIMITER: separating the source vertex-id from the vertex value, as well as the +* target vertex-ids from the edge value. +*/ + public void writeAsAdjacencyList(String filePath, String... delimiters) { + + final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? delimiters[0] : "\t"; + + final String VERTICES_DELIMITER = delimiters.length > 1 ? delimiters[1] : ","; + + final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? 
delimiters[2] : "-"; + + + DataSet> vertices = this.getVerticesAsTuple2(); + + DataSet > edgesNValues = this.getEdgesAsTuple3(); --- End diff -- Do we need to convert the vertex and edge sets to tuples?
[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...
Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/2178#discussion_r68832549 --- Diff: flink-libraries/flink-gelly-scala/src/main/scala/org/apache/flink/graph/scala/Graph.scala --- @@ -1127,8 +1194,7 @@ TypeInformation : ClassTag](jgraph: jg.Graph[K, VV, EV]) { * * @param analytic the analytic to run on the Graph */ - def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, EV, T]): - GraphAnalytic[K, VV, EV, T] = { + def run[T: TypeInformation : ClassTag](analytic: GraphAnalytic[K, VV, EV, T])= { --- End diff -- Was this change intended?
[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...
Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/2178#discussion_r68832491 --- Diff: flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/Graph.java --- @@ -408,6 +408,79 @@ public static GraphCsvReader fromCsvReader(String edgesPath, ExecutionEnvironmen } /** +* Creates a graph from a Adjacency List text file with Vertex Key values. Edges will be created automatically. +* +* @param filePath a path to an Adjacency List text file with the Vertex data +* @param context the execution environment. +* @return An instance of {@link org.apache.flink.graph.GraphAdjacencyListReader}, +* on which calling methods to specify types of the Vertex ID, Vertex value and Edge value returns a Graph. +*/ + public static GraphAdjacencyListReader fromAdjacencyListFile(String filePath, ExecutionEnvironment context) { + return new GraphAdjacencyListReader(filePath, context); + } + + /** +* Writes a graph as an Adjacency List formatted text file in a user specified folder. +* +* @param filePath the path that the Adjacency List formatted text file should be written in +* @param delimiters the delimiters that separate the different value types in the Adjacency List formatted text +* file. Delimiters should be provided with the following order: +* NEIGHBOR_DELIMITER : separating source from its neighbors +* VERTICES_DELIMITER : separating the different neighbors of a source vertex +* VERTEX_VALUE_DELIMITER: separating the source vertex-id from the vertex value, as well as the +* target vertex-ids from the edge value. +*/ + public void writeAsAdjacencyList(String filePath, String... delimiters) { + + final String NEIGHBOR_DELIMITER = delimiters.length > 0 ? delimiters[0] : "\t"; + + final String VERTICES_DELIMITER = delimiters.length > 1 ? delimiters[1] : ","; + + final String VERTEX_VALUE_DELIMITER = delimiters.length > 1 ? delimiters[2] : "-"; --- End diff -- Test length against "2". 
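The off-by-one flagged in the review comments above (`delimiters.length > 1` guarding an access to `delimiters[2]`) can be sketched as a standalone helper. This is a hypothetical illustration with an invented class name, not the actual Gelly code, with the guard corrected to test against 2:

```java
// Hypothetical helper illustrating the corrected bounds check from the
// review above: delimiters[2] must be guarded by `length > 2`, otherwise
// passing exactly two delimiters throws ArrayIndexOutOfBoundsException.
public class AdjacencyListDelimiters {

    static String[] resolveDelimiters(String... delimiters) {
        final String neighborDelimiter = delimiters.length > 0 ? delimiters[0] : "\t";
        final String verticesDelimiter = delimiters.length > 1 ? delimiters[1] : ",";
        // corrected: > 2, not > 1, before reading delimiters[2]
        final String vertexValueDelimiter = delimiters.length > 2 ? delimiters[2] : "-";
        return new String[] {neighborDelimiter, verticesDelimiter, vertexValueDelimiter};
    }
}
```

With this guard, callers may pass zero, one, two, or three delimiters, and any missing ones fall back to defaults, so the corrected check does not force the caller to supply all three delimiters or none.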
[jira] [Closed] (FLINK-3277) Use Value types in Gelly API
[ https://issues.apache.org/jira/browse/FLINK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-3277. - Resolution: Fixed Implemented in 40749ddcd73c4634d81c2153f64e8934d519be3d > Use Value types in Gelly API > > > Key: FLINK-3277 > URL: https://issues.apache.org/jira/browse/FLINK-3277 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.0.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > > This would be a breaking change so the discussion needs to happen before the > 1.0.0 release. > I think it would benefit Flink to use {{Value}} types wherever possible. The > {{Graph}} functions {{inDegrees}}, {{outDegrees}}, and {{getDegrees}} each > return {{DataSet>}}. Using {{Long}} creates a new heap object > for every serialization and deserialization. The mutable {{Value}} types do > not suffer from this issue when object reuse is enabled. > I lean towards a preference for conciseness in documentation and performance > in examples and APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
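The boxing argument in FLINK-3277 can be illustrated with a minimal sketch. The `LongValue` below is a simplified stand-in for Flink's mutable value type, and the example is illustrative rather than taken from the Gelly sources:

```java
// Minimal sketch of the motivation behind FLINK-3277: a boxed Long is a
// fresh heap object per record, while a mutable value holder can be
// reused across records when object reuse is enabled.
public class ValueReuse {

    // simplified stand-in for org.apache.flink.types.LongValue
    static final class LongValue {
        private long value;
        void setValue(long v) { this.value = v; }
        long getValue() { return value; }
    }

    // one holder, mutated in place: no per-record allocation inside the loop
    static long sumDegreesWithReuse(long[] degrees) {
        LongValue reuse = new LongValue();
        long sum = 0;
        for (long d : degrees) {
            reuse.setValue(d);
            sum += reuse.getValue();
        }
        return sum;
    }

    // boxed variant: each Long.valueOf may allocate a new object
    // (outside the small-integer cache range)
    static long sumDegreesBoxed(long[] degrees) {
        long sum = 0;
        for (long d : degrees) {
            Long boxed = Long.valueOf(d);
            sum += boxed;
        }
        return sum;
    }
}
```

Both variants compute the same sum; the difference is allocation pressure, which is exactly the serialization/deserialization cost the issue describes for `DataSet` of boxed `Long` degrees.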
[jira] [Commented] (FLINK-3277) Use Value types in Gelly API
[ https://issues.apache.org/jira/browse/FLINK-3277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353602#comment-15353602 ] ASF GitHub Bot commented on FLINK-3277: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1671 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4129) HITSAlgorithm should test for element-wise convergence
Greg Hogan created FLINK-4129: - Summary: HITSAlgorithm should test for element-wise convergence Key: FLINK-4129 URL: https://issues.apache.org/jira/browse/FLINK-4129 Project: Flink Issue Type: Bug Components: Gelly Affects Versions: 1.1.0 Reporter: Greg Hogan Priority: Minor {{HITSAlgorithm}} tests for convergence by summing the difference of each authority score minus the average score. This simply compares the sum of scores against the previous sum of scores, which is not a good test for convergence. {code} // count the diff value of sum of authority scores diffSumAggregator.aggregate(previousAuthAverage - newAuthorityValue.getValue()); {code}
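The element-wise test the ticket asks for can be sketched in plain Java (illustrative only, not Gelly code; the score arrays are made up):

```java
import java.util.Arrays;

public class ConvergenceCheck {
    // Element-wise convergence: sum of |new - old| over all vertices.
    static double elementwiseDelta(double[] oldScores, double[] newScores) {
        double delta = 0.0;
        for (int i = 0; i < oldScores.length; i++) {
            delta += Math.abs(newScores[i] - oldScores[i]);
        }
        return delta;
    }

    // The flawed test described in the ticket: only the sums are compared.
    static double sumDelta(double[] oldScores, double[] newScores) {
        return Math.abs(Arrays.stream(newScores).sum() - Arrays.stream(oldScores).sum());
    }

    public static void main(String[] args) {
        // Scores swap between two vertices: the sum is unchanged, so the
        // sum-based test reports convergence while the element-wise test
        // correctly does not.
        double[] oldScores = {0.75, 0.25};
        double[] newScores = {0.25, 0.75};
        System.out.println(sumDelta(oldScores, newScores));         // 0.0
        System.out.println(elementwiseDelta(oldScores, newScores)); // 1.0
    }
}
```

The swap example shows why summing signed differences is not a convergence criterion: scores can keep moving between vertices while the aggregate stays constant.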
[jira] [Commented] (FLINK-4127) Clean up configuration and check breaking API changes
[ https://issues.apache.org/jira/browse/FLINK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353525#comment-15353525 ] ASF GitHub Bot commented on FLINK-4127: --- Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/2177#discussion_r68819516 --- Diff: docs/setup/config.md --- @@ -85,6 +85,8 @@ The default fraction for managed memory can be adjusted using the `taskmanager.m - `taskmanager.memory.preallocate`: Can be either of `true` or `false`. Specifies whether task managers should allocate all managed memory when starting up. (DEFAULT: false) +- `taskmanager.runtime.large-record-handler`: Whether to use the LargeRecordHandler when spilling. This feature is experimental. (DEFAULT: false) --- End diff -- Have we documented when it would be useful to enable `LargeRecordHandler`? > Clean up configuration and check breaking API changes > - > > Key: FLINK-4127 > URL: https://issues.apache.org/jira/browse/FLINK-4127 > Project: Flink > Issue Type: Improvement >Reporter: Robert Metzger > Fix For: 1.1.0 > > Attachments: flink-core.html, flink-java.html, flink-scala.html, > flink-streaming-java.html, flink-streaming-scala.html > > > For the upcoming 1.1 release, I'll check if there are any breaking API > changes and if the documentation is up to date with the configuration.
[GitHub] flink pull request #2177: [FLINK-4127] Check API compatbility for 1.1 in fli...
Github user greghogan commented on a diff in the pull request: https://github.com/apache/flink/pull/2177#discussion_r68819516 --- Diff: docs/setup/config.md --- @@ -85,6 +85,8 @@ The default fraction for managed memory can be adjusted using the `taskmanager.m - `taskmanager.memory.preallocate`: Can be either of `true` or `false`. Specifies whether task managers should allocate all managed memory when starting up. (DEFAULT: false) +- `taskmanager.runtime.large-record-handler`: Whether to use the LargeRecordHandler when spilling. This feature is experimental. (DEFAULT: false) --- End diff -- Have we documented when it would be useful to enable `LargeRecordHandler`?
[GitHub] flink issue #2179: [Flink-4128] fix flink-runtime compile error about git-co...
Github user greghogan commented on the issue: https://github.com/apache/flink/pull/2179 The modified line is ancient code. It's not clear why this is necessary.
[jira] [Commented] (FLINK-3477) Add hash-based combine strategy for ReduceFunction
[ https://issues.apache.org/jira/browse/FLINK-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353454#comment-15353454 ] ASF GitHub Bot commented on FLINK-3477: --- Github user greghogan commented on the issue: https://github.com/apache/flink/pull/1517 +1 with just a few superficial comments. Reading back through the discussion I see that there are many ideas for future performance enhancements. If not already suggested I'd like to consider skipping staging for fixed length records. I'm missing why we can't update in place with smaller records. The deserializer is responsible for detecting the end of the record and we wouldn't need to change the pointer value when replacing with a smaller record. CombineHint.NONE can be implemented in a new PR since this looks to be ready as-is. > Add hash-based combine strategy for ReduceFunction > -- > > Key: FLINK-3477 > URL: https://issues.apache.org/jira/browse/FLINK-3477 > Project: Flink > Issue Type: Sub-task > Components: Local Runtime >Reporter: Fabian Hueske >Assignee: Gabor Gevay > > This issue is about adding a hash-based combine strategy for ReduceFunctions. > The interface of the {{reduce()}} method is as follows: > {code} > public T reduce(T v1, T v2) > {code} > Input type and output type are identical and the function returns only a > single value. A Reduce function is incrementally applied to compute a final > aggregated value. This allows holding the pre-aggregated value in a hash table > and updating it with each function call. > The hash-based strategy requires a special implementation of an in-memory hash > table. The hash table should support in-place updates of elements (if the new > value has the same size as the old value) but also appending updates > with invalidation of the old value (if the binary length of the new value > differs). The hash table needs to be able to evict and emit all elements if > it runs out of memory.
> We should also add {{HASH}} and {{SORT}} compiler hints to > {{DataSet.reduce()}} and {{Grouping.reduce()}} to allow users to pick the > execution strategy.
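The hash-based combine described in the ticket, holding one pre-aggregated value per key in a hash table and updating it with each call of the reduce function, can be sketched in plain Java. This is an illustration of the technique only; it uses an on-heap `HashMap` rather than Flink's managed-memory hash table, and all names are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

public class HashCombiner<K, T> {
    private final Map<K, T> table = new HashMap<>();
    private final BinaryOperator<T> reduce;

    public HashCombiner(BinaryOperator<T> reduce) {
        this.reduce = reduce;
    }

    // Incrementally apply the reduce function: since input and output types
    // are identical, the pre-aggregate for a key is simply replaced by
    // reduce(oldValue, newValue) on every insertion.
    public void insert(K key, T value) {
        table.merge(key, value, reduce);
    }

    public Map<K, T> results() {
        return table;
    }

    public static void main(String[] args) {
        // Word count: the combiner keeps one partial sum per key instead of
        // sorting all records before combining.
        HashCombiner<String, Integer> combiner = new HashCombiner<>(Integer::sum);
        combiner.insert("a", 1);
        combiner.insert("b", 1);
        combiner.insert("a", 1);
        System.out.println(combiner.results());
    }
}
```

A real implementation additionally has to manage binary records in fixed memory segments, which is where the in-place vs. append-and-invalidate distinction from the ticket comes in.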
[GitHub] flink issue #1517: [FLINK-3477] [runtime] Add hash-based combine strategy fo...
Github user greghogan commented on the issue: https://github.com/apache/flink/pull/1517 +1 with just a few superficial comments. Reading back through the discussion I see that there are many ideas for future performance enhancements. If not already suggested I'd like to consider skipping staging for fixed length records. I'm missing why we can't update in place with smaller records. The deserializer is responsible for detecting the end of the record and we wouldn't need to change the pointer value when replacing with a smaller record. CombineHint.NONE can be implemented in a new PR since this looks to be ready as-is.
[jira] [Commented] (FLINK-3674) Add an interface for EventTime aware User Function
[ https://issues.apache.org/jira/browse/FLINK-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353430#comment-15353430 ] ramkrishna.s.vasudevan commented on FLINK-3674: --- So in all the stream UDF implementations, if the userFunction is an instance of the new interface 'EventTimeFunction', we call the new API on that interface? And we call that API in the #processWatermark(Watermark) flow? > Add an interface for EventTime aware User Function > -- > > Key: FLINK-3674 > URL: https://issues.apache.org/jira/browse/FLINK-3674 > Project: Flink > Issue Type: New Feature > Components: Streaming >Affects Versions: 1.0.0 >Reporter: Stephan Ewen > > I suggest to add an interface that UDFs can implement, which will let them be > notified upon watermark updates. > Example usage: > {code} > public interface EventTimeFunction { > void onWatermark(Watermark watermark); > } > public class MyMapper implements MapFunction<String, String>, > EventTimeFunction { > private long currentEventTime = Long.MIN_VALUE; > public String map(String value) { > return value + " @ " + currentEventTime; > } > public void onWatermark(Watermark watermark) { > currentEventTime = watermark.getTimestamp(); > } > } > {code}
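The dispatch the comment describes, checking the user function against the proposed interface inside the watermark-processing path, can be sketched as follows. This is a self-contained simplification: the `Watermark` class is replaced by a plain `long` timestamp, and all names except `EventTimeFunction` are made up.

```java
public class WatermarkDispatchSketch {
    // The proposed opt-in interface from the ticket (simplified: a long
    // timestamp stands in for Flink's Watermark class).
    interface EventTimeFunction {
        void onWatermark(long watermarkTimestamp);
    }

    // Minimal stand-in for Flink's MapFunction.
    interface MapFunction<I, O> { O map(I value); }

    static class MyMapper implements MapFunction<String, String>, EventTimeFunction {
        private long currentEventTime = Long.MIN_VALUE;
        public String map(String value) { return value + " @ " + currentEventTime; }
        public void onWatermark(long ts) { currentEventTime = ts; }
    }

    // What an operator's processWatermark(...) path would do: forward the
    // watermark only if the user function opted in via the interface.
    static void processWatermark(Object userFunction, long timestamp) {
        if (userFunction instanceof EventTimeFunction) {
            ((EventTimeFunction) userFunction).onWatermark(timestamp);
        }
    }

    public static void main(String[] args) {
        MyMapper mapper = new MyMapper();
        processWatermark(mapper, 42L);
        System.out.println(mapper.map("event")); // event @ 42
    }
}
```

The `instanceof` check is done once per watermark here; a production operator would typically cache the cast result when the function is opened rather than re-checking on every watermark.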
[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm
[ https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353421#comment-15353421 ] Greg Hogan commented on FLINK-3879: --- Using CombineHint.HASH the HITS timings changed as follows (before -> after): scale 10: 1,034 ms -> 1,106 ms scale 12: 1,115 ms -> 1,090 ms scale 14: 1,974 ms -> 1,608 ms scale 16: 5,843 ms -> 3,841 ms scale 18: 21,927 ms -> 13,345 ms scale 20: 93,488 ms -> 54,570 ms Impressive [~ggevay]! > Native implementation of HITS algorithm > --- > > Key: FLINK-3879 > URL: https://issues.apache.org/jira/browse/FLINK-3879 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.1.0 > > > Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is > presented in [0] and described in [1]. > "[HITS] is a very popular and effective algorithm to rank documents based on > the link information among a set of documents. The algorithm presumes that a > good hub is a document that points to many others, and a good authority is a > document that many documents point to." > [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf] > This implementation differs from FLINK-2044 by providing for convergence, > outputting both hub and authority scores, and completing in half the number > of iterations. > [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf > [1] https://en.wikipedia.org/wiki/HITS_algorithm
[jira] [Created] (FLINK-4128) compile error about git-commit-id-plugin
Mao, Wei created FLINK-4128: --- Summary: compile error about git-commit-id-plugin Key: FLINK-4128 URL: https://issues.apache.org/jira/browse/FLINK-4128 Project: Flink Issue Type: Bug Reporter: Mao, Wei When I build with the latest Flink code, I get the following error: {quote} [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 01:06 h [INFO] Finished at: 2016-06-28T22:11:58+08:00 [INFO] Final Memory: 104M/3186M [INFO] [ERROR] Failed to execute goal pl.project13.maven:git-commit-id-plugin:2.1.5:revision (default) on project flink-runtime_2.11: Execution default of goal pl.project13.maven:git-commit-id-plugin:2.1.5:revision failed. NullPointerException -> [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :flink-runtime_2.11 {quote} I think it's because a wrong `dotGitDirectory` value is provided. Another question is whether we should upgrade the version of this plugin, so that we get a more meaningful error message instead of an NPE, e.g.: {quote} Could not get HEAD Ref, are you sure you have some commits in the dotGitDirectory? {quote} The current stable version is 2.2.1, but the disadvantage is that Java 1.6 is no longer supported with the new version.
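For reference, the plugin's repository location can be pinned explicitly via its `dotGitDirectory` parameter. A hypothetical pom.xml fragment (the relative path is illustrative and would have to match the actual module layout):

```xml
<plugin>
  <groupId>pl.project13.maven</groupId>
  <artifactId>git-commit-id-plugin</artifactId>
  <version>2.1.5</version>
  <configuration>
    <!-- Point the plugin at the repository root explicitly so that a
         submodule such as flink-runtime resolves the .git directory. -->
    <dotGitDirectory>${project.basedir}/../.git</dotGitDirectory>
  </configuration>
</plugin>
```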
[GitHub] flink pull request #2179: [Flink-4128]fix flink-runtime compile error about ...
GitHub user mwws opened a pull request: https://github.com/apache/flink/pull/2179 [Flink-4128]fix flink-runtime compile error about git-commit-id-plugin When I build with the latest Flink code, I get the following error: > [INFO] > [INFO] BUILD FAILURE > [INFO] > [INFO] Total time: 01:06 h > [INFO] Finished at: 2016-06-28T22:11:58+08:00 > [INFO] Final Memory: 104M/3186M > [INFO] > [ERROR] Failed to execute goal pl.project13.maven:git-commit-id-plugin:2.1.5:revision (default) on project flink-runtime_2.11: Execution default of goal pl.project13.maven:git-commit-id-plugin:2.1.5:revision failed. NullPointerException -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please read the following articles: > [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the command > [ERROR] mvn -rf :flink-runtime_2.11 I think it's because a wrong `dotGitDirectory` value is provided. I have fixed it and it compiles fine for me now. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mwws/flink git-plugin Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2179.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2179 commit ac2bccfd416e5ce5343c27d9b8c33959271977b3 Author: unknown Date: 2016-06-28T16:50:26Z fix flink runtime compile error about git-commit-id-plugin
[jira] [Commented] (FLINK-3879) Native implementation of HITS algorithm
[ https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353348#comment-15353348 ] Greg Hogan commented on FLINK-3879: --- I had not realized pr1517 was defaulting to the old sort-based combine rather than the new hash-based combine.
[GitHub] flink pull request #2178: [Flink-1815] Add methods to read and write a Graph...
GitHub user fobeligi opened a pull request: https://github.com/apache/flink/pull/2178 [Flink-1815] Add methods to read and write a Graph as adjacency list Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes. - [ ] General - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text") - The pull request addresses only one issue - Each commit in the PR has a meaningful commit message (including the JIRA id) - [ ] Documentation - Documentation has been added for new functionality - Old documentation affected by the pull request has been updated - JavaDoc for public methods has been added - [ ] Tests & Build - Functionality added by the pull request is covered by tests - `mvn clean verify` has been executed successfully locally or a Travis build has passed You can merge this pull request into a Git repository by running: $ git pull https://github.com/fobeligi/incubator-flink FLINK-1815 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2178.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2178 commit 3a9502da61b7758e1383803d5141a16fe3a5777a Author: fobeligi Date: 2016-06-22T16:11:23Z [FLINK-1815] Add GraphAdjacencyListReader class to read an Adjacency List formatted text file. Moreover, add writeAsAdjacencyList method to Graph. Test cases are also added for each new method.
commit 8aab5b40e031b132c46782a5908d58cc6290892f Author: fobeligi Date: 2016-06-28T08:49:03Z [FLINK-1815] Add fromAdjacencyListFile and writeAsAdjacencyList methods to Graph scala API. Tests are also added.
[GitHub] flink issue #2146: [FLINK-1550/FLINK-4057] Add JobManager Metrics
Github user zentol commented on the issue: https://github.com/apache/flink/pull/2146 I moved the checkpoint metrics into the Tracker (and reverted the changes to ExecutionGraph). Currently trying it out locally. Regarding the exception catching in the metrics: I can't decide whether we should try to write all metrics in such a way that they can't throw exceptions, or write the reporters in such a way that they can deal with it (usually by logging the exception). The first option is safer considering custom reporters, but the second allows us to properly log them. Regarding a test: While I agree that such a test would be nice, I can only come up with icky ways to test it. You have to access the metric _while a job is running_ as they are removed afterwards. So you either have to submit a job that blocks until _something_ happens, or you add a reporter that feeds that information back to the test _somehow_.
[jira] [Commented] (FLINK-1550) Show JVM Metrics for JobManager
[ https://issues.apache.org/jira/browse/FLINK-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353280#comment-15353280 ] ASF GitHub Bot commented on FLINK-1550: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/2146 I moved the checkpoint metrics into the Tracker (and reverted the changes to ExecutionGraph). Currently trying it out locally. Regarding the exception catching in the metrics: I can't decide whether we should try to write all metrics in such a way that they can't throw exceptions, or write the reporters in such a way that they can deal with it (usually by logging the exception). The first option is safer considering custom reporters, but the second allows us to properly log them. Regarding a test: While I agree that such a test would be nice, I can only come up with icky ways to test it. You have to access the metric _while a job is running_ as they are removed afterwards. So you either have to submit a job that blocks until _something_ happens, or you add a reporter that feeds that information back to the test _somehow_. > Show JVM Metrics for JobManager > --- > > Key: FLINK-1550 > URL: https://issues.apache.org/jira/browse/FLINK-1550 > Project: Flink > Issue Type: Sub-task > Components: JobManager, Metrics >Reporter: Robert Metzger >Assignee: Chesnay Schepler > Fix For: pre-apache
[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink
[ https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353092#comment-15353092 ] ASF GitHub Bot commented on FLINK-4085: --- Github user tzulitai commented on the issue: https://github.com/apache/flink/pull/2175 LGTM ;) > Set Kinesis Consumer Agent to Flink > --- > > Key: FLINK-4085 > URL: https://issues.apache.org/jira/browse/FLINK-4085 > Project: Flink > Issue Type: Sub-task > Components: Kinesis Connector, Streaming Connectors >Reporter: Robert Metzger >Assignee: Robert Metzger > Fix For: 1.1.0 > > > Currently, we use the default Kinesis Agent name. > I was asked by Amazon to set it to something containing Flink.
[jira] [Commented] (FLINK-4127) Clean up configuration and check breaking API changes
[ https://issues.apache.org/jira/browse/FLINK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353027#comment-15353027 ] ASF GitHub Bot commented on FLINK-4127: --- GitHub user rmetzger opened a pull request: https://github.com/apache/flink/pull/2177 [FLINK-4127] Check API compatbility for 1.1 in flink-core I checked all the newly introduced methods in public APIs by going through the reports generated from japicmp. I've also put the reports (before my PR) into the JIRA: https://issues.apache.org/jira/browse/FLINK-4127 I added the new configuration parameters to the documentation, and renamed some new configuration keys. @uce @tillrohrmann @mxm: What do you think about the renaming of the configuration keys? You can merge this pull request into a Git repository by running: $ git pull https://github.com/rmetzger/flink flink4127 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2177.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2177 commit b4073b93c9be271068faa97a139b20b9e9d6e356 Author: Robert Metzger Date: 2016-06-28T13:12:05Z [FLINK-4127] Check API compatbility for 1.1 in flink-core
[GitHub] flink pull request #2177: [FLINK-4127] Check API compatbility for 1.1 in fli...
GitHub user rmetzger opened a pull request: https://github.com/apache/flink/pull/2177 [FLINK-4127] Check API compatbility for 1.1 in flink-core I checked all the newly introduced methods in public APIs by going through the reports generated from japicmp. I've also put the reports (before my PR) into the JIRA: https://issues.apache.org/jira/browse/FLINK-4127 I added the new configuration parameters to the documentation, and renamed some new configuration keys. @uce @tillrohrmann @mxm: What do you think about the renaming of the configuration keys? You can merge this pull request into a Git repository by running: $ git pull https://github.com/rmetzger/flink flink4127 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2177.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2177 commit b4073b93c9be271068faa97a139b20b9e9d6e356 Author: Robert Metzger Date: 2016-06-28T13:12:05Z [FLINK-4127] Check API compatbility for 1.1 in flink-core
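For readers unfamiliar with japicmp: reports like the attached HTML files are typically produced by the japicmp Maven plugin, which compares a freshly built artifact against a released version. A hypothetical configuration sketch (coordinates and versions are illustrative, not Flink's actual build setup):

```xml
<plugin>
  <groupId>com.github.siom79.japicmp</groupId>
  <artifactId>japicmp-maven-plugin</artifactId>
  <configuration>
    <!-- Baseline: the last released artifact, resolved from the repository. -->
    <oldVersion>
      <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-core</artifactId>
        <version>1.0.0</version>
      </dependency>
    </oldVersion>
    <!-- Candidate: the jar produced by the current build. -->
    <newVersion>
      <file>
        <path>${project.build.directory}/${project.artifactId}-${project.version}.jar</path>
      </file>
    </newVersion>
    <parameter>
      <!-- Report only classes and methods that changed. -->
      <onlyModified>true</onlyModified>
    </parameter>
  </configuration>
</plugin>
```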
[jira] [Commented] (FLINK-2352) [Graph Visualization] Integrate Gelly with Gephi
[ https://issues.apache.org/jira/browse/FLINK-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352996#comment-15352996 ] Greg Hogan commented on FLINK-2352: --- I do not think that this ticket is a good fit for Flink. Visualization of results is better left to external projects such as Apache Zeppelin which can process output from multiple data engines. > [Graph Visualization] Integrate Gelly with Gephi > > > Key: FLINK-2352 > URL: https://issues.apache.org/jira/browse/FLINK-2352 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 0.10.0 >Reporter: Andra Lungu > > This integration will allow users to see the real-time progress of their > graph. They could also visually verify results for clustering algorithms, for > example. Gephi is free/open-source and provides support for all types of > networks, including dynamic and hierarchical graphs. > A first step would be to add the Gephi Toolkit to the pom.xml. > https://github.com/gephi/gephi-toolkit > Afterwards, a GraphBuilder similar to this one > https://github.com/palmerabollo/test-twitter-graph/blob/master/src/main/java/es/guido/twitter/graph/GraphBuilder.java > can be implemented.
[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink
[ https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352988#comment-15352988 ] ASF GitHub Bot commented on FLINK-4085: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/2175#discussion_r68757090 --- Diff: flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java --- @@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) { this.configProps = checkNotNull(configProps); this.regionId = configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION); + ClientConfigurationFactory configurationFactory = new ClientConfigurationFactory(); + ClientConfiguration config = configurationFactory.getConfig(); + if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) { --- End diff -- Will undo this as well
[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink
[ https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352989#comment-15352989 ] ASF GitHub Bot commented on FLINK-4085: --- Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/2175 Thank you for the review. It seems that I'm not that focused today ;)
[GitHub] flink issue #2175: [FLINK-4085][Kinesis] Set Flink-specific user agent
Github user rmetzger commented on the issue: https://github.com/apache/flink/pull/2175 Thank you for the review. It seems that I'm not that focused today ;)
[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink
[ https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352986#comment-15352986 ] ASF GitHub Bot commented on FLINK-4085: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/2175#discussion_r68757009 --- Diff: flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java --- @@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) { this.configProps = checkNotNull(configProps); this.regionId = configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION); + ClientConfigurationFactory configurationFactory = new ClientConfigurationFactory(); + ClientConfiguration config = configurationFactory.getConfig(); + if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) { + // set specific user agent + config.setUserAgent("Apache Flink " + EnvironmentInformation.getVersion() + " (" + EnvironmentInformation.getRevisionInformation().commitId + ") Kinesis Connector"); + } AmazonKinesisClient client = new AmazonKinesisClient(AWSUtil.getCredentialsProvider(configProps).getCredentials()); --- End diff -- oh, indeed ;)
[GitHub] flink pull request #2175: [FLINK-4085][Kinesis] Set Flink-specific user agen...
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/2175#discussion_r68757090 --- Diff: flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java --- @@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) { this.configProps = checkNotNull(configProps); this.regionId = configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION); + ClientConfigurationFactory configurationFactory = new ClientConfigurationFactory(); + ClientConfiguration config = configurationFactory.getConfig(); + if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) { --- End diff -- Will undo this as well
[GitHub] flink pull request #2175: [FLINK-4085][Kinesis] Set Flink-specific user agen...
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/2175#discussion_r68757009 --- Diff: flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java --- @@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) { this.configProps = checkNotNull(configProps); this.regionId = configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION); + ClientConfigurationFactory configurationFactory = new ClientConfigurationFactory(); + ClientConfiguration config = configurationFactory.getConfig(); + if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) { + // set specific user agent + config.setUserAgent("Apache Flink " + EnvironmentInformation.getVersion() + " (" + EnvironmentInformation.getRevisionInformation().commitId + ") Kinesis Connector"); + } AmazonKinesisClient client = new AmazonKinesisClient(AWSUtil.getCredentialsProvider(configProps).getCredentials()); --- End diff -- oh, indeed ;) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2176: [FLINK-4118] The docker-flink image is outdated (1.0.2) a...
Github user iemejia commented on the issue: https://github.com/apache/flink/pull/2176 Sorry I had to rebase my previous PR but this is the definitive one.
[jira] [Commented] (FLINK-4118) The docker-flink image is outdated (1.0.2) and can be slimmed down
[ https://issues.apache.org/jira/browse/FLINK-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352966#comment-15352966 ] ASF GitHub Bot commented on FLINK-4118: --- Github user iemejia commented on the issue: https://github.com/apache/flink/pull/2176 Sorry I had to rebase my previous PR but this is the definitive one. > The docker-flink image is outdated (1.0.2) and can be slimmed down > -- > > Key: FLINK-4118 > URL: https://issues.apache.org/jira/browse/FLINK-4118 > Project: Flink > Issue Type: Improvement >Reporter: Ismaël Mejía >Priority: Minor > > This issue is to upgrade the docker image and polish some details in it (e.g. > it can be slimmed down if we remove some unneeded dependencies, and the code > can be polished).
[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink
[ https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352961#comment-15352961 ] ASF GitHub Bot commented on FLINK-4085: --- Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/2175#discussion_r68752991 --- Diff: flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java --- @@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) { this.configProps = checkNotNull(configProps); this.regionId = configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION); + ClientConfigurationFactory configurationFactory = new ClientConfigurationFactory(); + ClientConfiguration config = configurationFactory.getConfig(); + if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) { --- End diff -- Why do we have to check if it equals the default user agent before setting it to Flink? > Set Kinesis Consumer Agent to Flink > --- > > Key: FLINK-4085 > URL: https://issues.apache.org/jira/browse/FLINK-4085 > Project: Flink > Issue Type: Sub-task > Components: Kinesis Connector, Streaming Connectors >Reporter: Robert Metzger >Assignee: Robert Metzger > Fix For: 1.1.0 > > > Currently, we use the default Kinesis Agent name. > I was asked by Amazon to set it to something containing Flink. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request #2175: [FLINK-4085][Kinesis] Set Flink-specific user agen...
Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/2175#discussion_r68752991 --- Diff: flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java --- @@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) { this.configProps = checkNotNull(configProps); this.regionId = configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION); + ClientConfigurationFactory configurationFactory = new ClientConfigurationFactory(); + ClientConfiguration config = configurationFactory.getConfig(); + if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) { --- End diff -- Why do we have to check if it equals the default user agent before setting it to Flink? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink
[ https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352957#comment-15352957 ] ASF GitHub Bot commented on FLINK-4085: --- Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/2175#discussion_r68752667 --- Diff: flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java --- @@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) { this.configProps = checkNotNull(configProps); this.regionId = configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION); + ClientConfigurationFactory configurationFactory = new ClientConfigurationFactory(); + ClientConfiguration config = configurationFactory.getConfig(); + if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) { + // set specific user agent + config.setUserAgent("Apache Flink " + EnvironmentInformation.getVersion() + " (" + EnvironmentInformation.getRevisionInformation().commitId + ") Kinesis Connector"); + } AmazonKinesisClient client = new AmazonKinesisClient(AWSUtil.getCredentialsProvider(configProps).getCredentials()); --- End diff -- The client isn't using the new `ClientConfiguration`. > Set Kinesis Consumer Agent to Flink > --- > > Key: FLINK-4085 > URL: https://issues.apache.org/jira/browse/FLINK-4085 > Project: Flink > Issue Type: Sub-task > Components: Kinesis Connector, Streaming Connectors >Reporter: Robert Metzger >Assignee: Robert Metzger > Fix For: 1.1.0 > > > Currently, we use the default Kinesis Agent name. > I was asked by Amazon to set it to something containing Flink. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request #2175: [FLINK-4085][Kinesis] Set Flink-specific user agen...
Github user tzulitai commented on a diff in the pull request: https://github.com/apache/flink/pull/2175#discussion_r68752667 --- Diff: flink-streaming-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java --- @@ -73,7 +76,14 @@ public KinesisProxy(Properties configProps) { this.configProps = checkNotNull(configProps); this.regionId = configProps.getProperty(KinesisConfigConstants.CONFIG_AWS_REGION); + ClientConfigurationFactory configurationFactory = new ClientConfigurationFactory(); + ClientConfiguration config = configurationFactory.getConfig(); + if(config.getUserAgent().equals(ClientConfiguration.DEFAULT_USER_AGENT)) { + // set specific user agent + config.setUserAgent("Apache Flink " + EnvironmentInformation.getVersion() + " (" + EnvironmentInformation.getRevisionInformation().commitId + ") Kinesis Connector"); + } AmazonKinesisClient client = new AmazonKinesisClient(AWSUtil.getCredentialsProvider(configProps).getCredentials()); --- End diff -- The client isn't using the new `ClientConfiguration`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
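The user agent set in the diff above is a plain string assembled from the Flink version and commit id. As a sketch of how that string comes together (the version and commit values below are hypothetical placeholders, not taken from a real build):

```shell
# Hypothetical stand-ins for EnvironmentInformation.getVersion() and
# EnvironmentInformation.getRevisionInformation().commitId.
version="1.1-SNAPSHOT"
commit="7f7f24a"

# Same concatenation pattern as config.setUserAgent(...) in the diff.
ua="Apache Flink ${version} (${commit}) Kinesis Connector"
echo "$ua"   # → Apache Flink 1.1-SNAPSHOT (7f7f24a) Kinesis Connector
```

Note the reviewer's point still stands: building the string is not enough; the resulting `ClientConfiguration` must also be handed to the client for the agent to take effect.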
[jira] [Commented] (FLINK-4118) The docker-flink image is outdated (1.0.2) and can be slimmed down
[ https://issues.apache.org/jira/browse/FLINK-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352950#comment-15352950 ] ASF GitHub Bot commented on FLINK-4118: --- Github user iemejia commented on the issue: https://github.com/apache/flink/pull/2176 The docker images script was simplified and the image size was reduced.

Previous image:
flink latest 6475add651c7 24 minutes ago 711.6 MB

Image after FLINK-4118:
flink latest 555e60f24c10 20 seconds ago 252.5 MB

> The docker-flink image is outdated (1.0.2) and can be slimmed down > -- > > Key: FLINK-4118 > URL: https://issues.apache.org/jira/browse/FLINK-4118 > Project: Flink > Issue Type: Improvement >Reporter: Ismaël Mejía >Priority: Minor > > This issue is to upgrade the docker image and polish some details in it (e.g. > it can be slimmed down if we remove some unneeded dependencies, and the code > can be polished).
[GitHub] flink issue #2176: Flink 4118
Github user iemejia commented on the issue: https://github.com/apache/flink/pull/2176 The docker images script was simplified and the image size was reduced.

Previous image:
flink latest 6475add651c7 24 minutes ago 711.6 MB

Image after FLINK-4118:
flink latest 555e60f24c10 20 seconds ago 252.5 MB
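The listing above reports a drop from 711.6 MB to 252.5 MB, i.e. roughly a 64.5% reduction. A quick arithmetic check, using the sizes exactly as quoted in the `docker images` output above:

```shell
# Image sizes in MB, taken verbatim from the comment above.
before=711.6
after=252.5

# Percentage reduction: (before - after) / before * 100.
awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f%%\n", (b - a) / b * 100 }'
# → 64.5%
```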
[GitHub] flink pull request #2176: Flink 4118
GitHub user iemejia opened a pull request:

https://github.com/apache/flink/pull/2176

Flink 4118

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes.

- [x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the JIRA id)
- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added
- [x] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iemejia/flink FLINK-4118

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2176.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2176

commit 7f7f24a1c31aa3e9b011fb492280389ac500fff4
Author: Ismaël Mejía
Date: 2016-06-23T16:39:50Z

    [FLINK-4118] Update docker image to 1.0.3 and remove unneeded deps

    Some of the changes include:
    - Remove unneeded dependencies (nano, wget)
    - Remove apt lists to reduce image size
    - Reduce number of layers on the docker image (best docker practice)
    - Remove useless variables and base the code in generic ones e.g. FLINK_HOME
    - Change the default JDK from oracle to openjdk-8-jre-headless, based on two reasons:
      1. You cannot legally repackage the oracle jdk in docker images
      2. The open-jdk headless is more appropriate for a server image (no GUI stuff)
    - Return port assignation to the standard FLINK one:
      Variable: docker-flink -> flink
      taskmanager.rpc.port: 6121 -> 6122
      taskmanager.data.port: 6122 -> 6121
      jobmanager.web.port: 8080 -> 8081

commit 04c18edbbb6b109c1b23c28c17f82bde080b8686
Author: Ismaël Mejía
Date: 2016-06-24T14:52:22Z

    [FLINK-4118] Base the image on the official java alpine and remove ssh

commit eff306e80ffe605f6f94d2b5764520fe6462b1f7
Author: Ismaël Mejía
Date: 2016-06-27T11:23:10Z

    [FLINK-4118] Remove unused configuration files and fix README

commit e386b5e99c086e3279454a3e574edc7a02c54cdb
Author: Ismaël Mejía
Date: 2016-06-27T15:01:13Z

    [FLINK-4118] Update compose to v2 + add new entrypoint for direct execution

commit 34f18849cd610199805bc396b2ac850242f5ab6d
Author: Ismaël Mejía
Date: 2016-06-27T16:52:48Z

    [FLINK-4118] Change default entrypoint so users can run flink as a client too
[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352937#comment-15352937 ] ASF GitHub Bot commented on FLINK-4084: --- Github user alkagin commented on a diff in the pull request: https://github.com/apache/flink/pull/2149#discussion_r68750010 --- Diff: flink-dist/src/main/flink-bin/bin/flink --- @@ -17,20 +17,41 @@ # limitations under the License. -target="$0" # For the case, the executable has been directly symlinked, figure out # the correct bin path by following its symlink up to an upper bound. # Note: we can't use the readlink utility here if we want to be POSIX # compatible. -iteration=0 -while [ -L "$target" ]; do -if [ "$iteration" -gt 100 ]; then -echo "Cannot resolve path: You have a cyclic symlink in $target." +followSymLink() { +local iteration=0 +local bar=$1 +while [ -L "$bar" ]; do +if [ "$iteration" -gt 100 ]; then +echo "Cannot resolve path: You have a cyclic symlink in $bar." +break +fi +ls=`ls -ld -- "$bar"` +bar=`expr "$ls" : '.* -> \(.*\)$'` +iteration=$((iteration + 1)) +done + +echo "$bar" +} + +target=$(followSymLink "$0") + +#Search --configDir into the parameters and set it as FLINK_CONF_DIR +args=("$@") +for i in "${!args[@]}"; do +if [ ${args[$i]} = "--configDir" ]; then +dir=$(followSymLink "${args[$(($i+1))]}" ) +if [ -d "$dir" ]; then +FLINK_CONF_DIR=`cd "${dir}"; pwd -P` --- End diff -- Added chained version, the second command runs only if the first was successful; i think it is a more correct behaviour. 
> Add configDir parameter to CliFrontend and flink shell script > - > > Key: FLINK-4084 > URL: https://issues.apache.org/jira/browse/FLINK-4084 > Project: Flink > Issue Type: Improvement > Components: Client >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: Andrea Sella >Priority: Minor > > At the moment there is no other way than the environment variable > FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if > it is started via the flink shell script. In order to improve the user > experience, I would propose to introduce a {{--configDir}} parameter which the > user can use to specify a configuration directory more easily.
[GitHub] flink pull request #2149: [FLINK-4084] Add configDir parameter to CliFronten...
Github user alkagin commented on a diff in the pull request: https://github.com/apache/flink/pull/2149#discussion_r68750010 --- Diff: flink-dist/src/main/flink-bin/bin/flink --- @@ -17,20 +17,41 @@ # limitations under the License. -target="$0" # For the case, the executable has been directly symlinked, figure out # the correct bin path by following its symlink up to an upper bound. # Note: we can't use the readlink utility here if we want to be POSIX # compatible. -iteration=0 -while [ -L "$target" ]; do -if [ "$iteration" -gt 100 ]; then -echo "Cannot resolve path: You have a cyclic symlink in $target." +followSymLink() { +local iteration=0 +local bar=$1 +while [ -L "$bar" ]; do +if [ "$iteration" -gt 100 ]; then +echo "Cannot resolve path: You have a cyclic symlink in $bar." +break +fi +ls=`ls -ld -- "$bar"` +bar=`expr "$ls" : '.* -> \(.*\)$'` +iteration=$((iteration + 1)) +done + +echo "$bar" +} + +target=$(followSymLink "$0") + +#Search --configDir into the parameters and set it as FLINK_CONF_DIR +args=("$@") +for i in "${!args[@]}"; do +if [ ${args[$i]} = "--configDir" ]; then +dir=$(followSymLink "${args[$(($i+1))]}" ) +if [ -d "$dir" ]; then +FLINK_CONF_DIR=`cd "${dir}"; pwd -P` --- End diff -- Added chained version, the second command runs only if the first was successful; i think it is a more correct behaviour. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
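The "chained version" the reviewers agree on can be sketched as follows. With `cd "$dir" && pwd -P`, the `pwd -P` only runs if the `cd` succeeded, so a bad `--configDir` yields an empty result instead of silently reporting the caller's working directory, which the unchained `cd "$dir"; pwd -P` form would do. The helper name and demo directory below are illustrative only:

```shell
resolve_conf_dir() {
    # Chained form: pwd -P runs only when cd succeeds; the subshell
    # keeps the cd from affecting the caller's working directory.
    (cd "$1" 2>/dev/null && pwd -P)
}

demo_dir=$(mktemp -d)            # stand-in for a real --configDir argument
resolve_conf_dir "$demo_dir"     # prints the physical path of demo_dir
resolve_conf_dir "/no/such/dir"  # prints nothing: cd failed, pwd never ran
rmdir "$demo_dir"
```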
[GitHub] flink issue #2154: [FLINK-4062] Update Windowing Documentation
Github user kl0u commented on the issue: https://github.com/apache/flink/pull/2154 Thanks for updating/fixing the documentation @aljoscha ! I had some comments that I posted here.
[jira] [Commented] (FLINK-4062) Update Windowing Documentation
[ https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352922#comment-15352922 ] ASF GitHub Bot commented on FLINK-4062: --- Github user kl0u commented on the issue: https://github.com/apache/flink/pull/2154 Thanks for updating/fixing the documentation @aljoscha ! I had some comments that I posted here. > Update Windowing Documentation > -- > > Key: FLINK-4062 > URL: https://issues.apache.org/jira/browse/FLINK-4062 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Affects Versions: 1.1.0 >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > The window documentation could be a bit more principled and also needs > updating with the new allowed lateness setting. > There is also essentially no documentation about how to write a custom > trigger. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3034) Redis Sink Connector
[ https://issues.apache.org/jira/browse/FLINK-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352920#comment-15352920 ] ASF GitHub Bot commented on FLINK-3034: --- Github user subhankarb commented on the issue: https://github.com/apache/flink/pull/1813 @mjsax, @rmetzger, please review. The changed model is described in the PR description. Thanks, subhankar > Redis Sink Connector > > > Key: FLINK-3034 > URL: https://issues.apache.org/jira/browse/FLINK-3034 > Project: Flink > Issue Type: New Feature > Components: Streaming Connectors >Reporter: Matthias J. Sax >Assignee: Subhankar Biswas >Priority: Minor > > Flink does not provide a sink connector for Redis. > See FLINK-3033
[GitHub] flink issue #1813: [FLINK-3034] Redis Sink Connector
Github user subhankarb commented on the issue: https://github.com/apache/flink/pull/1813 @mjsax, @rmetzger, please review. The changed model is described in the PR description. Thanks, subhankar
[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352914#comment-15352914 ] ASF GitHub Bot commented on FLINK-4084: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2149#discussion_r68746527 --- Diff: flink-dist/src/main/flink-bin/bin/flink --- @@ -17,20 +17,41 @@ # limitations under the License. -target="$0" # For the case, the executable has been directly symlinked, figure out # the correct bin path by following its symlink up to an upper bound. # Note: we can't use the readlink utility here if we want to be POSIX # compatible. -iteration=0 -while [ -L "$target" ]; do -if [ "$iteration" -gt 100 ]; then -echo "Cannot resolve path: You have a cyclic symlink in $target." +followSymLink() { +local iteration=0 +local bar=$1 +while [ -L "$bar" ]; do +if [ "$iteration" -gt 100 ]; then +echo "Cannot resolve path: You have a cyclic symlink in $bar." +break +fi +ls=`ls -ld -- "$bar"` +bar=`expr "$ls" : '.* -> \(.*\)$'` +iteration=$((iteration + 1)) +done + +echo "$bar" +} + +target=$(followSymLink "$0") + +#Search --configDir into the parameters and set it as FLINK_CONF_DIR +args=("$@") +for i in "${!args[@]}"; do +if [ ${args[$i]} = "--configDir" ]; then +dir=$(followSymLink "${args[$(($i+1))]}" ) +if [ -d "$dir" ]; then +FLINK_CONF_DIR=`cd "${dir}"; pwd -P` --- End diff -- I suppose it doesn't matter much but I like the chained version better. > Add configDir parameter to CliFrontend and flink shell script > - > > Key: FLINK-4084 > URL: https://issues.apache.org/jira/browse/FLINK-4084 > Project: Flink > Issue Type: Improvement > Components: Client >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: Andrea Sella >Priority: Minor > > At the moment there is no other way than the environment variable > FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if > it is started via the flink shell script. 
In order to improve the user > experience, I would propose to introduce a {{--configDir}} parameter which the > user can use to specify a configuration directory more easily.
[GitHub] flink pull request #2149: [FLINK-4084] Add configDir parameter to CliFronten...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/2149#discussion_r68746527 --- Diff: flink-dist/src/main/flink-bin/bin/flink --- @@ -17,20 +17,41 @@ # limitations under the License. -target="$0" # For the case, the executable has been directly symlinked, figure out # the correct bin path by following its symlink up to an upper bound. # Note: we can't use the readlink utility here if we want to be POSIX # compatible. -iteration=0 -while [ -L "$target" ]; do -if [ "$iteration" -gt 100 ]; then -echo "Cannot resolve path: You have a cyclic symlink in $target." +followSymLink() { +local iteration=0 +local bar=$1 +while [ -L "$bar" ]; do +if [ "$iteration" -gt 100 ]; then +echo "Cannot resolve path: You have a cyclic symlink in $bar." +break +fi +ls=`ls -ld -- "$bar"` +bar=`expr "$ls" : '.* -> \(.*\)$'` +iteration=$((iteration + 1)) +done + +echo "$bar" +} + +target=$(followSymLink "$0") + +#Search --configDir into the parameters and set it as FLINK_CONF_DIR +args=("$@") +for i in "${!args[@]}"; do +if [ ${args[$i]} = "--configDir" ]; then +dir=$(followSymLink "${args[$(($i+1))]}" ) +if [ -d "$dir" ]; then +FLINK_CONF_DIR=`cd "${dir}"; pwd -P` --- End diff -- I suppose it doesn't matter much but I like the chained version better. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-4127) Clean up configuration and check breaking API changes
[ https://issues.apache.org/jira/browse/FLINK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-4127: -- Attachment: flink-streaming-scala.html flink-streaming-java.html flink-scala.html I added the Japicmp reports for the covered modules. > Clean up configuration and check breaking API changes > - > > Key: FLINK-4127 > URL: https://issues.apache.org/jira/browse/FLINK-4127 > Project: Flink > Issue Type: Improvement >Reporter: Robert Metzger > Fix For: 1.1.0 > > Attachments: flink-core.html, flink-java.html, flink-scala.html, > flink-streaming-java.html, flink-streaming-scala.html > > > For the upcoming 1.1 release, I'll check if there are any breaking API > changes and if the documentation is up to date with the configuration.
[jira] [Commented] (FLINK-4062) Update Windowing Documentation
[ https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352896#comment-15352896 ] ASF GitHub Bot commented on FLINK-4062: --- Github user kl0u commented on a diff in the pull request: https://github.com/apache/flink/pull/2154#discussion_r68744858 --- Diff: docs/apis/streaming/windows.md --- @@ -24,1023 +24,593 @@ specific language governing permissions and limitations under the License. --> +Flink uses a concept called *windows* to divide a (potentially) infinite `DataStream` into slices +based on the timestamps of elements or other criteria. This division is required when working +with infinite streams of data and performing transformations that aggregate elements. + +Info We will mostly talk about *keyed windowing* here, this +means that the elements are subdivided based on both window and key before being given to +a user function. Keyed windows have the advantage that work can be distributed across the cluster +because the elements for different keys can be processed in isolation. If you absolutely must, +you can check out [non-keyed windowing](#non-keyed-windowing) where we describe how non-keyed +windows work. + * This will be replaced by the TOC {:toc} -## Windows on Keyed Data Streams - -Flink offers a variety of methods for defining windows on a `KeyedStream`. All of these group elements *per key*, -i.e., each window will contain elements with the same key value. +## Basics -### Basic Window Constructs +For a windowed transformations you must at least specify a *key* (usually in the form of a +`KeySelector`) a *window assigner* and a *window function*. The *key* specifies how elements are +put into groups. The *window assigner* specifies how the infinite stream is divided into windows. +Finally, the *window function* is used to process the elements of each window. 
-Flink offers a general window mechanism that provides flexibility, as well as a number of pre-defined windows -for common use cases. See first if your use case can be served by the pre-defined windows below before moving -to defining your own windows. +The basic structure of a windowed transformation is thus as follows: +{% highlight java %} +DataStream input = ...; - - - - - - Transformation - Description - - - - -Tumbling time windowKeyedStream WindowedStream - - - Defines a window of 5 seconds, that "tumbles". This means that elements are - grouped according to their timestamp in groups of 5 second duration, and every element belongs to exactly one window. - The notion of time is specified by the selected TimeCharacteristic (see time). -{% highlight java %} -keyedStream.timeWindow(Time.seconds(5)); -{% endhighlight %} - - - - - Sliding time windowKeyedStream WindowedStream - - - Defines a window of 5 seconds, that "slides" by 1 second. This means that elements are - grouped according to their timestamp in groups of 5 second duration, and elements can belong to more than - one window (since windows overlap by at most 4 seconds) - The notion of time is specified by the selected TimeCharacteristic (see time). - {% highlight java %} -keyedStream.timeWindow(Time.seconds(5), Time.seconds(1)); - {% endhighlight %} - - - - -Tumbling count windowKeyedStream WindowedStream - - - Defines a window of 1000 elements, that "tumbles". This means that elements are - grouped according to their arrival time (equivalent to processing time) in groups of 1000 elements, - and every element belongs to exactly one window. -{% highlight java %} -keyedStream.countWindow(1000); -{% endhighlight %} - - - - - Sliding count windowKeyedStream WindowedStream - - - Defines a window of 1000 elements, that "slides" every 100 elements. 
This means that elements are - grouped according to their arrival time (equivalent to processing time) in groups of 1000 elements, - and every element can belong to more than one window (as windows overlap by at most 900 elements). - {% highlight java %} -keyedStream.countWindow(1000, 100) - {% endhighlight %} - - - - - - +input +.keyBy()
[jira] [Commented] (FLINK-4075) ContinuousFileProcessingCheckpointITCase failed on Travis
[ https://issues.apache.org/jira/browse/FLINK-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352897#comment-15352897 ] ASF GitHub Bot commented on FLINK-4075: --- Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2174 Thanks for fixing this! I had some comments but they should be easy to address. > ContinuousFileProcessingCheckpointITCase failed on Travis > - > > Key: FLINK-4075 > URL: https://issues.apache.org/jira/browse/FLINK-4075 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: Kostas Kloudas >Priority: Critical > Labels: test-stability > > The test case {{ContinuousFileProcessingCheckpointITCase}} failed on Travis. > https://s3.amazonaws.com/archive.travis-ci.org/jobs/137748004/log.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink issue #2174: [FLINK-4075] ContinuousFileProcessingCheckpointITCase fai...
Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2174 Thanks for fixing this! I had some comments but they should be easy to address.
[GitHub] flink pull request #2154: [FLINK-4062] Update Windowing Documentation
Github user kl0u commented on a diff in the pull request: https://github.com/apache/flink/pull/2154#discussion_r68744858

--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@

+Flink uses a concept called *windows* to divide a (potentially) infinite `DataStream` into slices
+based on the timestamps of elements or other criteria. This division is required when working
+with infinite streams of data and performing transformations that aggregate elements.
+
+Info: We will mostly talk about *keyed windowing* here, this means that the elements are
+subdivided based on both window and key before being given to a user function. Keyed windows
+have the advantage that work can be distributed across the cluster because the elements for
+different keys can be processed in isolation. If you absolutely must, you can check out
+[non-keyed windowing](#non-keyed-windowing) where we describe how non-keyed windows work.
+
+* This will be replaced by the TOC
+{:toc}

-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. All of these group
-elements *per key*, i.e., each window will contain elements with the same key value.
-
-### Basic Window Constructs
-
-Flink offers a general window mechanism that provides flexibility, as well as a number of
-pre-defined windows for common use cases. See first if your use case can be served by the
-pre-defined windows below before moving to defining your own windows.
-
-[removed: HTML table of the pre-defined window constructs — tumbling time window
-`keyedStream.timeWindow(Time.seconds(5))` (every element belongs to exactly one window),
-sliding time window `keyedStream.timeWindow(Time.seconds(5), Time.seconds(1))` (windows
-overlap by at most 4 seconds), tumbling count window `keyedStream.countWindow(1000)`, and
-sliding count window `keyedStream.countWindow(1000, 100)` (windows overlap by at most 900
-elements), each with a prose description]

+## Basics
+
+For a windowed transformations you must at least specify a *key* (usually in the form of a
+`KeySelector`), a *window assigner* and a *window function*. The *key* specifies how elements
+are put into groups. The *window assigner* specifies how the infinite stream is divided into
+windows. Finally, the *window function* is used to process the elements of each window.
+
+The basic structure of a windowed transformation is thus as follows:
+
+{% highlight java %}
+DataStream<T> input = ...;
+
+input
+    .keyBy(<key selector>)
+    .window(<window assigner>)
+    .<windowed transformation>(<window function>);
+{% endhighlight %}
+
+{% highlight scala %}
+val input: DataStream[T] = ...
+{% endhighlight %}

[quoted diff truncated in the archive]
[jira] [Commented] (FLINK-4062) Update Windowing Documentation
[ https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352894#comment-15352894 ] ASF GitHub Bot commented on FLINK-4062: --- Github user kl0u commented on a diff in the pull request: https://github.com/apache/flink/pull/2154#discussion_r68744802 --- Diff: docs/apis/streaming/windows.md --- @@ -24,1023 +24,593 @@ [quoted diff: the same windows.md hunk quoted in the r68744858 message — the old table-based "Windows on Keyed Data Streams" section replaced by the new "Basics" prose and the keyBy/window skeleton; truncated in the archive before the reviewer's comment]
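The tumbling and sliding semantics described in the quoted diff (a 5-second tumbling window puts each element in exactly one window; a 5-second window sliding by 1 second puts each element in up to five windows that overlap by at most 4 seconds) come down to simple arithmetic on timestamps. A minimal, self-contained sketch of that arithmetic — not Flink's actual `WindowAssigner` implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class WindowAssignmentSketch {

    // Tumbling windows: each timestamp belongs to exactly one window of the given size.
    static long tumblingWindowStart(long timestamp, long size) {
        return timestamp - (timestamp % size);
    }

    // Sliding windows: a timestamp belongs to size/slide windows; consecutive
    // windows overlap by (size - slide), e.g. 5s windows sliding by 1s overlap by 4s.
    static List<Long> slidingWindowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Timestamp 7000 ms, 5-second tumbling window: falls into [5000, 10000)
        System.out.println(tumblingWindowStart(7000, 5000)); // 5000

        // 5s window sliding by 1s: the same timestamp is in 5 overlapping windows
        System.out.println(slidingWindowStarts(7000, 5000, 1000)); // [7000, 6000, 5000, 4000, 3000]
    }
}
```

For timestamp 7000 ms the sliding assignment yields five window starts (7000 down to 3000), matching the quoted statement that each element can belong to more than one window.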
[GitHub] flink issue #2146: [FLINK-1550/FLINK-4057] Add JobManager Metrics
Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/2146 I've tested the PR locally and the JobManager metrics were shown properly. Good job @zentol :-) Thus, the only items left are a test case for the JobManager metrics, the question of whether to use the `CheckpointStatsTracker` for the checkpoint metrics, and the try-catch blocks around the `con.getAttribute` calls. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-4127) Clean up configuration and check breaking API changes
[ https://issues.apache.org/jira/browse/FLINK-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-4127: -- Attachment: flink-java.html flink-core.html > Clean up configuration and check breaking API changes > - > > Key: FLINK-4127 > URL: https://issues.apache.org/jira/browse/FLINK-4127 > Project: Flink > Issue Type: Improvement >Reporter: Robert Metzger > Fix For: 1.1.0 > > Attachments: flink-core.html, flink-java.html > > > For the upcoming 1.1 release, I'll check if there are any breaking API > changes and if the documentation is up to date with the configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1550) Show JVM Metrics for JobManager
[ https://issues.apache.org/jira/browse/FLINK-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352889#comment-15352889 ] ASF GitHub Bot commented on FLINK-1550: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/2146#discussion_r68744357 --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinator.java --- @@ -663,6 +677,9 @@ public boolean receiveAcknowledgeMessage(AcknowledgeCheckpoint message) throws E LOG.info("Completed checkpoint " + checkpointId + " (in " + completed.getDuration() + " ms)"); + lastCheckpointSize = completed.getStateSize(); + lastCheckpointDuration = completed.getDuration(); --- End diff -- Would it make sense to use the `CheckpointStatsTracker` to obtain the values for the last checkpoint, given that the JobManager is not so performance-critical with respect to the calculation of the metrics? > Show JVM Metrics for JobManager > --- > > Key: FLINK-1550 > URL: https://issues.apache.org/jira/browse/FLINK-1550 > Project: Flink > Issue Type: Sub-task > Components: JobManager, Metrics >Reporter: Robert Metzger >Assignee: Chesnay Schepler > Fix For: pre-apache > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
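The fields added in the quoted diff (`lastCheckpointSize`, `lastCheckpointDuration`) are the kind of values that end up exposed as gauges. A hypothetical sketch of that wiring — the `Gauge` interface here is a stand-in, not Flink's actual metrics API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: expose the most recent checkpoint's size and duration as gauges.
// The Gauge interface below is a hypothetical stand-in for a metrics type.
public class CheckpointMetricsSketch {

    interface Gauge<T> {
        T getValue();
    }

    // -1 signals "no checkpoint completed yet".
    private final AtomicLong lastCheckpointSize = new AtomicLong(-1L);
    private final AtomicLong lastCheckpointDuration = new AtomicLong(-1L);

    // Called when a checkpoint completes, mirroring the fields set in
    // receiveAcknowledgeMessage in the quoted diff.
    void onCheckpointCompleted(long stateSize, long durationMillis) {
        lastCheckpointSize.set(stateSize);
        lastCheckpointDuration.set(durationMillis);
    }

    Gauge<Long> lastCheckpointSizeGauge() {
        return lastCheckpointSize::get;
    }

    Gauge<Long> lastCheckpointDurationGauge() {
        return lastCheckpointDuration::get;
    }
}
```

The gauges read the fields lazily, so a metrics reporter always sees the latest completed checkpoint without the coordinator pushing updates.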
[jira] [Closed] (FLINK-3898) Adamic-Adar Similarity
[ https://issues.apache.org/jira/browse/FLINK-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-3898. - Resolution: Fixed Implemented in f9552d8dc25564405b37f3f0b6ac9addf497b722 > Adamic-Adar Similarity > -- > > Key: FLINK-3898 > URL: https://issues.apache.org/jira/browse/FLINK-3898 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.1.0 > > > The implementation of Adamic-Adar Similarity [0] is very close to Jaccard > Similarity. Whereas Jaccard Similarity counts common neighbors, Adamic-Adar > Similarity sums the inverse logarithm of the degree of common neighbors. > Consideration will be given to the computation of the inverse logarithm, in > particular whether to pre-compute a small array of values. > [0] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
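The issue text contrasts the two measures: Jaccard counts common neighbors, while Adamic-Adar weights each common neighbor z by 1/log(deg(z)), so rare common neighbors count for more. A small adjacency-map sketch of the score — not Gelly's actual implementation:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AdamicAdarSketch {

    // Adamic-Adar similarity of u and v: sum over common neighbors z of 1 / log(deg(z)).
    // A common neighbor shared with everyone (high degree) contributes little;
    // a rarely-shared neighbor contributes a lot.
    static double adamicAdar(Map<Integer, Set<Integer>> adjacency, int u, int v) {
        Set<Integer> common = new HashSet<>(adjacency.get(u));
        common.retainAll(adjacency.get(v));

        double score = 0.0;
        for (int z : common) {
            score += 1.0 / Math.log(adjacency.get(z).size());
        }
        return score;
    }
}
```

For a graph where vertices 0 and 1 share neighbors of degree 2 and 3, the score is 1/ln 2 + 1/ln 3; replacing the weight with a constant 1 per common neighbor recovers the common-neighbor count that Jaccard normalizes.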
[jira] [Commented] (FLINK-4062) Update Windowing Documentation
[ https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352887#comment-15352887 ] ASF GitHub Bot commented on FLINK-4062: --- Github user kl0u commented on a diff in the pull request: https://github.com/apache/flink/pull/2154#discussion_r68744340 --- Diff: docs/apis/streaming/windows.md --- @@ -24,1023 +24,593 @@ [quoted diff: the same windows.md hunk quoted in the r68744858 message — the old table-based "Windows on Keyed Data Streams" section replaced by the new "Basics" prose and the keyBy/window skeleton; truncated in the archive before the reviewer's comment]
[jira] [Commented] (FLINK-4075) ContinuousFileProcessingCheckpointITCase failed on Travis
[ https://issues.apache.org/jira/browse/FLINK-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352886#comment-15352886 ] ASF GitHub Bot commented on FLINK-4075: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/2174#discussion_r68744278 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileReaderOperator.java --- @@ -191,11 +197,11 @@ public void close() throws Exception { private volatile boolean isSplitOpen = false; - SplitReader(FileInputFormat format, + private SplitReader(FileInputFormat format, TypeSerializer serializer, TimestampedCollector collector, Object checkpointLock, - Tuple3 restoredState) { + Tuple3 recoveredState) { --- End diff -- I think the old name was fine. In other parts of Flink it's also called restored state. > ContinuousFileProcessingCheckpointITCase failed on Travis > - > > Key: FLINK-4075 > URL: https://issues.apache.org/jira/browse/FLINK-4075 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.1.0 >Reporter: Till Rohrmann >Assignee: Kostas Kloudas >Priority: Critical > Labels: test-stability > > The test case {{ContinuousFileProcessingCheckpointITCase}} failed on Travis. > https://s3.amazonaws.com/archive.travis-ci.org/jobs/137748004/log.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request #2174: [FLINK-4075] ContinuousFileProcessingCheckpointITC...
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/2174#discussion_r68744213 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileReaderOperator.java --- @@ -405,6 +414,9 @@ public void restoreState(StreamTaskState state, long recoveryTimestamp) throws E S formatState = (S) ois.readObject(); // set the whole reader state for the open() to find. + Preconditions.checkArgument(this.readerState == null, --- End diff -- This should also be `Preconditions.checkState`
[jira] [Commented] (FLINK-4075) ContinuousFileProcessingCheckpointITCase failed on Travis
[ https://issues.apache.org/jira/browse/FLINK-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352881#comment-15352881 ] ASF GitHub Bot commented on FLINK-4075: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/2174#discussion_r68744158 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileReaderOperator.java --- @@ -104,7 +104,13 @@ public void open() throws Exception { this.collector = new TimestampedCollector<>(output); this.checkpointLock = getContainingTask().getCheckpointLock(); + Preconditions.checkArgument(reader == null, "The reader is already initialized."); + this.reader = new SplitReader<>(format, serializer, collector, checkpointLock, readerState); + + // after initializing the reader, set the state to recovered state --- End diff -- Could you please clarify this sentence? I think something might be missing.
[GitHub] flink pull request #2174: [FLINK-4075] ContinuousFileProcessingCheckpointITC...
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/2174#discussion_r68743982

--- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileReaderOperator.java ---
@@ -104,7 +104,13 @@ public void open() throws Exception {
 this.collector = new TimestampedCollector<>(output);
 this.checkpointLock = getContainingTask().getCheckpointLock();
+Preconditions.checkArgument(reader == null, "The reader is already initialized.");
--- End diff --

I think this should be `Preconditions.checkState()`.
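The distinction the review draws can be sketched in plain Java. This is a minimal, self-contained illustration, not Flink code — the `ReaderInitSketch` class and its inlined `checkState` helper are invented for the demo: a `checkArgument`-style check signals a bad method *argument* with `IllegalArgumentException`, while a `checkState`-style check signals that the *object* is in an illegal state with `IllegalStateException`, which is what the `reader == null` invariant really is.

```java
// Sketch: why checkState() fits the reader-initialization invariant better
// than checkArgument(). Names are illustrative, not Flink's.
public class ReaderInitSketch {
    private Object reader;

    // Minimal stand-in for Guava/Flink Preconditions.checkState
    static void checkState(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }

    void open() {
        // The invariant concerns this object's state, not a caller-supplied argument.
        checkState(reader == null, "The reader is already initialized.");
        this.reader = new Object();
    }

    public static void main(String[] args) {
        ReaderInitSketch op = new ReaderInitSketch();
        op.open();
        try {
            op.open(); // a second call violates the state invariant
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```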
[jira] [Commented] (FLINK-4075) ContinuousFileProcessingCheckpointITCase failed on Travis
[ https://issues.apache.org/jira/browse/FLINK-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352877#comment-15352877 ]

ASF GitHub Bot commented on FLINK-4075:

Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/2174

I think having `splitsToFwdOrderedAscByModTime` and `currentSplitsToFwd` as fields that are checkpointed is no longer necessary, since `monitorDirAndForwardSplits()` is called in lock scope and those two fields are always set to `null` at the end of the method.

> ContinuousFileProcessingCheckpointITCase failed on Travis
>
> Key: FLINK-4075
> URL: https://issues.apache.org/jira/browse/FLINK-4075
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.1.0
> Reporter: Till Rohrmann
> Assignee: Kostas Kloudas
> Priority: Critical
> Labels: test-stability
>
> The test case {{ContinuousFileProcessingCheckpointITCase}} failed on Travis.
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/137748004/log.txt
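The reviewer's argument can be sketched in isolation. The class below is invented for the demo (it only mimics the structure, not the real `ContinuousFileMonitoringFunction`): if a field is only ever non-null *inside* a method that runs under the checkpoint lock, and is reset to `null` before that lock is released, then a checkpoint taken under the same lock can never observe a non-null value — so snapshotting the field is pointless and a local variable would do.

```java
// Sketch: a field that is always null at every point where the checkpoint
// lock is free carries no checkpointable information. Names are illustrative.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LockScopeSketch {
    private final Object checkpointLock = new Object();
    private List<String> currentSplitsToFwd; // candidate for removal

    void monitorDirAndForwardSplits() {
        synchronized (checkpointLock) {
            currentSplitsToFwd = new ArrayList<>(Arrays.asList("split-0", "split-1"));
            // ... forward the splits downstream ...
            currentSplitsToFwd = null; // always cleared before the lock is released
        }
    }

    Object snapshot() {
        synchronized (checkpointLock) {
            return currentSplitsToFwd; // always null when a checkpoint runs
        }
    }

    public static void main(String[] args) {
        LockScopeSketch s = new LockScopeSketch();
        s.monitorDirAndForwardSplits();
        System.out.println("snapshot sees: " + s.snapshot());
    }
}
```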
[jira] [Created] (FLINK-4127) Clean up configuration and check breaking API changes
Robert Metzger created FLINK-4127:

Summary: Clean up configuration and check breaking API changes
Key: FLINK-4127
URL: https://issues.apache.org/jira/browse/FLINK-4127
Project: Flink
Issue Type: Improvement
Reporter: Robert Metzger
Fix For: 1.1.0

For the upcoming 1.1 release, I'll check whether there are any breaking API changes and whether the documentation is up to date with the configuration.
[jira] [Commented] (FLINK-4062) Update Windowing Documentation
[ https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352872#comment-15352872 ]

ASF GitHub Bot commented on FLINK-4062:

Github user kl0u commented on a diff in the pull request: https://github.com/apache/flink/pull/2154#discussion_r68743505

--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@ specific language governing permissions and limitations under the License. -->
+Flink uses a concept called *windows* to divide a (potentially) infinite `DataStream` into slices
+based on the timestamps of elements or other criteria. This division is required when working
+with infinite streams of data and performing transformations that aggregate elements.
+
+Info We will mostly talk about *keyed windowing* here, this
+means that the elements are subdivided based on both window and key before being given to
+a user function. Keyed windows have the advantage that work can be distributed across the cluster
+because the elements for different keys can be processed in isolation. If you absolutely must,
+you can check out [non-keyed windowing](#non-keyed-windowing) where we describe how non-keyed
+windows work.
+
+* This will be replaced by the TOC
+{:toc}
-## Windows on Keyed Data Streams
-
-Flink offers a variety of methods for defining windows on a `KeyedStream`. All of these group elements *per key*,
-i.e., each window will contain elements with the same key value.
+## Basics
-### Basic Window Constructs
+For a windowed transformations you must at least specify a *key* (usually in the form of a
+`KeySelector`) a *window assigner* and a *window function*. The *key* specifies how elements are
+put into groups. The *window assigner* specifies how the infinite stream is divided into windows.
+Finally, the *window function* is used to process the elements of each window.
-Flink offers a general window mechanism that provides flexibility, as well as a number of pre-defined windows
-for common use cases. See first if your use case can be served by the pre-defined windows below before moving
-to defining your own windows.
+The basic structure of a windowed transformation is thus as follows:
+{% highlight java %}
+DataStream input = ...;
[removed: garbled HTML table of the old "Basic Window Constructs" docs — tumbling/sliding time windows (`keyedStream.timeWindow(Time.seconds(5))`, `keyedStream.timeWindow(Time.seconds(5), Time.seconds(1))`) and tumbling/sliding count windows (`keyedStream.countWindow(1000)`, `keyedStream.countWindow(1000, 100)`)]
+input
+.keyBy()
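The three ingredients named in the rewritten docs (key, window assigner, window function) can be illustrated without the Flink API at all. The sketch below is a stand-in using plain Java collections — `TumblingWindowSketch`, the event data, and the 5-second window size are invented for the demo: elements are grouped by key, assigned to tumbling windows by timestamp, and each window's elements are counted.

```java
// Conceptual sketch of keyed tumbling windows: every element lands in exactly
// one window, determined by (key, windowStart). Not Flink API code.
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowSketch {
    public static void main(String[] args) {
        long windowSize = 5_000L; // 5-second tumbling windows
        // (key, timestampMillis) pairs standing in for a keyed stream
        Object[][] events = {
            {"a", 1_000L}, {"a", 4_000L}, {"b", 4_500L}, {"a", 6_000L}
        };
        Map<String, Integer> counts = new TreeMap<>();
        for (Object[] e : events) {
            String key = (String) e[0];            // the key selector
            long ts = (Long) e[1];
            long windowStart = ts - (ts % windowSize); // the window assigner
            counts.merge(key + "@" + windowStart, 1, Integer::sum); // window function: count
        }
        System.out.println(counts);
    }
}
```

With these events, "a" gets two elements in the window starting at 0 ms and one in the window starting at 5000 ms, while "b" gets one element in the window starting at 0 ms.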
[jira] [Commented] (FLINK-4084) Add configDir parameter to CliFrontend and flink shell script
[ https://issues.apache.org/jira/browse/FLINK-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352871#comment-15352871 ]

ASF GitHub Bot commented on FLINK-4084:

Github user alkagin commented on a diff in the pull request: https://github.com/apache/flink/pull/2149#discussion_r68743480

--- Diff: flink-dist/src/main/flink-bin/bin/flink ---
@@ -17,20 +17,41 @@
 # limitations under the License.

-target="$0"
 # For the case, the executable has been directly symlinked, figure out
 # the correct bin path by following its symlink up to an upper bound.
 # Note: we can't use the readlink utility here if we want to be POSIX
 # compatible.
-iteration=0
-while [ -L "$target" ]; do
-if [ "$iteration" -gt 100 ]; then
-echo "Cannot resolve path: You have a cyclic symlink in $target."
+followSymLink() {
+local iteration=0
+local bar=$1
+while [ -L "$bar" ]; do
+if [ "$iteration" -gt 100 ]; then
+echo "Cannot resolve path: You have a cyclic symlink in $bar."
+break
+fi
+ls=`ls -ld -- "$bar"`
+bar=`expr "$ls" : '.* -> \(.*\)$'`
+iteration=$((iteration + 1))
+done
+
+echo "$bar"
+}
+
+target=$(followSymLink "$0")
+
+#Search --configDir into the parameters and set it as FLINK_CONF_DIR
+args=("$@")
+for i in "${!args[@]}"; do
+if [ ${args[$i]} = "--configDir" ]; then
--- End diff --

There is already one, we need two?

> Add configDir parameter to CliFrontend and flink shell script
>
> Key: FLINK-4084
> URL: https://issues.apache.org/jira/browse/FLINK-4084
> Project: Flink
> Issue Type: Improvement
> Components: Client
> Affects Versions: 1.1.0
> Reporter: Till Rohrmann
> Assignee: Andrea Sella
> Priority: Minor
>
> At the moment there is no other way than the environment variable
> FLINK_CONF_DIR to specify the configuration directory for the CliFrontend if
> it is started via the flink shell script. In order to improve the user
> experience, I would propose to introduce a {{--configDir}} parameter which the
> user can use to specify a configuration directory more easily.
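A quick way to exercise the `followSymLink` helper from the patch under review is against a throwaway symlink chain. The sketch below restates the function as it appears in the diff and drives it with a temporary directory; the file names and the demo scaffolding are invented, and only the function body comes from the patch.

```shell
#!/usr/bin/env bash
# followSymLink as in the patch under review: walks a chain of symlinks
# (bailing out after 100 hops to guard against cycles) without readlink,
# which keeps the script POSIX-compatible.
followSymLink() {
    local iteration=0
    local bar=$1
    while [ -L "$bar" ]; do
        if [ "$iteration" -gt 100 ]; then
            echo "Cannot resolve path: You have a cyclic symlink in $bar."
            break
        fi
        ls=$(ls -ld -- "$bar")
        bar=$(expr "$ls" : '.* -> \(.*\)$')   # extract the link target from ls output
        iteration=$((iteration + 1))
    done
    echo "$bar"
}

# Demo with a throwaway two-link chain (paths invented for the demo).
dir=$(mktemp -d)
touch "$dir/real"
ln -s "$dir/real" "$dir/link1"
ln -s "$dir/link1" "$dir/link2"

followSymLink "$dir/link2"   # resolves through both links to $dir/real
rm -rf "$dir"
```

Note that `ls -ld` output ends in `... link -> target`, so the `expr` regex captures everything after the arrow; this is why the helper avoids `readlink`, which is not guaranteed by POSIX.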
[jira] [Commented] (FLINK-4062) Update Windowing Documentation
[ https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352865#comment-15352865 ]

ASF GitHub Bot commented on FLINK-4062:

Github user kl0u commented on a diff in the pull request: https://github.com/apache/flink/pull/2154#discussion_r68743168

--- Diff: docs/apis/streaming/windows.md ---
@@ -24,1023 +24,593 @@
[quotes the same windows.md hunk as discussion_r68743505 above]
[GitHub] flink pull request #2149: [FLINK-4084] Add configDir parameter to CliFronten...
Github user alkagin commented on a diff in the pull request: https://github.com/apache/flink/pull/2149#discussion_r68743195

--- Diff: flink-dist/src/main/flink-bin/bin/flink ---
@@ -17,20 +17,41 @@
 # limitations under the License.

-target="$0"
 # For the case, the executable has been directly symlinked, figure out
 # the correct bin path by following its symlink up to an upper bound.
 # Note: we can't use the readlink utility here if we want to be POSIX
 # compatible.
-iteration=0
-while [ -L "$target" ]; do
-if [ "$iteration" -gt 100 ]; then
-echo "Cannot resolve path: You have a cyclic symlink in $target."
+followSymLink() {
+local iteration=0
+local bar=$1
+while [ -L "$bar" ]; do
+if [ "$iteration" -gt 100 ]; then
+echo "Cannot resolve path: You have a cyclic symlink in $bar."
+break
+fi
+ls=`ls -ld -- "$bar"`
+bar=`expr "$ls" : '.* -> \(.*\)$'`
+iteration=$((iteration + 1))
+done
+
+echo "$bar"
+}
+
+target=$(followSymLink "$0")
+
+#Search --configDir into the parameters and set it as FLINK_CONF_DIR
+args=("$@")
+for i in "${!args[@]}"; do
+if [ ${args[$i]} = "--configDir" ]; then
+dir=$(followSymLink "${args[$(($i+1))]}" )
+if [ -d "$dir" ]; then
+FLINK_CONF_DIR=`cd "${dir}"; pwd -P`
--- End diff --

I found a similar command [here](https://github.com/alkagin/flink/blob/76283d3f0fbf5454b208ba1887d847acea09640d/flink-dist/src/main/flink-bin/bin/config.sh#L132).
[jira] [Commented] (FLINK-4062) Update Windowing Documentation
[ https://issues.apache.org/jira/browse/FLINK-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352859#comment-15352859 ]

ASF GitHub Bot commented on FLINK-4062:
---

Github user kl0u commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2154#discussion_r68742875

    --- Diff: docs/apis/streaming/windows.md ---
    @@ -24,1023 +24,593 @@ specific language governing permissions and limitations under the License.
     -->

    +Flink uses a concept called *windows* to divide a (potentially) infinite `DataStream` into slices
    +based on the timestamps of elements or other criteria. This division is required when working
    +with infinite streams of data and performing transformations that aggregate elements.
    +
    +**Info:** We will mostly talk about *keyed windowing* here; this means that the elements are
    +subdivided based on both window and key before being given to a user function. Keyed windows
    +have the advantage that work can be distributed across the cluster because the elements for
    +different keys can be processed in isolation. If you absolutely must, you can check out
    +[non-keyed windowing](#non-keyed-windowing) where we describe how non-keyed windows work.
    +
     * This will be replaced by the TOC
     {:toc}

    -## Windows on Keyed Data Streams
    -
    -Flink offers a variety of methods for defining windows on a `KeyedStream`. All of these group
    -elements *per key*, i.e., each window will contain elements with the same key value.
    +## Basics

    -### Basic Window Constructs
    +For a windowed transformation you must at least specify a *key* (usually in the form of a
    +`KeySelector`), a *window assigner* and a *window function*. The *key* specifies how elements
    +are put into groups. The *window assigner* specifies how the infinite stream is divided into
    +windows. Finally, the *window function* is used to process the elements of each window.

    -Flink offers a general window mechanism that provides flexibility, as well as a number of
    -pre-defined windows for common use cases. See first if your use case can be served by the
    -pre-defined windows below before moving to defining your own windows.
    +The basic structure of a windowed transformation is thus as follows:
    +
    +{% highlight java %}
    +DataStream<T> input = ...;
    +
    +input
    +    .keyBy(<key selector>)
    +    .window(<window assigner>)
    +    .<windowed transformation>(<window function>);
    +{% endhighlight %}
    +
    +{% highlight scala %}
    +val input: DataStream[T] = ...
    +
    +input
    +    .keyBy(<key selector>)
    +    .window(<window assigner>)
    +    .<windowed transformation>(<window function>)
    +{% endhighlight %}

    -Tumbling time window (KeyedStream -> WindowedStream):
    -    Defines a window of 5 seconds, that "tumbles". This means that elements are grouped
    -    according to their timestamp in groups of 5 second duration, and every element belongs
    -    to exactly one window. The notion of time is specified by the selected
    -    TimeCharacteristic (see time).
    -{% highlight java %}
    -keyedStream.timeWindow(Time.seconds(5));
    -{% endhighlight %}

    -Sliding time window (KeyedStream -> WindowedStream):
    -    Defines a window of 5 seconds, that "slides" by 1 second. This means that elements are
    -    grouped according to their timestamp in groups of 5 second duration, and elements can
    -    belong to more than one window (since windows overlap by at most 4 seconds). The notion
    -    of time is specified by the selected TimeCharacteristic (see time).
    -{% highlight java %}
    -keyedStream.timeWindow(Time.seconds(5), Time.seconds(1));
    -{% endhighlight %}

    -Tumbling count window (KeyedStream -> WindowedStream):
    -    Defines a window of 1000 elements, that "tumbles". This means that elements are grouped
    -    according to their arrival time (equivalent to processing time) in groups of 1000
    -    elements, and every element belongs to exactly one window.
    -{% highlight java %}
    -keyedStream.countWindow(1000);
    -{% endhighlight %}

    -Sliding count window (KeyedStream -> WindowedStream):
    -    Defines a window of 1000 elements, that "slides" every 100 elements. This means that
    -    elements are grouped according to their arrival time (equivalent to processing time) in
    -    groups of 1000 elements, and every element can belong to more than one window (as
    -    windows overlap by at most 900 elements).
    -{% highlight java %}
    -keyedStream.countWindow(1000, 100)
    -{% endhighlight %}
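The tumbling/sliding semantics described in the removed table can be modeled outside of Flink. The following is a minimal Python sketch (not the Flink API; the function names are illustrative) that computes which time windows an element with a given timestamp falls into:

```python
def tumbling_window_start(timestamp_ms: int, size_ms: int) -> int:
    """Start of the single tumbling window containing the element.

    Tumbling windows never overlap, so every element belongs to
    exactly one window."""
    return timestamp_ms - (timestamp_ms % size_ms)


def sliding_window_starts(timestamp_ms: int, size_ms: int, slide_ms: int) -> list:
    """Starts of every sliding window containing the element.

    With size = 5 s and slide = 1 s, an element belongs to 5 overlapping
    windows (adjacent windows overlap by at most size - slide = 4 s)."""
    starts = []
    start = timestamp_ms - (timestamp_ms % slide_ms)  # latest window covering it
    while start > timestamp_ms - size_ms:
        starts.append(start)
        start -= slide_ms
    return starts


# An element at t = 7.3 s with timeWindow(Time.seconds(5)):
print(tumbling_window_start(7300, 5000))        # -> 5000, the [5 s, 10 s) window
# The same element with timeWindow(Time.seconds(5), Time.seconds(1)):
print(sliding_window_starts(7300, 5000, 1000))  # -> [7000, 6000, 5000, 4000, 3000]
```

Note how the sliding case returns `size / slide = 5` window starts, matching the removed table's statement that elements can belong to more than one window.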
[GitHub] flink pull request #2154: [FLINK-4062] Update Windowing Documentation
Github user kl0u commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2154#discussion_r68742875

    --- Diff: docs/apis/streaming/windows.md ---
[jira] [Commented] (FLINK-4085) Set Kinesis Consumer Agent to Flink
[ https://issues.apache.org/jira/browse/FLINK-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15352858#comment-15352858 ]

ASF GitHub Bot commented on FLINK-4085:
---

GitHub user rmetzger opened a pull request:

    https://github.com/apache/flink/pull/2175

    [FLINK-4085][Kinesis] Set Flink-specific user agent

    I was asked by Amazon to set a Flink-specific user agent when accessing the AWS APIs. I've set an agent for the consumer; for the producer I could not find any such setting.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rmetzger/flink flink4085

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2175.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2175

commit b65abd448ca087c4b49bb841ff4a95efaeebdb36
Author: Robert Metzger
Date:   2016-06-28T11:54:12Z

    [FLINK-4085][Kinesis] Set Flink-specific user agent

> Set Kinesis Consumer Agent to Flink
> -----------------------------------
>
>         Key: FLINK-4085
>         URL: https://issues.apache.org/jira/browse/FLINK-4085
>     Project: Flink
>  Issue Type: Sub-task
>  Components: Kinesis Connector, Streaming Connectors
>    Reporter: Robert Metzger
>    Assignee: Robert Metzger
>     Fix For: 1.1.0
>
> Currently, we use the default Kinesis agent name.
> I was asked by Amazon to set it to something containing Flink.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
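The idea behind the change, identifying Flink in the User-Agent sent with AWS requests, can be sketched with the Python standard library. This is not the AWS SDK call the PR uses (the PR configures the Java SDK client); the helper name and version string below are made up for illustration:

```python
import urllib.request

FLINK_VERSION = "1.1-SNAPSHOT"  # illustrative placeholder, not the PR's actual value


def kinesis_request_with_flink_agent(url: str) -> urllib.request.Request:
    """Attach a Flink-identifying User-Agent header to an outgoing request,
    mirroring what the PR does through the AWS SDK's client configuration
    for the Kinesis consumer."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", f"Apache Flink {FLINK_VERSION} Kinesis Connector")
    return req


# Building the request attaches the header; no network call is made here.
req = kinesis_request_with_flink_agent("https://kinesis.us-east-1.amazonaws.com/")
print(req.get_header("User-agent"))  # -> Apache Flink 1.1-SNAPSHOT Kinesis Connector
```

The point of such a header is purely diagnostic: it lets AWS attribute API traffic to Flink rather than to the SDK's default agent string.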
[GitHub] flink pull request #2175: [FLINK-4085][Kinesis] Set Flink-specific user agen...
GitHub user rmetzger opened a pull request:

    https://github.com/apache/flink/pull/2175

    [FLINK-4085][Kinesis] Set Flink-specific user agent

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---