[jira] [Updated] (FLINK-2435) Add support for custom CSV field parsers
[ https://issues.apache.org/jira/browse/FLINK-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels updated FLINK-2435: -- Fix Version/s: 1.0 > Add support for custom CSV field parsers > > > Key: FLINK-2435 > URL: https://issues.apache.org/jira/browse/FLINK-2435 > Project: Flink > Issue Type: New Feature > Components: Java API, Scala API >Affects Versions: 0.10 >Reporter: Fabian Hueske > Fix For: 1.0 > > > The {{CSVInputFormats}} have only {{FieldParsers}} for Java's primitive types > (byte, short, int, long, float, double, boolean, String). > It would be good to add support for CSV field parsers for custom data types > which can be registered in a {{CSVReader}}. > We could offer two interfaces for field parsers. > 1. The regular low-level {{FieldParser}} which operates on a byte array and > offsets. > 2. A {{StringFieldParser}} which operates on a String that has been extracted > by a {{StringParser}} before. This interface will be easier to implement but > less efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
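A minimal sketch of what the proposed String-based variant (interface 2) could look like. Note that {{StringFieldParser}} and the {{LocalDateParser}} example are hypothetical, they do not exist in Flink; this only illustrates the division of labor described in the issue:

```java
import java.time.LocalDate;

// Hypothetical interface for the proposed String-based parser (point 2):
// the framework extracts the raw field as a String first, then delegates
// to the user-registered parser.
interface StringFieldParser<T> {
    T parseField(String value);
}

// Illustrative custom parser for a date field, assuming ISO-8601 input.
class LocalDateParser implements StringFieldParser<LocalDate> {
    @Override
    public LocalDate parseField(String value) {
        return LocalDate.parse(value); // e.g. "2015-10-26"
    }
}
```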
[GitHub] flink pull request: [FLINK-2827] Closing FileInputStream through t...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/1276#issuecomment-151091135 Looks good! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2827) Potential resource leak in TwitterSource#loadAuthenticationProperties()
[ https://issues.apache.org/jira/browse/FLINK-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973982#comment-14973982 ] ASF GitHub Bot commented on FLINK-2827: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/1276#issuecomment-151091135 Looks good! > Potential resource leak in TwitterSource#loadAuthenticationProperties() > --- > > Key: FLINK-2827 > URL: https://issues.apache.org/jira/browse/FLINK-2827 > Project: Flink > Issue Type: Bug >Affects Versions: 0.10 >Reporter: Ted Yu >Assignee: Saumitra Shahapure >Priority: Minor > Labels: starter > > Here is related code: > {code} > Properties properties = new Properties(); > try { > InputStream input = new FileInputStream(authPath); > properties.load(input); > input.close(); > } catch (Exception e) { > throw new RuntimeException("Cannot open .properties > file: " + authPath, e); > } > {code} > If there is exception coming out of properties.load() call, input would be > left open. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
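One way to close the leak described above is try-with-resources (Java 7+), which closes the stream on both normal and exceptional exit. The {{AuthProps}} wrapper class below is only for illustration, not the actual TwitterSource code:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;

public class AuthProps {
    public static Properties load(String authPath) {
        Properties properties = new Properties();
        // try-with-resources closes `input` even if properties.load() throws,
        // so the failure path no longer leaks the stream.
        try (InputStream input = new FileInputStream(authPath)) {
            properties.load(input);
        } catch (Exception e) {
            throw new RuntimeException("Cannot open .properties file: " + authPath, e);
        }
        return properties;
    }
}
```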
[jira] [Commented] (FLINK-2411) Add basic graph summarization algorithm
[ https://issues.apache.org/jira/browse/FLINK-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974011#comment-14974011 ] ASF GitHub Bot commented on FLINK-2411: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1269 > Add basic graph summarization algorithm > --- > > Key: FLINK-2411 > URL: https://issues.apache.org/jira/browse/FLINK-2411 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 0.10 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > > Graph summarization determines a structural grouping of similar vertices and > edges to condense a graph and thus helps to uncover insights about patterns > hidden in the graph. It can be used in OLAP-style operations on the graph and > is similar to group by in SQL but on the graph structure instead of rows. > > The graph summarization operator represents every vertex group by a single > vertex in the summarized graph; edges between vertices in the summary graph > represent a group of edges between the vertex group members of the original > graph. Summarization is defined by specifying grouping keys for vertices and > edges, respectively. > One publication that presents a Map/Reduce based approach is "Pagrol: > Parallel graph olap over large-scale attributed graphs", however they > pre-compute the graph-cube before it can be analyzed. With Flink, we can give > the user an interactive way of summarizing the graph and do not need to > compute the cube beforehand. > A more complex approach focuses on summarization on graph patterns > "SynopSys: Large Graph Analytics in the SAP HANA Database Through > Summarization". > However, I want to start with a simple algorithm that summarizes the graph on > vertex and optionally edge values and additionally stores a count aggregate > at summarized vertices/edges. 
> Consider the following two examples (e.g., social network with users from > cities and friendships with timestamp): > > h4. Input graph: > > Vertices (id, value): > (0, Leipzig) > (1, Leipzig) > (2, Dresden) > (3, Dresden) > (4, Dresden) > (5, Berlin) > Edges (source, target, value): > (0, 1, 2014) > (1, 0, 2014) > (1, 2, 2013) > (2, 1, 2013) > (2, 3, 2014) > (3, 2, 2014) > (4, 0, 2013) > (4, 1, 2015) > (5, 2, 2015) > (5, 3, 2015) > h4. Output graph (summarized on vertex value): > Vertices (id, value, count) > (0, Leipzig, 2) // "2 users from Leipzig" > (2, Dresden, 3) // "3 users from Dresden" > (5, Berlin, 1) // "1 user from Berlin" > Edges (source, target, count) > (0, 0, 2) // "2 edges between users in Leipzig" > (0, 2, 1) // "1 edge from users in Leipzig to users in Dresden" > (2, 0, 3) // "3 edges from users in Dresden to users in Leipzig" > (2, 2, 2) // "2 edges between users in Dresden" > (5, 2, 2) // "2 edges from users in Berlin to users in Dresden" > h4. Output graph (summarized on vertex and edge value): > Vertices (id, value, count) > (0, Leipzig, 2) > (2, Dresden, 3) > (5, Berlin, 1) > Edges (source, target, value, count) > (0, 0, 2014, 2) // ... > (0, 2, 2013, 1) // ... > (2, 0, 2013, 2) // "2 edges from users in Dresden to users in Leipzig with > timestamp 2013" > (2, 0, 2015, 1) // "1 edge from users in Dresden to users in Leipzig with > timestamp 2015" > (2, 2, 2014, 2) // ... > (5, 2, 2015, 2) // ... > I've already implemented two versions of the summarization algorithm in our > own project [Gradoop|https://github.com/dbs-leipzig/gradoop], which is a > graph analytics stack on top of Hadoop + Gelly/Flink with a fixed data model. 
> You can see the current WIP here: > 1 [Abstract > summarization|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/Summarization.java] > 2 [Implementation using > cross|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/SummarizationCross.java] > 3 [Implementation using > joins|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/SummarizationJoin.java] > 4 > [Tests|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/test/java/org/gradoop/model/impl/EPGraphSummarizeTest.java] > 5 > [TestGraph|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/dev-support/social-network.pdf] > I would basically use the same implementation as in 3 in combination with > KeySelectors to select the grouping keys on vertices and edges. > As you can see in the example, each vertex in the resulting graph has a > vertex id that is contained in the original graph. This id is the smallest id > among the grouped vertices (e.g., vertices 2, 3 and 4
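The vertex side of the first example can be sketched in plain Java. This is not the Gelly implementation; it only mirrors the grouping semantics described above: group vertices by value, keep the smallest id as the group representative, and count the members:

```java
import java.util.*;
import java.util.stream.*;

public class VertexSummarizeSketch {
    public static void main(String[] args) {
        // (id, city) pairs from the example input graph
        Object[][] vertices = {
            {0L, "Leipzig"}, {1L, "Leipzig"}, {2L, "Dresden"},
            {3L, "Dresden"}, {4L, "Dresden"}, {5L, "Berlin"}
        };
        // Group vertex ids by vertex value (the city)
        Map<String, List<Long>> groups = Arrays.stream(vertices)
            .collect(Collectors.groupingBy(
                v -> (String) v[1],
                Collectors.mapping(v -> (Long) v[0], Collectors.toList())));
        // Emit (representative id = smallest id in the group, value, count),
        // matching the summarized vertices in the example output.
        groups.forEach((city, ids) -> System.out.println(
            "(" + Collections.min(ids) + ", " + city + ", " + ids.size() + ")"));
    }
}
```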
[jira] [Created] (FLINK-2921) Add online documentation of sample methods
Till Rohrmann created FLINK-2921: Summary: Add online documentation of sample methods Key: FLINK-2921 URL: https://issues.apache.org/jira/browse/FLINK-2921 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 0.10 Reporter: Till Rohrmann Priority: Minor I couldn't find online documentation about Flink's sampling API (as part of the {{DataSetUtils}}/{{utils}} package object). We should add information for these methods to our online documentation so that people can more easily use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2905) Add intersect method to Graph class
[ https://issues.apache.org/jira/browse/FLINK-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973972#comment-14973972 ] Vasia Kalavri commented on FLINK-2905: -- Hi [~mju], several {{Graph}} methods use the composite {{sourceID - targetID}} key as a unique identifier. I see the problem with that in your example. If G1 has the edge 1,3,13 and G2 has the edge 1,3,14, these will match, but what should the value be in the intersection? From the options you present, Option 3 seems the most intuitive to me. Alternatively, we could maybe provide an option to the method, defining whether the edge value should be used for matching? > Add intersect method to Graph class > --- > > Key: FLINK-2905 > URL: https://issues.apache.org/jira/browse/FLINK-2905 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 0.10 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > > Currently, the Gelly Graph supports the set operations > {{Graph.union(otherGraph)}} and {{Graph.difference(otherGraph)}}. It would be > nice to have a {{Graph.intersect(otherGraph)}} method, where the resulting > graph contains all vertices and edges contained in both input graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
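The "Option 3"-style semantics discussed above (an edge is in the intersection only if source, target, AND value all match) can be sketched outside of Gelly with plain sets. The {{edgeKey}} helper is made up for illustration, not part of any Flink API:

```java
import java.util.*;

public class IntersectSketch {
    // Composite key over source, target AND value, so the edges
    // (1,3,13) and (1,3,14) from the example do NOT match.
    static String edgeKey(long src, long tgt, long val) {
        return src + "|" + tgt + "|" + val;
    }

    public static void main(String[] args) {
        Set<String> g1 = new HashSet<>(Arrays.asList(edgeKey(1, 3, 13), edgeKey(2, 4, 10)));
        Set<String> g2 = new HashSet<>(Arrays.asList(edgeKey(1, 3, 14), edgeKey(2, 4, 10)));
        g1.retainAll(g2); // intersection on the composite key
        System.out.println(g1); // only the (2,4,10) edge survives
    }
}
```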
[jira] [Commented] (FLINK-2591) Add configuration parameter for default number of yarn containers
[ https://issues.apache.org/jira/browse/FLINK-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973986#comment-14973986 ] ASF GitHub Bot commented on FLINK-2591: --- Github user willmiao commented on the pull request: https://github.com/apache/flink/pull/1121#issuecomment-151092573 hi @rmetzger , I believe I found the reason why my test failed. The test fails if we don't specify the "-t" argument (_which is an optional argument_) in the command when calling the runWithArgs function, and we get this exception: ``` Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/Level at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:357) ``` It seems some jar files related to logging are needed. One more thing, I noticed the following segment of code in the file "FlinkYarnSessionCli.java": ``` File logback = new File(confDirPath + File.pathSeparator + CONFIG_FILE_LOGBACK_NAME); if (logback.exists()) { shipFiles.add(logback); flinkYarnClient.setFlinkLoggingConfigurationPath(new Path(logback.toURI())); } ``` And I wonder if "**File.pathSeparator**" should be "**File.separator**" here. > Add configuration parameter for default number of yarn containers > - > > Key: FLINK-2591 > URL: https://issues.apache.org/jira/browse/FLINK-2591 > Project: Flink > Issue Type: Improvement > Components: YARN Client >Reporter: Robert Metzger >Assignee: Will Miao >Priority: Minor > Labels: starter > > A user complained about the requirement to always specify the number of yarn > containers (-n) when starting a job. > Adding a configuration value with a default value will allow users to set a > default ;) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
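The suspicion in the comment above is well founded: on POSIX systems {{File.separator}} is "/" (joins components within one path), while {{File.pathSeparator}} is ":" (delimits entries in a path *list* such as CLASSPATH). A small demo of the difference (the "conf"/"logback.xml" names are just placeholders):

```java
import java.io.File;

public class SeparatorDemo {
    public static void main(String[] args) {
        // separator joins components of a single path;
        // pathSeparator delimits multiple paths in a list (CLASSPATH, PATH).
        String wrong = "conf" + File.pathSeparator + "logback.xml"; // e.g. "conf:logback.xml" on Linux
        String right = "conf" + File.separator + "logback.xml";     // e.g. "conf/logback.xml" on Linux
        System.out.println("pathSeparator build: " + wrong);
        System.out.println("separator build:     " + right);
        // The pathSeparator variant names a file that almost never exists,
        // so an exists() check on it would silently fail.
    }
}
```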
[GitHub] flink pull request: [FLINK-2591] Add configuration parameter for d...
Github user willmiao commented on the pull request: https://github.com/apache/flink/pull/1121#issuecomment-151092573 hi @rmetzger , I believe I found the reason why my test failed. The test fails if we don't specify the "-t" argument (_which is an optional argument_) in the command when calling the runWithArgs function, and we get this exception: ``` Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/Level at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:357) ``` It seems some jar files related to logging are needed. One more thing, I noticed the following segment of code in the file "FlinkYarnSessionCli.java": ``` File logback = new File(confDirPath + File.pathSeparator + CONFIG_FILE_LOGBACK_NAME); if (logback.exists()) { shipFiles.add(logback); flinkYarnClient.setFlinkLoggingConfigurationPath(new Path(logback.toURI())); } ``` And I wonder if "**File.pathSeparator**" should be "**File.separator**" here.
[jira] [Resolved] (FLINK-2411) Add basic graph summarization algorithm
[ https://issues.apache.org/jira/browse/FLINK-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vasia Kalavri resolved FLINK-2411. -- Resolution: Implemented Fix Version/s: 1.0 > Add basic graph summarization algorithm > --- > > Key: FLINK-2411 > URL: https://issues.apache.org/jira/browse/FLINK-2411 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 0.10 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > Fix For: 1.0 > > > Graph summarization determines a structural grouping of similar vertices and > edges to condense a graph and thus helps to uncover insights about patterns > hidden in the graph. It can be used in OLAP-style operations on the graph and > is similar to group by in SQL but on the graph structure instead of rows. > > The graph summarization operator represents every vertex group by a single > vertex in the summarized graph; edges between vertices in the summary graph > represent a group of edges between the vertex group members of the original > graph. Summarization is defined by specifying grouping keys for vertices and > edges, respectively. > One publication that presents a Map/Reduce based approach is "Pagrol: > Parallel graph olap over large-scale attributed graphs", however they > pre-compute the graph-cube before it can be analyzed. With Flink, we can give > the user an interactive way of summarizing the graph and do not need to > compute the cube beforehand. > A more complex approach focuses on summarization on graph patterns > "SynopSys: Large Graph Analytics in the SAP HANA Database Through > Summarization". > However, I want to start with a simple algorithm that summarizes the graph on > vertex and optionally edge values and additionally stores a count aggregate > at summarized vertices/edges. > Consider the following two examples (e.g., social network with users from > cities and friendships with timestamp): > > h4. 
Input graph: > > Vertices (id, value): > (0, Leipzig) > (1, Leipzig) > (2, Dresden) > (3, Dresden) > (4, Dresden) > (5, Berlin) > Edges (source, target, value): > (0, 1, 2014) > (1, 0, 2014) > (1, 2, 2013) > (2, 1, 2013) > (2, 3, 2014) > (3, 2, 2014) > (4, 0, 2013) > (4, 1, 2015) > (5, 2, 2015) > (5, 3, 2015) > h4. Output graph (summarized on vertex value): > Vertices (id, value, count) > (0, Leipzig, 2) // "2 users from Leipzig" > (2, Dresden, 3) // "3 users from Dresden" > (5, Berlin, 1) // "1 user from Berlin" > Edges (source, target, count) > (0, 0, 2) // "2 edges between users in Leipzig" > (0, 2, 1) // "1 edge from users in Leipzig to users in Dresden" > (2, 0, 3) // "3 edges from users in Dresden to users in Leipzig" > (2, 2, 2) // "2 edges between users in Dresden" > (5, 2, 2) // "2 edges from users in Berlin to users in Dresden" > h4. Output graph (summarized on vertex and edge value): > Vertices (id, value, count) > (0, Leipzig, 2) > (2, Dresden, 3) > (5, Berlin, 1) > Edges (source, target, value, count) > (0, 0, 2014, 2) // ... > (0, 2, 2013, 1) // ... > (2, 0, 2013, 2) // "2 edges from users in Dresden to users in Leipzig with > timestamp 2013" > (2, 0, 2015, 1) // "1 edge from users in Dresden to users in Leipzig with > timestamp 2015" > (2, 2, 2014, 2) // ... > (5, 2, 2015, 2) // ... > I've already implemented two versions of the summarization algorithm in our > own project [Gradoop|https://github.com/dbs-leipzig/gradoop], which is a > graph analytics stack on top of Hadoop + Gelly/Flink with a fixed data model. 
> You can see the current WIP here: > 1 [Abstract > summarization|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/Summarization.java] > 2 [Implementation using > cross|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/SummarizationCross.java] > 3 [Implementation using > joins|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/SummarizationJoin.java] > 4 > [Tests|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/test/java/org/gradoop/model/impl/EPGraphSummarizeTest.java] > 5 > [TestGraph|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/dev-support/social-network.pdf] > I would basically use the same implementation as in 3 in combination with > KeySelectors to select the grouping keys on vertices and edges. > As you can see in the example, each vertex in the resulting graph has a > vertex id that is contained in the original graph. This id is the smallest id > among the grouped vertices (e.g., vertices 2, 3 and 4 represent Dresden, so 2 > is the group
[jira] [Commented] (FLINK-2411) Add basic graph summarization algorithm
[ https://issues.apache.org/jira/browse/FLINK-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973926#comment-14973926 ] ASF GitHub Bot commented on FLINK-2411: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1269#issuecomment-151077098 I'll merge this. > Add basic graph summarization algorithm > --- > > Key: FLINK-2411 > URL: https://issues.apache.org/jira/browse/FLINK-2411 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 0.10 >Reporter: Martin Junghanns >Assignee: Martin Junghanns >Priority: Minor > > Graph summarization determines a structural grouping of similar vertices and > edges to condense a graph and thus helps to uncover insights about patterns > hidden in the graph. It can be used in OLAP-style operations on the graph and > is similar to group by in SQL but on the graph structure instead of rows. > > The graph summarization operator represents every vertex group by a single > vertex in the summarized graph; edges between vertices in the summary graph > represent a group of edges between the vertex group members of the original > graph. Summarization is defined by specifying grouping keys for vertices and > edges, respectively. > One publication that presents a Map/Reduce based approach is "Pagrol: > Parallel graph olap over large-scale attributed graphs", however they > pre-compute the graph-cube before it can be analyzed. With Flink, we can give > the user an interactive way of summarizing the graph and do not need to > compute the cube beforehand. > A more complex approach focuses on summarization on graph patterns > "SynopSys: Large Graph Analytics in the SAP HANA Database Through > Summarization". > However, I want to start with a simple algorithm that summarizes the graph on > vertex and optionally edge values and additionally stores a count aggregate > at summarized vertices/edges. 
> Consider the following two examples (e.g., social network with users from > cities and friendships with timestamp): > > h4. Input graph: > > Vertices (id, value): > (0, Leipzig) > (1, Leipzig) > (2, Dresden) > (3, Dresden) > (4, Dresden) > (5, Berlin) > Edges (source, target, value): > (0, 1, 2014) > (1, 0, 2014) > (1, 2, 2013) > (2, 1, 2013) > (2, 3, 2014) > (3, 2, 2014) > (4, 0, 2013) > (4, 1, 2015) > (5, 2, 2015) > (5, 3, 2015) > h4. Output graph (summarized on vertex value): > Vertices (id, value, count) > (0, Leipzig, 2) // "2 users from Leipzig" > (2, Dresden, 3) // "3 users from Dresden" > (5, Berlin, 1) // "1 user from Berlin" > Edges (source, target, count) > (0, 0, 2) // "2 edges between users in Leipzig" > (0, 2, 1) // "1 edge from users in Leipzig to users in Dresden" > (2, 0, 3) // "3 edges from users in Dresden to users in Leipzig" > (2, 2, 2) // "2 edges between users in Dresden" > (5, 2, 2) // "2 edges from users in Berlin to users in Dresden" > h4. Output graph (summarized on vertex and edge value): > Vertices (id, value, count) > (0, Leipzig, 2) > (2, Dresden, 3) > (5, Berlin, 1) > Edges (source, target, value, count) > (0, 0, 2014, 2) // ... > (0, 2, 2013, 1) // ... > (2, 0, 2013, 2) // "2 edges from users in Dresden to users in Leipzig with > timestamp 2013" > (2, 0, 2015, 1) // "1 edge from users in Dresden to users in Leipzig with > timestamp 2015" > (2, 2, 2014, 2) // ... > (5, 2, 2015, 2) // ... > I've already implemented two versions of the summarization algorithm in our > own project [Gradoop|https://github.com/dbs-leipzig/gradoop], which is a > graph analytics stack on top of Hadoop + Gelly/Flink with a fixed data model. 
> You can see the current WIP here: > 1 [Abstract > summarization|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/Summarization.java] > 2 [Implementation using > cross|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/SummarizationCross.java] > 3 [Implementation using > joins|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/SummarizationJoin.java] > 4 > [Tests|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/test/java/org/gradoop/model/impl/EPGraphSummarizeTest.java] > 5 > [TestGraph|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/dev-support/social-network.pdf] > I would basically use the same implementation as in 3 in combination with > KeySelectors to select the grouping keys on vertices and edges. > As you can see in the example, each vertex in the resulting graph has a > vertex id that is contained in the original graph. This id is the smallest id >
[GitHub] flink pull request: [FLINK-2411] [gelly] Add Summarization Algorit...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1269#issuecomment-151077098 I'll merge this.
[jira] [Commented] (FLINK-2909) Gelly Graph Generators
[ https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973937#comment-14973937 ] Vasia Kalavri commented on FLINK-2909: -- Hi [~greghogan], thank you for opening this issue :) Can you give us some idea of your implementation plans for this? Are you planning to implement all the generators you mention in the description as separate algorithms, or are you considering making this a Gelly utility where we can add generators incrementally? Thank you! > Gelly Graph Generators > -- > > Key: FLINK-2909 > URL: https://issues.apache.org/jira/browse/FLINK-2909 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > > Include a selection of graph generators in Gelly. Generated graphs will be > useful for performing scalability, stress, and regression testing as well as > benchmarking and comparing algorithms, for both Flink users and developers. > Generated data is infinitely scalable yet described by a few simple > parameters and can often substitute for user data or sharing large files when > reporting issues. > There are multiple categories of graphs, as documented by > [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html] > and elsewhere. > Graphs may be well-defined, e.g. the [Chvátal > graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be > sufficiently small to populate locally. > Graphs may be scalable, e.g. complete and star graphs. These should use > Flink's distributed parallelism. > Graphs may be stochastic, e.g. [RMat > graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf] > . A key consideration is that the graphs should source randomness from a > seedable PRNG and generate the same Graph regardless of parallelism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
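The last requirement above (same graph regardless of parallelism) can be met by deriving each PRNG seed from a global seed plus a fixed block index, rather than from the worker. A plain-Java sketch of that idea, not Gelly code; the class and seed-mixing scheme are illustrative only:

```java
import java.util.Random;

public class SeededBlockSketch {
    // Derive each block's values from (globalSeed, blockIndex) only, so
    // whichever worker ends up computing a block gets identical values.
    static long[] randomBlock(long globalSeed, int blockIndex, int blockSize) {
        Random rnd = new Random(globalSeed * 31 + blockIndex); // per-block seed, not per-worker
        long[] out = new long[blockSize];
        for (int i = 0; i < blockSize; i++) {
            out[i] = rnd.nextLong();
        }
        return out;
    }

    public static void main(String[] args) {
        // Two independent "workers" computing block 7 agree exactly.
        long[] a = randomBlock(42L, 7, 4);
        long[] b = randomBlock(42L, 7, 4);
        System.out.println(java.util.Arrays.equals(a, b)); // true
    }
}
```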
[GitHub] flink pull request: [FLINK-2797][cli] Add support for running jobs...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/1214#discussion_r42979520 --- Diff: flink-clients/src/test/java/org/apache/flink/client/program/ClientTest.java --- @@ -304,4 +368,60 @@ public String getDescription() { return "TestOptimizerPlan "; } } + + public static final class TestExecuteTwice { + + public static void main(String args[]) throws Exception { --- End diff -- Ok that's better then! Thanks
[GitHub] flink pull request: [FLINK-2411] [gelly] Add Summarization Algorit...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1269
[jira] [Commented] (FLINK-2919) Apply JMH on FieldAccessMinibenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973843#comment-14973843 ] ASF GitHub Bot commented on FLINK-2919: --- GitHub user gallenvara opened a pull request: https://github.com/apache/flink/pull/1300 [FLINK-2919] [tests] Apply JMH on FieldAccessMinibenchmark class. JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark method in order to get much more accurate results. Modify the `FieldAccessMinibenchmark` class and move it to the `flink-benchmark` module. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gallenvara/flink fieldAccessMiniBenchmark Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1300.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1300 commit 3b9a8db1abaf956a04416974cdb2f41fd6fc4300 Author: gallenvara Date: 2015-10-26T07:15:14Z [FLINK-2919] [tests] Apply JMH on FieldAccessMinibenchmark class. > Apply JMH on FieldAccessMinibenchmark class. > > > Key: FLINK-2919 > URL: https://issues.apache.org/jira/browse/FLINK-2919 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark > method in order to get much more accurate results. Modify the > FieldAccessMinibenchmark class and move it to the flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2919] [tests] Apply JMH on FieldAccessM...
GitHub user gallenvara opened a pull request: https://github.com/apache/flink/pull/1300 [FLINK-2919] [tests] Apply JMH on FieldAccessMinibenchmark class. JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark method in order to get much more accurate results. Modify the `FieldAccessMinibenchmark` class and move it to the `flink-benchmark` module. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gallenvara/flink fieldAccessMiniBenchmark Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1300.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1300 commit 3b9a8db1abaf956a04416974cdb2f41fd6fc4300 Author: gallenvara Date: 2015-10-26T07:15:14Z [FLINK-2919] [tests] Apply JMH on FieldAccessMinibenchmark class.
[jira] [Updated] (FLINK-2870) Add support for accumulating/discarding for Event-Time Windows
[ https://issues.apache.org/jira/browse/FLINK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-2870: Fix Version/s: (was: 0.10) > Add support for accumulating/discarding for Event-Time Windows > -- > > Key: FLINK-2870 > URL: https://issues.apache.org/jira/browse/FLINK-2870 > Project: Flink > Issue Type: Sub-task > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > This would allow to specify whether windows should be discarded after the > trigger fires or kept in the operator if late elements arrive. > When keeping elements, the user would also have to specify an allowed > lateness time after which the window contents are discarded without emitting > any further window evaluation result. > If elements arrive after the allowed lateness they would trigger the window > immediately with only the one single element. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2891) Key for Keyed State is not set upon Window Evaluation
[ https://issues.apache.org/jira/browse/FLINK-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek closed FLINK-2891. --- Resolution: Fixed Fixed for both types of window operations now. > Key for Keyed State is not set upon Window Evaluation > - > > Key: FLINK-2891 > URL: https://issues.apache.org/jira/browse/FLINK-2891 > Project: Flink > Issue Type: Sub-task > Components: Streaming >Affects Versions: 0.10 >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek >Priority: Blocker > Fix For: 0.10 > > > In both the aligned and the general-purpose windows the key for the keyed > operator state is not set when evaluating the windows. This silently leads to > incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2919) Apply JMH on FieldAccessMinibenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975163#comment-14975163 ] ASF GitHub Bot commented on FLINK-2919: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1300 > Apply JMH on FieldAccessMinibenchmark class. > > > Key: FLINK-2919 > URL: https://issues.apache.org/jira/browse/FLINK-2919 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark > method in order to get much more accurate results. Modify the > FieldAccessMinibenchmark class and move it to the flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2827) Potential resource leak in TwitterSource#loadAuthenticationProperties()
[ https://issues.apache.org/jira/browse/FLINK-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975164#comment-14975164 ] ASF GitHub Bot commented on FLINK-2827: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1276 > Potential resource leak in TwitterSource#loadAuthenticationProperties() > --- > > Key: FLINK-2827 > URL: https://issues.apache.org/jira/browse/FLINK-2827 > Project: Flink > Issue Type: Bug >Affects Versions: 0.10 >Reporter: Ted Yu >Assignee: Saumitra Shahapure >Priority: Minor > Labels: starter > > Here is related code: > {code} > Properties properties = new Properties(); > try { > InputStream input = new FileInputStream(authPath); > properties.load(input); > input.close(); > } catch (Exception e) { > throw new RuntimeException("Cannot open .properties > file: " + authPath, e); > } > {code} > If there is exception coming out of properties.load() call, input would be > left open. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
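The leak can be closed with try-with-resources, which guarantees the stream is closed even when `properties.load()` throws. A sketch of the corrected pattern; the class and method names here are illustrative, only the loading logic mirrors the snippet above:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;

public class AuthProperties {

    // try-with-resources closes 'input' on both the normal and the
    // exceptional path, fixing the leak in the snippet above.
    static Properties load(String authPath) {
        Properties properties = new Properties();
        try (InputStream input = new FileInputStream(authPath)) {
            properties.load(input);
        } catch (Exception e) {
            throw new RuntimeException("Cannot open .properties file: " + authPath, e);
        }
        return properties;
    }

    public static void main(String[] args) {
        try {
            load("/no/such/file.properties");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```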
[jira] [Commented] (FLINK-2889) Apply JMH on LongSerializationSpeedBenchmark class
[ https://issues.apache.org/jira/browse/FLINK-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975161#comment-14975161 ] ASF GitHub Bot commented on FLINK-2889: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1283 > Apply JMH on LongSerializationSpeedBenchmark class > -- > > Key: FLINK-2889 > URL: https://issues.apache.org/jira/browse/FLINK-2889 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro benchmarks > method in order to get much more accurate results. Modify the > LongSerializationSpeedBenchmark class and move it to the flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2890) Apply JMH on StringSerializationSpeedBenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975160#comment-14975160 ] ASF GitHub Bot commented on FLINK-2890: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1284 > Apply JMH on StringSerializationSpeedBenchmark class. > - > > Key: FLINK-2890 > URL: https://issues.apache.org/jira/browse/FLINK-2890 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro benchmarks > method in order to get much more accurate results. Modify the > StringSerializationSpeedBenchmark class and move it to the flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-1611) [REFACTOR] Rename classes and packages in test that contains Nephele
[ https://issues.apache.org/jira/browse/FLINK-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Saputra closed FLINK-1611. Resolution: Fixed Fix Version/s: 0.10 This seemed to be fixed. > [REFACTOR] Rename classes and packages in test that contains Nephele > > > Key: FLINK-1611 > URL: https://issues.apache.org/jira/browse/FLINK-1611 > Project: Flink > Issue Type: Improvement > Components: other >Reporter: Henry Saputra >Assignee: Henry Saputra >Priority: Minor > Fix For: 0.10 > > > We have several classes and packages names that have Nephele names: > ./flink-tests/src/test/java/org/apache/flink/test/broadcastvars/BroadcastVarsNepheleITCase.java > ./flink-tests/src/test/java/org/apache/flink/test/broadcastvars/KMeansIterativeNepheleITCase.java > ./flink-tests/src/test/java/org/apache/flink/test/iterative/nephele/ConnectedComponentsNepheleITCase.java > ./flink-tests/src/test/java/org/apache/flink/test/iterative/nephele/DanglingPageRankNepheleITCase.java > ./flink-tests/src/test/java/org/apache/flink/test/iterative/nephele/DanglingPageRankWithCombinerNepheleITCase.java > ./flink-tests/src/test/java/org/apache/flink/test/iterative/nephele/IterationWithChainingNepheleITCase.java > Nephele was the older name used by Flink in early years to describe the Flink > processing engine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2909) Gelly Graph Generators
[ https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974985#comment-14974985 ] Greg Hogan commented on FLINK-2909: --- My thought was to initially implement one or two of each to prove the API. Each generator will have its own class. > Gelly Graph Generators > -- > > Key: FLINK-2909 > URL: https://issues.apache.org/jira/browse/FLINK-2909 > Project: Flink > Issue Type: New Feature > Components: Gelly >Affects Versions: 1.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > > Include a selection of graph generators in Gelly. Generated graphs will be > useful for performing scalability, stress, and regression testing as well as > benchmarking and comparing algorithms, for both Flink users and developers. > Generated data is infinitely scalable yet described by a few simple > parameters and can often substitute for user data or sharing large files when > reporting issues. > There are multiple categories of graphs as documented by > [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html] > and elsewhere. > Graphs may be well-defined, e.g. the [Chvátal > graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be > sufficiently small to populate locally. > Graphs may be scalable, e.g. complete and star graphs. These should use > Flink's distributed parallelism. > Graphs may be stochastic, e.g. [RMat > graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf]. > A key consideration is that the graphs should source randomness from a > seedable PRNG and generate the same Graph regardless of parallelism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
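The determinism requirement above (same graph regardless of parallelism) is commonly met by deriving a per-element seed from one master seed, so any worker can generate its slice of edges independently. A sketch of the idea; the seed-mixing scheme and names are illustrative, not Gelly's actual implementation:

```java
import java.util.Random;

public class SeededEdges {

    // Derive an independent PRNG per edge index from one master seed, so the
    // target vertex for edge i is identical no matter which parallel worker
    // computes it. The mixing constant is illustrative only.
    static int targetVertex(long masterSeed, long edgeIndex, int vertexCount) {
        Random r = new Random(masterSeed ^ (edgeIndex * 0x9E3779B97F4A7C15L));
        return r.nextInt(vertexCount);
    }

    public static void main(String[] args) {
        // A single worker over [0, 4) and two workers over [0, 2) and [2, 4)
        // produce exactly the same edges.
        for (long i = 0; i < 4; i++) {
            System.out.println(i + " -> " + targetVertex(42L, i, 1000));
        }
    }
}
```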
[jira] [Commented] (FLINK-2922) Add Queryable Window Operator
[ https://issues.apache.org/jira/browse/FLINK-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974914#comment-14974914 ] Aljoscha Krettek commented on FLINK-2922: - I don't exactly know what you mean by that, but this would still require a special kind of operator to support it; that's where the queryable window operator comes in. This is a mockup of what it would look like in practice:
{code}
DataStream<String> text = env.socketTextStream("localhost", );
DataStream<String> query = env.socketTextStream("localhost", 9998);

WindowStreamOperator<String, Tuple2<String, Integer>> winStream = text
    .flatMap(new WordCount.Tokenizer())
    .keyBy(0)
    .countWindow(10)
    .query(query.keyBy(new IdentityKey()))
    .apply(new WindowFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple, GlobalWindow>() {
        private static final long serialVersionUID = 1L;

        @Override
        public void apply(Tuple tuple, GlobalWindow window, Iterable<Tuple2<String, Integer>> values, Collector<Tuple2<String, Integer>> out) throws Exception {
            int sum = 0;
            for (Tuple2<String, Integer> val : values) {
                sum += val.f1;
            }
            out.collect(Tuple2.of((String) tuple.getField(0), sum));
        }
    });

winStream.print();

// WindowResult<QT, T>
// QT = query type
// T = window result type
DataStream<WindowResult<String, Tuple2<String, Integer>>> queryResults = winStream.getQueryResultStream();
queryResults.print();
{code}
> Add Queryable Window Operator > - > > Key: FLINK-2922 > URL: https://issues.apache.org/jira/browse/FLINK-2922 > Project: Flink > Issue Type: Improvement > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > The idea is to provide a window operator that allows to query the current > window result at any time without discarding the current result. > For example, a user might have an aggregation window operation with tumbling > windows of 1 hour. Now, at any time they might be interested in the current > aggregated value for the currently in-flight hour window. > The idea is to make the operator a two input operator where normal elements > arrive on input one while queries arrive on input two. 
The query stream must > be keyed by the same key as the input stream. If an input arrives for a key > the current value for that key is emitted along with the query element so > that the user can map the result to the query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2890) Apply JMH on StringSerializationSpeedBenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-2890. Resolution: Fixed Fix Version/s: 1.0 Fixed with 75a5257412606ac70113850439457cce7da3b2e6 > Apply JMH on StringSerializationSpeedBenchmark class. > - > > Key: FLINK-2890 > URL: https://issues.apache.org/jira/browse/FLINK-2890 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > Fix For: 1.0 > > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro benchmarks > method in order to get much more accurate results. Modify the > StringSerializationSpeedBenchmark class and move it to the flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2919) Apply JMH on FieldAccessMinibenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-2919. Resolution: Fixed Fix Version/s: 1.0 Fixed with 7265d81ff95aff4ddfbcbd4ef25869ea8f159769 > Apply JMH on FieldAccessMinibenchmark class. > > > Key: FLINK-2919 > URL: https://issues.apache.org/jira/browse/FLINK-2919 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > Fix For: 1.0 > > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro benchmarks > method in order to get much more accurate results. Modify the > FieldAccessMinibenchmark class and move it to the flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2827) Potential resource leak in TwitterSource#loadAuthenticationProperties()
[ https://issues.apache.org/jira/browse/FLINK-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-2827. Resolution: Fixed Fix Version/s: 1.0 Fixed with 6a8e90b3621bb96304b325a4ae8f7f5575ec909a > Potential resource leak in TwitterSource#loadAuthenticationProperties() > --- > > Key: FLINK-2827 > URL: https://issues.apache.org/jira/browse/FLINK-2827 > Project: Flink > Issue Type: Bug >Affects Versions: 0.10 >Reporter: Ted Yu >Assignee: Saumitra Shahapure >Priority: Minor > Labels: starter > Fix For: 1.0 > > > Here is related code: > {code} > Properties properties = new Properties(); > try { > InputStream input = new FileInputStream(authPath); > properties.load(input); > input.close(); > } catch (Exception e) { > throw new RuntimeException("Cannot open .properties > file: " + authPath, e); > } > {code} > If there is exception coming out of properties.load() call, input would be > left open. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2889) Apply JMH on LongSerializationSpeedBenchmark class
[ https://issues.apache.org/jira/browse/FLINK-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-2889. Resolution: Fixed Fix Version/s: 1.0 Fixed with 75a5257412606ac70113850439457cce7da3b2e6 > Apply JMH on LongSerializationSpeedBenchmark class > -- > > Key: FLINK-2889 > URL: https://issues.apache.org/jira/browse/FLINK-2889 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > Fix For: 1.0 > > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro benchmarks > method in order to get much more accurate results. Modify the > LongSerializationSpeedBenchmark class and move it to the flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2827] Closing FileInputStream through t...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1276 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2853) Apply JMH on MutableHashTablePerformanceBenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975162#comment-14975162 ] ASF GitHub Bot commented on FLINK-2853: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1267 > Apply JMH on MutableHashTablePerformanceBenchmark class. > > > Key: FLINK-2853 > URL: https://issues.apache.org/jira/browse/FLINK-2853 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro benchmarks > method in order to get much more accurate results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2919] [tests] Apply JMH on FieldAccessM...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1300 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2853] [tests] Apply JMH on MutableHashT...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1267 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2890] [tests] Apply JMH on StringSerial...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1284 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2889] [tests] Apply JMH on LongSerializ...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1283 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1827) Move test classes in test folders and fix scope of test dependencies
[ https://issues.apache.org/jira/browse/FLINK-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975242#comment-14975242 ] Flavio Pompermaier commented on FLINK-1827: --- A lot of improvements regarding Flink tests are going on in these last days...any effort in solving this easy issue? Do you want me to open a PR for this? > Move test classes in test folders and fix scope of test dependencies > > > Key: FLINK-1827 > URL: https://issues.apache.org/jira/browse/FLINK-1827 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 0.9 >Reporter: Flavio Pompermaier >Priority: Minor > Labels: test-compile > Original Estimate: 4h > Remaining Estimate: 4h > > Right now it is not possible to avoid compilation of test classes > (-Dmaven.test.skip=true) because some project (e.g. flink-test-utils) > requires test classes in non-test sources (e.g. > scalatest_${scala.binary.version}) > Test classes should be moved to src/main/test (if Java) and src/test/scala > (if scala) and use scope=test for test dependencies -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2920] [tests] Apply JMH on KryoVersusAv...
GitHub user gallenvara opened a pull request: https://github.com/apache/flink/pull/1302 [FLINK-2920] [tests] Apply JMH on KryoVersusAvroMinibenchmark class. JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks. Use JMH to replace the old micro benchmarks method in order to get much more accurate results. Modify the `KryoVersusAvroMinibenchmark` class and move it to the `flink-benchmark` module. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gallenvara/flink KryoVersusAvroMinibenchmark Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1302.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1302 commit 6c64e262a3a55471640368a1be12feff229a08cd Author: gallenvara Date: 2015-10-27T03:27:03Z [FLINK-2920] [tests] Apply JMH on KryoVersusAvroMinibenchmark class. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2920) Apply JMH on KryoVersusAvroMinibenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975656#comment-14975656 ] ASF GitHub Bot commented on FLINK-2920: --- GitHub user gallenvara opened a pull request: https://github.com/apache/flink/pull/1302 [FLINK-2920] [tests] Apply JMH on KryoVersusAvroMinibenchmark class. JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks. Use JMH to replace the old micro benchmarks method in order to get much more accurate results. Modify the `KryoVersusAvroMinibenchmark` class and move it to the `flink-benchmark` module. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gallenvara/flink KryoVersusAvroMinibenchmark Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1302.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1302 commit 6c64e262a3a55471640368a1be12feff229a08cd Author: gallenvara Date: 2015-10-27T03:27:03Z [FLINK-2920] [tests] Apply JMH on KryoVersusAvroMinibenchmark class. > Apply JMH on KryoVersusAvroMinibenchmark class. > --- > > Key: FLINK-2920 > URL: https://issues.apache.org/jira/browse/FLINK-2920 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro benchmarks > method in order to get much more accurate results. Modify the > KryoVersusAvroMinibenchmark class and move it to the flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2925) Client shows incomplete cause of Exception
[ https://issues.apache.org/jira/browse/FLINK-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-2925: --- Summary: Client shows incomplete cause of Exception (was: Client does not show root cause of Exception) > Client shows incomplete cause of Exception > -- > > Key: FLINK-2925 > URL: https://issues.apache.org/jira/browse/FLINK-2925 > Project: Flink > Issue Type: Improvement > Components: Distributed Runtime >Affects Versions: 0.10 >Reporter: Ufuk Celebi > > I get the following Exception at the client: > {code} > org.apache.flink.client.program.ProgramInvocationException: The program > execution failed: Job execution failed > at org.apache.flink.client.program.Client.runBlocking(Client.java:370) > at org.apache.flink.client.program.Client.runBlocking(Client.java:348) > at org.apache.flink.client.program.Client.runBlocking(Client.java:315) > at > org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70) > at > org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:78) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497) > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395) > at org.apache.flink.client.program.Client.runBlocking(Client.java:252) > at > org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:670) > at org.apache.flink.client.CliFrontend.run(CliFrontend.java:325) > at > org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:971) > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1021) > Caused by: 
org.apache.flink.runtime.client.JobExecutionException: Job > execution failed > at > org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:177) > at org.apache.flink.client.program.Client.runBlocking(Client.java:368) > ... 15 more > Caused by: java.lang.Exception: Serialized representation of > org.apache.flink.runtime.client.JobExecutionException: Failed to submit job > 57d1efe70571500f851ed5ff24bf401f (WordCount Example) > at > org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:944) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:341) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.yarn.YarnJobManager$$anonfun$handleYarnMessage$1.applyOrElse(YarnJobManager.scala:152) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162) > at > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) > at > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33) > at > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) > at > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28) > at akka.actor.Actor$class.aroundReceive(Actor.scala:465) > at > 
org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) > at akka.actor.ActorCell.invoke(ActorCell.scala:487) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) > at akka.dispatch.Mailbox.run(Mailbox.scala:221) > at akka.dispatch.Mailbox.exec(Mailbox.scala:231) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >
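The truncated chain above illustrates the problem: the interesting cause sits several wrapping levels down. A helper that walks `getCause()` to the end would surface it; this is only a sketch of the unwrapping idea, not the actual client-side fix:

```java
public class CauseChain {

    // Walk the cause chain to its deepest element; the self-reference
    // check guards against exceptions that are their own cause.
    static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        while (cur.getCause() != null && cur.getCause() != cur) {
            cur = cur.getCause();
        }
        return cur;
    }

    public static void main(String[] args) {
        Exception failure = new RuntimeException("Job execution failed",
                new Exception("Serialized representation of JobExecutionException",
                        new IllegalStateException("actual task failure")));
        System.out.println(rootCause(failure).getMessage()); // prints "actual task failure"
    }
}
```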
[jira] [Created] (FLINK-2924) Create database state backend
Gyula Fora created FLINK-2924: - Summary: Create database state backend Key: FLINK-2924 URL: https://issues.apache.org/jira/browse/FLINK-2924 Project: Flink Issue Type: Improvement Components: Streaming Reporter: Gyula Fora Assignee: Gyula Fora The goal is to create a database state backend that can be used with JDBC-supporting databases. The backend should support the storage of non-partitioned states, and also the storage of key-value states with high throughput. As databases provide advanced querying functionality, the key-value state can be implemented to be lazily fetched and should scale to "arbitrary" state sizes by not storing the non-active key-values on heap. An adapter class will be provided to help bridge the gap between different SQL implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
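The lazy-fetch idea from the issue can be sketched independently of JDBC: keep only active keys on heap and consult the store on a miss. Here a `Map` stands in for the database table, and all names are illustrative, not the proposed backend's API:

```java
import java.util.HashMap;
import java.util.Map;

public class LazyKvState<K, V> {

    private final Map<K, V> heapCache = new HashMap<>(); // active key-values on heap
    private final Map<K, V> store;                       // stand-in for a JDBC table

    public LazyKvState(Map<K, V> store) {
        this.store = store;
    }

    // Fetch lazily: the "database" is only consulted when the key
    // is not among the active (cached) keys.
    public V get(K key) {
        return heapCache.computeIfAbsent(key, store::get);
    }

    public void put(K key, V value) {
        heapCache.put(key, value); // a real backend would also write through on checkpoint
    }

    // Flush and drop non-active keys so state size is not bounded by the heap.
    public void evict() {
        store.putAll(heapCache);
        heapCache.clear();
    }
}
```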
[jira] [Commented] (FLINK-2922) Add Queryable Window Operator
[ https://issues.apache.org/jira/browse/FLINK-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974818#comment-14974818 ] Gyula Fora commented on FLINK-2922: --- I think this could be implemented nicely with the same logic I did for the StreamKV (https://github.com/gyfora/StreamKV) Where you create an abstract queryable object (the window). On which you apply queries that return QueryResults from which you can get the streams. This would abstract away the tuple logic from the user who can simply work on the streams then. > Add Queryable Window Operator > - > > Key: FLINK-2922 > URL: https://issues.apache.org/jira/browse/FLINK-2922 > Project: Flink > Issue Type: Improvement > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > The idea is to provide a window operator that allows to query the current > window result at any time without discarding the current result. > For example, a user might have an aggregation window operation with tumbling > windows of 1 hour. Now, at any time they might be interested in the current > aggregated value for the currently in-flight hour window. > The idea is to make the operator a two input operator where normal elements > arrive on input one while queries arrive on input two. The query stream must > be keyed by the same key as the input stream. If an input arrives for a key > the current value for that key is emitted along with the query element so > that the user can map the result to the query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-7] [Runtime] Enable Range Partitioner.
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1255#discussion_r43004130
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/functions/AssignRangeIndex.java ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.java.functions;
+
+import org.apache.flink.api.common.distributions.DataDistribution;
+import org.apache.flink.api.common.functions.RichMapPartitionFunction;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.util.Collector;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This mapPartition function requires a DataSet with a DataDistribution as broadcast input. It reads
+ * the target parallelism from a parameter, builds the partition boundaries from the input
+ * DataDistribution, and then computes the range index for each record.
+ *
+ * @param <IN> The original data type.
+ * @param <KEY> The key type.
+ */
+public class AssignRangeIndex<IN, KEY>
+	extends RichMapPartitionFunction<Tuple2<KEY, IN>, Tuple2<Integer, Tuple2<KEY, IN>>> {
+
+	private List partitionBoundaries;
+	private int numberChannels;
+
+	@Override
+	public void open(Configuration parameters) throws Exception {
+		this.numberChannels = parameters.getInteger("TargetParallelism", 1);
+	}
+
+	@Override
+	public void mapPartition(Iterable<Tuple2<KEY, IN>> values, Collector<Tuple2<Integer, Tuple2<KEY, IN>>> out) throws Exception {
+
+		List<DataDistribution> broadcastVariable = getRuntimeContext().getBroadcastVariable("DataDistribution");
+		if (broadcastVariable == null || broadcastVariable.size() != 1) {
--- End diff --
You can move the broadcast variable initialization into the open method.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
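The core of the range-index assignment discussed in this diff, mapping a record's key to a channel via the sorted partition boundaries, boils down to a search over the boundary array. A standalone sketch with int keys; the names and types are illustrative, not the PR's code:

```java
import java.util.Arrays;

public class RangeIndex {

    // Return the index of the first boundary >= key; keys larger than every
    // boundary fall into the last channel (index boundaries.length).
    // boundaries must be sorted ascending.
    static int channelFor(int key, int[] boundaries) {
        int pos = Arrays.binarySearch(boundaries, key);
        return pos >= 0 ? pos : -(pos + 1); // negative result encodes the insertion point
    }

    public static void main(String[] args) {
        int[] boundaries = {10, 20}; // 2 boundaries => 3 channels: 0, 1, 2
        System.out.println(channelFor(5, boundaries));  // 0
        System.out.println(channelFor(15, boundaries)); // 1
        System.out.println(channelFor(25, boundaries)); // 2
    }
}
```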
[jira] [Commented] (FLINK-7) [GitHub] Enable Range Partitioner
[ https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974343#comment-14974343 ] ASF GitHub Bot commented on FLINK-7: Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1255#discussion_r43004130
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/functions/AssignRangeIndex.java ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.java.functions;
+
+import org.apache.flink.api.common.distributions.DataDistribution;
+import org.apache.flink.api.common.functions.RichMapPartitionFunction;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.util.Collector;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This mapPartition function requires a DataSet with a DataDistribution as broadcast input. It reads
+ * the target parallelism from a parameter, builds the partition boundaries from the input
+ * DataDistribution, and then computes the range index for each record.
+ *
+ * @param <IN> The original data type.
+ * @param <KEY> The key type.
+ */
+public class AssignRangeIndex<IN, KEY>
+	extends RichMapPartitionFunction<Tuple2<KEY, IN>, Tuple2<Integer, Tuple2<KEY, IN>>> {
+
+	private List partitionBoundaries;
+	private int numberChannels;
+
+	@Override
+	public void open(Configuration parameters) throws Exception {
+		this.numberChannels = parameters.getInteger("TargetParallelism", 1);
+	}
+
+	@Override
+	public void mapPartition(Iterable<Tuple2<KEY, IN>> values, Collector<Tuple2<Integer, Tuple2<KEY, IN>>> out) throws Exception {
+
+		List<DataDistribution> broadcastVariable = getRuntimeContext().getBroadcastVariable("DataDistribution");
+		if (broadcastVariable == null || broadcastVariable.size() != 1) {
--- End diff --
You can move the broadcast variable initialization into the open method.
> [GitHub] Enable Range Partitioner > - > > Key: FLINK-7 > URL: https://issues.apache.org/jira/browse/FLINK-7 > Project: Flink > Issue Type: Sub-task > Components: Distributed Runtime >Reporter: GitHub Import >Assignee: Chengxiang Li > Fix For: pre-apache > > > The range partitioner is currently disabled. We need to implement the > following aspects: > 1) Distribution information, if available, must be propagated back together > with the ordering property. > 2) A generic bucket lookup structure (currently specific to PactRecord). > Tests to re-enable after fixing this issue: > - TeraSortITCase > - GlobalSortingITCase > - GlobalSortingMixedOrderITCase > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/7 > Created by: [StephanEwen|https://github.com/StephanEwen] > Labels: core, enhancement, optimizer, > Milestone: Release 0.4 > Assignee: [fhueske|https://github.com/fhueske] > Created at: Fri Apr 26 13:48:24 CEST 2013 > State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2922) Add Queryable Window Operator
Aljoscha Krettek created FLINK-2922: --- Summary: Add Queryable Window Operator Key: FLINK-2922 URL: https://issues.apache.org/jira/browse/FLINK-2922 Project: Flink Issue Type: Improvement Components: Streaming Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek The idea is to provide a window operator that allows querying the current window result at any time without discarding it. For example, a user might have an aggregating window operation with tumbling windows of 1 hour. At any time, they might be interested in the current aggregated value of the in-flight hour window. The idea is to make the operator a two-input operator where normal elements arrive on input one while queries arrive on input two. The query stream must be keyed by the same key as the input stream. When a query arrives for a key, the current value for that key is emitted along with the query element, so that the user can map the result to the query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
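The two-input idea above can be sketched independently of Flink's actual operator interfaces. The following is a toy, self-contained illustration (all names are hypothetical, not Flink API): elements on "input one" update a per-key running aggregate, while queries on "input two" read the current in-flight value for their key without discarding the window contents.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy sketch of a queryable keyed aggregation: normal elements update the
 * per-key aggregate of the in-flight window; query elements read the current
 * value for their key without disturbing the window state.
 */
class QueryableSumOperator {
    private final Map<String, Long> currentWindow = new HashMap<>();

    // Input one: a normal element updates the running aggregate for its key.
    void processElement(String key, long value) {
        currentWindow.merge(key, value, Long::sum);
    }

    // Input two: a query returns the current value for the same key,
    // leaving the window contents untouched.
    long processQuery(String key) {
        return currentWindow.getOrDefault(key, 0L);
    }
}
```

In the real operator, the query result would be emitted on an output stream together with the query element, so the caller can correlate results with queries.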
[jira] [Commented] (FLINK-7) [GitHub] Enable Range Partitioner
[ https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974346#comment-14974346 ] ASF GitHub Bot commented on FLINK-7: Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1255#discussion_r43004309

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/functions/AssignRangeIndex.java ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.java.functions;
+
+import org.apache.flink.api.common.distributions.DataDistribution;
+import org.apache.flink.api.common.functions.RichMapPartitionFunction;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.util.Collector;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This mapPartition function requires a DataSet with a DataDistribution as broadcast input. It
+ * reads the target parallelism from its parameters, builds the partition boundaries from the
+ * input DataDistribution, and then computes the range index for each record.
+ *
+ * @param <IN> The original data type.
+ * @param <K> The key type.
+ */
+public class AssignRangeIndex<IN, K> extends RichMapPartitionFunction<Tuple2<K, IN>, Tuple2<Integer, Tuple2<K, IN>>> {
+
+	private List<Object[]> partitionBoundaries;
+	private int numberChannels;
+
+	@Override
+	public void open(Configuration parameters) throws Exception {
+		this.numberChannels = parameters.getInteger("TargetParallelism", 1);
+	}
+
+	@Override
+	public void mapPartition(Iterable<Tuple2<K, IN>> values, Collector<Tuple2<Integer, Tuple2<K, IN>>> out) throws Exception {
+
+		List<DataDistribution> broadcastVariable = getRuntimeContext().getBroadcastVariable("DataDistribution");
+		if (broadcastVariable == null || broadcastVariable.size() != 1) {
--- End diff --

Nevermind, I thought you were using a MapFunction, but it's a MapPartitionFunction. So this is only done once.

> [GitHub] Enable Range Partitioner > - > > Key: FLINK-7 > URL: https://issues.apache.org/jira/browse/FLINK-7 > Project: Flink > Issue Type: Sub-task > Components: Distributed Runtime >Reporter: GitHub Import >Assignee: Chengxiang Li > Fix For: pre-apache > > > The range partitioner is currently disabled. We need to implement the > following aspects: > 1) Distribution information, if available, must be propagated back together > with the ordering property. > 2) A generic bucket lookup structure (currently specific to PactRecord). > Tests to re-enable after fixing this issue: > - TeraSortITCase > - GlobalSortingITCase > - GlobalSortingMixedOrderITCase > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/7 > Created by: [StephanEwen|https://github.com/StephanEwen] > Labels: core, enhancement, optimizer, > Milestone: Release 0.4 > Assignee: [fhueske|https://github.com/fhueske] > Created at: Fri Apr 26 13:48:24 CEST 2013 > State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2017] Add predefined required parameter...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1097#issuecomment-151142957 Going through FLINK-2017, I interpreted the semantics of `check` and `checkAndPopulate` as follows:
- `check` verifies that all required parameters are provided and valid (type, choice, etc.)
- `checkAndPopulate` copies the default values of the options into the `ParameterTool`

I think a method that is called `checkAndPopulate` should perform the same checks as `check` and do the *populate* in addition. Moreover, I do not see why `check` alone would be useful. Why would somebody define required parameters with default values without enforcing them, i.e., why would one call `check` without `checkAndPopulate`? If we have a single method, we could also give it a simpler name such as `applyTo`. What do you think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
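The proposed single-method semantics can be sketched in a few lines. This is a hypothetical illustration, not the Flink `RequiredParameters` API: `applyTo` both checks that every required parameter without a default was supplied and populates missing parameters that do have defaults.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the combined check-and-populate semantics:
 * applyTo() fails on missing required parameters that have no default,
 * and fills in defaults for the ones that do. Names are illustrative only.
 */
class RequiredParams {
    // parameter name -> default value (null means "required, no default")
    private final Map<String, String> defaults = new HashMap<>();

    RequiredParams require(String name, String defaultValue) {
        defaults.put(name, defaultValue);
        return this;
    }

    /** Check all required parameters and populate missing ones with their defaults. */
    void applyTo(Map<String, String> params) {
        for (Map.Entry<String, String> e : defaults.entrySet()) {
            if (!params.containsKey(e.getKey())) {
                if (e.getValue() == null) {
                    throw new IllegalArgumentException(
                            "Missing required parameter: " + e.getKey());
                }
                params.put(e.getKey(), e.getValue()); // populate the default
            }
        }
    }
}
```

With a single method like this, the "check without populate" case simply disappears, which is the simplification argued for above.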
[GitHub] flink pull request: [FLINK-2918] Add method to read a file of type...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1299#issuecomment-151125493 LGTM. The `readAsSequenceFile` is nice syntactic sugar to add. But we should also cover it in the online documentation, since it is user-facing API that we are extending here. Furthermore, we should think about adding a corresponding `writeAsSequenceFile` method to `DataSet` as the counterpart to `readAsSequenceFile`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2918) Add a method to be able to read SequenceFileInputFormat
[ https://issues.apache.org/jira/browse/FLINK-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974183#comment-14974183 ] ASF GitHub Bot commented on FLINK-2918: --- Github user smarthi commented on the pull request: https://github.com/apache/flink/pull/1299#issuecomment-151130141 We also need to add a method for the new Hadoop API; the present PR only deals with Hadoop v1. > Add a method to be able to read SequenceFileInputFormat > --- > > Key: FLINK-2918 > URL: https://issues.apache.org/jira/browse/FLINK-2918 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 0.9.1 >Reporter: Suneel Marthi >Assignee: Suneel Marthi >Priority: Minor > Fix For: 0.10 > > > This is to add a method to ExecutionEnvironment.{java,scala} to be able to > provide syntactic sugar to read a SequenceFileInputFormat. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2017) Add predefined required parameters to ParameterTool
[ https://issues.apache.org/jira/browse/FLINK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974232#comment-14974232 ] ASF GitHub Bot commented on FLINK-2017: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1097#issuecomment-151142957 Going through FLINK-2017, I interpreted the semantics of `check` and `checkAndPopulate` as follows:
- `check` verifies that all required parameters are provided and valid (type, choice, etc.)
- `checkAndPopulate` copies the default values of the options into the `ParameterTool`

I think a method that is called `checkAndPopulate` should perform the same checks as `check` and do the *populate* in addition. Moreover, I do not see why `check` alone would be useful. Why would somebody define required parameters with default values without enforcing them, i.e., why would one call `check` without `checkAndPopulate`? If we have a single method, we could also give it a simpler name such as `applyTo`. What do you think? > Add predefined required parameters to ParameterTool > --- > > Key: FLINK-2017 > URL: https://issues.apache.org/jira/browse/FLINK-2017 > Project: Flink > Issue Type: Improvement >Affects Versions: 0.9 >Reporter: Robert Metzger > Labels: starter > > In FLINK-1525 we've added the {{ParameterTool}}. > During the PR review, there was a request for required parameters. > This issue is about implementing a facility to define required parameters. > The tool should also be able to print a help menu with a list of all > parameters.
> This test case shows my initial ideas how to design the API > {code} > @Test > public void requiredParameters() { > RequiredParameters required = new RequiredParameters(); > Option input = required.add("input").alt("i").help("Path to > input file or directory"); // parameter with long and short variant > required.add("output"); // parameter only with long variant > Option parallelism = > required.add("parallelism").alt("p").type(Integer.class); // parameter with > type > Option spOption = > required.add("sourceParallelism").alt("sp").defaultValue(12).help("Number > specifying the number of parallel data source instances"); // parameter with > default value, specifying the type. > Option executionType = > required.add("executionType").alt("et").defaultValue("pipelined").choices("pipelined", > "batch"); > ParameterUtil parameter = ParameterUtil.fromArgs(new > String[]{"-i", "someinput", "--output", "someout", "-p", "15"}); > required.check(parameter); > required.printHelp(); > required.checkAndPopulate(parameter); > String inputString = input.get(); > int par = parallelism.getInteger(); > String output = parameter.get("output"); > int sourcePar = parameter.getInteger(spOption.getName()); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2918] Add method to read a file of type...
Github user smarthi commented on the pull request: https://github.com/apache/flink/pull/1299#issuecomment-151130141 We also need to add a method for the new Hadoop API; the present PR only deals with Hadoop v1. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2017] Add predefined required parameter...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1097#discussion_r42993834

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/utils/RequiredParameter.java ---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.java.utils;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Facility to manage required parameters in user defined functions.
+ */
+public class RequiredParameter {
+
+	private static final String HELP_TEXT_PARAM_DELIMITER = "\t";
+	private static final String HELP_TEXT_LINE_DELIMITER = "\n";
+
+	private HashMap<String, Option> data;
+
+	public RequiredParameter() {
+		this.data = new HashMap<>();
+	}
+
+	public void add(Option option) throws RequiredParameterException {
--- End diff --

You can also do: `req.add("name").type(...).values(...)` just like proposed in FLINK-2017, and users do not need to care about the `Option` class, which is very nice, IMO. In some cases, alternatives can raise more questions and confusion than help. So I would rather remove `add(Option)`, but that's just my opinion.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2017) Add predefined required parameters to ParameterTool
[ https://issues.apache.org/jira/browse/FLINK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974213#comment-14974213 ] ASF GitHub Bot commented on FLINK-2017: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1097#discussion_r42993834

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/utils/RequiredParameter.java ---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.java.utils;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Facility to manage required parameters in user defined functions.
+ */
+public class RequiredParameter {
+
+	private static final String HELP_TEXT_PARAM_DELIMITER = "\t";
+	private static final String HELP_TEXT_LINE_DELIMITER = "\n";
+
+	private HashMap<String, Option> data;
+
+	public RequiredParameter() {
+		this.data = new HashMap<>();
+	}
+
+	public void add(Option option) throws RequiredParameterException {
--- End diff --

You can also do: `req.add("name").type(...).values(...)` just like proposed in FLINK-2017, and users do not need to care about the `Option` class, which is very nice, IMO.
In some cases, alternatives can raise more questions and confusion than help. So I would rather remove `add(Option)` but that's just my opinion. > Add predefined required parameters to ParameterTool > --- > > Key: FLINK-2017 > URL: https://issues.apache.org/jira/browse/FLINK-2017 > Project: Flink > Issue Type: Improvement >Affects Versions: 0.9 >Reporter: Robert Metzger > Labels: starter > > In FLINK-1525 we've added the {{ParameterTool}}. > During the PR review, there was a request for required parameters. > This issue is about implementing a facility to define required parameters. > The tool should also be able to print a help menu with a list of all > parameters. > This test case shows my initial ideas how to design the API > {code} > @Test > public void requiredParameters() { > RequiredParameters required = new RequiredParameters(); > Option input = required.add("input").alt("i").help("Path to > input file or directory"); // parameter with long and short variant > required.add("output"); // parameter only with long variant > Option parallelism = > required.add("parallelism").alt("p").type(Integer.class); // parameter with > type > Option spOption = > required.add("sourceParallelism").alt("sp").defaultValue(12).help("Number > specifying the number of parallel data source instances"); // parameter with > default value, specifying the type. > Option executionType = > required.add("executionType").alt("et").defaultValue("pipelined").choices("pipelined", > "batch"); > ParameterUtil parameter = ParameterUtil.fromArgs(new > String[]{"-i", "someinput", "--output", "someout", "-p", "15"}); > required.check(parameter); > required.printHelp(); > required.checkAndPopulate(parameter); > String inputString = input.get(); > int par = parallelism.getInteger(); > String output = parameter.get("output"); > int sourcePar = parameter.getInteger(spOption.getName()); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-7) [GitHub] Enable Range Partitioner
[ https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974366#comment-14974366 ] ASF GitHub Bot commented on FLINK-7: Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1255#discussion_r43006432

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java ---
@@ -1223,6 +1230,51 @@ public long count() throws Exception {
 		final TypeInformation<K> keyType = TypeExtractor.getKeySelectorTypes(keyExtractor, getType());
 		return new PartitionOperator<T>(this, PartitionMethod.HASH, new Keys.SelectorFunctionKeys<T, K>(clean(keyExtractor), this.getType(), keyType), Utils.getCallLocationName());
 	}
+
+	/**
+	 * Range-partitions a DataSet using the specified KeySelector.
+	 *
+	 * Important: This operation shuffles the whole DataSet over the network and can take a significant amount of time.
+	 *
+	 * @param keySelector The KeySelector with which the DataSet is range-partitioned.
+	 * @return The partitioned DataSet.
+	 *
+	 * @see KeySelector
+	 */
+	public <K extends Comparable<K>> DataSet<T> partitionByRange(KeySelector<T, K> keySelector) {
+		final TypeInformation<K> keyType = TypeExtractor.getKeySelectorTypes(keySelector, getType());
+		String callLocation = Utils.getCallLocationName();
+
+		// Extract key from input element by keySelector.
+		KeyExtractorMapper<T, K> keyExtractorMapper = new KeyExtractorMapper<T, K>(keySelector);
--- End diff --

It would be good to inject the sampling and partition ID assignment code in the `JobGraphGenerator` and not at the API level. The `JobGraphGenerator` is called after the `Optimizer` and translates the optimized plan into a parallel data flow called `JobGraph`, which is executed by the runtime. The benefit of injecting the code at this point is that any range partitioning can be handled transparently within the optimizer. This means that operators other than the explicit `partitionByRange()`, such as Join, CoGroup, and Reduce, can also benefit from range partitioning. In addition, this makes the injected code part of the runtime, which can be improved more transparently later on. The downside (for you) is that the job abstraction is much lower at this level. However, you still have access to the chosen key fields and type information of all operators. See the `JavaApiPostPass` class to learn how to generate serializers and comparators at this level.

> [GitHub] Enable Range Partitioner > - > > Key: FLINK-7 > URL: https://issues.apache.org/jira/browse/FLINK-7 > Project: Flink > Issue Type: Sub-task > Components: Distributed Runtime >Reporter: GitHub Import >Assignee: Chengxiang Li > Fix For: pre-apache > > > The range partitioner is currently disabled. We need to implement the > following aspects: > 1) Distribution information, if available, must be propagated back together > with the ordering property. > 2) A generic bucket lookup structure (currently specific to PactRecord). > Tests to re-enable after fixing this issue: > - TeraSortITCase > - GlobalSortingITCase > - GlobalSortingMixedOrderITCase > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/7 > Created by: [StephanEwen|https://github.com/StephanEwen] > Labels: core, enhancement, optimizer, > Milestone: Release 0.4 > Assignee: [fhueske|https://github.com/fhueske] > Created at: Fri Apr 26 13:48:24 CEST 2013 > State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-7] [Runtime] Enable Range Partitioner.
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1255#issuecomment-151177199 Hi, I left a few comments inside. I don't think we need the special user code to extract keys. The comparators provide this functionality and can be generated within the JobGraphGenerator. Let me know what you think. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2889] [tests] Apply JMH on LongSerializ...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1283#issuecomment-151191115 Thanks for the update. Looks good to merge. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2797) CLI: Missing option to submit jobs in detached mode
[ https://issues.apache.org/jira/browse/FLINK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974487#comment-14974487 ] ASF GitHub Bot commented on FLINK-2797: --- Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/1214#discussion_r43015201 --- Diff: flink-clients/src/test/java/org/apache/flink/client/program/ClientTest.java --- @@ -304,4 +368,60 @@ public String getDescription() { return "TestOptimizerPlan "; } } + + public static final class TestExecuteTwice { + + public static void main(String args[]) throws Exception { --- End diff -- Okay. Cool. Lemme know if there's any other issues to address. :smile: > CLI: Missing option to submit jobs in detached mode > --- > > Key: FLINK-2797 > URL: https://issues.apache.org/jira/browse/FLINK-2797 > Project: Flink > Issue Type: Bug > Components: Command-line client >Affects Versions: 0.9, 0.10 >Reporter: Maximilian Michels >Assignee: Sachin Goel > Fix For: 0.10 > > > Jobs can only be submitted in detached mode using YARN but not on a > standalone installation. This has been requested by users who want to submit > a job, get the job id, and later query its status. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-7) [GitHub] Enable Range Partitioner
[ https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974391#comment-14974391 ] ASF GitHub Bot commented on FLINK-7: Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1255#discussion_r43008448 --- Diff: flink-optimizer/src/main/java/org/apache/flink/optimizer/plantranslate/JobGraphGenerator.java --- @@ -1138,12 +1138,13 @@ private DistributionPattern connectJobVertices(Channel channel, int inputNumber, if (channel.getShipStrategy() == ShipStrategyType.PARTITION_RANGE) { - final DataDistribution dataDistribution = channel.getDataDistribution(); + DataDistribution dataDistribution = channel.getDataDistribution(); if (dataDistribution != null) { sourceConfig.setOutputDataDistribution(dataDistribution, outputIndex); } else { - throw new RuntimeException("Range partitioning requires data distribution"); - // TODO: inject code and configuration for automatic histogram generation --- End diff -- As the TODO comment says, here should go the sampling and distribution building code :-) > [GitHub] Enable Range Partitioner > - > > Key: FLINK-7 > URL: https://issues.apache.org/jira/browse/FLINK-7 > Project: Flink > Issue Type: Sub-task > Components: Distributed Runtime >Reporter: GitHub Import >Assignee: Chengxiang Li > Fix For: pre-apache > > > The range partitioner is currently disabled. We need to implement the > following aspects: > 1) Distribution information, if available, must be propagated back together > with the ordering property. > 2) A generic bucket lookup structure (currently specific to PactRecord). 
> Tests to re-enable after fixing this issue: > - TeraSortITCase > - GlobalSortingITCase > - GlobalSortingMixedOrderITCase > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/7 > Created by: [StephanEwen|https://github.com/StephanEwen] > Labels: core, enhancement, optimizer, > Milestone: Release 0.4 > Assignee: [fhueske|https://github.com/fhueske] > Created at: Fri Apr 26 13:48:24 CEST 2013 > State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-7] [Runtime] Enable Range Partitioner.
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1255#discussion_r43008448 --- Diff: flink-optimizer/src/main/java/org/apache/flink/optimizer/plantranslate/JobGraphGenerator.java --- @@ -1138,12 +1138,13 @@ private DistributionPattern connectJobVertices(Channel channel, int inputNumber, if (channel.getShipStrategy() == ShipStrategyType.PARTITION_RANGE) { - final DataDistribution dataDistribution = channel.getDataDistribution(); + DataDistribution dataDistribution = channel.getDataDistribution(); if (dataDistribution != null) { sourceConfig.setOutputDataDistribution(dataDistribution, outputIndex); } else { - throw new RuntimeException("Range partitioning requires data distribution"); - // TODO: inject code and configuration for automatic histogram generation --- End diff -- As the TODO comment says, here should go the sampling and distribution building code :-) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2889) Apply JMH on LongSerializationSpeedBenchmark class
[ https://issues.apache.org/jira/browse/FLINK-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974467#comment-14974467 ] ASF GitHub Bot commented on FLINK-2889: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1283#issuecomment-151191115 Thanks for the update. Looks good to merge. > Apply JMH on LongSerializationSpeedBenchmark class > -- > > Key: FLINK-2889 > URL: https://issues.apache.org/jira/browse/FLINK-2889 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark > method in order to get much more accurate results. Modify the > LongSerializationSpeedBenchmark class and move it to the flink-benchmark module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-7] [Runtime] Enable Range Partitioner.
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1255#discussion_r43006432 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java --- @@ -1223,6 +1230,51 @@ public long count() throws Exception { final TypeInformation keyType = TypeExtractor.getKeySelectorTypes(keyExtractor, getType()); return new PartitionOperator(this, PartitionMethod.HASH, new Keys.SelectorFunctionKeys(clean(keyExtractor), this.getType(), keyType), Utils.getCallLocationName()); } + + /** +* Range-partitions a DataSet using the specified KeySelector. +* +* Important:This operation shuffles the whole DataSet over the network and can take significant amount of time. +* +* @param keySelector The KeySelector with which the DataSet is range-partitioned. +* @return The partitioned DataSet. +* +* @see KeySelector +*/ + public > DataSet partitionByRange(KeySelector keySelector) { + final TypeInformation keyType = TypeExtractor.getKeySelectorTypes(keySelector, getType()); + String callLocation = Utils.getCallLocationName(); + + // Extract key from input element by keySelector. + KeyExtractorMapper keyExtractorMapper = new KeyExtractorMapper (keySelector); --- End diff -- It would be good to inject the sampling and partition ID assignment code in the `JobGraphGenerator` and not at the API level. The `JobGraphGenerator` is called after the `Optimizer` and translates the optimized plan into a parallel data flow called `JobGraph` which is executed by the runtime. The benefit of injecting the code at this point is that any range partitioning can be handled transparently within the optimizer. This means also other operators except the explicit `partitionByRange()` such as Join, CoGroup, and Reduce can benefit from range partitioning. In addition this makes the injected code a part of the runtime which can be more transparently improved later on. The downside (for you) is that the job abstraction is much lower at this level. 
However, you still have access to the chosen key fields and the type information of all operators. See the `JavaApiPostPass` class to learn how to generate serializers and comparators at this level. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
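The sampling and partition ID assignment step under discussion can be illustrated with a self-contained sketch: sample the keys, derive partition boundaries from the sorted sample, then route each element to a partition by binary search. All class and method names here are illustrative, not Flink's actual runtime code, and the boundary scheme is a deliberately simple assumption.

```java
import java.util.*;

// Simplified sketch of range partitioning: sample the keys, derive
// numPartitions - 1 boundary keys from the sorted sample, then assign
// each element to a partition by binary search over the boundaries.
public class RangePartitionSketch {

    // Pick numPartitions - 1 evenly spaced boundary keys from a sorted sample.
    static int[] buildBoundaries(List<Integer> sampledKeys, int numPartitions) {
        List<Integer> sorted = new ArrayList<>(sampledKeys);
        Collections.sort(sorted);
        int[] boundaries = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            boundaries[i - 1] = sorted.get(i * sorted.size() / numPartitions);
        }
        return boundaries;
    }

    // Assign a key to the first partition whose boundary is >= key.
    static int assignPartition(int key, int[] boundaries) {
        int pos = Arrays.binarySearch(boundaries, key);
        return pos >= 0 ? pos : -(pos + 1);
    }

    public static void main(String[] args) {
        List<Integer> sample = Arrays.asList(5, 1, 9, 3, 7, 2, 8, 4, 6, 0);
        int[] bounds = buildBoundaries(sample, 2); // one boundary near the median
        System.out.println(assignPartition(1, bounds) + " " + assignPartition(9, bounds)); // 0 1
    }
}
```

Doing this in the `JobGraphGenerator`, as suggested above, would mean the sampling pass and the assignment mapper are injected into the parallel data flow rather than composed at the DataSet API level.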
[GitHub] flink pull request: [FLINK-7] [Runtime] Enable Range Partitioner.
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1255#discussion_r43008620

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/operators/BatchTask.java ---
@@ -1264,11 +1264,10 @@ public static void logAndThrowException(Exception ex, AbstractInvokable parent)
     oe = new OutputEmitter(strategy);
 } else {
-    final DataDistribution dataDist = config.getOutputDataDistribution(i, cl);
--- End diff --

Can we keep the interface with the `DataDistribution` and simply inject a simple int distribution (1, 2, 3, ..., n)?
[jira] [Commented] (FLINK-7) [GitHub] Enable Range Partitioner
[ https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974395#comment-14974395 ] ASF GitHub Bot commented on FLINK-7: Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/1255#discussion_r43008620

--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/operators/BatchTask.java ---
@@ -1264,11 +1264,10 @@ public static void logAndThrowException(Exception ex, AbstractInvokable parent)
     oe = new OutputEmitter(strategy);
 } else {
-    final DataDistribution dataDist = config.getOutputDataDistribution(i, cl);
--- End diff --

Can we keep the interface with the `DataDistribution` and simply inject a simple int distribution (1, 2, 3, ..., n)?

> [GitHub] Enable Range Partitioner > - > > Key: FLINK-7 > URL: https://issues.apache.org/jira/browse/FLINK-7 > Project: Flink > Issue Type: Sub-task > Components: Distributed Runtime >Reporter: GitHub Import >Assignee: Chengxiang Li > Fix For: pre-apache > > > The range partitioner is currently disabled. We need to implement the > following aspects: > 1) Distribution information, if available, must be propagated back together > with the ordering property. > 2) A generic bucket lookup structure (currently specific to PactRecord). > Tests to re-enable after fixing this issue: > - TeraSortITCase > - GlobalSortingITCase > - GlobalSortingMixedOrderITCase > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/7 > Created by: [StephanEwen|https://github.com/StephanEwen] > Labels: core, enhancement, optimizer, > Milestone: Release 0.4 > Assignee: [fhueske|https://github.com/fhueske] > Created at: Fri Apr 26 13:48:24 CEST 2013 > State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
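The "simple int distribution (1, 2, 3, ..., n)" idea mentioned above can be sketched as a trivial distribution object whose bucket boundaries are just consecutive integers. The class and method names below are illustrative, not Flink's actual `DataDistribution` interface.

```java
// Sketch of a trivial integer distribution: the upper boundary of
// bucket i (0-based) is simply the integer i + 1, yielding boundaries
// 1, 2, 3, ..., n-1 for n buckets. Interface shape is illustrative.
public class SimpleIntDistribution {
    private final int numBuckets;

    public SimpleIntDistribution(int numBuckets) {
        this.numBuckets = numBuckets;
    }

    // Upper boundary of bucket i: the integer i + 1.
    public int boundaryOf(int bucketIndex) {
        if (bucketIndex < 0 || bucketIndex >= numBuckets - 1) {
            throw new IllegalArgumentException("bucket index out of range");
        }
        return bucketIndex + 1;
    }

    public int getNumberOfBuckets() {
        return numBuckets;
    }

    public static void main(String[] args) {
        SimpleIntDistribution dist = new SimpleIntDistribution(4);
        for (int i = 0; i < dist.getNumberOfBuckets() - 1; i++) {
            System.out.print(dist.boundaryOf(i) + " "); // prints 1 2 3
        }
        System.out.println();
    }
}
```

Keeping the `DataDistribution` interface and injecting such a default would let the channel code stay uniform whether a user supplied a real distribution or not.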
[jira] [Commented] (FLINK-2922) Add Queryable Window Operator
[ https://issues.apache.org/jira/browse/FLINK-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974474#comment-14974474 ] Aljoscha Krettek commented on FLINK-2922: - The idea is, for example, that the regular emission of your window result gets stored as the final truth in your database that serves some statistics to users. If you only have this, you always get your data with a lag of 1 hour. You might also want to allow users to query the current count inside that 1-hour window. To do that, you need a way to match the query to the result. For that, my idea is to have (conceptually) two output streams: one for the regular window results and another one for query results. In the query result stream you basically get a tuple (query, window-result), so that the user can match elements in the query result stream to the queries that they sent. > Add Queryable Window Operator > - > > Key: FLINK-2922 > URL: https://issues.apache.org/jira/browse/FLINK-2922 > Project: Flink > Issue Type: Improvement > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > The idea is to provide a window operator that allows to query the current > window result at any time without discarding the current result. > For example, a user might have an aggregation window operation with tumbling > windows of 1 hour. Now, at any time they might be interested in the current > aggregated value for the currently in-flight hour window. > The idea is to make the operator a two input operator where normal elements > arrive on input one while queries arrive on input two. The query stream must > be keyed by the same key as the input stream. If a query arrives for a key, > the current value for that key is emitted along with the query element so > that the user can map the result to the query.
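The two-input operator described above can be sketched in a few lines of self-contained Java: regular elements update a per-key aggregate, and a query for a key emits a (query, current-result) pair without discarding the running state. Names and the string-typed result are illustrative assumptions, not the eventual Flink operator API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed queryable window operator: input one carries
// normal elements that update the in-flight per-key aggregate; input two
// carries queries, each answered with a (query, current-result) pair so
// the user can match results back to the queries they sent.
public class QueryableWindowSketch {
    private final Map<String, Long> countsPerKey = new HashMap<>();

    // Input one: a normal element updates the in-flight window result.
    public void processElement(String key) {
        countsPerKey.merge(key, 1L, Long::sum);
    }

    // Input two: a query emits the current result for its key, tagged
    // with the query id; the running aggregate is NOT discarded.
    public String processQuery(String key, String queryId) {
        return "(" + queryId + ", " + countsPerKey.getOrDefault(key, 0L) + ")";
    }

    public static void main(String[] args) {
        QueryableWindowSketch op = new QueryableWindowSketch();
        op.processElement("user-1");
        op.processElement("user-1");
        System.out.println(op.processQuery("user-1", "q1")); // (q1, 2)
        op.processElement("user-1");
        System.out.println(op.processQuery("user-1", "q2")); // (q2, 3)
    }
}
```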
[jira] [Commented] (FLINK-2922) Add Queryable Window Operator
[ https://issues.apache.org/jira/browse/FLINK-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974438#comment-14974438 ] Gyula Fora commented on FLINK-2922: --- I don't immediately see a lot of use cases for this; could you please give me some examples that are not covered by non-discarding triggers? > Add Queryable Window Operator > - > > Key: FLINK-2922 > URL: https://issues.apache.org/jira/browse/FLINK-2922 > Project: Flink > Issue Type: Improvement > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > The idea is to provide a window operator that allows to query the current > window result at any time without discarding the current result. > For example, a user might have an aggregation window operation with tumbling > windows of 1 hour. Now, at any time they might be interested in the current > aggregated value for the currently in-flight hour window. > The idea is to make the operator a two input operator where normal elements > arrive on input one while queries arrive on input two. The query stream must > be keyed by the same key as the input stream. If a query arrives for a key, > the current value for that key is emitted along with the query element so > that the user can map the result to the query.
[GitHub] flink pull request: [FLINK-2797][cli] Add support for running jobs...
Github user sachingoel0101 commented on a diff in the pull request: https://github.com/apache/flink/pull/1214#discussion_r43015201

--- Diff: flink-clients/src/test/java/org/apache/flink/client/program/ClientTest.java ---
@@ -304,4 +368,60 @@ public String getDescription() {
         return "TestOptimizerPlan ";
     }
 }
+
+public static final class TestExecuteTwice {
+
+    public static void main(String args[]) throws Exception {
--- End diff --

Okay. Cool. Lemme know if there are any other issues to address. :smile:
[GitHub] flink pull request: Add org.apache.httpcomponents:(httpcore, httpc...
GitHub user uce opened a pull request: https://github.com/apache/flink/pull/1301

Add org.apache.httpcomponents:(httpcore, httpclient) to dependency management

Hadoop 2.4.0 has `httpcomponents` dependencies, which breaks Flink on Amazon EMR AMI 3.9, because 4.1.2 is missing a method on which `EmrFileSystem` relies.

```bash
[INFO] org.apache.flink:flink-shaded-hadoop2:jar:1.0-SNAPSHOT
...
[INFO] | +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
[INFO] | | +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
[INFO] | | +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile
[INFO] | | \- com.jamesmurty.utils:java-xmlbuilder:jar:0.4:compile
```

This change moves both `httpclient` and `httpcore` to our root pom dependency management section, which makes `net.java.dev.jets3t:jets3t:jar` pull in the 4.2 version. This has been tested on the named EMR version. If there is a new RC, we should merge this. I don't think that we have to cancel an ongoing RC. We can add it to 0.10.1. Thanks to @tillrohrmann for spotting a crucial client vs. core typo, which was driving me nuts while testing the change.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uce/flink http

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1301.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1301

commit 2ae9d03b5f78135ab619e704d5fddfa30e98ef6f
Author: Ufuk Celebi
Date: 2015-10-26T14:34:58Z

    Add org.apache.httpcomponents:(httpcore, httpclient) to dependency management
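The described fix amounts to a `dependencyManagement` entry in the root pom. The snippet below is a sketch of what such an entry might look like; the version `4.2` is an assumption taken from the "4.2 version" mentioned in the PR description, not a value confirmed by the diff itself.

```xml
<!-- Root pom sketch: pinning httpcomponents in dependencyManagement so that
     transitive users such as net.java.dev.jets3t:jets3t resolve a version
     containing the method EmrFileSystem relies on. Version is illustrative. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>4.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpcore</artifactId>
      <version>4.2</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Entries in `dependencyManagement` only pin versions for dependencies that are actually pulled in elsewhere, which is why this fixes the transitive jets3t resolution without adding a direct dependency.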
[GitHub] flink pull request: [FLINK-2797][cli] Add support for running jobs...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/1214#issuecomment-151224527 I took a look at this again because I wanted to merge it for 0.10. However, I think it still needs a bit of work. The `ContextEnvironmentFactory` shouldn't hold the state for detached batch and stream executions. Could you store the batch plan in the `ContextEnvironment` and the stream graph in the `StreamContextEnvironment` (non-static)? Please give them appropriate names. Since the `ContextEnvironmentFactory` is currently the entry point for batch and streaming execution (the streaming environment just checks whether the `ContextEnvironment` is instantiated), you may set a flag in the factory to prevent multiple executions.
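The "flag in the factory to prevent multiple executions" suggested above is essentially a one-shot guard. A minimal self-contained sketch (class and method names are illustrative, not Flink code):

```java
// Sketch of a one-shot execution guard: once an execution has been
// triggered in detached mode, any further execute() call is rejected.
public class OneShotExecutionGuard {
    private boolean alreadyExecuted = false;

    public void markExecuted() {
        if (alreadyExecuted) {
            throw new IllegalStateException(
                "Multiple executions are not allowed in detached mode.");
        }
        alreadyExecuted = true;
    }

    public static void main(String[] args) {
        OneShotExecutionGuard guard = new OneShotExecutionGuard();
        guard.markExecuted(); // first execution: accepted
        try {
            guard.markExecuted(); // second execution: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In the factory this flag would be checked before handing out (or executing through) a new environment, so a detached user program cannot call execute() twice.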
[jira] [Commented] (FLINK-2797) CLI: Missing option to submit jobs in detached mode
[ https://issues.apache.org/jira/browse/FLINK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974646#comment-14974646 ] ASF GitHub Bot commented on FLINK-2797: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/1214#issuecomment-151224527 I took a look at this again because I wanted to merge it for 0.10. However, I think it still needs a bit of work. The `ContextEnvironmentFactory` shouldn't hold the state for detached batch and stream executions. Could you store the batch plan in the `ContextEnvironment` and the stream graph in the `StreamContextEnvironment` (non-static)? Please give them appropriate names. Since the `ContextEnvironmentFactory` is currently the entry point for batch and streaming execution (the streaming environment just checks whether the `ContextEnvironment` is instantiated), you may set a flag in the factory to prevent multiple executions. > CLI: Missing option to submit jobs in detached mode > --- > > Key: FLINK-2797 > URL: https://issues.apache.org/jira/browse/FLINK-2797 > Project: Flink > Issue Type: Bug > Components: Command-line client >Affects Versions: 0.9, 0.10 >Reporter: Maximilian Michels >Assignee: Sachin Goel > Fix For: 0.10 > > > Jobs can only be submitted in detached mode using YARN but not on a > standalone installation. This has been requested by users who want to submit > a job, get the job id, and later query its status.
[jira] [Commented] (FLINK-2853) Apply JMH on MutableHashTablePerformanceBenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974538#comment-14974538 ] ASF GitHub Bot commented on FLINK-2853: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1267#issuecomment-151208136 I'm going to merge this PR. > Apply JMH on MutableHashTablePerformanceBenchmark class. > > > Key: FLINK-2853 > URL: https://issues.apache.org/jira/browse/FLINK-2853 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark > method in order to get much more accurate results.
[jira] [Commented] (FLINK-2827) Potential resource leak in TwitterSource#loadAuthenticationProperties()
[ https://issues.apache.org/jira/browse/FLINK-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974534#comment-14974534 ] ASF GitHub Bot commented on FLINK-2827: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1276#issuecomment-151207922 I'm going to merge this PR.

> Potential resource leak in TwitterSource#loadAuthenticationProperties()
> ---
>
> Key: FLINK-2827
> URL: https://issues.apache.org/jira/browse/FLINK-2827
> Project: Flink
> Issue Type: Bug
> Affects Versions: 0.10
> Reporter: Ted Yu
> Assignee: Saumitra Shahapure
> Priority: Minor
> Labels: starter
>
> Here is the related code:
> {code}
> Properties properties = new Properties();
> try {
>     InputStream input = new FileInputStream(authPath);
>     properties.load(input);
>     input.close();
> } catch (Exception e) {
>     throw new RuntimeException("Cannot open .properties file: " + authPath, e);
> }
> {code}
> If there is an exception coming out of the properties.load() call, input would be left open.
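The leak described above is closed by try-with-resources: the stream is released even if properties.load() throws. A self-contained sketch of this pattern (the class name and the use of a temp file are illustrative; the merged PR's exact code may differ):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.InputStream;
import java.io.Writer;
import java.util.Properties;

// Sketch of the fix: try-with-resources guarantees the InputStream is
// closed on both the success and the exception path of load().
public class LoadPropertiesSafely {
    public static Properties load(String authPath) {
        Properties properties = new Properties();
        try (InputStream input = new FileInputStream(authPath)) {
            properties.load(input);
        } catch (Exception e) {
            throw new RuntimeException("Cannot open .properties file: " + authPath, e);
        }
        return properties;
    }

    public static void main(String[] args) throws Exception {
        // Demo against a temporary properties file.
        File f = File.createTempFile("auth", ".properties");
        try (Writer w = new FileWriter(f)) {
            w.write("token=abc\n");
        }
        System.out.println(load(f.getPath()).getProperty("token")); // abc
        f.delete();
    }
}
```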
[jira] [Commented] (FLINK-2919) Apply JMH on FieldAccessMinibenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974495#comment-14974495 ] ASF GitHub Bot commented on FLINK-2919: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1300#issuecomment-151196004 Looks good to merge > Apply JMH on FieldAccessMinibenchmark class. > > > Key: FLINK-2919 > URL: https://issues.apache.org/jira/browse/FLINK-2919 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark > method in order to get much more accurate results. Modify the > FieldAccessMinibenchmark class and move it to the flink-benchmark module.
[GitHub] flink pull request: [FLINK-2853] [tests] Apply JMH on MutableHashT...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1267#issuecomment-151208136 I'm going to merge this PR.
[jira] [Commented] (FLINK-2919) Apply JMH on FieldAccessMinibenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974533#comment-14974533 ] ASF GitHub Bot commented on FLINK-2919: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1300#issuecomment-151207830 I'm going to merge this PR. > Apply JMH on FieldAccessMinibenchmark class. > > > Key: FLINK-2919 > URL: https://issues.apache.org/jira/browse/FLINK-2919 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark > method in order to get much more accurate results. Modify the > FieldAccessMinibenchmark class and move it to the flink-benchmark module.
[GitHub] flink pull request: [FLINK-2827] Closing FileInputStream through t...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1276#issuecomment-151207922 I'm going to merge this PR.
[jira] [Commented] (FLINK-2889) Apply JMH on LongSerializationSpeedBenchmark class
[ https://issues.apache.org/jira/browse/FLINK-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974537#comment-14974537 ] ASF GitHub Bot commented on FLINK-2889: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1283#issuecomment-151208005 I'm going to merge this PR. > Apply JMH on LongSerializationSpeedBenchmark class > -- > > Key: FLINK-2889 > URL: https://issues.apache.org/jira/browse/FLINK-2889 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark > method in order to get much more accurate results. Modify the > LongSerializationSpeedBenchmark class and move it to the flink-benchmark module.
[GitHub] flink pull request: [FLINK-2889] [tests] Apply JMH on LongSerializ...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1283#issuecomment-151208005 I'm going to merge this PR.
[GitHub] flink pull request: [FLINK-2919] [tests] Apply JMH on FieldAccessM...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1300#issuecomment-151207830 I'm going to merge this PR.
[jira] [Commented] (FLINK-2890) Apply JMH on StringSerializationSpeedBenchmark class.
[ https://issues.apache.org/jira/browse/FLINK-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974539#comment-14974539 ] ASF GitHub Bot commented on FLINK-2890: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1284#issuecomment-151208210 I'm going to merge this PR. > Apply JMH on StringSerializationSpeedBenchmark class. > - > > Key: FLINK-2890 > URL: https://issues.apache.org/jira/browse/FLINK-2890 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: GaoLun >Assignee: GaoLun >Priority: Minor > Labels: easyfix > > JMH is a Java harness for building, running, and analysing > nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark > method in order to get much more accurate results. Modify the > StringSerializationSpeedBenchmark class and move it to the flink-benchmark module.
[GitHub] flink pull request: [FLINK-2919] [tests] Apply JMH on FieldAccessM...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1300#issuecomment-151196004 Looks good to merge
[jira] [Created] (FLINK-2923) Make it possible to mix-and-match StateBackends with KvState implementations
Gyula Fora created FLINK-2923: - Summary: Make it possible to mix-and-match StateBackends with KvState implementations Key: FLINK-2923 URL: https://issues.apache.org/jira/browse/FLINK-2923 Project: Flink Issue Type: Improvement Components: Streaming Reporter: Gyula Fora Assignee: Gyula Fora The KvState implementations are currently very closely tied to the specific StateBackend implementations, and one has to reimplement the backend to change the KvState. For many applications it is probably necessary to store the non-partitioned states differently from the key-value states. An example would be an SQL KvState with a simple file/memory backend for non-partitioned states (like bloom filters and offsets). A wrapper object should be provided to allow the use of a backend with a custom KvState. My proposal: Create a KvStateProvider class which will have methods for initializing, creating, and closing KvStates (independent of the backend). Create a wrapper StateBackend that wraps a StateBackend and a KvStateProvider into a new StateBackend. This could be a static method of the StateBackend class: `public static StateBackend StateBackend.createWithCustomKvState(StateBackend, KvStateProvider)`
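The proposed mix-and-match wrapper could be sketched as below: non-partitioned state (offsets, bloom filters) lives in one store while key-value state is delegated to a pluggable KvStateProvider. All interfaces and names here are illustrative of the proposal, not existing Flink classes, and the string-typed state is a simplification.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for the proposed KvStateProvider: the pluggable
// key-value state implementation, independent of the backend.
interface KvStateProvider {
    void put(String key, String value);
    String get(String key);
}

// Sketch of the wrapper backend: non-partitioned state is kept by the
// backend itself while key-value state is delegated to the provider.
public class MixAndMatchBackend {
    private final Map<String, String> nonPartitionedState = new HashMap<>();
    private final KvStateProvider kvStateProvider;

    // Mirrors the spirit of the proposed factory method:
    // StateBackend.createWithCustomKvState(StateBackend, KvStateProvider)
    public MixAndMatchBackend(KvStateProvider kvStateProvider) {
        this.kvStateProvider = kvStateProvider;
    }

    // Non-partitioned state, e.g. source offsets or bloom filters.
    public void storeNonPartitioned(String name, String value) {
        nonPartitionedState.put(name, value);
    }

    public String getNonPartitioned(String name) {
        return nonPartitionedState.get(name);
    }

    // Key-value (partitioned) state goes through the pluggable provider.
    public KvStateProvider kvState() {
        return kvStateProvider;
    }
}
```

With this shape, swapping the KvState implementation (e.g. an SQL-backed one) requires no changes to how non-partitioned state is stored.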