[jira] [Updated] (FLINK-2435) Add support for custom CSV field parsers

2015-10-26 Thread Maximilian Michels (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated FLINK-2435:
--
Fix Version/s: 1.0

> Add support for custom CSV field parsers
> 
>
> Key: FLINK-2435
> URL: https://issues.apache.org/jira/browse/FLINK-2435
> Project: Flink
>  Issue Type: New Feature
>  Components: Java API, Scala API
>Affects Versions: 0.10
>Reporter: Fabian Hueske
> Fix For: 1.0
>
>
> The {{CSVInputFormats}} only have {{FieldParsers}} for Java's primitive 
> types (byte, short, int, long, float, double, boolean) and for String.
> It would be good to add support for CSV field parsers for custom data types 
> which can be registered in a {{CSVReader}}. 
> We could offer two interfaces for field parsers.
> 1. The regular low-level {{FieldParser}} which operates on a byte array and 
> offsets.
> 2. A {{StringFieldParser}} which operates on a String that has been extracted 
> by a {{StringParser}} before. This interface will be easier to implement but 
> less efficient.
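
To illustrate the second, simpler interface, a minimal sketch; the {{StringFieldParser}} interface and the example parser below are hypothetical, following the proposal above rather than any existing Flink API.
{code}
// Hypothetical interface as proposed above -- not existing Flink API.
public interface StringFieldParser<T> {
	T parseField(String field);
}

// Example: a custom parser for date fields in "yyyy-MM-dd" format.
class DateFieldParser implements StringFieldParser<java.util.Date> {
	private final java.text.SimpleDateFormat format =
			new java.text.SimpleDateFormat("yyyy-MM-dd");

	@Override
	public java.util.Date parseField(String field) {
		try {
			return format.parse(field.trim());
		} catch (java.text.ParseException e) {
			throw new IllegalArgumentException("Invalid date field: " + field, e);
		}
	}
}
{code}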





[GitHub] flink pull request: [FLINK-2827] Closing FileInputStream through t...

2015-10-26 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1276#issuecomment-151091135
  
Looks good!




[jira] [Commented] (FLINK-2827) Potential resource leak in TwitterSource#loadAuthenticationProperties()

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973982#comment-14973982
 ] 

ASF GitHub Bot commented on FLINK-2827:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1276#issuecomment-151091135
  
Looks good!


> Potential resource leak in TwitterSource#loadAuthenticationProperties()
> ---
>
> Key: FLINK-2827
> URL: https://issues.apache.org/jira/browse/FLINK-2827
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.10
>Reporter: Ted Yu
>Assignee: Saumitra Shahapure
>Priority: Minor
>  Labels: starter
>
> Here is related code:
> {code}
> Properties properties = new Properties();
> try {
> InputStream input = new FileInputStream(authPath);
> properties.load(input);
> input.close();
> } catch (Exception e) {
> throw new RuntimeException("Cannot open .properties 
> file: " + authPath, e);
> }
> {code}
> If an exception comes out of the properties.load() call, input would be 
> left open.
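
A sketch of one possible fix, using Java 7 try-with-resources so the stream is closed even when {{properties.load()}} throws (the actual change in the linked pull request may differ):
{code}
Properties properties = new Properties();
try (InputStream input = new FileInputStream(authPath)) {
	// load() may throw, but the stream is closed automatically either way
	properties.load(input);
} catch (Exception e) {
	throw new RuntimeException("Cannot open .properties file: " + authPath, e);
}
{code}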





[jira] [Commented] (FLINK-2411) Add basic graph summarization algorithm

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974011#comment-14974011
 ] 

ASF GitHub Bot commented on FLINK-2411:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1269


> Add basic graph summarization algorithm
> ---
>
> Key: FLINK-2411
> URL: https://issues.apache.org/jira/browse/FLINK-2411
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
>Priority: Minor
>
> Graph summarization determines a structural grouping of similar vertices and 
> edges to condense a graph and thus helps to uncover insights about patterns 
> hidden in the graph. It can be used in OLAP-style operations on the graph and 
> is similar to group by in SQL but on the graph structure instead of rows.
>  
> The graph summarization operator represents every vertex group by a single 
> vertex in the summarized graph; edges between vertices in the summary graph 
> represent a group of edges between the vertex group members of the original 
> graph. Summarization is defined by specifying grouping keys for vertices and 
> edges, respectively.
> One publication that presents a Map/Reduce-based approach is "Pagrol: 
> Parallel graph olap over large-scale attributed graphs"; however, the authors 
> pre-compute the graph-cube before it can be analyzed. With Flink, we can give 
> the user an interactive way of summarizing the graph and do not need to 
> compute the cube beforehand.
> A more complex approach focuses on summarization on graph patterns: 
> "SynopSys: Large Graph Analytics in the SAP HANA Database Through 
> Summarization".
> However, I want to start with a simple algorithm that summarizes the graph on 
> vertex and optionally edge values and additionally stores a count aggregate 
> at summarized vertices/edges.
> Consider the following two examples (e.g., social network with users from 
> cities and friendships with timestamp):
>  
> h4. Input graph:
>  
> Vertices (id, value):
> (0, Leipzig)
> (1, Leipzig)
> (2, Dresden)
> (3, Dresden)
> (4, Dresden)
> (5, Berlin)
> Edges (source, target, value):
> (0, 1, 2014)
> (1, 0, 2014)
> (1, 2, 2013)
> (2, 1, 2013)
> (2, 3, 2014)
> (3, 2, 2014)
> (4, 0, 2013)
> (4, 1, 2015)
> (5, 2, 2015)
> (5, 3, 2015)
> h4. Output graph (summarized on vertex value):
> Vertices (id, value, count)
> (0, Leipzig, 2) // "2 users from Leipzig"
> (2, Dresden, 3) // "3 users from Dresden"
> (5, Berlin, 1) // "1 user from Berlin"
> Edges (source, target, count) 
> (0, 0, 2) // "2 edges between users in Leipzig"
> (0, 2, 1) // "1 edge from users in Leipzig to users in Dresden"
> (2, 0, 3) // "3 edges from users in Dresden to users in Leipzig"
> (2, 2, 2) // "2 edges between users in Dresden"
> (5, 2, 2) // "2 edges from users in Berlin to users in Dresden"
> h4. Output graph (summarized on vertex and edge value):
> Vertices (id, value, count)
> (0, Leipzig, 2)
> (2, Dresden, 3)
> (5, Berlin, 1)
> Edges (source, target, value, count) 
> (0, 0, 2014, 2) // ...
> (0, 2, 2013, 1) // ...
> (2, 0, 2013, 2) // "2 edges from users in Dresden to users in Leipzig with 
> timestamp 2013"
> (2, 0, 2015, 1) // "1 edge from users in Dresden to users in Leipzig with 
> timestamp 2015"
> (2, 2, 2014, 2) // ...
> (5, 2, 2015, 2) // ...
> I've already implemented two versions of the summarization algorithm in our 
> own project [Gradoop|https://github.com/dbs-leipzig/gradoop], which is a 
> graph analytics stack on top of Hadoop + Gelly/Flink with a fixed data model. 
> You can see the current WIP here: 
> 1 [Abstract 
> summarization|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/Summarization.java]
> 2 [Implementation using 
> cross|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/SummarizationCross.java]
> 3 [Implementation using 
> joins|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/SummarizationJoin.java]
> 4 
> [Tests|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/test/java/org/gradoop/model/impl/EPGraphSummarizeTest.java]
> 5 
> [TestGraph|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/dev-support/social-network.pdf]
> I would basically use the same implementation as in 3 in combination with 
> KeySelectors to select the grouping keys on vertices and edges.
> As you can see in the example, each vertex in the resulting graph has a 
> vertex id that is contained in the original graph. This id is the smallest id 
> among the grouped vertices (e.g., vertices 2, 3 and 4 represent Dresden, so 2 
> is the group representative).
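
For a rough idea of the vertex-grouping step, a minimal DataSet API sketch; the tuple layout and reduce logic are illustrative assumptions, not the implementation from the pull request:
{code}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// (id, value) vertices from the example above
DataSet<Tuple2<Long, String>> vertices = env.fromElements(
		new Tuple2<>(0L, "Leipzig"), new Tuple2<>(1L, "Leipzig"),
		new Tuple2<>(2L, "Dresden"), new Tuple2<>(3L, "Dresden"),
		new Tuple2<>(4L, "Dresden"), new Tuple2<>(5L, "Berlin"));

// One (id, value, count) vertex per distinct value, using the smallest
// member id as the group representative.
DataSet<Tuple3<Long, String, Long>> summarized = vertices
		.groupBy(1)
		.reduceGroup(new GroupReduceFunction<Tuple2<Long, String>, Tuple3<Long, String, Long>>() {
			@Override
			public void reduce(Iterable<Tuple2<Long, String>> group,
					Collector<Tuple3<Long, String, Long>> out) {
				long minId = Long.MAX_VALUE;
				String value = null;
				long count = 0;
				for (Tuple2<Long, String> v : group) {
					minId = Math.min(minId, v.f0);
					value = v.f1;
					count++;
				}
				out.collect(new Tuple3<>(minId, value, count));
			}
		});
{code}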

[jira] [Created] (FLINK-2921) Add online documentation of sample methods

2015-10-26 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2921:


 Summary: Add online documentation of sample methods
 Key: FLINK-2921
 URL: https://issues.apache.org/jira/browse/FLINK-2921
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 0.10
Reporter: Till Rohrmann
Priority: Minor


I couldn't find online documentation about Flink's sampling API (as part of the 
{{DataSetUtils}}/{{utils}} package object). We should add information for these 
methods to our online documentation so that people can more easily use them.
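
For reference, usage looks roughly like this (a sketch; the exact {{DataSetUtils}} signatures should be verified while writing the docs):
{code}
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Long> numbers = env.generateSequence(1, 10000);

// Bernoulli sample without replacement, keeping ~10% of the elements;
// the fixed seed makes the sample reproducible.
DataSet<Long> sample = DataSetUtils.sample(numbers, false, 0.1, 42L);

sample.print();
{code}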





[jira] [Commented] (FLINK-2905) Add intersect method to Graph class

2015-10-26 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973972#comment-14973972
 ] 

Vasia Kalavri commented on FLINK-2905:
--

Hi [~mju],

several {{Graph}} methods use the composite {{sourceID - targetID}} key as a 
unique identifier. I see the problem with that in your example. If G1 has the 
edge 1,3,13 and G2 has the edge 1,3,14, these will match, but what should the 
value be in the intersection?
From the options you present, Option 3 seems the most intuitive to me. 
Alternatively, we could maybe provide an option to the method, defining 
whether the edge value should be used for matching?


> Add intersect method to Graph class
> ---
>
> Key: FLINK-2905
> URL: https://issues.apache.org/jira/browse/FLINK-2905
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
>Priority: Minor
>
> Currently, the Gelly Graph supports the set operations 
> {{Graph.union(otherGraph)}} and {{Graph.difference(otherGraph)}}. It would be 
> nice to have a {{Graph.intersect(otherGraph)}} method, where the resulting 
> graph contains all vertices and edges contained in both input graphs.
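
A minimal sketch of the strictest matching variant, where an edge is kept only if source, target and value all match ({{g1}} and {{g2}} are assumed {{Graph<Long, NullValue, Long>}} instances; this is only one of the options under discussion):
{code}
DataSet<Edge<Long, Long>> commonEdges = g1.getEdges()
		.join(g2.getEdges())
		.where(0, 1, 2)    // source id, target id, edge value in g1
		.equalTo(0, 1, 2)  // must all match in g2
		.with(new JoinFunction<Edge<Long, Long>, Edge<Long, Long>, Edge<Long, Long>>() {
			@Override
			public Edge<Long, Long> join(Edge<Long, Long> first, Edge<Long, Long> second) {
				return first; // identical to second by construction
			}
		});
{code}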





[jira] [Commented] (FLINK-2591) Add configuration parameter for default number of yarn containers

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973986#comment-14973986
 ] 

ASF GitHub Bot commented on FLINK-2591:
---

Github user willmiao commented on the pull request:

https://github.com/apache/flink/pull/1121#issuecomment-151092573
  
hi @rmetzger,
I believe I found the reason why my test failed.
A test will fail if we don't specify the "-t" argument (_which is an optional 
argument_) in the command when calling the runWithArgs function, and we get 
this exception:
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/Level
	at org.apache.hadoop.mapred.JobConf.<clinit>(JobConf.java:357)
```
It seems some JAR files related to logging (log4j) are needed.

One more thing: I noticed the following segment of code in the file 
"FlinkYarnSessionCli.java":
```
File logback = new File(confDirPath + File.pathSeparator + CONFIG_FILE_LOGBACK_NAME);
if (logback.exists()) {
	shipFiles.add(logback);
	flinkYarnClient.setFlinkLoggingConfigurationPath(new Path(logback.toURI()));
}
```
And I wonder if "**File.pathSeparator**" should be "**File.separator**" 
here.
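
For context on the question above, a minimal illustration of the difference between the two constants:
```
// File.pathSeparator separates entries in a path *list* (like $PATH),
// File.separator separates directory names within a single path.
System.out.println(File.pathSeparator); // ":" on Unix, ";" on Windows
System.out.println(File.separator);     // "/" on Unix, "\" on Windows

// The code above would therefore build "conf:logback.xml" instead of
// "conf/logback.xml", so File.separator indeed looks like the right choice.
```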


> Add configuration parameter for default number of yarn containers
> -
>
> Key: FLINK-2591
> URL: https://issues.apache.org/jira/browse/FLINK-2591
> Project: Flink
>  Issue Type: Improvement
>  Components: YARN Client
>Reporter: Robert Metzger
>Assignee: Will Miao
>Priority: Minor
>  Labels: starter
>
> A user complained about the requirement to always specify the number of yarn 
> containers (-n) when starting a job.
> Adding a configuration value with a default value will allow users to set a 
> default ;)





[GitHub] flink pull request: [FLINK-2591] Add configuration parameter for d...

2015-10-26 Thread willmiao
Github user willmiao commented on the pull request:

https://github.com/apache/flink/pull/1121#issuecomment-151092573
  
hi @rmetzger,
I believe I found the reason why my test failed.
A test will fail if we don't specify the "-t" argument (_which is an optional 
argument_) in the command when calling the runWithArgs function, and we get 
this exception:
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/Level
	at org.apache.hadoop.mapred.JobConf.<clinit>(JobConf.java:357)
```
It seems some JAR files related to logging (log4j) are needed.

One more thing: I noticed the following segment of code in the file 
"FlinkYarnSessionCli.java":
```
File logback = new File(confDirPath + File.pathSeparator + CONFIG_FILE_LOGBACK_NAME);
if (logback.exists()) {
	shipFiles.add(logback);
	flinkYarnClient.setFlinkLoggingConfigurationPath(new Path(logback.toURI()));
}
```
And I wonder if "**File.pathSeparator**" should be "**File.separator**" 
here.




[jira] [Resolved] (FLINK-2411) Add basic graph summarization algorithm

2015-10-26 Thread Vasia Kalavri (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasia Kalavri resolved FLINK-2411.
--
   Resolution: Implemented
Fix Version/s: 1.0

> Add basic graph summarization algorithm
> ---
>
> Key: FLINK-2411
> URL: https://issues.apache.org/jira/browse/FLINK-2411
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
>Priority: Minor
> Fix For: 1.0
>
>
> Graph summarization determines a structural grouping of similar vertices and 
> edges to condense a graph and thus helps to uncover insights about patterns 
> hidden in the graph. It can be used in OLAP-style operations on the graph and 
> is similar to group by in SQL but on the graph structure instead of rows.
>  
> The graph summarization operator represents every vertex group by a single 
> vertex in the summarized graph; edges between vertices in the summary graph 
> represent a group of edges between the vertex group members of the original 
> graph. Summarization is defined by specifying grouping keys for vertices and 
> edges, respectively.
> One publication that presents a Map/Reduce-based approach is "Pagrol: 
> Parallel graph olap over large-scale attributed graphs"; however, the authors 
> pre-compute the graph-cube before it can be analyzed. With Flink, we can give 
> the user an interactive way of summarizing the graph and do not need to 
> compute the cube beforehand.
> A more complex approach focuses on summarization on graph patterns: 
> "SynopSys: Large Graph Analytics in the SAP HANA Database Through 
> Summarization".
> However, I want to start with a simple algorithm that summarizes the graph on 
> vertex and optionally edge values and additionally stores a count aggregate 
> at summarized vertices/edges.
> Consider the following two examples (e.g., social network with users from 
> cities and friendships with timestamp):
>  
> h4. Input graph:
>  
> Vertices (id, value):
> (0, Leipzig)
> (1, Leipzig)
> (2, Dresden)
> (3, Dresden)
> (4, Dresden)
> (5, Berlin)
> Edges (source, target, value):
> (0, 1, 2014)
> (1, 0, 2014)
> (1, 2, 2013)
> (2, 1, 2013)
> (2, 3, 2014)
> (3, 2, 2014)
> (4, 0, 2013)
> (4, 1, 2015)
> (5, 2, 2015)
> (5, 3, 2015)
> h4. Output graph (summarized on vertex value):
> Vertices (id, value, count)
> (0, Leipzig, 2) // "2 users from Leipzig"
> (2, Dresden, 3) // "3 users from Dresden"
> (5, Berlin, 1) // "1 user from Berlin"
> Edges (source, target, count) 
> (0, 0, 2) // "2 edges between users in Leipzig"
> (0, 2, 1) // "1 edge from users in Leipzig to users in Dresden"
> (2, 0, 3) // "3 edges from users in Dresden to users in Leipzig"
> (2, 2, 2) // "2 edges between users in Dresden"
> (5, 2, 2) // "2 edges from users in Berlin to users in Dresden"
> h4. Output graph (summarized on vertex and edge value):
> Vertices (id, value, count)
> (0, Leipzig, 2)
> (2, Dresden, 3)
> (5, Berlin, 1)
> Edges (source, target, value, count) 
> (0, 0, 2014, 2) // ...
> (0, 2, 2013, 1) // ...
> (2, 0, 2013, 2) // "2 edges from users in Dresden to users in Leipzig with 
> timestamp 2013"
> (2, 0, 2015, 1) // "1 edge from users in Dresden to users in Leipzig with 
> timestamp 2015"
> (2, 2, 2014, 2) // ...
> (5, 2, 2015, 2) // ...
> I've already implemented two versions of the summarization algorithm in our 
> own project [Gradoop|https://github.com/dbs-leipzig/gradoop], which is a 
> graph analytics stack on top of Hadoop + Gelly/Flink with a fixed data model. 
> You can see the current WIP here: 
> 1 [Abstract 
> summarization|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/Summarization.java]
> 2 [Implementation using 
> cross|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/SummarizationCross.java]
> 3 [Implementation using 
> joins|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/SummarizationJoin.java]
> 4 
> [Tests|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/test/java/org/gradoop/model/impl/EPGraphSummarizeTest.java]
> 5 
> [TestGraph|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/dev-support/social-network.pdf]
> I would basically use the same implementation as in 3 in combination with 
> KeySelectors to select the grouping keys on vertices and edges.
> As you can see in the example, each vertex in the resulting graph has a 
> vertex id that is contained in the original graph. This id is the smallest id 
> among the grouped vertices (e.g., vertices 2, 3 and 4 represent Dresden, so 2 
> is the group representative).

[jira] [Commented] (FLINK-2411) Add basic graph summarization algorithm

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973926#comment-14973926
 ] 

ASF GitHub Bot commented on FLINK-2411:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1269#issuecomment-151077098
  
I'll merge this.


> Add basic graph summarization algorithm
> ---
>
> Key: FLINK-2411
> URL: https://issues.apache.org/jira/browse/FLINK-2411
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 0.10
>Reporter: Martin Junghanns
>Assignee: Martin Junghanns
>Priority: Minor
>
> Graph summarization determines a structural grouping of similar vertices and 
> edges to condense a graph and thus helps to uncover insights about patterns 
> hidden in the graph. It can be used in OLAP-style operations on the graph and 
> is similar to group by in SQL but on the graph structure instead of rows.
>  
> The graph summarization operator represents every vertex group by a single 
> vertex in the summarized graph; edges between vertices in the summary graph 
> represent a group of edges between the vertex group members of the original 
> graph. Summarization is defined by specifying grouping keys for vertices and 
> edges, respectively.
> One publication that presents a Map/Reduce-based approach is "Pagrol: 
> Parallel graph olap over large-scale attributed graphs"; however, the authors 
> pre-compute the graph-cube before it can be analyzed. With Flink, we can give 
> the user an interactive way of summarizing the graph and do not need to 
> compute the cube beforehand.
> A more complex approach focuses on summarization on graph patterns: 
> "SynopSys: Large Graph Analytics in the SAP HANA Database Through 
> Summarization".
> However, I want to start with a simple algorithm that summarizes the graph on 
> vertex and optionally edge values and additionally stores a count aggregate 
> at summarized vertices/edges.
> Consider the following two examples (e.g., social network with users from 
> cities and friendships with timestamp):
>  
> h4. Input graph:
>  
> Vertices (id, value):
> (0, Leipzig)
> (1, Leipzig)
> (2, Dresden)
> (3, Dresden)
> (4, Dresden)
> (5, Berlin)
> Edges (source, target, value):
> (0, 1, 2014)
> (1, 0, 2014)
> (1, 2, 2013)
> (2, 1, 2013)
> (2, 3, 2014)
> (3, 2, 2014)
> (4, 0, 2013)
> (4, 1, 2015)
> (5, 2, 2015)
> (5, 3, 2015)
> h4. Output graph (summarized on vertex value):
> Vertices (id, value, count)
> (0, Leipzig, 2) // "2 users from Leipzig"
> (2, Dresden, 3) // "3 users from Dresden"
> (5, Berlin, 1) // "1 user from Berlin"
> Edges (source, target, count) 
> (0, 0, 2) // "2 edges between users in Leipzig"
> (0, 2, 1) // "1 edge from users in Leipzig to users in Dresden"
> (2, 0, 3) // "3 edges from users in Dresden to users in Leipzig"
> (2, 2, 2) // "2 edges between users in Dresden"
> (5, 2, 2) // "2 edges from users in Berlin to users in Dresden"
> h4. Output graph (summarized on vertex and edge value):
> Vertices (id, value, count)
> (0, Leipzig, 2)
> (2, Dresden, 3)
> (5, Berlin, 1)
> Edges (source, target, value, count) 
> (0, 0, 2014, 2) // ...
> (0, 2, 2013, 1) // ...
> (2, 0, 2013, 2) // "2 edges from users in Dresden to users in Leipzig with 
> timestamp 2013"
> (2, 0, 2015, 1) // "1 edge from users in Dresden to users in Leipzig with 
> timestamp 2015"
> (2, 2, 2014, 2) // ...
> (5, 2, 2015, 2) // ...
> I've already implemented two versions of the summarization algorithm in our 
> own project [Gradoop|https://github.com/dbs-leipzig/gradoop], which is a 
> graph analytics stack on top of Hadoop + Gelly/Flink with a fixed data model. 
> You can see the current WIP here: 
> 1 [Abstract 
> summarization|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/Summarization.java]
> 2 [Implementation using 
> cross|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/SummarizationCross.java]
> 3 [Implementation using 
> joins|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/main/java/org/gradoop/model/impl/operators/SummarizationJoin.java]
> 4 
> [Tests|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/gradoop-flink/src/test/java/org/gradoop/model/impl/EPGraphSummarizeTest.java]
> 5 
> [TestGraph|https://github.com/dbs-leipzig/gradoop/blob/%2345_gradoop_flink/dev-support/social-network.pdf]
> I would basically use the same implementation as in 3 in combination with 
> KeySelectors to select the grouping keys on vertices and edges.
> As you can see in the example, each vertex in the resulting graph has a 
> vertex id that is contained in the original graph. This id is the smallest id 
> among the grouped vertices (e.g., vertices 2, 3 and 4 represent Dresden, so 2 
> is the group representative).

[GitHub] flink pull request: [FLINK-2411] [gelly] Add Summarization Algorit...

2015-10-26 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/1269#issuecomment-151077098
  
I'll merge this.




[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2015-10-26 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973937#comment-14973937
 ] 

Vasia Kalavri commented on FLINK-2909:
--

Hi [~greghogan],
thank you for opening this issue :)
Can you give us some idea of your implementation plans for this? Are you 
planning to implement all the generators you mention in the description as 
separate algorithms, or are you considering making this a Gelly utility where 
we can add generators incrementally?
Thank you!

> Gelly Graph Generators
> --
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be 
> useful for performing scalability, stress, and regression testing as well as 
> benchmarking and comparing algorithms, for both Flink users and developers. 
> Generated data is infinitely scalable yet described by a few simple 
> parameters and can often substitute for user data or sharing large files when 
> reporting issues.
> There are multiple categories of graphs, as documented by 
> [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
>  and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal 
> graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be 
> sufficiently small to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use 
> Flink's distributed parallelism.
> Graphs may be stochastic, e.g. [RMat 
> graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf]. 
> A key consideration is that the graphs should source randomness from a 
> seedable PRNG and generate the same Graph regardless of parallelism.
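
A minimal sketch of the parallelism-independence idea, deriving each element's randomness from the seed and the element index instead of from a per-worker RNG (the surrounding generator shape is an assumption, not a Gelly API):
{code}
final long seed = 42L;
final long numVertices = 1000L;
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

// Each element seeds its own Random from (seed, index), so the generated
// values are identical for any degree of parallelism.
DataSet<Tuple2<Long, Double>> randomValues = env
		.generateSequence(0, numVertices - 1)
		.map(new MapFunction<Long, Tuple2<Long, Double>>() {
			@Override
			public Tuple2<Long, Double> map(Long index) {
				Random rnd = new Random(seed ^ (index * 0x9E3779B9L)); // simple per-index seed
				return new Tuple2<>(index, rnd.nextDouble());
			}
		});
{code}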





[GitHub] flink pull request: [FLINK-2797][cli] Add support for running jobs...

2015-10-26 Thread mxm
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/1214#discussion_r42979520
  
--- Diff: 
flink-clients/src/test/java/org/apache/flink/client/program/ClientTest.java ---
@@ -304,4 +368,60 @@ public String getDescription() {
return "TestOptimizerPlan  
";
}
}
+
+   public static final class TestExecuteTwice {
+
+   public static void main(String args[]) throws Exception {
--- End diff --

Ok that's better then! Thanks





[GitHub] flink pull request: [FLINK-2411] [gelly] Add Summarization Algorit...

2015-10-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1269




[jira] [Commented] (FLINK-2919) Apply JMH on FieldAccessMinibenchmark class.

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973843#comment-14973843
 ] 

ASF GitHub Bot commented on FLINK-2919:
---

GitHub user gallenvara opened a pull request:

https://github.com/apache/flink/pull/1300

[FLINK-2919] [tests] Apply JMH on FieldAccessMinibenchmark class.

JMH is a Java harness for building, running, and analysing 
nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
method in order to get much more accurate results. Modify the 
`FieldAccessMinibenchmark` class and move it to the `flink-benchmark` module.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gallenvara/flink fieldAccessMiniBenchmark

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1300.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1300


commit 3b9a8db1abaf956a04416974cdb2f41fd6fc4300
Author: gallenvara 
Date:   2015-10-26T07:15:14Z

[FLINK-2919] [tests] Apply JMH on FieldAccessMinibenchmark class.




> Apply JMH on FieldAccessMinibenchmark class.
> 
>
> Key: FLINK-2919
> URL: https://issues.apache.org/jira/browse/FLINK-2919
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results. Modify the 
> FieldAccessMinibenchmark class and move it to the flink-benchmark module.
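
For readers unfamiliar with JMH, a minimal benchmark skeleton of the kind this migration produces; the class name and measured loop are placeholders, not the ported benchmark itself:
{code}
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@State(Scope.Thread)
public class FieldAccessBenchmark {

	private final int[] data = new int[1024];

	@Benchmark
	public long sumFields() {
		long sum = 0;
		for (int value : data) {
			sum += value;
		}
		return sum; // returning the result prevents dead-code elimination
	}

	public static void main(String[] args) throws Exception {
		Options opts = new OptionsBuilder()
				.include(FieldAccessBenchmark.class.getSimpleName())
				.forks(1)
				.build();
		new Runner(opts).run();
	}
}
{code}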





[GitHub] flink pull request: [FLINK-2919] [tests] Apply JMH on FieldAccessM...

2015-10-26 Thread gallenvara
GitHub user gallenvara opened a pull request:

https://github.com/apache/flink/pull/1300

[FLINK-2919] [tests] Apply JMH on FieldAccessMinibenchmark class.

JMH is a Java harness for building, running, and analysing 
nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
method in order to get much more accurate results. Modify the 
`FieldAccessMinibenchmark` class and move it to the `flink-benchmark` module.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gallenvara/flink fieldAccessMiniBenchmark

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1300.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1300


commit 3b9a8db1abaf956a04416974cdb2f41fd6fc4300
Author: gallenvara 
Date:   2015-10-26T07:15:14Z

[FLINK-2919] [tests] Apply JMH on FieldAccessMinibenchmark class.






[jira] [Updated] (FLINK-2870) Add support for accumulating/discarding for Event-Time Windows

2015-10-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-2870:

Fix Version/s: (was: 0.10)

> Add support for accumulating/discarding for Event-Time Windows
> --
>
> Key: FLINK-2870
> URL: https://issues.apache.org/jira/browse/FLINK-2870
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> This would allow specifying whether windows should be discarded after the 
> trigger fires or kept in the operator in case late elements arrive.
> When keeping elements, the user would also have to specify an allowed 
> lateness time after which the window contents are discarded without emitting 
> any further window evaluation result.
> If elements arrive after the allowed lateness, they would trigger the window 
> immediately with only that one single element.
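
A mockup of what the proposed API could look like; {{accumulating()}} and {{allowedLateness()}} are hypothetical methods illustrating the description above, not existing API:
{code}
stream
	.keyBy(0)
	.timeWindow(Time.hours(1))
	.accumulating()                    // hypothetical: keep contents after the trigger fires
	.allowedLateness(Time.minutes(10)) // hypothetical: drop contents 10 min after the window ends
	.sum(1);
{code}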





[jira] [Closed] (FLINK-2891) Key for Keyed State is not set upon Window Evaluation

2015-10-26 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed FLINK-2891.
---
Resolution: Fixed

Fixed for both types of window operations now.

> Key for Keyed State is not set upon Window Evaluation
> -
>
> Key: FLINK-2891
> URL: https://issues.apache.org/jira/browse/FLINK-2891
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Affects Versions: 0.10
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Blocker
> Fix For: 0.10
>
>
> In both the aligned and the general-purpose windows the key for the keyed 
> operator state is not set when evaluating the windows. This silently leads to 
> incorrect results.





[jira] [Commented] (FLINK-2919) Apply JMH on FieldAccessMinibenchmark class.

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975163#comment-14975163
 ] 

ASF GitHub Bot commented on FLINK-2919:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1300


> Apply JMH on FieldAccessMinibenchmark class.
> 
>
> Key: FLINK-2919
> URL: https://issues.apache.org/jira/browse/FLINK-2919
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results. Modify the 
> FieldAccessMinibenchmark class and move it to the flink-benchmark module.





[jira] [Commented] (FLINK-2827) Potential resource leak in TwitterSource#loadAuthenticationProperties()

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975164#comment-14975164
 ] 

ASF GitHub Bot commented on FLINK-2827:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1276


> Potential resource leak in TwitterSource#loadAuthenticationProperties()
> ---
>
> Key: FLINK-2827
> URL: https://issues.apache.org/jira/browse/FLINK-2827
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.10
>Reporter: Ted Yu
>Assignee: Saumitra Shahapure
>Priority: Minor
>  Labels: starter
>
> Here is related code:
> {code}
> Properties properties = new Properties();
> try {
> InputStream input = new FileInputStream(authPath);
> properties.load(input);
> input.close();
> } catch (Exception e) {
> throw new RuntimeException("Cannot open .properties 
> file: " + authPath, e);
> }
> {code}
> If an exception comes out of the properties.load() call, input would be 
> left open.





[jira] [Commented] (FLINK-2889) Apply JMH on LongSerializationSpeedBenchmark class

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975161#comment-14975161
 ] 

ASF GitHub Bot commented on FLINK-2889:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1283


> Apply JMH on LongSerializationSpeedBenchmark class
> --
>
> Key: FLINK-2889
> URL: https://issues.apache.org/jira/browse/FLINK-2889
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results. Modify the 
> LongSerializationSpeedBenchmark class and move it to the flink-benchmark module.





[jira] [Commented] (FLINK-2890) Apply JMH on StringSerializationSpeedBenchmark class.

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975160#comment-14975160
 ] 

ASF GitHub Bot commented on FLINK-2890:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1284


> Apply JMH on StringSerializationSpeedBenchmark class.
> -
>
> Key: FLINK-2890
> URL: https://issues.apache.org/jira/browse/FLINK-2890
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results. Modify the 
> StringSerializationSpeedBenchmark class and move it to the flink-benchmark module.





[jira] [Closed] (FLINK-1611) [REFACTOR] Rename classes and packages in test that contains Nephele

2015-10-26 Thread Henry Saputra (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Saputra closed FLINK-1611.

   Resolution: Fixed
Fix Version/s: 0.10

This seems to have been fixed.

> [REFACTOR] Rename classes and packages in test that contains Nephele
> 
>
> Key: FLINK-1611
> URL: https://issues.apache.org/jira/browse/FLINK-1611
> Project: Flink
>  Issue Type: Improvement
>  Components: other
>Reporter: Henry Saputra
>Assignee: Henry Saputra
>Priority: Minor
> Fix For: 0.10
>
>
> We have several classes and packages names that have Nephele names:
> ./flink-tests/src/test/java/org/apache/flink/test/broadcastvars/BroadcastVarsNepheleITCase.java
> ./flink-tests/src/test/java/org/apache/flink/test/broadcastvars/KMeansIterativeNepheleITCase.java
> ./flink-tests/src/test/java/org/apache/flink/test/iterative/nephele/ConnectedComponentsNepheleITCase.java
> ./flink-tests/src/test/java/org/apache/flink/test/iterative/nephele/DanglingPageRankNepheleITCase.java
> ./flink-tests/src/test/java/org/apache/flink/test/iterative/nephele/DanglingPageRankWithCombinerNepheleITCase.java
> ./flink-tests/src/test/java/org/apache/flink/test/iterative/nephele/IterationWithChainingNepheleITCase.java
> Nephele was the older name used by Flink in early years to describe the Flink 
> processing engine.





[jira] [Commented] (FLINK-2909) Gelly Graph Generators

2015-10-26 Thread Greg Hogan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974985#comment-14974985
 ] 

Greg Hogan commented on FLINK-2909:
---

My thought was to initially implement one or two of each to prove the API. Each 
generator will have its own class.

> Gelly Graph Generators
> --
>
> Key: FLINK-2909
> URL: https://issues.apache.org/jira/browse/FLINK-2909
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
>
> Include a selection of graph generators in Gelly. Generated graphs will be 
> useful for performing scalability, stress, and regression testing as well as 
> benchmarking and comparing algorithms, for both Flink users and developers. 
> Generated data is infinitely scalable yet described by a few simple 
> parameters and can often substitute for user data or sharing large files when 
> reporting issues.
> There are multiple categories of graphs, as documented by 
> [NetworkX|https://networkx.github.io/documentation/latest/reference/generators.html]
>  and elsewhere.
> Graphs may be well-defined, e.g. the [Chvátal 
> graph|https://en.wikipedia.org/wiki/Chv%C3%A1tal_graph]. These may be 
> sufficiently small to populate locally.
> Graphs may be scalable, e.g. complete and star graphs. These should use 
> Flink's distributed parallelism.
> Graphs may be stochastic, e.g. [RMat 
> graphs|http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf]. 
> A key consideration is that the graphs should source randomness from a 
> seedable PRNG and generate the same Graph regardless of parallelism.





[jira] [Commented] (FLINK-2922) Add Queryable Window Operator

2015-10-26 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974914#comment-14974914
 ] 

Aljoscha Krettek commented on FLINK-2922:
-

I don't know exactly what you mean by that, but it would still require a 
special kind of operator to support it; that's where the queryable window 
operator comes in.

This is a mockup of what it would look like in practice:
{code}
DataStream<String> text = env.socketTextStream("localhost", 9999);
DataStream<String> query = env.socketTextStream("localhost", 9998);

WindowStreamOperator<String, Tuple2<String, Integer>> winStream = text
	.flatMap(new WordCount.Tokenizer())
	.keyBy(0)
	.countWindow(10)
	.query(query.keyBy(new IdentityKey()))
	.apply(new WindowFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, Tuple, GlobalWindow>() {
		private static final long serialVersionUID = 1L;

		@Override
		public void apply(Tuple tuple,
				GlobalWindow window,
				Iterable<Tuple2<String, Integer>> values,
				Collector<Tuple2<String, Integer>> out) throws Exception {
			int sum = 0;
			for (Tuple2<String, Integer> val : values) {
				sum += val.f1;
			}
			out.collect(Tuple2.of((String) tuple.getField(0), sum));
		}
	});

winStream.print();

// WindowResult<QT, T>
// QT = query type
// T = window result type
DataStream<WindowResult<String, Tuple2<String, Integer>>> queryResults =
	winStream.getQueryResultStream();
queryResults.print();
{code}

> Add Queryable Window Operator
> -
>
> Key: FLINK-2922
> URL: https://issues.apache.org/jira/browse/FLINK-2922
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> The idea is to provide a window operator that allows querying the current 
> window result at any time without discarding the current result.
> For example, a user might have an aggregation window operation with tumbling 
> windows of 1 hour. Now, at any time they might be interested in the current 
> aggregated value for the currently in-flight hour window.
> The idea is to make the operator a two-input operator where normal elements 
> arrive on input one while queries arrive on input two. The query stream must 
> be keyed by the same key as the input stream. If a query arrives for a key, 
> the current value for that key is emitted along with the query element so 
> that the user can map the result to the query.





[jira] [Closed] (FLINK-2890) Apply JMH on StringSerializationSpeedBenchmark class.

2015-10-26 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-2890.

   Resolution: Fixed
Fix Version/s: 1.0

Fixed with 75a5257412606ac70113850439457cce7da3b2e6

> Apply JMH on StringSerializationSpeedBenchmark class.
> -
>
> Key: FLINK-2890
> URL: https://issues.apache.org/jira/browse/FLINK-2890
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
> Fix For: 1.0
>
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results. Modify the 
> StringSerializationSpeedBenchmark class and move it to the flink-benchmark module.





[jira] [Closed] (FLINK-2919) Apply JMH on FieldAccessMinibenchmark class.

2015-10-26 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-2919.

   Resolution: Fixed
Fix Version/s: 1.0

Fixed with 7265d81ff95aff4ddfbcbd4ef25869ea8f159769

> Apply JMH on FieldAccessMinibenchmark class.
> 
>
> Key: FLINK-2919
> URL: https://issues.apache.org/jira/browse/FLINK-2919
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
> Fix For: 1.0
>
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results. Modify the 
> FieldAccessMinibenchmark class and move it to the flink-benchmark module.





[jira] [Closed] (FLINK-2827) Potential resource leak in TwitterSource#loadAuthenticationProperties()

2015-10-26 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-2827.

   Resolution: Fixed
Fix Version/s: 1.0

Fixed with 6a8e90b3621bb96304b325a4ae8f7f5575ec909a

> Potential resource leak in TwitterSource#loadAuthenticationProperties()
> ---
>
> Key: FLINK-2827
> URL: https://issues.apache.org/jira/browse/FLINK-2827
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.10
>Reporter: Ted Yu
>Assignee: Saumitra Shahapure
>Priority: Minor
>  Labels: starter
> Fix For: 1.0
>
>
> Here is related code:
> {code}
> Properties properties = new Properties();
> try {
> InputStream input = new FileInputStream(authPath);
> properties.load(input);
> input.close();
> } catch (Exception e) {
> throw new RuntimeException("Cannot open .properties 
> file: " + authPath, e);
> }
> {code}
> If an exception comes out of the properties.load() call, input would be 
> left open.





[jira] [Closed] (FLINK-2889) Apply JMH on LongSerializationSpeedBenchmark class

2015-10-26 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske closed FLINK-2889.

   Resolution: Fixed
Fix Version/s: 1.0

Fixed with 75a5257412606ac70113850439457cce7da3b2e6

> Apply JMH on LongSerializationSpeedBenchmark class
> --
>
> Key: FLINK-2889
> URL: https://issues.apache.org/jira/browse/FLINK-2889
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
> Fix For: 1.0
>
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results. Modify the 
> LongSerializationSpeedBenchmark class and move it to the flink-benchmark module.





[GitHub] flink pull request: [FLINK-2827] Closing FileInputStream through t...

2015-10-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1276




[jira] [Commented] (FLINK-2853) Apply JMH on MutableHashTablePerformanceBenchmark class.

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975162#comment-14975162
 ] 

ASF GitHub Bot commented on FLINK-2853:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1267


> Apply JMH on MutableHashTablePerformanceBenchmark class.
> 
>
> Key: FLINK-2853
> URL: https://issues.apache.org/jira/browse/FLINK-2853
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results.





[GitHub] flink pull request: [FLINK-2919] [tests] Apply JMH on FieldAccessM...

2015-10-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1300




[GitHub] flink pull request: [FLINK-2853] [tests] Apply JMH on MutableHashT...

2015-10-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1267




[GitHub] flink pull request: [FLINK-2890] [tests] Apply JMH on StringSerial...

2015-10-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1284




[GitHub] flink pull request: [FLINK-2889] [tests] Apply JMH on LongSerializ...

2015-10-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1283




[jira] [Commented] (FLINK-1827) Move test classes in test folders and fix scope of test dependencies

2015-10-26 Thread Flavio Pompermaier (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975242#comment-14975242
 ] 

Flavio Pompermaier commented on FLINK-1827:
---

A lot of improvements to the Flink tests have been going on these last few 
days... is there any effort going into solving this easy issue? Do you want me 
to open a PR for this?

> Move test classes in test folders and fix scope of test dependencies
> 
>
> Key: FLINK-1827
> URL: https://issues.apache.org/jira/browse/FLINK-1827
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 0.9
>Reporter: Flavio Pompermaier
>Priority: Minor
>  Labels: test-compile
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Right now it is not possible to avoid compilation of test classes 
> (-Dmaven.test.skip=true) because some projects (e.g. flink-test-utils) 
> require test classes in non-test sources (e.g. 
> scalatest_${scala.binary.version}).
> Test classes should be moved to src/test/java (if Java) and src/test/scala 
> (if Scala), and test dependencies should use scope=test.





[GitHub] flink pull request: [FLINK-2920] [tests] Apply JMH on KryoVersusAv...

2015-10-26 Thread gallenvara
GitHub user gallenvara opened a pull request:

https://github.com/apache/flink/pull/1302

[FLINK-2920] [tests] Apply JMH on KryoVersusAvroMinibenchmark class.

JMH is a Java harness for building, running, and analysing 
nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
method in order to get much more accurate results. Modify the 
`KryoVersusAvroMinibenchmark` class and move it to the `flink-benchmark` module.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gallenvara/flink KryoVersusAvroMinibenchmark

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1302.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1302


commit 6c64e262a3a55471640368a1be12feff229a08cd
Author: gallenvara 
Date:   2015-10-27T03:27:03Z

[FLINK-2920] [tests] Apply JMH on KryoVersusAvroMinibenchmark class.






[jira] [Commented] (FLINK-2920) Apply JMH on KryoVersusAvroMinibenchmark class.

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975656#comment-14975656
 ] 

ASF GitHub Bot commented on FLINK-2920:
---

GitHub user gallenvara opened a pull request:

https://github.com/apache/flink/pull/1302

[FLINK-2920] [tests] Apply JMH on KryoVersusAvroMinibenchmark class.

JMH is a Java harness for building, running, and analysing 
nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
method in order to get much more accurate results. Modify the 
`KryoVersusAvroMinibenchmark` class and move it to the `flink-benchmark` module.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gallenvara/flink KryoVersusAvroMinibenchmark

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1302.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1302


commit 6c64e262a3a55471640368a1be12feff229a08cd
Author: gallenvara 
Date:   2015-10-27T03:27:03Z

[FLINK-2920] [tests] Apply JMH on KryoVersusAvroMinibenchmark class.




> Apply JMH on KryoVersusAvroMinibenchmark class.
> ---
>
> Key: FLINK-2920
> URL: https://issues.apache.org/jira/browse/FLINK-2920
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results. Modify the 
> KryoVersusAvroMinibenchmark class and move it to the flink-benchmark module.





[jira] [Updated] (FLINK-2925) Client shows incomplete cause of Exception

2015-10-26 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-2925:
---
Summary: Client shows incomplete cause of Exception  (was: Client does not 
show root cause of Exception)

> Client shows incomplete cause of Exception
> --
>
> Key: FLINK-2925
> URL: https://issues.apache.org/jira/browse/FLINK-2925
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Runtime
>Affects Versions: 0.10
>Reporter: Ufuk Celebi
>
> I get the following Exception at the client:
> {code}
> org.apache.flink.client.program.ProgramInvocationException: The program 
> execution failed: Job execution failed
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:370)
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:348)
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:315)
>   at 
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
>   at 
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:78)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:497)
>   at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:395)
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:252)
>   at 
> org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:670)
>   at org.apache.flink.client.CliFrontend.run(CliFrontend.java:325)
>   at 
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:971)
>   at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1021)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job 
> execution failed
>   at 
> org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:177)
>   at org.apache.flink.client.program.Client.runBlocking(Client.java:368)
>   ... 15 more
> Caused by: java.lang.Exception: Serialized representation of 
> org.apache.flink.runtime.client.JobExecutionException: Failed to submit job 
> 57d1efe70571500f851ed5ff24bf401f (WordCount Example)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:944)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:341)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> org.apache.flink.yarn.YarnJobManager$$anonfun$handleYarnMessage$1.applyOrElse(YarnJobManager.scala:152)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
>   at 
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>   at 
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   

[jira] [Created] (FLINK-2924) Create database state backend

2015-10-26 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-2924:
-

 Summary: Create database state backend
 Key: FLINK-2924
 URL: https://issues.apache.org/jira/browse/FLINK-2924
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Reporter: Gyula Fora
Assignee: Gyula Fora


The goal is to create a database state backend that can be used with 
JDBC-supporting databases.

The backend should support the storage of non-partitioned state, as well as the 
storage of key-value state with high throughput. As databases provide advanced 
querying functionality, the key-value state can be implemented to be lazily 
fetched and should scale to "arbitrary" state sizes by not storing the 
non-active key-value pairs on heap.

An adapter class will be provided to help bridge the gap between different SQL 
implementations.
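
As a rough sketch of the lazy key-value access described above; the table name, 
the (key, value) schema, and the upsert statement are illustrative assumptions, 
not the backend's actual design:

{code}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Minimal sketch of a lazily fetched key-value state over JDBC. A real
// backend would additionally batch writes and integrate with checkpointing.
public class JdbcKvStateSketch {

    private final Connection connection;

    public JdbcKvStateSketch(Connection connection) {
        this.connection = connection;
    }

    /** Fetches the value for a key on demand instead of keeping it on heap. */
    public byte[] lookup(String key) throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(
                "SELECT v FROM kv_state WHERE k = ?")) {
            stmt.setString(1, key);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getBytes(1) : null;
            }
        }
    }

    /** Upserts a key-value pair; the exact upsert syntax is database specific,
     *  which is where the mentioned adapter class would come in. */
    public void update(String key, byte[] value) throws SQLException {
        try (PreparedStatement stmt = connection.prepareStatement(
                "MERGE INTO kv_state (k, v) KEY (k) VALUES (?, ?)")) {
            stmt.setString(1, key);
            stmt.setBytes(2, value);
            stmt.executeUpdate();
        }
    }
}
{code}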





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2922) Add Queryable Window Operator

2015-10-26 Thread Gyula Fora (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974818#comment-14974818
 ] 

Gyula Fora commented on FLINK-2922:
---

I think this could be implemented nicely with the same logic I used for 
StreamKV (https://github.com/gyfora/StreamKV), where you create an abstract 
queryable object (the window) on which you apply queries that return 
QueryResults, from which you can get the streams. This would abstract away the 
tuple logic from the user, who can then simply work on the streams.

> Add Queryable Window Operator
> -
>
> Key: FLINK-2922
> URL: https://issues.apache.org/jira/browse/FLINK-2922
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> The idea is to provide a window operator that allows querying the current 
> window result at any time without discarding it.
> For example, a user might have an aggregation window operation with tumbling 
> windows of 1 hour. Now, at any time they might be interested in the current 
> aggregated value for the currently in-flight hour window.
> The idea is to make the operator a two-input operator where normal elements 
> arrive on input one while queries arrive on input two. The query stream must 
> be keyed by the same key as the input stream. If a query arrives for a key, 
> the current value for that key is emitted along with the query element so 
> that the user can map the result to the query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-7] [Runtime] Enable Range Partitioner.

2015-10-26 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1255#discussion_r43004130
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/AssignRangeIndex.java
 ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.java.functions;
+
+import org.apache.flink.api.common.distributions.DataDistribution;
+import org.apache.flink.api.common.functions.RichMapPartitionFunction;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.util.Collector;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This mapPartition function requires a DataSet with a DataDistribution as 
broadcast input. It reads the
+ * target parallelism from the parameters, builds the partition boundaries from 
the input DataDistribution, and
+ * then computes the range index for each record.
+ *
+ * @param <IN> The original data type.
+ * @param <K> The key type.
+ */
+public class AssignRangeIndex<IN, K>
+   extends RichMapPartitionFunction<Tuple2<K, IN>, Tuple2<Integer, IN>> {
+
+   private List partitionBoundaries;
+   private int numberChannels;
+
+   @Override
+   public void open(Configuration parameters) throws Exception {
+   this.numberChannels = 
parameters.getInteger("TargetParallelism", 1);
+   }
+
+   @Override
+   public void mapPartition(Iterable<Tuple2<K, IN>> values, 
Collector<Tuple2<Integer, IN>> out) throws Exception {
+
+   List<DataDistribution> broadcastVariable = 
getRuntimeContext().getBroadcastVariable("DataDistribution");
+   if (broadcastVariable == null || broadcastVariable.size() != 1) 
{
--- End diff --

You can move the broadcast variable initialization into the open method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-7) [GitHub] Enable Range Partitioner

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974343#comment-14974343
 ] 

ASF GitHub Bot commented on FLINK-7:


Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1255#discussion_r43004130
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/AssignRangeIndex.java
 ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.java.functions;
+
+import org.apache.flink.api.common.distributions.DataDistribution;
+import org.apache.flink.api.common.functions.RichMapPartitionFunction;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.util.Collector;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This mapPartition function requires a DataSet with a DataDistribution as 
broadcast input. It reads the
+ * target parallelism from the parameters, builds the partition boundaries from 
the input DataDistribution, and
+ * then computes the range index for each record.
+ *
+ * @param <IN> The original data type.
+ * @param <K> The key type.
+ */
+public class AssignRangeIndex<IN, K>
+   extends RichMapPartitionFunction<Tuple2<K, IN>, Tuple2<Integer, IN>> {
+
+   private List partitionBoundaries;
+   private int numberChannels;
+
+   @Override
+   public void open(Configuration parameters) throws Exception {
+   this.numberChannels = 
parameters.getInteger("TargetParallelism", 1);
+   }
+
+   @Override
+   public void mapPartition(Iterable<Tuple2<K, IN>> values, 
Collector<Tuple2<Integer, IN>> out) throws Exception {
+
+   List<DataDistribution> broadcastVariable = 
getRuntimeContext().getBroadcastVariable("DataDistribution");
+   if (broadcastVariable == null || broadcastVariable.size() != 1) 
{
--- End diff --

You can move the broadcast variable initialization into the open method.


> [GitHub] Enable Range Partitioner
> -
>
> Key: FLINK-7
> URL: https://issues.apache.org/jira/browse/FLINK-7
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Runtime
>Reporter: GitHub Import
>Assignee: Chengxiang Li
> Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the 
> following aspects:
> 1) Distribution information, if available, must be propagated back together 
> with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
>  - TeraSortITCase
>  - GlobalSortingITCase
>  - GlobalSortingMixedOrderITCase
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer, 
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2922) Add Queryable Window Operator

2015-10-26 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-2922:
---

 Summary: Add Queryable Window Operator
 Key: FLINK-2922
 URL: https://issues.apache.org/jira/browse/FLINK-2922
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek


The idea is to provide a window operator that allows querying the current 
window result at any time without discarding it.

For example, a user might have an aggregation window operation with tumbling 
windows of 1 hour. Now, at any time they might be interested in the current 
aggregated value for the currently in-flight hour window.

The idea is to make the operator a two-input operator where normal elements 
arrive on input one while queries arrive on input two. The query stream must be 
keyed by the same key as the input stream. If a query arrives for a key, the 
current value for that key is emitted along with the query element so that the 
user can map the result to the query.
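
A conceptual sketch of the two-input idea, using a plain {{CoFlatMapFunction}} 
over connected keyed streams; the real operator would sit inside the window 
operator and use fault-tolerant state, so the in-memory map here is purely 
illustrative:

{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

// Input one carries (key, value) elements, input two carries query keys.
// Both inputs must be keyed by the same key, so each parallel instance
// only sees the keys it is responsible for.
public class QueryableAggregateSketch implements
        CoFlatMapFunction<Tuple2<String, Long>, String, Tuple2<String, Long>> {

    // Illustrative only: not fault tolerant, unlike real operator state.
    private final Map<String, Long> inFlightAggregates = new HashMap<>();

    @Override
    public void flatMap1(Tuple2<String, Long> element, Collector<Tuple2<String, Long>> out) {
        // Normal element: update the in-flight aggregate for its key.
        Long current = inFlightAggregates.get(element.f0);
        inFlightAggregates.put(element.f0, (current == null ? 0L : current) + element.f1);
    }

    @Override
    public void flatMap2(String queryKey, Collector<Tuple2<String, Long>> out) {
        // Query element: emit the current value together with the queried key,
        // so the user can match the result back to the query.
        Long current = inFlightAggregates.get(queryKey);
        out.collect(new Tuple2<String, Long>(queryKey, current == null ? 0L : current));
    }
}
{code}

Conceptually, this function would be applied to the connected, identically 
keyed element and query streams.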



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-7) [GitHub] Enable Range Partitioner

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974346#comment-14974346
 ] 

ASF GitHub Bot commented on FLINK-7:


Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1255#discussion_r43004309
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/functions/AssignRangeIndex.java
 ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.flink.api.java.functions;
+
+import org.apache.flink.api.common.distributions.DataDistribution;
+import org.apache.flink.api.common.functions.RichMapPartitionFunction;
+import org.apache.flink.api.java.tuple.Tuple2;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.util.Collector;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * This mapPartition function requires a DataSet with a DataDistribution as 
broadcast input. It reads the
+ * target parallelism from the parameters, builds the partition boundaries from 
the input DataDistribution, and
+ * then computes the range index for each record.
+ *
+ * @param <IN> The original data type.
+ * @param <K> The key type.
+ */
+public class AssignRangeIndex<IN, K>
+   extends RichMapPartitionFunction<Tuple2<K, IN>, Tuple2<Integer, IN>> {
+
+   private List partitionBoundaries;
+   private int numberChannels;
+
+   @Override
+   public void open(Configuration parameters) throws Exception {
+   this.numberChannels = 
parameters.getInteger("TargetParallelism", 1);
+   }
+
+   @Override
+   public void mapPartition(Iterable<Tuple2<K, IN>> values, 
Collector<Tuple2<Integer, IN>> out) throws Exception {
+
+   List<DataDistribution> broadcastVariable = 
getRuntimeContext().getBroadcastVariable("DataDistribution");
+   if (broadcastVariable == null || broadcastVariable.size() != 1) 
{
--- End diff --

Never mind, I thought you were using a MapFunction, but it's a 
MapPartitionFunction. So this is only done once.


> [GitHub] Enable Range Partitioner
> -
>
> Key: FLINK-7
> URL: https://issues.apache.org/jira/browse/FLINK-7
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Runtime
>Reporter: GitHub Import
>Assignee: Chengxiang Li
> Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the 
> following aspects:
> 1) Distribution information, if available, must be propagated back together 
> with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
>  - TeraSortITCase
>  - GlobalSortingITCase
>  - GlobalSortingMixedOrderITCase
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer, 
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2017] Add predefined required parameter...

2015-10-26 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1097#issuecomment-151142957
  
Going through FLINK-2017, I interpreted the semantics of `check` and 
`checkAndPopulate` as follows:
- `check` verifies that all required parameters are provided and valid 
(type, choice, etc.)
- `checkAndPopulate` copies the default values of the options into the 
`ParameterTool`

I think a method that is called `checkAndPopulate` should perform the same 
checks as `check` and do the *populate* in addition. Moreover, I do not see why 
`check` alone would be useful. Why would somebody define required parameters 
with default values without enforcing them, i.e., why would you call `check` 
without `checkAndPopulate`?
If we have a single method, we could also give it a simpler name, such as 
`applyTo`.

What do you think?
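
To make the combined semantics concrete, a small sketch over a plain map, 
standing in for the actual `ParameterTool`/`Option` classes (all names here are 
illustrative, not the PR's API):

```java
import java.util.Map;

public class RequiredParametersSketch {

    // Combined check-and-populate semantics: verify presence of required
    // parameters and copy declared defaults in one pass.
    public static void applyTo(Map<String, String> params, String name, String defaultValue) {
        if (params.get(name) == null) {
            if (defaultValue == null) {
                // "check" part: a required parameter without a default must be present.
                throw new RuntimeException("Missing required parameter: " + name);
            }
            // "populate" part: fall back to the declared default value.
            params.put(name, defaultValue);
        }
    }
}
```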


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2918] Add method to read a file of type...

2015-10-26 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1299#issuecomment-151125493
  
LGTM. The `readAsSequenceFile` method is nice syntactic sugar to add. But we 
should cover it in the online documentation, since it's the user-facing 
API that we extend here.

Furthermore, we should think about adding a corresponding 
`writeAsSequenceFile` method to the `DataSet` as the counterpart to 
`readAsSequenceFile`.
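
For reference, roughly what the proposed `readAsSequenceFile` shorthand would 
expand to with the Hadoop v1 (mapred) API; the key/value types and the path 
below are just examples:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.SequenceFileInputFormat;

public class SequenceFileReadExample {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // The proposed sugar wraps a call along these lines.
        DataSet<Tuple2<LongWritable, Text>> input = env.readHadoopFile(
            new SequenceFileInputFormat<LongWritable, Text>(),
            LongWritable.class, Text.class, "hdfs:///path/to/sequence-file");

        input.first(10).print();
    }
}
```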


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2918) Add a method to be able to read SequenceFileInputFormat

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974183#comment-14974183
 ] 

ASF GitHub Bot commented on FLINK-2918:
---

Github user smarthi commented on the pull request:

https://github.com/apache/flink/pull/1299#issuecomment-151130141
  
We also need to add a method for the new Hadoop API; the present PR only deals 
with Hadoop v1.


> Add a method to be able to read SequenceFileInputFormat
> ---
>
> Key: FLINK-2918
> URL: https://issues.apache.org/jira/browse/FLINK-2918
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.9.1
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 0.10
>
>
> This is to add a method to ExecutionEnvironment.{java,scala} that provides 
> syntactic sugar for reading a SequenceFileInputFormat. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2017) Add predefined required parameters to ParameterTool

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974232#comment-14974232
 ] 

ASF GitHub Bot commented on FLINK-2017:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1097#issuecomment-151142957
  
Going through FLINK-2017, I interpreted the semantics of `check` and 
`checkAndPopulate` as follows:
- `check` verifies that all required parameters are provided and valid 
(type, choice, etc.)
- `checkAndPopulate` copies the default values of the options into the 
`ParameterTool`

I think a method that is called `checkAndPopulate` should perform the same 
checks as `check` and do the *populate* in addition. Moreover, I do not see why 
`check` alone would be useful. Why would somebody define required parameters 
with default values without enforcing them, i.e., why would you call `check` 
without `checkAndPopulate`?
If we have a single method, we could also give it a simpler name, such as 
`applyTo`.

What do you think?


> Add predefined required parameters to ParameterTool
> ---
>
> Key: FLINK-2017
> URL: https://issues.apache.org/jira/browse/FLINK-2017
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 0.9
>Reporter: Robert Metzger
>  Labels: starter
>
> In FLINK-1525 we've added the {{ParameterTool}}.
> During the PR review, there was a request for required parameters.
> This issue is about implementing a facility to define required parameters. 
> The tool should also be able to print a help menu with a list of all 
> parameters.
> This test case shows my initial ideas how to design the API
> {code}
>   @Test
>   public void requiredParameters() {
>   RequiredParameters required = new RequiredParameters();
>   Option input = required.add("input").alt("i").help("Path to 
> input file or directory"); // parameter with long and short variant
>   required.add("output"); // parameter only with long variant
>   Option parallelism = 
> required.add("parallelism").alt("p").type(Integer.class); // parameter with 
> type
>   Option spOption = 
> required.add("sourceParallelism").alt("sp").defaultValue(12).help("Number 
> specifying the number of parallel data source instances"); // parameter with 
> default value, specifying the type.
>   Option executionType = 
> required.add("executionType").alt("et").defaultValue("pipelined").choices("pipelined",
>  "batch");
>   ParameterUtil parameter = ParameterUtil.fromArgs(new 
> String[]{"-i", "someinput", "--output", "someout", "-p", "15"});
>   required.check(parameter);
>   required.printHelp();
>   required.checkAndPopulate(parameter);
>   String inputString = input.get();
>   int par = parallelism.getInteger();
>   String output = parameter.get("output");
>   int sourcePar = parameter.getInteger(spOption.getName());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2918] Add method to read a file of type...

2015-10-26 Thread smarthi
Github user smarthi commented on the pull request:

https://github.com/apache/flink/pull/1299#issuecomment-151130141
  
We also need to add a method for the new Hadoop API; the present PR only deals 
with Hadoop v1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2017] Add predefined required parameter...

2015-10-26 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1097#discussion_r42993834
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/utils/RequiredParameter.java 
---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.java.utils;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Facility to manage required parameters in user defined functions.
+ */
+public class RequiredParameter {
+
+   private static final String HELP_TEXT_PARAM_DELIMITER = "\t";
+   private static final String HELP_TEXT_LINE_DELIMITER = "\n";
+
+   private HashMap<String, Option> data;
+
+   public RequiredParameter() {
+   this.data = new HashMap<>();
+   }
+
+   public void add(Option option) throws RequiredParameterException {
--- End diff --

You can also do:
`req.add("name").type(...).values(...)` 
just like proposed in FLINK-2017, and users do not need to care about 
the `Option` class, which is very nice, IMO.

In some cases, alternatives can raise more questions and confusion than 
they help. So I would rather remove `add(Option)`, but that's just my opinion.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2017) Add predefined required parameters to ParameterTool

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974213#comment-14974213
 ] 

ASF GitHub Bot commented on FLINK-2017:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1097#discussion_r42993834
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/utils/RequiredParameter.java 
---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.java.utils;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * Facility to manage required parameters in user defined functions.
+ */
+public class RequiredParameter {
+
+   private static final String HELP_TEXT_PARAM_DELIMITER = "\t";
+   private static final String HELP_TEXT_LINE_DELIMITER = "\n";
+
+   private HashMap<String, Option> data;
+
+   public RequiredParameter() {
+   this.data = new HashMap<>();
+   }
+
+   public void add(Option option) throws RequiredParameterException {
--- End diff --

You can also do:
`req.add("name").type(...).values(...)` 
just like proposed in FLINK-2017, and users do not need to care about 
the `Option` class, which is very nice, IMO.

In some cases, alternatives can raise more questions and confusion than 
they help. So I would rather remove `add(Option)`, but that's just my opinion.


> Add predefined required parameters to ParameterTool
> ---
>
> Key: FLINK-2017
> URL: https://issues.apache.org/jira/browse/FLINK-2017
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 0.9
>Reporter: Robert Metzger
>  Labels: starter
>
> In FLINK-1525 we've added the {{ParameterTool}}.
> During the PR review, there was a request for required parameters.
> This issue is about implementing a facility to define required parameters. 
> The tool should also be able to print a help menu with a list of all 
> parameters.
> This test case shows my initial ideas how to design the API
> {code}
>   @Test
>   public void requiredParameters() {
>   RequiredParameters required = new RequiredParameters();
>   Option input = required.add("input").alt("i").help("Path to 
> input file or directory"); // parameter with long and short variant
>   required.add("output"); // parameter only with long variant
>   Option parallelism = 
> required.add("parallelism").alt("p").type(Integer.class); // parameter with 
> type
>   Option spOption = 
> required.add("sourceParallelism").alt("sp").defaultValue(12).help("Number 
> specifying the number of parallel data source instances"); // parameter with 
> default value, specifying the type.
>   Option executionType = 
> required.add("executionType").alt("et").defaultValue("pipelined").choices("pipelined",
>  "batch");
>   ParameterUtil parameter = ParameterUtil.fromArgs(new 
> String[]{"-i", "someinput", "--output", "someout", "-p", "15"});
>   required.check(parameter);
>   required.printHelp();
>   required.checkAndPopulate(parameter);
>   String inputString = input.get();
>   int par = parallelism.getInteger();
>   String output = parameter.get("output");
>   int sourcePar = parameter.getInteger(spOption.getName());
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-7) [GitHub] Enable Range Partitioner

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974366#comment-14974366
 ] 

ASF GitHub Bot commented on FLINK-7:


Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1255#discussion_r43006432
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java 
---
@@ -1223,6 +1230,51 @@ public long count() throws Exception {
final TypeInformation<K> keyType = 
TypeExtractor.getKeySelectorTypes(keyExtractor, getType());
return new PartitionOperator<T>(this, PartitionMethod.HASH, new 
Keys.SelectorFunctionKeys<T, K>(clean(keyExtractor), this.getType(), keyType), 
Utils.getCallLocationName());
}
+
+   /**
+    * Range-partitions a DataSet using the specified KeySelector.
+    * <p>
+    * <b>Important:</b> This operation shuffles the whole DataSet over the 
network and can take a significant amount of time.
+    *
+    * @param keySelector The KeySelector with which the DataSet is 
range-partitioned.
+    * @return The partitioned DataSet.
+    *
+    * @see KeySelector
+    */
+   public <K extends Comparable<K>> DataSet<T> 
partitionByRange(KeySelector<T, K> keySelector) {
+   final TypeInformation<K> keyType = 
TypeExtractor.getKeySelectorTypes(keySelector, getType());
+   String callLocation = Utils.getCallLocationName();
+
+   // Extract the key from each input element with the keySelector.
+   KeyExtractorMapper<T, K> keyExtractorMapper = new 
KeyExtractorMapper<T, K>(keySelector);
--- End diff --

It would be good to inject the sampling and partition ID assignment code in 
the `JobGraphGenerator` and not at the API level. The `JobGraphGenerator` is 
called after the `Optimizer` and translates the optimized plan into a parallel 
data flow called `JobGraph`, which is executed by the runtime. The benefit of 
injecting the code at this point is that any range partitioning can be handled 
transparently within the optimizer. This means that operators other than the 
explicit `partitionByRange()`, such as Join, CoGroup, and Reduce, can also 
benefit from range partitioning. In addition, this makes the injected code a 
part of the runtime, which can be improved more transparently later on. 

The downside (for you) is that the job abstraction is much lower at this 
level. However, you still have access to the chosen key fields and type 
information of all operators. See the `JavaApiPostPass` class to learn how to 
generate serializers and comparators at this level.
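
To make the technique concrete, a sketch of sampling-based range boundary 
computation and partition assignment; the sample size and boundary placement 
are arbitrary illustrative choices, not the code the {{JobGraphGenerator}} 
would actually inject:

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RangeBoundariesSketch {

    // Derive numPartitions - 1 boundaries from a random sample of the keys.
    // Assumes a non-empty key list and integer keys for simplicity.
    public static int[] buildBoundaries(List<Integer> keys, int numPartitions, long seed) {
        List<Integer> sample = new ArrayList<>(keys);
        Collections.shuffle(sample, new Random(seed));
        sample = sample.subList(0, Math.min(sample.size(), 100 * numPartitions));
        Collections.sort(sample);

        int[] boundaries = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            boundaries[i - 1] = sample.get(i * sample.size() / numPartitions);
        }
        return boundaries;
    }

    // Assign a key to a partition via binary search over the boundaries.
    public static int partitionOf(int key, int[] boundaries) {
        int pos = Arrays.binarySearch(boundaries, key);
        return pos >= 0 ? pos : -(pos + 1);
    }
}
{code}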


> [GitHub] Enable Range Partitioner
> -
>
> Key: FLINK-7
> URL: https://issues.apache.org/jira/browse/FLINK-7
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Runtime
>Reporter: GitHub Import
>Assignee: Chengxiang Li
> Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the 
> following aspects:
> 1) Distribution information, if available, must be propagated back together 
> with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
>  - TeraSortITCase
>  - GlobalSortingITCase
>  - GlobalSortingMixedOrderITCase
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer, 
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-7] [Runtime] Enable Range Partitioner.

2015-10-26 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1255#issuecomment-151177199
  
Hi, I left a few comments inside. 
I don't think we need the special user code to extract keys. The 
comparators provide this functionality and can be generated within the 
JobGraphGenerator.

Let me know what you think.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2889] [tests] Apply JMH on LongSerializ...

2015-10-26 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1283#issuecomment-151191115
  
Thanks for the update. 
Looks good to merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2797) CLI: Missing option to submit jobs in detached mode

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974487#comment-14974487
 ] 

ASF GitHub Bot commented on FLINK-2797:
---

Github user sachingoel0101 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1214#discussion_r43015201
  
--- Diff: 
flink-clients/src/test/java/org/apache/flink/client/program/ClientTest.java ---
@@ -304,4 +368,60 @@ public String getDescription() {
return "TestOptimizerPlan  
";
}
}
+
+   public static final class TestExecuteTwice {
+
+   public static void main(String args[]) throws Exception {
--- End diff --

Okay. Cool. Lemme know if there are any other issues to address. :smile:


> CLI: Missing option to submit jobs in detached mode
> ---
>
> Key: FLINK-2797
> URL: https://issues.apache.org/jira/browse/FLINK-2797
> Project: Flink
>  Issue Type: Bug
>  Components: Command-line client
>Affects Versions: 0.9, 0.10
>Reporter: Maximilian Michels
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> Jobs can only be submitted in detached mode using YARN but not on a 
> standalone installation. This has been requested by users who want to submit 
> a job, get the job id, and later query its status.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-7) [GitHub] Enable Range Partitioner

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974391#comment-14974391
 ] 

ASF GitHub Bot commented on FLINK-7:


Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1255#discussion_r43008448
  
--- Diff: 
flink-optimizer/src/main/java/org/apache/flink/optimizer/plantranslate/JobGraphGenerator.java
 ---
@@ -1138,12 +1138,13 @@ private DistributionPattern 
connectJobVertices(Channel channel, int inputNumber,

if (channel.getShipStrategy() == 
ShipStrategyType.PARTITION_RANGE) {

-   final DataDistribution dataDistribution = 
channel.getDataDistribution();
+   DataDistribution dataDistribution = 
channel.getDataDistribution();
if (dataDistribution != null) {

sourceConfig.setOutputDataDistribution(dataDistribution, outputIndex);
} else {
-   throw new RuntimeException("Range partitioning 
requires data distribution");
-   // TODO: inject code and configuration for 
automatic histogram generation
--- End diff --

As the TODO comment says, this is where the sampling and distribution-building 
code should go :-)


> [GitHub] Enable Range Partitioner
> -
>
> Key: FLINK-7
> URL: https://issues.apache.org/jira/browse/FLINK-7
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Runtime
>Reporter: GitHub Import
>Assignee: Chengxiang Li
> Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the 
> following aspects:
> 1) Distribution information, if available, must be propagated back together 
> with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
>  - TeraSortITCase
>  - GlobalSortingITCase
>  - GlobalSortingMixedOrderITCase
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer, 
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-7] [Runtime] Enable Range Partitioner.

2015-10-26 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1255#discussion_r43008448
  
--- Diff: 
flink-optimizer/src/main/java/org/apache/flink/optimizer/plantranslate/JobGraphGenerator.java
 ---
@@ -1138,12 +1138,13 @@ private DistributionPattern 
connectJobVertices(Channel channel, int inputNumber,

if (channel.getShipStrategy() == 
ShipStrategyType.PARTITION_RANGE) {

-   final DataDistribution dataDistribution = 
channel.getDataDistribution();
+   DataDistribution dataDistribution = 
channel.getDataDistribution();
if (dataDistribution != null) {

sourceConfig.setOutputDataDistribution(dataDistribution, outputIndex);
} else {
-   throw new RuntimeException("Range partitioning 
requires data distribution");
-   // TODO: inject code and configuration for 
automatic histogram generation
--- End diff --

As the TODO comment says, this is where the sampling and distribution-building 
code should go :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2889) Apply JMH on LongSerializationSpeedBenchmark class

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974467#comment-14974467
 ] 

ASF GitHub Bot commented on FLINK-2889:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1283#issuecomment-151191115
  
Thanks for the update. 
Looks good to merge.


> Apply JMH on LongSerializationSpeedBenchmark class
> --
>
> Key: FLINK-2889
> URL: https://issues.apache.org/jira/browse/FLINK-2889
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results. Modify the 
> LongSerializationSpeedBenchmark class and move it to the flink-benchmark module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-7] [Runtime] Enable Range Partitioner.

2015-10-26 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1255#discussion_r43006432
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/DataSet.java 
---
@@ -1223,6 +1230,51 @@ public long count() throws Exception {
final TypeInformation<K> keyType = 
TypeExtractor.getKeySelectorTypes(keyExtractor, getType());
return new PartitionOperator<T>(this, PartitionMethod.HASH, new 
Keys.SelectorFunctionKeys<T, K>(clean(keyExtractor), this.getType(), keyType), 
Utils.getCallLocationName());
}
+
+   /**
+    * Range-partitions a DataSet using the specified KeySelector.
+    * <p>
+    * <b>Important:</b> This operation shuffles the whole DataSet over the 
network and can take a significant amount of time.
+    *
+    * @param keySelector The KeySelector with which the DataSet is 
range-partitioned.
+    * @return The partitioned DataSet.
+    *
+    * @see KeySelector
+    */
+   public <K extends Comparable<K>> DataSet<T> 
partitionByRange(KeySelector<T, K> keySelector) {
+   final TypeInformation<K> keyType = 
TypeExtractor.getKeySelectorTypes(keySelector, getType());
+   String callLocation = Utils.getCallLocationName();
+
+   // Extract the key from each input element with the keySelector.
+   KeyExtractorMapper<T, K> keyExtractorMapper = new 
KeyExtractorMapper<T, K>(keySelector);
--- End diff --

It would be good to inject the sampling and partition ID assignment code in 
the `JobGraphGenerator` and not at the API level. The `JobGraphGenerator` is 
called after the `Optimizer` and translates the optimized plan into a parallel 
data flow called `JobGraph`, which is executed by the runtime. The benefit of 
injecting the code at this point is that any range partitioning can be handled 
transparently within the optimizer. This means that operators other than the 
explicit `partitionByRange()`, such as Join, CoGroup, and Reduce, can also 
benefit from range partitioning. In addition, this makes the injected code a 
part of the runtime, which can be improved more transparently later on. 

The downside (for you) is that the job abstraction is much lower at this 
level. However, you still have access to the chosen key fields and type 
information of all operators. See the `JavaApiPostPass` class to learn how to 
generate serializers and comparators at this level.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-7] [Runtime] Enable Range Partitioner.

2015-10-26 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1255#discussion_r43008620
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/operators/BatchTask.java 
---
@@ -1264,11 +1264,10 @@ public static void logAndThrowException(Exception 
ex, AbstractInvokable parent)
oe = new OutputEmitter(strategy);
}
else {
-   final DataDistribution dataDist = 
config.getOutputDataDistribution(i, cl);
--- End diff --

Can we keep the interface with the `DataDistribution` and simply inject a 
simple int distribution (1,2,3,...,n)?
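
A hedged sketch of such a trivial distribution, where bucket i simply gets 
upper boundary i + 1; the method names only loosely mirror Flink's 
`DataDistribution` interface, whose exact signatures may differ:

```java
// Bucket boundaries 1, 2, ..., n - 1 for n buckets over a single int field.
public class SimpleIntDistributionSketch {

    /** Upper boundary of bucket {@code bucketNum}, i.e. simply bucketNum + 1. */
    public Integer[] getBucketBoundary(int bucketNum, int totalNumBuckets) {
        return new Integer[] { bucketNum + 1 };
    }

    public int getNumberOfFields() {
        return 1;
    }
}
```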


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-7) [GitHub] Enable Range Partitioner

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974395#comment-14974395
 ] 

ASF GitHub Bot commented on FLINK-7:


Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/1255#discussion_r43008620
  
--- Diff: 
flink-runtime/src/main/java/org/apache/flink/runtime/operators/BatchTask.java 
---
@@ -1264,11 +1264,10 @@ public static void logAndThrowException(Exception 
ex, AbstractInvokable parent)
oe = new OutputEmitter(strategy);
}
else {
-   final DataDistribution dataDist = 
config.getOutputDataDistribution(i, cl);
--- End diff --

Can we keep the interface with the `DataDistribution` and simply inject a 
simple int distribution (1,2,3,...,n)?


> [GitHub] Enable Range Partitioner
> -
>
> Key: FLINK-7
> URL: https://issues.apache.org/jira/browse/FLINK-7
> Project: Flink
>  Issue Type: Sub-task
>  Components: Distributed Runtime
>Reporter: GitHub Import
>Assignee: Chengxiang Li
> Fix For: pre-apache
>
>
> The range partitioner is currently disabled. We need to implement the 
> following aspects:
> 1) Distribution information, if available, must be propagated back together 
> with the ordering property.
> 2) A generic bucket lookup structure (currently specific to PactRecord).
> Tests to re-enable after fixing this issue:
>  - TeraSortITCase
>  - GlobalSortingITCase
>  - GlobalSortingMixedOrderITCase
>  Imported from GitHub 
> Url: https://github.com/stratosphere/stratosphere/issues/7
> Created by: [StephanEwen|https://github.com/StephanEwen]
> Labels: core, enhancement, optimizer, 
> Milestone: Release 0.4
> Assignee: [fhueske|https://github.com/fhueske]
> Created at: Fri Apr 26 13:48:24 CEST 2013
> State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2922) Add Queryable Window Operator

2015-10-26 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974474#comment-14974474
 ] 

Aljoscha Krettek commented on FLINK-2922:
-

The idea is, for example, that the regular emission of your window result gets 
stored as the final truth in your database that serves some statistics to 
users. If you only have this, you always get your data with a lag of 1 hour.

You could also want to allow users to query the current count inside that 
1-hour window. To do that, you need a way to match the query to the 
result. For that, my idea is to have (conceptually) two output streams: one for 
the regular window results and another one for query results. In the query 
result stream you basically get a tuple (query, window-result), so that the 
user can match elements in the query result stream to the queries that they 
sent.

> Add Queryable Window Operator
> -
>
> Key: FLINK-2922
> URL: https://issues.apache.org/jira/browse/FLINK-2922
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> The idea is to provide a window operator that allows querying the current 
> window result at any time without discarding it.
> For example, a user might have an aggregation window operation with tumbling 
> windows of 1 hour. Now, at any time they might be interested in the current 
> aggregated value for the currently in-flight hour window.
> The idea is to make the operator a two-input operator where normal elements 
> arrive on input one while queries arrive on input two. The query stream must 
> be keyed by the same key as the input stream. If a query arrives for a key, 
> the current value for that key is emitted along with the query element so 
> that the user can map the result to the query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2922) Add Queryable Window Operator

2015-10-26 Thread Gyula Fora (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974438#comment-14974438
 ] 

Gyula Fora commented on FLINK-2922:
---

I don't immediately see a lot of use cases for this; could you please give me 
some examples that are not covered by non-discarding triggers?

> Add Queryable Window Operator
> -
>
> Key: FLINK-2922
> URL: https://issues.apache.org/jira/browse/FLINK-2922
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> The idea is to provide a window operator that allows querying the current 
> window result at any time without discarding it.
> For example, a user might have an aggregation window operation with tumbling 
> windows of 1 hour. Now, at any time they might be interested in the current 
> aggregated value for the currently in-flight hour window.
> The idea is to make the operator a two-input operator where normal elements 
> arrive on input one while queries arrive on input two. The query stream must 
> be keyed by the same key as the input stream. If a query arrives for a key, 
> the current value for that key is emitted along with the query element so 
> that the user can map the result to the query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2797][cli] Add support for running jobs...

2015-10-26 Thread sachingoel0101
Github user sachingoel0101 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1214#discussion_r43015201
  
--- Diff: 
flink-clients/src/test/java/org/apache/flink/client/program/ClientTest.java ---
@@ -304,4 +368,60 @@ public String getDescription() {
return "TestOptimizerPlan  
";
}
}
+
+   public static final class TestExecuteTwice {
+
+   public static void main(String args[]) throws Exception {
--- End diff --

Okay. Cool. Lemme know if there are any other issues to address. :smile:


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add org.apache.httpcomponents:(httpcore, httpc...

2015-10-26 Thread uce
GitHub user uce opened a pull request:

https://github.com/apache/flink/pull/1301

Add org.apache.httpcomponents:(httpcore, httpclient) to dependency 
management

Hadoop 2.4.0 has `httpcomponents` dependencies, which breaks Flink on 
Amazon EMR AMI 3.9, because 4.1.2 is missing a method on which `EmrFileSystem` 
relies.
```bash
[INFO] org.apache.flink:flink-shaded-hadoop2:jar:1.0-SNAPSHOT
...
[INFO] |  +- net.java.dev.jets3t:jets3t:jar:0.9.0:compile
[INFO] |  |  +- org.apache.httpcomponents:httpclient:jar:4.1.2:compile
[INFO] |  |  +- org.apache.httpcomponents:httpcore:jar:4.1.2:compile
[INFO] |  |  \- com.jamesmurty.utils:java-xmlbuilder:jar:0.4:compile
```

This change moves both `httpclient` and `httpcore` to our root pom 
dependency management section, which makes `net.java.dev.jets3t:jets3t:jar` 
pull in the 4.2 version. This has been tested on the named EMR version.

If there is a new RC, we should merge this. I don't think that we have to 
cancel an ongoing RC. We can add it to 0.10.1.

Thanks to @tillrohrmann for spotting a crucial client vs. core typo, which 
was driving me nuts while testing the change.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uce/flink http

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1301.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1301


commit 2ae9d03b5f78135ab619e704d5fddfa30e98ef6f
Author: Ufuk Celebi 
Date:   2015-10-26T14:34:58Z

Add org.apache.httpcomponents:(httpcore, httpclient) to dependency 
management




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2797][cli] Add support for running jobs...

2015-10-26 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1214#issuecomment-151224527
  
I took a look at this again because I wanted to merge it for 0.10. However, 
I think it still needs a bit of work. The `ContextEnvironmentFactory` shouldn't 
hold the state for detached batch and stream executions. Could you store the 
batch plan in the `ContextEnvironment` and the stream graph in the 
`StreamContextEnvironment` (non-static)? Please give them appropriate names.

Since the `ContextEnvironmentFactory` is currently the entry point for 
batch and streaming execution (the streaming environment just checks whether 
the `ContextEnvironment` is instantiated), you may set a flag in the factory to 
prevent multiple executions.
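
Roughly the suggested shape, with all names hypothetical rather than the PR's 
actual fields:

```java
// Each environment instance keeps its own job artifact (batch plan or stream
// graph) in a non-static field; the shared factory only guards against
// multiple executions in detached mode.
public class DetachedExecutionSketch {

    private Object detachedJobArtifact;   // per-instance, not static

    public void setDetachedJobArtifact(Object artifact) {
        this.detachedJobArtifact = artifact;
    }

    public Object getDetachedJobArtifact() {
        return detachedJobArtifact;
    }
}

class ExecuteOnceGuard {

    private boolean executed = false;

    // Called before each execution; detached mode allows only one.
    synchronized void checkAndMark() {
        if (executed) {
            throw new IllegalStateException(
                "Multiple program executions are not supported in detached mode.");
        }
        executed = true;
    }
}
```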


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2797) CLI: Missing option to submit jobs in detached mode

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974646#comment-14974646
 ] 

ASF GitHub Bot commented on FLINK-2797:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1214#issuecomment-151224527
  
I took a look at this again because I wanted to merge it for 0.10. However, 
I think it still needs a bit of work. The `ContextEnvironmentFactory` shouldn't 
hold the state for detached batch and stream executions. Could you store the 
batch plan in the `ContextEnvironment` and the stream graph in the 
`StreamContextEnvironment` (non-static)? Please give them appropriate names.

Since the `ContextEnvironmentFactory` is currently the entry point for 
batch and streaming execution (the streaming environment just checks whether 
the `ContextEnvironment` is instantiated), you may set a flag in the factory to 
prevent multiple executions.


> CLI: Missing option to submit jobs in detached mode
> ---
>
> Key: FLINK-2797
> URL: https://issues.apache.org/jira/browse/FLINK-2797
> Project: Flink
>  Issue Type: Bug
>  Components: Command-line client
>Affects Versions: 0.9, 0.10
>Reporter: Maximilian Michels
>Assignee: Sachin Goel
> Fix For: 0.10
>
>
> Jobs can only be submitted in detached mode using YARN but not on a 
> standalone installation. This has been requested by users who want to submit 
> a job, get the job id, and later query its status.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2853) Apply JMH on MutableHashTablePerformanceBenchmark class.

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974538#comment-14974538
 ] 

ASF GitHub Bot commented on FLINK-2853:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1267#issuecomment-151208136
  
I'm going to merge this PR.


> Apply JMH on MutableHashTablePerformanceBenchmark class.
> 
>
> Key: FLINK-2853
> URL: https://issues.apache.org/jira/browse/FLINK-2853
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results.
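
For reference, a minimal JMH benchmark skeleton of the kind this migration introduces (class name and measured code are illustrative only, not the actual Flink benchmark):

{code}
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class ExampleBenchmark {

    private long[] values;

    @Setup
    public void setup() {
        // prepare input data outside the measured section
        values = new long[1024];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
    }

    @Benchmark
    public long sum() {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum; // returning the result prevents dead-code elimination
    }
}
{code}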



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2827) Potential resource leak in TwitterSource#loadAuthenticationProperties()

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974534#comment-14974534
 ] 

ASF GitHub Bot commented on FLINK-2827:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1276#issuecomment-151207922
  
I'm going to merge this PR.


> Potential resource leak in TwitterSource#loadAuthenticationProperties()
> ---
>
> Key: FLINK-2827
> URL: https://issues.apache.org/jira/browse/FLINK-2827
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 0.10
>Reporter: Ted Yu
>Assignee: Saumitra Shahapure
>Priority: Minor
>  Labels: starter
>
> Here is related code:
> {code}
> Properties properties = new Properties();
> try {
>     InputStream input = new FileInputStream(authPath);
>     properties.load(input);
>     input.close();
> } catch (Exception e) {
>     throw new RuntimeException("Cannot open .properties file: " + authPath, e);
> }
> {code}
> If an exception is thrown by the properties.load() call, input is left open.
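
A straightforward way to avoid the leak is Java 7's try-with-resources, which closes the stream on every path, including when load() throws; a minimal sketch (method name hypothetical):

{code}
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;

// Sketch of a leak-free variant: the stream is closed automatically,
// even if properties.load() throws.
static Properties loadProperties(String authPath) {
    Properties properties = new Properties();
    try (InputStream input = new FileInputStream(authPath)) {
        properties.load(input);
    } catch (Exception e) {
        throw new RuntimeException("Cannot open .properties file: " + authPath, e);
    }
    return properties;
}
{code}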



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2919) Apply JMH on FieldAccessMinibenchmark class.

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974495#comment-14974495
 ] 

ASF GitHub Bot commented on FLINK-2919:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1300#issuecomment-151196004
  
Looks good to merge


> Apply JMH on FieldAccessMinibenchmark class.
> 
>
> Key: FLINK-2919
> URL: https://issues.apache.org/jira/browse/FLINK-2919
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results. Modify the 
> FieldAccessMinibenchmark class and move it to the flink-benchmark module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2853] [tests] Apply JMH on MutableHashT...

2015-10-26 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1267#issuecomment-151208136
  
I'm going to merge this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2919) Apply JMH on FieldAccessMinibenchmark class.

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974533#comment-14974533
 ] 

ASF GitHub Bot commented on FLINK-2919:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1300#issuecomment-151207830
  
I'm going to merge this PR.


> Apply JMH on FieldAccessMinibenchmark class.
> 
>
> Key: FLINK-2919
> URL: https://issues.apache.org/jira/browse/FLINK-2919
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results. Modify the 
> FieldAccessMinibenchmark class and move it to the flink-benchmark module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2827] Closing FileInputStream through t...

2015-10-26 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1276#issuecomment-151207922
  
I'm going to merge this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2889) Apply JMH on LongSerializationSpeedBenchmark class

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974537#comment-14974537
 ] 

ASF GitHub Bot commented on FLINK-2889:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1283#issuecomment-151208005
  
I'm going to merge this PR.


> Apply JMH on LongSerializationSpeedBenchmark class
> --
>
> Key: FLINK-2889
> URL: https://issues.apache.org/jira/browse/FLINK-2889
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results. Modify the 
> LongSerializationSpeedBenchmark class and move it to the flink-benchmark module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2889] [tests] Apply JMH on LongSerializ...

2015-10-26 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1283#issuecomment-151208005
  
I'm going to merge this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2919] [tests] Apply JMH on FieldAccessM...

2015-10-26 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1300#issuecomment-151207830
  
I'm going to merge this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2890) Apply JMH on StringSerializationSpeedBenchmark class.

2015-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974539#comment-14974539
 ] 

ASF GitHub Bot commented on FLINK-2890:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1284#issuecomment-151208210
  
I'm going to merge this PR.


> Apply JMH on StringSerializationSpeedBenchmark class.
> -
>
> Key: FLINK-2890
> URL: https://issues.apache.org/jira/browse/FLINK-2890
> Project: Flink
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: GaoLun
>Assignee: GaoLun
>Priority: Minor
>  Labels: easyfix
>
> JMH is a Java harness for building, running, and analysing 
> nano/micro/milli/macro benchmarks. Use JMH to replace the old micro-benchmark 
> method in order to get much more accurate results. Modify the 
> StringSerializationSpeedBenchmark class and move it to the flink-benchmark module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2919] [tests] Apply JMH on FieldAccessM...

2015-10-26 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1300#issuecomment-151196004
  
Looks good to merge


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-2923) Make it possible to mix-and-match StateBackends with KvState implementations

2015-10-26 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-2923:
-

 Summary: Make it possible to mix-and-match StateBackends with 
KvState implementations
 Key: FLINK-2923
 URL: https://issues.apache.org/jira/browse/FLINK-2923
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Reporter: Gyula Fora
Assignee: Gyula Fora


The KvState implementations are currently tied very closely to the specific 
StateBackend implementations, so one has to reimplement the backend to change 
the KvState. For many applications it is probably necessary to store the 
non-partitioned states differently from the key-value states. 

An example would be a SQL-based KvState combined with a simple file/memory 
backend for non-partitioned state (like Bloom filters and offsets).

A wrapper object should be provided to allow the use of a backend with a custom 
KvState. 

My proposal:
Create a KvStateProvider class which will have methods for initializing, 
creating, and closing KvStates (independent of the backend).

Create a wrapper StateBackend that wraps a StateBackend and a KvStateProvider 
into a new StateBackend. This could be a static method of the StateBackend 
class:

`public static StateBackend StateBackend.createWithCustomKvState(StateBackend, 
KvStateProvider)`
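
A minimal sketch of how these pieces could fit together (all signatures are hypothetical; nothing below exists in the codebase yet):

{code}
// Hypothetical sketch of the proposal, not an existing API.
interface KvStateProvider {
    void init();                       // set up provider resources
    Object createKvState(String name); // create a KvState independent of the backend
    void close();                      // release provider resources
}

abstract class StateBackend {

    // Wraps an existing backend and a KvStateProvider into a new backend:
    // non-partitioned state goes to 'backend', key-value state to 'provider'.
    public static StateBackend createWithCustomKvState(
            final StateBackend backend, final KvStateProvider provider) {
        return new StateBackend() {
            // delegate non-partitioned state calls to 'backend' and
            // key-value state calls to 'provider' here
        };
    }
}
{code}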



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)