Re: difference between reducefunction and GroupReduceFunction

2015-05-22 Thread Maximilian Michels
Like you said, it depends on the use case. The GroupReduceFunction is a generalization of the traditional reduce. Thus, it is more powerful. However, it is also executed differently; a GroupReduceFunction requires the whole group to be materialized and passed at once. If your program doesn't

question please

2015-05-22 Thread Eng Fawzya
Hi, I want to know what the difference is between Flink and Hadoop. -- Fawzya Ramadan Sayed, Teaching Assistant, Computer Science Department, Faculty of Computers and Information, Fayoum University

Re: Package multiple jobs in a single jar

2015-05-22 Thread Maximilian Michels
Hi Matthias, Thank you for taking the time to analyze Flink's invocation behavior. I like your proposal. I'm not sure whether it is a good idea to scan the entire JAR for main methods. Sometimes, main methods are added solely for testing purposes and don't really serve any practical use. However,

Re: question please

2015-05-22 Thread Chiwan Park
Hi. Hadoop is a framework for reliable, scalable, distributed computing. It consists of several components that serve this purpose, such as HDFS, YARN and Hadoop MapReduce. Flink is an alternative to the Hadoop MapReduce component. It also has some tools for writing map-reduce programs and extends the model to support

Re: difference between reducefunction and GroupReduceFunction

2015-05-22 Thread Maximilian Michels
Pardon, what I said is not completely right. Both functions are incrementally constructed. This seems obvious for the reduce function but is also true for the GroupReduce because it receives the values as an Iterable which, under the hood, can be constructed incrementally as well. One other
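The distinction discussed in this thread can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Flink's API: the interfaces `PairwiseReduce` and `GroupReduce` and the class `ReduceSketch` are hypothetical stand-ins. A pairwise reduce combines two values of the same type into one, while a group reduce receives the whole group as an `Iterable` and may return a different type.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Flink's interfaces.
interface PairwiseReduce<T> {
    // Combines two values of the same type into one value of that type.
    T reduce(T a, T b);
}

interface GroupReduce<T, R> {
    // Sees the whole group through an Iterable and may return another type.
    R reduce(Iterable<T> values);
}

class ReduceSketch {
    // Folds the group with the pairwise function; assumes a non-empty group.
    static <T> T applyPairwise(Iterable<T> group, PairwiseReduce<T> f) {
        Iterator<T> it = group.iterator();
        T acc = it.next();
        while (it.hasNext()) {
            acc = f.reduce(acc, it.next());
        }
        return acc;
    }

    // Hands the whole group to the group function in a single call.
    static <T, R> R applyGroup(Iterable<T> group, GroupReduce<T, R> f) {
        return f.reduce(group);
    }

    // A sum fits the pairwise form: input and output types are the same.
    static int sum(List<Integer> group) {
        return applyPairwise(group, (a, b) -> a + b);
    }

    // A mean needs a count and a different output type, so it fits the group form.
    static double mean(List<Integer> group) {
        GroupReduce<Integer, Double> f = values -> {
            long total = 0;
            int n = 0;
            for (int v : values) {
                total += v;
                n++;
            }
            return (double) total / n;
        };
        return applyGroup(group, f);
    }

    public static void main(String[] args) {
        List<Integer> group = Arrays.asList(1, 2, 3, 4);
        System.out.println(sum(group));
        System.out.println(mean(group));
    }
}
```

Because the group function only iterates over its input, the caller is free to produce that `Iterable` lazily, which matches the point above that the group does not have to be built all at once.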

Re: Package multiple jobs in a single jar

2015-05-22 Thread Robert Metzger
Thank you for working on this. My responses are inline below: (Flavio) My suggestion is to create a specific Flink interface to also get a description of a job and to standardize parameter passing. I've recently merged the ParameterTool, which solves the standardized parameter passing problem

Re: Package multiple jobs in a single jar

2015-05-22 Thread Matthias J. Sax
Makes sense to me. :) One more thing: What about extending the ProgramDescription interface to have multiple methods, as Flavio suggested (with the config(...) method that should be handled by the ParameterTool)? public interface FlinkJob { /** The name to display in the job submission UI or

Re: Package multiple jobs in a single jar

2015-05-22 Thread Matthias J. Sax
Thanks for your feedback. I agree on the main method problem. For scanning and listing everything that is found, it's fine. The tricky question is the automatic invocation mechanism if the -c flag is not used and no manifest program-class or Main-Class entry is found. If multiple classes implement
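The lookup described in this message can be sketched as follows. This is a hedged illustration, not the CliFrontend implementation: the class `EntryPointSketch` and the method `resolveEntryClass` are hypothetical, and the precedence shown (explicit `-c` value, then the `program-class` manifest attribute, then `Main-Class`) is an assumption based on the order in which the message lists them. An empty result marks the open case the thread is debating, where automatic invocation has nothing to go on.

```java
import java.util.Optional;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper for illustration; not part of Flink.
class EntryPointSketch {

    // Returns the entry class name, or empty if neither the flag nor the manifest names one.
    static Optional<String> resolveEntryClass(String classFlag, Manifest manifest) {
        // 1) An explicit class passed on the command line (the -c flag) wins.
        if (classFlag != null && !classFlag.isEmpty()) {
            return Optional.of(classFlag);
        }
        if (manifest != null) {
            Attributes attrs = manifest.getMainAttributes();
            // 2) Assumed precedence: the program-class attribute is checked first.
            String programClass = attrs.getValue("program-class");
            if (programClass != null) {
                return Optional.of(programClass);
            }
            // 3) Fall back to the standard Main-Class attribute.
            String mainClass = attrs.getValue(Attributes.Name.MAIN_CLASS);
            if (mainClass != null) {
                return Optional.of(mainClass);
            }
        }
        // Nothing found: the caller has to decide what to do (list candidates, fail, ...).
        return Optional.empty();
    }

    public static void main(String[] args) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("Main-Class", "org.example.JobB");
        System.out.println(resolveEntryClass(null, manifest).orElse("<none>"));
    }
}
```

The class names `org.example.JobB` and similar are placeholders. Returning `Optional.empty()` rather than guessing keeps the ambiguous case explicit, so a caller could list the scanned candidates instead of invoking one of them.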

Re: difference between reducefunction and GroupReduceFunction

2015-05-22 Thread Stephan Ewen
Performance-wise, a GroupReduceFunction with Combiner should right now be slightly faster than the ReduceFunction, but not much. Long term, the ReduceFunction may become faster, because it will use hash aggregation under the hood. On Fri, May 22, 2015 at 11:58 AM, santosh_rajaguru

Re: [DISCUSS] Dedicated streaming mode

2015-05-22 Thread Aljoscha Krettek
Hi, streaming currently does not use any memory manager. All state is kept in Java Objects on the Java Heap, for example an ArrayList for the window buffer. On Thu, May 21, 2015 at 11:56 PM, Henry Saputra henry.sapu...@gmail.com wrote: Hi Stephan, Gyula, Paris, How does streaming currently

[jira] [Created] (FLINK-2081) Change order of restore state and open for Streaming Operators

2015-05-22 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-2081: --- Summary: Change order of restore state and open for Streaming Operators Key: FLINK-2081 URL: https://issues.apache.org/jira/browse/FLINK-2081 Project: Flink

Re: [DISCUSS] Dedicated streaming mode

2015-05-22 Thread Stephan Ewen
Aljoscha is right. There are plans to migrate the streaming state to the MemoryManager as well, but streaming state is not managed at this point. What is managed in streaming jobs is the data buffered and cached in the network stack. But that is a different memory pool than the memory manager. We

Re: difference between reducefunction and GroupReduceFunction

2015-05-22 Thread santosh_rajaguru
Thanks Maximilian. My use case is similar to the example given in the graph analysis. In graph analysis, the reduce function used is a normal reduce function. I executed that with both scenarios and your justification is right. The normal reduce function has a combiner before sorting, unlike the

Re: Package multiple jobs in a single jar

2015-05-22 Thread Matthias J. Sax
Hi, two more thoughts on this discussion: 1) Looking at the commit history of CliFrontend, I found the following closed issue and the closing pull request * https://issues.apache.org/jira/browse/FLINK-1095 * https://github.com/apache/flink/pull/238 It stands in opposition to Flavio's