Hi Fabian, Thank you for your answers, 1) If there is only single instance of that function, then it will defeat the purpose of distributed correct me if I am wrong, so If I run parallelism with 1 on cluster does that mean it will execute on only one node?
2) I mean to say, when a map operator returns a variable, is there any other function which takes that updated variable and returns that to all instances of map? 3) Question Cleared. 4) My question was can I use same ExecutionEnvironment for all flink programs in a module. 5) Question Cleared. Regards Ravikumar On 9 June 2016 at 17:58, Fabian Hueske <fhue...@gmail.com> wrote: > Hi Ravikumar, > > I'll try to answer your questions: > 1) If you set the parallelism of a map function to 1, there will be only a > single instance of that function regardless whether it is execution locally > or remotely in a cluster. > 2) Flink does also support aggregations, (reduce, groupReduce, combine, > ...). However, I do not see how this would help with a stateful map > function. > 3) In Flink DataSet programs you usually construct the complete program > and call execute() after you have defined your sinks. There are two > exceptions: print() and collect() which both add special sinks and > immediately execute your program. print() prints the result to the stdout > of the submitting client and collect() fetches a dataset as collection. > 4) I am not sure I understood your question. When you obtain an > ExecutionEnvironment with ExecutionEnvironment.getExecutionEnvrionment() > the type of the returned environment depends on the context in which the > program was executed. It can be a local environment if it is executed from > within an IDE or a RemodeExecutionEnvironment if the program is executed > via the CLI client and shipped to a remote cluster. > 5) A map operator processes records one after the other, i.e., as a > sequence. If you need a certain order, you can call DataSet.sortPartition() > to locally sort the partition. > > Hope that helps, > Fabian > > 2016-06-09 12:23 GMT+02:00 Ravikumar Hawaldar < > ravikumar.hawal...@gmail.com>: > >> Hi Till, Thank you for your answer, I have couple of questions >> >> 1) Setting parallelism on a single map function in local is fine but on >> distributed will it work as local execution? >> >> 2) Is there any other way apart from setting parallelism? Like spark >> aggregate function? >> >> 3) Is it necessary that after transformations to call execute function? >> Or Execution starts as soon as it encounters a action (Similar to Spark)? >> >> 4) Can I create a global execution environment (Either local or >> distributed) for different Flink program in a module? >> >> 5) How to make the records come in sequence for a map or any other >> operator? >> >> >> Regards, >> Ravikumar >> >> >> On 8 June 2016 at 21:14, Till Rohrmann <trohrm...@apache.org> wrote: >> >>> Hi Ravikumar, >>> >>> Flink's operators are stateful. So you can simply create a variable in >>> your mapper to keep the state around. But every mapper instance will have >>> it's own state. This state is determined by the records which are sent to >>> this mapper instance. If you need a global state, then you have to set the >>> parallelism to 1. >>> >>> Cheers, >>> Till >>> >>> On Wed, Jun 8, 2016 at 5:08 PM, Ravikumar Hawaldar < >>> ravikumar.hawal...@gmail.com> wrote: >>> >>>> Hello, >>>> >>>> I have an DataSet<UserDefinedType> which is roughly a record in a >>>> DataSet Or a file. >>>> >>>> Now I am using map transformation on this DataSet to compute a variable >>>> (coefficients of linear regression parameters and data structure used is a >>>> double[]). >>>> >>>> Now the issue is that, per record the variable will get updated and I >>>> am struggling to maintain state of this variable for the next record. >>>> >>>> In simple, for first record the variable values will be 0.0, and after >>>> first record the variable will get updated and I have to pass this updated >>>> variable for the second record and so on for all records in DataSet. >>>> >>>> Any suggestions on how to maintain state of a variable? >>>> >>>> >>>> Regards, >>>> Ravikumar >>>> >>> >>> >> >