Re: How to maintain the state of a variable in a map transformation.

2016-06-14 Thread Ravikumar Hawaldar
Hi Maximilian, Thank you for the response. Yeah its possible to break up global state but its very tricky to merge two local state variables and also I have to refactor my algorithm logic. Is there way where I can create a new object every time in reduce function so that I can assign the compute

Re: How to maintain the state of a variable in a map transformation.

2016-06-13 Thread Maximilian Michels
Hi Ravikumar, In short: No, you can't use closures to maintain a global state. If you want to keep an always global state, you'll have to use parallelism 1 or an external data store to keep that global state. Is it possible to break up your global state into a set of local states which can be com

Re: How to maintain the state of a variable in a map transformation.

2016-06-09 Thread Ravikumar Hawaldar
Hi Fabian, Thank you for your help. I want my Flink application to be distributed as well as I want the facility to support the update of the variable [Coefficients of LinearRegression]. How you would do in that case? The problem with iteration is that it expects Dataset with same type to be fe

Re: How to maintain the state of a variable in a map transformation.

2016-06-09 Thread Fabian Hueske
Hi, 1) Yes, that is correct. If you set the parallelism of an operator to 1 it is only executed on a single node. It depends on your application, if you need a global state or whether multiple local states are OK. 2) Flink programs follow the concept a data flow. There is no communication between

Re: How to maintain the state of a variable in a map transformation.

2016-06-09 Thread Ravikumar Hawaldar
Hi Fabian, Thank you for your answers, 1) If there is only single instance of that function, then it will defeat the purpose of distributed correct me if I am wrong, so If I run parallelism with 1 on cluster does that mean it will execute on only one node? 2) I mean to say, when a map operator re

Re: How to maintain the state of a variable in a map transformation.

2016-06-09 Thread Fabian Hueske
Hi Ravikumar, I'll try to answer your questions: 1) If you set the parallelism of a map function to 1, there will be only a single instance of that function regardless whether it is execution locally or remotely in a cluster. 2) Flink does also support aggregations, (reduce, groupReduce, combine,

Re: How to maintain the state of a variable in a map transformation.

2016-06-09 Thread Ravikumar Hawaldar
Hi Till, Thank you for your answer, I have couple of questions 1) Setting parallelism on a single map function in local is fine but on distributed will it work as local execution? 2) Is there any other way apart from setting parallelism? Like spark aggregate function? 3) Is it necessary that aft

Re: How to maintain the state of a variable in a map transformation.

2016-06-08 Thread Till Rohrmann
Hi Ravikumar, Flink's operators are stateful. So you can simply create a variable in your mapper to keep the state around. But every mapper instance will have it's own state. This state is determined by the records which are sent to this mapper instance. If you need a global state, then you have t

How to maintain the state of a variable in a map transformation.

2016-06-08 Thread Ravikumar Hawaldar
Hello, I have an DataSet which is roughly a record in a DataSet Or a file. Now I am using map transformation on this DataSet to compute a variable (coefficients of linear regression parameters and data structure used is a double[]). Now the issue is that, per record the variable will get updated