1a. ah. yeah i see how it could work, but i wouldn't count on it in a cluster. you would (most likely) run the sub-job (calculating pi) only on a single node.

1b. different execution environments generally imply different flink programs.

2. sure it does, since it's a normal flink job. yours, on the other hand, doesn't, since the job calculating PI only runs on a single TaskManager.

3. there are 2 ways. you can either chain jobs like this (effectively running 2 flink programs in succession):

public static void main(String[] args) throws Exception {
    double pi = new classPI().compute();
    System.out.println("We estimate Pi to be: " + pi);
    new classThatNeedsPI().computeWhatever(pi); // feeds pi into an env.fromElements call and proceeds from there
}
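for this first (chained) variant, here is a minimal sketch of what the second program could look like. classThatNeedsPI and computeWhatever are just the placeholder names from above, not existing classes, and the appended transformations are only an example:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class classThatNeedsPI {

    public void computeWhatever(double pi) throws Exception {
        // a second, independent Flink program that starts from the already computed value
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Double> piSet = env.fromElements(pi);
        // ...append whatever transformations you need to piSet...
        piSet.print(); // print() triggers execution of this second job
    }
}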

or (if all building blocks are flink programs) build a single job:

public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Double> pi = new classPI(env).compute();
    new classThatNeedsPI(env).computeWhatever(pi); // append your transformations to pi
    env.execute();
}

...

public DataSet<Double> compute() throws Exception {
    return this.env.generateSequence(1, NumIter)
                   .map(new Sampler())
                   .reduce(new SumReducer())
                   .map(/* return 4 * x */);
}

...

public ? computeWhatever(DataSet<Double> pi) throws Exception { ... }
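and for the single-job variant, one possible way to fill in the placeholders. classThatNeedsPI, computeWhatever, PiMapper and ScaleByPi are made-up names, Sampler and SumReducer are the classes from your original program, and the cross with a sequence plus the output path are only stand-ins for whatever you actually need:

// imports assumed at the top of the enclosing file:
// import org.apache.flink.api.common.functions.CrossFunction;
// import org.apache.flink.api.common.functions.MapFunction;
// import org.apache.flink.api.java.DataSet;
// import org.apache.flink.api.java.ExecutionEnvironment;
// import org.apache.flink.core.fs.FileSystem;

// meant as static nested classes next to Sampler and SumReducer
public static final class classPI {
    private final ExecutionEnvironment env;
    private final long numIter = 1000000;

    public classPI(ExecutionEnvironment env) { this.env = env; }

    // only builds the plan; nothing runs until env.execute() is called in main
    public DataSet<Double> compute() {
        return env.generateSequence(1, numIter)
                  .map(new Sampler())
                  .reduce(new SumReducer())
                  .map(new PiMapper(numIter));
    }
}

// turns the sample count into the pi estimate; static, so it is trivially serializable
public static final class PiMapper implements MapFunction<Long, Double> {
    private final long numIter;
    public PiMapper(long numIter) { this.numIter = numIter; }
    @Override
    public Double map(Long count) { return 4.0 * count / numIter; }
}

public static final class classThatNeedsPI {
    private final ExecutionEnvironment env;

    public classThatNeedsPI(ExecutionEnvironment env) { this.env = env; }

    // appends further transformations to the pi DataSet and ends in a sink,
    // so that env.execute() in main has something to run
    public void computeWhatever(DataSet<Double> pi) {
        env.generateSequence(1, 10)
           .cross(pi)
           .with(new ScaleByPi())
           .writeAsText("/tmp/scaled-by-pi", FileSystem.WriteMode.OVERWRITE);
    }
}

// multiplies every element of the sequence by the (single-element) pi DataSet
public static final class ScaleByPi implements CrossFunction<Long, Double, Double> {
    @Override
    public Double cross(Long value, Double pi) { return value * pi; }
}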


On 07.06.2016 13:35, Ser Kho wrote:
Chesnay:
1a. The code actually works, that is the point.
1b. What restricts a Flink program from having several execution environments?
2. I am not sure that your modification allows for parallelism. Does it?
3. This code is a simple example of writing/organizing large and complicated programs, where the result of this pi needs to be used in other DataSet transformations beyond classPI(). What should be done in this case?
Thanks a lot for the suggestions.


On Tuesday, June 7, 2016 6:15 AM, Chesnay Schepler <ches...@apache.org> wrote:


from what i can tell from your code, you are trying to execute a job within a job. This just doesn't work.

your main method should look like this:

public static void main(String[] args) throws Exception {
    double pi = new classPI().compute();
    System.out.println("We estimate Pi to be: " + pi);
}
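a classPI that fits this main method would keep the only ExecutionEnvironment inside compute() and return a plain double, so there is exactly one Flink job and no nesting. a minimal sketch, reusing the Sampler and SumReducer classes from your program:

public static final class classPI {
    private final int numIter = 1000000;

    // builds and runs a single Flink job, then brings the single result back to the client
    public double compute() throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        long count = env.generateSequence(1, numIter)
                        .map(new Sampler())       // Sampler from your original code
                        .reduce(new SumReducer()) // SumReducer from your original code
                        .collect()
                        .get(0);
        return 4.0 * count / numIter;
    }
}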



On 06.06.2016 21:14, Ser Kho wrote:
The question is how to encapsulate numerous transformations into one object, or maybe a function, in an Apache Flink Java setting. I have tried to investigate this question using an example of Pi calculation (see below). I am wondering whether or not the suggested approach is valid from Flink's point of view. It works on one computer; however, I do not know how it will behave in a cluster setup. The code is given below, and the main idea behind it is as follows:

 1. Create a class, named classPI, whose method compute() does all
    data transformations; see more about it below.

 2. In the main method, create a DataSet as in

    DataSet<classPI> opi = env.fromElements(new classPI());

 3. Create DataSet<Double> PI, which is the output of a map()
    transformation that calls compute() on the classPI object, as in

    DataSet<Double> PI = opi.map(new MapFunction<classPI, Double>() {
        public Double map(classPI objPI) throws Exception {
            return objPI.compute();
        }
    });

 4. Now about classPI:

     * Its constructor instantiates an ExecutionEnvironment, which is
       local to this class, as in

       public classPI() {
           this.NumIter = 1000000;
           env = ExecutionEnvironment.getExecutionEnvironment();
       }

Thus, the code has two ExecutionEnvironment objects: one in main and another in the class classPI.

     * It also has a method compute() that runs all data
       transformations (in this example it is just several lines, but
       potentially it might contain tons of Flink transformations):

       public Double compute() throws Exception {
           DataSet<Long> count = env.generateSequence(1, NumIter)
                                    .map(new Sampler())
                                    .reduce(new SumReducer());
           PI = 4.0 * count.collect().get(0) / NumIter;
           return PI;
       }

The whole code is given below. Again, the question is whether this is a valid approach for encapsulating data transformations into a class in a Flink setup that is supposed to be parallelizable and work on a cluster. Is there a better way to hide the details of data transformations?
Thanks a lot!

-------------------------The code ----------------------

public class PiEstimation {

    public static void main(String[] args) throws Exception {
        // this is one ExecutionEnvironment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // this is the critical DataSet holding my classPI that computes PI
        DataSet<classPI> opi = env.fromElements(new classPI());

        // this map calls the method compute() of class classPI that computes PI
        DataSet<Double> PI = opi.map(new MapFunction<classPI, Double>() {
            public Double map(classPI objPI) throws Exception {
                // this is how I call method compute() that calculates PI using transformations
                return objPI.compute();
            }
        });

        double pi = PI.collect().get(0);
        System.out.println("We estimate Pi to be: " + pi);
    }

    // this class is of no importance for my question, however, it is relevant for pi calculation
    public static class Sampler implements MapFunction<Long, Long> {
        @Override
        public Long map(Long value) {
            double x = Math.random();
            double y = Math.random();
            return (x * x + y * y) < 1 ? 1L : 0L;
        }
    }

    // this class is of no importance for my question, however, it is relevant for pi calculation
    public static final class SumReducer implements ReduceFunction<Long> {
        @Override
        public Long reduce(Long value1, Long value2) {
            return value1 + value2;
        }
    }

    // this is my class that computes PI; my question is whether such a class is valid
    // in Flink on a cluster with parallel computation
    public static final class classPI {
        public Integer NumIter;
        private final ExecutionEnvironment env;
        public Double PI;

        // this is the constructor with another ExecutionEnvironment
        public classPI() {
            this.NumIter = 1000000;
            env = ExecutionEnvironment.getExecutionEnvironment();
        }

        // this is the method that contains all data transformations
        public Double compute() throws Exception {
            DataSet<Long> count = env.generateSequence(1, NumIter)
                                     .map(new Sampler())
                                     .reduce(new SumReducer());
            PI = 4.0 * count.collect().get(0) / NumIter;
            return PI;
        }
    }
}



