There is a common misunderstanding between the "compile" phase of the Java/Scala compiler (which does not generate the Flink plan) and the Flink "compile/optimize" phase (happening when calling env.execute()).
The Flink compile/optimize phase is not a compile phase in the sense that source code is parsed and translated to byte code. It only is a set of transformations on the program's data flow We should probably stop calling the Flink phase "compile", but simply "pre-flight" or "optimize" or "prepare". Otherwise, it creates frequent confusions... On Wed, Jan 21, 2015 at 6:05 AM, Flavio Pompermaier <[email protected]> wrote: > Thanks Fabian, that makes a lot of sense :) > > Best, > Flavio > > On Wed, Jan 21, 2015 at 2:41 PM, Fabian Hueske <[email protected]> wrote: > >> The program is compiled when the ExecutionEnvironment.execute() method is >> called. At that moment, theEexecutionEnvironment collects all data sources >> that were previously created and traverses them towards connected data >> sinks. All sinks that are found this way are remembered and treated as >> execution targets. The sinks and all connected operators and data sources >> are given to the optimizer which analyses the plan, compiles an execution >> plan, and submits the plan to the execution system which the >> ExecutionEnvironment refers to (local, remote, ...). >> >> Therefore, your code can build arbitrary data flows with as many source >> as you like. Once you call ExecutionEnvironment.execute() all data sources >> and operators which are required to compute the result of all data sinks >> are executed. >> >> >> 2015-01-21 14:26 GMT+01:00 Flavio Pompermaier <[email protected]>: >> >>> Great! Could you explain me a little bit the internals of how and when >>> Flink will generate the plan and how the execution environment is involved >>> in this phase? >>> Just to better understand this step! >>> >>> Thanks again, >>> Flavio >>> >>> >>> On Wed, Jan 21, 2015 at 2:14 PM, Till Rohrmann <[email protected]> >>> wrote: >>> >>>> Yes this will also work. You only have to make sure that the list of >>>> data sets is processed properly later on in your code. >>>> >>>> On Wed, Jan 21, 2015 at 2:09 PM, Flavio Pompermaier < >>>> [email protected]> wrote: >>>> >>>>> Hi Till, >>>>> thanks for the reply. However my problem is that I'll have something >>>>> like: >>>>> >>>>> List<Dataset<<ElementType>> getInput(String[] args, >>>>> ExecutionEnvironment env) {....} >>>>> >>>>> So I don't know in advance how many of them I'll have at runtime. Does >>>>> it still work? >>>>> >>>>> On Wed, Jan 21, 2015 at 1:55 PM, Till Rohrmann <[email protected]> >>>>> wrote: >>>>> >>>>>> Hi Flavio, >>>>>> >>>>>> if your question was whether you can write a Flink job which can read >>>>>> input from different sources, depending on the user input, then the >>>>>> answer >>>>>> is yes. The Flink job plans are actually generated at runtime so that you >>>>>> can easily write a method which generates a user dependent input/data >>>>>> set. >>>>>> >>>>>> You could do something like this: >>>>>> >>>>>> DataSet<ElementType> getInput(String[] args, ExecutionEnvironment >>>>>> env) { >>>>>> if(args[0] == csv) { >>>>>> return env.readCsvFile(...); >>>>>> } else { >>>>>> return env.createInput(new AvroInputFormat<ElementType>(...)); >>>>>> } >>>>>> } >>>>>> >>>>>> as long as the element type of the data set are all equal for all >>>>>> possible data sources. I hope that I understood your problem correctly. >>>>>> >>>>>> Greets, >>>>>> >>>>>> Till >>>>>> >>>>>> On Wed, Jan 21, 2015 at 11:45 AM, Flavio Pompermaier < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> Hi guys, >>>>>>> >>>>>>> I have a big question for you about how Fling handles job's plan >>>>>>> generation: >>>>>>> let's suppose that I want to write a job that takes as input a >>>>>>> description of a set of datasets that I want to work on (for example a >>>>>>> csv >>>>>>> file and its path, 2 hbase tables, 1 parquet directory and its path, >>>>>>> etc). >>>>>>> From what I know Flink generates the job's plan at compile time, so >>>>>>> I was wondering whether this is possible right now or not.. >>>>>>> >>>>>>> Thanks in advance, >>>>>>> Flavio >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>> >
