But it does not only optimize the data flow. It also translates it into a different representation.
On Thu, Jan 22, 2015 at 3:34 PM, Robert Metzger <[email protected]> wrote:

> How about renaming "flink-compiler" to "flink-optimizer"?
>
> On Wed, Jan 21, 2015 at 8:21 PM, Stephan Ewen <[email protected]> wrote:
>
>> There is a common misunderstanding between the "compile" phase of the
>> Java/Scala compiler (which does not generate the Flink plan) and the
>> Flink "compile/optimize" phase (which happens when calling
>> env.execute()).
>>
>> The Flink compile/optimize phase is not a compile phase in the sense
>> that source code is parsed and translated to byte code. It is only a set
>> of transformations on the program's data flow.
>>
>> We should probably stop calling the Flink phase "compile" and simply say
>> "pre-flight", "optimize", or "prepare". Otherwise, it creates frequent
>> confusion...
>>
>> On Wed, Jan 21, 2015 at 6:05 AM, Flavio Pompermaier <[email protected]> wrote:
>>
>>> Thanks Fabian, that makes a lot of sense :)
>>>
>>> Best,
>>> Flavio
>>>
>>> On Wed, Jan 21, 2015 at 2:41 PM, Fabian Hueske <[email protected]> wrote:
>>>
>>>> The program is compiled when the ExecutionEnvironment.execute() method
>>>> is called. At that moment, the ExecutionEnvironment collects all data
>>>> sources that were previously created and traverses them towards
>>>> connected data sinks. All sinks that are found this way are remembered
>>>> and treated as execution targets. The sinks and all connected
>>>> operators and data sources are given to the optimizer, which analyzes
>>>> the plan, compiles an execution plan, and submits it to the execution
>>>> system that the ExecutionEnvironment refers to (local, remote, ...).
>>>>
>>>> Therefore, your code can build arbitrary data flows with as many
>>>> sources as you like. Once you call ExecutionEnvironment.execute(), all
>>>> data sources and operators that are required to compute the result of
>>>> all data sinks are executed.
>>>>
>>>> 2015-01-21 14:26 GMT+01:00 Flavio Pompermaier <[email protected]>:
>>>>
>>>>> Great! Could you explain to me a little bit the internals of how and
>>>>> when Flink generates the plan and how the execution environment is
>>>>> involved in this phase? Just to better understand this step!
>>>>>
>>>>> Thanks again,
>>>>> Flavio
>>>>>
>>>>> On Wed, Jan 21, 2015 at 2:14 PM, Till Rohrmann <[email protected]> wrote:
>>>>>
>>>>>> Yes, this will also work. You only have to make sure that the list
>>>>>> of data sets is processed properly later on in your code.
>>>>>>
>>>>>> On Wed, Jan 21, 2015 at 2:09 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Till,
>>>>>>> thanks for the reply. However, my problem is that I'll have
>>>>>>> something like:
>>>>>>>
>>>>>>> List<DataSet<ElementType>> getInput(String[] args,
>>>>>>>     ExecutionEnvironment env) {....}
>>>>>>>
>>>>>>> So I don't know in advance how many of them I'll have at runtime.
>>>>>>> Does it still work?
>>>>>>>
>>>>>>> On Wed, Jan 21, 2015 at 1:55 PM, Till Rohrmann <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Flavio,
>>>>>>>>
>>>>>>>> if your question was whether you can write a Flink job which can
>>>>>>>> read input from different sources depending on the user input,
>>>>>>>> then the answer is yes. The Flink job plans are actually generated
>>>>>>>> at runtime, so you can easily write a method which generates a
>>>>>>>> user-dependent input/data set.
>>>>>>>>
>>>>>>>> You could do something like this:
>>>>>>>>
>>>>>>>> DataSet<ElementType> getInput(String[] args, ExecutionEnvironment env) {
>>>>>>>>     if (args[0].equals("csv")) {
>>>>>>>>         return env.readCsvFile(...);
>>>>>>>>     } else {
>>>>>>>>         return env.createInput(new AvroInputFormat<ElementType>(...));
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>> as long as the element type of the data set is the same for all
>>>>>>>> possible data sources.
>>>>>>>> I hope that I understood your problem correctly.
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>>
>>>>>>>> Till
>>>>>>>>
>>>>>>>> On Wed, Jan 21, 2015 at 11:45 AM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi guys,
>>>>>>>>>
>>>>>>>>> I have a big question for you about how Flink handles a job's
>>>>>>>>> plan generation: let's suppose that I want to write a job that
>>>>>>>>> takes as input a description of a set of datasets that I want to
>>>>>>>>> work on (for example a CSV file and its path, 2 HBase tables,
>>>>>>>>> 1 Parquet directory and its path, etc.).
>>>>>>>>> From what I know, Flink generates the job's plan at compile time,
>>>>>>>>> so I was wondering whether this is possible right now or not..
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>> Flavio
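[Editor's note] Till's pattern above, picking the input source at runtime and returning a variable number of data sets, can be sketched with plain Java collections standing in for Flink's DataSet. This is a hypothetical simplification, not Flink code: in the real job each branch would call env.readCsvFile(...) or env.createInput(new AvroInputFormat<>(...)) instead of building a List.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: plain Java Lists stand in for Flink's
// DataSet<ElementType>; the string "records" are placeholder data.
public class RuntimeInputs {

    // One "dataset" per user-supplied format argument; the count is
    // only known at runtime, which is exactly Flavio's situation.
    static List<List<String>> getInputs(String[] args) {
        List<List<String>> datasets = new ArrayList<>();
        for (String format : args) {
            if (format.equals("csv")) {
                // Real job: datasets.add(env.readCsvFile(...));
                datasets.add(Arrays.asList("csv-row-1", "csv-row-2"));
            } else {
                // Real job: datasets.add(env.createInput(...));
                datasets.add(Arrays.asList(format + "-record-1"));
            }
        }
        return datasets;
    }

    public static void main(String[] args) {
        // Later code must handle however many datasets came back,
        // e.g. by folding them into one union before executing.
        List<List<String>> inputs =
            getInputs(new String[]{"csv", "avro", "parquet"});
        System.out.println(inputs.size());
    }
}
```

As Till notes, the only constraint is that all branches produce the same element type, so the caller can process the list uniformly.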

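[Editor's note] Fabian's description of what happens inside execute(), collecting the registered sinks and walking backwards to every connected operator and source, amounts to a graph reachability traversal. A minimal sketch with hypothetical Node classes (not Flink's actual internal types):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "pre-flight" traversal Fabian describes:
// start from the sinks and collect everything needed to compute them.
public class PlanTraversal {

    static class Node {
        final String name;
        final List<Node> inputs;
        Node(String name, Node... inputs) {
            this.name = name;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // Collect every operator/source reachable from the given sinks.
    static Set<String> reachableFromSinks(List<Node> sinks) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<Node> stack = new ArrayDeque<>(sinks);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (seen.add(n.name)) {
                for (Node input : n.inputs) stack.push(input);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node source1 = new Node("source1");
        Node source2 = new Node("source2"); // not connected to any sink
        Node map = new Node("map", source1);
        Node sink = new Node("sink", map);
        // Only operators feeding a sink become part of the execution plan;
        // source2 is never visited and thus never executed.
        System.out.println(reachableFromSinks(Arrays.asList(sink)));
    }
}
```

This mirrors the point in the thread: a source that is not (transitively) connected to any sink is simply not part of the plan handed to the optimizer.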