How about renaming "flink-compiler" to "flink-optimizer"?
On Wed, Jan 21, 2015 at 8:21 PM, Stephan Ewen <[email protected]> wrote:

> There is a common misunderstanding between the "compile" phase of the
> Java/Scala compiler (which does not generate the Flink plan) and the Flink
> "compile/optimize" phase (which happens when env.execute() is called).
>
> The Flink compile/optimize phase is not a compile phase in the sense that
> source code is parsed and translated to byte code. It is only a set of
> transformations on the program's data flow.
>
> We should probably stop calling the Flink phase "compile" and simply say
> "pre-flight", "optimize", or "prepare". Otherwise it creates frequent
> confusion...
>
> On Wed, Jan 21, 2015 at 6:05 AM, Flavio Pompermaier <[email protected]> wrote:
>
>> Thanks Fabian, that makes a lot of sense :)
>>
>> Best,
>> Flavio
>>
>> On Wed, Jan 21, 2015 at 2:41 PM, Fabian Hueske <[email protected]> wrote:
>>
>>> The program is compiled when the ExecutionEnvironment.execute() method
>>> is called. At that moment, the ExecutionEnvironment collects all data
>>> sources that were previously created and traverses them towards connected
>>> data sinks. All sinks that are found this way are remembered and treated as
>>> execution targets. The sinks and all connected operators and data sources
>>> are given to the optimizer, which analyzes the plan, compiles an execution
>>> plan, and submits it to the execution system that the
>>> ExecutionEnvironment refers to (local, remote, ...).
>>>
>>> Therefore, your code can build arbitrary data flows with as many sources
>>> as you like. Once you call ExecutionEnvironment.execute(), all data sources
>>> and operators that are required to compute the result of all data sinks
>>> are executed.
>>>
>>> 2015-01-21 14:26 GMT+01:00 Flavio Pompermaier <[email protected]>:
>>>
>>>> Great! Could you explain to me a little bit the internals of how and when
>>>> Flink generates the plan, and how the execution environment is involved
>>>> in this phase?
>>>> Just to better understand this step!
>>>>
>>>> Thanks again,
>>>> Flavio
>>>>
>>>> On Wed, Jan 21, 2015 at 2:14 PM, Till Rohrmann <[email protected]> wrote:
>>>>
>>>>> Yes, this will also work. You only have to make sure that the list of
>>>>> data sets is processed properly later on in your code.
>>>>>
>>>>> On Wed, Jan 21, 2015 at 2:09 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>>
>>>>>> Hi Till,
>>>>>> thanks for the reply. However, my problem is that I'll have something
>>>>>> like:
>>>>>>
>>>>>> List<DataSet<ElementType>> getInput(String[] args,
>>>>>>     ExecutionEnvironment env) { ... }
>>>>>>
>>>>>> So I don't know in advance how many of them I'll have at runtime.
>>>>>> Does it still work?
>>>>>>
>>>>>> On Wed, Jan 21, 2015 at 1:55 PM, Till Rohrmann <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Flavio,
>>>>>>>
>>>>>>> if your question was whether you can write a Flink job which can
>>>>>>> read input from different sources depending on the user input, then the
>>>>>>> answer is yes. The Flink job plans are actually generated at runtime, so
>>>>>>> you can easily write a method which generates a user-dependent
>>>>>>> input/data set.
>>>>>>>
>>>>>>> You could do something like this:
>>>>>>>
>>>>>>> DataSet<ElementType> getInput(String[] args, ExecutionEnvironment env) {
>>>>>>>     if ("csv".equals(args[0])) {
>>>>>>>         return env.readCsvFile(...);
>>>>>>>     } else {
>>>>>>>         return env.createInput(new AvroInputFormat<ElementType>(...));
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> as long as the element type of the data set is the same for all
>>>>>>> possible data sources. I hope that I understood your problem correctly.
>>>>>>>
>>>>>>> Greets,
>>>>>>>
>>>>>>> Till
>>>>>>>
>>>>>>> On Wed, Jan 21, 2015 at 11:45 AM, Flavio Pompermaier <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi guys,
>>>>>>>>
>>>>>>>> I have a big question for you about how Flink handles job plan
>>>>>>>> generation:
>>>>>>>> let's suppose that I want to write a job that takes as input a
>>>>>>>> description of a set of datasets that I want to work on (for example a
>>>>>>>> CSV file and its path, 2 HBase tables, 1 Parquet directory and its path,
>>>>>>>> etc.).
>>>>>>>> From what I know, Flink generates the job's plan at compile time, so
>>>>>>>> I was wondering whether this is possible right now or not.
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>> Flavio
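The lazy, sink-driven plan construction Fabian describes above can be modeled with a small self-contained sketch. This is plain Java with no Flink dependency; `MiniEnvironment`, `Node`, and all other names here are hypothetical stand-ins, not Flink's actual API. The point is only the mechanism: operators merely record a dataflow graph, and nothing runs until execute() traverses the graph backwards from the registered sinks, which is also why the number of sources can be decided freely at runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of lazy plan construction: operator calls only record
// the dataflow graph; execute() walks it from the sinks back to the sources.
class LazyPlanSketch {

    // A node in the dataflow graph (source, operator, or sink).
    static class Node {
        final String name;
        final Node input; // null for sources
        Node(String name, Node input) { this.name = name; this.input = input; }
    }

    static class MiniEnvironment {
        private final List<Node> sinks = new ArrayList<>();

        Node createSource(String name)        { return new Node(name, null); }
        Node map(Node input, String name)     { return new Node(name, input); }
        void addSink(Node input, String name) { sinks.add(new Node(name, input)); }

        // "Compile" and run only at this point: traverse from every sink
        // towards its sources and record the execution order.
        List<String> execute() {
            List<String> executed = new ArrayList<>();
            for (Node sink : sinks) {
                collect(sink, executed);
            }
            return executed;
        }

        private void collect(Node node, List<String> out) {
            if (node.input != null) collect(node.input, out); // sources first
            out.add(node.name);
        }
    }

    public static void main(String[] args) {
        MiniEnvironment env = new MiniEnvironment();

        // The number of sources is decided at runtime, as in the
        // List<DataSet<ElementType>> scenario from the thread.
        int numSources = 2;
        List<Node> inputs = new ArrayList<>();
        for (int i = 0; i < numSources; i++) {
            inputs.add(env.createSource("source-" + i));
        }

        for (Node in : inputs) {
            env.addSink(env.map(in, "map"), "sink");
        }

        // Only nodes reachable from a sink take part in the execution.
        System.out.println(env.execute());
        // prints [source-0, map, sink, source-1, map, sink]
    }
}
```

A source that is created but never connected to a sink simply never appears in the traversal, mirroring the behavior described in the thread.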
