Thanks Fabian, that makes a lot of sense :)

Best,
Flavio
On Wed, Jan 21, 2015 at 2:41 PM, Fabian Hueske <[email protected]> wrote:

> The program is compiled when the ExecutionEnvironment.execute() method is
> called. At that moment, the ExecutionEnvironment collects all data sources
> that were previously created and traverses them towards connected data
> sinks. All sinks that are found this way are remembered and treated as
> execution targets. The sinks and all connected operators and data sources
> are given to the optimizer, which analyses the plan, compiles an execution
> plan, and submits it to the execution system which the
> ExecutionEnvironment refers to (local, remote, ...).
>
> Therefore, your code can build arbitrary data flows with as many sources as
> you like. Once you call ExecutionEnvironment.execute(), all data sources and
> operators which are required to compute the result of all data sinks are
> executed.
>
>
> 2015-01-21 14:26 GMT+01:00 Flavio Pompermaier <[email protected]>:
>
>> Great! Could you explain to me a little bit the internals of how and when
>> Flink will generate the plan, and how the execution environment is involved
>> in this phase? Just to better understand this step!
>>
>> Thanks again,
>> Flavio
>>
>>
>> On Wed, Jan 21, 2015 at 2:14 PM, Till Rohrmann <[email protected]> wrote:
>>
>>> Yes, this will also work. You only have to make sure that the list of
>>> data sets is processed properly later on in your code.
>>>
>>> On Wed, Jan 21, 2015 at 2:09 PM, Flavio Pompermaier <[email protected]> wrote:
>>>
>>>> Hi Till,
>>>> thanks for the reply. However, my problem is that I'll have something
>>>> like:
>>>>
>>>> List<DataSet<ElementType>> getInput(String[] args, ExecutionEnvironment env) { .... }
>>>>
>>>> So I don't know in advance how many of them I'll have at runtime. Does
>>>> it still work?
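[Editorial aside: Fabian's description of lazy plan construction can be illustrated with a small plain-Java sketch. This is not the real Flink API or its internals, just the idea: creating sources and operators only records a graph, and execute() traverses it from the sinks back through the connected operators, so unconnected sources never run.]

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of lazy plan building: creating sources/operators records a
// graph; nothing runs until execute() walks it back from the sinks.
public class LazyPlanSketch {
    static class Node {
        final String name;
        final Node input; // upstream operator, null for sources
        Node(String name, Node input) { this.name = name; this.input = input; }
    }

    final List<Node> sinks = new ArrayList<>();

    Node source(String name)        { return new Node(name, null); }
    Node map(Node in, String name)  { return new Node(name, in); }
    void sink(Node in, String name) { sinks.add(new Node(name, in)); }

    // "execute": collect every operator reachable from a sink, upstream first.
    List<String> execute() {
        List<String> plan = new ArrayList<>();
        for (Node sink : sinks) {
            collect(sink, plan);
        }
        return plan;
    }

    private void collect(Node n, List<String> plan) {
        if (n.input != null) collect(n.input, plan);
        plan.add(n.name);
    }
}
```

Under this model, Flavio's variable-length `List` of data sets is unproblematic: any number of sources can be created in a loop, and only those connected to a sink end up in the executed plan.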
>>>> On Wed, Jan 21, 2015 at 1:55 PM, Till Rohrmann <[email protected]> wrote:
>>>>
>>>>> Hi Flavio,
>>>>>
>>>>> if your question was whether you can write a Flink job which can read
>>>>> input from different sources, depending on the user input, then the answer
>>>>> is yes. The Flink job plans are actually generated at runtime, so you
>>>>> can easily write a method which generates a user-dependent input/data set.
>>>>>
>>>>> You could do something like this:
>>>>>
>>>>> DataSet<ElementType> getInput(String[] args, ExecutionEnvironment env) {
>>>>>     if (args[0].equals("csv")) {
>>>>>         return env.readCsvFile(...);
>>>>>     } else {
>>>>>         return env.createInput(new AvroInputFormat<ElementType>(...));
>>>>>     }
>>>>> }
>>>>>
>>>>> as long as the element type of the data set is the same for all
>>>>> possible data sources. I hope that I understood your problem correctly.
>>>>>
>>>>> Greets,
>>>>>
>>>>> Till
>>>>>
>>>>> On Wed, Jan 21, 2015 at 11:45 AM, Flavio Pompermaier <[email protected]> wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I have a big question for you about how Flink handles a job's plan
>>>>>> generation: let's suppose that I want to write a job that takes as input a
>>>>>> description of a set of datasets that I want to work on (for example a
>>>>>> csv file and its path, 2 hbase tables, 1 parquet directory and its path,
>>>>>> etc). From what I know, Flink generates the job's plan at compile time, so I
>>>>>> was wondering whether this is possible right now or not..
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Flavio
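[Editorial aside: note that Till's original snippet compared the argument with `args[0] == csv`; in Java, `==` compares object references, so string dispatch should use `equals()` or a `switch` on the string. A minimal self-contained sketch of the dispatch pattern, with hypothetical format names standing in for the real input formats:]

```java
// Dispatching on a runtime argument (format names are illustrative).
// A switch on String compares by value (it uses equals() internally),
// which is what an input-selection method like getInput() needs.
public class InputDispatch {
    static String chooseFormat(String arg) {
        switch (arg) {
            case "csv":  return "CsvInputFormat";
            case "avro": return "AvroInputFormat";
            default:     return "unknown";
        }
    }
}
```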
