Absolutely correct and a good idea! Why not call it a "digester", taking the term from chemistry/medicine:
Chemistry: A vessel ("pipeline") in which substances are softened or decomposed, usually for further processing.

All the best!

___________________________________
Dr.-Ing. Plamen L. Simeonov
Department 1: Geodäsie und Fernerkundung
Sektion 1.5: Geoinformatik
Tel.: +49 (0)331/288-1587
Fax: +49 (0)331/288-1732
email: [email protected]
http://www.gfz-potsdam.de/
___________________________________
Helmholtz-Zentrum Potsdam
Deutsches GeoForschungsZentrum - GFZ
Stiftung des öff. Rechts Land Brandenburg
Telegrafenberg A 20, 14473 Potsdam
**************************************************

> On 21 Jan 2015, at 20:21, Stephan Ewen <[email protected]> wrote:
>
> There is a common misunderstanding between the "compile" phase of the
> Java/Scala compiler (which does not generate the Flink plan) and the Flink
> "compile/optimize" phase (which happens when calling env.execute()).
>
> The Flink compile/optimize phase is not a compile phase in the sense that
> source code is parsed and translated to byte code. It is only a set of
> transformations on the program's data flow.
>
> We should probably stop calling the Flink phase "compile" and simply call it
> "pre-flight", "optimize", or "prepare". Otherwise, it creates frequent
> confusion...
>
> On Wed, Jan 21, 2015 at 6:05 AM, Flavio Pompermaier <[email protected]> wrote:
>
> Thanks Fabian, that makes a lot of sense :)
>
> Best,
> Flavio
>
> On Wed, Jan 21, 2015 at 2:41 PM, Fabian Hueske <[email protected]> wrote:
>
> The program is compiled when the ExecutionEnvironment.execute() method is
> called. At that moment, the ExecutionEnvironment collects all data sources
> that were previously created and traverses them towards connected data sinks.
> All sinks that are found this way are remembered and treated as execution
> targets. The sinks and all connected operators and data sources are given to
> the optimizer, which analyses the plan, compiles an execution plan, and
> submits the plan to the execution system which the ExecutionEnvironment
> refers to (local, remote, ...).
>
> Therefore, your code can build arbitrary data flows with as many sources as
> you like. Once you call ExecutionEnvironment.execute(), all data sources and
> operators which are required to compute the result of all data sinks are
> executed.
>
> 2015-01-21 14:26 GMT+01:00 Flavio Pompermaier <[email protected]>:
>
> Great! Could you explain to me a little bit the internals of how and when Flink
> generates the plan and how the execution environment is involved in this
> phase? Just to better understand this step!
>
> Thanks again,
> Flavio
>
> On Wed, Jan 21, 2015 at 2:14 PM, Till Rohrmann <[email protected]> wrote:
>
> Yes, this will also work. You only have to make sure that the list of data
> sets is processed properly later on in your code.
>
> On Wed, Jan 21, 2015 at 2:09 PM, Flavio Pompermaier <[email protected]> wrote:
>
> Hi Till,
> thanks for the reply. However, my problem is that I'll have something like:
>
> List<DataSet<ElementType>> getInput(String[] args, ExecutionEnvironment env) { ... }
>
> So I don't know in advance how many of them I'll have at runtime. Does it
> still work?
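A minimal sketch of the pattern Till confirms above: build a list of data sets whose size is only known at runtime, then attach sinks to them before calling execute(). The getInputs helper, the output path, and the union step are invented for illustration and are not from the thread:

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class RuntimeSourcesJob {

    // Hypothetical helper: one DataSet per input path passed on the command line.
    // The number of sources is only known at runtime.
    static List<DataSet<String>> getInputs(String[] args, ExecutionEnvironment env) {
        List<DataSet<String>> inputs = new ArrayList<>();
        for (String path : args) {
            inputs.add(env.readTextFile(path));
        }
        return inputs;
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        List<DataSet<String>> inputs = getInputs(args, env);

        // "Process the list properly later on": here we simply union all inputs
        // into one DataSet and attach a sink to it.
        DataSet<String> all = inputs.get(0);
        for (int i = 1; i < inputs.size(); i++) {
            all = all.union(inputs.get(i));
        }
        all.writeAsText("/tmp/union-output");

        // Only now is the plan assembled and submitted.
        env.execute("runtime-determined sources");
    }
}

Only the data sets that end up connected to a sink (here, all of them via the union) are executed when execute() is called.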
> On Wed, Jan 21, 2015 at 1:55 PM, Till Rohrmann <[email protected]> wrote:
>
> Hi Flavio,
>
> if your question was whether you can write a Flink job which can read input
> from different sources depending on the user input, then the answer is yes.
> The Flink job plans are actually generated at runtime, so you can easily
> write a method which generates a user-dependent input/data set.
>
> You could do something like this:
>
> DataSet<ElementType> getInput(String[] args, ExecutionEnvironment env) {
>     if (args[0].equals("csv")) {
>         return env.readCsvFile(...);
>     } else {
>         return env.createInput(new AvroInputFormat<ElementType>(...));
>     }
> }
>
> as long as the element type of the data set is the same for all possible
> data sources. I hope that I understood your problem correctly.
>
> Greets,
>
> Till
>
> On Wed, Jan 21, 2015 at 11:45 AM, Flavio Pompermaier <[email protected]> wrote:
>
> Hi guys,
>
> I have a big question for you about how Flink handles a job's plan generation:
> let's suppose that I want to write a job that takes as input a description of
> a set of datasets that I want to work on (for example a CSV file and its
> path, 2 HBase tables, 1 Parquet directory and its path, etc.).
> From what I know, Flink generates the job's plan at compile time, so I was
> wondering whether this is possible right now or not.
>
> Thanks in advance,
> Flavio
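To make the "pre-flight" point from earlier in the thread concrete, here is a small sketch (class name, output path, and job name are made up): defining sources and transformations only declares the data flow, and the plan is assembled and submitted to the execution system when env.execute() is called. A DataSet that is not connected to any sink never becomes part of the plan:

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class PreFlightExample {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Defining sources and transformations builds the data flow only;
        // nothing is read or computed at this point.
        DataSet<Long> used = env.generateSequence(1, 1000)
                .filter(new FilterFunction<Long>() {
                    @Override
                    public boolean filter(Long value) {
                        return value % 2 == 0;
                    }
                });

        DataSet<Long> unused = env.generateSequence(1, 1000)
                .map(new MapFunction<Long, Long>() {
                    @Override
                    public Long map(Long value) {
                        return value * 10;
                    }
                });

        // Only 'used' is connected to a sink.
        used.writeAsText("/tmp/even-numbers");

        // execute() triggers the pre-flight/optimize phase: the environment
        // traverses from the registered sink back to its sources, builds and
        // optimizes the execution plan, and submits it. 'unused' has no sink,
        // so it is not part of the plan and never runs.
        env.execute("pre-flight example");
    }
}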
