Hi Till,
thanks for the reply. However my problem is that I'll have something like:
List<DataSet<ElementType>> getInput(String[] args, ExecutionEnvironment
env) {....}
So I don't know in advance how many of them I'll have at runtime. Does it
still work?
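For illustration, since the plan is built at runtime, the list of inputs can be assembled with an ordinary loop over the arguments. Here is a minimal sketch of that pattern; the argument scheme ("csv:/path", "hbase:table") and the helper name are made up for the example, and plain Strings stand in for DataSet<ElementType> so it runs without Flink on the classpath:

```java
import java.util.ArrayList;
import java.util.List;

public class MultiInputSketch {

    // Hypothetical helper: one input per argument, so the number of data
    // sets is only known at runtime. With Flink, each branch would return
    // env.readCsvFile(...), env.createInput(...), etc. instead of a String.
    static List<String> getInputs(String[] args) {
        List<String> inputs = new ArrayList<>();
        for (String arg : args) {
            // each argument describes one source, e.g. "csv:/data/a.csv"
            String kind = arg.split(":", 2)[0];
            switch (kind) {
                case "csv":
                    inputs.add("readCsvFile(" + arg + ")");
                    break;
                case "hbase":
                    inputs.add("createInput(HBaseInputFormat, " + arg + ")");
                    break;
                default:
                    inputs.add("createInput(AvroInputFormat, " + arg + ")");
                    break;
            }
        }
        return inputs;
    }

    public static void main(String[] args) {
        List<String> in = getInputs(
                new String[] {"csv:/data/a.csv", "hbase:myTable"});
        // two inputs here, but the count comes entirely from the arguments
        System.out.println(in.size());
    }
}
```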
On Wed, Jan 21, 2015 at 1:55 PM, Till Rohrmann <[email protected]> wrote:
> Hi Flavio,
>
> if your question was whether you can write a Flink job which can read
> input from different sources, depending on the user input, then the answer
> is yes. The Flink job plans are actually generated at runtime so that you
> can easily write a method which generates a user dependent input/data set.
>
> You could do something like this:
>
> DataSet<ElementType> getInput(String[] args, ExecutionEnvironment env) {
> if ("csv".equals(args[0])) {
> return env.readCsvFile(...);
> } else {
> return env.createInput(new AvroInputFormat<ElementType>(...));
> }
> }
>
> as long as the element type of the data set is the same for all possible
> data sources. I hope that I understood your problem correctly.
>
> Greets,
>
> Till
>
> On Wed, Jan 21, 2015 at 11:45 AM, Flavio Pompermaier <[email protected]
> > wrote:
>
>> Hi guys,
>>
>> I have a big question for you about how Flink handles a job's plan
>> generation:
>> let's suppose that I want to write a job that takes as input a
>> description of a set of datasets that I want to work on (for example a csv
>> file and its path, 2 hbase tables, 1 parquet directory and its path, etc).
>> From what I know Flink generates the job's plan at compile time, so I was
>> wondering whether this is possible right now or not..
>>
>> Thanks in advance,
>> Flavio
>>
>
>