Re: Using Hadoop Input/Output formats

Suneel Marthi Tue, 24 Nov 2015 11:41:36 -0800

I guess it makes sense to add readHadoopXXX() methods to
StreamExecutionEnvironment (for feature parity with what presently exists
in ExecutionEnvironment).
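For comparison, the DataSet "sugar" referred to here is, in Flink 0.10, a `readHadoopFile` method on the Scala `ExecutionEnvironment`, as shown in the 0.10 Hadoop compatibility guide. The sketch below assumes Flink 0.10 with `flink-scala` and a Hadoop client on the classpath, and uses a placeholder input path. These helpers have moved in later Flink releases, so check the docs for your version.

```scala
import org.apache.flink.api.scala._
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

object ReadHadoopFileBatch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Placeholder input path; point this at a real file or directory.
    val textPath = "hdfs:///path/to/input"

    // DataSet sugar (Flink 0.10 Scala API): the input path is passed directly,
    // so no JobConf has to be created in client code.
    val input: DataSet[(LongWritable, Text)] =
      env.readHadoopFile(new TextInputFormat, classOf[LongWritable], classOf[Text], textPath)

    // Drop the byte-offset keys and keep only the line contents.
    val lines: DataSet[String] = input.map(_._2.toString)

    // print() triggers execution of the batch program.
    lines.print()
  }
}
```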


Also, FLINK-2949 addresses the need to add relevant syntactic-sugar wrappers
to the DataSet API for the code snippet in Fabian's previous email. It's not
cool to have to instantiate a JobConf in client code and pass it
around.
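A minimal sketch of what such a wrapper could look like on the streaming side. `HadoopStreamSugar`, `textInputFormat` and `readHadoopTextFile` are hypothetical names, not part of Flink. The sketch only hides the JobConf behind a helper and an implicit class over `StreamExecutionEnvironment`, reusing the `createInput` + `HadoopInputFormat` call from Fabian's snippet below, and assumes the Flink 0.10 Scala APIs with the Hadoop `mapred` `TextInputFormat`.

```scala
import org.apache.flink.api.scala.hadoop.mapred.HadoopInputFormat
import org.apache.flink.streaming.api.scala._
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}

// Hypothetical helper, NOT part of Flink: keeps JobConf out of client code.
object HadoopStreamSugar {

  /** Wraps a mapred TextInputFormat for `path`, creating the JobConf internally. */
  def textInputFormat(path: String): HadoopInputFormat[LongWritable, Text] = {
    val jobConf = new JobConf()
    FileInputFormat.addInputPath(jobConf, new Path(path))
    new HadoopInputFormat[LongWritable, Text](
      new TextInputFormat, classOf[LongWritable], classOf[Text], jobConf)
  }

  /** Hypothetical extension adding readHadoopTextFile to StreamExecutionEnvironment. */
  implicit class HadoopStreamOps(env: StreamExecutionEnvironment) {
    def readHadoopTextFile(path: String): DataStream[(LongWritable, Text)] =
      env.createInput(textInputFormat(path))
  }
}
```

With `import HadoopStreamSugar._` in scope, client code could call `env.readHadoopTextFile("hdfs:///path/to/input")`. The trade-off is that callers can no longer tune the JobConf, so a real API would likely also want an overload that accepts one.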



On Tue, Nov 24, 2015 at 2:26 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Nick,
>
> you can also use Flink's HadoopInputFormat wrappers with the DataStream
> API. However, DataStream does not offer as much "sugar" as DataSet, because
> StreamExecutionEnvironment does not provide dedicated createHadoopInput or
> readHadoopFile methods.
>
> With the Scala DataStream API, you can read from a Hadoop InputFormat
> (TextInputFormat in this case) as follows:
>
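> import org.apache.flink.api.scala.hadoop.mapred.HadoopInputFormat
> import org.apache.flink.streaming.api.scala._
> import org.apache.hadoop.fs.Path
> import org.apache.hadoop.io.{LongWritable, Text}
> import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}
>
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> val jobConf = new JobConf()
> FileInputFormat.addInputPath(jobConf, new Path("hdfs:///path/to/input")) // placeholder path
> val textData: DataStream[(LongWritable, Text)] = env.createInput(
>   new HadoopInputFormat[LongWritable, Text](
>     new TextInputFormat, classOf[LongWritable], classOf[Text], jobConf))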
>
> The Java version is very similar.
>
> Note: Flink has wrappers for both Hadoop MapReduce APIs, mapred and mapreduce.
>
> Cheers,
> Fabian
>
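Since both MapReduce APIs are covered, here is a sketch of the same streaming read using the newer `mapreduce` API. It assumes the Scala wrapper `org.apache.flink.api.scala.hadoop.mapreduce.HadoopInputFormat` and a Hadoop 2 client (for `Job.getInstance()`); the input path is a placeholder. The main difference from the `mapred` version is that the configuration is carried by a `Job` rather than a `JobConf`.

```scala
import org.apache.flink.api.scala.hadoop.mapreduce.HadoopInputFormat
import org.apache.flink.streaming.api.scala._
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}

object MapreduceTextStream {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // The mapreduce API configures inputs through a Job instead of a JobConf.
    val job = Job.getInstance()
    // Placeholder input path; point this at a real file or directory.
    FileInputFormat.addInputPath(job, new Path("hdfs:///path/to/input"))

    val textData: DataStream[(LongWritable, Text)] = env.createInput(
      new HadoopInputFormat[LongWritable, Text](
        new TextInputFormat, classOf[LongWritable], classOf[Text], job))

    // Keep only the line contents and print them.
    textData.map(_._2.toString).print()
    env.execute("mapreduce TextInputFormat example")
  }
}
```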
> 2015-11-24 19:36 GMT+01:00 Chiwan Park <chiwanp...@apache.org>:
>
>> I’m not a streaming expert. AFAIK, the compatibility layer can only be used
>> with DataSet. Flink has some streaming-specific features, such as
>> distributed snapshots, that need support from sources and sinks, so you
>> would have to implement the I/O yourself.
>>
>> > On Nov 25, 2015, at 3:22 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>> >
>> > I completely missed this, thanks Chiwan. Can these be used with
>> > DataStreams as well as DataSets?
>> >
>> > On Tue, Nov 24, 2015 at 10:06 AM, Chiwan Park <chiwanp...@apache.org> wrote:
>> > Hi Nick,
>> >
>> > You can use Hadoop Input/OutputFormats without modification! Please
>> > check the documentation [1] on the Flink homepage.
>> >
>> > [1] https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/hadoop_compatibility.html
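Output goes through the same compatibility layer. Below is a sketch of writing a DataSet with a wrapped `mapred` `TextOutputFormat`, along the lines of the example in the linked guide. It assumes the Scala wrapper `org.apache.flink.api.scala.hadoop.mapred.HadoopOutputFormat` from Flink 0.10; the sample word counts and the output path are placeholders made up for illustration.

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.hadoop.mapred.HadoopOutputFormat
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapred.{FileOutputFormat, JobConf, TextOutputFormat}

object WriteWithHadoopOutputFormat {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Small in-memory sample data, converted to Hadoop Writable pairs.
    val counts: DataSet[(Text, IntWritable)] =
      env.fromElements(("flink", 2), ("hadoop", 1))
        .map { case (word, count) => (new Text(word), new IntWritable(count)) }

    // Configure the wrapped mapred TextOutputFormat through a JobConf.
    val jobConf = new JobConf()
    // Placeholder output directory.
    FileOutputFormat.setOutputPath(jobConf, new Path("hdfs:///path/to/output"))

    val hadoopOF = new HadoopOutputFormat[Text, IntWritable](
      new TextOutputFormat[Text, IntWritable], jobConf)

    // Emit the DataSet through the Hadoop OutputFormat and run the job.
    counts.output(hadoopOF)
    env.execute("Hadoop OutputFormat example")
  }
}
```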
>> >
>> > > On Nov 25, 2015, at 3:04 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
>> > >
>> > > Hello,
>> > >
>> > > Is it possible to use existing Hadoop InputFormats and OutputFormats with
>> > > Flink? There's a lot of existing code that conforms to these interfaces;
>> > > it seems a shame to have to re-implement it all. Perhaps some adapter shim?
>> > >
>> > > Thanks,
>> > > Nick
>> >
>> > Regards,
>> > Chiwan Park
>> >
>> >
>>
>> Regards,
>> Chiwan Park
>>
>
