Thanks for the correction, @Fabian. :)

> On Nov 25, 2015, at 4:40 AM, Suneel Marthi <smar...@apache.org> wrote:
> 
> I guess it makes sense to add readHadoopXXX() methods to 
> StreamExecutionEnvironment (for feature parity with what presently exists 
> in ExecutionEnvironment).
> 
> Also, FLINK-2949 addresses the need to add relevant syntactic-sugar wrappers 
> to the DataSet API for the code snippet in Fabian's previous email. It's not 
> great having to instantiate a JobConf in client code and pass it 
> around. 
> 
> 
> 
> On Tue, Nov 24, 2015 at 2:26 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> Hi Nick,
> 
> you can also use Flink's HadoopInputFormat wrappers with the DataStream API. 
> However, DataStream does not offer as much "sugar" as DataSet because 
> StreamExecutionEnvironment does not offer dedicated createHadoopInput or 
> readHadoopFile methods.
> 
> In DataStream Scala you can read from a Hadoop InputFormat (TextInputFormat 
> in this case) as follows:
> 
> val textData: DataStream[(LongWritable, Text)] = env.createInput(
>   new HadoopInputFormat[LongWritable, Text](
>     new TextInputFormat,
>     classOf[LongWritable],
>     classOf[Text],
>     new JobConf()
> ))
> 
> The Java version is very similar.
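> As a rough sketch (assuming Flink's mapred HadoopInputFormat wrapper and 
> Tuple2 for the key/value pair, mirroring the Scala snippet above), the Java 
> version would look something like:
> 
> DataStream<Tuple2<LongWritable, Text>> textData = env.createInput(
>   new HadoopInputFormat<LongWritable, Text>(
>     new TextInputFormat(),
>     LongWritable.class,
>     Text.class,
>     new JobConf()));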
> 
> Note: Flink has wrappers for both MR APIs: mapred and mapreduce.
> 
> Cheers,
> Fabian
> 
> 2015-11-24 19:36 GMT+01:00 Chiwan Park <chiwanp...@apache.org>:
> I’m not a streaming expert. AFAIK, the compatibility layer can only be used 
> with DataSet. Flink has some streaming-specific features, such as distributed 
> snapshots, that need support from sources and sinks, so you would have to 
> implement the I/O yourself.
> 
> > On Nov 25, 2015, at 3:22 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> >
> > I completely missed this, thanks Chiwan. Can these be used with DataStreams 
> > as well as DataSets?
> >
> > On Tue, Nov 24, 2015 at 10:06 AM, Chiwan Park <chiwanp...@apache.org> wrote:
> > Hi Nick,
> >
> > You can use Hadoop Input/OutputFormats without modification! Please check 
> > the documentation [1] on the Flink homepage.
> >
> > [1] 
> > https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/hadoop_compatibility.html
> >
> > > On Nov 25, 2015, at 3:04 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
> > >
> > > Hello,
> > >
> > > Is it possible to use existing Hadoop Input and OutputFormats with Flink? 
> > > There's a lot of existing code that conforms to these interfaces, seems a 
> > > shame to have to re-implement it all. Perhaps some adapter shim..?
> > >
> > > Thanks,
> > > Nick
> >
> > Regards,
> > Chiwan Park
> >
> >
> 
> Regards,
> Chiwan Park
> 

Regards,
Chiwan Park


