Reading data from HDFS

Sandeep Deshmukh Fri, 25 Nov 2016 09:45:37 -0800

Hi,

I am trying to user recently added support for Apex runner. This is to run
the program[1] using Apex on Hadoop cluster. The program is getting
launched successfully.


I would like to change the input and output to HDFS. I looked at the
HDFSFileSource and planning to use the same. I would reading simple text
file from HDFS and same way writing to HDFS.

I tried something like below, but looks like missing something trivial.

 Pipeline p = Pipeline.create(options);
HDFSFileSource<IntWritable, Text> source =
                HDFSFileSource.from("filePath",
                    SequenceFileInputFormat.class, IntWritable.class,
Text.class);

p.apply("ReadLines", source)
       .apply(new CountWords())
       .apply(....)

What would be the right format and <Key,Value> to use for this.

[1]
https://github.com/tweise/apex-samples/blob/master/beam-apex-wordcount/src/main/java/com/example/myapexapp/Application.java

Regards,
Sandeep

Reading data from HDFS

Reply via email to