Hi,
I am trying to user recently added support for Apex runner. This is to run
the program[1] using Apex on Hadoop cluster. The program is getting
launched successfully.
I would like to change the input and output to HDFS. I looked at the
HDFSFileSource and planning to use the same. I would reading simple text
file from HDFS and same way writing to HDFS.
I tried something like below, but looks like missing something trivial.
Pipeline p = Pipeline.create(options);
HDFSFileSource<IntWritable, Text> source =
HDFSFileSource.from("filePath",
SequenceFileInputFormat.class, IntWritable.class,
Text.class);
p.apply("ReadLines", source)
.apply(new CountWords())
.apply(....)
What would be the right format and <Key,Value> to use for this.
[1]
https://github.com/tweise/apex-samples/blob/master/beam-apex-wordcount/src/main/java/com/example/myapexapp/Application.java
Regards,
Sandeep