Ben,

You might look at the OrcFileSourceTarget integration tests [1]. I'm not an expert on ORC files, but it looks like they include a few examples of reading and writing data.
[1] - https://github.com/apache/crunch/blob/master/crunch-hive/src/it/java/org/apache/crunch/io/orc/OrcFileSourceTargetIT.java#L64

On Mon, Sep 14, 2015 at 8:29 AM, Ben Watson <[email protected]> wrote:

> Hi all,
>
> I'm trying to write a simple converter in Crunch to turn Sequence files
> into ORC files. The only examples that I can find for dealing with ORC
> files are the tutorial at
> http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/ and
> then the discussion at https://issues.apache.org/jira/browse/CRUNCH-450.
> The tutorial seems to only show how to output data that's already in ORC
> format, which isn't much use for me here.
>
> It would be nice to be able to output ORC files like you can with Java
> MapReduce -
> http://hadoopathome.logdown.com/posts/277986-using-multipleoutputs-with-orc-in-mapreduce
> - specifying a Struct, parsing each record into some type of object, and
> letting the output do the rest. I've tried to replicate this in Crunch by
> writing a MapFn that basically turns each record into an OrcWritable, but
> it doesn't work, and even if it did I suspect it wouldn't be very efficient.
>
> Is this something that's already possible that I'm missing?
>
> Thanks,
>
> Ben
