On Sat, Sep 12, 2015 at 2:35 PM, Everett Anderson <[email protected]> wrote:
> Hi,
>
> I've got two basic questions about org.apache.crunch.io.Compress
> <https://crunch.apache.org/apidocs/0.12.0/index.html?overview-summary.html>.
>
> 1) It seems like it should only be used to wrap Targets that are
> themselves binary file output formats, but org.apache.crunch.io.To only
> has text, avro, and sequence, none of which seem appropriate. How do people
> tend to use this? Is there a Hadoop FileOutputFormat that they give to
> To.formattedFile?

I don't understand the question. The Compress methods can be used for any sort of output format that extends FileOutputFormat; it doesn't matter whether it's text/sequence/avro or a custom thing.

> 2) The implementation of Compress.gzip is
>
>   public static <T extends Target> T gzip(T target) {
>     return (T) compress(target, GzipCodec.class)
>         .outputConf(AvroJob.OUTPUT_CODEC, DataFileConstants.DEFLATE_CODEC);
>   }
>
> Does this mean it can only work with Avro?

No, it's just that Avro has its own built-in support for gzip/snappy serialization and it requires some extra conf to enable it. Any other output format will just ignore that configuration parameter.

> Thanks!
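To illustrate the second answer, here is a minimal, self-contained sketch of why the extra Avro setting is harmless for other formats. It does not use Crunch or Hadoop at all: the class, the method names, and the plain Map standing in for the job configuration are all hypothetical, and the key strings are illustrative rather than a statement of which keys Crunch actually sets.

```java
import java.util.HashMap;
import java.util.Map;

public class CompressConfSketch {
    // Illustrative key names only; the keys Crunch really sets may differ.
    static final String HADOOP_CODEC_KEY = "mapreduce.output.fileoutputformat.compress.codec";
    static final String AVRO_CODEC_KEY = "avro.output.codec";

    /** Models what Compress.gzip does: a generic codec setting plus an Avro-specific one. */
    static Map<String, String> gzipConf() {
        Map<String, String> conf = new HashMap<>();
        conf.put(HADOOP_CODEC_KEY, "org.apache.hadoop.io.compress.GzipCodec");
        conf.put(AVRO_CODEC_KEY, "deflate");
        return conf;
    }

    /** A text-like output format: it only ever looks up the generic codec key. */
    static String textFormatCodec(Map<String, String> conf) {
        return conf.getOrDefault(HADOOP_CODEC_KEY, "none");
    }

    /** An Avro-like output format: it only ever looks up its own key. */
    static String avroFormatCodec(Map<String, String> conf) {
        return conf.getOrDefault(AVRO_CODEC_KEY, "null");
    }

    public static void main(String[] args) {
        Map<String, String> conf = gzipConf();
        // Each format reads the key it knows about and never sees the other one.
        System.out.println("text-like format uses: " + textFormatCodec(conf));
        System.out.println("avro-like format uses: " + avroFormatCodec(conf));
    }
}
```

In a real pipeline the corresponding call would be something along the lines of Compress.gzip(To.textFile(path)), or Compress.gzip(To.formattedFile(path, SomeOutputFormat.class)) for a custom format, where path and SomeOutputFormat are placeholders.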
