On Wed, Aug 18, 2010 at 11:07 PM, Doug Cutting <[email protected]> wrote:
> On 08/18/2010 10:18 AM, ey-chih chow wrote:
>>
>> Thanks. But by doing this way, what kind of advantage we can get from
>> Avro?
>
> The Avro MapReduce API is easiest to use when both inputs and outputs are
> Avro data.
>
> If inputs are not Avro data, but you want to use the rest of the Avro MR
> API, then you'd need to write an InputFormat that produces an AvroWrapper<T>
> where T is a type that Avro can serialize.
>
> Another alternative might be to first convert your inputs to be avro data
> files. For example, one can use Avro's 'fromtext' tool to convert
> line-oriented files into equivalent compressed, splittable, Avro data files.
> This could be done as log files are loaded into HDFS, since this tool
> accepts Hadoop paths as output.
>
> We hope to add more such tools for such conversion/ingest, e.g.:
>
> https://issues.apache.org/jira/browse/AVRO-458

Off-topic, but is any work being done on this already? I saw one
of them tagged with 'GSOC', so I wish to know before I sink time
into it.

>
> We also expect that systems like Flume will produce Avro data files.
>
> Doug
>

--
Harsh J
www.harshj.com
