Sorry, it's been some time since I last looked into these. AvroStore writes data to plain files directly with a DatumWriter, whereas DataFileAvroStore uses the data file, which is an Avro file format. This format supports blocks, so the files can be split for MapReduce tasks.
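
For what it's worth, here is a minimal sketch of how the store could be selected in gora.properties. It assumes the usual `gora.datastore.default` key and the `org.apache.gora.avro.store` package name; a file-backed store also needs input and output paths, which are left out here because the exact keys should be checked against the Gora documentation.

```properties
# Sketch only: select the splittable Avro data-file store as the default.
# Assumes the standard Gora key and package name; path settings are omitted.
gora.datastore.default=org.apache.gora.avro.store.DataFileAvroStore
```
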
Yes, all FileBasedDataStores work on top of files stored on a Hadoop file system. Even the local file system should work.

Enis

On Tue, Oct 9, 2012 at 4:31 PM, Mike Baranczak <[email protected]> wrote:

> On Oct 9, 2012, at 7:07 PM, Enis Söztutar wrote:
>
> > Hi Mike,
> >
> > You should use DataFileAvroStore.
>
> OK, but why?
>
> > Is there any reason you are using a file-backed data store for nutch. I
> > am not sure this is tested enough.
>
> Well, right now I'm not using anything. I'm still trying to figure out
> which data store I want. I picked these because I wanted to keep things
> simple: they don't require setting up any servers besides Hadoop with HDFS
> (they don't, right?)
>
> -MB

