But why would this be a problem? As long as it's using HDFS to access the files, it should be able to fetch the chunks from wherever they might be in the cluster.
I don't see why it wouldn't work. Let us know if it works!

On Sat, Feb 16, 2013 at 7:38 PM, Claudio Reggiani <[email protected]> wrote:
> Yes, thank you Steve. And sorry for my encoded messages
>
> Claudio
>
>
> 2013/2/16 Steve Chien <[email protected]>
>
>> I think he meant that code is reading and converting the files from the
>> Input directory as a standalone program. Not a map-reduce program...
>>
>> On Feb 16, 2013, at 11:22, Dan Filimon <[email protected]> wrote:
>>
>> > Hi Claudio,
>> >
>> > Could you be more specific? What does 'MapReduce style' mean?
>> > seqdirectory should create sequence files from the documents in a
>> > folder, where the keys are the document names and the values are the
>> > documents' content.
>> >
>> > What do you need it to do?
>> >
>> > On Sat, Feb 16, 2013 at 5:55 PM, Claudio Reggiani <[email protected]> wrote:
>> >> Hello,
>> >>
>> >> I have a text dataset. Running "seqdirectory" command on it I see it's not
>> >> written in MapReduce style (looking at the source code of
>> >> SequenceFilesFromDirectory confirms that).
>> >>
>> >> What if I have a big dataset stored in HDFS and I would like to convert it
>> >> in SequenceFile format? Do I need to create my own custom job or
>> >> seqdirectory does that?
>> >>
>> >> Thanks
>> >> Claudio Reggiani
>>
