Thank you, Priyanka, for the quick response. I need to use an S3 bucket as my source of data. So do I need to give my S3 bucket path there?
Thanks,
Vishal

On Thu, Dec 1, 2016 at 1:28 AM, Priyanka Gugale <[email protected]> wrote:
> Hi Vishal,
>
> The "file" field helps the operator understand which FileSystem it's
> working with. Check the "getFSInstance()" method. The Splitter can work
> with all filesystems supported by Hadoop.
> In your case, since you have a different operator to figure out the input
> file(s), you can provide any one known path from your input source, so
> that the Splitter is initialized to work with your filesystem.
>
> -Priyanka
>
> On Thu, Dec 1, 2016 at 11:47 AM, Vishal Agrawal <[email protected]> wrote:
>
>> Hi,
>>
>> I am planning to use the DAG configuration below:
>>
>> public void populateDAG(DAG dag, Configuration configuration) {
>>
>>     DagInput input = dag.addOperator("Input", new DagInput());
>>
>>     FileSplitterBase splitter = dag.addOperator("Splitter", new FileSplitterBase());
>>
>>     FSSliceReader blockReader = dag.addOperator("BlockReader", new FSSliceReader());
>>
>>     dag.addStream("file-info", input.output, splitter.input);
>>
>>     dag.addStream("block-metadata", splitter.blocksMetadataOutput, blockReader.blocksMetadataInput);
>>
>>     ...
>> }
>>
>> Here DagInput will look up the source file paths and pass them to the
>> FileSplitterBase operator in a FileInfo object.
>>
>> Since the Splitter already has the absolute path of the source file in
>> the FileInfo object, I didn't understand the significance of the
>> com.datatorrent.lib.io.fs.FileSplitterBase.file field.
>>
>> Thanks,
>>
>> Vishal
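For reference, per Priyanka's suggestion, one way this could be wired up is to set the Splitter's "file" property to any known path in the source filesystem via the application's properties file. The sketch below is only illustrative and is not from the thread: the operator name "Splitter" matches the DAG above, but the bucket name, key, and the s3n:// scheme are placeholders; the exact scheme (s3n vs. s3a) depends on which Hadoop S3 connector is on the classpath and should be verified against your setup.

```xml
<!-- Hypothetical Apex properties.xml fragment: points the Splitter's
     "file" property at a known S3 path so getFSInstance() resolves an
     S3-backed FileSystem. Bucket and path are placeholders. -->
<property>
  <name>dt.operator.Splitter.prop.file</name>
  <value>s3n://my-bucket/input/known-file</value>
</property>
```

The same property can equally be set in Java inside populateDAG (e.g. a bean-style setter on the operator, if your Malhar version exposes one); what matters is only that the path's scheme identifies the filesystem the Splitter should initialize.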
