Thank you, Priyanka, for the quick response. I need to use an S3 bucket as my source of data. So do I need to give my S3 bucket path there?
Thanks,
Vishal

On Thu, Dec 1, 2016 at 1:28 AM, Priyanka Gugale <[email protected]> wrote:
> Hi Vishal,
>
> The "file" field helps the operator understand which FileSystem it's
> working with. Check the "getFSInstance()" method. The Splitter can work
> with all filesystems supported by Hadoop.
> In your case, since you have a different operator to figure out the input
> file(s), you can provide any one known path from your input source, so
> that the Splitter is initialized to work with your filesystem.
>
> -Priyanka
>
> On Thu, Dec 1, 2016 at 11:47 AM, Vishal Agrawal <[email protected]> wrote:
>
>> Hi,
>>
>> I am planning to use the DAG configuration below:
>>
>> public void populateDAG(DAG dag, Configuration configuration) {
>>
>>     DagInput input = dag.addOperator("Input", new DagInput());
>>
>>     FileSplitterBase splitter = dag.addOperator("Splitter", new FileSplitterBase());
>>
>>     FSSliceReader blockReader = dag.addOperator("BlockReader", new FSSliceReader());
>>
>>     dag.addStream("file-info", input.output, splitter.input);
>>
>>     dag.addStream("block-metadata", splitter.blocksMetadataOutput, blockReader.blocksMetadataInput);
>>
>>     ...
>> }
>>
>> Here DagInput will look up the source file paths and pass them to the
>> FileSplitterBase operator in a FileInfo object.
>>
>> Since the Splitter already has the absolute path of the source file in
>> the FileInfo object, I didn't understand the significance of the
>> com.datatorrent.lib.io.fs.FileSplitterBase.file field.
>>
>> Thanks,
>>
>> Vishal
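For reference, per Priyanka's suggestion, one way this could be wired up is to set the Splitter's "file" property to any known path in the source filesystem via the application's properties file. The sketch below is only illustrative and is not from the thread: the operator name "Splitter" matches the DAG above, but the bucket name, key, and the s3n:// scheme are placeholders; the exact scheme (s3n vs. s3a) depends on which Hadoop S3 connector is on the classpath and should be verified against your setup.

```xml
<!-- Hypothetical Apex properties.xml fragment: points the Splitter's
     "file" property at a known S3 path so getFSInstance() resolves an
     S3-backed FileSystem. Bucket and path are placeholders. -->
<property>
  <name>dt.operator.Splitter.prop.file</name>
  <value>s3n://my-bucket/input/known-file</value>
</property>
```

The same property can equally be set in Java inside populateDAG (e.g. a bean-style setter on the operator, if your Malhar version exposes one); what matters is only that the path's scheme identifies the filesystem the Splitter should initialize.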
