Yes, you should provide the input in this format: s3n://ukey:upass@bucketName/path

-Priyanka
On Thu, Dec 1, 2016 at 12:46 PM, Vishal Agrawal <[email protected]> wrote:

> Thank you Priyanka for the quick response.
>
> I need to use an S3 bucket as my source of data. So do I need to give my S3
> bucket path there?
>
> Thanks,
> Vishal
>
> On Thu, Dec 1, 2016 at 1:28 AM, Priyanka Gugale <[email protected]> wrote:
>
>> Hi Vishal,
>>
>> The "file" field helps the operator understand which FileSystem it's
>> working with. Check the "getFSInstance()" method. The splitter can work
>> with all filesystems supported by Hadoop.
>> In your case, since you have a separate operator to figure out the input
>> file(s), you can provide any one known path from your input source, so
>> that the splitter is initialized to work with your filesystem.
>>
>> -Priyanka
>>
>> On Thu, Dec 1, 2016 at 11:47 AM, Vishal Agrawal
>> <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am planning to use the DAG configuration below.
>>>
>>> public void populateDAG(DAG dag, Configuration configuration) {
>>>   DagInput input = dag.addOperator("Input", new DagInput());
>>>   FileSplitterBase splitter = dag.addOperator("Splitter", new FileSplitterBase());
>>>   FSSliceReader blockReader = dag.addOperator("BlockReader", new FSSliceReader());
>>>   dag.addStream("file-info", input.output, splitter.input);
>>>   dag.addStream("block-metadata", splitter.blocksMetadataOutput,
>>>       blockReader.blocksMetadataInput);
>>>   ...
>>> }
>>>
>>> Here DagInput will look up the source file paths and pass them to the
>>> FileSplitterBase operator in FileInfo objects.
>>>
>>> Now, since the splitter already has the absolute path of the source file
>>> in the FileInfo object, I didn't understand the significance of the
>>> com.datatorrent.lib.io.fs.FileSplitterBase.file field.
>>>
>>> Thanks,
>>> Vishal
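To tie the two answers together: since the splitter only uses the "file" property to select the right FileSystem implementation (via getFSInstance()), a known S3 path in the format above can be supplied as an operator property rather than hard-coded in populateDAG(). A minimal sketch of what that might look like in an Apex properties.xml, assuming the operator keeps the name "Splitter" from the DAG above; "ukey", "upass", and "bucketName/path" are placeholder credentials and paths, not real values:

```xml
<configuration>
  <!-- Point FileSplitterBase at the S3 filesystem so getFSInstance()
       resolves an S3-backed FileSystem. Credentials and bucket path
       below are illustrative placeholders. -->
  <property>
    <name>dt.operator.Splitter.prop.file</name>
    <value>s3n://ukey:upass@bucketName/path</value>
  </property>
</configuration>
```

The actual file names to split still arrive at runtime through the FileInfo objects emitted by the upstream operator; this property only bootstraps the filesystem instance.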
