Hi Ram,

This assumes that properties are set as key/value pairs. I used the properties
below and can now read the multiple directories in parallel. Thank you.

<!-- Source: source_123 properties -->
  <property>
    <name>dt.application.FileIO.operator.read.prop.inputDirectory(source_123)</name>
    <value>tmp/fileIO/source_123</value>
  </property>
  <property>
    <name>dt.application.FileIO.operator.read.prop.inputConfigFile(source_123)</name>
    <value>tmp/fileIO/config/source_123/source_123_input_config.xml</value>
  </property>
  <property>
    <name>dt.application.FileIO.operator.read.prop.partCount(source_123)</name>
    <value>1</value>
  </property>
  <property>
    <name>dt.application.FileIO.operator.read.prop.outputDirectory(source_123)</name>
    <value>tmp/fileIO/source_123</value>
  </property>
  <property>
    <name>dt.application.FileIO.operator.read.prop.outputConfigFile(source_123)</name>
    <value>tmp/fileIO/config/source_123/source_123_output_config.xml</value>
  </property>
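
To illustrate the key/value convention used in these property names, here is a
minimal sketch in plain Java (no Apex dependency) that groups flat names of the
form prop.name(key) by their source key. The class SourceProps and the method
bySource are hypothetical names chosen for this example, not part of Apex, and
the sketch does not show how the platform itself binds such properties to
operator fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class SourceProps {
    // Matches the tail of names such as "...prop.inputDirectory(source_123)".
    private static final Pattern MAPPED =
        Pattern.compile("\\.prop\\.(\\w+)\\((\\w+)\\)$");

    /** Groups flat name/value pairs as sourceId -> (property name -> value). */
    static Map<String, Map<String, String>> bySource(Map<String, String> flat) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : flat.entrySet()) {
            Matcher m = MAPPED.matcher(e.getKey());
            if (m.find()) {
                // group(1) is the property name, group(2) is the source key.
                result.computeIfAbsent(m.group(2), k -> new LinkedHashMap<>())
                      .put(m.group(1), e.getValue());
            }
            // Names without a "(key)" suffix are ignored in this sketch.
        }
        return result;
    }
}
```

With the properties above, bySource would yield one entry for source_123 that
holds inputDirectory, inputConfigFile, partCount, outputDirectory and
outputConfigFile.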

Regards,
Surya Vamshi

From: Mukkamula, Suryavamshivardhan (CWM-NR) 
[mailto:[email protected]]
Sent: 2016, June, 08 5:14 PM
To: [email protected]
Subject: RE: Multiple directories

Hi Ram,

Thank you.

I would like to populate a list of the class below from properties.xml. I tried
defining the properties as shown below, but with no luck. Can you please help
me define the list of these elements correctly?

public class InputValues<SOURCEID,DIRECTORY,CONFIGFILE,PARTITIONCOUNT> {

           public SOURCEID sourceId;
           public DIRECTORY directory;
           public CONFIGFILE configFile;
           public PARTITIONCOUNT partitionCount;
           public InputValues() {
           }

           public InputValues(SOURCEID sourceId, DIRECTORY directory,
                   CONFIGFILE configFile, PARTITIONCOUNT partitionCount) {
               this.sourceId = sourceId;
               this.directory = directory;
               this.configFile = configFile;
               this.partitionCount = partitionCount;
           }


}

Properties:

  <property>
    <name>dt.application.FileIO.operator.read.prop.inputValues(source_123)</name>
    <value>tmp/fileIO/source_123</value>
    <value>tmp/fileIO/config/source_123_config.xml</value>
    <value>1</value>
  </property>

  <property>
    <name>dt.application.FileIO.operator.read.prop.inputValues(source_124)</name>
    <value>tmp/fileIO/source_124</value>
    <value>tmp/fileIO/config/source_124_config.xml</value>
    <value>1</value>
  </property>

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:[email protected]]
Sent: 2016, June, 05 10:24 PM
To: [email protected]
Subject: Re: Multiple directories

Some sample code to monitor multiple directories is now available at:
https://github.com/DataTorrent/examples/tree/master/tutorials/fileIO-multiDir

It shows how to use a custom implementation of definePartitions() to create
multiple partitions of the file input operator and group them
into "slices" where each slice monitors a single directory.
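
The grouping into slices can be sketched in plain Java as follows. This is not
the code from the linked example and it does not use the Apex Partitioner API;
SlicePlanner, Assignment and plan are hypothetical names, and the sketch only
shows the bookkeeping: each directory gets its own number of partitions, and
all partitions for one directory form one slice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner, for illustration only.
final class SlicePlanner {
    /** Describes what one partition of the input operator would monitor. */
    static final class Assignment {
        final int partition;    // global partition index
        final int slice;        // index of the directory (slice) it belongs to
        final String directory; // directory monitored by this partition

        Assignment(int partition, int slice, String directory) {
            this.partition = partition;
            this.slice = slice;
            this.directory = directory;
        }
    }

    /**
     * Expands per-directory partition counts into a flat list of assignments:
     * directories.get(i) receives partCounts[i] partitions, all in slice i.
     */
    static List<Assignment> plan(List<String> directories, int[] partCounts) {
        if (directories.size() != partCounts.length) {
            throw new IllegalArgumentException("one count per directory is required");
        }
        List<Assignment> out = new ArrayList<>();
        for (int s = 0; s < directories.size(); s++) {
            if (partCounts[s] < 1) {
                throw new IllegalArgumentException("each directory needs at least one partition");
            }
            for (int p = 0; p < partCounts[s]; p++) {
                // out.size() is the next free global partition index.
                out.add(new Assignment(out.size(), s, directories.get(s)));
            }
        }
        return out;
    }
}
```

For two directories with counts 2 and 1, plan returns three assignments:
partitions 0 and 1 in slice 0, and partition 2 in slice 1.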

Ram

On Wed, May 25, 2016 at 9:55 AM, Munagala Ramanath <[email protected]> wrote:
I'm hoping to have a sample sometime next week.

Ram

On Wed, May 25, 2016 at 9:30 AM, Mukkamula, Suryavamshivardhan (CWM-NR)
<[email protected]> wrote:
Thank you so much, Ram, for your advice. Option (a) would be ideal for my
requirement.

Do you have a sample that shows partitioning with an individual configuration
for each partition?

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:[email protected]]
Sent: 2016, May, 25 12:11 PM
To: [email protected]
Subject: Re: Multiple directories

You have 2 options: (a) AbstractFileInputOperator (b) FileSplitter/BlockReader

For (a), each partition (i.e. replica of the operator) can scan only a single
directory, so if you have 100 directories, you can simply start with 100
partitions; since each partition is scanning its own directory, you don't need
to worry about which files the lines came from. This approach, however, needs a
custom definePartitions() implementation in your subclass to assign the
appropriate directory and XML parsing config file to each partition; it also
needs adequate cluster resources to be able to spin up the required number of
partitions.

For (b), there is some documentation in the Operators section at
http://docs.datatorrent.com/ including sample code. These operators support
scanning multiple directories out of the box but have more elaborate
configuration options. Check this out and see if it works in your use case.

Ram

On Wed, May 25, 2016 at 8:17 AM, Mukkamula, Suryavamshivardhan (CWM-NR)
<[email protected]> wrote:
Hello Ram/Team,

My requirement is to read input feeds from different locations on HDFS and
parse those files using XML configuration files (each input feed has a
configuration file that defines the fields inside that feed).

My approach: I would like to define a mapping file that contains, for each
feed, the feed identifier, the feed location and the configuration file
location. I would like to read this mapping file at initial load within the
setup() method and define my DirectoryScan.acceptFiles. My challenge is that
when I read the files, I have to parse each line according to the
configuration file for its feed. How do I know which file a line came from? If
I know this, I can read the corresponding configuration file before parsing
the line.

Please let me know how to handle this.

Regards,
Surya Vamshi

From: Munagala Ramanath [mailto:[email protected]]
Sent: 2016, May, 24 5:49 PM
To: Mukkamula, Suryavamshivardhan (CWM-NR)
Subject: Multiple directories

One way of addressing the issue is to use some sort of external tool (like a
script) to copy all the input files to a common directory (making sure that the
file names are unique to prevent one file from overwriting another) before the
Apex application starts.

The Apex application then starts and processes files from this directory.

If you set the partition count of the file input operator to N, it will create
N partitions and the files will be automatically distributed among the
partitions. The partitions will work in parallel.
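
The copy step could look roughly like the sketch below. It is written in Java
with java.nio.file and works on a local file system only; for HDFS the
equivalent Hadoop file system calls would be needed, which are not shown.
Consolidate and copyAll are hypothetical names, and prefixing each file name
with its source directory name is just one assumed way of keeping names unique.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical pre-processing step, for illustration only.
final class Consolidate {
    /**
     * Copies every regular file found directly in each source directory into
     * target, prefixing the file name with the source directory name so that
     * same-named files from different directories do not collide.
     * Returns the number of files copied.
     */
    static int copyAll(List<Path> sources, Path target) throws IOException {
        Files.createDirectories(target);
        int copied = 0;
        for (Path dir : sources) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
                for (Path f : files) {
                    if (Files.isRegularFile(f)) {
                        String unique = dir.getFileName() + "_" + f.getFileName();
                        // Files.copy throws FileAlreadyExistsException rather
                        // than overwriting an existing file.
                        Files.copy(f, target.resolve(unique));
                        copied++;
                    }
                }
            }
        }
        return copied;
    }
}
```

Note that two source directories with the same final name component would still
produce clashing names; in that case the copy fails instead of silently
overwriting a file.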

Ram

_______________________________________________________________________

This [email] may be privileged and/or confidential, and the sender does not 
waive any related rights and obligations. Any distribution, use or copying of 
this [email] or the information it contains by other than an intended recipient 
is unauthorized. If you received this [email] in error, please advise the 
sender (by return [email] or otherwise) immediately. You have consented to 
receive the attached electronically at the above-noted address; please retain a 
copy of this confirmation for future reference.


