Hello Aaron,

Please correct me if I am wrong: you start processing files as soon as they are written and rotated by the HDFS bolt.

On Dec 1, 2015 12:41 AM, "Aaron.Dossett" <[email protected]> wrote:
> I recently had to solve a use case like that. I decided to track which
> files I had processed instead of records within each file. If a file is
> still open for writing you could ignore it and come back for it later, or
> insert it more than once if your process is idempotent.
>
> From: Gaurav Agarwal <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Monday, November 30, 2015 at 1:01 PM
> To: "[email protected]" <[email protected]>
> Subject: Writing file to storm hdfs
>
> Hello
>
> In a Storm topology we are receiving tuples in the millions from Kafka, and
> we have to perform some calculations in a bolt. In parallel we have a bolt
> that writes into HDFS; the parallelism hint for the writer is 8, so there
> will be 8 files.
> The problem is: once the snapshot data is enriched and written to the
> multiple files and completed, we have to trigger another job that copies
> the records from the files into a database.
> With multiple files created and bolts writing to them in parallel, how can
> we find which is the last record written, so that we can trigger the next
> job? Any ideas?
>
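Aaron's approach of tracking whole files rather than individual records can be sketched roughly as below. This is a minimal, hypothetical illustration only: it uses the local filesystem in place of HDFS, and the `.inprogress` suffix for still-open files, the function names, and the polling structure are all assumptions for the sketch, not something the thread specifies.

```python
import os

# Assumed convention: files still open for writing carry this suffix
# until the HDFS bolt rotates (closes) them.
IN_PROGRESS_SUFFIX = ".inprogress"

def ready_files(directory, processed):
    """Yield files that are closed for writing and not yet processed."""
    for name in sorted(os.listdir(directory)):
        if name.endswith(IN_PROGRESS_SUFFIX):
            continue  # still being written; come back for it on a later pass
        if name in processed:
            continue  # already handled on an earlier pass
        yield name

def process_pass(directory, processed, handler):
    """One polling pass: hand each newly closed file to handler, record it.

    `processed` is a set of filenames already handled, so re-running the
    pass is safe even if the downstream load is not idempotent.
    """
    for name in ready_files(directory, processed):
        handler(os.path.join(directory, name))
        processed.add(name)
```

The downstream database-load job would run `process_pass` on a schedule; files still open are simply skipped and picked up after rotation, so no writer ever has to know which record is "last".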
