Hello Aaron,
Please correct me if I am wrong: you start processing files as soon as they are
written and rotated by the HDFS bolt.
On Dec 1, 2015 12:41 AM, "Aaron.Dossett" <[email protected]> wrote:

> I recently had to solve a use case like that.  I decided to track which
> files I had processed instead of records within each file.  If a file is
> still open for writing you could ignore it and come back for it later, or
> insert it more than once if your process is idempotent.
>
> From: Gaurav Agarwal <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Monday, November 30, 2015 at 1:01 PM
> To: "[email protected]" <[email protected]>
> Subject: Writing file to storm hdfs
>
> Hello
>
> In a Storm topology we are receiving millions of tuples from Kafka and we
> have to perform some calculations in a bolt. In parallel we have a bolt that
> writes to HDFS; the parallelism hint for writing the files is 8, so there
> will be 8 files.
> The problem is that once the snapshot data is enriched and written to the
> multiple files, we have to trigger another job that copies the records from
> the files into a database.
> With multiple files being created and bolts writing to them in parallel, how
> can we find which record is the last one written, so that we can trigger the
> next job? Any ideas?
>
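Aaron's suggestion above, tracking completed files rather than records within
each file, could be sketched roughly as follows. This is only an illustration,
not code from the thread: the `ProcessedFileTracker` class is hypothetical, and
the assumption that in-progress files carry a ".inprogress" suffix depends on
your rotation setup, so adjust it to whatever your HDFS bolt actually produces.

```java
import java.util.*;

// Hypothetical sketch: remember which files have already been handed to the
// downstream database-load job, and skip files the HDFS bolt is still writing.
public class ProcessedFileTracker {
    private final Set<String> processed = new HashSet<>();

    /** Returns files that are closed (rotated) and not yet processed. */
    public List<String> newCompletedFiles(Collection<String> listing) {
        List<String> ready = new ArrayList<>();
        for (String path : listing) {
            // Assumption: files still open for writing end in ".inprogress".
            if (path.endsWith(".inprogress")) {
                continue; // still being written; come back for it later
            }
            if (processed.add(path)) { // add() returns false if already seen
                ready.add(path);
            }
        }
        return ready;
    }

    public static void main(String[] args) {
        ProcessedFileTracker tracker = new ProcessedFileTracker();
        List<String> first = tracker.newCompletedFiles(
            Arrays.asList("/out/part-0.txt", "/out/part-1.txt.inprogress"));
        System.out.println(first); // only the rotated file is returned
        List<String> second = tracker.newCompletedFiles(
            Arrays.asList("/out/part-0.txt", "/out/part-1.txt"));
        System.out.println(second); // part-0 was already seen; only part-1
    }
}
```

If the downstream copy into the database is idempotent, as Aaron notes, the
tracking can be lax: processing a file twice is then harmless, so you do not
need the dedup set to be perfectly durable.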
