Set the spout parallelism to 1 so that the file is opened and read only once.

With a parallelism greater than 1, every spout instance opens the file,
so the file is read more than once. To still get a correct word count,
have each instance read a disjoint subset of the lines: with a
parallelism of 2, instance 1 reads lines 1, 3, 5, ... and instance 2
reads lines 2, 4, 6, ... In general, each instance emits one line and
then skips (parallelism - 1) lines.
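
The line-skipping idea above can be sketched in plain Java, without any
Storm dependency. This is only an illustration under stated assumptions:
the class name LinePartitioner and the method linesForTask are made up
for this example, and the task index is 0-based here. In a real spout you
would probably get the two numbers in open() from the TopologyContext,
e.g. context.getThisTaskIndex() and
context.getComponentTasks(context.getThisComponentId()).size(), and emit
the selected lines from nextTuple() instead of collecting them in a list.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Storm: shows which lines one spout
// instance would keep when several instances read the same file.
public class LinePartitioner {

    /**
     * Returns the lines owned by the instance with the given 0-based
     * taskIndex, out of numTasks instances (the spout parallelism).
     * Line n (0-based) belongs to instance n % numTasks.
     */
    public static List<String> linesForTask(BufferedReader reader, int taskIndex, int numTasks) {
        List<String> mine = new ArrayList<>();
        try {
            String line;
            long lineNo = 0;
            while ((line = reader.readLine()) != null) {
                if (lineNo % numTasks == taskIndex) {
                    mine.add(line); // this instance's line: a spout would emit it
                }
                lineNo++;           // every other line is skipped
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return mine;
    }

    public static void main(String[] args) {
        String text = "a\nb\nc\nd\ne\n"; // stands in for the input file
        for (int task = 0; task < 2; task++) {
            BufferedReader reader = new BufferedReader(new StringReader(text));
            System.out.println("task " + task + ": " + linesForTask(reader, task, 2));
        }
    }
}
```

Each instance still reads the whole file, but together the instances
emit every line exactly once, so the word count is not doubled.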

On 5/13/14, Komal Thombare <[email protected]> wrote:
>  Hi all,
>
> I am new to Storm and am working on the Storm word count example. I am
> confused about setting the spout parallelism.
>
> I am using the TopologyBuilder.setSpout() method to initialise the spout
> and set its parallelism hint to 2, which actually creates two instances
> of the same spout.
>
> In the spout implementation I read data from a file, which is opened in
> the open() method of the spout.
>
> Because of the parallelism hint of 2, the file is read twice and my word
> count output is doubled.
>
> Can anyone help me understand how I can have multiple instances of the
> same spout (parallelism hint > 1) while the file is read only once, so
> that I get the correct word count output? Is there any other way besides
> the normal Storm topology?
>
> Thanks and Regards,
>
> Komal Thombare
> Tata Consultancy Services Limited
> Ph:- 086-55388772
> Mail-to: [email protected]
> Website: http://www.tcs.com
