Set the spout parallelism to 1 so that the file is opened and read only once. With a parallelism greater than 1, the file will be opened once per spout instance. To still get a correct word count in that case, have each instance read a disjoint subset of the lines: with a parallelism of 2, instance 1 reads lines 1, 3, 5, ... and instance 2 reads lines 2, 4, 6, ..., i.e. each instance skips a number of lines that depends on the parallelism.
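As a rough sketch of the line-skipping idea, the plain-Java helper below keeps only the lines that belong to one instance. LineSharder and linesForTask are hypothetical names, not part of Storm. In a real spout I would expect taskIndex to come from TopologyContext.getThisTaskIndex() and totalTasks from context.getComponentTasks(context.getThisComponentId()).size(), both read in open(); treat that wiring as an assumption to check against your Storm version. Every instance still opens the file, so this also assumes that all instances can see the same file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Storm class): selects the lines one spout
// instance should emit so that N instances together cover the file once.
final class LineSharder {

    // taskIndex is 0-based: instance 0 gets lines 1, 1+N, 1+2N, ...,
    // instance 1 gets lines 2, 2+N, ... where N = totalTasks.
    static List<String> linesForTask(Reader source, int taskIndex, int totalTasks) {
        if (totalTasks < 1 || taskIndex < 0 || taskIndex >= totalTasks) {
            throw new IllegalArgumentException("need 0 <= taskIndex < totalTasks");
        }
        List<String> mine = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            long lineNumber = 0; // 0-based position of the current line
            while ((line = reader.readLine()) != null) {
                // Keep the line only if it falls in this instance's share.
                if (lineNumber % totalTasks == taskIndex) {
                    mine.add(line);
                }
                lineNumber++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return mine;
    }

    public static void main(String[] args) {
        String file = "the quick fox\njumps over\nthe lazy dog\nagain";
        // Simulate a parallelism hint of 2: each "instance" reads the same input.
        System.out.println(linesForTask(new StringReader(file), 0, 2));
        System.out.println(linesForTask(new StringReader(file), 1, 2));
    }
}
```

In a spout, nextTuple() would emit one such line per call instead of collecting them in a list. The list is only used here to keep the sketch short and easy to check.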

On 5/13/14, Komal Thombare <[email protected]> wrote:
> Hi all,
>
> I am new to storm and working on Storm word count. I have confusion while
> setting spout parallelism.
>
> I am using TopologyBuilder.setSpout() method to initialise and set
> parallelism hint for spout as 2, where I am actually creating two instances
> of same spout.
>
> In Spout implementation I am reading data from a file which gets opened in
> the open() method of spout.
>
> Because of parallelism hint 2 file gets read twice and my word count output
> is doubled.
>
> So can anyone help me, to understand the way where I can have multiple
> instances of same spout (parallelism hint > 1) but file gets read only once
> and get correct word count output? Is there any other way out besides the
> normal storm topology?
>
> Thanks and Regards,
>
> Komal Thombare
> Tata Consultancy Services Limited
