Sam,

Bolts take the data emitted by your spout (or by another bolt) and
do whatever processing you need (persist data to a DB, aggregate it, etc.).

In your case you have a spout that emits sentences, so you need to create
a bolt that splits each sentence into words and emits each word as a tuple.
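
A minimal sketch of such a split bolt, assuming you stay with the same
storm.py multilang module your spout already imports. The class and function
names are made up for illustration, and the import is guarded only so the
splitting logic can be tried outside of Storm:

```python
# Sketch only: assumes the storm.py multilang helper is importable as `storm`.
try:
    import storm
    BasicBolt = storm.BasicBolt
except (ImportError, AttributeError):
    storm = None        # lets split_sentence() be tried without Storm
    BasicBolt = object


def split_sentence(sentence):
    """Split one sentence into words on whitespace."""
    return sentence.split()


class SplitSentenceBolt(BasicBolt):  # hypothetical name
    def process(self, tup):
        # tup.values[0] is the line emitted by the spout
        for word in split_sentence(tup.values[0]):
            storm.emit([word])  # one tuple per word

# In the real bolt script, finish with: SplitSentenceBolt().run()
```

Like your spout, the script would end with a run() call so Storm can start it.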

Then you should have another bolt that receives each word as a tuple from
the previous bolt and does your processing.
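
A rough sketch of what that processing bolt might look like. I am assuming
your second file holds one word per line followed by its numbers, which may
not match your real format. The file name vectors.txt and all class and
function names are hypothetical:

```python
# Sketch only: assumes the storm.py multilang helper is importable as `storm`.
try:
    import storm
    BasicBolt = storm.BasicBolt
except (ImportError, AttributeError):
    storm = None        # lets parse_vectors() be tried without Storm
    BasicBolt = object


def parse_vectors(lines):
    """Build {word: [floats]} from lines like 'word 0.1 0.2 0.3' (assumed format)."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


class WordVectorBolt(BasicBolt):  # hypothetical name
    def initialize(self, conf, context):
        # Load the lookup file once per bolt instance, not once per tuple.
        with open("vectors.txt") as f:  # hypothetical file name
            self.vectors = parse_vectors(f)

    def process(self, tup):
        word = tup.values[0]
        vector = self.vectors.get(word)
        if vector is not None:
            storm.emit([word, vector])

# In the real bolt script, finish with: WordVectorBolt().run()
```

Reading the second file in initialize keeps the per-word work to a dictionary
lookup.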

In the topology, use fieldsGrouping for the word-processing bolt and
shuffleGrouping for the split-sentence bolt.
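
To illustrate the difference between the two groupings, here is a small
stand-alone sketch. It is not Storm code, and the crc32 hash is only my
stand-in for whatever Storm does internally. The point is that a fields
grouping sends the same word to the same task every time, while a shuffle
grouping ignores the tuple contents:

```python
import random
import zlib


def fields_grouping_task(word, num_tasks):
    """Pick a task from a stable hash of the field value (illustrative hash)."""
    return zlib.crc32(word.encode("utf-8")) % num_tasks


def shuffle_grouping_task(num_tasks, rng=random):
    """Pick a task at random, ignoring what the tuple contains."""
    return rng.randrange(num_tasks)


if __name__ == "__main__":
    for word in ["storm", "bolt", "storm", "spout", "storm"]:
        print(word, "-> fields task", fields_grouping_task(word, 4),
              "| shuffle task", shuffle_grouping_task(4))
```

That is why any per-word state (counts, cached vectors) belongs behind a
fieldsGrouping: all tuples for a given word land on the same bolt instance.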

*I highly recommend the Petrel library for Python*
https://github.com/AirSage/Petrel

*Take a look at this sample, which is very similar to your task*
https://github.com/AirSage/Petrel/tree/master/samples/wordcount

*The topology is defined here*
https://github.com/AirSage/Petrel/blob/master/samples/wordcount/create.py

*If you want to use Python with Storm (via Petrel), read this book*
https://www.packtpub.com/big-data-and-business-intelligence/building-python-real-time-applications-storm

Thanks,
Dmitry

On Tue, Apr 4, 2017 at 1:49 AM, sam mohel <[email protected]> wrote:

> I need some help with this problem. I read that the spout is
> responsible for reading data or preparing it for processing in a bolt, so I
> wrote some code in the spout to open the file and read it line by line:
>
> class SimSpout(storm.Spout):
>     # Not much to do here for such a basic spout
>     def initialize(self, conf, context):
>     ## Open the file with read only permit
>         self.f = open('data.txt', 'r')
>     ## Read the first line
>         self._conf = conf
>         self._context = context
>         storm.logInfo("Spout instance starting...")
>     # Process the next tuple
>     def nextTuple(self):
>         # check if it reach at the EOF to close it
>       for line in self.f.readlines():
>         # Emit a random sentence
>         storm.logInfo("Emiting %s" % line)
>         storm.emit([line])
>
> # Start the spout when it's invoked
> SimSpout().run()
>
>
> Is that right?
> The actual problem I have now: how can I make a bolt take each line from
> the spout and process it? The processing reads another file and does some
> calculations to compute the vector of each word.
>



-- 
------------------------------
<http://www.saritasa.com/>
Dmitry Semenov
[email protected] | 949.200.6839 | www.saritasa.com
20411 Birch St., Suite 330, Newport Beach, CA 92660
