I started looking into setting up the internal message buffers as mentioned in this link:
http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/#how-to-configure-storms-internal-message-buffers

I found out that my message size could be as big as 10K. So does that mean that I should set the buffer size to about 10K?
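For reference, the four settings from that post look like this when set through backtype.storm.Config (the values below are the example values from the post, not recommendations). As I understand it, these sizes count messages (or batches of messages), not bytes, and the two executor queue sizes must be powers of 2:

    import backtype.storm.Config;

    Config conf = new Config();
    // All four values are counts, not bytes. The executor queues are
    // Disruptor ring buffers, so their sizes must be powers of 2.
    conf.put(Config.TOPOLOGY_RECEIVER_BUFFER_SIZE, 8);             // max messages the receiver thread batches at once
    conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 32);            // worker transfer queue, in batches of messages
    conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384); // per-executor incoming queue, in tuples
    conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);    // per-executor outgoing queue, in tuples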
--
Kushan Maskey
817.403.7500


On Tue, Aug 26, 2014 at 7:45 AM, Kushan Maskey <[email protected]> wrote:

> Thanks, Michael.
>
> How do you verify the reliability of the KafkaSpout? I am using the
> KafkaSpout that came with Storm 0.9.2. AFAIK the KafkaSpout is quite
> reliable. I am guessing it is the processing time for each record in the
> bolt. Yes, from the log I do see a few Cassandra exceptions while
> inserting the records.
>
> --
> Kushan Maskey
> 817.403.7500
>
>
> On Mon, Aug 25, 2014 at 9:39 PM, Michael Rose <[email protected]> wrote:
>
>> Hi Kushan,
>>
>> Depending on the Kafka spout you're using, it could be doing different
>> things when a tuple fails. However, if it's running reliably, the
>> Cassandra insertion failures would have forced a replay from the spout
>> until they completed.
>>
>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>> [email protected]
>>
>>
>> On Mon, Aug 25, 2014 at 4:42 PM, Kushan Maskey <[email protected]> wrote:
>>
>>> I have set up a topology to load a very large volume of data. Recently
>>> I loaded about 60K records and found that there are some failed acks on
>>> a few spouts but none on the bolts. Storm completed running and seems
>>> stable. Initially I started with a smaller amount of data, about 500
>>> records, which loaded successfully, and then increased up to 60K, where
>>> I saw the failed acks.
>>>
>>> Questions:
>>> 1. Does that mean that the spout was not able to read some messages
>>> from Kafka? Since there are no failed acks on the bolts as per the UI,
>>> whatever messages were received have been successfully processed by the
>>> bolts.
>>> 2. How do I interpret numbers of failed acks like acked: 315500 and
>>> failed: 2980? Does this mean that 2980 records failed to be processed?
>>> If this is the case, how do I avoid it from happening, because I will
>>> be losing 2980 records.
>>> 3. I also see that a few of the records failed to be inserted into the
>>> Cassandra database. What is the best way to reprocess that data, as it
>>> is quite difficult to do through the batch process that I am currently
>>> running?
>>>
>>> LMK, thanks.
>>>
>>> --
>>> Kushan Maskey
>>> 817.403.7500
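A minimal sketch of the ack/fail pattern Michael describes, for the Storm 0.9.x API: the bolt acks a tuple only after a successful Cassandra insert and fails it on an exception, so a reliable spout such as the storm-kafka KafkaSpout replays it. CassandraWriter below is a hypothetical placeholder for whatever Cassandra client wrapper is actually used, not a real Storm class:

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Tuple;

    public class CassandraInsertBolt extends BaseRichBolt {
        private OutputCollector collector;
        private transient CassandraWriter writer; // hypothetical client wrapper

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            this.writer = new CassandraWriter(); // open the Cassandra session here
        }

        @Override
        public void execute(Tuple tuple) {
            try {
                writer.insert(tuple.getString(0)); // may throw on write timeout etc.
                collector.ack(tuple);              // marks the record as fully processed
            } catch (Exception e) {
                collector.fail(tuple);             // forces the spout to replay the record
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: nothing to declare
        }
    }

Note also that tuples which simply time out (topology.message.timeout.secs) are counted as failed on the spout without any bolt reporting a failure, which would match seeing failed acks only on the spouts.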
