What's curious is that the enrichment topology is showing the same issues, but my mind went to ES as well.
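(As an aside, one quick way to see whether Elasticsearch back-pressure is behind the failed tuples is to watch the bulk thread pool for queueing and rejections. Below is a minimal sketch, not an official Metron utility: it assumes a hypothetical node name "es-node-1:9200", and the exact _cat/thread_pool columns vary a little between ES versions.)

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class EsThreadPoolCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical host/port; point this at one of your ES data nodes.
            URL url = new URL("http://es-node-1:9200/_cat/thread_pool?v");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // Look for a growing bulk queue or non-zero rejected counts:
                    // rejected bulk requests usually show up in Storm as failed tuples.
                    System.out.println(line);
                }
            }
        }
    }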
On Fri, Apr 21, 2017 at 11:57 AM, Ryan Merriman <[email protected]> wrote:

> Yes. Which bolt is reporting all those failures? My theory is that there
> is some ES tuning that needs to be done.
>
> On Fri, Apr 21, 2017 at 10:53 AM, Casey Stella <[email protected]> wrote:
>
>> Could I see a little more of that screen? Specifically, what the bolts
>> look like.
>>
>> On Fri, Apr 21, 2017 at 11:51 AM, Ali Nazemian <[email protected]> wrote:
>>
>>> Please find the Storm UI screenshot here:
>>>
>>> http://imgur.com/FhIrGFd
>>>
>>> On Sat, Apr 22, 2017 at 1:41 AM, Ali Nazemian <[email protected]> wrote:
>>>
>>>> Hi Casey,
>>>>
>>>> - topology.message.timeout: it was 30s at first. I have increased it
>>>>   to 300s; no change.
>>>> - It is a very basic geo-enrichment and a simple rule for threat
>>>>   triage.
>>>> - No, not at all.
>>>> - I have changed that to find the best value; it is 5000, which is
>>>>   about 5 MB.
>>>> - I have changed the number of executors for the Storm acker thread,
>>>>   and I have also changed the value of topology.max.spout.pending;
>>>>   still no change.
>>>>
>>>> On Sat, Apr 22, 2017 at 1:24 AM, Casey Stella <[email protected]> wrote:
>>>>
>>>>> Also,
>>>>> * What's your setting for topology.message.timeout?
>>>>> * You said you're seeing this in indexing and enrichment; what
>>>>>   enrichments do you have in place?
>>>>> * Is ES being taxed heavily?
>>>>> * What's your ES batch size for the sensor?
>>>>>
>>>>> On Fri, Apr 21, 2017 at 10:46 AM, Casey Stella <[email protected]> wrote:
>>>>>
>>>>>> So you're seeing failures in the Storm topology but no errors in the
>>>>>> logs. Would you mind sending over a screenshot of the indexing
>>>>>> topology from the Storm UI? You might not be able to paste the image
>>>>>> on the mailing list, so an imgur link would be in order.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Casey
>>>>>>
>>>>>> On Fri, Apr 21, 2017 at 10:34 AM, Ali Nazemian <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Ryan,
>>>>>>>
>>>>>>> No, I cannot see any errors inside the indexing error topic. Also,
>>>>>>> the number of tuples emitted and transferred to the error indexing
>>>>>>> bolt is zero!
>>>>>>>
>>>>>>> On Sat, Apr 22, 2017 at 12:29 AM, Ryan Merriman <[email protected]> wrote:
>>>>>>>
>>>>>>>> Do you see any errors in the error* index in Elasticsearch? There
>>>>>>>> are several catch blocks across the different topologies that
>>>>>>>> transform errors into JSON objects and forward them on to the
>>>>>>>> indexing topology. If you're not seeing anything in the worker
>>>>>>>> logs, it's likely the errors were captured there instead.
>>>>>>>>
>>>>>>>> Ryan
>>>>>>>>
>>>>>>>> On Fri, Apr 21, 2017 at 9:19 AM, Ali Nazemian <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> No, everything is fine at the log level. Also, when I checked
>>>>>>>>> resource consumption on the workers, there were still plenty of
>>>>>>>>> resources available!
>>>>>>>>>
>>>>>>>>> On Fri, Apr 21, 2017 at 10:04 PM, Casey Stella <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Seeing anything in the Storm logs for the workers?
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 21, 2017 at 07:41, Ali Nazemian <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> After trying to tune Metron's performance, I have noticed that
>>>>>>>>>>> the failure rate for the indexing/enrichment topologies is very
>>>>>>>>>>> high (about 95%). However, I can see the messages in
>>>>>>>>>>> Elasticsearch. I have tried to increase the timeout value for
>>>>>>>>>>> the acknowledgement; it didn't fix the problem. I can set the
>>>>>>>>>>> number of acker executors to 0 to fix the problem temporarily,
>>>>>>>>>>> which is not a good idea at all. Do you have any idea what could
>>>>>>>>>>> have caused such an issue? The percentage of failures decreases
>>>>>>>>>>> when I reduce the parallelism, but even without any parallelism
>>>>>>>>>>> it is still high!
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Ali
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> A.Nazemian
>>>>>>>
>>>>>>> --
>>>>>>> A.Nazemian
>>>>
>>>> --
>>>> A.Nazemian
>>>
>>> --
>>> A.Nazemian
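For reference, the knobs discussed in this thread map onto plain Storm Config settings. The snippet below is only a minimal sketch with illustrative values (not recommended settings); in Metron these are normally set through Ambari or the topology properties/flux files rather than in code.

    import org.apache.storm.Config;

    public class TuningSketch {
        public static void main(String[] args) {
            Config conf = new Config();
            // topology.message.timeout.secs: how long a tuple may stay
            // un-acked before Storm marks it as failed and replays it.
            conf.setMessageTimeoutSecs(300);
            // topology.max.spout.pending: caps the number of un-acked
            // tuples per spout task, throttling the spout when the
            // downstream bolts (or ES) can't keep up.
            conf.setMaxSpoutPending(500);
            // topology.acker.executors: setting this to 0 disables acking
            // entirely, which hides the failures rather than fixing them.
            conf.setNumAckers(1);
            // The conf would then be passed to StormSubmitter.submitTopology(...).
            System.out.println(conf);
        }
    }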
