Please find the bolt section of the Storm UI for the indexing topology: http://imgur.com/a/tFkmO
As you can see, an HDFS error has also appeared, which is not important right now.

On Sat, Apr 22, 2017 at 1:59 AM, Casey Stella <[email protected]> wrote:

> What's curious is the enrichment topology showing the same issues, but my mind went to ES as well.

On Fri, Apr 21, 2017 at 11:57 AM, Ryan Merriman <[email protected]> wrote:

> Yes. Which bolt is reporting all those failures? My theory is that there is some ES tuning that needs to be done.

On Fri, Apr 21, 2017 at 10:53 AM, Casey Stella <[email protected]> wrote:

> Could I see a little more of that screen? Specifically, what the bolts look like.

On Fri, Apr 21, 2017 at 11:51 AM, Ali Nazemian <[email protected]> wrote:

> Please find the Storm UI screenshot here: http://imgur.com/FhIrGFd

On Sat, Apr 22, 2017 at 1:41 AM, Ali Nazemian <[email protected]> wrote:

> Hi Casey,
>
> - topology.message.timeout: It was 30s at first. I have increased it to 300s; no change.
> - It is a very basic geo-enrichment and a simple rule for threat triage.
> - No, not at all.
> - I have changed that to find the best value. It is 5000, which comes to about 5 MB.
> - I have changed the number of executors for the Storm acker thread, and I have also changed the value of topology.max.spout.pending; still no change.

On Sat, Apr 22, 2017 at 1:24 AM, Casey Stella <[email protected]> wrote:

> Also,
> * What's your setting for topology.message.timeout?
> * You said you're seeing this in indexing and enrichment; what enrichments do you have in place?
> * Is ES being taxed heavily?
> * What's your ES batch size for the sensor?

On Fri, Apr 21, 2017 at 10:46 AM, Casey Stella <[email protected]> wrote:

> So you're seeing failures in the Storm topology but no errors in the logs.
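For reference, the per-sensor Elasticsearch batch size discussed above lives in the sensor's indexing configuration, which Metron stores in ZooKeeper. A minimal sketch, with the sensor name and values purely illustrative (the exact layout differs between Metron versions):

```json
{
  "elasticsearch": {
    "index": "my_sensor",
    "batchSize": 100,
    "enabled": true
  }
}
```

A large batch interacts badly with a short tuple timeout: if a batch takes longer to fill and flush than topology.message.timeout.secs allows, the tuples waiting in it can be replayed even though they are eventually indexed, which would match the symptom here of high failure rates while the messages still show up in Elasticsearch.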
> Would you mind sending over a screenshot of the indexing topology from the Storm UI? You might not be able to paste the image on the mailing list, so maybe an imgur link would be in order.
>
> Thanks,
>
> Casey

On Fri, Apr 21, 2017 at 10:34 AM, Ali Nazemian <[email protected]> wrote:

> Hi Ryan,
>
> No, I cannot see any errors inside the indexing error topic. Also, the number of tuples emitted and transferred to the error indexing bolt is zero.

On Sat, Apr 22, 2017 at 12:29 AM, Ryan Merriman <[email protected]> wrote:

> Do you see any errors in the error* index in Elasticsearch? There are several catch blocks across the different topologies that transform errors into JSON objects and forward them on to the indexing topology. If you're not seeing anything in the worker logs, it's likely the errors were captured there instead.
>
> Ryan

On Fri, Apr 21, 2017 at 9:19 AM, Ali Nazemian <[email protected]> wrote:

> No, everything is fine at the log level. Also, when I checked resource consumption on the workers, there were still plenty of resources available.

On Fri, Apr 21, 2017 at 10:04 PM, Casey Stella <[email protected]> wrote:

> Seeing anything in the Storm logs for the workers?

On Fri, Apr 21, 2017 at 07:41, Ali Nazemian <[email protected]> wrote:

> Hi all,
>
> After trying to tune Metron performance, I have noticed that the failure rate for the indexing/enrichment topologies is very high (about 95%). However, I can see the messages in Elasticsearch.
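Ryan's suggestion of checking the error* index can be done from the command line; a sketch, assuming Elasticsearch is reachable on localhost:9200 (host, port, and index pattern are illustrative and should be adjusted to your cluster):

```shell
# Look at a few recent documents in the error* indices, if any exist
curl -s 'http://localhost:9200/error*/_search?size=5&pretty'

# Or just count the captured errors
curl -s 'http://localhost:9200/error*/_count?pretty'
```

An empty hit count here, combined with clean worker logs, suggests the tuples are failing on timeout rather than on an exception anywhere in the topology.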
> I have tried to increase the timeout value for the acknowledgement, but it didn't fix the problem. I can set the number of acker executors to 0 to fix the problem temporarily, which is not a good idea at all. Do you have any idea what could have caused this issue? The percentage of failures decreases when the parallelism is reduced, but even without any parallelism it is still high.
>
> Cheers,
> Ali

--
A.Nazemian
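For anyone following along, the Storm knobs mentioned in this thread are ordinary topology configuration settings; a minimal sketch of the relevant keys, with illustrative values rather than recommendations:

```yaml
# Storm settings discussed in this thread (values illustrative)
topology.message.timeout.secs: 300   # how long a tuple tree may stay un-acked before it is replayed
topology.max.spout.pending: 500      # cap on in-flight tuple trees per spout task
topology.acker.executors: 4          # 0 disables acking entirely
```

Note that setting topology.acker.executors to 0 turns off at-least-once tracking altogether, which is why the failure count drops: the failures are no longer being measured, not actually fixed.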
