Although this is a test platform with a much lower spec than production, it should easily be enough for indexing 600 docs per second. I have seen benchmark results of 150-200k docs per second on hardware with this spec! I haven't played with tuning the index template yet, but I still think the current rate does not make sense at all.
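To give a sense of what I mean by tuning the template, the kind of change I have in mind looks roughly like the sketch below. This is only an illustration: the host, the sensor_index_* pattern, and the settings values are assumptions rather than anything from this cluster, and it updates live index settings instead of the template itself (the same keys would go into the template so that newly created indices pick them up).

// Rough sketch only: apply two write-heavy-friendly settings to existing indices.
// Assumptions: Elasticsearch reachable at localhost:9200, and "sensor_index_*" is a
// placeholder for the real index pattern.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class IndexSettingsSketch {
    public static void main(String[] args) throws Exception {
        String settings = "{"
                + "\"index.refresh_interval\": \"30s\","  // refresh less aggressively while bulk loading
                + "\"index.number_of_replicas\": 0"       // re-add replicas once the backlog is indexed
                + "}";

        URL url = new URL("http://localhost:9200/sensor_index_*/_settings");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(settings.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Response: HTTP " + conn.getResponseCode());
    }
}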
I have changed the batch size to 100. Throughput has dropped, but there is still a very high rate of failure! Please find the screenshots for the enrichments:

http://imgur.com/a/ceC8f
http://imgur.com/a/sBQwM

On Sat, Apr 22, 2017 at 2:08 AM, Casey Stella <[email protected]> wrote:

> Ok, yeah, those latencies are pretty high. I think what's happening is
> that the tuples aren't being acked fast enough and are timing out. How
> taxed is your ES box? Can you drop the batch size down to maybe 100 and
> see what happens?
>
> On Fri, Apr 21, 2017 at 12:05 PM, Ali Nazemian <[email protected]> wrote:
>
>> Please find the bolt part of the Storm UI related to the indexing topology:
>>
>> http://imgur.com/a/tFkmO
>>
>> As you can see, an HDFS error has also appeared, which is not important
>> right now.
>>
>> On Sat, Apr 22, 2017 at 1:59 AM, Casey Stella <[email protected]> wrote:
>>
>>> What's curious is that the enrichment topology is showing the same
>>> issues, but my mind went to ES as well.
>>>
>>> On Fri, Apr 21, 2017 at 11:57 AM, Ryan Merriman <[email protected]> wrote:
>>>
>>>> Yes, which bolt is reporting all those failures? My theory is that
>>>> there is some ES tuning that needs to be done.
>>>>
>>>> On Fri, Apr 21, 2017 at 10:53 AM, Casey Stella <[email protected]> wrote:
>>>>
>>>>> Could I see a little more of that screen? Specifically, what the
>>>>> bolts look like.
>>>>>
>>>>> On Fri, Apr 21, 2017 at 11:51 AM, Ali Nazemian <[email protected]> wrote:
>>>>>
>>>>>> Please find the Storm UI screenshot here:
>>>>>>
>>>>>> http://imgur.com/FhIrGFd
>>>>>>
>>>>>> On Sat, Apr 22, 2017 at 1:41 AM, Ali Nazemian <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Casey,
>>>>>>>
>>>>>>> - topology.message.timeout: it was 30s at first. I have increased it
>>>>>>> to 300s, no changes!
>>>>>>> - It is a very basic geo enrichment and a simple rule for threat
>>>>>>> triage!
>>>>>>> - No, not at all.
>>>>>>> - I have changed that to find the best value. It is 5000, which is
>>>>>>> about 5 MB.
>>>>>>> - I have changed the number of executors for the Storm acker thread,
>>>>>>> and I have also changed the value of topology.max.spout.pending;
>>>>>>> still no changes!
>>>>>>>
>>>>>>> On Sat, Apr 22, 2017 at 1:24 AM, Casey Stella <[email protected]> wrote:
>>>>>>>
>>>>>>>> Also,
>>>>>>>> * What's your setting for topology.message.timeout?
>>>>>>>> * You said you're seeing this in indexing and enrichment; what
>>>>>>>> enrichments do you have in place?
>>>>>>>> * Is ES being taxed heavily?
>>>>>>>> * What's your ES batch size for the sensor?
>>>>>>>>
>>>>>>>> On Fri, Apr 21, 2017 at 10:46 AM, Casey Stella <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> So you're seeing failures in the Storm topology but no errors in
>>>>>>>>> the logs. Would you mind sending over a screenshot of the indexing
>>>>>>>>> topology from the Storm UI? You might not be able to paste the
>>>>>>>>> image on the mailing list, so maybe an imgur link would be in order.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Casey
>>>>>>>>>
>>>>>>>>> On Fri, Apr 21, 2017 at 10:34 AM, Ali Nazemian <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Ryan,
>>>>>>>>>>
>>>>>>>>>> No, I cannot see any errors inside the indexing error topic. Also,
>>>>>>>>>> the number of tuples emitted and transferred to the error indexing
>>>>>>>>>> bolt is zero!
>>>>>>>>>>
>>>>>>>>>> On Sat, Apr 22, 2017 at 12:29 AM, Ryan Merriman <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Do you see any errors in the error* index in Elasticsearch?
>>>>>>>>>>> There are several catch blocks across the different topologies
>>>>>>>>>>> that transform errors into JSON objects and forward them on to
>>>>>>>>>>> the indexing topology. If you're not seeing anything in the
>>>>>>>>>>> worker logs, it's likely the errors were captured there instead.
>>>>>>>>>>>
>>>>>>>>>>> Ryan
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Apr 21, 2017 at 9:19 AM, Ali Nazemian <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> No, everything is fine at the log level. Also, when I checked
>>>>>>>>>>>> resource consumption on the workers, there were still plenty of
>>>>>>>>>>>> resources available!
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Apr 21, 2017 at 10:04 PM, Casey Stella <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Seeing anything in the Storm logs for the workers?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Apr 21, 2017 at 07:41 Ali Nazemian <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> After trying to tune Metron's performance, I have noticed that
>>>>>>>>>>>>>> the rate of failure for the indexing/enrichment topologies is
>>>>>>>>>>>>>> very high (about 95%). However, I can see the messages in
>>>>>>>>>>>>>> Elasticsearch. I have tried to increase the timeout value for
>>>>>>>>>>>>>> the acknowledgement; it didn't fix the problem. I can set the
>>>>>>>>>>>>>> number of acker executors to 0 to temporarily fix the problem,
>>>>>>>>>>>>>> which is not a good idea at all. Do you have any idea what
>>>>>>>>>>>>>> could have caused this issue? The failure percentage decreases
>>>>>>>>>>>>>> when I reduce the parallelism, but even without any parallelism
>>>>>>>>>>>>>> it is still high!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Ali
>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> A.Nazemian

--
A.Nazemian
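For reference, the Storm knobs mentioned in this thread map onto Storm's Config API roughly as in the sketch below. This is only illustrative: in Metron these values are normally managed through Ambari or the topology properties rather than set in code, and the max-spout-pending and acker counts are placeholders rather than values taken from the thread (only the 300s timeout is).

// Sketch only: the tuning knobs discussed in this thread, expressed via Storm's Config API.
import org.apache.storm.Config;

public class TopologyTuningSketch {
    public static Config sketch() {
        Config conf = new Config();
        conf.setMessageTimeoutSecs(300); // topology.message.timeout.secs: raised from 30s to 300s in the thread
        conf.setMaxSpoutPending(500);    // topology.max.spout.pending: placeholder value; caps un-acked tuples per spout
        conf.setNumAckers(4);            // acker executors: placeholder; setting 0 disables acking entirely,
                                         // which hides the failures rather than fixing them
        return conf;
    }
}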
