Anything going on in the Kafka broker logs?

On Fri, Apr 21, 2017 at 12:24 PM, Ali Nazemian <[email protected]> wrote:
Although this is a test platform with a much lower spec than production, it
should be enough for indexing 600 docs per second. I have seen benchmark
results of 150-200k docs per second with this spec! I haven't played with
tuning the template yet, but I still think the current rate does not make
sense at all.

I have changed the batch size to 100. Throughput has dropped, but there is
still a very high rate of failure!

Please find the screenshots for the enrichments:
http://imgur.com/a/ceC8f
http://imgur.com/a/sBQwM

On Sat, Apr 22, 2017 at 2:08 AM, Casey Stella <[email protected]> wrote:
Ok, yeah, those latencies are pretty high. I think what's happening is that
the tuples aren't being acked fast enough and are timing out. How taxed is
your ES box? Can you drop the batch size down to maybe 100 and see what
happens?

On Fri, Apr 21, 2017 at 12:05 PM, Ali Nazemian <[email protected]> wrote:
Please find the bolt part of the Storm UI related to the indexing topology:

http://imgur.com/a/tFkmO

As you can see, an HDFS error has also appeared, which is not important
right now.

On Sat, Apr 22, 2017 at 1:59 AM, Casey Stella <[email protected]> wrote:
What's curious is the enrichment topology showing the same issues, but my
mind went to ES as well.

On Fri, Apr 21, 2017 at 11:57 AM, Ryan Merriman <[email protected]> wrote:
Yes, which bolt is reporting all those failures? My theory is that there is
some ES tuning that needs to be done.

On Fri, Apr 21, 2017 at 10:53 AM, Casey Stella <[email protected]> wrote:
Could I see a little more of that screen? Specifically, what the bolts look
like.

On Fri, Apr 21, 2017 at 11:51 AM, Ali Nazemian <[email protected]> wrote:
Please find the Storm UI screenshot as follows.

http://imgur.com/FhIrGFd

On Sat, Apr 22, 2017 at 1:41 AM, Ali Nazemian <[email protected]> wrote:
Hi Casey,

- topology.message.timeout: It was 30s at first. I have increased it to
  300s, no change!
- It is a very basic geo-enrichment and a simple rule for threat triage!
- No, not at all.
- I have changed that to find the best value. It is 5000, which is about
  5 MB.
- I have changed the number of executors for the Storm acker thread, and I
  have also changed the value of topology.max.spout.pending; still no
  change!

On Sat, Apr 22, 2017 at 1:24 AM, Casey Stella <[email protected]> wrote:
Also,
* What's your setting for topology.message.timeout?
* You said you're seeing this in indexing and enrichment; what enrichments
  do you have in place?
* Is ES being taxed heavily?
* What's your ES batch size for the sensor?

On Fri, Apr 21, 2017 at 10:46 AM, Casey Stella <[email protected]> wrote:
So you're seeing failures in the Storm topology but no errors in the logs.
Would you mind sending over a screenshot of the indexing topology from the
Storm UI? You might not be able to paste the image on the mailing list, so
maybe an imgur link would be in order.

Thanks,

Casey
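For reference, the Storm knobs discussed above (topology.message.timeout.secs,
topology.max.spout.pending, and the number of acker executors) can be set on a
topology's Config before submission. This is only an illustrative sketch
assuming Storm 1.x (org.apache.storm); the timeout value is the one mentioned
in the thread, while the max-spout-pending value of 500 is an arbitrary
example, not a recommendation. In Metron these settings normally come from the
topology's properties/flux files rather than hand-written code.

    import org.apache.storm.Config;

    public class StormTuningSketch {
        public static void main(String[] args) {
            Config conf = new Config();

            // Give tuples longer to be fully processed and acked before Storm
            // considers them failed and replays them (raised from 30s to 300s
            // in this thread).
            conf.setMessageTimeoutSecs(300);

            // Cap the number of un-acked tuples each spout task keeps in
            // flight; lowering this applies back-pressure when the indexing
            // bolt is slow. 500 is an arbitrary example value.
            conf.setMaxSpoutPending(500);

            // Number of acker executors. Setting this to 0 disables acking
            // entirely, which hides failures rather than fixing them.
            conf.setNumAckers(1);

            // The same keys can be set in a properties/flux file:
            // topology.message.timeout.secs, topology.max.spout.pending,
            // topology.acker.executors.
            System.out.println(conf);
        }
    }
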
On Fri, Apr 21, 2017 at 10:34 AM, Ali Nazemian <[email protected]> wrote:
Hi Ryan,

No, I cannot see any errors inside the indexing error topic. Also, the
number of tuples emitted and transferred to the error indexing bolt is
zero!

On Sat, Apr 22, 2017 at 12:29 AM, Ryan Merriman <[email protected]> wrote:
Do you see any errors in the error* index in Elasticsearch? There are
several catch blocks across the different topologies that transform errors
into JSON objects and forward them on to the indexing topology. If you're
not seeing anything in the worker logs, it's likely the errors were
captured there instead.

Ryan

On Fri, Apr 21, 2017 at 9:19 AM, Ali Nazemian <[email protected]> wrote:
No, everything is fine at the log level. Also, when I checked resource
consumption on the workers, there were still plenty of resources available!

On Fri, Apr 21, 2017 at 10:04 PM, Casey Stella <[email protected]> wrote:
Seeing anything in the Storm logs for the workers?

On Fri, Apr 21, 2017 at 07:41, Ali Nazemian <[email protected]> wrote:
Hi all,

After I tried to tune Metron's performance, I noticed that the rate of
failure for the indexing/enrichment topologies is very high (about 95%).
However, I can see the messages in Elasticsearch. I have tried to increase
the timeout value for the acknowledgement; it didn't fix the problem. I can
set the number of acker executors to 0 to temporarily fix the problem,
which is not a good idea at all. Do you have any idea what could have
caused this issue? The percentage of failures decreases when I reduce the
degree of parallelism, but even without any parallelism it is still high!

Cheers,
Ali
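A rough way to see why a large ES batch size interacts badly with the message
timeout, as Casey suggests above: at the ~600 docs/sec rate mentioned in the
thread, a 5000-document batch takes several seconds just to fill, and a tuple
queued behind a few unflushed batches can wait longer than a 30-second timeout,
so Storm replays it even though the message eventually reaches Elasticsearch.
The back-of-the-envelope sketch below only does that arithmetic; the rate,
batch size, and timeout are the numbers from this thread, and the assumed
backlog of three batches is illustrative, not a measurement.

    public class AckTimeoutEstimate {
        public static void main(String[] args) {
            double docsPerSecond = 600.0;   // ingest rate reported in the thread
            int batchSize = 5000;           // original ES batch size
            int messageTimeoutSecs = 30;    // original topology.message.timeout.secs
            int batchesQueuedAhead = 3;     // assumed backlog in front of a tuple

            // Time just to accumulate one full batch at this rate, ignoring
            // the time Elasticsearch then needs to index it.
            double secondsToFillBatch = batchSize / docsPerSecond;

            // Worst-case wait for a tuple that arrives right after a flush and
            // has a few full batches queued ahead of it.
            double worstCaseWaitSecs = secondsToFillBatch * (1 + batchesQueuedAhead);

            System.out.printf("Seconds to fill one batch: %.1f%n", secondsToFillBatch);
            System.out.printf("Worst-case wait before ack: %.1f%n", worstCaseWaitSecs);
            System.out.println("Exceeds message timeout: "
                    + (worstCaseWaitSecs > messageTimeoutSecs));
        }
    }

With these numbers the worst-case wait comes out around 33 seconds, which is
past the default 30-second timeout; dropping the batch size to 100, as
suggested earlier in the thread, shrinks that wait dramatically.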

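Ali also mentions checking the indexing error topic directly. One way to
spot-check such a topic from a workstation is a small consumer like the sketch
below. This is a generic kafka-clients example, not Metron code, and the
broker address ("node1:6667") and topic name ("indexing_error") are
placeholders for whatever the deployment actually uses.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ErrorTopicCheck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "node1:6667");   // placeholder broker
            props.put("group.id", "error-topic-spot-check");
            props.put("auto.offset.reset", "earliest");     // read from the beginning
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Placeholder topic name; use whatever topic the deployment
                // routes indexing errors to.
                consumer.subscribe(Collections.singletonList("indexing_error"));
                ConsumerRecords<String, String> records = consumer.poll(10000);
                System.out.println("Error records found: " + records.count());
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());
                }
            }
        }
    }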