No, nothing ...

On Sat, Apr 22, 2017 at 2:46 AM, Casey Stella <[email protected]> wrote:
Anything going on in the kafka broker logs?

On Fri, Apr 21, 2017 at 12:24 PM, Ali Nazemian <[email protected]> wrote:

Although this is a test platform with a much lower spec than production, it should be enough for indexing 600 docs per second. I have seen benchmark results of 150-200k docs per second with this spec! I haven't played with tuning the template yet, but I still think the current rate does not make sense at all.

I have changed the batch size to 100. Throughput has dropped, but there is still a very high rate of failure!

Please find the screenshots for the enrichments:
http://imgur.com/a/ceC8f
http://imgur.com/a/sBQwM

On Sat, Apr 22, 2017 at 2:08 AM, Casey Stella <[email protected]> wrote:

Ok, yeah, those latencies are pretty high. I think what's happening is that the tuples aren't being acked fast enough and are timing out. How taxed is your ES box? Can you drop the batch size down to maybe 100 and see what happens?

On Fri, Apr 21, 2017 at 12:05 PM, Ali Nazemian <[email protected]> wrote:

Please find the bolt part of the Storm UI related to the indexing topology:

http://imgur.com/a/tFkmO

As you can see, an HDFS error has also appeared, which is not important right now.

On Sat, Apr 22, 2017 at 1:59 AM, Casey Stella <[email protected]> wrote:

What's curious is the enrichment topology showing the same issues, but my mind went to ES as well.

On Fri, Apr 21, 2017 at 11:57 AM, Ryan Merriman <[email protected]> wrote:

Yes, which bolt is reporting all those failures? My theory is that there is some ES tuning that needs to be done.

On Fri, Apr 21, 2017 at 10:53 AM, Casey Stella <[email protected]> wrote:

Could I see a little more of that screen? Specifically what the bolts look like.

On Fri, Apr 21, 2017 at 11:51 AM, Ali Nazemian <[email protected]> wrote:

Please find the Storm UI screenshot as follows.

http://imgur.com/FhIrGFd

On Sat, Apr 22, 2017 at 1:41 AM, Ali Nazemian <[email protected]> wrote:

Hi Casey,

- topology.message.timeout: It was 30s at first. I have increased it to 300s, no changes!
- It is a very basic geo-enrichment and a simple rule for threat triage!
- No, not at all.
- I have changed that to find the best value. It is 5000, which is about 5 MB.
- I have changed the number of executors for the Storm acker thread, and I have also changed the value of topology.max.spout.pending, still no changes!
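As a back-of-envelope illustration of the timing-out hypothesis above, here is a minimal sketch using the figures quoted in this thread (the 5000 batch size, roughly 600 docs per second, and the original 30 s message timeout); the writer parallelism of 4 is a made-up number for illustration only:

    # Figures quoted in this thread; writer_parallelism is an assumption.
    docs_per_sec = 600        # overall ingest rate
    batch_size = 5000         # ES writer batch size before it was dropped to 100
    timeout_secs = 30         # original topology.message.timeout.secs
    writer_parallelism = 4    # hypothetical number of indexing writer executors

    # Each writer only sees its share of the stream, so one batch takes this long to fill:
    fill_secs = batch_size / (docs_per_sec / writer_parallelism)
    print(f"~{fill_secs:.0f}s to fill one batch per writer")  # ~33s on these numbers

    # Tuples sitting in an unflushed batch stay un-acked; once the fill time (plus the
    # ES bulk latency) exceeds the message timeout, the spout replays them and Storm
    # counts a failure even though the document is eventually indexed.
    if fill_secs > timeout_secs:
        print("tuples time out before the batch flushes")
    else:
        print("batch flushes within the timeout")

On those assumed numbers a tuple would be replayed before its batch is ever flushed, which would show up as Storm failures even though the documents still reach Elasticsearch. It does not explain why raising the timeout to 300 s changed nothing, so slow ES bulk responses remain the other suspect.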
On Sat, Apr 22, 2017 at 1:24 AM, Casey Stella <[email protected]> wrote:

Also,
* what's your setting for topology.message.timeout?
* You said you're seeing this in indexing and enrichment, what enrichments do you have in place?
* Is ES being taxed heavily?
* What's your ES batch size for the sensor?

On Fri, Apr 21, 2017 at 10:46 AM, Casey Stella <[email protected]> wrote:

So you're seeing failures in the storm topology but no errors in the logs. Would you mind sending over a screenshot of the indexing topology from the storm UI? You might not be able to paste the image on the mailing list, so maybe an imgur link would be in order.

Thanks,

Casey

On Fri, Apr 21, 2017 at 10:34 AM, Ali Nazemian <[email protected]> wrote:

Hi Ryan,

No, I cannot see any error inside the indexing error topic. Also, the number of tuples emitted and transferred to the error indexing bolt is zero!

On Sat, Apr 22, 2017 at 12:29 AM, Ryan Merriman <[email protected]> wrote:

Do you see any errors in the error* index in Elasticsearch? There are several catch blocks across the different topologies that transform errors into JSON objects and forward them on to the indexing topology. If you're not seeing anything in the worker logs, it's likely the errors were captured there instead.

Ryan

On Fri, Apr 21, 2017 at 9:19 AM, Ali Nazemian <[email protected]> wrote:

No, everything is fine at the log level. Also, when I checked resource consumption on the workers, there were still plenty of resources available!

On Fri, Apr 21, 2017 at 10:04 PM, Casey Stella <[email protected]> wrote:

Seeing anything in the storm logs for the workers?

On Fri, Apr 21, 2017 at 07:41 Ali Nazemian <[email protected]> wrote:

Hi all,

After trying to tune Metron performance, I have noticed the rate of failure for the indexing/enrichment topologies is very high (about 95%). However, I can see the messages in Elasticsearch. I have tried to increase the timeout value for the acknowledgement; it didn't fix the problem. I can set the number of acker executors to 0 to temporarily fix the problem, which is not a good idea at all. Do you have any idea what has caused this issue? The percentage of failures decreases by reducing the parallelism, but even without any parallelism, it is still high!

Cheers,
Ali
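For completeness, a minimal sketch of how to double-check the error* index mentioned above with a plain search request; the host and port are assumptions for a default Elasticsearch install, and the fields of the captured error documents vary by Metron version:

    import requests

    ES_URL = "http://localhost:9200"   # hypothetical ES endpoint; adjust for your cluster

    # Errors captured by the topologies are indexed as JSON documents, so a wildcard
    # search over the error indices shows whether anything was written there at all.
    resp = requests.get(f"{ES_URL}/error*/_search", params={"size": 5})
    resp.raise_for_status()
    hits = resp.json()["hits"]
    print("error documents found:", hits["total"])
    for hit in hits["hits"]:
        print(hit["_index"], hit["_source"])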

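Similarly, the per-bolt acked/failed counters being compared through Storm UI screenshots in this thread can be pulled from the Storm UI REST API; this is only a sketch, and the UI address and the 10-minute window are assumptions:

    import requests

    STORM_UI = "http://localhost:8744"   # hypothetical Storm UI address; stock Storm defaults to 8080

    summary = requests.get(f"{STORM_UI}/api/v1/topology/summary").json()
    for topo in summary["topologies"]:
        detail = requests.get(f"{STORM_UI}/api/v1/topology/{topo['id']}",
                              params={"window": "600"}).json()
        print(topo["name"])
        for bolt in detail.get("bolts", []):
            # Same acked/failed numbers the Storm UI shows per bolt.
            print("  ", bolt["boltId"], "acked:", bolt["acked"], "failed:", bolt["failed"])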