So what is perplexing is that the latency is low and the capacity for each bolt is less than 1, meaning the bolts are keeping up. I would have expected this kind of failure if the latency were high and timeouts were happening.
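(For context on why a capacity under 1 reads as "keeping up": the Storm UI computes a bolt's capacity as roughly the executed count times the average execute latency, divided by the length of the measurement window, i.e. the fraction of the window the bolt spent busy executing. A back-of-the-envelope sketch in Java with made-up numbers, not values taken from the screenshots in this thread:)

    public class CapacitySketch {
        public static void main(String[] args) {
            // Illustrative numbers only.
            long executed = 360_000;        // tuples executed in the window
            double executeLatencyMs = 1.2;  // average execute latency per tuple
            double windowMs = 600_000;      // Storm UI's default 10-minute window
            double capacity = executed * executeLatencyMs / windowMs;
            System.out.println(capacity);   // ~0.72: busy ~72% of the window, still keeping up
        }
    }

Anything approaching 1.0 would mean the bolt is saturated, which is not what we are seeing here.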
If you drop the spout pending config lower, do you get to a point with no errors (with obvious consequences to throughput)? Also, how many ackers are you running?

On Sat, Apr 22, 2017 at 00:50 Ali Nazemian <[email protected]> wrote:

> I have disabled the reliability retry by setting the number of acker executors to zero. Based on the number of tuples emitted by the indexing topologies and the number of documents in Elasticsearch, there are almost no missing documents. It seems that for some reason the acker executors cannot pick up the acknowledgements for the indexing and enrichment topologies, even though the data can be seen at the destination of those topologies.
>
> I am also wondering what the best approach would be for finding the failed tuples. I thought I could find them in the corresponding error topics, but that does not seem to be the case.
>
> On Sat, Apr 22, 2017 at 2:36 PM, Ali Nazemian <[email protected]> wrote:
>
>> Does the following ring any bell?
>>
>> There are no failures at the bolt-level acknowledgement, but according to the topology status the failure rate is very high! This is the same scenario for both the indexing and enrichment topologies.
>>
>> On Sat, Apr 22, 2017 at 2:29 PM, Ali Nazemian <[email protected]> wrote:
>>
>>> The value for topology.max.spout.pending is currently 1000. I did decrease it previously to understand the effect of that value on my problem. Clearly, throughput dropped, but the failure rate was still very high!
>>>
>>> On Sat, Apr 22, 2017 at 3:12 AM, Casey Stella <[email protected]> wrote:
>>>
>>>> Ok, so ignoring the indexing topology, the fact that you're seeing failures in the enrichment topology, which has no ES component, is telling. It's also telling that the enrichment topology stats are perfectly sensible latency-wise (i.e. it's not sweating).
>>>>
>>>> What's your storm configuration for topology.max.spout.pending? If it's not set, then try setting it to 1000 and bouncing the topologies.
>>>>
>>>> On Fri, Apr 21, 2017 at 12:54 PM, Ali Nazemian <[email protected]> wrote:
>>>>
>>>>> No, nothing ...
>>>>>
>>>>> On Sat, Apr 22, 2017 at 2:46 AM, Casey Stella <[email protected]> wrote:
>>>>>
>>>>>> Anything going on in the kafka broker logs?
>>>>>>
>>>>>> On Fri, Apr 21, 2017 at 12:24 PM, Ali Nazemian <[email protected]> wrote:
>>>>>>
>>>>>>> Although this is a test platform with a much lower spec than production, it should be enough for indexing 600 docs per second. I have seen benchmark results of 150-200k docs per second on this kind of spec! I haven't played with tuning the template yet, but I still think the current rate does not make sense at all.
>>>>>>>
>>>>>>> I have changed the batch size to 100. Throughput has dropped, but the failure rate is still very high!
>>>>>>>
>>>>>>> Please find the screenshots for the enrichments:
>>>>>>> http://imgur.com/a/ceC8f
>>>>>>> http://imgur.com/a/sBQwM
>>>>>>>
>>>>>>> On Sat, Apr 22, 2017 at 2:08 AM, Casey Stella <[email protected]> wrote:
>>>>>>>
>>>>>>>> Ok, yeah, those latencies are pretty high. I think what's happening is that the tuples aren't being acked fast enough and are timing out. How taxed is your ES box? Can you drop the batch size down to maybe 100 and see what happens?
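Incidentally, for anyone following along with the tuning: the knobs this thread keeps coming back to map onto plain Storm Config settings. The snippet below is just a sketch against the vanilla Storm Java API with placeholder values; Metron normally drives these through its topology properties rather than code, so treat it as illustrative only.

    import org.apache.storm.Config;

    public class TuningKnobsSketch {
        public static Config tunedConfig() {
            Config conf = new Config();
            // Cap on un-acked tuples in flight per spout task
            // (the thread experiments with values around 1000).
            conf.setMaxSpoutPending(1000);
            // Tuples not fully acked within this window are reported as failed
            // and replayed (the thread tries 30s and then 300s).
            conf.setMessageTimeoutSecs(300);
            // Number of acker executors; 0 disables tracking entirely,
            // which hides failures rather than fixing them.
            conf.setNumAckers(4);
            return conf;
        }
    }

Whatever values you land on, the topology has to be resubmitted (bounced) for them to take effect.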
>>>>>>>> On Fri, Apr 21, 2017 at 12:05 PM, Ali Nazemian <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Please find the bolt section of the Storm UI for the indexing topology:
>>>>>>>>>
>>>>>>>>> http://imgur.com/a/tFkmO
>>>>>>>>>
>>>>>>>>> As you can see, an HDFS error has also appeared, which is not important right now.
>>>>>>>>>
>>>>>>>>> On Sat, Apr 22, 2017 at 1:59 AM, Casey Stella <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> What's curious is that the enrichment topology is showing the same issues, but my mind went to ES as well.
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 21, 2017 at 11:57 AM, Ryan Merriman <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes, which bolt is reporting all those failures? My theory is that there is some ES tuning that needs to be done.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Apr 21, 2017 at 10:53 AM, Casey Stella <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Could I see a little more of that screen? Specifically what the bolts look like.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Apr 21, 2017 at 11:51 AM, Ali Nazemian <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Please find the Storm UI screenshot here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://imgur.com/FhIrGFd
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Apr 22, 2017 at 1:41 AM, Ali Nazemian <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Casey,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - topology.message.timeout: it was 30s at first. I have increased it to 300s; no change!
>>>>>>>>>>>>>> - It is a very basic geo-enrichment and a simple rule for threat triage!
>>>>>>>>>>>>>> - No, not at all.
>>>>>>>>>>>>>> - I have changed that while looking for the best value. It is currently 5000, which comes to about 5MB.
>>>>>>>>>>>>>> - I have changed the number of executors for the Storm acker thread, and I have also changed the value of topology.max.spout.pending; still no change!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Apr 22, 2017 at 1:24 AM, Casey Stella <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also,
>>>>>>>>>>>>>>> * What's your setting for topology.message.timeout?
>>>>>>>>>>>>>>> * You said you're seeing this in indexing and enrichment; what enrichments do you have in place?
>>>>>>>>>>>>>>> * Is ES being taxed heavily?
>>>>>>>>>>>>>>> * What's your ES batch size for the sensor?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Apr 21, 2017 at 10:46 AM, Casey Stella <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So you're seeing failures in the storm topology but no errors in the logs. Would you mind sending over a screenshot of the indexing topology from the storm UI? You might not be able to paste the image on the mailing list, so maybe an imgur link would be in order.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Casey
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Apr 21, 2017 at 10:34 AM, Ali Nazemian <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Ryan,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> No, I cannot see any errors in the indexing error topic. Also, the number of tuples emitted and transferred to the error indexing bolt is zero!
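A side note on hunting for the failed tuples mentioned earlier: one quick sanity check is to tail the error topic directly and see whether anything lands on it at all. Below is a minimal consumer sketch; the broker address and the topic name "indexing_error" are placeholders for whatever your error writer is actually configured with, and the poll(long) call assumes a kafka-clients 0.10.x-era client.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ErrorTopicTail {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:6667");  // placeholder broker
            props.put("group.id", "error-topic-check");
            props.put("auto.offset.reset", "earliest");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("indexing_error"));  // placeholder topic
                ConsumerRecords<String, String> records = consumer.poll(10_000L);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());  // error documents, if any, are JSON strings
                }
            }
        }
    }

If that comes back empty while the Storm UI still reports failures, it points at timeouts rather than genuine processing errors, which matches what we're seeing.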
>>>>>>>>>>>>>>>>> On Sat, Apr 22, 2017 at 12:29 AM, Ryan Merriman <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Do you see any errors in the error* index in Elasticsearch? There are several catch blocks across the different topologies that transform errors into JSON objects and forward them on to the indexing topology. If you're not seeing anything in the worker logs, it's likely the errors were captured there instead.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Apr 21, 2017 at 9:19 AM, Ali Nazemian <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> No, everything is fine at the log level. Also, when I checked resource consumption on the workers, there were still plenty of resources available!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Apr 21, 2017 at 10:04 PM, Casey Stella <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Seeing anything in the storm logs for the workers?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Apr 21, 2017 at 07:41 Ali Nazemian <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> After trying to tune Metron's performance, I have noticed that the failure rate for the indexing/enrichment topologies is very high (about 95%). However, I can see the messages in Elasticsearch. I have tried to increase the acknowledgement timeout value; it didn't fix the problem. I can set the number of acker executors to 0 to temporarily hide the problem, which is not a good idea at all. Do you have any idea what could have caused such an issue? The failure percentage decreases when I reduce the parallelism, but even without any parallelism it is still high!
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>> Ali
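P.S. On Ryan's point about the error* index: a quick way to peek at it without any extra tooling is a plain HTTP search against Elasticsearch. A rough sketch; the host and port are placeholders, and the error* pattern simply follows the index naming Ryan mentions above.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ErrorIndexPeek {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port; point this at one of your Elasticsearch nodes.
            URL url = new URL("http://localhost:9200/error*/_search?size=5&pretty");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);  // any captured error documents show up here
                }
            } finally {
                conn.disconnect();
            }
        }
    }

If that search is empty and the worker logs are clean, the failures in the UI are almost certainly timeout-driven rather than real processing errors.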
