Ok, yeah, those latencies are pretty high.  I think what's happening is
that the tuples aren't being acked fast enough and are timing out.  How
taxed is your ES box?  Can you drop the batch size down to maybe 100 and
see what happens?
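
Something along these lines might be worth trying. This is only a minimal
sketch -- the file path and the exact config layout are assumptions, since
they vary by Metron version, and the authoritative copy of the config lives
in ZooKeeper:

import json

# Hypothetical on-disk copy of the sensor's indexing config; adjust the
# path to your install.
path = "indexing/yoursensor.json"

with open(path) as f:
    config = json.load(f)

# Cap any "batchSize" entries at 100. The layout differs between Metron
# versions, so this simply looks for the key wherever it appears.
def cap_batch_size(node, limit=100):
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "batchSize" and isinstance(value, int):
                node[key] = min(value, limit)
            else:
                cap_batch_size(value, limit)
    elif isinstance(node, list):
        for item in node:
            cap_batch_size(item, limit)

cap_batch_size(config)

with open(path, "w") as f:
    json.dump(config, f, indent=2)

After pushing the updated config back to ZooKeeper (Metron's
zk_load_configs.sh helper does this), keep an eye on the indexing bolt's
capacity and complete latency in the Storm UI.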

On Fri, Apr 21, 2017 at 12:05 PM, Ali Nazemian <[email protected]>
wrote:

> Please find the bolts section of the Storm UI for the indexing topology:
>
> http://imgur.com/a/tFkmO
>
> As you can see, an HDFS error has also appeared, which is not important
> right now.
>
> On Sat, Apr 22, 2017 at 1:59 AM, Casey Stella <[email protected]> wrote:
>
>> What's curious is that the enrichment topology is showing the same issues,
>> but my mind went to ES as well.
>>
>> On Fri, Apr 21, 2017 at 11:57 AM, Ryan Merriman <[email protected]>
>> wrote:
>>
>>> Yes. Which bolt is reporting all those failures?  My theory is that there
>>> is some ES tuning that needs to be done.
>>>
>>> On Fri, Apr 21, 2017 at 10:53 AM, Casey Stella <[email protected]>
>>> wrote:
>>>
>>>> Could I see a little more of that screen?  Specifically, what the bolts
>>>> look like.
>>>>
>>>> On Fri, Apr 21, 2017 at 11:51 AM, Ali Nazemian <[email protected]>
>>>> wrote:
>>>>
>>>>> Please find the Storm UI screenshot below.
>>>>>
>>>>> http://imgur.com/FhIrGFd
>>>>>
>>>>>
>>>>> On Sat, Apr 22, 2017 at 1:41 AM, Ali Nazemian <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Casey,
>>>>>>
>>>>>> - topology.message.timeout: it was 30s at first. I have increased it
>>>>>> to 300s; no change!
>>>>>> - It is a very basic geo enrichment and a simple threat triage rule!
>>>>>> - No, not at all.
>>>>>> - I have tried different values for the ES batch size to find the best
>>>>>> one; it is currently 5000, which is about 5 MB.
>>>>>> - I have changed the number of Storm acker executors, and I have also
>>>>>> changed the value of topology.max.spout.pending; still no change!
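>>>>>>
>>>>>> (As a rough back-of-the-envelope check on these numbers -- the ES
>>>>>> throughput and backlog figures below are only guesses, not
>>>>>> measurements -- one can ask whether a full batch can even flush inside
>>>>>> the message timeout:)
>>>>>>
>>>>>> # Can a full ES batch flush inside the message timeout?
>>>>>> batch_size = 5000        # current ES writer batchSize
>>>>>> es_docs_per_sec = 500    # assumed ES indexing throughput; measure yours
>>>>>> timeout_secs = 300       # topology.message.timeout after the increase
>>>>>> queued_batches = 40      # assumed backlog behind a busy ES cluster
>>>>>>
>>>>>> flush_secs = batch_size / es_docs_per_sec
>>>>>> worst_wait = flush_secs * queued_batches
>>>>>> print(f"one batch: ~{flush_secs:.1f}s, worst queued tuple: ~{worst_wait:.0f}s")
>>>>>> print("tuples will time out" if worst_wait > timeout_secs else "timeout looks OK")
>>>>>>
>>>>>> With a large batch and a loaded ES cluster, tuples at the back of the
>>>>>> queue can exceed even a 300s timeout, which would match Storm reporting
>>>>>> failures while the messages still show up in Elasticsearch.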
>>>>>>
>>>>>> On Sat, Apr 22, 2017 at 1:24 AM, Casey Stella <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Also,
>>>>>>> * What's your setting for topology.message.timeout?
>>>>>>> * You said you're seeing this in indexing and enrichment; what
>>>>>>> enrichments do you have in place?
>>>>>>> * Is ES being taxed heavily?
>>>>>>> * What's your ES batch size for the sensor?
>>>>>>>
>>>>>>> On Fri, Apr 21, 2017 at 10:46 AM, Casey Stella <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> So you're seeing failures in the Storm topology but no errors in
>>>>>>>> the logs.  Would you mind sending over a screenshot of the indexing
>>>>>>>> topology from the Storm UI?  You might not be able to paste the image
>>>>>>>> on the mailing list, so maybe an imgur link would be in order.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Casey
>>>>>>>>
>>>>>>>> On Fri, Apr 21, 2017 at 10:34 AM, Ali Nazemian <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Ryan,
>>>>>>>>>
>>>>>>>>> No, I cannot see any error inside the indexing error topic. Also,
>>>>>>>>> the number of tuples emitted and transferred to the error indexing
>>>>>>>>> bolt is zero!
>>>>>>>>>
>>>>>>>>> On Sat, Apr 22, 2017 at 12:29 AM, Ryan Merriman <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Do you see any errors in the error* index in Elasticsearch?
>>>>>>>>>> There are several catch blocks across the different topologies that
>>>>>>>>>> transform errors into JSON objects and forward them on to the
>>>>>>>>>> indexing topology.  If you're not seeing anything in the worker
>>>>>>>>>> logs, it's likely the errors were captured there instead.
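>>>>>>>>>>
>>>>>>>>>> A quick way to check is to count the documents in that index (a
>>>>>>>>>> minimal sketch; the host, port, and index pattern are assumptions
>>>>>>>>>> for your environment):
>>>>>>>>>>
>>>>>>>>>> import json
>>>>>>>>>> import urllib.request
>>>>>>>>>>
>>>>>>>>>> # Count documents in the error index; the "error*" pattern may
>>>>>>>>>> # differ depending on Metron version and configuration.
>>>>>>>>>> url = "http://localhost:9200/error*/_count"
>>>>>>>>>> with urllib.request.urlopen(url) as resp:
>>>>>>>>>>     print(json.load(resp)["count"], "documents in error*")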
>>>>>>>>>>
>>>>>>>>>> Ryan
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 21, 2017 at 9:19 AM, Ali Nazemian <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> No, everything is fine at the log level. Also, when I checked
>>>>>>>>>>> resource consumption on the workers, there were still plenty of
>>>>>>>>>>> resources available!
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Apr 21, 2017 at 10:04 PM, Casey Stella <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Seeing anything in the Storm logs for the workers?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Apr 21, 2017 at 07:41 Ali Nazemian <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> While trying to tune Metron's performance, I have noticed that
>>>>>>>>>>>>> the failure rate for the indexing/enrichment topologies is very
>>>>>>>>>>>>> high (about 95%). However, I can see the messages in
>>>>>>>>>>>>> Elasticsearch. I have tried increasing the acknowledgement
>>>>>>>>>>>>> timeout value, but it didn't fix the problem. I can set the
>>>>>>>>>>>>> number of acker executors to 0 to work around the problem
>>>>>>>>>>>>> temporarily, which is not a good idea at all. Do you have any
>>>>>>>>>>>>> idea what could have caused this issue? The failure percentage
>>>>>>>>>>>>> decreases when I reduce the parallelism, but even with no
>>>>>>>>>>>>> parallelism it is still high!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Ali
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> A.Nazemian
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> A.Nazemian
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> A.Nazemian
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> A.Nazemian
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> A.Nazemian
>
