Re: Stellar and distinct count

2020-01-28 Thread Yerex, Tom
Hi Nick, Thank you for putting time into this example. I am beginning to suspect the only thing that you and others on the list are worried about is kryptonite. Thanks for your help, Tom. From: Nick Allen Reply-To: "user@metron.apache.org" Date: Tuesday, January 28, 2020 at 8:15

RE: Profiler doubt

2020-01-28 Thread Gonçalo Pedras
I only restarted before running the first test, since all the configurations are the same in the three tests.

Re: Profiler doubt

2020-01-28 Thread Nick Allen
Are you restarting the topology between all of these tests? On Tue, Jan 28, 2020 at 11:09 AM Gonçalo Pedras wrote: > Hi, > > This profiler is really inconsistent, I’m going crazy right now. > > I’ve made a further investigation and this is really bugging my mind: > > 1. I’m not expecting

Re: Stellar and distinct count

2020-01-28 Thread Nick Allen
Hi Tom - > If the login points are geographic outliers, then: > - Check if there are more than two distinct IP addresses that the person has used; and > - If there are more than two distinct IP addresses, increase the score. I believe you could do something like the following. This would touch

RE: Profiler doubt

2020-01-28 Thread Gonçalo Pedras
Hi, This profiler is really inconsistent, I’m going crazy right now. I’ve made a further investigation and this is really bugging my mind: 1. I’m not expecting to receive 15-hour-old messages. In fact I’m the one who’s picking the messages from the current time and sending them to Kafka,

Re: How to choose the "topology.max.spout.pending" value for the profiler topology depending on the number of events?

2020-01-28 Thread Nick Allen
Hi Vladimir - > Why we didn't see any emitted data from builderBolt with small "topology.max.spout.pending"? This is because the topology.max.spout.pending setting prevented the spout from consuming additional messages before there were enough messages for Storm's event time queue to flush a set

Re: Profiler doubt

2020-01-28 Thread Nick Allen
Prior to this point in time, the Profiler had received a message indicating that the current time is Mon Jan 27 2020 17:46:44 GMT. It then received a message with a timestamp of Tue Jan 28 2020 09:02:52 GMT, about 15 hours in the future. Since this time gap is significantly larger than your

How to choose the "topology.max.spout.pending" value for the profiler topology depending on the number of events?

2020-01-28 Thread Vladimir Mikhailov
We are trying to tune performance for the profiler topology now. In the config file for the profiler there are not many parameters to do this. Therefore we've tried to change "topology.max.spout.pending". And we can't understand how profiler performance depends on this parameter. We have about 6000-7000

RE: Profiler doubt

2020-01-28 Thread Gonçalo Pedras
Hi again, I found something in the profiler storm logs that proves the delay: “2020-01-28 09:46:37.061 o.a.m.p.s.FixedFrequencyFlushSignal watermark-event-generator-0 [WARN] Timestamp out-of-order by -54968000 ms. This may indicate a problem in the data. timestamp=1580202172000,
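
The numbers in the log line above can be sanity-checked with a short Python sketch. It converts the logged `timestamp` from epoch milliseconds to UTC and expresses the reported offset in hours. Treating the offset as the distance back to the previously seen event time is an assumption made for illustration, not something the log line states, and the names `from_epoch_millis`, `message_time`, `gap`, and `previous_time` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


# Value copied from the log line: timestamp=1580202172000
message_time = from_epoch_millis(1580202172000)

# Magnitude of the reported offset: "out-of-order by -54968000 ms"
gap = timedelta(milliseconds=54968000)

# Assumption: the offset is measured against the previously seen event time.
previous_time = message_time - gap

print(message_time.isoformat())   # 2020-01-28T09:02:52+00:00
print(gap)                        # 15:16:08
print(previous_time.isoformat())  # 2020-01-27T17:46:44+00:00
```

Under that assumption the offset works out to a little over 15 hours.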