RE: Profiler doubt

2020-01-28 Thread Gonçalo Pedras
I only restarted before running the first test, since all the configurations are 
the same in the three tests.



Re: Profiler doubt

2020-01-28 Thread Nick Allen
Are you restarting the topology between all of these tests?

On Tue, Jan 28, 2020 at 11:09 AM Gonçalo Pedras 
wrote:



RE: Profiler doubt

2020-01-28 Thread Gonçalo Pedras
Hi,
This profiler is really inconsistent; I’m going crazy right now.
I’ve investigated further and this is really bugging me:

1.   I’m not expecting to receive 15-hour-old messages. In fact, I’m the one 
picking messages from the current time and sending them to Kafka. For 
instance, if it’s 15h33 GMT, I would pick a message like this one: 
“<182>Jan 28 2020 15:33:14 # : %ASA-6-305011: Built dynamic TCP 
translation from ###/48678 to /48678” and send it to Kafka.

2.   These messages are successfully parsed, because I can find them in the 
“enrichments” topic in Kafka, and the parsed messages have the right 
“timestamp” field. So the problem is not in the messages themselves. (The 
“timestamp” field holds the syslog timestamp.)

3.   The results of the Profile Client are really off.
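As a sanity check on point 2, the syslog header timestamp in the sample message should map to the epoch-millis value in the parsed “timestamp” field. A minimal Python sketch of that conversion (assuming the header is in GMT, as stated above; the function name is illustrative, not Metron code):

```python
from datetime import datetime, timezone

def syslog_header_to_epoch_millis(header: str) -> int:
    """Parse a Cisco ASA syslog header like 'Jan 28 2020 15:33:14' (GMT)
    into epoch milliseconds, the form Metron stores in 'timestamp'."""
    dt = datetime.strptime(header, "%b %d %Y %H:%M:%S")
    dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

# The sample message's header from point 1:
print(syslog_header_to_epoch_millis("Jan 28 2020 15:33:14"))  # 1580225594000
```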

I ran a test:

· I sent 4 messages at 14h18 and 5 messages at 14h25; all the messages 
have the same syslog severity.
If my profiler runs every 15 minutes, then for the 14h15 to 14h30 window the 
result should be 9:

{period.start=158022090, period=1755801, 
profile=ClientA_syslog_severety_count, period.end=158022180, groups=[], 
value=9, entity=info}
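For what it’s worth, the numbers in that result are internally consistent with a 15-minute (900 000 ms) profile period, where the period id is the epoch-millis start time divided by the period duration. The period.start/period.end values above appear truncated; assuming the full start value is 1580220900000 ms (Tue Jan 28 2020 14:15:00 GMT), the arithmetic checks out:

```python
PERIOD_MILLIS = 15 * 60 * 1000  # 900000 ms, a 15-minute profile period

# Assumed full value of the truncated period.start shown above.
period_start = 1580220900000    # Tue Jan 28 2020 14:15:00 GMT

period_id = period_start // PERIOD_MILLIS
print(period_id)  # 1755801, matching 'period' in the result above
```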

Surprisingly, it’s right. Then I ran a second test:

· I sent 4 messages at 14h41 and 3 messages at 14h48; all the messages 
have the same syslog severity.
So the result should be 7. Here’s the actual result:

{period.start=158022180, period=1755802, 
profile=ClientA_syslog_severety_count, period.end=158022270, groups=[], 
value=9, entity=info}

I ran a third test:

· Sent 3 messages at 15h51.
The profiler returned a count of zero:

{period.start=158022630, period=1755807, 
profile=ClientA_syslog_severety_count, period.end=158022720, groups=[], 
value=0, entity=info}

I checked the Kafka topics to make sure there weren’t more messages than there 
should have been. Everything is consistent except the profiler. I’m about to 
nuke myself.

Thanks



Re: Profiler doubt

2020-01-28 Thread Nick Allen
Prior to this point in time, the Profiler had received a message indicating
that the current time is Mon Jan 27 2020 17:46:44 GMT.  It then received a
message with a timestamp of Tue Jan 28 2020 09:02:52 GMT, about 15 hours in
the future.  Since this time gap is significantly larger than your profile
period, a log message is written to warn you of a data quality issue. I can
think of a few possible causes of this that you will want to investigate
further.

(1) There is a large time gap in the data that you are processing.  Is this
expected?  Was there an outage that caused an interruption in your data
feed?  If the outage is expected, then you can safely ignore this warning.

(2) Your data is significantly out-of-order for some reason. You can
accommodate some out-of-order data by adjusting the Profiler's time lag,
but it does not seem reasonable to account for 15 hours.

(3) If your profile is processing telemetry from different sources, perhaps
there is a timestamp in one of these sources that is significantly
different than all the others.
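The warning Gonçalo found in the Storm logs can be reproduced with a simple watermark-style check. The sketch below is illustrative only, not Metron's actual FixedFrequencyFlushSignal implementation: it tracks the largest event timestamp seen so far and flags any event whose timestamp differs from it by more than the flush frequency, with the sign convention matching the quoted log line (maxKnown minus the new timestamp).

```python
class OutOfOrderDetector:
    """Illustrative watermark-style out-of-order check; not Metron's
    actual FixedFrequencyFlushSignal code."""

    def __init__(self, flush_freq_millis: int):
        self.flush_freq_millis = flush_freq_millis
        self.max_known = None  # largest event timestamp seen so far

    def update(self, timestamp_millis: int):
        """Record an event; return a warning string if it is badly skewed."""
        warning = None
        if self.max_known is not None:
            skew = self.max_known - timestamp_millis
            if abs(skew) > self.flush_freq_millis:
                warning = f"Timestamp out-of-order by {skew} ms"
        if self.max_known is None or timestamp_millis > self.max_known:
            self.max_known = timestamp_millis
        return warning

detector = OutOfOrderDetector(flush_freq_millis=900000)
detector.update(1580147204000)         # Mon Jan 27 17:46:44 GMT; nothing to compare
print(detector.update(1580202172000))  # Tue Jan 28 09:02:52 GMT, ~15 h ahead
# prints: Timestamp out-of-order by -54968000 ms
```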

Hope this helps


On Tue, Jan 28, 2020 at 5:00 AM Gonçalo Pedras 
wrote:



RE: Profiler doubt

2020-01-28 Thread Gonçalo Pedras
Hi again,
I found something in the profiler storm logs that proves the delay:
“2020-01-28 09:46:37.061 o.a.m.p.s.FixedFrequencyFlushSignal 
watermark-event-generator-0 [WARN] Timestamp out-of-order by -54968000 ms. This 
may indicate a problem in the data. timestamp=1580202172000, 
maxKnown=1580147204000, flushFreq=90 ms”

The profiler is delayed by 15 and a half hours.
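The size of the delay follows directly from the two timestamps in that warning; this is plain arithmetic, no Metron code involved:

```python
timestamp = 1580202172000  # Tue Jan 28 2020 09:02:52 GMT, from the log line
max_known = 1580147204000  # Mon Jan 27 2020 17:46:44 GMT, from the log line

delta_ms = timestamp - max_known
print(delta_ms)                      # 54968000 ms, the gap reported in the log
print(round(delta_ms / 3600000, 2))  # about 15.27 hours
```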


RE: Profiler doubt

2020-01-27 Thread Gonçalo Pedras
Hi Allen,

Thanks for the reply by the way.
I’ve been checking my profiler, tweaking some options and whatnot. I’ve set the 
“timestampField” and solved half the issue.
I ran the Spark batch profiler and it rectified the counts. Then I started the 
Storm profiler once again. Now the profiler is delayed by an hour and a half 
and is counting the same record twice. I’ve restarted it and reinstalled it, 
and it stays delayed somehow. I even dropped the HBase table and created a new 
one, and the Storm profiler is still delayed.
I’m sending the records myself, so it’s just 2 to 10 records at a time, and the 
topology logs in the Storm UI show it’s actually doing its job. The system 
time is synchronized and the ASA records are generated in the same timezone. 
I’m running out of options here.

Thanks


Re: Profiler doubt

2020-01-23 Thread Nick Allen
Hi Gonçalo -

What could be happening is that your Profiler is not tuned to be able to
keep up with the amount of incoming data that you have. I would guess that
the Profiler "keeps counting beyond that period of time" because it is
still processing old data that is queued up.

   - How much data is hitting your indexing topic (events per second)?
   - Have you tried to tune any of the Profiler settings?

Here are some other things to keep in mind.

   - Check the Storm UI to see what the Profiler topology is doing and
   whether it is keeping up.
   - Increase the resources available to the Profiler so that it can keep
   up with the incoming data.
   - Use "event time" processing (set the "timestamp field"), which will
   ensure that the profiles are written using the timestamps contained within
   your data, rather than wall clock time.  This will prevent your profiles
   from becoming skewed if processing falls behind.
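For reference, event-time processing is switched on by naming the timestamp field in the profiler configuration. A sketch of what such a profiler.json might look like, based on the standard Metron profiler config format; the profile body and foreach expression here are illustrative guesses from the profile name and entity seen later in this thread, not Gonçalo's actual profile:

```json
{
  "profiles": [
    {
      "profile": "ClientA_syslog_severety_count",
      "foreach": "syslog_severity",
      "init":    { "count": "0" },
      "update":  { "count": "count + 1" },
      "result":  "count"
    }
  ],
  "timestampField": "timestamp"
}
```

With "timestampField" set, the Profiler assigns each message to a period using the message's own "timestamp" value rather than the processing time.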

Best of luck

On Mon, Jan 20, 2020 at 6:49 AM Gonçalo Pedras 
wrote:

> Hi,
>
> I’ve deployed Apache Metron with HDP 3.1 support provided by the GitHub
> repository (
> https://github.com/apache/metron/blob/feature/METRON-2088-support-hdp-3.1
> ).
>
> I have some questions about the Profiler and am somewhat confused. I’m
> testing the ASA parser and I’ve deployed two profiles:
>
> 1.   Counting ip_src_addr.
>
> 2.   Counting syslog_severity.
>
> The profiler properties have the default settings.
>
> I ran the parser last Friday for a couple of seconds and it generated
> about three thousand records. Today I ran ‘PROFILER_GET’ in Stellar for
> a ‘PROFILE_FIXED’ of 72 hours and checked it against the Elasticsearch
> index, and I realised the counts don’t match. For example, for a specific
> IP source “a” in that period of time I got 21 hits, while ‘PROFILER_GET’
> returned a stream of results that make no sense to me. My source of ASA
> records wasn’t sending anything to Kafka, and somehow the profiler managed
> to keep counting beyond that period of time. Where the result should be
> something like [21], it returned [27, 27, 27, 54, 27, 27, …].
> My question is:
>
> · Is the Profiler working fine? And if it is, can someone explain
> it to me?
>
> · And if it is not working well, what is the problem, and how do I
> fix it?
>
>
>
> Thanks
>
>
>