[GitHub] incubator-metron issue #408: METRON-608 Mpack to install a single-node test ...

2017-01-24 Thread mattf-horton
Github user mattf-horton commented on the issue:

https://github.com/apache/incubator-metron/pull/408
  
Hi @dlyle65535 , from a design perspective, I totally agree that we should 
have a single MPack that works on any size cluster, and I hope to replace both 
the 4(+)-node and single-node versions in the future with a more general 
solution.  I even have a prototype of such a solution.  My reasons for 
proceeding with the split-off version at this time were entirely pragmatic, 
due to time pressure:

- I really needed a single-node install that works with Ambari, for testing 
other PRs.  
- I spent several days trying to make Elasticsearch work on a single node 
installation, while retaining the generality of the multi-node case, but 
couldn't make it work.  This chewed up a lot of time.
- I put together a unified Mpack that used two different template files for 
Elasticsearch, depending on whether it was for a single-node or multi-node 
install.  That worked just fine on the surface, but was clearly going to 
require in-depth testing of the 1, 2, 3, and 4 node cases.  I do not currently 
have time to thoroughly test all the special cases.
- Doing it this way I only had to test the single-node case, which was 
fairly fast.
- Having made one that works, I'd like to make it available to others.
- Now, if it is not accepted into the code, the high rate of change in 
metron-deployment will soon render it obsolete.

That's pretty much it.  It's a throw-away, but only once we do the work of 
actually producing the generalized solution and testing it thoroughly, which I 
consider lower priority than a number of other, more urgent needs.  Thanks.




[DISCUSS] How to do Sliding Windows in Profiler

2017-01-24 Thread Matt Foley
Hi all,

Casey and I had an interesting chat yesterday, during which we agreed that the 
example code for Outlier Analysis in 
https://github.com/apache/incubator-metron/blob/master/metron-analytics/metron-statistics/README.md
 and the revised example code in 
https://issues.apache.org/jira/browse/METRON-668 (as of 23 January) both do not 
correctly implement the desired Sliding Window model.  This email gives the 
argument for why, and proposes a couple of ways to do it right.  Your input and 
preferences are requested.

 

First, a couple of statements about the STATS object that underlies most 
interesting Profile work:

· The STATS object is a t-digest.  It summarizes a set of data points, 
such as those received during a sampling period, in a way that is nominally 
O(1) in size regardless of the number of input data points, and preserves the 
info about the “shape” of the statistical distribution of those points.  Not 
only info about averages and standard deviations, but also about medians and 
percentiles (which, btw, is a very different kind of information), is preserved 
and can be calculated correctly to within a given error epsilon.  Since it is a 
summary, however, time information is lost.

· STATS objects, these digests of sampling periods, are MERGEABLE, 
meaning if you have a digest from time(1) to time(2), and another digest from 
time(2) to time(3), you can merge them and get a digest that is statistically 
equivalent to a digest covering time(1) to time(3) continuously.

· They are sort of idempotent, in that if you take two identical 
digests and merge them, you get almost the same object.  However, the result 
object will be scaled as summarizing twice the number of input data points.

· Which is why it DOESN’T work to try to merge overlapping sampling 
periods.  To give a crude example, if you have a digest from time(1) to time(3) 
and another digest from time(2) to time(4), and merge them, the samples from 
time(2) to time(3) will be over-represented by a factor of 2x, which should be 
expected to skew the distribution (unless the distribution really is constant 
for all sub-windows – which would mean we don’t need Sliding Windows because 
nothing changes).
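
To make that 2x over-representation concrete, here is a toy Java sketch (an 
editorial illustration, not Metron code; the "digest" is reduced to per-period 
sample counts, since the over-counting argument depends only on counts):

import java.util.HashMap;
import java.util.Map;

public class OverlapDemo {

    // A "digest" reduced to a map of source period -> sample count.
    // Assume each sampling period contributes 100 raw data points.
    static Map<String, Integer> digestOf(String... periods) {
        Map<String, Integer> d = new HashMap<>();
        for (String p : periods) {
            d.merge(p, 100, Integer::sum);
        }
        return d;
    }

    // Merging digests adds their sample counts, just as merging real
    // t-digests combines the points they summarize.
    static Map<String, Integer> merge(Map<String, Integer> a,
                                      Map<String, Integer> b) {
        Map<String, Integer> m = new HashMap<>(a);
        b.forEach((p, n) -> m.merge(p, n, Integer::sum));
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> d13 = digestOf("[t1,t2)", "[t2,t3)"); // time(1)..time(3)
        Map<String, Integer> d24 = digestOf("[t2,t3)", "[t3,t4)"); // time(2)..time(4)
        // Prints {[t1,t2)=100, [t2,t3)=200, [t3,t4)=100} (in some order):
        // the overlap [t2,t3) is represented twice, skewing the distribution.
        System.out.println(merge(d13, d24));
    }
}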

 

The Outlier Analysis profiles linked above try to implement a sliding window, 
in which each profile period summarizes the Median Absolute Deviation 
distribution of the last five profile periods only.  An “Outlier MAD Score” can 
then be determined by comparing the deviation of a new data point to the last 
MAD distribution recorded in the Profile.  This allows for changes over time in 
the statistical distribution of inputs, but does not make the measure unduly 
sensitive to just the last minute or two.  This is a typical use case for 
Sliding Windows.
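
For concreteness, here is the underlying arithmetic of a MAD-based outlier 
score (the standard Iglewicz-Hoaglin modified z-score; an editorial Java 
sketch, not the metron-statistics implementation, whose details may differ):

import java.util.Arrays;

public class MadScore {

    static double median(double[] xs) {
        double[] s = xs.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Modified z-score of x against a window of values:
    // MAD = median(|x_i - median(x)|); score = 0.6745 * (x - median) / MAD.
    static double madScore(double[] window, double x) {
        double med = median(window);
        double[] dev = new double[window.length];
        for (int i = 0; i < window.length; i++) {
            dev[i] = Math.abs(window[i] - med);
        }
        return 0.6745 * (x - med) / median(dev);
    }

    public static void main(String[] args) {
        double[] window = {10, 11, 9, 10, 12, 10, 11};
        System.out.println(madScore(window, 10.5)); // ~0.34: not an outlier
        System.out.println(madScore(window, 40));   // ~20.2: an outlier
    }
}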

 

Both code examples trip over how to do the sliding window in the context of a 
Profile.  At sampling period boundaries, both either compose the “result” 
digest or initialize the “next” digest by reading the previous 5 result 
digests.  That is wrong, because it ignores the fact that those digests aren’t 
just for their time periods.  They too were composed with THEIR preceding 5 
digests, each of which was composed with its preceding 5 digests, which in 
turn… etc.  The end result is sort of like the way Madeiras or some brandies 
are aged via the Solera process with fractional blending.  You don’t get a true 
sliding window, which sharply drops the past; you get a continuous dilution of 
the past. In fact, it’s wrong to assume that the “tail” of the far past is more 
diluted than the near past! It would be with some algorithms, but the algorithm 
used in these two examples causes the far past to become an exponentially MORE 
important fraction of the overall data than the near past – much worse than 
simply turning on digesting at time(0) and leaving it on, with no attempt at 
windowing.  (Simulate it in a spreadsheet and you’ll see.)
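
In place of the spreadsheet, here is a rough Java simulation of that effect (an 
editorial sketch; it assumes the recurrence in the examples, where result(n) 
merges period n's raw digest with the previous 5 result digests, and counts how 
many copies of each raw period's samples end up inside a later result):

public class SoleraDemo {
    public static void main(String[] args) {
        final int periods = 20, window = 5;
        // copies[n][i] = how many times raw period i's samples are
        // represented inside result(n).
        long[][] copies = new long[periods][periods];
        for (int n = 0; n < periods; n++) {
            copies[n][n] = 1; // result(n) includes its own raw samples once
            for (int k = 1; k <= window && n - k >= 0; k++) {
                for (int i = 0; i <= n; i++) {
                    // merging result(n-k) re-adds everything it contains
                    copies[n][i] += copies[n - k][i];
                }
            }
        }
        // The oldest periods dominate exponentially instead of dropping out:
        // raw period 0 appears over 200,000 times in result(19), while raw
        // period 18 appears exactly once.
        for (int i = 0; i < periods; i++) {
            System.out.printf("raw period %2d: %,d copies in result(%d)%n",
                              i, copies[periods - 1][i], periods - 1);
        }
    }
}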

 

We need a Profiler structure that assists in creating Sliding Window profiles.  
The problem is that Profiles let you save only one thing (quantity or object) 
per sampling period, and that’s typically a different “thing” (object type or 
scale) than you want to use to compose the result for each windowed span.  One 
way to do it correctly would be with two Profiles, like this:

 

(SOLUTION A)

{
  "profiles": [
    {
      "profile": "sketchy_mad",
      "foreach": "'global'",
      "init" : {
        "s": "OUTLIER_MAD_INIT()"
      },
      "update": {
        "s": "OUTLIER_MAD_ADD(s, value)"
      },
      "result": "s"
    },
    {
      "profile": "windowed_mad",
      "foreach": "'global'",
      "init" : { },
      "update": { },
      "result": "OUTLIER_MAD_STATE_MERGE(PROFILE_GET('sketchy_mad', 'global', 5, 'MINUTES'))"
    }
  ]
}

 

This is typical.  You have a fine-grain sampling period that you want to 
“tumble”, and a broader window that you want to “slide” or “roll” along the 

[GitHub] incubator-metron issue #316: METRON-503: Metron REST API

2017-01-24 Thread jjmeyer0
Github user jjmeyer0 commented on the issue:

https://github.com/apache/incubator-metron/pull/316
  
@merrimanr this is looking good. I like how you broke out some of the 
services and started a constants class. There were two very minor things that I 
noticed.

1. `@Override` should be used wherever possible on all service class methods 
that implement an interface (see the sketch below).
2. I don't believe `@Service` is needed on the interfaces; it belongs on the 
implementation classes.
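
A minimal sketch of both points (hypothetical class names, assuming 
standard Spring conventions):

import org.springframework.stereotype.Service;

interface SensorService {          // no @Service needed on the interface
    String getStatus(String sensorName);
}

@Service                           // the annotation belongs on the implementation
public class SensorServiceImpl implements SensorService {

    @Override                      // mark each method implementing the interface
    public String getStatus(String sensorName) {
        return "STARTED";
    }
}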

I continued to test more endpoints and things are continuing to look 
good.  I still want to exercise a few more.




Re: [DISCUSS] Error Indexing

2017-01-24 Thread Ryan Merriman
Assuming we're going to write all errors to a single error topic, I think
it makes sense to agree on an error message schema and handle errors across
the 3 different topologies in the same way with a single implementation.
The implementation in ParserBolt (ErrorUtils.handleError) produces the most
verbose error object so I think it's a good candidate for the single
implementation.  Here is the message structure it currently produces:

{
  "exception": "java.lang.Exception: there was an error",
  "hostname": "host",
  "stack": "java.lang.Exception: ...",
  "time": 1485295416563,
  "message": "there was an error",
  "rawMessage": "raw message",
  "rawMessage_bytes": [],
  "source.type": "bro_error"
}

From our discussion so far we need to add a couple of fields:  an error type
and a hash ID.  Adding these to the message looks like:

{
  "exception": "java.lang.Exception: there was an error",
  "hostname": "host",
  "stack": "java.lang.Exception: ...",
  "time": 1485295416563,
  "message": "there was an error",
  "rawMessage": "raw message",
  "rawMessage_bytes": [],
  "source.type": "bro_error",
  "error.type": "parser_error",
  "rawMessage_hash": "dde41b9920954f94066daf6291fb58a9"
}
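
As an illustration, the hash field could be produced along these lines (an 
editorial sketch; MD5 matches the 32-hex-character example value above, though 
the algorithm choice is still open given the SHA-3 suggestion elsewhere in 
this thread):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class RawMessageHash {

    // Hash the raw message (or the offending field value) so repeated
    // errors caused by the same input can be grouped in dashboards.
    static String md5Hex(String rawMessage) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(rawMessage.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Value destined for the "rawMessage_hash" field of the error message.
        System.out.println(md5Hex("raw message"));
    }
}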

We should also consider expanding the error types I listed earlier.
Instead of just having "indexing_error" we could have
"elasticsearch_indexing_error", "hdfs_indexing_error" and so on.

Jon, if an exception happens in an enrichment or threat intel bolt, the
message is passed along with no error thrown (only logged).  Everywhere
else, I'm having trouble identifying specific fields that should be hashed.
Would hashing the message in every case be acceptable?  Do you know of a
place where we could hash a field instead?  On the topic of exceptions in
enrichments, are we ok with an error only being logged and not added to the
message or emitted to the error queue?



On Tue, Jan 24, 2017 at 3:10 PM, Ryan Merriman  wrote:

> That use case makes sense to me.  I don't think it will require that much
> additional effort either.
>
> On Tue, Jan 24, 2017 at 1:02 PM, zeo...@gmail.com 
> wrote:
>
>> Regarding error vs validation - Either way I'm not very concerned.  I
>> initially assumed they would be combined and agree with that approach, but
>> splitting them out isn't a very big deal to me either.
>>
>> Re: Ryan.  Yes, exactly.  In the case of a parser issue (or anywhere else
>> where it's not possible to pick out the exact thing causing the issue) it
>> would be a hash of the complete message.
>>
>> Regarding the architecture, I mostly agree with James except that I think
>> step 3 needs to also be able to somehow group errors via the original
>> data (identify
>> replays, identify repeat issues with data in a specific field, issues with
>> consistently different data, etc.).  This is essentially the first step of
>> troubleshooting, which I assume you are doing if you're looking at the
>> error dashboard.
>>
>> If the hash gets moved out of the initial implementation, I'm fairly
>> certain you lose this ability.  The point here isn't to handle long fields
>> (although that's a benefit of this approach), it's to attach a unique
>> identifier to the error/validation issue message that links it to the
>> original problem.  I'd be happy to consider alternative solutions to this
>> problem (for instance, actually sending across the data itself) I just
>> haven't been able to think of another way to do this that I like better.
>>
>> Jon
>>
>> On Tue, Jan 24, 2017 at 1:13 PM Ryan Merriman 
>> wrote:
>>
>> > We also need a JIRA for any install/Ansible/MPack work needed.
>> >
>> > On Tue, Jan 24, 2017 at 12:06 PM, James Sirota 
>> wrote:
>> >
>> > > Now that I had some time to think about it I would collapse all error
>> and
>> > > validation topics into one.  We can differentiate between different
>> views
>> > > of the data (split by error source etc) via Kibana dashboards.  I
>> would
>> > > implement this feature incrementally.  First I would modify all the
>> bolts
>> > > to log to a single topic.  Second, I would get the error indexing
>> done by
>> > > attaching the indexing topology to the error topic. Third I would
>> create
>> > > the necessary dashboards to view errors and validation failures by
>> > source.
>> > > Lastly, I would file a follow-on JIRA to introduce hashing of errors
>> or
>> > > fields that are too long.  It seems like a separate feature that we
>> need
>> > to
>> > > think through.  We may need a stellar function around that.
>> > >
>> > > Thanks,
>> > > James
>> > >
>> > > 24.01.2017, 10:25, "Ryan Merriman" :
>> > > > I understand what Jon is talking about. He's proposing we hash the
>> > value
>> > > > that caused the error, not necessarily the error message itself.
>> For an
>> > > > enrichment this is easy. Just pass along the field value that failed
>> > > > enrichment. For other cases the field that caused the 

Re: [DISCUSS] Error Indexing

2017-01-24 Thread Ryan Merriman
That use case makes sense to me.  I don't think it will require that much
additional effort either.

On Tue, Jan 24, 2017 at 1:02 PM, zeo...@gmail.com  wrote:

> Regarding error vs validation - Either way I'm not very concerned.  I
> initially assumed they would be combined and agree with that approach, but
> splitting them out isn't a very big deal to me either.
>
> Re: Ryan.  Yes, exactly.  In the case of a parser issue (or anywhere else
> where it's not possible to pick out the exact thing causing the issue) it
> would be a hash of the complete message.
>
> Regarding the architecture, I mostly agree with James except that I think
> step 3 needs to also be able to somehow group errors via the original
> data (identify
> replays, identify repeat issues with data in a specific field, issues with
> consistently different data, etc.).  This is essentially the first step of
> troubleshooting, which I assume you are doing if you're looking at the
> error dashboard.
>
> If the hash gets moved out of the initial implementation, I'm fairly
> certain you lose this ability.  The point here isn't to handle long fields
> (although that's a benefit of this approach), it's to attach a unique
> identifier to the error/validation issue message that links it to the
> original problem.  I'd be happy to consider alternative solutions to this
> problem (for instance, actually sending across the data itself) I just
> haven't been able to think of another way to do this that I like better.
>
> Jon
>
> On Tue, Jan 24, 2017 at 1:13 PM Ryan Merriman  wrote:
>
> > We also need a JIRA for any install/Ansible/MPack work needed.
> >
> > On Tue, Jan 24, 2017 at 12:06 PM, James Sirota 
> wrote:
> >
> > > Now that I had some time to think about it I would collapse all error
> and
> > > validation topics into one.  We can differentiate between different
> views
> > > of the data (split by error source etc) via Kibana dashboards.  I would
> > > implement this feature incrementally.  First I would modify all the
> bolts
> > > to log to a single topic.  Second, I would get the error indexing done
> by
> > > attaching the indexing topology to the error topic. Third I would
> create
> > > the necessary dashboards to view errors and validation failures by
> > source.
> > > Lastly, I would file a follow-on JIRA to introduce hashing of errors or
> > > fields that are too long.  It seems like a separate feature that we
> need
> > to
> > > think through.  We may need a stellar function around that.
> > >
> > > Thanks,
> > > James
> > >
> > > 24.01.2017, 10:25, "Ryan Merriman" :
> > > > I understand what Jon is talking about. He's proposing we hash the
> > value
> > > > that caused the error, not necessarily the error message itself. For
> an
> > > > enrichment this is easy. Just pass along the field value that failed
> > > > enrichment. For other cases the field that caused the error may not
> be
> > so
> > > > obvious. Take parser validation for example. The message is validated
> > as
> > > > a whole and it may not be easy to determine which field is the cause.
> > In
> > > > that case would a hash of the whole message work?
> > > >
> > > > There is a broader architectural discussion that needs to happen
> before
> > > we
> > > > can implement this. Currently we have an indexing topology that reads
> > > from
> > > > 1 topic and writes messages to ES but errors are written to several
> > > > different topics:
> > > >
> > > >- parser_error
> > > >- parser_invalid
> > > >- enrichments_error
> > > >- threatintel_error
> > > >- indexing_error
> > > >
> > > > I can see 4 possible approaches to implementing this:
> > > >
> > > >1. Create an index topology for each error topic
> > > >   1. Good because we can easily reuse the indexing topology and
> > would
> > > >   require the least development effort
> > > >   2. Bad because it would consume a lot of extra worker slots
> > > >2. Move the topic name into the error JSON message as a new
> > > "error_type"
> > > >field and write all messages to the indexing topic
> > > >   1. Good because we don't need to create a new topology
> > > >   2. Bad because we would be flowing data and errors through the
> > same
> > > >   topology. A spike in errors could affect message indexing.
> > > >3. Compromise between 1 and 2. Create another indexing topology
> that
> > > is
> > > >dedicated to indexing errors. Move the topic name into the error
> > JSON
> > > >message as a new "error_type" field and write all errors to a
> single
> > > error
> > > >topic.
> > > >4. Write a completely new topology with multiple spouts (1 for
> each
> > > >error type listed above) that all feed into a single
> > > BulkMessageWriterBolt.
> > > >   1. Good because the current topologies would not need to change
> > > >   2. Bad because it would require the most 

Re: [DISCUSS] Error Indexing

2017-01-24 Thread zeo...@gmail.com
Regarding error vs validation - Either way I'm not very concerned.  I
initially assumed they would be combined and agree with that approach, but
splitting them out isn't a very big deal to me either.

Re: Ryan.  Yes, exactly.  In the case of a parser issue (or anywhere else
where it's not possible to pick out the exact thing causing the issue) it
would be a hash of the complete message.

Regarding the architecture, I mostly agree with James except that I think
step 3 needs to also be able to somehow group errors via the original
data (identify
replays, identify repeat issues with data in a specific field, issues with
consistently different data, etc.).  This is essentially the first step of
troubleshooting, which I assume you are doing if you're looking at the
error dashboard.

If the hash gets moved out of the initial implementation, I'm fairly
certain you lose this ability.  The point here isn't to handle long fields
(although that's a benefit of this approach), it's to attach a unique
identifier to the error/validation issue message that links it to the
original problem.  I'd be happy to consider alternative solutions to this
problem (for instance, actually sending across the data itself) I just
haven't been able to think of another way to do this that I like better.

Jon

On Tue, Jan 24, 2017 at 1:13 PM Ryan Merriman  wrote:

> We also need a JIRA for any install/Ansible/MPack work needed.
>
> On Tue, Jan 24, 2017 at 12:06 PM, James Sirota  wrote:
>
> > Now that I had some time to think about it I would collapse all error and
> > validation topics into one.  We can differentiate between different views
> > of the data (split by error source etc) via Kibana dashboards.  I would
> > implement this feature incrementally.  First I would modify all the bolts
> > to log to a single topic.  Second, I would get the error indexing done by
> > attaching the indexing topology to the error topic. Third I would create
> > the necessary dashboards to view errors and validation failures by
> source.
> > Lastly, I would file a follow-on JIRA to introduce hashing of errors or
> > fields that are too long.  It seems like a separate feature that we need
> to
> > think through.  We may need a stellar function around that.
> >
> > Thanks,
> > James
> >
> > 24.01.2017, 10:25, "Ryan Merriman" :
> > > I understand what Jon is talking about. He's proposing we hash the
> value
> > > that caused the error, not necessarily the error message itself. For an
> > > enrichment this is easy. Just pass along the field value that failed
> > > enrichment. For other cases the field that caused the error may not be
> so
> > > obvious. Take parser validation for example. The message is validated
> as
> > > a whole and it may not be easy to determine which field is the cause.
> In
> > > that case would a hash of the whole message work?
> > >
> > > There is a broader architectural discussion that needs to happen before
> > we
> > > can implement this. Currently we have an indexing topology that reads
> > from
> > > 1 topic and writes messages to ES but errors are written to several
> > > different topics:
> > >
> > >- parser_error
> > >- parser_invalid
> > >- enrichments_error
> > >- threatintel_error
> > >- indexing_error
> > >
> > > I can see 4 possible approaches to implementing this:
> > >
> > >1. Create an index topology for each error topic
> > >   1. Good because we can easily reuse the indexing topology and
> would
> > >   require the least development effort
> > >   2. Bad because it would consume a lot of extra worker slots
> > >2. Move the topic name into the error JSON message as a new
> > "error_type"
> > >field and write all messages to the indexing topic
> > >   1. Good because we don't need to create a new topology
> > >   2. Bad because we would be flowing data and errors through the
> same
> > >   topology. A spike in errors could affect message indexing.
> > >3. Compromise between 1 and 2. Create another indexing topology that
> > is
> > >dedicated to indexing errors. Move the topic name into the error
> JSON
> > >message as a new "error_type" field and write all errors to a single
> > error
> > >topic.
> > >4. Write a completely new topology with multiple spouts (1 for each
> > >error type listed above) that all feed into a single
> > BulkMessageWriterBolt.
> > >   1. Good because the current topologies would not need to change
> > >   2. Bad because it would require the most development effort,
> would
> > >   not reuse existing topologies and takes up more worker slots
> than 3
> > >
> > > Are there other approaches I haven't thought of? I think 1 and 2 are
> off
> > > the table because they are shortcuts and not good long-term solutions.
> 3
> > > would be my choice because it introduces less complexity than 4.
> > Thoughts?
> > >
> > > Ryan
> > >
> > > On Mon, Jan 23, 

Re: [DISCUSS] Error Indexing

2017-01-24 Thread Ryan Merriman
We also need a JIRA for any install/Ansible/MPack work needed.

On Tue, Jan 24, 2017 at 12:06 PM, James Sirota  wrote:

> Now that I had some time to think about it I would collapse all error and
> validation topics into one.  We can differentiate between different views
> of the data (split by error source etc) via Kibana dashboards.  I would
> implement this feature incrementally.  First I would modify all the bolts
> to log to a single topic.  Second, I would get the error indexing done by
> attaching the indexing topology to the error topic. Third I would create
> the necessary dashboards to view errors and validation failures by source.
> Lastly, I would file a follow-on JIRA to introduce hashing of errors or
> fields that are too long.  It seems like a separate feature that we need to
> think through.  We may need a stellar function around that.
>
> Thanks,
> James
>
> 24.01.2017, 10:25, "Ryan Merriman" :
> > I understand what Jon is talking about. He's proposing we hash the value
> > that caused the error, not necessarily the error message itself. For an
> > enrichment this is easy. Just pass along the field value that failed
> > enrichment. For other cases the field that caused the error may not be so
> > obvious. Take parser validation for example. The message is validated as
> > a whole and it may not be easy to determine which field is the cause. In
> > that case would a hash of the whole message work?
> >
> > There is a broader architectural discussion that needs to happen before
> we
> > can implement this. Currently we have an indexing topology that reads
> from
> > 1 topic and writes messages to ES but errors are written to several
> > different topics:
> >
> >- parser_error
> >- parser_invalid
> >- enrichments_error
> >- threatintel_error
> >- indexing_error
> >
> > I can see 4 possible approaches to implementing this:
> >
> >1. Create an index topology for each error topic
> >   1. Good because we can easily reuse the indexing topology and would
> >   require the least development effort
> >   2. Bad because it would consume a lot of extra worker slots
> >2. Move the topic name into the error JSON message as a new
> "error_type"
> >field and write all messages to the indexing topic
> >   1. Good because we don't need to create a new topology
> >   2. Bad because we would be flowing data and errors through the same
> >   topology. A spike in errors could affect message indexing.
> >3. Compromise between 1 and 2. Create another indexing topology that
> is
> >dedicated to indexing errors. Move the topic name into the error JSON
> >message as a new "error_type" field and write all errors to a single
> error
> >topic.
> >4. Write a completely new topology with multiple spouts (1 for each
> >error type listed above) that all feed into a single
> BulkMessageWriterBolt.
> >   1. Good because the current topologies would not need to change
> >   2. Bad because it would require the most development effort, would
> >   not reuse existing topologies and takes up more worker slots than 3
> >
> > Are there other approaches I haven't thought of? I think 1 and 2 are off
> > the table because they are shortcuts and not good long-term solutions. 3
> > would be my choice because it introduces less complexity than 4.
> Thoughts?
> >
> > Ryan
> >
> > On Mon, Jan 23, 2017 at 5:44 PM, zeo...@gmail.com 
> wrote:
> >
> >>  In that case the hash would be of the value in the IP field, such as
> >>  sha3(8.8.8.8).
> >>
> >>  Jon
> >>
> >>  On Mon, Jan 23, 2017, 6:41 PM James Sirota  wrote:
> >>
> >>  > Jon,
> >>  >
> >>  > I am still not entirely following why we would want to use hashing.
> For
> >>  > example if my error is "Your IP field is invalid and failed
> validation"
> >>  > hashing this error string will always result in the same hash. Why
> not
> >>  > just use the actual error string? Can you provide an example where
> you
> >>  > would use it?
> >>  >
> >>  > Thanks,
> >>  > James
> >>  >
> >>  > 23.01.2017, 16:29, "zeo...@gmail.com" :
> >>  > > For 1 - I'm good with that.
> >>  > >
> >>  > > I'm talking about hashing the relevant content itself not the
> error.
> >>  Some
> >>  > > benefits are (1) minimize load on search index (there's minimal
> benefit
> >>  > in
> >>  > > spending the CPU and disk to keep it at full fidelity (tokenize and
> >>  > store))
> >>  > > (2) provide something to key on for dashboards (assuming a good
> hash
> >>  > > algorithm that avoids collisions and is second preimage resistant)
> and
> >>  > (3)
> >>  > > specific to errors, if the issue is that it failed to index, a hash
> >>  gives
> >>  > > us some protection that the issue will not occur twice.
> >>  > >
> >>  > > Jon
> >>  > >
> >>  > > On Mon, Jan 23, 2017, 2:47 PM James Sirota 
> wrote:
> >>  > >
> >>  > > 

Re: [DISCUSS] Error Indexing

2017-01-24 Thread James Sirota
Now that I had some time to think about it I would collapse all error and 
validation topics into one.  We can differentiate between different views of 
the data (split by error source etc) via Kibana dashboards.  I would implement 
this feature incrementally.  First I would modify all the bolts to log to a 
single topic.  Second, I would get the error indexing done by attaching the 
indexing topology to the error topic. Third I would create the necessary 
dashboards to view errors and validation failures by source.  Lastly, I would 
file a follow-on JIRA to introduce hashing of errors or fields that are too 
long.  It seems like a separate feature that we need to think through.  We may 
need a stellar function around that.
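
For illustration only, such a Stellar function might look roughly like this 
(an editorial sketch assuming the @Stellar/StellarFunction API in 
metron-common and Commons Codec for the digest; the name and parameters are 
hypothetical):

import java.util.List;
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.metron.common.dsl.Context;
import org.apache.metron.common.dsl.ParseException;
import org.apache.metron.common.dsl.Stellar;
import org.apache.metron.common.dsl.StellarFunction;

@Stellar(name = "HASH",
         description = "Hashes a value so over-long fields or messages can be grouped safely",
         params = { "value - the value to hash" },
         returns = "Hex-encoded digest of the value")
public class HashFunction implements StellarFunction {

  @Override
  public Object apply(List<Object> args, Context context) throws ParseException {
    if (args.isEmpty() || args.get(0) == null) {
      return null;
    }
    // MD5 for illustration; a second-preimage-resistant algorithm such as
    // SHA-3 (as suggested earlier in this thread) would be the real choice.
    return DigestUtils.md5Hex(String.valueOf(args.get(0)));
  }

  @Override
  public void initialize(Context context) { }

  @Override
  public boolean isInitialized() { return true; }
}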

Thanks,
James 

24.01.2017, 10:25, "Ryan Merriman" :
> I understand what Jon is talking about. He's proposing we hash the value
> that caused the error, not necessarily the error message itself. For an
> enrichment this is easy. Just pass along the field value that failed
> enrichment. For other cases the field that caused the error may not be so
> obvious. Take parser validation for example. The message is validated as
> a whole and it may not be easy to determine which field is the cause. In
> that case would a hash of the whole message work?
>
> There is a broader architectural discussion that needs to happen before we
> can implement this. Currently we have an indexing topology that reads from
> 1 topic and writes messages to ES but errors are written to several
> different topics:
>
>    - parser_error
>    - parser_invalid
>    - enrichments_error
>    - threatintel_error
>    - indexing_error
>
> I can see 4 possible approaches to implementing this:
>
>    1. Create an index topology for each error topic
>   1. Good because we can easily reuse the indexing topology and would
>   require the least development effort
>   2. Bad because it would consume a lot of extra worker slots
>    2. Move the topic name into the error JSON message as a new "error_type"
>    field and write all messages to the indexing topic
>   1. Good because we don't need to create a new topology
>   2. Bad because we would be flowing data and errors through the same
>   topology. A spike in errors could affect message indexing.
>    3. Compromise between 1 and 2. Create another indexing topology that is
>    dedicated to indexing errors. Move the topic name into the error JSON
>    message as a new "error_type" field and write all errors to a single error
>    topic.
>    4. Write a completely new topology with multiple spouts (1 for each
>    error type listed above) that all feed into a single BulkMessageWriterBolt.
>   1. Good because the current topologies would not need to change
>   2. Bad because it would require the most development effort, would
>   not reuse existing topologies and takes up more worker slots than 3
>
> Are there other approaches I haven't thought of? I think 1 and 2 are off
> the table because they are shortcuts and not good long-term solutions. 3
> would be my choice because it introduces less complexity than 4. Thoughts?
>
> Ryan
>
> On Mon, Jan 23, 2017 at 5:44 PM, zeo...@gmail.com  wrote:
>
>>  In that case the hash would be of the value in the IP field, such as
>>  sha3(8.8.8.8).
>>
>>  Jon
>>
>>  On Mon, Jan 23, 2017, 6:41 PM James Sirota  wrote:
>>
>>  > Jon,
>>  >
>>  > I am still not entirely following why we would want to use hashing. For
>>  > example if my error is "Your IP field is invalid and failed validation"
>>  > hashing this error string will always result in the same hash. Why not
>>  > just use the actual error string? Can you provide an example where you
>>  > would use it?
>>  >
>>  > Thanks,
>>  > James
>>  >
>>  > 23.01.2017, 16:29, "zeo...@gmail.com" :
>>  > > For 1 - I'm good with that.
>>  > >
>>  > > I'm talking about hashing the relevant content itself not the error.
>>  Some
>>  > > benefits are (1) minimize load on search index (there's minimal benefit
>>  > in
>>  > > spending the CPU and disk to keep it at full fidelity (tokenize and
>>  > store))
>>  > > (2) provide something to key on for dashboards (assuming a good hash
>>  > > algorithm that avoids collisions and is second preimage resistant) and
>>  > (3)
>>  > > specific to errors, if the issue is that it failed to index, a hash
>>  gives
>>  > > us some protection that the issue will not occur twice.
>>  > >
>>  > > Jon
>>  > >
>>  > > On Mon, Jan 23, 2017, 2:47 PM James Sirota  wrote:
>>  > >
>>  > > Jon,
>>  > >
>>  > > With regards to 1, collapsing to a single dashboard for each would be
>>  > > fine. So we would have one error index and one "failed to validate"
>>  > > index. The distinction is that errors would be things that went wrong
>>  > > during stream processing (failed to parse, etc...), while validation
>>  > > failures are messages that 

Re: [DISCUSS] Error Indexing

2017-01-24 Thread Ryan Merriman
I understand what Jon is talking about.  He's proposing we hash the value
that caused the error, not necessarily the error message itself.  For an
enrichment this is easy.  Just pass along the field value that failed
enrichment.  For other cases the field that caused the error may not be so
obvious.  Take parser validation for example.  The message is validated as
a whole and it may not be easy to determine which field is the cause.  In
that case would a hash of the whole message work?

There is a broader architectural discussion that needs to happen before we
can implement this.  Currently we have an indexing topology that reads from
1 topic and writes messages to ES but errors are written to several
different topics:

   - parser_error
   - parser_invalid
   - enrichments_error
   - threatintel_error
   - indexing_error

I can see 4 possible approaches to implementing this:

   1. Create an index topology for each error topic
  1. Good because we can easily reuse the indexing topology and would
  require the least development effort
  2. Bad because it would consume a lot of extra worker slots
   2. Move the topic name into the error JSON message as a new "error_type"
   field and write all messages to the indexing topic
  1. Good because we don't need to create a new topology
  2. Bad because we would be flowing data and errors through the same
  topology.  A spike in errors could affect message indexing.
   3. Compromise between 1 and 2.  Create another indexing topology that is
   dedicated to indexing errors.  Move the topic name into the error JSON
   message as a new "error_type" field and write all errors to a single error
   topic.
   4. Write a completely new topology with multiple spouts (1 for each
   error type listed above) that all feed into a single BulkMessageWriterBolt.
  1. Good because the current topologies would not need to change
  2. Bad because it would require the most development effort, would
  not reuse existing topologies and takes up more worker slots than 3

Are there other approaches I haven't thought of?  I think 1 and 2 are off
the table because they are shortcuts and not good long-term solutions.  3
would be my choice because it introduces less complexity than 4.  Thoughts?

Ryan


On Mon, Jan 23, 2017 at 5:44 PM, zeo...@gmail.com  wrote:

> In that case the hash would be of the value in the IP field, such as
> sha3(8.8.8.8).
>
> Jon
>
> On Mon, Jan 23, 2017, 6:41 PM James Sirota  wrote:
>
> > Jon,
> >
> > I am still not entirely following why we would want to use hashing.  For
> > example if my error is "Your IP field is invalid and failed validation"
> > hashing this error string will always result in the same hash.  Why not
> > just use the actual error string? Can you provide an example where you
> > would use it?
> >
> > Thanks,
> > James
> >
> > 23.01.2017, 16:29, "zeo...@gmail.com" :
> > > For 1 - I'm good with that.
> > >
> > > I'm talking about hashing the relevant content itself not the error.
> Some
> > > benefits are (1) minimize load on search index (there's minimal benefit
> > in
> > > spending the CPU and disk to keep it at full fidelity (tokenize and
> > store))
> > > (2) provide something to key on for dashboards (assuming a good hash
> > > algorithm that avoids collisions and is second preimage resistant) and
> > (3)
> > > specific to errors, if the issue is that it failed to index, a hash
> gives
> > > us some protection that the issue will not occur twice.
> > >
> > > Jon
> > >
> > > On Mon, Jan 23, 2017, 2:47 PM James Sirota  wrote:
> > >
> > > Jon,
> > >
> > > With regards to 1, collapsing to a single dashboard for each would be
> > > fine. So we would have one error index and one "failed to validate"
> > > index. The distinction is that errors would be things that went wrong
> > > during stream processing (failed to parse, etc...), while validation
> > > failures are messages that explicitly failed stellar validation/schema
> > > enforcement. There should be relatively few of the second type.
> > >
> > > With respect to 3, why do you want the error hashed? Why not just
> search
> > > for the error text?
> > >
> > > Thanks,
> > > James
> > >
> > > 20.01.2017, 14:01, "zeo...@gmail.com" :
> > >>  As someone who currently fills the platform engineer role, I can give
> > this
> > >>  idea a huge +1. My thoughts:
> > >>
> > >>  1. I think it depends on exactly what data is pushed into the index
> > (#3).
> > >>  However, assuming the errors you proposed recording, I can't see huge
> > >>  benefits to having more than one dashboard. I would be happy to be
> > >>  persuaded otherwise.
> > >>
> > >>  2. I would say yes, storing the errors in HDFS in addition to
> indexing
> > is
> > >>  a good thing. Using METRON-510
> > >>   as a case study,
> > there
> > >>  is the potential 

Re: [Discuss] Situational Awareness Zeppelin Dashboard

2017-01-24 Thread Nick Allen
I should clarify: the examples above would be for the YAF flows.  The other
default sensors would obviously be different.

On Tue, Jan 24, 2017 at 10:09 AM, Nick Allen  wrote:

> I would like to create a Zeppelin dashboard that provides some level of
> situational awareness for each of the data sources.  What do you guys think
> that should look like?  A few thoughts on what could be included.
>
>- Top external hosts with geo-location
>- Number of total flows per hour
>- Geo-location of flows
>- Number of internal flows per hour
>- Number of internal-external flows per hour
>- Average flow length per hour
>- Centrality and betweenness measures
>
>
>


-- 
Nick Allen 


[Discuss] Situational Awareness Zeppelin Dashboard

2017-01-24 Thread Nick Allen
I would like to create a Zeppelin dashboard that provides some level of
situational awareness for each of the data sources.  What do you guys think
that should look like?  A few thoughts on what could be included.

   - Top external hosts with geo-location
   - Number of total flows per hour
   - Geo-location of flows
   - Number of internal flows per hour
   - Number of internal-external flows per hour
   - Average flow length per hour
   - Centrality and betweenness measures


[GitHub] incubator-metron issue #422: METRON-670 Monit Incorrectly Reports Status

2017-01-24 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/incubator-metron/pull/422
  
There are also problems with shutting down topologies, since we don't pass 
the kill wait time to Storm (at least we didn't; I need to find the PR/JIRA 
for that).  All external calls from monit need to have clear timeout 
accounting, from both monit's point of view and the external agent's.




Re: Build failing

2017-01-24 Thread Justin Leet
Created a Jira:
https://issues.apache.org/jira/browse/METRON-672

Feel free to add / correct anything in that ticket.

Justin

On Tue, Jan 24, 2017 at 8:09 AM, Casey Stella  wrote:

> One thing that I would caution though is that this is likely a heisenbug.
> The more logging I added earlier, the less likely it was to occur. It seems
> more likely to occur on Travis than locally and I made it happen by
> repeatedly running mvn install on Metron-solr (after a mvn install of the
> whole project).
> On Tue, Jan 24, 2017 at 07:59 Casey Stella  wrote:
>
> > Agreed to both counts. I was able to reproduce it locally, but not in an
> > IDE by the way.
> > On Tue, Jan 24, 2017 at 07:57 Justin Leet  wrote:
> >
> > I definitely agree that this isn't a fluke.
> >
> > Do we have a Jira for this?  If not, I can create one and I would like to
> > propose that part of that ticket is adding logging.  Right now, I'm
> > concerned we don't have enough info from the Travis builds to be able to
> > (easily) debug failure or reproduce locally.
> >
> > Justin
> >
> > On Mon, Jan 23, 2017 at 4:16 PM, Casey Stella 
> wrote:
> >
> > > One more thing, just for posterity here, it always freezes at 6 records
> > > written to HDFS.  That's the reason I thought it was a flushing issue.
> > >
> > > On Mon, Jan 23, 2017 at 3:38 PM, Casey Stella 
> > wrote:
> > >
> > > > Ok, so now I'm concerned that this isn't a fluke.  Here's an excerpt
> > from
> > > > the failing logs on travis for my PR with substantially longer
> > timeouts (
> > > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/
> 194575474/log.txt)
> > > >
> > > > Running org.apache.metron.solr.integration.
> SolrIndexingIntegrationTest
> > > > 0 vs 10 vs 0
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> 317.056
> > > sec <<< FAILURE!
> > > > test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)
> > > Time elapsed: 316.949 sec  <<< ERROR!
> > > > java.lang.RuntimeException: Took too long to complete: 300783 >
> 30
> > > >   at org.apache.metron.integration.ComponentRunner.process(
> > > ComponentRunner.java:131)
> > > >   at org.apache.metron.indexing.integration.
> > > IndexingIntegrationTest.test(IndexingIntegrationTest.java:173)
> > > >
> > > >
> > > > I'm getting the impression that this isn't the timeout and we have a
> > > mystery on our hands.  Each of those lines "10 vs 10 vs 6" happen 15
> > > 

Re: Build failing

2017-01-24 Thread Casey Stella
One thing that I would caution though is that this is likely a heisenbug.
The more logging I added earlier, the less likely it was to occur. It seems
more likely to occur on Travis than locally and I made it happen by
repeatedly running mvn install on Metron-solr (after a mvn install of the
whole project).
On Tue, Jan 24, 2017 at 07:59 Casey Stella  wrote:

> Agreed to both counts. I was able to reproduce it locally, but not in an
> IDE by the way.
> On Tue, Jan 24, 2017 at 07:57 Justin Leet  wrote:
>
> I definitely agree that this isn't a fluke.
>
> Do we have a Jira for this?  If not, I can create one and I would like to
> propose that part of that ticket is adding logging.  Right now, I'm
> concerned we don't have enough info from the Travis builds to be able to
> (easily) debug failure or reproduce locally.
>
> Justin
>
> On Mon, Jan 23, 2017 at 4:16 PM, Casey Stella  wrote:
>
> > One more thing, just for posterity here, it always freezes at 6 records
> > written to HDFS.  That's the reason I thought it was a flushing issue.
> >
> > On Mon, Jan 23, 2017 at 3:38 PM, Casey Stella 
> wrote:
> >
> > > Ok, so now I'm concerned that this isn't a fluke.  Here's an excerpt
> from
> > > the failing logs on travis for my PR with substantially longer
> timeouts (
> > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/194575474/log.txt)
> > >
> > > Running org.apache.metron.solr.integration.SolrIndexingIntegrationTest
> > > 0 vs 10 vs 0
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-
> > 1485200689038.json
> > > 10 vs 10 vs 6
> > > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 317.056
> > sec <<< FAILURE!
> > > test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)
> > Time elapsed: 316.949 sec  <<< ERROR!
> > > java.lang.RuntimeException: Took too long to complete: 300783 > 30
> > >   at org.apache.metron.integration.ComponentRunner.process(
> > ComponentRunner.java:131)
> > >   at org.apache.metron.indexing.integration.
> > IndexingIntegrationTest.test(IndexingIntegrationTest.java:173)
> > >
> > >
> > > I'm getting the impression that this isn't the timeout and we have a
> > mystery on our hands.  Each of those lines "10 vs 10 vs 6" happen 15
> > seconds apart.  That line means that it read 10 entries from kafka, 10
> > entries from the indexed data and 6 entries from HDFS.  It's that 6
> entries
> > that is the problem.   Also of note, this does not seem to happen to me
> > locally AND it's not consistent on Travis.  Given all that I'd say that
> > it's a problem with the HDFS Writer not getting flushed, but I verified
> > that it is indeed flushed per message.
> > >
> > >
> > > Anyway, tl;dr we have a mystery unit test 

Re: [VOTE] Release Process Documentation

2017-01-24 Thread Justin Leet
Another minor correction: the link at "*Note: Per Apache policy, the hardware
used to create the candidate tarball must be owned by the release manager." is
broken.

Justin

On Mon, Jan 23, 2017 at 12:38 PM, James Sirota  wrote:

> Sorry that's a typo. Was meant to be 10.  Just fixed
>
> 20.01.2017, 13:22, "zeo...@gmail.com" :
> > -1 (non-binding).
> >
> > There appears to be a minor oversight where it goes from Step 9 to Step
> 14.
> >
> > Jon
> >
> > On Fri, Jan 20, 2017 at 11:56 AM James Sirota 
> wrote:
> >
> >>  The document is available here:
> >>  https://cwiki.apache.org/confluence/display/METRON/Release+Process
> >>
> >>  and is also pasted in this email for your convenience
> >>
> >>  Please vote +1, -1, or 0 for neutral. The vote will be open for 72
> hours
> >>
> >>  Metron Release Types
> >>  There are two types of Metron releases:
> >>  Feature Release (FR) - this is a release that has a significant step
> >>  forward in feature capability and is denoted by an upgrade of the
> second
> >>  digit
> >>  Maintenance Release (MR) - this is a set of patches and fixes that are
> >>  issued following the FR and is denoted by an upgrade of the third digit
> >>  Release Naming Convention
> >>  Metron build naming convention is as follows: 0.[FR].[MR]. We keep the
> 0.
> >>  notation to signify that the project is still under active development
> and
> >>  we will hold a community vote to go to 1.x at a future time
> >>  Initiating a New Metron Release
> >>  Create the MR branch for the previous Metron release by incrementing
> the
> >>  *third* digit of the previous release like so 0.[FR].[*MR++*]. All
> patches
> >>  to the previous Metron release will be checked in under the MR branch
> and
> >>  where it makes sense also under the FR branch. All new features will be
> >>  checked in under the FR branch.
> >>  Creating a Feature Release
> >>  Step 1 - Initiate a discuss thread
> >>  Prior to the release The Release manager should do the following
> >>  (preferably a month before the release):
> >>  Make sure that the list of JIRAs slated for the release accurately
> >>  reflects the pull requests that are currently in master
> >>  Construct an email to the Metron dev board (
> >>  dev@metron.incubator.apache.org) which discusses with the community
> the
> >>  desire to do a release. This email should contain the following:
> >>  The list of JIRAs slated for the release with descriptions (use the
> output
> >>  of git log and remove all the JIRAs from the last release’s changelog)
> >>  A solicitation of JIRAs that should be included with the next release.
> >>  Users should rate them as must/need/good to have as well as
> volunteering.
> >>  A release email template is provided here.
> >>  Step 2 - Monitor and Verify JIRAs
> >>  Once the community votes for additional JIRAs they want included in the
> >>  release verify that the pull requests are in before the release, close
> >>  these JIRAs and tag them with the release name. All pull requests and
> JIRAs
> >>  that were not slated for this release will go into the next releases.
> The
> >>  release manager should continue to monitor the JIRA to ensure that the
> >>  timetable is on track until the release date. On the release date the
> >>  release manager should message the Metron dev board (
> >>  dev@metron.incubator.apache.org) announcing the code freeze for the
> >>  release.
> >>  Step 3 - Create the Release Branch and Increment Metron version
> >>  Create a branch for the release (from a repo cloned from
> >>  https://git-wip-us.apache.org/repos/asf/incubator-metron.git).
> (assuming
> >>  the release is 0.[FR++].0 and working from master):
> >>  git checkout -b Metron_0.[FR++].0
> >>  git push --set-upstream origin Metron_0.[FR++].0
> >>  File a JIRA to increment the Metron version to 0.[FR++].0. Either do it
> >>  yourself or have a community member increment the build version for
> you.
> >>  You can look at a pull request for a previous build to see how this is
> >>  done. METRON-533 - Up the version for release DONE
> >>  Also, the release manager should have a couple of things set up:
> >>  A SVN clone of the repo at
> >>  https://dist.apache.org/repos/dist/dev/incubator/metron, We will
> refer to
> >>  this as the dev repo. It will hold the release candidate artifacts
> >>  A SVN clone of the repo at
> >>  https://dist.apache.org/repos/dist/release/incubator/metron, We will
> >>  refer to this as the release repo. It will hold the release artifacts.
> >>  Step 4 - Create the Release Candidate
> >>
> >>  Now, for each release candidate, we will tag from that branch. Assuming
> >>  that this is RC1:
> >>  

Re: Build failing

2017-01-24 Thread Casey Stella
Agreed to both counts. I was able to reproduce it locally, but not in an
IDE by the way.
On Tue, Jan 24, 2017 at 07:57 Justin Leet  wrote:

> I definitely agree that this isn't a fluke.
>
> Do we have a Jira for this?  If not, I can create one and I would like to
> propose that part of that ticket is adding logging.  Right now, I'm
> concerned we don't have enough info from the Travis builds to be able to
> (easily) debug failure or reproduce locally.
>
> Justin
>
> On Mon, Jan 23, 2017 at 4:16 PM, Casey Stella  wrote:
>
> > One more thing, just for posterity here, it always freezes at 6 records
> > written to HDFS.  That's the reason I thought it was a flushing issue.
> >
> > On Mon, Jan 23, 2017 at 3:38 PM, Casey Stella 
> wrote:
> >
> > > Ok, so now I'm concerned that this isn't a fluke.  Here's an excerpt
> from
> > > the failing logs on travis for my PR with substantially longer
> timeouts (
> > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/194575474/log.txt)
> > >
> > > Running org.apache.metron.solr.integration.SolrIndexingIntegrationTest
> > > 0 vs 10 vs 0
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > [the previous two lines repeat 18 more times, one pair every 15 seconds]
> > > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 317.056 sec <<< FAILURE!
> > > test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)  Time elapsed: 316.949 sec  <<< ERROR!
> > > java.lang.RuntimeException: Took too long to complete: 300783 > 300000
> > >   at org.apache.metron.integration.ComponentRunner.process(ComponentRunner.java:131)
> > >   at org.apache.metron.indexing.integration.IndexingIntegrationTest.test(IndexingIntegrationTest.java:173)
> > >
> > > I'm getting the impression that this isn't the timeout and we have a
> > > mystery on our hands.  Each of those "10 vs 10 vs 6" lines happens 15
> > > seconds apart.  That line means that it read 10 entries from Kafka, 10
> > > entries from the indexed data, and 6 entries from HDFS.  Those 6 entries
> > > are the problem.  Also of note, this does not seem to happen to me
> > > locally AND it's not consistent on Travis.  Given all that, I'd say
> > > it's a problem with the HDFS Writer not getting flushed, but I verified
> > > that it is indeed flushed per message.
> > >
> > >
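
For context on "flushed per message": below is a minimal sketch against the
stock Hadoop FileSystem API (the path and loop are hypothetical, not the
actual HDFS Writer code). One known HDFS wrinkle, offered only as a
hypothesis for the stuck count above: hflush() makes the bytes visible to new
readers, but the file length the NameNode reports can lag until the block
completes, so a reader that polls getFileStatus().getLen() may under-count
records even though every message really was flushed.

    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FlushPerMessageSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path out = new Path("target/sketch/enrichment-sample.json"); // hypothetical path

        try (FSDataOutputStream stream = fs.create(out, true)) {
          for (int i = 0; i < 10; i++) {
            byte[] record = ("{\"id\":" + i + "}\n").getBytes(StandardCharsets.UTF_8);
            stream.write(record);
            // hflush() pushes bytes to the datanodes and makes them readable by
            // new readers, but the NameNode's reported file length can lag until
            // the block completes, so a length-based poll may count fewer
            // records than were flushed.
            stream.hflush();
          }
        }
      }
    }

If that is what is happening, the HDFS count would stall even though the
writer really is flushing, which matches the behavior above.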
> > > Anyway, tl;dr: we have a mystery unit test bug that isn't deterministic
> > > wrt the unit tests and may or may not manifest itself outside of the
> > > unit tests.  So, yeah, I'll be looking at it, but would appreciate
> > > others taking a gander too.
> > >
> > >
> > > Casey
> > >
> > >
> > > On Mon, Jan 23, 2017 at 2:09 PM, Casey Stella wrote:
> > >
> > >> Yeah, I adjusted the timeout on the indexing integration tests as part
> > >> of https://github.com/apache/incubator-metron/pull/420 which I'll merge
> > >> in today.
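
The "Took too long to complete" failure above is a poll-until-done-or-timeout
pattern. Here is a generic sketch of that shape; this is not ComponentRunner's
actual implementation, the names are invented, and the 15-second cadence and
300-second budget are simply read off the log:

    /**
     * Generic poll-until-done-or-timeout loop, sketched from the log output
     * above. This is NOT Metron's ComponentRunner, just the shape of it.
     */
    public class TimeoutPollSketch {
      interface Check {
        boolean done();
      }

      public static void process(Check check, long maxTimeMs, long pollMs)
          throws InterruptedException {
        long start = System.currentTimeMillis();
        while (!check.done()) {
          long elapsed = System.currentTimeMillis() - start;
          if (elapsed > maxTimeMs) {
            // Same shape as the failure above: elapsed > budget.
            throw new RuntimeException("Took too long to complete: " + elapsed + " > " + maxTimeMs);
          }
          Thread.sleep(pollMs); // the log shows a 15-second cadence
        }
      }

      public static void main(String[] args) throws InterruptedException {
        long deadline = System.currentTimeMillis() + 2_000; // toy condition for the demo
        process(() -> System.currentTimeMillis() > deadline, 300_000, 1_000); // short poll for the demo
      }
    }

The point of the sketch: when the polled condition is stuck, as the HDFS count
above appears to be, no timeout budget is long enough, which fits the read
that the timeout itself isn't the real problem.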

Re: Build failing

2017-01-24 Thread Justin Leet
I definitely agree that this isn't a fluke.

Do we have a Jira for this?  If not, I can create one and I would like to
propose that part of that ticket is adding logging.  Right now, I'm
concerned we don't have enough info from the Travis builds to be able to
(easily) debug failure or reproduce locally.
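
As a sketch of what that logging might capture, assuming plain SLF4J; the
class and method names below are entirely hypothetical, not Metron's actual
test code:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    /**
     * Hypothetical helper showing the extra context a "10 vs 10 vs 6" poll
     * line could carry so a failed Travis run is debuggable from its log.
     */
    public class PollStateLogger {
      private static final Logger LOG = LoggerFactory.getLogger(PollStateLogger.class);

      public void logPoll(int kafkaCount, int indexedCount, int hdfsCount, String hdfsFile) {
        // Log all three counts plus the file being read, so a stalled run
        // shows which reader is lagging rather than just that totals differ.
        LOG.info("kafka={} indexed={} hdfs={} file={}",
            kafkaCount, indexedCount, hdfsCount, hdfsFile);
        if (hdfsCount < kafkaCount) {
          LOG.warn("HDFS reader sees only {} of {} records written", hdfsCount, kafkaCount);
        }
      }
    }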

Justin

On Mon, Jan 23, 2017 at 4:16 PM, Casey Stella  wrote:

> One more thing, just for posterity here, it always freezes at 6 records
> written to HDFS.  That's the reason I thought it was a flushing issue.
>
> On Mon, Jan 23, 2017 at 3:38 PM, Casey Stella  wrote:
>
> > Ok, so now I'm concerned that this isn't a fluke.  Here's an excerpt from
> > the failing logs on Travis for my PR with substantially longer timeouts
> > (https://s3.amazonaws.com/archive.travis-ci.org/jobs/194575474/log.txt)
> >
> > Running org.apache.metron.solr.integration.SolrIndexingIntegrationTest
> > 0 vs 10 vs 0
> > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > 10 vs 10 vs 6
> > [the previous two lines repeat 18 more times, one pair every 15 seconds]
> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 317.056 sec <<< FAILURE!
> > test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)  Time elapsed: 316.949 sec  <<< ERROR!
> > java.lang.RuntimeException: Took too long to complete: 300783 > 300000
> >   at org.apache.metron.integration.ComponentRunner.process(ComponentRunner.java:131)
> >   at org.apache.metron.indexing.integration.IndexingIntegrationTest.test(IndexingIntegrationTest.java:173)
> >
> >
> > I'm getting the impression that this isn't the timeout and we have a
> > mystery on our hands.  Each of those "10 vs 10 vs 6" lines happens 15
> > seconds apart.  That line means that it read 10 entries from Kafka, 10
> > entries from the indexed data, and 6 entries from HDFS.  Those 6 entries
> > are the problem.  Also of note, this does not seem to happen to me
> > locally AND it's not consistent on Travis.  Given all that, I'd say
> > it's a problem with the HDFS Writer not getting flushed, but I verified
> > that it is indeed flushed per message.
> >
> >
> > Anyway, tl;dr: we have a mystery unit test bug that isn't deterministic
> > wrt the unit tests and may or may not manifest itself outside of the unit
> > tests.  So, yeah, I'll be looking at it, but would appreciate others
> > taking a gander too.
> >
> >
> > Casey
> >
> >
> > On Mon, Jan 23, 2017 at 2:09 PM, Casey Stella wrote:
> >
> >> Yeah, I adjusted the timeout on the indexing integration tests as part of
> >> https://github.com/apache/incubator-metron/pull/420 which I'll merge in
> >> today.
> >>
> >> On Mon, Jan 23, 2017 at 2:01 PM, zeo...@gmail.com wrote:
> >>
> >>> Okay, now we have back-to-back failures, and it looks like it may have been
> >>> a timeout issue?
> >>>  `test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)
>