[GitHub] metron pull request #606: METRON-980: Short circuit operations for Stellar

2017-06-08 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/606#discussion_r121049130
  
--- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/stellar/StellarCompiler.java ---
@@ -76,14 +93,80 @@ public Expression(Deque<Token> tokenDeque) {
 
 public Object apply(ExpressionState state) {
   Deque<Token> instanceDeque = new ArrayDeque<>();
-  for(Iterator<Token> it = getTokenDeque().descendingIterator(); it.hasNext();) {
-    Token token = it.next();
-    if(token.getUnderlyingType() == DeferredFunction.class) {
-      DeferredFunction func = (DeferredFunction) token.getValue();
-      func.apply(instanceDeque, state);
-    }
-    else {
-      instanceDeque.push(token);
+  {
+    boolean skipElse = false;
+    Token token = null;
+    for (Iterator<Token> it = getTokenDeque().descendingIterator(); it.hasNext(); ) {
+      token = it.next();
+      //if we've skipped an else previously, then we need to skip the deferred tokens associated with the else.
+      if(skipElse && token.getUnderlyingType() == ElseExpr.class) {
+        while(it.hasNext()) {
+          token = it.next();
+          if(token.getUnderlyingType() == EndConditional.class) {
+            break;
+          }
+        }
+        skipElse = false;
+      }
+      /*
+      curr is the current value on the stack.  This is the non-deferred actual evaluation for this expression
+      and with the current context.
+       */
+      Token curr = instanceDeque.peek();
+      if( curr != null
+       && curr.getValue() != null && curr.getValue() instanceof Boolean
+       && ShortCircuitOp.class.isAssignableFrom(token.getUnderlyingType())
+      ) {
+        //if we have a boolean as the current value and the next non-contextual token is a short circuit op
+        //then we need to short circuit possibly
+        if(token.getUnderlyingType() == BooleanArg.class) {
+          if (curr.getMultiArgContext() != null
+              && curr.getMultiArgContext().getVariety() == FrameContext.BOOLEAN_OR
+              && (Boolean) (curr.getValue())
+          ) {
+            //short circuit the or
+            FrameContext.Context context = curr.getMultiArgContext();
+            shortCircuit(it, context);
+          } else if (curr.getMultiArgContext() != null
+              && curr.getMultiArgContext().getVariety() == FrameContext.BOOLEAN_AND
+              && !(Boolean) (curr.getValue())
+          ) {
+            //short circuit the and
+            FrameContext.Context context = curr.getMultiArgContext();
+            shortCircuit(it, context);
+          }
+        }
+        else if(token.getUnderlyingType() == IfExpr.class) {
+          //short circuit the if/then/else
+          instanceDeque.pop();
+          if((Boolean)curr.getValue()) {
+            //choose then
+            skipElse = true;
+          }
+          else {
+            //choose else
+            while(it.hasNext()) {
+              Token t = it.next();
+              if(t.getUnderlyingType() == ElseExpr.class) {
+                break;
+              }
+            }
+          }
+        }
+      }
+      if (token.getUnderlyingType() == DeferredFunction.class) {
+        DeferredFunction func = (DeferredFunction) token.getValue();
+        func.apply(instanceDeque, state);
+      }
+      else if(token.getUnderlyingType() == ShortCircuitFrame.class
+           || ShortCircuitOp.class.isAssignableFrom(token.getUnderlyingType())
+      ) {
+        continue;
--- End diff --

agreed.  I also demorganified it to make it clearer.
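"Demorganified" refers to rewriting the guard via De Morgan's laws: `!(a || b)` is equivalent to `!a && !b`, so a "skip when the token is a frame or a short-circuit op" guard becomes "push only when it is neither". A minimal sketch of the equivalence (the predicate names are illustrative stand-ins, not the actual Metron token-type checks):

```java
public class DeMorganExample {
  // Guard as in the diff: skip the token when it is a short-circuit
  // frame OR a short-circuit op.
  static boolean skip(boolean isFrame, boolean isShortCircuitOp) {
    return isFrame || isShortCircuitOp;
  }

  // De Morgan's rewrite: push the token only when it is neither a
  // short-circuit frame nor a short-circuit op.
  static boolean push(boolean isFrame, boolean isShortCircuitOp) {
    return !isFrame && !isShortCircuitOp;
  }

  public static void main(String[] args) {
    // The two guards are exact complements for every input combination.
    boolean[] vals = {true, false};
    for (boolean a : vals) {
      for (boolean b : vals) {
        if (skip(a, b) == push(a, b)) {
          throw new AssertionError("guards should be complements");
        }
      }
    }
  }
}
```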


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #609: METRON-987: Allow stellar enrichments to be specif...

2017-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/metron/pull/609




Re: performance benchmarks on the asa parser

2017-06-08 Thread Ali Nazemian
Hi Simon,

We have noticed those issues as well. Can you share the changes you have
made, so we can merge them with our version? We have implemented about 40-50
more ciscotags so far. It would be great if we could optimize it and
contribute it back to the community. However, we may end up reimplementing it
as a Java parser.

Cheers,
Ali


Re: performance benchmarks on the asa parser

2017-06-08 Thread Simon Elliston Ball
I thought about compile on first use and cache as an approach, but decided it
would reduce the predictability of latency for a message, which is important in
the Metron enrichment context. As you say, we could end up growing a large
number of Groks, but if the load of compiling is all pushed to the (hopefully
very rare) topology restart event, it feels like the performance trade-off
there is a good one, though the memory usage trade-off could start to bite if
we're getting into the hundreds, I guess.

Simon


> On 9 Jun 2017, at 03:32, Kyle Richardson  wrote:
> 
> I like the pre-compile idea. One concern is I see the number of grok objects 
> growing over time. This parser does not account for nearly all of the 
> possible ASA message types, currently only the most common ones. Is there a 
> middle ground implementation where we can compile on first use of a grok and 
> then hold in memory? Avoids the up front burden but should also boost 
> performance.
> 
> -Kyle
> 
>> On Jun 8, 2017, at 8:56 PM, Simon Elliston Ball 
>>  wrote:
>> 
>> The changes are pretty simple (pre-compile the grok, duh). Most other grok 
>> parser just use a single expression, which is already pre-compiled (/checks 
>> assumption in code) so really it’s just the ASA one because of it’s strange 
>> two stage grok. 
>> 
>> Shame, it would have been nice to find some more low hanging fruit.
>> 
>> Simon
>> 
>>> On 9 Jun 2017, at 01:52, Otto Fowler  wrote:
>>> 
>>> Are these changes that all grok parsers can benefit from?  Are your changes 
>>> to the base classes that they use or asa only?
>>> 
>>> 
>>> 
 On June 8, 2017 at 20:49:49, Simon Elliston Ball 
 (si...@simonellistonball.com ) wrote:
 
 I got mildly interested in parser performance as a result of some recent 
 work on tuning, and did some very quick benchmarking with Predfix on the 
 ASA parser (which I hadn’t really cared about enough due to relatively low 
 volume previously). 
 
 That said, it’s not exactly perf optimised. 3 runs of 1000 iterations on 
 my laptop as a micro-benchmark in Predfix (I know, scientific, right), 
 with some changes (basically pushing all the grok statements up to 
 pre-compile in init (the parser currently uses one grok to do the syslog 
 bit and figure out which grok it needs for the second half, so this makes 
 for a large number of Grok objects upfront, which I think we can live 
 with.  
 
 Do you think we should do this benchmarking properly, and extend? Anyone 
 have thoughts about how to build parser benchmarks in to our test suite 
 properly? 
 
 Also, since these are showing approx 20 times improvement on the P95 
 interval, do we think it’s worth the memory (not measured, but 39 Grok 
 objects hanging around? If so I’ll get it JIRAed up and push my new 
 version. 
 
 Run results:- 
 
 Base line (current master as is) 
 |= Benchmark 
 ==|
  
 | - | unit | sum | min | max | avg | stddev | conf95 | runs | 
 |= TimeMeter 
 ==| 
 |. AsaBenchmark 
 ...|
  
 | parserBenchmark | ms | 5597.98 | 04.90 | 159.02 | 05.60 | 04.89 | 
 [05.01-06.20] | 1000.00 | 
 | parserBenchmark | ms | 5503.91 | 04.82 | 149.60 | 05.50 | 04.59 | 
 [05.00-05.90] | 1000.00 | 
 | parserBenchmark | ms | 5620.90 | 04.80 | 152.83 | 05.62 | 04.71 | 
 [04.98-06.73] | 1000.00 | 
 |==|
  
 
 Syslog element of Grok pulled out and pre-compiled 
 
 |= Benchmark 
 ==|
  
 | - | unit | sum | min | max | avg | stddev | conf95 | runs | 
 |= TimeMeter 
 ==| 
 |. AsaBenchmark 
 ...|
  
 | parserBenchmark | ms | 4299.91 | 03.29 | 120.06 | 04.30 | 03.89 | 
 [03.36-07.10] | 1000.00 | 
 | parserBenchmark | ms | 4206.98 | 03.31 | 129.41 | 04.21 | 04.07 | 
 [03.46-05.44] | 1000.00 | 
 | parserBenchmark | ms | 3843.05 | 03.28 | 119.39 | 03.84 | 03.79 | 
 [03.33-04.55] | 1000.00 | 
 |==|
  
 
 With all precompiled in a hash map (more memory use, but not by a lot) 
 
 |= Benchmark 
 

Re: performance benchmarks on the asa parser

2017-06-08 Thread Kyle Richardson
I like the pre-compile idea. One concern is that I see the number of grok objects
growing over time. This parser does not account for nearly all of the possible
ASA message types, currently only the most common ones. Is there a middle-ground
implementation where we can compile a grok on first use and then hold it in
memory? That avoids the up-front burden but should also boost performance.

-Kyle
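Kyle's middle ground, compile on first use and then hold in memory, is essentially a memoized lookup. A hypothetical sketch using `java.util.regex.Pattern` as a stand-in for Grok (none of these class or method names come from the Metron parser):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

public class LazyPatternCache {
  // Compiled expressions keyed by message type; populated on first use.
  private final Map<String, Pattern> cache = new ConcurrentHashMap<>();

  // Return the compiled pattern for a message type, compiling it exactly
  // once on first request. The first message of each type pays the
  // compile cost; subsequent messages are a map lookup.
  public Pattern patternFor(String messageType, String rawExpression) {
    return cache.computeIfAbsent(messageType, t -> Pattern.compile(rawExpression));
  }

  public int compiledCount() {
    return cache.size();
  }
}
```

As noted elsewhere in the thread, this trades the up-front compile burden at topology startup for a less predictable per-message latency on the first message of each type.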


Re: performance benchmarks on the asa parser

2017-06-08 Thread Simon Elliston Ball
The changes are pretty simple (pre-compile the grok, duh). Most other grok
parsers just use a single expression, which is already pre-compiled (/checks
assumption in code), so really it's just the ASA one because of its strange
two-stage grok.

Shame, it would have been nice to find some more low-hanging fruit.

Simon


Re: performance benchmarks on the asa parser

2017-06-08 Thread Otto Fowler
Are these changes that all grok parsers can benefit from?  Are your changes
to the base classes that they use, or ASA only?



On June 8, 2017 at 20:49:49, Simon Elliston Ball (
si...@simonellistonball.com) wrote:

I got mildly interested in parser performance as a result of some recent
work on tuning, and did some very quick benchmarking with Perfidix on the
ASA parser (which I hadn't really cared about enough due to relatively
low volume previously).

That said, it's not exactly perf optimised: 3 runs of 1000 iterations
on my laptop as a micro-benchmark in Perfidix (I know, scientific, right),
with some changes, basically pushing all the grok statements up to
pre-compile in init. The parser currently uses one grok to do the syslog
bit and figure out which grok it needs for the second half, so this makes
for a large number of Grok objects upfront, which I think we can live with.

Do you think we should do this benchmarking properly, and extend it?
Anyone have thoughts about how to build parser benchmarks into our test
suite properly?

Also, since these are showing approx 20 times improvement on the P95
interval, do we think it's worth the memory (not measured, but 39 Grok
objects hanging around)? If so I'll get it JIRAed up and push my new
version.

Run results:-

Base line (current master as is)
(TimeMeter output for AsaBenchmark)

| - | unit | sum | min | max | avg | stddev | conf95 | runs |
| parserBenchmark | ms | 5597.98 | 04.90 | 159.02 | 05.60 | 04.89 | [05.01-06.20] | 1000.00 |
| parserBenchmark | ms | 5503.91 | 04.82 | 149.60 | 05.50 | 04.59 | [05.00-05.90] | 1000.00 |
| parserBenchmark | ms | 5620.90 | 04.80 | 152.83 | 05.62 | 04.71 | [04.98-06.73] | 1000.00 |

Syslog element of Grok pulled out and pre-compiled

| - | unit | sum | min | max | avg | stddev | conf95 | runs |
| parserBenchmark | ms | 4299.91 | 03.29 | 120.06 | 04.30 | 03.89 | [03.36-07.10] | 1000.00 |
| parserBenchmark | ms | 4206.98 | 03.31 | 129.41 | 04.21 | 04.07 | [03.46-05.44] | 1000.00 |
| parserBenchmark | ms | 3843.05 | 03.28 | 119.39 | 03.84 | 03.79 | [03.33-04.55] | 1000.00 |

With all precompiled in a hash map (more memory use, but not by a lot)

| - | unit | sum | min | max | avg | stddev | conf95 | runs |
| parserBenchmark | ms | 514.68 | 00.22 | 112.35 | 00.51 | 03.55 | [00.24-00.79] | 1000.00 |
| parserBenchmark | ms | 472.42 | 00.22 | 105.19 | 00.47 | 03.32 | [00.23-00.70] | 1000.00 |
| parserBenchmark | ms | 484.40 | 00.21 | 103.71 | 00.48 | 03.27 | [00.24-00.76] | 1000.00 |


Simon
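The "all precompiled in a hash map" variant Simon benchmarks above can be sketched as follows (a hypothetical illustration using `java.util.regex.Pattern` in place of Grok; the real parser would build this map once in its init, and the names here are not from the Metron codebase):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class PrecompiledPatterns {
  private final Map<String, Pattern> byTag = new HashMap<>();

  // Compile every known expression once, up front. This moves all compile
  // cost to the (hopefully rare) topology startup, so per-message work is
  // a map lookup plus a match, at the price of holding every compiled
  // object in memory for the lifetime of the topology.
  public PrecompiledPatterns(Map<String, String> rawExpressions) {
    rawExpressions.forEach((tag, expr) -> byTag.put(tag, Pattern.compile(expr)));
  }

  // Per-message lookup; returns null for an unknown tag.
  public Pattern forTag(String tag) {
    return byTag.get(tag);
  }

  public int compiledCount() {
    return byTag.size();
  }
}
```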


[GitHub] metron pull request #614: METRON-992: Create performance tuning guide

2017-06-08 Thread mattf-horton
Github user mattf-horton commented on a diff in the pull request:

https://github.com/apache/metron/pull/614#discussion_r121001410
  
--- Diff: metron-platform/Performance-tuning-guide.md ---
@@ -0,0 +1,326 @@
+# Metron Performance Tuning Guide
+
+## Overview
+
+This document provides guidance from our experiences tuning the Apache Metron Storm topologies for maximum performance. You'll find
+suggestions for optimum configurations under a 1 Gbps load along with some guidance around the tooling we used to monitor and assess
+our throughput.
+
+In the simplest terms, Metron is a streaming architecture created on top of Kafka and three main types of Storm topologies: parsers,
+enrichment, and indexing. Each parser has its own topology and there is also a highly performant, specialized spout-only topology
+for streaming PCAP data to HDFS. We found that the architecture can be tuned almost exclusively through a few primary Storm and
+Kafka parameters along with a few Metron-specific options. You can think of the data flow as being similar to water flowing through a
+pipe, and the majority of these options assist in tweaking the various pipe widths in the system.
+
+## General Suggestions
+
+Note that there is currently no method for specifying the number of tasks separately from the number of executors in Flux topologies (enrichment,
+indexing). By default, the number of tasks will equal the number of executors. Logically, setting the number of tasks equal to the number
+of executors is sensible. Storm enforces # executors <= # tasks. The reason you might set the number of tasks higher than the number of
+executors is for future performance tuning and rebalancing without the need to bring down your topologies. The number of tasks is fixed
+at topology startup time whereas the number of executors can be increased up to a maximum value equal to the number of tasks.
+
+We found that the default values for poll.timeout.ms, offset.commit.period.ms, and max.uncommitted.offsets worked well in nearly all cases.
+As a general rule, it was optimal to set spout parallelism equal to the number of partitions used in your Kafka topic. Any greater
+parallelism will leave you with idle consumers since Kafka limits the max number of consumers to the number of partitions. This is
+important because Kafka has certain ordering guarantees for message delivery per partition that would not be possible if more than
+one consumer in a given consumer group were able to read from that partition.
+
+## Tooling
+
+Before we get to the actual tooling used to monitor performance, it helps to describe what we might actually want to monitor and potential
+pain points. Prior to switching over to the new Storm Kafka client, which leverages the new Kafka consumer API under the hood, offsets
+were stored in Zookeeper. While the broker hosts are still stored in Zookeeper, this is no longer true for the offsets, which are now
+stored in Kafka itself. This is a configurable option, and you may switch back to Zookeeper if you choose, but Metron currently uses
+the new defaults. This is useful to know as you're investigating both correctness and throughput performance.
+
+First we need to set up some environment variables
+```
+export BROKERLIST=
+export ZOOKEEPER=
+export KAFKA_HOME=
+export METRON_HOME=
+export HDP_HOME=
+```
+
+If you have Kerberos enabled, set up the security protocol
+```
+$ cat /tmp/consumergroup.config
+security.protocol=SASL_PLAINTEXT
+```
+
+Now run the following command for a running topology's consumer group. In this example we are using enrichments.
+```
+${KAFKA_HOME}/bin/kafka-consumer-groups.sh \
+--command-config=/tmp/consumergroup.config \
+--describe \
+--group enrichments \
+--bootstrap-server $BROKERLIST \
+--new-consumer
+```
+
+This will return a table with the following output depicting offsets for all partitions and consumers associated with the specified
+consumer group:
+```
+GROUP        TOPIC        PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  OWNER
+enrichments  enrichments  9          29746066        29746067        1    consumer-2_/xxx.xxx.xxx.xxx
+enrichments  enrichments  3          29754325        29754326        1    consumer-1_/xxx.xxx.xxx.xxx
+enrichments  enrichments  43         29754331        29754332        1    consumer-6_/xxx.xxx.xxx.xxx
+...
+```
+
+_Note_: You won't see any output until a topology is actually running because the consumer groups only exist while consumers in the
+spouts are up and 

[GitHub] metron pull request #614: METRON-992: Create performance tuning guide

2017-06-08 Thread mattf-horton
Github user mattf-horton commented on a diff in the pull request:

https://github.com/apache/metron/pull/614#discussion_r121000712
  

[GitHub] metron issue #610: METRON-877 Extract core implementation and UDF support, c...

2017-06-08 Thread mattf-horton
Github user mattf-horton commented on the issue:

https://github.com/apache/metron/pull/610
  
The Metron RPM build is broken with this patch, which interferes with 
testing.  Will have it fixed shortly. --Matt




[GitHub] metron issue #609: METRON-987: Allow stellar enrichments to be specified by ...

2017-06-08 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/609
  
Do we want to add a version number to these configurations now?  Then we can 
say no version is 0, this is 1, etc.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #614: METRON-992: Create performance tuning guide

2017-06-08 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/metron/pull/614#discussion_r120979346
  
--- Diff: metron-platform/Performance-tuning-guide.md ---
@@ -0,0 +1,326 @@
+# Metron Performance Tuning Guide
+
+## Overview
+
+This document provides guidance from our experiences tuning the Apache 
Metron Storm topologies for maximum performance. You'll find
+suggestions for optimum configurations under a 1 gbps load along with some 
guidance around the tooling we used to monitor and assess
+our throughput.
+
+In the simplest terms, Metron is a streaming architecture created on top 
of Kafka and three main types of Storm topologies: parsers,
+enrichment, and indexing. Each parser has its own topology and there is 
also a highly performant, specialized spout-only topology
+for streaming PCAP data to HDFS. We found that the architecture can be 
tuned almost exclusively through using a few primary Storm and
+Kafka parameters along with a few Metron-specific options. You can think 
of the data flow as being similar to water flowing through a
+pipe, and the majority of these options assist in tweaking the various 
pipe widths in the system.
+
+## General Suggestions
+
+Note that there is currently no method for specifying the number of tasks 
independently of the number of executors in Flux topologies (enrichment,
+ indexing). By default, the number of tasks will equal the number of 
executors. Logically, setting the number of tasks equal to the number
+of executors is sensible. Storm enforces # executors <= # tasks. The 
reason you might set the number of tasks higher than the number of
+executors is for future performance tuning and rebalancing without the 
need to bring down your topologies. The number of tasks is fixed
+at topology startup time whereas the number of executors can be increased 
up to a maximum value equal to the number of tasks.
+
+We found that the default values for poll.timeout.ms, 
offset.commit.period.ms, and max.uncommitted.offsets worked well in nearly all 
cases.
+As a general rule, it was optimal to set spout parallelism equal to the 
number of partitions used in your Kafka topic. Any greater
+parallelism will leave you with idle consumers since Kafka limits the max 
number of consumers to the number of partitions. This is
+important because Kafka has certain ordering guarantees for message 
delivery per partition that would not be possible if more than
+one consumer in a given consumer group were able to read from that 
partition.
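
Since spout parallelism should match the topic's partition count, it helps to check that count directly. Below is a small sketch; `count_partitions` is a hypothetical helper of our own (not part of Metron or Kafka), and the output layout it parses is assumed from the 0.10-era `kafka-topics.sh --describe` tooling.

```shell
# Sketch: count a topic's partitions so spout parallelism can be matched to it.
# count_partitions is a hypothetical helper, not shipped with Metron or Kafka.
count_partitions() {
  # each partition appears as a line containing "Partition: N" in the
  # --describe output; the "PartitionCount:" summary line does not match
  # because it has no space after the colon
  grep -c 'Partition: '
}

# Typical usage against a live cluster (assumes the env vars below are set):
#   ${KAFKA_HOME}/bin/kafka-topics.sh --zookeeper $ZOOKEEPER \
#       --describe --topic enrichments | count_partitions
```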
+
+## Tooling
+
+Before we get to the actual tooling used to monitor performance, it helps 
to describe what we might actually want to monitor and potential
+pain points. Prior to switching over to the new Storm Kafka client, which 
leverages the new Kafka consumer API under the hood, offsets
+were stored in Zookeeper. While the broker hosts are still stored in 
Zookeeper, this is no longer true for the offsets which are now
+stored in Kafka itself. This is a configurable option, and you may switch 
back to Zookeeper if you choose, but Metron is currently using
+the new defaults. This is useful to know as you're investigating both 
correctness as well as throughput performance.
+
+First we need to set up some environment variables:
+```
+export BROKERLIST=
+export ZOOKEEPER=
+export KAFKA_HOME=
+export METRON_HOME=
+export HDP_HOME=
+```
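
For reference, here is what these might look like on a typical HDP install; every value below is hypothetical and must be replaced with your own hosts and paths.

```shell
# Hypothetical example values -- substitute your own hosts and install paths.
export BROKERLIST="node1:6667,node2:6667,node3:6667"   # Kafka brokers (HDP default port 6667)
export ZOOKEEPER="node1:2181,node2:2181,node3:2181"    # ZooKeeper quorum
export KAFKA_HOME="/usr/hdp/current/kafka-broker"
export METRON_HOME="/usr/metron/0.4.0"                 # version-specific; check your install
export HDP_HOME="/usr/hdp/current"
```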
+
+If you have Kerberos enabled, set up the security protocol:
+```
+$ cat /tmp/consumergroup.config
+security.protocol=SASL_PLAINTEXT
+```
+
+Now run the following command for a running topology's consumer group. In 
this example we are using enrichments.
+```
+${KAFKA_HOME}/bin/kafka-consumer-groups.sh \
+--command-config=/tmp/consumergroup.config \
+--describe \
+--group enrichments \
+--bootstrap-server $BROKERLIST \
+--new-consumer
+```
+
+This will return a table with the following output depicting offsets for 
all partitions and consumers associated with the specified
+consumer group:
+```
+GROUP        TOPIC        PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  OWNER
+enrichments  enrichments  9          29746066        29746067        1    consumer-2_/xxx.xxx.xxx.xxx
+enrichments  enrichments  3          29754325        29754326        1    consumer-1_/xxx.xxx.xxx.xxx
+enrichments  enrichments  43         29754331        29754332        1    consumer-6_/xxx.xxx.xxx.xxx
+...
+```
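
When watching throughput, it is often the total lag across all partitions that matters rather than any single row. A small sketch that sums the LAG column of the `--describe` output; `total_lag` is our own hypothetical helper, not shipped with Kafka or Metron.

```shell
# Sketch: sum the LAG column (6th whitespace-separated field) of the
# kafka-consumer-groups.sh --describe output. total_lag is a hypothetical
# helper, not part of Kafka or Metron.
total_lag() {
  # skip the header row; only add fields that are purely numeric, which also
  # skips blank or malformed lines
  awk 'NR > 1 && $6 ~ /^[0-9]+$/ { lag += $6 } END { print lag + 0 }'
}

# Typical usage:
#   ${KAFKA_HOME}/bin/kafka-consumer-groups.sh \
#       --command-config=/tmp/consumergroup.config --describe \
#       --group enrichments --bootstrap-server $BROKERLIST --new-consumer | total_lag
```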
+
+_Note_: You won't see any output until a topology is actually running 
because the consumer groups only exist while consumers in the
+spouts are up and 

[GitHub] metron pull request #614: METRON-992: Create performance tuning guide

2017-06-08 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/metron/pull/614#discussion_r120979059
  
--- Diff: metron-platform/Performance-tuning-guide.md ---
@@ -0,0 +1,326 @@

[GitHub] metron issue #609: METRON-987: Allow stellar enrichments to be specified by ...

2017-06-08 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/609
  
@cestella Perfect.  Definitely didn't want to imply that belongs in this 
ticket at all.  I'm in full support of it being backwards compatible for now, 
and keeping the discussion of how we prune old implementations separate.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron issue #609: METRON-987: Allow stellar enrichments to be specified by ...

2017-06-08 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/609
  
@justinleet yes, agreed.  I wanted to get this functionality in and have it 
be backwards compatible for the short term, but ultimately, I think that it's a 
better approach and we should work toward phasing out the map-based approach.  
The issue is that there are a few other places that also need to be normalized:
* the field transformations
* the importer config
* possibly others that I am not thinking about

Anyway, to answer your question, yes I'll probably start a discuss thread 
on that tomorrow and construct some JIRAs to see if 
* other people think deprecation is a good thing to work towards
* how in the heck we should accomplish it 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #614: METRON-992: Create performance tuning guide

2017-06-08 Thread simonellistonball
Github user simonellistonball commented on a diff in the pull request:

https://github.com/apache/metron/pull/614#discussion_r120961228
  
--- Diff: metron-platform/Performance-tuning-guide.md ---
@@ -0,0 +1,326 @@

[GitHub] metron pull request #614: METRON-992: Create performance tuning guide

2017-06-08 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/614#discussion_r120964479
  
--- Diff: metron-platform/Performance-tuning-guide.md ---
@@ -0,0 +1,326 @@

[GitHub] metron pull request #614: METRON-992: Create performance tuning guide

2017-06-08 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/614#discussion_r120964036
  
--- Diff: metron-platform/Performance-tuning-guide.md ---
@@ -0,0 +1,326 @@
+# Metron Performance Tunining Guide
--- End diff --

"Tunining" -> "Tuning"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #614: METRON-992: Create performance tuning guide

2017-06-08 Thread mmiklavc
GitHub user mmiklavc opened a pull request:

https://github.com/apache/metron/pull/614

METRON-992: Create performance tuning guide

https://issues.apache.org/jira/browse/METRON-992

This guide covers performance tuning the Metron topologies. I will be 
leaving this up for a month or so to gather additional insight and feedback 
from the community.

I have a few additional tweaks to make around formatting and links to other 
README files, but I wanted to get this in front of the committers sooner than 
later.

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and the verify changes via 
`site-book/target/site/index.html`:

  ```
  cd site-book
  mvn site
  ```

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mmiklavc/metron METRON-992

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/614.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #614


commit 2f2386c57750cdbf2da2780f001eae0db9f28a9c
Author: Michael Miklavcic 
Date:   2017-06-08T16:53:36Z

METRON-992: Create performance tuning guide




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #613: METRON-990: Clean up and organize flux properties

2017-06-08 Thread merrimanr
GitHub user merrimanr reopened a pull request:

https://github.com/apache/metron/pull/613

METRON-990: Clean up and organize flux properties

## Contributor Comments
This PR is mainly a refactor of the enrichment and indexing flux files 
along with their matching property files.  The changes include:

- moving important settings with hardcoded values in flux files to property 
files so that they are configurable through Ambari
- removing old and unused properties
- elasticsearch.properties file is now implemented as a jinja2 template in 
the mpack, matching the enrichment.properties implementation
- global.json is now implemented as a jinja2 template in the mpack
- properties are now organized in Ambari as separate tabs and sub sections, 
hopefully making them easier to find
- changed a couple properties to use a dropdown widget in Ambari

I wrote descriptions for new properties in a similar style as existing 
properties.  I feel the descriptions are a little short, curious if others 
agree.  I also stopped short of improving ALL our properties with better 
widgets than just a text box.  I imagine people will have opinions on how to 
best present properties in Ambari but this is a start.

This has been tested on full dev with the usual process.  When reviewing, 
spin up full dev and navigate to the Metron service in Ambari.  The Config 
section should look as described in the section above.  The 
enrichment.properties and elasticsearch.properties should look much shorter and 
easier to read.

One more thing.  I added a rat exception for *.json.j2 since we already 
have an exception for *.json (no comments in json is the reason?).  Let me know 
if that's wrong.

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
 
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root incubating-metron folder via:
  ```
  mvn -q clean integration-test install && build_utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- [x] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and the verify changes via 
`site-book/target/site/index.html`:

  ```
  cd site-book
  mvn site
  ```

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/merrimanr/incubator-metron METRON-990

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #613


commit 9cfa0181a95d0b2b21bd665530119e7b261962dc
Author: 

[GitHub] metron pull request #613: METRON-990: Clean up and organize flux properties

2017-06-08 Thread merrimanr
Github user merrimanr closed the pull request at:

https://github.com/apache/metron/pull/613


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #609: METRON-987: Allow stellar enrichments to be specif...

2017-06-08 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/609#discussion_r120932742
  
--- Diff: 
metron-platform/metron-common/src/main/java/org/apache/metron/common/message/JSONFromPosition.java
 ---
@@ -40,10 +40,12 @@ public JSONFromPosition(int position) {
 
   @Override
   public JSONObject get(Tuple tuple) {
+String s = null;
 try {
-  return (JSONObject) parser.get().parse(new 
String(tuple.getBinary(position), "UTF8"));
+  s =  new String(tuple.getBinary(position), "UTF8");
--- End diff --

Can we just use StandardCharsets.UTF_8 then, and not even worry about the 
string being right?  I'm not sure where that alias is, so I'm not convinced 
it's guaranteed (although a source proving me wrong would also be acceptable)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #609: METRON-987: Allow stellar enrichments to be specif...

2017-06-08 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/609#discussion_r120932266
  
--- Diff: metron-platform/metron-enrichment/README.md ---
@@ -71,40 +73,94 @@ The `fieldMap`contents are of interest because they 
contain the routing and conf
   ]
   }
 ```
-Based on this sample config, both ip_src_addr and ip_dst_addr will go to the `geo`, `host`, and `hbaseEnrichment` adapter bolts. For the `geo`, `host` and `hbaseEnrichment`, this is sufficient.  However, more complex enrichments may contain their own configuration.  Currently, the `stellar` enrichment requires a more complex configuration, such as:
+Based on this sample config, both `ip_src_addr` and `ip_dst_addr` will go to the `geo`, `host`, and
+`hbaseEnrichment` adapter bolts.
+
+### Stellar Enrichment Configuration
+For the `geo`, `host` and `hbaseEnrichment`, this is sufficient. However, more complex enrichments
+may contain their own configuration.  Currently, the `stellar` enrichment is more adaptable and thus
+requires a more nuanced configuration.
+
+At its most basic, we want to take a message and apply a couple of enrichments, such as converting the
+`hostname` field to lowercase. We do this by specifying the transformation inside of the
+`config` for the `stellar` fieldMap.  There are two syntaxes that are supported, specifying the transformations
+as a map with the key as the field and the value the stellar expression:
```
"fieldMap": {
...
  "stellar" : {
"config" : {
-  "numeric" : {
-  "foo": "1 + 1"
-  }
-  ,"ALL_CAPS" : "TO_UPPER(source.type)"
+  "hostname" : "TO_LOWER(hostname)"
}
  }
}
```

-Whereas the simpler enrichments just need a set of fields explicitly stated so they can be separated from the message and sent to the enrichment adapter bolt for enrichment and ultimately joined back in the join bolt, the stellar enrichment has its set of required fields implicitly stated through usage.  For instance, if your stellar statement references a field, it should be included and if not, then it should not be included.  We did not want to require users to make explicit the implicit.
+Another approach is to make the transformations as a list with the same `var := expr` syntax as is used
+in the Stellar REPL, such as:
+```
+"fieldMap": {
+   ...
+  "stellar" : {
+"config" : [
+  "hostname := TO_LOWER(hostname)"
+]
+  }
+}
+```
+
+Sometimes arbitrary stellar enrichments may take enough time that you would prefer to split some of them
+into groups and execute the groups of stellar enrichments in parallel.  Take, for instance, if you wanted
+to do an HBase enrichment and a profiler call which were independent of one another.  This usecase is
+supported by splitting the enrichments up as groups.

-The other way in which the stellar enrichment is somewhat more complex is in how the statements are executed.  In the general purpose case for a list of fields, those fields are used to create a message to send to the enrichment adapter bolt and that bolt's worker will handle the fields one by one in serial for a given message.  For stellar enrichment, we wanted to have a more complex design so that users could specify the groups of stellar statements sent to the same worker in the same message (and thus executed sequentially).  Consider the following configuration:
+Consider the following example:
```
"fieldMap": {
+   ...
  "stellar" : {
"config" : {
-  "numeric" : {
-  "foo": "1 + 1"
-  "bar" : TO_LOWER(source.type)"
-  }
- ,"text" : {
-   "ALL_CAPS" : "TO_UPPER(source.type)"
-   }
+  "malicious_domain_enrichment" : {
+"is_bad_domain" : "ENRICHMENT_EXISTS('malicious_domains', ip_dst_addr, 'enrichments', 'cf')"
+  },
+  "login_profile" : [
+"profile_window := PROFILE_WINDOW('from 6 months ago')",
+"global_login_profile := PROFILE_GET('distinct_login_attempts', 'global', profile_window)",
+"stats := STATS_MERGE(global_login_profile)",
+"auth_attempts_median := STATS_PERCENTILE(stats, 0.5)",
+"auth_attempts_sd := STATS_SD(stats)",
+"profile_window := null",
+"global_login_profile := null",
+"stats := null"
+  ]
}
  }
}
```
-We have a group called `numeric` whose stellar 
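The execution model the README text above describes (statements within a group run in order on one worker; separate groups run concurrently) can be sketched with plain executors. This is an illustration only, not the Metron implementation: the group names and "statements" are placeholders, and the real Stellar evaluation is replaced by a counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GroupExecutionSketch {
    // Runs each group on its own worker; statements inside a group stay sequential.
    // Returns the total number of "statements" evaluated across all groups.
    static int runGroups(Map<String, List<String>> groups) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(groups.size());
        List<Future<Integer>> futures = new ArrayList<>();
        for (List<String> statements : groups.values()) {
            futures.add(pool.submit(() -> {
                int executed = 0;
                for (String stmt : statements) {
                    executed++; // stand-in for evaluating one Stellar statement
                }
                return executed;
            }));
        }
        int total = 0;
        for (Future<Integer> f : futures) {
            total += f.get(); // join: wait for every group to finish
        }
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> groups = Map.of(
                "malicious_domain_enrichment",
                List.of("is_bad_domain := ENRICHMENT_EXISTS(...)"),
                "login_profile",
                List.of("profile_window := PROFILE_WINDOW(...)",
                        "stats := STATS_MERGE(...)"));
        System.out.println(runGroups(groups)); // prints 3
    }
}
```

The point of the design is visible in the sketch: independent groups need no coordination beyond the final join, while ordering guarantees hold only within a group.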

[GitHub] metron pull request #606: METRON-980: Short circuit operations for Stellar

2017-06-08 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/606#discussion_r120926071
  
--- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/stellar/StellarCompiler.java ---
@@ -76,14 +93,80 @@ public Expression(Deque tokenDeque) {
 
 public Object apply(ExpressionState state) {
   Deque instanceDeque = new ArrayDeque<>();
-  for(Iterator it = getTokenDeque().descendingIterator();it.hasNext();) {
-Token token = it.next();
-if(token.getUnderlyingType() == DeferredFunction.class) {
-  DeferredFunction func = (DeferredFunction) token.getValue();
-  func.apply(instanceDeque, state);
-}
-else {
-  instanceDeque.push(token);
+  {
+boolean skipElse = false;
+Token token = null;
+for (Iterator it = getTokenDeque().descendingIterator(); it.hasNext(); ) {
+  token = it.next();
+  //if we've skipped an else previously, then we need to skip the deferred tokens associated with the else.
+  if(skipElse && token.getUnderlyingType() == ElseExpr.class) {
+while(it.hasNext()) {
+  token = it.next();
+  if(token.getUnderlyingType() == EndConditional.class) {
+break;
+  }
+}
+skipElse = false;
+  }
+  /*
+  curr is the current value on the stack.  This is the non-deferred actual evaluation for this expression
+  and with the current context.
+   */
+  Token curr = instanceDeque.peek();
+  if( curr != null
+   && curr.getValue() != null && curr.getValue() instanceof Boolean
+   && ShortCircuitOp.class.isAssignableFrom(token.getUnderlyingType())
+  ) {
+//if we have a boolean as the current value and the next non-contextual token is a short circuit op
+//then we need to short circuit possibly
+if(token.getUnderlyingType() == BooleanArg.class) {
+  if (curr.getMultiArgContext() != null
+  && curr.getMultiArgContext().getVariety() == FrameContext.BOOLEAN_OR
+  && (Boolean) (curr.getValue())
+  ) {
+//short circuit the or
+FrameContext.Context context = curr.getMultiArgContext();
+shortCircuit(it, context);
+  } else if (curr.getMultiArgContext() != null
+  && curr.getMultiArgContext().getVariety() == FrameContext.BOOLEAN_AND
+  && !(Boolean) (curr.getValue())
+  ) {
+//short circuit the and
+FrameContext.Context context = curr.getMultiArgContext();
+shortCircuit(it, context);
+  }
+}
+else if(token.getUnderlyingType() == IfExpr.class) {
+  //short circuit the if/then/else
+  instanceDeque.pop();
+  if((Boolean)curr.getValue()) {
+//choose then
+skipElse = true;
+  }
+  else {
+//choose else
+while(it.hasNext()) {
+  Token t = it.next();
+  if(t.getUnderlyingType() == ElseExpr.class) {
+break;
+  }
+}
+  }
+}
+  }
+  if (token.getUnderlyingType() == DeferredFunction.class) {
+DeferredFunction func = (DeferredFunction) token.getValue();
+func.apply(instanceDeque, state);
+  }
+  else if(token.getUnderlyingType() == ShortCircuitFrame.class
+   || ShortCircuitOp.class.isAssignableFrom(token.getUnderlyingType())
+  ) {
+continue;
--- End diff --

`continue;` can be dropped here. There's nothing afterwards to skip
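For readers skimming the diff above, the control flow can be isolated into a small standalone sketch. This is not the Metron code — the token names and the postfix frame layout here are simplified stand-ins — but it shows the core idea: when the value on top of the stack already decides a boolean frame, the iterator fast-forwards past the rest of the frame instead of evaluating it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class ShortCircuitSketch {
    // Toy token stream: operands plus markers for an OR frame.
    enum Tok { TRUE, FALSE, OR_ARG, OR_END }

    // Evaluates a postfix OR frame, short-circuiting on the first true operand.
    static boolean evalOr(List<Tok> tokens) {
        Deque<Boolean> stack = new ArrayDeque<>();
        for (Iterator<Tok> it = tokens.iterator(); it.hasNext(); ) {
            Tok t = it.next();
            switch (t) {
                case TRUE:  stack.push(true);  break;
                case FALSE: stack.push(false); break;
                case OR_ARG:
                    // short circuit: a true operand decides the whole OR,
                    // so skip ahead to the end of the frame without evaluating it
                    if (stack.peek()) {
                        while (it.hasNext() && it.next() != Tok.OR_END) { }
                        return stack.pop();
                    }
                    stack.pop(); // a false operand contributes nothing to an OR
                    break;
                case OR_END:
                    return stack.pop();
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // the FALSE after OR_ARG is never inspected: the frame is skipped
        System.out.println(evalOr(List.of(Tok.TRUE, Tok.OR_ARG, Tok.FALSE, Tok.OR_END)));  // prints true
        System.out.println(evalOr(List.of(Tok.FALSE, Tok.OR_ARG, Tok.FALSE, Tok.OR_END))); // prints false
    }
}
```

In the real compiler the skipped tokens are deferred function calls, which is where the performance win comes from: short-circuiting avoids executing them at all.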




[GitHub] metron pull request #609: METRON-987: Allow stellar enrichments to be specif...

2017-06-08 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/609#discussion_r120924836
  
--- Diff: metron-platform/metron-enrichment/src/test/java/org/apache/metron/enrichment/adapters/stellar/StellarAdapterTest.java ---
@@ -0,0 +1,134 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.adapters.stellar;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.metron.common.configuration.StellarEnrichmentTest;
+import org.apache.metron.common.configuration.enrichment.EnrichmentConfig;
+import org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.apache.metron.common.dsl.Context;
+import org.apache.metron.common.dsl.MapVariableResolver;
+import org.apache.metron.common.dsl.VariableResolver;
+import org.apache.metron.common.stellar.StellarProcessor;
+import org.apache.metron.common.utils.JSONUtils;
+import org.json.simple.JSONObject;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class StellarAdapterTest extends StellarEnrichmentTest {
+  StellarProcessor processor = new StellarProcessor();
+
+  private JSONObject enrich(JSONObject message, String field, ConfigHandler handler) {
+VariableResolver resolver = new MapVariableResolver(message);
+return StellarAdapter.process( message
+ , handler
+ , field
+ , 1000l
--- End diff --

But the confusion is part of the charm!




[GitHub] metron pull request #609: METRON-987: Allow stellar enrichments to be specif...

2017-06-08 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/609#discussion_r120924708
  
--- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/message/JSONFromPosition.java ---
@@ -40,10 +40,12 @@ public JSONFromPosition(int position) {
 
   @Override
   public JSONObject get(Tuple tuple) {
+String s = null;
 try {
-  return (JSONObject) parser.get().parse(new String(tuple.getBinary(position), "UTF8"));
+  s =  new String(tuple.getBinary(position), "UTF8");
--- End diff --

`UTF-8` and `UTF8` are aliases in the `StandardCharsets` from looking 
through the java API code.
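The alias claim is easy to check directly. A minimal sketch (the class name is mine, not from the PR):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetAliasCheck {
    public static void main(String[] args) {
        // "UTF8" is a registered alias of the canonical "UTF-8" charset in the JDK,
        // so Charset.forName resolves both names to the same Charset instance.
        System.out.println(Charset.forName("UTF8").equals(StandardCharsets.UTF_8)); // prints true

        // Using the StandardCharsets constant sidesteps the alias question entirely and
        // avoids the checked UnsupportedEncodingException of new String(byte[], String).
        byte[] bytes = "metron".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // prints metron
    }
}
```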




[GitHub] metron pull request #609: METRON-987: Allow stellar enrichments to be specif...

2017-06-08 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/609#discussion_r120919459
  
--- Diff: metron-platform/metron-common/src/main/java/org/apache/metron/common/message/JSONFromPosition.java ---
@@ -40,10 +40,12 @@ public JSONFromPosition(int position) {
 
   @Override
   public JSONObject get(Tuple tuple) {
+String s = null;
 try {
-  return (JSONObject) parser.get().parse(new String(tuple.getBinary(position), "UTF8"));
+  s =  new String(tuple.getBinary(position), "UTF8");
--- End diff --

Isn't the identifying string "UTF-8", not "UTF8"?




[GitHub] metron pull request #609: METRON-987: Allow stellar enrichments to be specif...

2017-06-08 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/609#discussion_r120921756
  
--- Diff: metron-platform/metron-enrichment/src/test/java/org/apache/metron/enrichment/adapters/stellar/StellarAdapterTest.java ---
@@ -0,0 +1,134 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.adapters.stellar;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.metron.common.configuration.StellarEnrichmentTest;
+import org.apache.metron.common.configuration.enrichment.EnrichmentConfig;
+import org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.apache.metron.common.dsl.Context;
+import org.apache.metron.common.dsl.MapVariableResolver;
+import org.apache.metron.common.dsl.VariableResolver;
+import org.apache.metron.common.stellar.StellarProcessor;
+import org.apache.metron.common.utils.JSONUtils;
+import org.json.simple.JSONObject;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class StellarAdapterTest extends StellarEnrichmentTest {
+  StellarProcessor processor = new StellarProcessor();
+
+  private JSONObject enrich(JSONObject message, String field, ConfigHandler handler) {
+VariableResolver resolver = new MapVariableResolver(message);
+return StellarAdapter.process( message
+ , handler
+ , field
+ , 1000l
--- End diff --

Out of my personality preference, can you make this 1000L, so I don't look 
at it again and think it's 10001?




[GitHub] metron issue #609: METRON-987: Allow stellar enrichments to be specified by ...

2017-06-08 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/609
  
A lot of this brings up the discussions around deprecating / dropping 
functionality (both for Stellar and in general).  It seems like ideally, we'd 
be deprecating the map functionality in favor of the list functionality, 
probably letting it ride a few releases, and eventually dropping it after a 
migration period.

Do we need to start getting together a set of discussions and plans for 
handling this sort of deprecation as we build out enhanced versions of older 
features?




[GitHub] metron pull request #602: METRON-906: Rest service storm configuration does ...

2017-06-08 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/602#discussion_r120890884
  
--- Diff: metron-deployment/roles/ambari_config/vars/single_node_vm.yml ---
@@ -102,7 +102,7 @@ configurations:
 
 required_configurations:
   - metron-env:
-  storm_rest_addr: "{{ groups.ambari_slave[0] }}:8744"
+  storm_rest_addr: "http://{{ groups.ambari_slave[0] }}:8744"
--- End diff --

That's fine; I get it.




[GitHub] metron pull request #600: METRON-976 KafkaUtils doesn't handle SASL_PLAINTEX...

2017-06-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/metron/pull/600




Re: Installation problem with Docker and processor that does not support virtualization

2017-06-08 Thread smlabs
ok

> 
> On 8 June 2017 at 11:54 "zeo...@gmail.com" wrote:
> 
> Moving to user list
> 
> Jon
> 
> On Thu, Jun 8, 2017, 5:51 AM  wrote:
> 
> > Hei Jon,
> > 
> > thank you.
> > 
> > I don't know which are the differences between 0.3.1 and 0.4.0.
> > 
> > Unfortunately for me, my CPU does not support virtualization. That 
> > means
> > that I cannot use Docker.
> > 
> > The only workaround that I found is to use AWS directly, but since I
> > have never used Metron it could be a big step...
> > 
> > So the question is, do I lose many things if I start with Metron 0.3.1
> > into a single VM without Docker?
> > 
> > Best regards,
> > 
> > Simone
> > 
> > On 8 June 2017 at 11:37 "zeo...@gmail.com" wrote:
> > 
> > If I recall properly, 0.3.1 does not require docker yet. That will 
> > come
> > with 0.4.0/master. It still does require virtualization, however, 
> > to spin
> > up the dev environments (excluding if you were to run it on AWS).
> > 
> > Jon
> > 
> > On Thu, Jun 8, 2017, 2:03 AM  wrote:
> > 
> > Hello,
> > 
> > I didn't check the BIOS if I can enable it.
> > 
> > Then, I see that yesterday has been updated the installation 
> > guideline for
> > Metron 0.3.1 as follows:
> > 
> > 
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68718548
> > 
> > Reading that, there is no mention to Docker.
> > Maybe for my experiments (I do have to test some ML algo's) and 
> > improve my
> > knowledge on this tool that installation should be enough.
> > 
> > What do you think about?
> > 
> > Thanks.
> > 
> > Simone
> > 
> > On 7 June 2017 at 23:32 "zeo...@gmail.com" wrote:
> > 
> > If your processor doesn't support virtualization right now I would 
> > suggest
> > looking into if it is simply disabled in your BIOS/UEFI (most 
> > processers
> > have supported this for 10+ years, excluding some processors of 
> > course).
> > Docker is integrated into the build process right now and is 
> > considered
> > mandatory (although you technically could work around it with some 
> > effort).
> > 
> > Assuming you are spinning up full-dev, vagrant should create a 
> > centos 6 VM
> > to run Metron in.
> > 
> > Metron is the repository that you cloned from GitHub, downloaded 
> > from
> > Apache, etc. If you didn't do this, you will need to. Here is our 
> > last
> > release - http://metron.apache.org/documentation/#releases
> > 
> > Hope that helps
> > 
> > Jon
> > 
> > On Wed, Jun 7, 2017, 4:00 PM  wrote:
> > 
> > Dear All,
> > 
> > I'm installing Metron, following the instructions found here:
> > 
> > 
> > https://github.com/apache/metron/tree/master/metron-deployment/vagrant/full-dev-platform
> > 
> > Unfortunately, my processor does not support virtualization and I'm 
> > not
> > able to launch Docker.
> > 
> > Is there any workaround?
> > 
> > I installed Vagrant on my OSX and I assumed to use Vagrant to 
> > create a VM
> > with Ubuntu in which I would run Metron. Is it right?
> > 
> > Another question about the instructions, I do not really understand 
> > where
> > I get Metron.
> > 
> > In this point:
> > 
> >1. Deploy Metron
> > 
> > cd metron-deployment/vagrant/full-dev-platform
> > vagrant up
> > 
> > I understood that I should have already downloaded Metron, but I 
> > don't.
> > Where is Metron?
> > 
> > Thank you.
> > Simone
> > 
> > --
> > 
> > Jon
> > 
> > --
> > 
> > Jon
> > 
> > --
> > 
> Jon
> 


Re: Installation problem with Docker and processor that does not support virtualization

2017-06-08 Thread zeo...@gmail.com
Moving to user list

Jon

On Thu, Jun 8, 2017, 5:51 AM  wrote:

> Hei Jon,
>
> thank you.
>
> I don't know which are the differences between 0.3.1 and 0.4.0.
>
> Unfortunately for me, my CPU does not support virtualization.  That means
> that I cannot use Docker.
>
> The only workaround that I found is to use AWS directly, but since I
> have never used Metron it could be a big step...
>
> So the question is, do I lose many things if I start with Metron 0.3.1
> into a single VM without Docker?
>
> Best regards,
>
> Simone
>
>
> On 8 June 2017 at 11:37 "zeo...@gmail.com" wrote:
>
> If I recall properly, 0.3.1 does not require docker yet.  That will come
> with 0.4.0/master.  It still does require virtualization, however, to spin
> up the dev environments (excluding if you were to run it on AWS).
>
> Jon
>
> On Thu, Jun 8, 2017, 2:03 AM  wrote:
>
> Hello,
>
> I didn't check the BIOS if I can enable it.
>
> Then, I see that yesterday has been updated the installation guideline for
> Metron 0.3.1 as follows:
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68718548
>
> Reading that, there is no mention to Docker.
> Maybe for my experiments (I do have to test some ML algo's) and improve my
> knowledge on this tool that installation should be enough.
>
> What do you think about?
>
> Thanks.
>
> Simone
>
>
> On 7 June 2017 at 23:32 "zeo...@gmail.com" wrote:
>
> If your processor doesn't support virtualization right now I would suggest
> looking into if it is simply disabled in your BIOS/UEFI (most processers
> have supported this for 10+ years, excluding some processors of course).
> Docker is integrated into the build process right now and is considered
> mandatory (although you technically could work around it with some effort).
>
> Assuming you are spinning up full-dev, vagrant should create a centos 6 VM
> to run Metron in.
>
> Metron is the repository that you cloned from GitHub, downloaded from
> Apache, etc. If you didn't do this, you will need to. Here is our last
> release - http://metron.apache.org/documentation/#releases
>
> Hope that helps
>
> Jon
>
> On Wed, Jun 7, 2017, 4:00 PM  wrote:
>
> Dear All,
>
> I'm installing Metron, following the instructions found here:
>
>
> https://github.com/apache/metron/tree/master/metron-deployment/vagrant/full-dev-platform
>
> Unfortunately, my processor does not support virtualization and I'm not
> able to launch Docker.
>
> Is there any workaround?
>
> I installed Vagrant on my OSX and I assumed to use Vagrant to create a VM
> with Ubuntu in which I would run Metron. Is it right?
>
> Another question about the instructions, I do not really understand where
> I get Metron.
>
> In this point:
>
>
>1. Deploy Metron
>
> cd metron-deployment/vagrant/full-dev-platform
> vagrant up
>
> I understood that I should have already downloaded Metron, but I don't.
> Where is Metron?
>
> Thank you.
> Simone
>
> --
>
> Jon
>
> --
>
> Jon
>
> --

Jon


Re: Installation problem with Docker and processor that does not support virtualization

2017-06-08 Thread smlabs
Hei Jon,

thank you.

I don't know which are the differences between 0.3.1 and 0.4.0.

Unfortunately for me, my CPU does not support virtualization.  That means that 
I cannot use Docker.

The only workaround that I found is to use AWS directly, but since I have
never used Metron it could be a big step...

So the question is, do I lose many things if I start with Metron 0.3.1 into a
single VM without Docker?

Best regards,

Simone


> On 8 June 2017 at 11:37 "zeo...@gmail.com" wrote:
> 
> 
> If I recall properly, 0.3.1 does not require docker yet.  That will come 
> with 0.4.0/master.  It still does require virtualization, however, to spin up 
> the dev environments (excluding if you were to run it on AWS).
> 
> Jon
> 
> 
> On Thu, Jun 8, 2017, 2:03 AM  
> wrote:
> 
> > > 
> > Hello,
> > 
> > I didn't check the BIOS if I can enable it.
> > 
> > Then, I see that yesterday has been updated the installation 
> > guideline for Metron 0.3.1 as follows:
> > 
> > 
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68718548
> > 
> > Reading that, there is no mention to Docker.
> > Maybe for my experiments (I do have to test some ML algo's) and 
> > improve my knowledge on this tool that installation should be enough.
> > 
> > What do you think about?
> > 
> > Thanks.
> > 
> > Simone
> > 
> > 
> > > On 7 June 2017 at 23:32 "zeo...@gmail.com" wrote:
> > > 
> > > If your processor doesn't support virtualization right now I 
> > > would suggest
> > > looking into if it is simply disabled in your BIOS/UEFI (most 
> > > processers
> > > have supported this for 10+ years, excluding some processors 
> > > of course).
> > > Docker is integrated into the build process right now and is 
> > > considered
> > > mandatory (although you technically could work around it with 
> > > some effort).
> > > 
> > > Assuming you are spinning up full-dev, vagrant should create 
> > > a centos 6 VM
> > > to run Metron in.
> > > 
> > > Metron is the repository that you cloned from GitHub, 
> > > downloaded from
> > > Apache, etc. If you didn't do this, you will need to. Here is 
> > > our last
> > > release - http://metron.apache.org/documentation/#releases
> > > 
> > > Hope that helps
> > > 
> > > Jon
> > > 
> > > On Wed, Jun 7, 2017, 4:00 PM sml...@libero.it wrote:
> > > > Dear All,
> > > > 
> > > > I'm installing Metron, following the instructions found 
> > > > here:
> > > > 
> > > > 
> > > > https://github.com/apache/metron/tree/master/metron-deployment/vagrant/full-dev-platform
> > > > 
> > > > Unfortunately, my processor does not support 
> > > > virtualization and I'm not
> > > > able to launch Docker.
> > > > 
> > > > Is there any workaround?
> > > > 
> > > > I installed Vagrant on my OSX and I assumed to use 
> > > > Vagrant to create a VM
> > > > with Ubuntu in which I would run Metron. Is it right?
> > > > 
> > > > Another question about the instructions, I do not 
> > > > really understand where
> > > > I get Metron.
> > > > 
> > > > In this point:
> > > > 
> > > >1. Deploy Metron
> > > >
> > > > cd metron-deployment/vagrant/full-dev-platform
> > > > vagrant up
> > > > 
> > > > I understood that I should have already downloaded 
> > > > Metron, but I don't.
> > > > Where is Metron?
> > > > 
> > > > Thank you.
> > > > Simone
> > > > 
> > > > --
> > > > 
> > > Jon
> > > 
> > > --
> 
> Jon
> 


Re: Installation problem with Docker and processor that does not support virtualization

2017-06-08 Thread smlabs
Hello,

I didn't check the BIOS if I can enable it.

Then, I see that yesterday has been updated the installation guideline for 
Metron 0.3.1 as follows:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68718548

Reading that, there is no mention to Docker.
Maybe for my experiments (I do have to test some ML algo's) and improve my 
knowledge on this tool that installation should be enough.

What do you think about?

Thanks.

Simone


> 
> On 7 June 2017 at 23:32 "zeo...@gmail.com" wrote:
> 
> If your processor doesn't support virtualization right now I would suggest
> looking into if it is simply disabled in your BIOS/UEFI (most processers
> have supported this for 10+ years, excluding some processors of course).
> Docker is integrated into the build process right now and is considered
> mandatory (although you technically could work around it with some 
> effort).
> 
> Assuming you are spinning up full-dev, vagrant should create a centos 6 VM
> to run Metron in.
> 
> Metron is the repository that you cloned from GitHub, downloaded from
> Apache, etc. If you didn't do this, you will need to. Here is our last
> release - http://metron.apache.org/documentation/#releases
> 
> Hope that helps
> 
> Jon
> 
> On Wed, Jun 7, 2017, 4:00 PM  wrote:
> 
> > > 
> > Dear All,
> > 
> > I'm installing Metron, following the instructions found here:
> > 
> > 
> > https://github.com/apache/metron/tree/master/metron-deployment/vagrant/full-dev-platform
> > 
> > Unfortunately, my processor does not support virtualization and I'm 
> > not
> > able to launch Docker.
> > 
> > Is there any workaround?
> > 
> > I installed Vagrant on my OSX and I assumed to use Vagrant to 
> > create a VM
> > with Ubuntu in which I would run Metron. Is it right?
> > 
> > Another question about the instructions, I do not really understand 
> > where
> > I get Metron.
> > 
> > In this point:
> > 
> >1. Deploy Metron
> > 
> > cd metron-deployment/vagrant/full-dev-platform
> > vagrant up
> > 
> > I understood that I should have already downloaded Metron, but I 
> > don't.
> > Where is Metron?
> > 
> > Thank you.
> > Simone
> > 
> > --
> > 
> > > 
> Jon
>