Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread zeo...@gmail.com
The field stub also gives something that can potentially be used in the
error dashboard (or similar) to graph, allowing failed enrichments to
"shout" louder to the end user.

Jon

On Tue, May 16, 2017 at 12:34 PM Nick Allen  wrote:

> > but also adds a field stub to indicate failed enrichment. This is then an
> indicator to an operator or investigator as well that something is missing,
> and could drive things like replay of the message to retrospectively enrich
> when things have calmed down.
>
> Yes, I like the idea of a "field stub".  You need some way to distinguish
> "did I configure this wrong" versus "something bad happened outside of my
> control".
>
>
>
> On Tue, May 16, 2017 at 12:27 PM, Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
>
> > Nick, I’d tend to agree with you there.
> >
> > How about:
> > If an enrichment fails / effectively times out, the join bolt emits the
> > message before cache eviction (as Nick’s point 2), but also adds a field
> > stub to indicate failed enrichment. This is then an indicator to an
> > operator or investigator as well that something is missing, and could
> drive
> > things like replay of the message to retrospectively enrich when things
> > have calmed down.
> >
> > Simon
> >
> > > On 16 May 2017, at 17:25, Nick Allen  wrote:
> > >
> > > Ah, yes.  Makes sense and I can see the value in the parallelism that
> the
> > > split/join provides.  Personally, I would like to see the code do the
> > > following.
> > >
> > > (1) Scream and shout when something in the cache expires.  We have to
> > make
> > > sure that it is blatantly obvious to a user what happened.  We also
> need
> > to
> > > make it blatantly obvious to the user what knobs they can turn to
> correct
> > > the problem.
> > >
> > > (2) Enrichments should be treated as best-effort.  When the cache
> > expires,
> > > it should pass on the message without the enrichments that have not
> > > completed.  If I am relying on an external system for an enrichment, I
> > > don't want an external system outage to fail all of my telemetry.
> > >
> > >
> > >
> > >
> > >
> > > On Tue, May 16, 2017 at 12:05 PM, Casey Stella 
> > wrote:
> > >
> > >> We still do use split/join even within stellar enrichments.  Take for
> > >> instance the following enrichment:
> > >> {
> > >>   "enrichment" : {
> > >>     "fieldMap" : {
> > >>       "stellar" : {
> > >>         "config" : {
> > >>           "parallel-task-1" : {
> > >>             "my_field" : "PROFILE_GET()"
> > >>           },
> > >>           "parallel-task-2" : {
> > >>             "my_field2" : "PROFILE_GET()"
> > >>           }
> > >>         }
> > >>       }
> > >>     }
> > >>   }
> > >> }
> > >>
> > >> Messages will get split between two tasks of the Stellar enrichment
> bolt
> > >> and the stellar statements in "parallel-task-1" will be executed in
> > >> parallel to those in "parallel-task-2".  This is to enable people to
> > >> separate computationally intensive or otherwise high latency tasks
> that
> > are
> > >> independent across nodes in the cluster.
> > >>
> > >> I will agree wholeheartedly, though, that my personal desire would be
> > >> to have just stellar enrichments.  You can do every one of the other
> > >> enrichments in Stellar and it would greatly simplify that config above.
> > >>
> > >>
> > >>
> > >> On Tue, May 16, 2017 at 11:59 AM, Nick Allen 
> > wrote:
> > >>
> > >>> I would like to see us just migrate wholly to Stellar enrichments and
> > >>> remove the separate HBase and Geo enrichment bolts from the
> Enrichment
> > >>> topology.  Stellar provides a user with much greater flexibility than
> > the
> > >>> existing HBase and Geo enrichment bolts.
> > >>>
> > >>> A side effect of this would be to greatly simplify the Enrichment
> > >>> topology.  I don't think we would need the split/join pattern if we
> > >>> did this. No?
> > >>>
> > >>> On Tue, May 16, 2017 at 11:54 AM, Casey Stella 
> > >> wrote:
> > >>>
> >  The problem is that an enrichment type won't necessarily have a
> fixed
> >  performance characteristic.  Take stellar enrichments, for instance.
> > >>> Doing
> >  a HBase call for one sensor vs doing simple string munging will have
> > >>> vastly
> >  differing performance.  Both of them are functioning within the
> > stellar
> >  enrichment bolt.  Also, some enrichments may call for multiple calls
> > to
> >  HBase.  Parallelizing those would make some sense, I think.
> > 
> >  I do take your point, though, that it's not as though it's strictly
> > >>> serial,
> >  it's just that the unit of parallelism is the message, rather than
> the
> >  enrichment per message.
> > 
> >  On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz <
> > >> tramn...@trasec.de
> > 
> >  wrote:
> > 
> > > I’m glad you bring this up. 

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Nick Allen
> but also adds a field stub to indicate failed enrichment. This is then an
indicator to an operator or investigator as well that something is missing,
and could drive things like replay of the message to retrospectively enrich
when things have calmed down.

Yes, I like the idea of a "field stub".  You need some way to distinguish
"did I configure this wrong" versus "something bad happened outside of my
control".



On Tue, May 16, 2017 at 12:27 PM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> Nick, I’d tend to agree with you there.
>
> How about:
> If an enrichment fails / effectively times out, the join bolt emits the
> message before cache eviction (as Nick’s point 2), but also adds a field
> stub to indicate failed enrichment. This is then an indicator to an
> operator or investigator as well that something is missing, and could drive
> things like replay of the message to retrospectively enrich when things
> have calmed down.
>
> Simon
>
> > On 16 May 2017, at 17:25, Nick Allen  wrote:
> >
> > Ah, yes.  Makes sense and I can see the value in the parallelism that the
> > split/join provides.  Personally, I would like to see the code do the
> > following.
> >
> > (1) Scream and shout when something in the cache expires.  We have to
> make
> > sure that it is blatantly obvious to a user what happened.  We also need
> to
> > make it blatantly obvious to the user what knobs they can turn to correct
> > the problem.
> >
> > (2) Enrichments should be treated as best-effort.  When the cache
> expires,
> > it should pass on the message without the enrichments that have not
> > completed.  If I am relying on an external system for an enrichment, I
> > don't want an external system outage to fail all of my telemetry.
> >
> >
> >
> >
> >
> > On Tue, May 16, 2017 at 12:05 PM, Casey Stella 
> wrote:
> >
> >> We still do use split/join even within stellar enrichments.  Take for
> >> instance the following enrichment:
> >> {
> >>   "enrichment" : {
> >>     "fieldMap" : {
> >>       "stellar" : {
> >>         "config" : {
> >>           "parallel-task-1" : {
> >>             "my_field" : "PROFILE_GET()"
> >>           },
> >>           "parallel-task-2" : {
> >>             "my_field2" : "PROFILE_GET()"
> >>           }
> >>         }
> >>       }
> >>     }
> >>   }
> >> }
> >>
> >> Messages will get split between two tasks of the Stellar enrichment bolt
> >> and the stellar statements in "parallel-task-1" will be executed in
> >> parallel to those in "parallel-task-2".  This is to enable people to
> >> separate computationally intensive or otherwise high latency tasks that
> are
> >> independent across nodes in the cluster.
> >>
> >> I will agree wholeheartedly, though, that my personal desire would be to
> >> have just stellar enrichments.  You can do every one of the other
> >> enrichments in Stellar and it would greatly simplify that config above.
> >>
> >>
> >>
> >> On Tue, May 16, 2017 at 11:59 AM, Nick Allen 
> wrote:
> >>
> >>> I would like to see us just migrate wholly to Stellar enrichments and
> >>> remove the separate HBase and Geo enrichment bolts from the Enrichment
> >>> topology.  Stellar provides a user with much greater flexibility than
> the
> >>> existing HBase and Geo enrichment bolts.
> >>>
> >>> A side effect of this would be to greatly simplify the Enrichment
> >>> topology.  I don't think we would need the split/join pattern if we
> >>> did this. No?
> >>>
> >>> On Tue, May 16, 2017 at 11:54 AM, Casey Stella 
> >> wrote:
> >>>
>  The problem is that an enrichment type won't necessarily have a fixed
>  performance characteristic.  Take stellar enrichments, for instance.
> >>> Doing
>  a HBase call for one sensor vs doing simple string munging will have
> >>> vastly
>  differing performance.  Both of them are functioning within the
> stellar
>  enrichment bolt.  Also, some enrichments may call for multiple calls
> to
>  HBase.  Parallelizing those would make some sense, I think.
> 
>  I do take your point, though, that it's not as though it's strictly
> >>> serial,
>  it's just that the unit of parallelism is the message, rather than the
>  enrichment per message.
> 
>  On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz <
> >> tramn...@trasec.de
> 
>  wrote:
> 
> > I’m glad you bring this up. This is a huge architectural difference from
> > the original OpenSOC topology, and one that we were warned about back
> > then.
> > To be perfectly honest, I don’t see the big performance improvement from
> > parallel processing. If a specific enrichment is a little more i/o
> > dependent than the others you can tweak parallelism to address this. Also
> > there can be dependencies that make parallel enrichment virtually
> > impossible, or at least less efficient (i.e. first labeling and
> > “completing” a message, and then, depending on label and completeness,
> > doing different other enrichments).

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Otto Fowler
If we are timing out things from the cache, we have that latency already.


On May 16, 2017 at 12:09:32, Casey Stella (ceste...@gmail.com) wrote:

We could definitely parallelize within the bolt, but you're right, it does
break the storm model. I also like making things other people's problems
(it's called working "smart" not "hard", right? not laziness, surely. ;),
but yeah, using windowing for this seems like it might introduce some
artificial latency. It's also not going to eliminate the problem, but
rather just make the knob to tweak things have a different characteristic.
Whereas before we had knobs around how many messages, now it's a knob
around how long an enrichment is going to take maximally (which, I think,
is more natural, honestly).

On Tue, May 16, 2017 at 12:05 PM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> Would you then parallelise within Stellar to handle things like multiple
> lookups? This feels like it would be breaking the storm model somewhat,
and
> could lead to bad things with threads for example. Or would you think of
> doing something like the grouping Stellar uses today to parallelise
across
> something like a pool of Stellar bolts and join?
>
> I like the idea of Otto’s solution (making it someone else's problem,
> storm’s specifically :) ) but that also assumes we insert the artificial
> latency of a time windowed join. If we’re going down that route, we might
> as well just use spark and run everything on yarn. At that point though
we
> lose a lot of the benefits of low latency for time to detection, and
> real-time enrichment in things like the streaming enrichment writer.
>
> Simon
>
> > On 16 May 2017, at 16:59, Nick Allen  wrote:
> >
> > I would like to see us just migrate wholly to Stellar enrichments and
> > remove the separate HBase and Geo enrichment bolts from the Enrichment
> > topology. Stellar provides a user with much greater flexibility than
the
> > existing HBase and Geo enrichment bolts.
> >
> > A side effect of this would be to greatly simplify the Enrichment
> > topology. I don't think we would need the split/join pattern if we
> > did this. No?
> >
> > On Tue, May 16, 2017 at 11:54 AM, Casey Stella 
> wrote:
> >
> >> The problem is that an enrichment type won't necessarily have a fixed
> >> performance characteristic. Take stellar enrichments, for instance.
> Doing
> >> a HBase call for one sensor vs doing simple string munging will have
> vastly
> >> differing performance. Both of them are functioning within the stellar
> >> enrichment bolt. Also, some enrichments may call for multiple calls to
> >> HBase. Parallelizing those would make some sense, I think.
> >>
> >> I do take your point, though, that it's not as though it's strictly
> serial,
> >> it's just that the unit of parallelism is the message, rather than the
> >> enrichment per message.
> >>
> >> On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz <
> tramn...@trasec.de>
> >> wrote:
> >>
> >>> I’m glad you bring this up. This is a huge architectural difference from
> >>> the original OpenSOC topology, and one that we were warned about back
> >>> then.
> >>> To be perfectly honest, I don’t see the big performance improvement from
> >>> parallel processing. If a specific enrichment is a little more i/o
> >>> dependent than the others you can tweak parallelism to address this. Also
> >>> there can be dependencies that make parallel enrichment virtually
> >>> impossible, or at least less efficient (i.e. first labeling and
> >>> “completing” a message, and then, depending on label and completeness,
> >>> doing different other enrichments).
> >>>
> >>> So you have a +1 from me for serial rather than parallel enrichment.
> >>>
> >>>
> >>> BR,
> >>> Christian
> >>>
> >>> On 16.05.17, 16:58, "Casey Stella"  wrote:
> >>>
> >>> Hi All,
> >>>
> >>> Last week, I encountered some weirdness in the Enrichment topology.
> >>> Doing
> >>> some somewhat high-latency enrichment work, I noticed that at some
> >>> point,
> >>> data stopped flowing through the enrichment topology. I tracked
> down
> >>> the
> >>> problem to the join bolt. For those who aren't aware, we do a
> >>> split/join
> >>> pattern so that enrichments can be done in parallel. It works as
> >>> follows:
> >>>
> >>> - A split bolt sends the appropriate subset of the message to
> each
> >>> enrichment bolt as well as the whole message to the join bolt
> >>> - The join bolt will receive each of the pieces of the message
> and
> >>> then,
> >>> when fully joined, it will send the message on.
> >>>
> >>>
> >>> What is happening under load or high velocity, however, is that the
> >>> cache
> >>> is evicting the partially joined message before it can be fully
> >> joined
> >>> due
> >>> to the volume of traffic. This is obviously not ideal. As such, it
> >> is
> >>> clear that adjusting the size of the cache and the characteristics of
> >>> eviction is likely a good idea and a necessary part of tuning
> >>> enrichments.

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Simon Elliston Ball
Nick, I’d tend to agree with you there.

How about:
If an enrichment fails / effectively times out, the join bolt emits the message 
before cache eviction (as Nick’s point 2), but also adds a field stub to 
indicate failed enrichment. This is then an indicator to an operator or 
investigator as well that something is missing, and could drive things like 
replay of the message to retrospectively enrich when things have calmed down. 
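
A minimal sketch of that idea, assuming the join bolt knows which enrichment
subtasks never returned (the field names and helper below are hypothetical,
not an agreed-upon convention):

import java.util.List;
import org.json.simple.JSONObject;

public class FieldStubSketch {
  // Mark a partially joined message before emitting it, so operators,
  // dashboards, and replay jobs can find under-enriched telemetry later.
  @SuppressWarnings("unchecked")
  public static JSONObject stubOut(JSONObject message, List<String> missing) {
    message.put("enrichments.timedout", missing); // e.g. ["geo", "hbaseThreatIntel"]
    message.put("enrichments.partial", true);     // easy to alert or graph on
    return message;                               // emit downstream as best-effort
  }
}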

Simon

> On 16 May 2017, at 17:25, Nick Allen  wrote:
> 
> Ah, yes.  Makes sense and I can see the value in the parallelism that the
> split/join provides.  Personally, I would like to see the code do the
> following.
> 
> (1) Scream and shout when something in the cache expires.  We have to make
> sure that it is blatantly obvious to a user what happened.  We also need to
> make it blatantly obvious to the user what knobs they can turn to correct
> the problem.
> 
> (2) Enrichments should be treated as best-effort.  When the cache expires,
> it should pass on the message without the enrichments that have not
> completed.  If I am relying on an external system for an enrichment, I
> don't want an external system outage to fail all of my telemetry.
> 
> 
> 
> 
> 
> On Tue, May 16, 2017 at 12:05 PM, Casey Stella  wrote:
> 
>> We still do use split/join even within stellar enrichments.  Take for
>> instance the following enrichment:
>> {
>>   "enrichment" : {
>>     "fieldMap" : {
>>       "stellar" : {
>>         "config" : {
>>           "parallel-task-1" : {
>>             "my_field" : "PROFILE_GET()"
>>           },
>>           "parallel-task-2" : {
>>             "my_field2" : "PROFILE_GET()"
>>           }
>>         }
>>       }
>>     }
>>   }
>> }
>> 
>> Messages will get split between two tasks of the Stellar enrichment bolt
>> and the stellar statements in "parallel-task-1" will be executed in
>> parallel to those in "parallel-task-2".  This is to enable people to
>> separate computationally intensive or otherwise high latency tasks that are
>> independent across nodes in the cluster.
>> 
>> I will agree wholeheartedly, though, that my personal desire would be to
>> have just stellar enrichments.  You can do every one of the other
>> enrichments in Stellar and it would greatly simplify that config above.
>> 
>> 
>> 
>> On Tue, May 16, 2017 at 11:59 AM, Nick Allen  wrote:
>> 
>>> I would like to see us just migrate wholly to Stellar enrichments and
>>> remove the separate HBase and Geo enrichment bolts from the Enrichment
>>> topology.  Stellar provides a user with much greater flexibility than the
>>> existing HBase and Geo enrichment bolts.
>>> 
>>> A side effect of this would be to greatly simplify the Enrichment
>>> topology.  I don't think we would need the split/join pattern if we
>>> did this. No?
>>> 
>>> On Tue, May 16, 2017 at 11:54 AM, Casey Stella 
>> wrote:
>>> 
 The problem is that an enrichment type won't necessarily have a fixed
 performance characteristic.  Take stellar enrichments, for instance.
>>> Doing
 a HBase call for one sensor vs doing simple string munging will have
>>> vastly
 differing performance.  Both of them are functioning within the stellar
 enrichment bolt.  Also, some enrichments may call for multiple calls to
 HBase.  Parallelizing those would make some sense, I think.
 
 I do take your point, though, that it's not as though it's strictly
>>> serial,
 it's just that the unit of parallelism is the message, rather than the
 enrichment per message.
 
 On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz <
>> tramn...@trasec.de
 
 wrote:
 
> I’m glad you bring this up. This is a huge architectural difference from
> the original OpenSOC topology, and one that we were warned about back
> then.
> To be perfectly honest, I don’t see the big performance improvement from
> parallel processing. If a specific enrichment is a little more i/o
> dependent than the others you can tweak parallelism to address this. Also
> there can be dependencies that make parallel enrichment virtually
> impossible, or at least less efficient (i.e. first labeling and
> “completing” a message, and then, depending on label and completeness,
> doing different other enrichments).
> 
> So you have a +1 from me for serial rather than parallel enrichment.
> 
> 
> BR,
>   Christian
> 
> On 16.05.17, 16:58, "Casey Stella"  wrote:
> 
>Hi All,
> 
>Last week, I encountered some weirdness in the Enrichment
>> topology.
> Doing
>some somewhat high-latency enrichment work, I noticed that at
>> some
> point,
>data stopped flowing through the enrichment topology.  I tracked
>>> down
> the
>problem to the join bolt.  For those who aren't aware, we do a
>split/join pattern so that enrichments can be done in parallel.

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Nick Allen
Ah, yes.  Makes sense and I can see the value in the parallelism that the
split/join provides.  Personally, I would like to see the code do the
following.

(1) Scream and shout when something in the cache expires.  We have to make
sure that it is blatantly obvious to a user what happened.  We also need to
make it blatantly obvious to the user what knobs they can turn to correct
the problem.

(2) Enrichments should be treated as best-effort.  When the cache expires,
it should pass on the message without the enrichments that have not
completed.  If I am relying on an external system for an enrichment, I
don't want an external system outage to fail all of my telemetry.





On Tue, May 16, 2017 at 12:05 PM, Casey Stella  wrote:

> We still do use split/join even within stellar enrichments.  Take for
> instance the following enrichment:
> {
>   "enrichment" : {
>     "fieldMap" : {
>       "stellar" : {
>         "config" : {
>           "parallel-task-1" : {
>             "my_field" : "PROFILE_GET()"
>           },
>           "parallel-task-2" : {
>             "my_field2" : "PROFILE_GET()"
>           }
>         }
>       }
>     }
>   }
> }
>
> Messages will get split between two tasks of the Stellar enrichment bolt
> and the stellar statements in "parallel-task-1" will be executed in
> parallel to those in "parallel-task-2".  This is to enable people to
> separate computationally intensive or otherwise high latency tasks that are
> independent across nodes in the cluster.
>
> I will agree wholeheartedly, though, that my personal desire would be to
> have just stellar enrichments.  You can do every one of the other
> enrichments in Stellar and it would greatly simplify that config above.
>
>
>
> On Tue, May 16, 2017 at 11:59 AM, Nick Allen  wrote:
>
> > I would like to see us just migrate wholly to Stellar enrichments and
> > remove the separate HBase and Geo enrichment bolts from the Enrichment
> > topology.  Stellar provides a user with much greater flexibility than the
> > existing HBase and Geo enrichment bolts.
> >
> > A side effect of this would be to greatly simplify the Enrichment
> > topology.  I don't think we would need the split/join pattern if we
> > did this. No?
> >
> > On Tue, May 16, 2017 at 11:54 AM, Casey Stella 
> wrote:
> >
> > > The problem is that an enrichment type won't necessarily have a fixed
> > > performance characteristic.  Take stellar enrichments, for instance.
> > Doing
> > > a HBase call for one sensor vs doing simple string munging will have
> > vastly
> > > differing performance.  Both of them are functioning within the stellar
> > > enrichment bolt.  Also, some enrichments may call for multiple calls to
> > > HBase.  Parallelizing those would make some sense, I think.
> > >
> > > I do take your point, though, that it's not as though it's strictly
> > serial,
> > > it's just that the unit of parallelism is the message, rather than the
> > > enrichment per message.
> > >
> > > On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz <
> tramn...@trasec.de
> > >
> > > wrote:
> > >
> > > > I’m glad you bring this up. This is a huge architectural difference from
> > > > the original OpenSOC topology, and one that we were warned about back
> > > > then.
> > > > To be perfectly honest, I don’t see the big performance improvement from
> > > > parallel processing. If a specific enrichment is a little more i/o
> > > > dependent than the others you can tweak parallelism to address this. Also
> > > > there can be dependencies that make parallel enrichment virtually
> > > > impossible, or at least less efficient (i.e. first labeling and
> > > > “completing” a message, and then, depending on label and completeness,
> > > > doing different other enrichments).
> > > >
> > > > So you have a +1 from me for serial rather than parallel enrichment.
> > > >
> > > >
> > > > BR,
> > > >Christian
> > > >
> > > > On 16.05.17, 16:58, "Casey Stella"  wrote:
> > > >
> > > > Hi All,
> > > >
> > > > Last week, I encountered some weirdness in the Enrichment
> topology.
> > > > Doing
> > > > some somewhat high-latency enrichment work, I noticed that at
> some
> > > > point,
> > > > data stopped flowing through the enrichment topology.  I tracked
> > down
> > > > the
> > > > problem to the join bolt.  For those who aren't aware, we do a
> > > > split/join
> > > > pattern so that enrichments can be done in parallel.  It works as
> > > > follows:
> > > >
> > > >- A split bolt sends the appropriate subset of the message to
> > each
> > > >enrichment bolt as well as the whole message to the join bolt
> > > >- The join bolt will receive each of the pieces of the message
> > and
> > > > then,
> > > >when fully joined, it will send the message on.
> > > >
> > > >
> > > > What is happening under load or high velocity, however, is that the
> > > > cache is evicting the partially joined message before it can be fully
> > > > joined due to the volume of traffic.

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
I do want to say here, that I don't mean to sound the alarm and say that
everything is broken.  I would not characterize the topology as "broken"
architecturally; rather, the lack of reporting when things go
pear-shaped is a bug in the implementation.  With logging and documentation
about the knobs to tune, this architecture works, I believe.

On Tue, May 16, 2017 at 12:09 PM, Casey Stella  wrote:

> We could definitely parallelize within the bolt, but you're right, it does
> break the storm model.  I also like making things other people's problems
> (it's called working "smart" not "hard", right?  not laziness, surely. ;),
> but yeah, using windowing for this seems like it might introduce some
> artificial latency.  It's also not going to eliminate the problem, but
> rather just make the knob to tweak things have a different characteristic.
> Whereas before we had knobs around how many messages, now it's a knob
> around how long an enrichment is going to take maximally (which, I think,
> is more natural, honestly).
>
> On Tue, May 16, 2017 at 12:05 PM, Simon Elliston Ball <
> si...@simonellistonball.com> wrote:
>
>> Would you then parallelise within Stellar to handle things like multiple
>> lookups? This feels like it would be breaking the storm model somewhat, and
>> could lead to bad things with threads for example. Or would you think of
>> doing something like the grouping Stellar uses today to parallelise across
>> something like a pool of Stellar bolts and join?
>>
>> I like the idea of Otto’s solution (making it someone else's problem,
>> storm’s specifically :) ) but that also assumes we insert the artificial
>> latency of a time windowed join. If we’re going down that route, we might
>> as well just use spark and run everything on yarn. At that point though we
>> lose a lot of the benefits of low latency for time to detection, and
>> real-time enrichment in things like the streaming enrichment writer.
>>
>> Simon
>>
>> > On 16 May 2017, at 16:59, Nick Allen  wrote:
>> >
>> > I would like to see us just migrate wholly to Stellar enrichments and
>> > remove the separate HBase and Geo enrichment bolts from the Enrichment
>> > topology.  Stellar provides a user with much greater flexibility than
>> the
>> > existing HBase and Geo enrichment bolts.
>> >
>> > A side effect of this would be to greatly simplify the Enrichment
>> > topology.  I don't think we would need the split/join pattern if we
>> > did this. No?
>> >
>> > On Tue, May 16, 2017 at 11:54 AM, Casey Stella 
>> wrote:
>> >
>> >> The problem is that an enrichment type won't necessarily have a fixed
>> >> performance characteristic.  Take stellar enrichments, for instance.
>> Doing
>> >> a HBase call for one sensor vs doing simple string munging will have
>> vastly
>> >> differing performance.  Both of them are functioning within the stellar
>> >> enrichment bolt.  Also, some enrichments may call for multiple calls to
>> >> HBase.  Parallelizing those would make some sense, I think.
>> >>
>> >> I do take your point, though, that it's not as though it's strictly
>> serial,
>> >> it's just that the unit of parallelism is the message, rather than the
>> >> enrichment per message.
>> >>
>> >> On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz <
>> tramn...@trasec.de>
>> >> wrote:
>> >>
>> >>> I’m glad you bring this up. This is a huge architectural difference from
>> >>> the original OpenSOC topology, and one that we were warned about back
>> >>> then.
>> >>> To be perfectly honest, I don’t see the big performance improvement from
>> >>> parallel processing. If a specific enrichment is a little more i/o
>> >>> dependent than the others you can tweak parallelism to address this. Also
>> >>> there can be dependencies that make parallel enrichment virtually
>> >>> impossible, or at least less efficient (i.e. first labeling and
>> >>> “completing” a message, and then, depending on label and completeness,
>> >>> doing different other enrichments).
>> >>>
>> >>> So you have a +1 from me for serial rather than parallel enrichment.
>> >>>
>> >>>
>> >>> BR,
>> >>>   Christian
>> >>>
>> >>> On 16.05.17, 16:58, "Casey Stella"  wrote:
>> >>>
>> >>>Hi All,
>> >>>
>> >>>Last week, I encountered some weirdness in the Enrichment topology.
>> >>> Doing
>> >>>some somewhat high-latency enrichment work, I noticed that at some
>> >>> point,
>> >>>data stopped flowing through the enrichment topology.  I tracked
>> down
>> >>> the
>> >>>problem to the join bolt.  For those who aren't aware, we do a
>> >>> split/join
>> >>>pattern so that enrichments can be done in parallel.  It works as
>> >>> follows:
>> >>>
>> >>>   - A split bolt sends the appropriate subset of the message to
>> each
>> >>>   enrichment bolt as well as the whole message to the join bolt
>> >>>   - The join bolt will receive each of the pieces of the message
>> >>>   and then, when fully joined, it will send the message on.

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
We could definitely parallelize within the bolt, but you're right, it does
break the storm model.  I also like making things other people's problems
(it's called working "smart" not "hard", right?  not laziness, surely. ;),
but yeah, using windowing for this seems like it might introduce some
artificial latency.  It's also not going to eliminate the problem, but
rather just make the knob to tweak things have a different characteristic.
Whereas before we had knobs around how many messages, now it's a knob
around how long an enrichment is going to take maximally (which, I think,
is more natural, honestly).
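
For concreteness, the windowed alternative would look roughly like this in
Storm (a sketch only; the class, stream, and field names here are
hypothetical, not anything in the codebase today):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseWindowedBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.windowing.TupleWindow;

public class WindowedJoinSketch extends BaseWindowedBolt {
  private OutputCollector collector;

  @Override
  public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;
  }

  @Override
  public void execute(TupleWindow window) {
    // Group this window's tuples by message id; fully joined messages get
    // emitted, stragglers simply age out with the window. The window length
    // is the artificial latency discussed above.
    Map<String, List<Tuple>> byMessage = new HashMap<>();
    for (Tuple t : window.get()) {
      byMessage.computeIfAbsent(t.getStringByField("key"), k -> new ArrayList<>()).add(t);
    }
    // ... join the pieces per message id and emit via collector ...
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("key", "message"));
  }
}

// wiring: the window length is now the knob to tune
// builder.setBolt("join", new WindowedJoinSketch()
//     .withTumblingWindow(new BaseWindowedBolt.Duration(30, TimeUnit.SECONDS)), 4);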

On Tue, May 16, 2017 at 12:05 PM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:

> Would you then parallelise within Stellar to handle things like multiple
> lookups? This feels like it would be breaking the storm model somewhat, and
> could lead to bad things with threads for example. Or would you think of
> doing something like the grouping Stellar uses today to parallelise across
> something like a pool of Stellar bolts and join?
>
> I like the idea of Otto’s solution (making it someone else's problem,
> storm’s specifically :) ) but that also assumes we insert the artificial
> latency of a time windowed join. If we’re going down that route, we might
> as well just use spark and run everything on yarn. At that point though we
> lose a lot of the benefits of low latency for time to detection, and
> real-time enrichment in things like the streaming enrichment writer.
>
> Simon
>
> > On 16 May 2017, at 16:59, Nick Allen  wrote:
> >
> > I would like to see us just migrate wholly to Stellar enrichments and
> > remove the separate HBase and Geo enrichment bolts from the Enrichment
> > topology.  Stellar provides a user with much greater flexibility than the
> > existing HBase and Geo enrichment bolts.
> >
> > A side effect of this would be to greatly simplify the Enrichment
> > topology.  I don't think we would need the split/join pattern if we
> > did this. No?
> >
> > On Tue, May 16, 2017 at 11:54 AM, Casey Stella 
> wrote:
> >
> >> The problem is that an enrichment type won't necessarily have a fixed
> >> performance characteristic.  Take stellar enrichments, for instance.
> Doing
> >> a HBase call for one sensor vs doing simple string munging will have
> vastly
> >> differing performance.  Both of them are functioning within the stellar
> >> enrichment bolt.  Also, some enrichments may call for multiple calls to
> >> HBase.  Parallelizing those would make some sense, I think.
> >>
> >> I do take your point, though, that it's not as though it's strictly
> serial,
> >> it's just that the unit of parallelism is the message, rather than the
> >> enrichment per message.
> >>
> >> On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz <
> tramn...@trasec.de>
> >> wrote:
> >>
> >>> I’m glad you bring this up. This is a huge architectural difference from
> >>> the original OpenSOC topology, and one that we were warned about back
> >>> then.
> >>> To be perfectly honest, I don’t see the big performance improvement from
> >>> parallel processing. If a specific enrichment is a little more i/o
> >>> dependent than the others you can tweak parallelism to address this. Also
> >>> there can be dependencies that make parallel enrichment virtually
> >>> impossible, or at least less efficient (i.e. first labeling and
> >>> “completing” a message, and then, depending on label and completeness,
> >>> doing different other enrichments).
> >>>
> >>> So you have a +1 from me for serial rather than parallel enrichment.
> >>>
> >>>
> >>> BR,
> >>>   Christian
> >>>
> >>> On 16.05.17, 16:58, "Casey Stella"  wrote:
> >>>
> >>>Hi All,
> >>>
> >>>Last week, I encountered some weirdness in the Enrichment topology.
> >>> Doing
> >>>some somewhat high-latency enrichment work, I noticed that at some
> >>> point,
> >>>data stopped flowing through the enrichment topology.  I tracked
> down
> >>> the
> >>>problem to the join bolt.  For those who aren't aware, we do a
> >>> split/join
> >>>pattern so that enrichments can be done in parallel.  It works as
> >>> follows:
> >>>
> >>>   - A split bolt sends the appropriate subset of the message to
> each
> >>>   enrichment bolt as well as the whole message to the join bolt
> >>>   - The join bolt will receive each of the pieces of the message
> and
> >>> then,
> >>>   when fully joined, it will send the message on.
> >>>
> >>>
> >>>What is happening under load or high velocity, however, is that the
> >>> cache
> >>>is evicting the partially joined message before it can be fully
> >> joined
> >>> due
> >>>to the volume of traffic.  This is obviously not ideal.  As such, it
> >> is
> >>>clear that adjusting the size of the cache and the characteristics
> of
> >>>eviction is likely a good idea and a necessary part of tuning
> >>> enrichments.
> >>>The cache size is sensitive to the latency of the *slowest*
> >>>enrichment and the number of tuples in flight at once.

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Simon Elliston Ball
Would you then parallelise within Stellar to handle things like multiple 
lookups? This feels like it would be breaking the storm model somewhat, and 
could lead to bad things with threads for example. Or would you think of doing 
something like the grouping Stellar uses today to parallelise across something 
like a pool of Stellar bolts and join? 

I like the idea of Otto’s solution (making it someone else's problem, storm’s 
specifically :) ) but that also assumes we insert the artificial latency of a 
time windowed join. If we’re going down that route, we might as well just use 
spark and run everything on yarn. At that point though we lose a lot of the 
benefits of low latency for time to detection, and real-time enrichment in 
things like the streaming enrichment writer.

Simon

> On 16 May 2017, at 16:59, Nick Allen  wrote:
> 
> I would like to see us just migrate wholly to Stellar enrichments and
> remove the separate HBase and Geo enrichment bolts from the Enrichment
> topology.  Stellar provides a user with much greater flexibility than the
> existing HBase and Geo enrichment bolts.
> 
> A side effect of this would be to greatly simplify the Enrichment
> topology.  I don't think we would need the split/join pattern if we did
> this. No?
> 
> On Tue, May 16, 2017 at 11:54 AM, Casey Stella  wrote:
> 
>> The problem is that an enrichment type won't necessarily have a fixed
>> performance characteristic.  Take stellar enrichments, for instance.  Doing
>> a HBase call for one sensor vs doing simple string munging will have vastly
>> differing performance.  Both of them are functioning within the stellar
>> enrichment bolt.  Also, some enrichments may call for multiple calls to
>> HBase.  Parallelizing those would make some sense, I think.
>> 
>> I do take your point, though, that it's not as though it's strictly serial,
>> it's just that the unit of parallelism is the message, rather than the
>> enrichment per message.
>> 
>> On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz 
>> wrote:
>> 
>>> I’m glad you bring this up. This is a huge architectural difference from
>>> the original OpenSOC topology, and one that we were warned about back
>>> then.
>>> To be perfectly honest, I don’t see the big performance improvement from
>>> parallel processing. If a specific enrichment is a little more i/o
>>> dependent than the others you can tweak parallelism to address this. Also
>>> there can be dependencies that make parallel enrichment virtually
>>> impossible, or at least less efficient (i.e. first labeling and
>>> “completing” a message, and then, depending on label and completeness,
>>> doing different other enrichments).
>>> 
>>> So you have a +1 from me for serial rather than parallel enrichment.
>>> 
>>> 
>>> BR,
>>>   Christian
>>> 
>>> On 16.05.17, 16:58, "Casey Stella"  wrote:
>>> 
>>>Hi All,
>>> 
>>>Last week, I encountered some weirdness in the Enrichment topology.
>>> Doing
>>>some somewhat high-latency enrichment work, I noticed that at some
>>> point,
>>>data stopped flowing through the enrichment topology.  I tracked down
>>> the
>>>problem to the join bolt.  For those who aren't aware, we do a
>>> split/join
>>>pattern so that enrichments can be done in parallel.  It works as
>>> follows:
>>> 
>>>   - A split bolt sends the appropriate subset of the message to each
>>>   enrichment bolt as well as the whole message to the join bolt
>>>   - The join bolt will receive each of the pieces of the message and
>>> then,
>>>   when fully joined, it will send the message on.
>>> 
>>> 
>>>What is happening under load or high velocity, however, is that the
>>> cache
>>>is evicting the partially joined message before it can be fully
>> joined
>>> due
>>>to the volume of traffic.  This is obviously not ideal.  As such, it
>> is
>>>clear that adjusting the size of the cache and the characteristics of
>>>eviction is likely a good idea and a necessary part of tuning
>>> enrichments.
>>>The cache size is sensitive to:
>>> 
>>>   - The latency of the *slowest* enrichment
>>>   - The number of tuples in flight at once
>>> 
>>>As such, the knobs you have to tune are either the parallelism of the
>>> join
>>>bolt or the size of the cache.
>>> 
>>>As it stands, I see a couple of things wrong here that we can correct
>>> with
>>>minimal issue:
>>> 
>>>   - We have no message of warning indicating that this is happening
>>>   - Changing cache sizes means changing flux.  We should promote
>> this
>>> to
>>>   the properties file.
>>>   - We should document the knobs mentioned above clearly in the
>>> enrichment
>>>   topology README
>>> 
>>>Those small changes, I think, are table stakes, but what I wanted to
>>>discuss more in depth is the lingering questions:
>>> 
>>>   - Is this an architectural pattern that we can use as-is?

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Otto Fowler
I am not sure that you can say we wouldn’t ‘need’ it.  Rather, we would no
longer ‘have’ it.


On May 16, 2017 at 11:59:42, Nick Allen (n...@nickallen.org) wrote:

I would like to see us just migrate wholly to Stellar enrichments and
remove the separate HBase and Geo enrichment bolts from the Enrichment
topology. Stellar provides a user with much greater flexibility than the
existing HBase and Geo enrichment bolts.

A side effect of this would be to greatly simplify the Enrichment
topology. I don't think we would need the split/join pattern if we did
this. No?

On Tue, May 16, 2017 at 11:54 AM, Casey Stella  wrote:

> The problem is that an enrichment type won't necessarily have a fixed
> performance characteristic. Take stellar enrichments, for instance. Doing
> a HBase call for one sensor vs doing simple string munging will have
vastly
> differing performance. Both of them are functioning within the stellar
> enrichment bolt. Also, some enrichments may call for multiple calls to
> HBase. Parallelizing those would make some sense, I think.
>
> I do take your point, though, that it's not as though it's strictly
serial,
> it's just that the unit of parallelism is the message, rather than the
> enrichment per message.
>
> On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz 
> wrote:
>
> > I’m glad you bring this up. This is a huge architectural difference from
> > the original OpenSOC topology, and one that we were warned about back
> > then.
> > To be perfectly honest, I don’t see the big performance improvement from
> > parallel processing. If a specific enrichment is a little more i/o
> > dependent than the others you can tweak parallelism to address this. Also
> > there can be dependencies that make parallel enrichment virtually
> > impossible, or at least less efficient (i.e. first labeling and
> > “completing” a message, and then, depending on label and completeness,
> > doing different other enrichments).
> >
> > So you have a +1 from me for serial rather than parallel enrichment.
> >
> >
> > BR,
> > Christian
> >
> > On 16.05.17, 16:58, "Casey Stella"  wrote:
> >
> > Hi All,
> >
> > Last week, I encountered some weirdness in the Enrichment topology.
> > Doing
> > some somewhat high-latency enrichment work, I noticed that at some
> > point,
> > data stopped flowing through the enrichment topology. I tracked down
> > the
> > problem to the join bolt. For those who aren't aware, we do a
> > split/join
> > pattern so that enrichments can be done in parallel. It works as
> > follows:
> >
> > - A split bolt sends the appropriate subset of the message to each
> > enrichment bolt as well as the whole message to the join bolt
> > - The join bolt will receive each of the pieces of the message and
> > then,
> > when fully joined, it will send the message on.
> >
> >
> > What is happening under load or high velocity, however, is that the
> > cache
> > is evicting the partially joined message before it can be fully
> joined
> > due
> > to the volume of traffic. This is obviously not ideal. As such, it
> is
> > clear that adjusting the size of the cache and the characteristics of
> > eviction is likely a good idea and a necessary part of tuning
> > enrichments.
> > The cache size is sensitive to:
> >
> > - The latency of the *slowest* enrichment
> > - The number of tuples in flight at once
> >
> > As such, the knobs you have to tune are either the parallelism of the
> > join
> > bolt or the size of the cache.
> >
> > As it stands, I see a couple of things wrong here that we can correct
> > with
> > minimal issue:
> >
> > - We have no message of warning indicating that this is happening
> > - Changing cache sizes means changing flux. We should promote
> this
> > to
> > the properties file.
> > - We should document the knobs mentioned above clearly in the
> > enrichment
> > topology README
> >
> > Those small changes, I think, are table stakes, but what I wanted to
> > discuss more in depth is the lingering questions:
> >
> > - Is this an architectural pattern that we can use as-is?
> > - Should we consider a persistent cache a la HBase or Apache
> > Ignite
> > as a pluggable component to Metron?
> > - Should we consider taking the performance hit and doing the
> > enrichments serially?
> > - When an eviction happens, what should we do?
> > - Fail the tuple, thereby making congestion worse
> > - Pass through the partially enriched results, thereby making
> > enrichments "best effort"
> >
> > Anyway, I wanted to talk this through and inform of some of the
> things
> > I'm
> > seeing.
> >
> > Sorry for the novel. ;)
> >
> > Casey
> >
> >
> >
>


Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
We still do use split/join even within stellar enrichments.  Take for
instance the following enrichment:
{
  "enrichment" : {
    "fieldMap" : {
      "stellar" : {
        "config" : {
          "parallel-task-1" : {
            "my_field" : "PROFILE_GET()"
          },
          "parallel-task-2" : {
            "my_field2" : "PROFILE_GET()"
          }
        }
      }
    }
  }
}

Messages will get split between two tasks of the Stellar enrichment bolt
and the stellar statements in "parallel-task-1" will be executed in
parallel to those in "parallel-task-2".  This is to enable people to
separate computationally intensive or otherwise high latency tasks that are
independent across nodes in the cluster.
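
To make the mechanics concrete, the fan-out amounts to something like this in
the split bolt (a rough sketch under assumptions, not the actual Metron code;
the stream and field names are hypothetical):

import java.util.List;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitBoltSketch extends BaseRichBolt {
  private OutputCollector collector;
  private Map<String, List<String>> stellarGroups; // "parallel-task-1" -> its statements,
                                                   // loaded from the sensor config (omitted)

  @Override
  public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;
  }

  @Override
  public void execute(Tuple tuple) {
    String messageId = tuple.getStringByField("key");
    // One tuple per configured group: a fields grouping on "key" downstream
    // lets different groups run on different executors, in parallel.
    for (String group : stellarGroups.keySet()) {
      collector.emit("enrich", tuple, new Values(messageId, group));
    }
    // The whole message also goes straight to the join bolt.
    collector.emit("message", tuple, new Values(messageId, tuple.getValueByField("message")));
    collector.ack(tuple);
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declareStream("enrich", new Fields("key", "group"));
    declarer.declareStream("message", new Fields("key", "message"));
  }
}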

I will agree wholeheartedly, though, that my personal desire would be to
have just stellar enrichments.  You can do every one of the other
enrichments in Stellar and it would greatly simplify that config above.



On Tue, May 16, 2017 at 11:59 AM, Nick Allen  wrote:

> I would like to see us just migrate wholly to Stellar enrichments and
> remove the separate HBase and Geo enrichment bolts from the Enrichment
> topology.  Stellar provides a user with much greater flexibility than the
> existing HBase and Geo enrichment bolts.
>
> A side effect of this would be to greatly simplify the Enrichment
> topology.  I don't think we would need the split/join pattern if we did
> this. No?
>
> On Tue, May 16, 2017 at 11:54 AM, Casey Stella  wrote:
>
> > The problem is that an enrichment type won't necessarily have a fixed
> > performance characteristic.  Take stellar enrichments, for instance.
> Doing
> > a HBase call for one sensor vs doing simple string munging will have
> vastly
> > differing performance.  Both of them are functioning within the stellar
> > enrichment bolt.  Also, some enrichments may call for multiple calls to
> > HBase.  Parallelizing those would make some sense, I think.
> >
> > I do take your point, though, that it's not as though it's strictly
> serial,
> > it's just that the unit of parallelism is the message, rather than the
> > enrichment per message.
> >
> > On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz  >
> > wrote:
> >
> > > I’m glad you bring this up. This is a huge architectural difference from
> > > the original OpenSOC topology, and one that we were warned about back
> > > then.
> > > To be perfectly honest, I don’t see the big performance improvement from
> > > parallel processing. If a specific enrichment is a little more i/o
> > > dependent than the others you can tweak parallelism to address this. Also
> > > there can be dependencies that make parallel enrichment virtually
> > > impossible, or at least less efficient (i.e. first labeling and
> > > “completing” a message, and then, depending on label and completeness,
> > > doing different other enrichments).
> > >
> > > So you have a +1 from me for serial rather than parallel enrichment.
> > >
> > >
> > > BR,
> > >Christian
> > >
> > > On 16.05.17, 16:58, "Casey Stella"  wrote:
> > >
> > > Hi All,
> > >
> > > Last week, I encountered some weirdness in the Enrichment topology.
> > > Doing
> > > some somewhat high-latency enrichment work, I noticed that at some
> > > point,
> > > data stopped flowing through the enrichment topology.  I tracked
> down
> > > the
> > > problem to the join bolt.  For those who aren't aware, we do a
> > > split/join
> > > pattern so that enrichments can be done in parallel.  It works as
> > > follows:
> > >
> > >- A split bolt sends the appropriate subset of the message to
> each
> > >enrichment bolt as well as the whole message to the join bolt
> > >- The join bolt will receive each of the pieces of the message
> and
> > > then,
> > >when fully joined, it will send the message on.
> > >
> > >
> > > What is happening under load or high velocity, however, is that the
> > > cache
> > > is evicting the partially joined message before it can be fully
> > joined
> > > due
> > > to the volume of traffic.  This is obviously not ideal.  As such,
> it
> > is
> > > clear that adjusting the size of the cache and the characteristics
> of
> > > eviction is likely a good idea and a necessary part of tuning
> > > enrichments.
> > > The cache size is sensitive to:
> > >
> > >- The latency of the *slowest* enrichment
> > >- The number of tuples in flight at once
> > >
> > > As such, the knobs you have to tune are either the parallelism of
> the
> > > join
> > > bolt or the size of the cache.
> > >
> > > As it stands, I see a couple of things wrong here that we can
> correct
> > > with
> > > minimal issue:
> > >
> > >- We have no message of warning indicating that this is
> happening
> > >- Changing cache sizes means changing flux.  We should promote
> > >this to the properties file.

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Nick Allen
I would like to see us just migrate wholly to Stellar enrichments and
remove the separate HBase and Geo enrichment bolts from the Enrichment
topology.  Stellar provides a user with much greater flexibility than the
existing HBase and Geo enrichment bolts.

A side effect of this would be to greatly simplify the Enrichment
topology.  I don't think we would need the split/join pattern if we did
this. No?

On Tue, May 16, 2017 at 11:54 AM, Casey Stella  wrote:

> The problem is that an enrichment type won't necessarily have a fixed
> performance characteristic.  Take stellar enrichments, for instance.  Doing
> a HBase call for one sensor vs doing simple string munging will have vastly
> differing performance.  Both of them are functioning within the stellar
> enrichment bolt.  Also, some enrichments may call for multiple calls to
> HBase.  Parallelizing those would make some sense, I think.
>
> I do take your point, though, that it's not as though it's strictly serial,
> it's just that the unit of parallelism is the message, rather than the
> enrichment per message.
>
> On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz 
> wrote:
>
> > I’m glad you bring this up. This is a huge architectural difference from
> > the original OpenSOC topology, and one that we were warned about back
> > then.
> > To be perfectly honest, I don’t see the big performance improvement from
> > parallel processing. If a specific enrichment is a little more i/o
> > dependent than the others you can tweak parallelism to address this. Also
> > there can be dependencies that make parallel enrichment virtually
> > impossible, or at least less efficient (i.e. first labeling and
> > “completing” a message, and then, depending on label and completeness,
> > doing different other enrichments).
> >
> > So you have a +1 from me for serial rather than parallel enrichment.
> >
> >
> > BR,
> >Christian
> >
> > On 16.05.17, 16:58, "Casey Stella"  wrote:
> >
> > Hi All,
> >
> > Last week, I encountered some weirdness in the Enrichment topology.
> > Doing
> > some somewhat high-latency enrichment work, I noticed that at some
> > point,
> > data stopped flowing through the enrichment topology.  I tracked down
> > the
> > problem to the join bolt.  For those who aren't aware, we do a
> > split/join
> > pattern so that enrichments can be done in parallel.  It works as
> > follows:
> >
> >- A split bolt sends the appropriate subset of the message to each
> >enrichment bolt as well as the whole message to the join bolt
> >- The join bolt will receive each of the pieces of the message and
> > then,
> >when fully joined, it will send the message on.
> >
> >
> > What is happening under load or high velocity, however, is that the
> > cache
> > is evicting the partially joined message before it can be fully
> joined
> > due
> > to the volume of traffic.  This is obviously not ideal.  As such, it
> is
> > clear that adjusting the size of the cache and the characteristics of
> > eviction is likely a good idea and a necessary part of tuning
> > enrichments.
> > The cache size is sensitive to:
> >
> >- The latency of the *slowest* enrichment
> >- The number of tuples in flight at once
> >
> > As such, the knobs you have to tune are either the parallelism of the
> > join
> > bolt or the size of the cache.
> >
> > As it stands, I see a couple of things wrong here that we can correct
> > with
> > minimal issue:
> >
> >- We have no message of warning indicating that this is happening
> >- Changing cache sizes means changing flux.  We should promote
> this
> > to
> >the properties file.
> >- We should document the knobs mentioned above clearly in the
> > enrichment
> >topology README
> >
> > Those small changes, I think, are table stakes, but what I wanted to
> > discuss more in depth is the lingering questions:
> >
> >- Is this an architectural pattern that we can use as-is?
> >   - Should we consider a persistent cache a la HBase or Apache
> > Ignite
> >   as a pluggable component to Metron?
> >   - Should we consider taking the performance hit and doing the
> >   enrichments serially?
> >- When an eviction happens, what should we do?
> >   - Fail the tuple, thereby making congestion worse
> >   - Pass through the partially enriched results, thereby making
> >   enrichments "best effort"
> >
> > Anyway, I wanted to talk this through and inform of some of the
> things
> > I'm
> > seeing.
> >
> > Sorry for the novel. ;)
> >
> > Casey
> >
> >
> >
>


Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
The problem is that an enrichment type won't necessarily have a fixed
performance characteristic.  Take stellar enrichments, for instance.  Doing
a HBase call for one sensor vs doing simple string munging will have vastly
differing performance.  Both of them are functioning within the stellar
enrichment bolt.  Also, some enrichments may call for multiple calls to
HBase.  Parallelizing those would make some sense, I think.

I do take your point, though, that it's not as though it's strictly serial,
it's just that the unit of parallelism is the message, rather than the
enrichment per message.

On Tue, May 16, 2017 at 11:47 AM, Christian Tramnitz 
wrote:

> I’m glad you bring this up. This is a huge architectural difference from
> the original OpenSOC topology, and one that we were warned about back
> then.
> To be perfectly honest, I don’t see the big performance improvement from
> parallel processing. If a specific enrichment is a little more i/o
> dependent than the others you can tweak parallelism to address this. Also
> there can be dependencies that make parallel enrichment virtually
> impossible, or at least less efficient (i.e. first labeling and
> “completing” a message, and then, depending on label and completeness,
> doing different other enrichments).
>
> So you have a +1 from me for serial rather than parallel enrichment.
>
>
> BR,
>Christian
>
> On 16.05.17, 16:58, "Casey Stella"  wrote:
>
> Hi All,
>
> Last week, I encountered some weirdness in the Enrichment topology.
> Doing
> some somewhat high-latency enrichment work, I noticed that at some
> point,
> data stopped flowing through the enrichment topology.  I tracked down
> the
> problem to the join bolt.  For those who aren't aware, we do a
> split/join
> pattern so that enrichments can be done in parallel.  It works as
> follows:
>
>- A split bolt sends the appropriate subset of the message to each
>enrichment bolt as well as the whole message to the join bolt
>- The join bolt will receive each of the pieces of the message and
> then,
>when fully joined, it will send the message on.
>
>
> What is happening under load or high velocity, however, is that the
> cache
> is evicting the partially joined message before it can be fully joined
> due
> to the volume of traffic.  This is obviously not ideal.  As such, it is
> clear that adjusting the size of the cache and the characteristics of
> eviction is likely a good idea and a necessary part of tuning
> enrichments.
> The cache size is sensitive to:
>
>- The latency of the *slowest* enrichment
>- The number of tuples in flight at once
>
> As such, the knobs you have to tune are either the parallelism of the
> join
> bolt or the size of the cache.
>
> As it stands, I see a couple of things wrong here that we can correct
> with
> minimal issue:
>
>- We have no message of warning indicating that this is happening
>- Changing cache sizes means changing flux.  We should promote this
> to
>the properties file.
>- We should document the knobs mentioned above clearly in the
> enrichment
>topology README
>
> Those small changes, I think, are table stakes, but what I wanted to
> discuss more in depth is the lingering questions:
>
>- Is this an architectural pattern that we can use as-is?
>   - Should we consider a persistent cache a la HBase or Apache
> Ignite
>   as a pluggable component to Metron?
>   - Should we consider taking the performance hit and doing the
>   enrichments serially?
>- When an eviction happens, what should we do?
>   - Fail the tuple, thereby making congestion worse
>   - Pass through the partially enriched results, thereby making
>   enrichments "best effort"
>
> Anyway, I wanted to talk this through and inform of some of the things
> I'm
> seeing.
>
> Sorry for the novel. ;)
>
> Casey
>
>
>


Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Christian Tramnitz
I’m glad you bring this up. This is a huge architectural difference from the
original OpenSOC topology, and one that we were warned about back then.
To be perfectly honest, I don’t see the big performance improvement from
parallel processing. If a specific enrichment is a little more i/o dependent
than the others you can tweak parallelism to address this. Also there can be
dependencies that make parallel enrichment virtually impossible, or at least
less efficient (i.e. first labeling and “completing” a message, and then,
depending on label and completeness, doing different other enrichments).

So you have a +1 from me for serial rather than parallel enrichment.


BR,
   Christian

On 16.05.17, 16:58, "Casey Stella"  wrote:

Hi All,

Last week, I encountered some weirdness in the Enrichment topology.  Doing
some somewhat high-latency enrichment work, I noticed that at some point,
data stopped flowing through the enrichment topology.  I tracked down the
problem to the join bolt.  For those who aren't aware, we do a split/join
pattern so that enrichments can be done in parallel.  It works as follows:

   - A split bolt sends the appropriate subset of the message to each
   enrichment bolt as well as the whole message to the join bolt
   - The join bolt will receive each of the pieces of the message and then,
   when fully joined, it will send the message on.


What is happening under load or high velocity, however, is that the cache
is evicting the partially joined message before it can be fully joined due
to the volume of traffic.  This is obviously not ideal.  As such, it is
clear that adjusting the size of the cache and the characteristics of
eviction is likely a good idea and a necessary part of tuning enrichments.
The cache size is sensitive to:

   - The latency of the *slowest* enrichment
   - The number of tuples in flight at once

As such, the knobs you have to tune are either the parallelism of the join
bolt or the size of the cache.

As it stands, I see a couple of things wrong here that we can correct with
minimal issue:

   - We have no message of warning indicating that this is happening
   - Changing cache sizes means changing flux.  We should promote this to
   the properties file.
   - We should document the knobs mentioned above clearly in the enrichment
   topology README

Those small changes, I think, are table stakes, but what I wanted to
discuss more in depth is the lingering questions:

   - Is this an architectural pattern that we can use as-is?
  - Should we consider a persistent cache a la HBase or Apache Ignite
  as a pluggable component to Metron?
  - Should we consider taking the performance hit and doing the
  enrichments serially?
   - When an eviction happens, what should we do?
  - Fail the tuple, thereby making congestion worse
  - Pass through the partially enriched results, thereby making
  enrichments "best effort"

Anyway, I wanted to talk this through and inform of some of the things I'm
seeing.

Sorry for the novel. ;)

Casey




[DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
Hi All,

Last week, I encountered some weirdness in the Enrichment topology.  Doing
some somewhat high-latency enrichment work, I noticed that at some point,
data stopped flowing through the enrichment topology.  I tracked down the
problem to the join bolt.  For those who aren't aware, we do a split/join
pattern so that enrichments can be done in parallel.  It works as follows
(a rough wiring sketch appears after the list):

   - A split bolt sends the appropriate subset of the message to each
   enrichment bolt as well as the whole message to the join bolt
   - The join bolt will receive each of the pieces of the message and then,
   when fully joined, it will send the message on.
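
In Storm terms, the pattern wires up roughly like this (a sketch of the shape
only, not our actual flux definition; component names are hypothetical):

import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class SplitJoinWiringSketch {
  // The fields grouping on "key" sends every piece of a given message to the
  // same join bolt executor, which is why that executor has to keep a cache
  // of partially joined messages.
  public static void wire(TopologyBuilder builder, IRichBolt split,
                          IRichBolt geoBolt, IRichBolt hbaseBolt, IRichBolt join) {
    builder.setBolt("split", split, 4).shuffleGrouping("parser");
    builder.setBolt("geo", geoBolt, 4).fieldsGrouping("split", "geo", new Fields("key"));
    builder.setBolt("hbase", hbaseBolt, 4).fieldsGrouping("split", "hbase", new Fields("key"));
    builder.setBolt("join", join, 4)
        .fieldsGrouping("split", "message", new Fields("key")) // the whole message
        .fieldsGrouping("geo", new Fields("key"))               // enriched pieces
        .fieldsGrouping("hbase", new Fields("key"));
  }
}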


What is happening under load or high velocity, however, is that the cache
is evicting the partially joined message before it can be fully joined due
to the volume of traffic.  This is obviously not ideal.  As such, it is
clear that adjusting the size of the cache and the characteristics of
eviction is likely a good idea and a necessary part of tuning enrichments.
The cache size is sensitive to:

   - The latency of the *slowest* enrichment
   - The number of tuples in flight at once

As such, the knobs you have to tune are either the parallelism of the join
bolt or the size of the cache.

As it stands, I see a couple of things wrong here that we can correct with
minimal issue (a sketch of the first two fixes follows the list):

   - We have no message of warning indicating that this is happening
   - Changing cache sizes means changing flux.  We should promote this to
   the properties file.
   - We should document the knobs mentioned above clearly in the enrichment
   topology README
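
As a rough illustration of the first two bullets, assuming a Guava-style
cache in the join bolt (the property names here are hypothetical):

import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalNotification;

public class JoinCacheSketch {
  private static final Logger LOG = LoggerFactory.getLogger(JoinCacheSketch.class);

  // Build the join cache from topology properties instead of hard-coded flux
  // values, and shout in the logs whenever a partial join is evicted.
  public static Cache<String, Map<String, Object>> build(Map<String, Object> props) {
    long maxSize = Long.parseLong(props.getOrDefault("join.cache.size", "100000").toString());
    long ttlSeconds = Long.parseLong(props.getOrDefault("join.cache.ttl.seconds", "10").toString());
    return CacheBuilder.newBuilder()
        .maximumSize(maxSize)                           // knob: tuples in flight at once
        .expireAfterWrite(ttlSeconds, TimeUnit.SECONDS) // knob: latency of the slowest enrichment
        .removalListener((RemovalNotification<String, Map<String, Object>> n) -> {
          if (n.wasEvicted()) {
            LOG.warn("Partially joined message {} evicted ({}); consider raising "
                + "join.cache.size or the join bolt parallelism", n.getKey(), n.getCause());
          }
        })
        .build();
  }
}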

Those small changes, I think, are table stakes, but what I wanted to
discuss more in depth is the lingering questions:

   - Is this an architectural pattern that we can use as-is?
  - Should we consider a persistent cache a la HBase or Apache Ignite
  as a pluggable component to Metron?
  - Should we consider taking the performance hit and doing the
  enrichments serially?
   - When an eviction happens, what should we do?
  - Fail the tuple, thereby making congestion worse
  - Pass through the partially enriched results, thereby making
  enrichments "best effort"

Anyway, I wanted to talk this through and inform of some of the things I'm
seeing.

Sorry for the novel. ;)

Casey