Re: [DISCUSS] Profiler Enhancement

2018-02-07 Thread zeo...@gmail.com
Scenario 2 is the one I'm specifically interested in; I have that exact
use case right now.  I can see Scenario 1 being useful in the future as
well.

I'm also interested in a conversation along the lines of what Otto brought
up (i.e. I would like to re-ingest data to redo parsing, enrichments, etc.)
but happy to keep that conversation separate or for the future.

Really just wanted to comment that this work effort has a huge +1 from me
and is something I've been following.

This work should interact nicely with METRON-1397, because users may need to
ingest bulk data from time to time that is a result of a system export.

Jon



Re: [DISCUSS] Profiler Enhancement

2018-02-05 Thread Otto Fowler
I think that is fine,  we can use that and work out the UX to manage new or
replace.  Maybe we can do Profile Compare down the line?



Re: [DISCUSS] Profiler Enhancement

2018-02-05 Thread Nick Allen
> If we replay a set of data with a new version of a profile I think it
> will always have to be a new profile and not ‘replace’ the old one.
> Series1, Series2, etc.?

As part of this effort (unless there is a compelling reason) I wouldn't
change that behavior.  The profile data is stored based on profile name +
entity + timestamp (I'm glossing over some of the details, but that's
effectively what happens).  If you change the definition of a profile, but
the name does not change, then you would replace the existing profile
data.  If you do not want to replace, then you should change the name of
the profile.
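
To illustrate the replace-versus-rename behavior described above, here is a
minimal sketch.  It assumes a plain in-memory dictionary keyed by profile
name + entity + timestamp; the class, the profile names, and the values are
hypothetical and are not the Profiler's actual storage code, which has more
detail than this.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key: (profile name, entity, timestamp).
ProfileKey = Tuple[str, str, int]


class ProfileStore:
    """Illustrative stand-in for a store keyed by name + entity + timestamp."""

    def __init__(self) -> None:
        self._rows: Dict[ProfileKey, float] = {}

    def write(self, profile: str, entity: str, timestamp: int, value: float) -> None:
        # Writing to an existing key replaces whatever was stored there.
        self._rows[(profile, entity, timestamp)] = value

    def read(self, profile: str, entity: str, timestamp: int) -> Optional[float]:
        return self._rows.get((profile, entity, timestamp))

    def size(self) -> int:
        return len(self._rows)


PERIOD = 1517580000  # hypothetical timestamp for one profile period

store = ProfileStore()
store.write("logins", "10.0.0.1", PERIOD, 12.0)

# Replay with a changed definition but the same name: the old value is replaced.
store.write("logins", "10.0.0.1", PERIOD, 15.0)

# Replay under a new name: the new series is stored alongside the old one.
store.write("logins-v2", "10.0.0.1", PERIOD, 17.0)
```

After these writes the store holds two rows: "logins" carries only the value
from the second write, and "logins-v2" is kept as a separate series.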

Now, is this the best way to store the data?  I am not sure.  It is a
complex discussion all by itself, but is something that I would rather
handle as a separate effort.





Re: [DISCUSS] Profiler Enhancement

2018-02-02 Thread Otto Fowler
You know, I am going to back this up.
I usually think of replay as replay, profiler or not, but that is not true.
Replay of data through the full pipeline (parsers/enrichment) has more
consequences or concerns, so we can drop this.
I don’t want to expand the scope of your idea.  We can reuse/refactor to
the other case (parser + enrichment) later.
Sorry.


——

So, about re-writing.
If we replay a set of data with a new version of a profile I think it will
always have to be a new profile and not ‘replace’
the old one.  Series1, Series2, etc.?






Re: [DISCUSS] Profiler Enhancement

2018-02-02 Thread Nick Allen
I think that is definitely a reasonable extension.

In this case would we need any additional actions to indicate that data
will be overwritten?

I am trying to think of other additional needs that this use case has over
the others.



Re: [DISCUSS] Profiler Enhancement

2018-02-02 Thread Otto Fowler
Scenario 3:
As a Security ?  I have modified a profile or parser configuration (replay
is replay), and I want to run the new version against my old data.




[DISCUSS] Profiler Enhancement

2018-02-02 Thread Nick Allen
I have been thinking about an enhancement to the Profiler for quite some
time.  Actually, my first pass at defining this was called "Replay
Telemetry through Profiler" back in METRON-594 [1].

I'd like to first discuss the use case to make sure we start out on the
right foot.  Here is how I would define the use cases for this
functionality.

*> Scenario 1:  Model Development*

As a Security Data Scientist, I want to understand the historical behaviors
and trends of a profile that I have created so that I can understand if it
is valuable for model building.

There are two possible negative outcomes that the Security Data Scientist
must be aware of when creating profiles.


   - The profile might have been defined incorrectly, resulting in a feature
     set that does not match reality (a bug in the profile definition).

   - The profile might have been defined correctly, but the feature set
     itself has no predictive value.

Analyzing the profile over archived, historical telemetry allows the
Security Data Scientist to better mitigate both of these negative
outcomes.


*> Scenario 2:  Model Deployment*

As a Security Platform Engineer, I want to generate a profile using
archived telemetry when I deploy a new model to production so that models
depending on that profile can begin to function on day 1.



(Q) Do these make sense?  Am I missing anything?  Too broad or too narrow?

Once we nail down the use case(s), I'll delete the old JIRA and create a
new JIRA with the use cases.  That would give us a place to start on the
technical details of the implementation.

[1] https://issues.apache.org/jira/browse/METRON-594