Re: Metron nested object

2018-01-11 Thread Simon Elliston Ball
I’m all for adding extra stores, especially once we have separated indexing 
topologies.

Druid (and therefore a UI based on Superset) seems an obvious logical store to 
me. That said, schema management starts to feel like it needs some thought 
once we have a wide enough range of schema-sensitive stores (though I guess 
Druid is no different from ES in that regard).

Simon 



Re: Metron nested object

2018-01-11 Thread Andre
Simon,

At the risk of sounding like a heretic:

Is there any particular reason Metron still considers ES as the
"default"[1] fast access data store?

Sometimes I wonder if we wouldn't be better off leveraging schema-evolution-
friendly formats with UIs like Superset?

Probably not as fast as ES but at least it would be one less development
front to handle.

Keen to hear your thoughts


Cheers



[1] I appreciate the architecture is flexible...
[-] Apologies for the delay but I suspect my previous message got stuck in
moderation



Re: Metron nested object

2017-12-21 Thread Simon Elliston Ball
Correct. Nested objects in Lucene indexes lead to sub-documents, which lead to 
a massive drop in ingest and query rates. This is why the JSONMap parser, for 
example, deliberately flattens the Metron JSON object. Before this decision was 
made, very early versions of OpenSOC nested enrichments, for example, but 
performance became a challenge. 
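
To illustrate the kind of flattening described above, here is a minimal
Python sketch. It is not Metron's JSONMap parser code. The flatten helper,
the dot separator, and the choice to index list elements by position are all
assumptions made for illustration, and the sample record is a trimmed copy
of the CloudTrail example from this thread.

```python
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Collapse nested dicts/lists into a single-level dict.

    Hypothetical helper for illustration only: keys are joined with `sep`,
    and list elements are addressed by their position in the list.
    Empty nested dicts/lists produce no keys at all.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        # Scalar leaf: emit it under the path built so far.
        return {prefix: obj}

    flat: Dict[str, Any] = {}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(value, path, sep))
    return flat


# A trimmed version of one CloudTrail record from the original question.
record = {
    "eventName": "StartInstances",
    "userIdentity": {"type": "IAMUser", "userName": "Alice"},
    "requestParameters": {
        "instancesSet": {"items": [{"instanceId": "i-ebeaf9e2"}]}
    },
}

flat_record = flatten(record)
for field, value in sorted(flat_record.items()):
    print(field, "=", value)
```

Every value in flat_record is a scalar, so an index that receives it sees one
plain document per event rather than a parent with sub-documents. How arrays
should be handled (indexing by position as here, or something else) is a
design choice this sketch does not settle.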

Simon





Re: Metron nested object

2017-12-21 Thread Ali Nazemian
So Metron enrichment and indexing are not nested-aware? Is there any plan to
add that to Metron in the future?

Cheers,
Ali

On Fri, Dec 22, 2017 at 12:46 AM, Otto Fowler 
wrote:

> I believe right now you have to flatten.
> The jsonMap parser does this.
>
>
> On December 21, 2017 at 08:28:13, Ali Nazemian (alinazem...@gmail.com)
> wrote:
>
> Hi all,
>
>
> We have recently come across some data sources that generate data in a
> nested format. For example, AWS CloudTrail generates data in the following
> JSON format:
>
> {
>   "Records": [
>     {
>       "eventVersion": "2.0",
>       "userIdentity": {
>         "type": "IAMUser",
>         "principalId": "EX_PRINCIPAL_ID",
>         "arn": "arn:aws:iam::123456789012:user/Alice",
>         "accessKeyId": "EXAMPLE_KEY_ID",
>         "accountId": "123456789012",
>         "userName": "Alice"
>       },
>       "eventTime": "2014-03-07T21:22:54Z",
>       "eventSource": "ec2.amazonaws.com",
>       "eventName": "StartInstances",
>       "awsRegion": "us-east-2",
>       "sourceIPAddress": "205.251.233.176",
>       "userAgent": "ec2-api-tools 1.6.12.2",
>       "requestParameters": {
>         "instancesSet": {
>           "items": [
>             {
>               "instanceId": "i-ebeaf9e2"
>             }
>           ]
>         }
>       },
>       "responseElements": {
>         "instancesSet": {
>           "items": [
>             {
>               "instanceId": "i-ebeaf9e2",
>               "currentState": {
>                 "code": 0,
>                 "name": "pending"
>               },
>               "previousState": {
>                 "code": 80,
>                 "name": "stopped"
>               }
>             }
>           ]
>         }
>       }
>     }
>   ]
> }
>
>
> We are able to flatten this into a flat JSON document. However, nested
> objects are supported by the data backends in Metron (ES, ORC, etc.), so I
> was wondering whether the current version of Metron can index nested
> documents, or whether we have to flatten them?
>
>
>
> Cheers,
>
> Ali
>
>


-- 
A.Nazemian