That is correct: nested JSON structures introduced a high level of complexity for Solr. We did this work a while ago, but if I remember correctly, we had to use child documents for the nested pieces, so a single ingest of a complex JSON message could end up indexing multiple documents. Template definition and management became more difficult as well, and there were also some gotchas on the query side. For those reasons, it became much simpler to just flatten everything.


04.04.2017, 05:46, "Nick Allen" <n...@nickallen.org>:
I have no knowledge of specifics.  I was not involved with the original problem.  I just know that I had to flatten the output from threat triage based on these concerns as part of PR #438 [1]. 



On Mon, Apr 3, 2017 at 8:11 PM, Ali Nazemian <alinazem...@gmail.com> wrote:
Thanks, Nick. 

Can you give me more information on what the problem with Solr indexing was in the first place? I've got some experience with Solr, so I might be able to help fix that situation.

Regards,
Ali

On Mon, Apr 3, 2017 at 11:55 PM, Nick Allen <n...@nickallen.org> wrote:
Up to this point, we have been making the assumption that we need to "flatten" complex data types like lists and maps before they get indexed.  For example, a list like this...

   users: [ mary, alice, bob ] 

is flattened and ends up looking like this... 

   users.0: mary,
   users.1: alice,
   users.2: bob
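
For illustration only, the flattening described above can be sketched in a few lines of Python. This is not Metron's actual implementation; the function name `flatten` and the dotted-key convention for nested maps are assumptions that simply extend the `users.0` pattern shown in the example.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys.

    Hypothetical helper, not part of Metron. Lists are keyed by index,
    maps by field name. Empty lists and maps produce no keys at all.
    """
    flat = {}
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # Scalar value: store it under the key accumulated so far.
        flat[prefix] = value
        return flat
    for key, child in items:
        child_key = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, child_key))
    return flat


message = {"users": ["mary", "alice", "bob"]}
flat_message = flatten(message)
print(flat_message)
# {'users.0': 'mary', 'users.1': 'alice', 'users.2': 'bob'}
```

The same function handles maps nested inside lists (and vice versa), which is the case that is awkward to index directly.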

The goal of the JIRA that I referenced is to make each indexer responsible for transforming the message in whatever way necessary to correctly index the data.  This way enrichments and transformations that occur upstream don't have to worry about this.

I *think* the specific issue is that Solr indexing may not work with complex data types like lists and maps in some scenarios.  I *think* Elasticsearch indexing may be fine.  Others may have more insight, but this is what I remember. It is probably worth the effort to validate this in your environment and see if any problems arise.  It should be fairly simple to validate.





On Sun, Apr 2, 2017 at 10:50 PM, Ali Nazemian <alinazem...@gmail.com> wrote:
Thank you very much, Nick. I was not aware that Metron does not support multi-value attributes. So, in this case, I need a Stellar function to split the data and map each value to the enrichment CF. Is that correct?

Regards,
Ali

On Mon, Apr 3, 2017 at 6:31 AM, Nick Allen <n...@nickallen.org> wrote:
You could use the programmatic enrichment functions to do this.  For instance, say you wanted to look up the impacted users in a company 'phonebook' to get more information.

"impacted-user-0": ENRICHMENT_GET("phonebook", GET(user_ids, 0), "tb", "cf")
"impacted-user-1": ENRICHMENT_GET("phonebook", GET(user_ids, 1), "tb", "cf")
"impacted-user-2": ENRICHMENT_GET("phonebook", GET(user_ids, 2), "tb", "cf")
 

Also note that there is an open JIRA to ensure that all of the index destinations can handle complex types in the message JSON.  This may or may not impact your use case, but something to keep in mind.
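
Going back to the per-index lookups earlier in this message: if the list has a known maximum length, the entries do not need to be typed by hand. The Python sketch below only generates the text of the expressions; it does not talk to Metron or HBase. The helper name `build_enrichment_fields` and its parameters are hypothetical, and the "phonebook", "tb", and "cf" defaults are just the placeholder values from the example.

```python
def build_enrichment_fields(list_field, count,
                            enrichment_type="phonebook",
                            table="tb",
                            column_family="cf"):
    """Return {output field name: Stellar expression text}.

    Hypothetical helper: one ENRICHMENT_GET entry is produced for each
    list index from 0 to count - 1, mirroring the hand-written example.
    """
    fields = {}
    for i in range(count):
        fields[f"impacted-user-{i}"] = (
            f'ENRICHMENT_GET("{enrichment_type}", '
            f'GET({list_field}, {i}), "{table}", "{column_family}")'
        )
    return fields


fields = build_enrichment_fields("user_ids", 3)
for name, expression in fields.items():
    print(f'"{name}": {expression}')
```

The obvious limitation is that `count` is fixed up front, so a message with more users than that would have the extra ones ignored.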





On Sun, Apr 2, 2017 at 10:26 AM, Ali Nazemian <alinazem...@gmail.com> wrote:

Hi all,


I was wondering how I can achieve the following use case in the current version of Metron?

 

I want to have attributes in the Metron JSON object that are arrays.  For example, if a threat is impacting multiple users, they are all contained in one attribute (e.g.  user_id:[id1, id2, id3]).   Now, if I want to enrich the event with data that is stored in HBase and keyed by user_id, how would I do this?


Cheers,
Ali




--
A.Nazemian




------------------- 
Thank you,
 
James Sirota
PPMC- Apache Metron (Incubating)
jsirota AT apache DOT org
