Wait, are we sure that's the case? Generally speaking, a message coming
into a parser that uses the envelope strategy has a _source field that is
a string, which this one isn't (it's a nested JSON object).
For instance, the expected format is:
{
  "_index": "indexing",
  "_type": "Event",
  "_id": "AWAkTAefYn0uCUpkHmCy",
  "_score": 1,
  "_source": "{\"dst\": \"127.0.0.1\",\"devTimeEpoch\": \"1512437340000\",\"dstPort\": \"0\",\"srcPort\": \"80\",\"src\": \"194.51.198.185\"}"
}
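To make the difference concrete, here is a rough sketch of re-serializing the nested field into a string before the message ever reaches the topic. This is only an illustration, not anything that exists in Metron: the function name stringify_envelope_field and its message_field parameter are made up, and it assumes one JSON document per line, as in the original file.

```python
import json


def stringify_envelope_field(raw_line: str, message_field: str = "_source") -> str:
    """Return the envelope with message_field re-serialized as a JSON string.

    Hypothetical helper for illustration only; it is not part of Metron.
    """
    envelope = json.loads(raw_line)
    inner = envelope.get(message_field)
    # Only nested objects/arrays need converting; a string is already
    # in the form the envelope strategy expects.
    if isinstance(inner, (dict, list)):
        envelope[message_field] = json.dumps(inner, separators=(",", ":"))
    return json.dumps(envelope)


# The sample from the original report, on a single line.
nested = (
    '{"_index": "indexing", "_type": "Event", "_id": "AWAkTAefYn0uCUpkHmCy", '
    '"_score": 1, "_source": {"dst": "127.0.0.1", "devTimeEpoch": "1512437340000", '
    '"dstPort": "0", "srcPort": "80", "src": "194.51.198.185"}}'
)

fixed = stringify_envelope_field(nested)
print(fixed)
```

Running the helper on a line that is already in the expected format leaves it unchanged, so it should be safe to apply to a mixed feed.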
I think Simon is right: if we want to support this, we might consider
a different envelope strategy that takes parsed objects. I think you can
just set the "_source" field as "original_string" in the *routing* parser
via a Stellar field transformation.
On Thu, Apr 25, 2019 at 12:05 PM Nick Allen <[email protected]> wrote:
> > Otto: I’m not sure this is an enveloped issue, or a new feature for the
> json map parser
>
> This is not an issue with JSONMapParser. This is an issue with the
> "enveloping" mechanism, prior to when the JSONMapParser gets the message.
>
> The entire message has been parsed as a JSON object including the value of
> the "_source" field. Since the "_source" field itself contains valid JSON,
> the parser transformed it into a Map, rather than the String that it
> expects.
>
> In my opinion, the ENVELOPE strategy needs to not parse the contents of
> that "_source" field. The ENVELOPE strategy should work for JSON and
> non-JSON content alike.
>
>
> On Thu, Apr 25, 2019 at 11:31 AM Otto Fowler <[email protected]>
> wrote:
>
>> I’m not sure about the name, I’m more thinking about the case.
>> I’m not sure this is an enveloped issue, or a new feature for the json
>> map parser ( or if you could do it with the jsonMap parser and JSONPath )
>>
>>
>>
>> On April 25, 2019 at 11:23:25, Simon Elliston Ball (
>> [email protected]) wrote:
>>
>> Seems like this would be a good additional strategy, something like
>> ENVELOPE_PARSED? Any thoughts on a good name?
>>
>> On Thu, 25 Apr 2019 at 16:20, Otto Fowler <[email protected]>
>> wrote:
>>
>>> So, the enveloped message doesn’t support getting an already parsed
>>> JSON object from the enveloped JSON. We would have to do some work to
>>> support this. Even if we _could_ wrangle it in there now, from what I can
>>> see we would still have to serialize to bytes to pass to the actual parser,
>>> and that would be inefficient.
>>> Can you open a Jira with the information you provided?
>>>
>>>
>>>
>>> On April 25, 2019 at 11:12:38, Otto Fowler ([email protected])
>>> wrote:
>>>
>>> Raw message in this case assumes that the raw message is a String
>>> embedded in the json field that you supply, not a nested json object, so it
>>> is looking for
>>>
>>>
>>> “_source” : “some other embedded string of some format like syslog in
>>> json”
>>>
>>> There are other message strategies, but I’m not sure they would work in
>>> this instance. I’ll keep looking. Hopefully someone more familiar will
>>> jump in.
>>>
>>>
>>> On April 25, 2019 at 10:48:06, [email protected] (
>>> [email protected]) wrote:
>>>
>>> Hello,
>>>
>>>
>>>
>>> I’m trying to load some JSON data which has the following structure
>>> (this is a sample):
>>>
>>>
>>>
>>> {
>>>   "_index": "indexing",
>>>   "_type": "Event",
>>>   "_id": "AWAkTAefYn0uCUpkHmCy",
>>>   "_score": 1,
>>>   "_source": {
>>>     "dst": "127.0.0.1",
>>>     "devTimeEpoch": "1512437340000",
>>>     "dstPort": "0",
>>>     "srcPort": "80",
>>>     "src": "194.51.198.185"
>>>   }
>>> }
>>>
>>>
>>>
>>> In my file, everything is on the same line. My parser config is the
>>> following:
>>>
>>>
>>>
>>> {
>>>   "parserClassName": "org.apache.metron.parsers.json.JSONMapParser",
>>>   "filterClassName": null,
>>>   "sensorTopic": "my_topic",
>>>   "outputTopic": null,
>>>   "errorTopic": null,
>>>   "writerClassName": null,
>>>   "errorWriterClassName": null,
>>>   "readMetadata": true,
>>>   "mergeMetadata": true,
>>>   "numWorkers": 2,
>>>   "numAckers": null,
>>>   "spoutParallelism": 1,
>>>   "spoutNumTasks": 1,
>>>   "parserParallelism": 2,
>>>   "parserNumTasks": 2,
>>>   "errorWriterParallelism": 1,
>>>   "errorWriterNumTasks": 1,
>>>   "spoutConfig": {},
>>>   "securityProtocol": null,
>>>   "stormConfig": {},
>>>   "parserConfig": {},
>>>   "fieldTransformations": [
>>>     {
>>>       "transformation": "RENAME",
>>>       "config": {
>>>         "dst": "ip_dst_addr",
>>>         "src": "ip_src_addr",
>>>         "srcPort": "ip_src_port",
>>>         "dstPort": "ip_dst_port",
>>>         "devTimeEpoch": "timestamp"
>>>       }
>>>     }
>>>   ],
>>>   "cacheConfig": {},
>>>   "rawMessageStrategy": "ENVELOPE",
>>>   "rawMessageStrategyConfig": {
>>>     "messageField": "_source"
>>>   }
>>> }
>>>
>>>
>>>
>>> But in Storm I get the following errors:
>>>
>>>
>>>
>>> 2019-04-25 16:45:22.225 o.a.s.d.executor Thread-5-parserBolt-executor[8 8] [ERROR]
>>> java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to java.lang.String
>>>     at org.apache.metron.common.message.metadata.EnvelopedRawMessageStrategy.get(EnvelopedRawMessageStrategy.java:78) ~[stormjar.jar:?]
>>>     at org.apache.metron.common.message.metadata.RawMessageStrategies.get(RawMessageStrategies.java:54) ~[stormjar.jar:?]
>>>     at org.apache.metron.common.message.metadata.RawMessageUtil.getRawMessage(RawMessageUtil.java:55) ~[stormjar.jar:?]
>>>     at org.apache.metron.parsers.bolt.ParserBolt.execute(ParserBolt.java:251) [stormjar.jar:?]
>>>     at org.apache.storm.daemon.executor$fn__10195$tuple_action_fn__10197.invoke(executor.clj:735) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
>>>     at org.apache.storm.daemon.executor$mk_task_receiver$fn__10114.invoke(executor.clj:466) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
>>>     at org.apache.storm.disruptor$clojure_handler$reify__4137.onEvent(disruptor.clj:40) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
>>>     at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:472) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
>>>     at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:451) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
>>>     at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
>>>     at org.apache.storm.daemon.executor$fn__10195$fn__10208$fn__10263.invoke(executor.clj:855) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
>>>     at org.apache.storm.util$async_loop$fn__1221.invoke(util.clj:484) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
>>>     at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
>>>     at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
>>>
>>>
>>>
>>>
>>>
>>> How can I debug this?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Stéphane
>>>
>>>
>>
>> --
>> simon elliston ball
>> @sireb
>>
>>