Hello,
Actually, I want to keep only the _source part. The full story is that these
data are a dump from another Elasticsearch cluster. After reading this:
https://metron.apache.org/current-book/metron-platform/metron-parsers/ParserChaining.html,
I thought I could do the same with JSON. In this example, the BLOB is a CSV,
and the parser config is the following:
{
  "parserClassName" : "org.apache.metron.parsers.csv.CSVParser"
  ,"sensorTopic" : "my_topic"
  ,"rawMessageStrategy" : "ENVELOPE"
  ,"rawMessageStrategyConfig" : {
    "messageField" : "payload",
    "metadataPrefix" : ""
  }
  ,"parserConfig" : {
    "columns" : {
      "f1" : 0,
      "f2" : 1,
      "f3" : 2
    }
  }
}
My understanding is that with “ENVELOPE”, the parser expects a top-level JSON
envelope with the CSV as a string in the payload field, which is why I wanted
to do the same with JSON. But as far as I understand, it doesn’t work when the
field holds a nested JSON object, does it?
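To make the distinction concrete, here is a minimal Python sketch of the check an envelope-style strategy implies. It is not Metron code: the function name `envelope_payload` and the sample `host` field are made up for illustration, and the only behaviour assumed is the one described in this thread, namely that the configured message field must hold a string.

```python
import json


def envelope_payload(line: str, message_field: str) -> str:
    """Return the value of message_field, insisting that it is a string.

    Hypothetical helper: it mimics the expectation of an envelope-style
    strategy, it does not call into Metron.
    """
    envelope = json.loads(line)
    value = envelope[message_field]
    if not isinstance(value, str):
        raise TypeError(
            f"field {message_field!r} holds {type(value).__name__}, expected str"
        )
    return value


# A CSV carried as a string inside a JSON envelope (illustrative data).
csv_envelope = '{"host": "h1", "payload": "a,b,c"}'
# A trimmed Elasticsearch hit: _source is a nested object, not a string.
es_hit = '{"_id": "AWAkTAefYn0uCUpkHmCy", "_source": {"dst": "127.0.0.1"}}'

print(envelope_payload(csv_envelope, "payload"))  # prints a,b,c

try:
    envelope_payload(es_hit, "_source")
except TypeError as err:
    print(err)
```

The second call fails for the same reason the parser bolt does: once the envelope is parsed, `_source` is a map rather than text.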
Stéphane
From: Otto Fowler [mailto:[email protected]]
Sent: Thursday, April 25, 2019 17:34
To: [email protected]
Subject: Re: Issue when trying to load JSON
Also, our support for nested, unflattened JSON isn’t great to begin with.
Stéphane, can you state your use case?
Do you want to get _source only to transform it? Or do you want to use _source
as the message and discard the top-level fields? Something else?
On April 25, 2019 at 11:31:36, Otto Fowler
([email protected]<mailto:[email protected]>) wrote:
I’m not sure about the name; I’m thinking more about the use case.
I’m not sure whether this is an envelope issue or a new feature for the JSON
map parser (or whether you could do it with the JSONMap parser and JSONPath).
On April 25, 2019 at 11:23:25, Simon Elliston Ball
([email protected]<mailto:[email protected]>) wrote:
Seems like this would be a good additional strategy, something like
ENVELOPE_PARSED? Any thoughts on a good name?
On Thu, 25 Apr 2019 at 16:20, Otto Fowler
<[email protected]<mailto:[email protected]>> wrote:
So, the enveloped message strategy doesn’t support getting an already parsed
JSON object from the enveloped JSON; we would have to do some work to support
this. Even if we _could_ wrangle it in there now, from what I can see we would
still have to serialize it to bytes to pass it to the actual parser, and that
would be inefficient.
Can you open a Jira with the information you provided?
On April 25, 2019 at 11:12:38, Otto Fowler
([email protected]<mailto:[email protected]>) wrote:
Raw message in this case assumes that the raw message is a String embedded in
the JSON field that you supply, not a nested JSON object, so it is looking for
“_source” : “some other embedded string of some format like syslog in json”
There are other message strategies, but I’m not sure they would work in this
instance. I’ll keep looking. Hopefully someone more familiar will jump in.
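One possible workaround, sketched below in Python, is to rewrite the dump before it reaches the sensor topic. Two variants are shown: `unwrap_source` drops the Elasticsearch wrapper so each line is just the `_source` object, and `stringify_source` keeps the wrapper but turns `_source` into a JSON string. Both helper names are hypothetical, and whether the stringified form then passes cleanly through ENVELOPE into JSONMapParser is an assumption drawn from the description above, not something tested here.

```python
import json


def unwrap_source(line: str, field: str = "_source") -> str:
    """Return only the nested object, as a standalone one-line JSON message."""
    hit = json.loads(line)
    return json.dumps(hit[field], separators=(",", ":"))


def stringify_source(line: str, field: str = "_source") -> str:
    """Keep the wrapper, but replace the nested object with its JSON text."""
    hit = json.loads(line)
    if not isinstance(hit[field], str):
        hit[field] = json.dumps(hit[field], separators=(",", ":"))
    return json.dumps(hit, separators=(",", ":"))


# One line of the dump, shortened from the sample in this thread.
sample = (
    '{"_index": "indexing", "_type": "Event", "_id": "AWAkTAefYn0uCUpkHmCy", '
    '"_score": 1, "_source": {"dst": "127.0.0.1", "src": "194.51.198.185"}}'
)

print(unwrap_source(sample))
print(stringify_source(sample))
```

With the first variant the wrapper fields (`_index`, `_id`, and so on) are lost, which matches the stated goal of keeping only the `_source` part; the second variant keeps them available as envelope metadata.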
On April 25, 2019 at 10:48:06,
[email protected]<mailto:[email protected]>
([email protected]<mailto:[email protected]>) wrote:
Hello,
I’m trying to load some JSON data which has the following structure (this is a
sample):
{
  "_index": "indexing",
  "_type": "Event",
  "_id": "AWAkTAefYn0uCUpkHmCy",
  "_score": 1,
  "_source": {
    "dst": "127.0.0.1",
    "devTimeEpoch": "1512437340000",
    "dstPort": "0",
    "srcPort": "80",
    "src": "194.51.198.185"
  }
}
In my file, everything is on the same line. My parser config is the following:
{
  "parserClassName": "org.apache.metron.parsers.json.JSONMapParser",
  "filterClassName": null,
  "sensorTopic": "my_topic",
  "outputTopic": null,
  "errorTopic": null,
  "writerClassName": null,
  "errorWriterClassName": null,
  "readMetadata": true,
  "mergeMetadata": true,
  "numWorkers": 2,
  "numAckers": null,
  "spoutParallelism": 1,
  "spoutNumTasks": 1,
  "parserParallelism": 2,
  "parserNumTasks": 2,
  "errorWriterParallelism": 1,
  "errorWriterNumTasks": 1,
  "spoutConfig": {},
  "securityProtocol": null,
  "stormConfig": {},
  "parserConfig": {},
  "fieldTransformations": [
    {
      "transformation": "RENAME",
      "config": {
        "dst": "ip_dst_addr",
        "src": "ip_src_addr",
        "srcPort": "ip_src_port",
        "dstPort": "ip_dst_port",
        "devTimeEpoch": "timestamp"
      }
    }
  ],
  "cacheConfig": {},
  "rawMessageStrategy": "ENVELOPE",
  "rawMessageStrategyConfig": {
    "messageField": "_source"
  }
}
But in Storm I get the following errors:
2019-04-25 16:45:22.225 o.a.s.d.executor Thread-5-parserBolt-executor[8 8] [ERROR]
java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to java.lang.String
    at org.apache.metron.common.message.metadata.EnvelopedRawMessageStrategy.get(EnvelopedRawMessageStrategy.java:78) ~[stormjar.jar:?]
    at org.apache.metron.common.message.metadata.RawMessageStrategies.get(RawMessageStrategies.java:54) ~[stormjar.jar:?]
    at org.apache.metron.common.message.metadata.RawMessageUtil.getRawMessage(RawMessageUtil.java:55) ~[stormjar.jar:?]
    at org.apache.metron.parsers.bolt.ParserBolt.execute(ParserBolt.java:251) [stormjar.jar:?]
    at org.apache.storm.daemon.executor$fn__10195$tuple_action_fn__10197.invoke(executor.clj:735) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
    at org.apache.storm.daemon.executor$mk_task_receiver$fn__10114.invoke(executor.clj:466) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
    at org.apache.storm.disruptor$clojure_handler$reify__4137.onEvent(disruptor.clj:40) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
    at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:472) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
    at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:451) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
    at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
    at org.apache.storm.daemon.executor$fn__10195$fn__10208$fn__10263.invoke(executor.clj:855) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
    at org.apache.storm.util$async_loop$fn__1221.invoke(util.clj:484) [storm-core-1.1.0.2.6.5.1050-37.jar:1.1.0.2.6.5.1050-37]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
How can I debug this?
Thanks
Stéphane
_________________________________________________________________________________________________________________________
Ce message et ses pieces jointes peuvent contenir des informations
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou
falsifie. Merci.
This message and its attachments may contain confidential or privileged
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been
modified, changed or falsified.
Thank you.
--
simon elliston ball
@sireb