Re: flatfile_summarizer

Michael Miklavcic Wed, 10 Jul 2019 08:26:19 -0700

Hi David,

Since you want each whole line loaded as-is, you would probably want to
write your own extractor by implementing the following interface and
setting it as your extractor implementation:
https://github.com/apache/metron/blob/5d3e73ab95adf0c8f49c3f821975740e365df91a/metron-platform/metron-data-management/src/main/java/org/apache/metron/dataloads/extractor/Extractor.java


Reference -
https://github.com/apache/metron/blob/f43035c02ef01f07ff382bbf136eb1bada727fbb/metron-platform/metron-data-management/README.md#extractor-framework
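
As a rough illustration of the idea only (this is not code from Metron), a
whole-line extractor could look something like the sketch below. The
`LineExtractor` interface is a hypothetical stand-in so the example compiles
on its own; the real `Extractor` interface linked above has its own
signature and return type, so check it before porting anything. The class
name `WholeLineExtractor` and the `trim` config key are likewise made up for
this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Metron's Extractor interface, defined here only
// so the sketch is self-contained. It is NOT the real Metron API.
interface LineExtractor {
    void initialize(Map<String, Object> config);
    Iterable<String> extract(String line);
}

// Treats every non-empty input line as a single value, so commas (or any
// other separator character) inside a regex are never split into columns.
class WholeLineExtractor implements LineExtractor {
    // Assumed option for this sketch: strip surrounding whitespace by default.
    private boolean trim = true;

    @Override
    public void initialize(Map<String, Object> config) {
        Object value = (config == null) ? null : config.get("trim");
        if (value != null) {
            trim = Boolean.parseBoolean(value.toString());
        }
    }

    @Override
    public Iterable<String> extract(String line) {
        if (line == null) {
            return Collections.emptyList();
        }
        String candidate = trim ? line.trim() : line;
        if (candidate.isEmpty()) {
            // Skip blank lines, similar in spirit to a LENGTH(regex) > 0 filter.
            return Collections.emptyList();
        }
        return Collections.singletonList(candidate);
    }
}

class WholeLineExtractorDemo {
    public static void main(String[] args) {
        WholeLineExtractor extractor = new WholeLineExtractor();
        extractor.initialize(new HashMap<>());
        // A regex that contains commas is returned as one value.
        for (String value : extractor.extract("^foo,\\d{1,3}$")) {
            System.out.println(value);
        }
    }
}
```

A real implementation would return whatever type the linked interface
requires rather than plain strings; the point here is only that the line is
never split on a separator.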

Best,
Mike


On Fri, Jul 5, 2019 at 12:40 PM David Auclair <[email protected]> wrote:

> I’m trying to generate a serialized object using the flatfile_summarizer
> and I’m having some difficulty…
>
> I’m trying to take a list of regexes in a text file (one regex per line)
> and load it with the following extractor:
>
> {
>   "config" : {
>     "columns" : {
>       "regex" : 0
>     },
>     "value_filter" : "LENGTH(regex) > 0",
>     "state_init" : "SET_INIT()",
>     "state_update" : {
>       "state" : "SET_ADD(state,regex)"
>     },
>     "state_merge" : "SET_MERGE(states)",
>     "separator" : ","
>   },
>   "extractor" : "CSV"
> }
>
> Running the tool as follows:
>
> /usr/hcp/current/metron/bin/flatfile_summarizer.sh -i ./regex.txt -o regex.ser -e regex_extractor.json
>
> I end up with the following error message:
>
> Exception in thread "main" java.lang.NullPointerException
>         at org.apache.metron.dataloads.nonbulk.flatfile.writer.LocalWriter.write(LocalWriter.java:45)
>         at org.apache.metron.dataloads.nonbulk.flatfile.writer.Writers.write(Writers.java:54)
>         at org.apache.metron.dataloads.nonbulk.flatfile.writer.Writer.write(Writer.java:30)
>         at org.apache.metron.dataloads.nonbulk.flatfile.importer.LocalSummarizer.importData(LocalSummarizer.java:136)
>         at org.apache.metron.dataloads.nonbulk.flatfile.SimpleFlatFileSummarizer.main(SimpleFlatFileSummarizer.java:51)
>         at org.apache.metron.dataloads.nonbulk.flatfile.SimpleFlatFileSummarizer.main(SimpleFlatFileSummarizer.java:38)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
>
> Am I doing something wrong? Also, is there a better alternative to the
> “CSV” extractor? Ideally, I’d like to load the entire line, regardless of
> which characters it contains (a regex may contain commas, for example).
>
> Thanks in advance,
>
> David Auclair
>
