Re: flatfile_summarizer

Michael Miklavcic Wed, 10 Jul 2019 08:31:45 -0700

...alternatively, you could also change the "separator" : "," setting to a
character that does not appear in any of your regexes. That would be the
quick workaround; the more robust option would be to roll your own
extractor. And hey, while you're at it, we'd welcome the contribution back
to the project!
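
A rough sketch of that workaround, based on the extractor config from the
original message. The tab separator is only an assumption: any single
character that never occurs in regex.txt should do. Everything else is
copied unchanged from the original config. It may also be worth checking how
the CSV parser treats quote and backslash characters inside a regex, since
that would be another reason to prefer a custom extractor.

```json
{
  "config" : {
    "columns" : {
      "regex" : 0
    },
    "value_filter" : "LENGTH(regex) > 0",
    "state_init" : "SET_INIT()",
    "state_update" : {
      "state" : "SET_ADD(state,regex)"
    },
    "state_merge" : "SET_MERGE(states)",
    "separator" : "\t"
  },
  "extractor" : "CSV"
}
```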


On Wed, Jul 10, 2019 at 9:25 AM Michael Miklavcic <
[email protected]> wrote:

> Hi David,
>
> In this case you would probably want to write your own extractor by
> implementing the following interface and setting it as your extractor
> implementation -
> https://github.com/apache/metron/blob/5d3e73ab95adf0c8f49c3f821975740e365df91a/metron-platform/metron-data-management/src/main/java/org/apache/metron/dataloads/extractor/Extractor.java
>
> Reference -
> https://github.com/apache/metron/blob/f43035c02ef01f07ff382bbf136eb1bada727fbb/metron-platform/metron-data-management/README.md#extractor-framework
>
> Best,
> Mike
>
>
> On Fri, Jul 5, 2019 at 12:40 PM David Auclair <[email protected]>
> wrote:
>
>> I’m trying to generate a serialized object using the flatfile_summarizer
>> and I’m having some difficulty…
>>
>>
>>
>> I’m trying to take a list of RegEx’s in a text file (one regex per line),
>> and load it with the following extractor:
>>
>>
>>
>> {
>>   "config" : {
>>     "columns" : {
>>       "regex" : 0
>>     },
>>     "value_filter" : "LENGTH(regex) > 0",
>>     "state_init" : "SET_INIT()",
>>     "state_update" : {
>>       "state" : "SET_ADD(state,regex)"
>>     },
>>     "state_merge" : "SET_MERGE(states)",
>>     "separator" : ","
>>   },
>>   "extractor" : "CSV"
>> }
>>
>>
>>
>> Running the tool, as follows:
>>
>> /usr/hcp/current/metron/bin/flatfile_summarizer.sh -i ./regex.txt -o regex.ser -e regex_extractor.json
>>
>>
>>
>> I end up with the following error message:
>>
>> Exception in thread "main" java.lang.NullPointerException
>>         at org.apache.metron.dataloads.nonbulk.flatfile.writer.LocalWriter.write(LocalWriter.java:45)
>>         at org.apache.metron.dataloads.nonbulk.flatfile.writer.Writers.write(Writers.java:54)
>>         at org.apache.metron.dataloads.nonbulk.flatfile.writer.Writer.write(Writer.java:30)
>>         at org.apache.metron.dataloads.nonbulk.flatfile.importer.LocalSummarizer.importData(LocalSummarizer.java:136)
>>         at org.apache.metron.dataloads.nonbulk.flatfile.SimpleFlatFileSummarizer.main(SimpleFlatFileSummarizer.java:51)
>>         at org.apache.metron.dataloads.nonbulk.flatfile.SimpleFlatFileSummarizer.main(SimpleFlatFileSummarizer.java:38)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
>>
>>
>>
>> Am I doing something wrong?  Also, is there a better alternative to the
>> “CSV” extractor?  I’m ideally looking to load the entire line, regardless
>> of any specific characters (regex may contain commas for example).
>>
>>
>>
>> Thanks in advance,
>>
>> David Auclair
>>
>>
>>
>
