Another workaround might be to specify a character that never occurs in the
file as the separator and stick with CSV.
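
For example, here is a minimal, untested sketch of David's extractor config
with only the separator changed. It assumes the CSV extractor takes a
single-character separator, and that the control character U+0001 (my choice,
purely illustrative) never appears in regex.txt. The CSV parser may still
treat double quotes and backslashes specially, so it is worth checking that
the regexes come through unchanged:

```json
{
  "config" : {
    "columns" : {
      "regex" : 0
    },
    "value_filter" : "LENGTH(regex) > 0",
    "state_init" : "SET_INIT()",
    "state_update" : {
      "state" : "SET_ADD(state,regex)"
    },
    "state_merge" : "SET_MERGE(states)",
    "separator" : "\u0001"
  },
  "extractor" : "CSV"
}
```

With a separator that never matches, each whole line should land in column 0,
which is the column mapped to "regex" above.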

Simon

On Wed, 10 Jul 2019 at 11:26, Michael Miklavcic <[email protected]>
wrote:

> Hi David,
>
> In this case you would probably want to write your own extractor by
> implementing the following interface and setting it as your extractor
> implementation -
> https://github.com/apache/metron/blob/5d3e73ab95adf0c8f49c3f821975740e365df91a/metron-platform/metron-data-management/src/main/java/org/apache/metron/dataloads/extractor/Extractor.java
>
> Reference -
> https://github.com/apache/metron/blob/f43035c02ef01f07ff382bbf136eb1bada727fbb/metron-platform/metron-data-management/README.md#extractor-framework
>
> Best,
> Mike
>
>
> On Fri, Jul 5, 2019 at 12:40 PM David Auclair <[email protected]>
> wrote:
>
>> I’m trying to generate a serialized object using the flatfile_summarizer
>> and I’m having some difficulty…
>>
>>
>>
>> I’m trying to take a list of regexes in a text file (one regex per line)
>> and load it with the following extractor:
>>
>>
>>
>> {
>>   "config" : {
>>     "columns" : {
>>       "regex" : 0
>>     },
>>     "value_filter" : "LENGTH(regex) > 0",
>>     "state_init" : "SET_INIT()",
>>     "state_update" : {
>>       "state" : "SET_ADD(state,regex)"
>>     },
>>     "state_merge" : "SET_MERGE(states)",
>>     "separator" : ","
>>   },
>>   "extractor" : "CSV"
>> }
>>
>>
>>
>> Running the tool, as follows:
>>
>> /usr/hcp/current/metron/bin/flatfile_summarizer.sh -i ./regex.txt -o regex.ser -e regex_extractor.json
>>
>>
>>
>> I end up with the following error message:
>>
>> Exception in thread "main" java.lang.NullPointerException
>>         at org.apache.metron.dataloads.nonbulk.flatfile.writer.LocalWriter.write(LocalWriter.java:45)
>>         at org.apache.metron.dataloads.nonbulk.flatfile.writer.Writers.write(Writers.java:54)
>>         at org.apache.metron.dataloads.nonbulk.flatfile.writer.Writer.write(Writer.java:30)
>>         at org.apache.metron.dataloads.nonbulk.flatfile.importer.LocalSummarizer.importData(LocalSummarizer.java:136)
>>         at org.apache.metron.dataloads.nonbulk.flatfile.SimpleFlatFileSummarizer.main(SimpleFlatFileSummarizer.java:51)
>>         at org.apache.metron.dataloads.nonbulk.flatfile.SimpleFlatFileSummarizer.main(SimpleFlatFileSummarizer.java:38)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:498)
>>         at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
>>
>>
>>
>> Am I doing something wrong?  Also, is there a better alternative to the
>> “CSV” extractor?  I’m ideally looking to load the entire line, regardless
>> of any specific characters (regex may contain commas for example).
>>
>>
>>
>> Thanks in advance,
>>
>> David Auclair
>>
>>
>>
>

-- 
simon elliston ball
@sireb
