Hi David,

In this case you would probably want to write your own extractor by implementing the following interface and setting it as your extractor implementation:
https://github.com/apache/metron/blob/5d3e73ab95adf0c8f49c3f821975740e365df91a/metron-platform/metron-data-management/src/main/java/org/apache/metron/dataloads/extractor/Extractor.java
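To make the idea concrete, here is a minimal sketch of a whole-line extractor: each line is kept as one value, so commas inside a regex are never treated as separators. It is an illustration under assumptions, not Metron code. `LineExtractor` is a hypothetical stand-in for the real Extractor interface linked above (whose method signatures and return types differ and should be taken from that source), and `skip_blank` and `summarize` are made-up names for this example.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Metron's Extractor interface. The real interface
// has its own signatures and return types; check the linked source.
interface LineExtractor {
    void initialize(Map<String, Object> config);
    List<String> extract(String line);
}

// Treats each whole line as a single value, so commas and other characters
// inside a regex are never interpreted as column separators.
class WholeLineExtractor implements LineExtractor {
    private boolean skipBlank = true;

    @Override
    public void initialize(Map<String, Object> config) {
        Object value = config.get("skip_blank"); // hypothetical option name
        if (value != null) {
            skipBlank = Boolean.parseBoolean(value.toString());
        }
    }

    @Override
    public List<String> extract(String line) {
        if (line == null || (skipBlank && line.trim().isEmpty())) {
            return Collections.emptyList();
        }
        return Collections.singletonList(line);
    }
}

class WholeLineExtractorDemo {
    // Collects every extracted line into an ordered set, loosely mirroring
    // what the SET_INIT / SET_ADD configuration in the question aims for.
    static Set<String> summarize(LineExtractor extractor, List<String> lines) {
        Set<String> state = new LinkedHashSet<>();
        for (String line : lines) {
            state.addAll(extractor.extract(line));
        }
        return state;
    }

    public static void main(String[] args) {
        WholeLineExtractor extractor = new WholeLineExtractor();
        extractor.initialize(Collections.<String, Object>emptyMap());
        List<String> lines = java.util.Arrays.asList("^foo\\d{2,3}$", "", "a,b+");
        // Prints the set of non-blank lines, each kept intact.
        System.out.println(summarize(extractor, lines));
    }
}
```

A real implementation would implement Metron's own interface and be named in the "extractor" field of the extractor config, as the README referenced below describes.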
Reference - https://github.com/apache/metron/blob/f43035c02ef01f07ff382bbf136eb1bada727fbb/metron-platform/metron-data-management/README.md#extractor-framework

Best,
Mike

On Fri, Jul 5, 2019 at 12:40 PM David Auclair <[email protected]> wrote:

> I’m trying to generate a serialized object using the flatfile_summarizer
> and I’m having some difficulty…
>
> I’m trying to take a list of RegEx’s in a text file (one regex per line),
> and load it with the following extractor:
>
> {
>   "config" : {
>     "columns" : {
>       "regex" : 0
>     },
>     "value_filter" : "LENGTH(regex) > 0",
>     "state_init" : "SET_INIT()",
>     "state_update" : {
>       "state" : "SET_ADD(state,regex)"
>     },
>     "state_merge" : "SET_MERGE(states)",
>     "separator" : ","
>   },
>   "extractor" : "CSV"
> }
>
> Running the tool, as follows:
>
> /usr/hcp/current/metron/bin/flatfile_summarizer.sh -i ./regex.txt -o regex.ser -e regex_extractor.json
>
> I end up with the following error message:
>
> Exception in thread "main" java.lang.NullPointerException
>   at org.apache.metron.dataloads.nonbulk.flatfile.writer.LocalWriter.write(LocalWriter.java:45)
>   at org.apache.metron.dataloads.nonbulk.flatfile.writer.Writers.write(Writers.java:54)
>   at org.apache.metron.dataloads.nonbulk.flatfile.writer.Writer.write(Writer.java:30)
>   at org.apache.metron.dataloads.nonbulk.flatfile.importer.LocalSummarizer.importData(LocalSummarizer.java:136)
>   at org.apache.metron.dataloads.nonbulk.flatfile.SimpleFlatFileSummarizer.main(SimpleFlatFileSummarizer.java:51)
>   at org.apache.metron.dataloads.nonbulk.flatfile.SimpleFlatFileSummarizer.main(SimpleFlatFileSummarizer.java:38)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
>
> Am I doing something wrong? Also, is there a better alternative to the
> “CSV” extractor? I’m ideally looking to load the entire line, regardless
> of any specific characters (regex may contain commas for example).
>
> Thanks in advance,
>
> David Auclair
