...alternatively, you could change "separator" : "," to a character that does not appear in any of your regexes. That would be the quick workaround; the more robust option would be to roll your own extractor. And hey, while you're at it, we'd welcome the contribution back to the project!
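
For the quick workaround, the only hard part is finding a character that none of the regexes use. Here is a minimal sketch of that check, not anything that ships with Metron: the class name SeparatorPicker, the pick method and the candidate list are all made up for illustration. I have not verified which characters the CSV extractor accepts as "separator", and the CSV parser may still treat quote or backslash characters specially, which is one more reason the custom extractor is the more robust route.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Metron: finds a separator candidate
// that does not occur anywhere in the regex file.
final class SeparatorPicker {

    // Returns the first candidate character that appears in none of the lines,
    // or Optional.empty() if every candidate is used somewhere.
    static Optional<Character> pick(List<String> lines, String candidates) {
        for (char c : candidates.toCharArray()) {
            boolean unused = true;
            for (String line : lines) {
                if (line.indexOf(c) >= 0) {
                    unused = false;
                    break;
                }
            }
            if (unused) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the regex file, e.g. ./regex.txt (one regex per line).
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);

        // Assumed candidate list; adjust to whatever the extractor accepts.
        Optional<Character> separator = pick(lines, "\t|;~#@");

        if (separator.isPresent()) {
            System.out.println("Unused candidate (code point): " + (int) separator.get());
        } else {
            System.out.println("Every candidate appears in at least one regex.");
        }
    }
}
```

If it finds a character, that is the value to try in "separator" in regex_extractor.json. If it finds none, the custom extractor is the way to go.
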
On Wed, Jul 10, 2019 at 9:25 AM Michael Miklavcic <[email protected]> wrote:

> Hi David,
>
> In this case you would probably want to write your own extractor by
> implementing the following interface and setting it as your extractor
> implementation -
> https://github.com/apache/metron/blob/5d3e73ab95adf0c8f49c3f821975740e365df91a/metron-platform/metron-data-management/src/main/java/org/apache/metron/dataloads/extractor/Extractor.java
>
> Reference -
> https://github.com/apache/metron/blob/f43035c02ef01f07ff382bbf136eb1bada727fbb/metron-platform/metron-data-management/README.md#extractor-framework
>
> Best,
> Mike
>
> On Fri, Jul 5, 2019 at 12:40 PM David Auclair <[email protected]> wrote:
>
>> I’m trying to generate a serialized object using the flatfile_summarizer
>> and I’m having some difficulty…
>>
>> I’m trying to take a list of RegEx’s in a text file (one regex per line),
>> and load it with the following extractor:
>>
>> {
>>   "config" : {
>>     "columns" : {
>>       "regex" : 0
>>     },
>>     "value_filter" : "LENGTH(regex) > 0",
>>     "state_init" : "SET_INIT()",
>>     "state_update" : {
>>       "state" : "SET_ADD(state,regex)"
>>     },
>>     "state_merge" : "SET_MERGE(states)",
>>     "separator" : ","
>>   },
>>   "extractor" : "CSV"
>> }
>>
>> Running the tool, as follows:
>>
>> /usr/hcp/current/metron/bin/flatfile_summarizer.sh -i ./regex.txt -o regex.ser -e regex_extractor.json
>>
>> I end up with the following error message:
>>
>> Exception in thread "main" java.lang.NullPointerException
>>     at org.apache.metron.dataloads.nonbulk.flatfile.writer.LocalWriter.write(LocalWriter.java:45)
>>     at org.apache.metron.dataloads.nonbulk.flatfile.writer.Writers.write(Writers.java:54)
>>     at org.apache.metron.dataloads.nonbulk.flatfile.writer.Writer.write(Writer.java:30)
>>     at org.apache.metron.dataloads.nonbulk.flatfile.importer.LocalSummarizer.importData(LocalSummarizer.java:136)
>>     at org.apache.metron.dataloads.nonbulk.flatfile.SimpleFlatFileSummarizer.main(SimpleFlatFileSummarizer.java:51)
>>     at org.apache.metron.dataloads.nonbulk.flatfile.SimpleFlatFileSummarizer.main(SimpleFlatFileSummarizer.java:38)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
>>
>> Am I doing something wrong? Also, is there a better alternative to the
>> “CSV” extractor? I’m ideally looking to load the entire line, regardless
>> of any specific characters (regex may contain commas for example).
>>
>> Thanks in advance,
>>
>> David Auclair
