Another workaround might be to specify a never-occurring character as the separator and stick with CSV.
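
As a rough, untested sketch, that would mean keeping the extractor config from the original message and changing only the separator. The "\u0001" value is an assumption for illustration: any single character that never appears in regex.txt should serve the same purpose. This only addresses commas inside a regex; it may not be related to the NullPointerException, and quote or backslash characters in a line could still be interpreted by the CSV parser.

```json
{
  "config" : {
    "columns" : {
      "regex" : 0
    },
    "value_filter" : "LENGTH(regex) > 0",
    "state_init" : "SET_INIT()",
    "state_update" : {
      "state" : "SET_ADD(state,regex)"
    },
    "state_merge" : "SET_MERGE(states)",
    "separator" : "\u0001"
  },
  "extractor" : "CSV"
}
```
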
Simon

On Wed, 10 Jul 2019 at 11:26, Michael Miklavcic <[email protected]> wrote:

> Hi David,
>
> In this case you would probably want to write your own extractor by
> implementing the following interface and setting it as your extractor
> implementation -
> https://github.com/apache/metron/blob/5d3e73ab95adf0c8f49c3f821975740e365df91a/metron-platform/metron-data-management/src/main/java/org/apache/metron/dataloads/extractor/Extractor.java
>
> Reference -
> https://github.com/apache/metron/blob/f43035c02ef01f07ff382bbf136eb1bada727fbb/metron-platform/metron-data-management/README.md#extractor-framework
>
> Best,
> Mike
>
>
> On Fri, Jul 5, 2019 at 12:40 PM David Auclair <[email protected]>
> wrote:
>
>> I’m trying to generate a serialized object using the flatfile_summarizer
>> and I’m having some difficulty…
>>
>> I’m trying to take a list of RegEx’s in a text file (one regex per line),
>> and load it with the following extractor:
>>
>> {
>>   "config" : {
>>     "columns" : {
>>       "regex" : 0
>>     },
>>     "value_filter" : "LENGTH(regex) > 0",
>>     "state_init" : "SET_INIT()",
>>     "state_update" : {
>>       "state" : "SET_ADD(state,regex)"
>>     },
>>     "state_merge" : "SET_MERGE(states)",
>>     "separator" : ","
>>   },
>>   "extractor" : "CSV"
>> }
>>
>> Running the tool, as follows:
>>
>> /usr/hcp/current/metron/bin/flatfile_summarizer.sh -i ./regex.txt -o
>> regex.ser -e regex_extractor.json
>>
>> I end up with the following error message:
>>
>> Exception in thread "main" java.lang.NullPointerException
>>   at org.apache.metron.dataloads.nonbulk.flatfile.writer.LocalWriter.write(LocalWriter.java:45)
>>   at org.apache.metron.dataloads.nonbulk.flatfile.writer.Writers.write(Writers.java:54)
>>   at org.apache.metron.dataloads.nonbulk.flatfile.writer.Writer.write(Writer.java:30)
>>   at org.apache.metron.dataloads.nonbulk.flatfile.importer.LocalSummarizer.importData(LocalSummarizer.java:136)
>>   at org.apache.metron.dataloads.nonbulk.flatfile.SimpleFlatFileSummarizer.main(SimpleFlatFileSummarizer.java:51)
>>   at org.apache.metron.dataloads.nonbulk.flatfile.SimpleFlatFileSummarizer.main(SimpleFlatFileSummarizer.java:38)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:498)
>>   at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
>>
>> Am I doing something wrong? Also, is there a better alternative to the
>> “CSV” extractor? I’m ideally looking to load the entire line, regardless
>> of any specific characters (regex may contain commas for example).
>>
>> Thanks in advance,
>>
>> David Auclair
>>
>

--
--
simon elliston ball
@sireb
