[GitHub] incubator-metron issue #432: METRON-682: Unify and Improve the Flat File Loa...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/432

I'll point out as well that it'd be nice to have a decent exception there, kinda like what you'd get from jsonlint.com:
```
Error: Parse error on line 2:
... "config": "columns": { "domain": 1,
---^
Expecting 'EOF', '}', ',', ']', got ':'
```
That might be worth a JIRA, honestly.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
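To make the "decent exception" idea concrete, here is a minimal sketch of a jsonlint-style report. It is written in Python with the standard `json` module purely for illustration; it is not the loader's code, and the function name `describe_json_error` is hypothetical. The sample input mirrors the missing `{` after `"config" :`.

```python
import json


def describe_json_error(text):
    """Return None if text parses as JSON, else a jsonlint-style message."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        lines = text.splitlines()
        # err.lineno and err.colno are 1-based positions of the failure.
        bad_line = lines[err.lineno - 1] if err.lineno <= len(lines) else ""
        pointer = "-" * (err.colno - 1) + "^"
        return "Parse error on line {}, column {}: {}\n{}\n{}".format(
            err.lineno, err.colno, err.msg, bad_line, pointer)


# Hypothetical sample: the opening brace after "config" : is missing.
broken = '{\n  "config" : "columns" : { "domain" : 1 }\n}'
print(describe_json_error(broken))
```

A Java version would presumably lean on the location information the JSON parser already attaches to its parse exceptions; that part is an assumption and is not shown here.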
[GitHub] incubator-metron issue #432: METRON-682: Unify and Improve the Flat File Loa...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/432

Also, I'll point out that you can make your life easier and kill pretty much everything on your vagrant and do this. The only reliance is on HBase and MR. I would suggest killing:
* monit via `service monit stop`
* all the storm topologies via `storm kill bro && storm kill snort && storm kill enrichment && storm kill indexing`
* tcpreplay via `for i in $(ps -ef | grep tcpreplay | awk '{print $2}');do kill -9 $i;done`
[GitHub] incubator-metron issue #432: METRON-682: Unify and Improve the Flat File Loa...
Github user ottobackwards commented on the issue: https://github.com/apache/incubator-metron/pull/432

[vagrant@node1 tmp]$ /usr/metron/0.3.0/bin/flatfile_loader.sh -i http://s3.amazonaws.com/alexa-static/top-1m.csv.zip -t enrichment -c t -e ./extractor.json -p 5 -b 128
Exception in thread "main" org.codehaus.jackson.map.JsonMappingException: Can not instantiate value of type [map type; class java.util.LinkedHashMap, [simple type, class java.lang.String] -> [simple type, class java.lang.Object]] from JSON String; no single-String constructor/factory method (through reference chain: org.apache.metron.dataloads.extractor.ExtractorHandler["config"])
    at org.codehaus.jackson.map.deser.std.StdValueInstantiator._createFromStringFallbacks(StdValueInstantiator.java:379)
    at org.codehaus.jackson.map.deser.std.StdValueInstantiator.createFromString(StdValueInstantiator.java:268)
    at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:244)
    at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:33)
    at org.codehaus.jackson.map.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:299)
    at org.codehaus.jackson.map.deser.SettableBeanProperty$MethodProperty.deserializeAndSet(SettableBeanProperty.java:414)
    at org.codehaus.jackson.map.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:697)
    at org.codehaus.jackson.map.deser.BeanDeserializer.deserialize(BeanDeserializer.java:580)
    at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2732)
    at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1909)
    at org.apache.metron.dataloads.extractor.ExtractorHandler.load(ExtractorHandler.java:70)
    at org.apache.metron.dataloads.extractor.ExtractorHandler.load(ExtractorHandler.java:75)
    at org.apache.metron.dataloads.extractor.ExtractorHandler.load(ExtractorHandler.java:78)
    at org.apache.metron.dataloads.nonbulk.flatfile.SimpleEnrichmentFlatFileLoader.main(SimpleEnrichmentFlatFileLoader.java:49)
    at org.apache.metron.dataloads.nonbulk.flatfile.SimpleEnrichmentFlatFileLoader.main(SimpleEnrichmentFlatFileLoader.java:40)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
[vagrant@node1 tmp]$

with extractor.json of:
```
{
  "config" : "columns" : {
      "domain" : 1,
      "rank" : 0
    }
    ,"indicator_column" : "domain"
    ,"type" : "alexa"
    ,"separator" : ","
  },
  "extractor" : "CSV"
}
```
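The mapping exception above is consistent with the parser having read `"config" : "columns"` as a string value where a map was expected. As a rough illustration of a pre-flight shape check on an already-parsed extractor file, here is a sketch in Python (not the loader's Java). The function name `check_extractor_config` and the specific rules are assumptions made for this example; only the field names come from the extractor.json shown in this thread.

```python
def check_extractor_config(doc):
    """Return a list of human-readable problems found in a parsed extractor file.

    Assumed rules (illustrative only): "config" and "config.columns" are
    JSON objects, "indicator_column" names one of the columns, and
    "extractor" is a string.
    """
    problems = []
    config = doc.get("config")
    if not isinstance(config, dict):
        problems.append('"config" must be a JSON object, got {}'.format(
            type(config).__name__))
        return problems
    columns = config.get("columns")
    if not isinstance(columns, dict):
        problems.append('"config.columns" must be a JSON object')
    elif config.get("indicator_column") not in columns:
        problems.append('"indicator_column" does not name an entry in "columns"')
    if not isinstance(doc.get("extractor"), str):
        problems.append('"extractor" must be a string')
    return problems


# What a lenient parser would effectively see for the file above.
print(check_extractor_config({"config": "columns", "extractor": "CSV"}))
```

Reporting the problem in terms of the config fields, rather than deserializer internals, is the point of the sketch.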
[GitHub] incubator-metron issue #432: METRON-682: Unify and Improve the Flat File Loa...
Github user ottobackwards commented on the issue: https://github.com/apache/incubator-metron/pull/432 I am trying to test this out - my vm is not doing well at the moment however. Hopefully I can get it straight
[GitHub] incubator-metron issue #432: METRON-682: Unify and Improve the Flat File Loa...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/432

# Testing Plan

## Preliminaries

* Download the alexa 1m dataset:
```
wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
unzip top-1m.csv.zip
```
* Stage import files
```
head -n 10000 top-1m.csv > top-10k.csv
hadoop fs -put top-10k.csv /tmp
head -n 10000 top-1m.csv | gzip - > top-10k.csv.gz
head -n 10000 top-1m.csv | zip > top-10k.csv.zip
```
* Create an extractor.json for the CSV data by editing `extractor.json` and pasting in these contents:
```
{
  "config" : {
    "columns" : {
      "domain" : 1,
      "rank" : 0
    }
    ,"indicator_column" : "domain"
    ,"type" : "alexa"
    ,"separator" : ","
  },
  "extractor" : "CSV"
}
```

## Import from URL
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase from URL. This should take approximately 5 or 6 minutes
/usr/metron/0.3.0/bin/flatfile_loader.sh -i http://s3.amazonaws.com/alexa-static/top-1m.csv.zip -t enrichment -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 1M
echo "count 'enrichment'" | hbase shell
```

## Import from local file (non-zipped)
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv -t enrichment -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```

## Import from local file (gzipped)
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv.gz -t enrichment -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```

## Import from local file (zipped)
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv.zip -t enrichment -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```

## Import from HDFS via MR
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase
/usr/metron/0.3.0/bin/flatfile_loader.sh -i /tmp/top-10k.csv -t enrichment -c t -e ./extractor.json -m MR
# count data written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```
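For anyone who would rather script the "Stage import files" step, the following is a minimal Python sketch that writes the first N lines of the CSV as plain, gzipped, and zipped files. The function name `stage_top_n` and the choice to name the zip entry after the CSV file are assumptions of this example, not something the plan specifies; the HDFS upload is left to `hadoop fs -put` as in the plan.

```python
import gzip
import itertools
import os
import zipfile


def stage_top_n(src_csv, n, out_prefix):
    """Write the first n lines of src_csv to out_prefix.csv, .csv.gz and .csv.zip.

    Returns the three output paths in that order.
    """
    with open(src_csv, "rb") as src:
        # islice stops after n lines without reading the rest of the file.
        head = b"".join(itertools.islice(src, n))

    csv_path = out_prefix + ".csv"
    gz_path = csv_path + ".gz"
    zip_path = csv_path + ".zip"

    with open(csv_path, "wb") as out:
        out.write(head)
    with gzip.open(gz_path, "wb") as out:
        out.write(head)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Assumption: a single entry named after the CSV file.
        archive.writestr(os.path.basename(csv_path), head)
    return csv_path, gz_path, zip_path


# Example usage, assuming top-1m.csv is in the current directory:
# stage_top_n("top-1m.csv", 10000, "top-10k")
```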
[GitHub] incubator-metron issue #432: METRON-682: Unify and Improve the Flat File Loa...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/432 I know it seems like a lot of code changed, but a lot of this was reorganizing the flat file loader class and splitting it into separate reusable components, rather than new code.