[GitHub] incubator-metron issue #432: METRON-682: Unify and Improve the Flat File Loa...

2017-02-03 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/432
  
I'll also point out that it'd be nice to have a decent error message there, 
something like what you'd get from jsonlint.com:
```
Error: Parse error on line 2:
... "config": "columns": {  "domain": 1,
---^
Expecting 'EOF', '}', ',', ']', got ':'
```

That might be worth a JIRA, honestly.
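Until the loader reports parse errors like that itself, one way to catch a malformed `extractor.json` up front is to run it through a validator that reports the line and column of the problem. This is a minimal sketch that assumes `python3` is installed on the node: its standard `json.tool` module prints the location of the first syntax error and exits non-zero. The function name `check_extractor` is hypothetical.

```shell
# Hypothetical helper: validate an extractor config before handing it to
# flatfile_loader.sh. Assumes python3 is installed; `python3 -m json.tool`
# prints the line/column of the first syntax error and exits non-zero.
check_extractor() {
  if python3 -m json.tool "$1" > /dev/null; then
    echo "$1: valid JSON"
  else
    echo "$1: fix the JSON error above before running the loader" >&2
    return 1
  fi
}
```

For example, `check_extractor ./extractor.json && /usr/metron/0.3.0/bin/flatfile_loader.sh ...`. This only checks JSON syntax; it cannot tell whether the keys match what the CSV extractor expects.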


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-metron issue #432: METRON-682: Unify and Improve the Flat File Loa...

2017-02-03 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/432
  
Also, I'll point out that you can make your life easier by killing pretty 
much everything on your Vagrant VM before doing this. The loader only relies on 
HBase and MR (MapReduce).

I would suggest killing:
* monit via `service monit stop`
* all the storm topologies via `storm kill bro && storm kill snort && storm kill enrichment && storm kill indexing`
* tcpreplay via `for i in $(ps -ef | grep '[t]cpreplay' | awk '{print $2}'); do kill -9 $i; done`
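The `&&` chain stops at the first `storm kill` that fails (for example, when a topology isn't deployed), leaving the rest running. A sketch that keeps going regardless is below; the function name `free_up_vagrant` and its `DRY_RUN` switch (which only prints the commands) are hypothetical additions, not part of Metron.

```shell
# Hypothetical cleanup helper for the Vagrant node: stop everything the
# flat file loader doesn't need (it only relies on HBase and MR).
# Set DRY_RUN=1 to print the commands instead of running them.
free_up_vagrant() {
  local run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
  $run service monit stop
  for topology in bro snort enrichment indexing; do
    # Keep going even if this topology is not running.
    $run storm kill "$topology" || echo "could not kill $topology" >&2
  done
  # pkill matches by process name, so there is no grep process to filter out;
  # it exits non-zero when nothing matched, which is fine here.
  $run pkill -9 tcpreplay || true
}
```

Running `DRY_RUN=1 free_up_vagrant` first shows exactly what would be stopped.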




[GitHub] incubator-metron issue #432: METRON-682: Unify and Improve the Flat File Loa...

2017-02-03 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/incubator-metron/pull/432
  
```
[vagrant@node1 tmp]$ /usr/metron/0.3.0/bin/flatfile_loader.sh -i http://s3.amazonaws.com/alexa-static/top-1m.csv.zip -t enrichment -c t -e ./extractor.json -p 5 -b 128
Exception in thread "main" org.codehaus.jackson.map.JsonMappingException: Can not instantiate value of type [map type; class java.util.LinkedHashMap, [simple type, class java.lang.String] -> [simple type, class java.lang.Object]] from JSON String; no single-String constructor/factory method (through reference chain: org.apache.metron.dataloads.extractor.ExtractorHandler["config"])
	at org.codehaus.jackson.map.deser.std.StdValueInstantiator._createFromStringFallbacks(StdValueInstantiator.java:379)
	at org.codehaus.jackson.map.deser.std.StdValueInstantiator.createFromString(StdValueInstantiator.java:268)
	at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:244)
	at org.codehaus.jackson.map.deser.std.MapDeserializer.deserialize(MapDeserializer.java:33)
	at org.codehaus.jackson.map.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:299)
	at org.codehaus.jackson.map.deser.SettableBeanProperty$MethodProperty.deserializeAndSet(SettableBeanProperty.java:414)
	at org.codehaus.jackson.map.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:697)
	at org.codehaus.jackson.map.deser.BeanDeserializer.deserialize(BeanDeserializer.java:580)
	at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2732)
	at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1909)
	at org.apache.metron.dataloads.extractor.ExtractorHandler.load(ExtractorHandler.java:70)
	at org.apache.metron.dataloads.extractor.ExtractorHandler.load(ExtractorHandler.java:75)
	at org.apache.metron.dataloads.extractor.ExtractorHandler.load(ExtractorHandler.java:78)
	at org.apache.metron.dataloads.nonbulk.flatfile.SimpleEnrichmentFlatFileLoader.main(SimpleEnrichmentFlatFileLoader.java:49)
	at org.apache.metron.dataloads.nonbulk.flatfile.SimpleEnrichmentFlatFileLoader.main(SimpleEnrichmentFlatFileLoader.java:40)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
[vagrant@node1 tmp]$
```

with extractor.json of:


```
{
  "config" :
"columns" : {
  "domain" : 1,
  "rank" : 0
  }
,"indicator_column" : "domain"
,"type" : "alexa"
,"separator" : ","
},
  "extractor" : "CSV"
}   
```




[GitHub] incubator-metron issue #432: METRON-682: Unify and Improve the Flat File Loa...

2017-02-03 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/incubator-metron/pull/432
  
I am trying to test this out, but my VM is not doing well at the moment. 
Hopefully I can get it straightened out.




[GitHub] incubator-metron issue #432: METRON-682: Unify and Improve the Flat File Loa...

2017-02-01 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/432
  
# Testing Plan

## Preliminaries

* Download the Alexa top 1M dataset:
```
wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
unzip top-1m.csv.zip
```
* Stage import files
```
head -n 10000 top-1m.csv > top-10k.csv
hadoop fs -put top-10k.csv /tmp
head -n 10000 top-1m.csv | gzip - > top-10k.csv.gz
head -n 10000 top-1m.csv | zip > top-10k.csv.zip
```
* Create an extractor.json for the CSV data by editing `extractor.json` and 
pasting in these contents:
```
{
  "config" : {
    "columns" : {
      "domain" : 1,
      "rank" : 0
    },
    "indicator_column" : "domain",
    "type" : "alexa",
    "separator" : ","
  },
  "extractor" : "CSV"
}
```
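Before importing, it can help to confirm that each staged file holds the number of records the count steps expect. A minimal sketch, assuming `gzip` and `unzip` are installed on the node; `count_records` is a hypothetical helper name.

```shell
# Hypothetical helper: count the CSV records in a staged file,
# decompressing .gz and .zip inputs on the fly.
count_records() {
  case "$1" in
    *.gz)  gzip -dc "$1" ;;
    *.zip) unzip -p "$1" ;;
    *)     cat "$1" ;;
  esac | wc -l | tr -d ' '   # strip the padding some wc versions add
}
```

For example, `count_records top-10k.csv.gz` should print the same number as `count_records top-10k.csv`, and that number is what `count 'enrichment'` should report after each local import.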

## Import from URL
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase from URL. This should take approximately 5 or 6 minutes
/usr/metron/0.3.0/bin/flatfile_loader.sh -i http://s3.amazonaws.com/alexa-static/top-1m.csv.zip \
  -t enrichment -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 1M
echo "count 'enrichment'" | hbase shell
```
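The "verify it's 1M" step can be scripted by pulling the row total out of the shell output. This sketch assumes the `count` summary line starts with `<N> row(s)` (as in, e.g., `10000 row(s) in 1.2340 seconds` from HBase 1.x); the function name `verify_count` is hypothetical.

```shell
# Hypothetical helper: read `hbase shell` output on stdin and compare the
# row total reported by `count` with an expected value.
# Assumes the summary line begins with "<N> row(s)".
verify_count() {
  local expected="$1"
  local actual
  actual=$(grep -E '^[0-9]+ row\(s\)' | tail -n 1 | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $actual rows"
  else
    echo "MISMATCH: expected $expected, got ${actual:-nothing}" >&2
    return 1
  fi
}
```

Usage: `echo "count 'enrichment'" | hbase shell | verify_count 1000000`.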

## Import from local file (non-zipped)
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv -t enrichment \
  -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```

## Import from local file (gzipped)
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv.gz -t enrichment \
  -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```

## Import from local file (zipped)
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase
/usr/metron/0.3.0/bin/flatfile_loader.sh -i ./top-10k.csv.zip -t enrichment \
  -c t -e ./extractor.json -p 5 -b 128
# count data written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```

## Import from HDFS via MR
```
# truncate hbase
echo "truncate 'enrichment'" | hbase shell
# import data into hbase
/usr/metron/0.3.0/bin/flatfile_loader.sh -i /tmp/top-10k.csv -t enrichment \
  -c t -e ./extractor.json -m MR
# count data written and verify it's 10k
echo "count 'enrichment'" | hbase shell
```
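The four 10k import cases share the same truncate/load/count sequence, so they can be driven from one place. A sketch under these assumptions: `run_case` and `run_all_cases` are hypothetical names, and the `LOADER` / `HBASE_SHELL` overrides exist only so the sequence can be rehearsed with stand-in commands; by default they point at the same loader path and `hbase shell` used in this plan.

```shell
# Hypothetical driver for the import cases: truncate, load, count.
run_case() {
  local input="$1"
  shift  # any remaining arguments are passed through as extra loader flags
  local loader="${LOADER:-/usr/metron/0.3.0/bin/flatfile_loader.sh}"
  local hbase_shell="${HBASE_SHELL:-hbase shell}"
  echo "== $input"
  # $hbase_shell is left unquoted on purpose so "hbase shell" splits into two words.
  echo "truncate 'enrichment'" | $hbase_shell
  "$loader" -i "$input" -t enrichment -c t -e ./extractor.json "$@"
  echo "count 'enrichment'" | $hbase_shell
}

run_all_cases() {
  for f in ./top-10k.csv ./top-10k.csv.gz ./top-10k.csv.zip; do
    run_case "$f" -p 5 -b 128
  done
  run_case /tmp/top-10k.csv -m MR   # HDFS path, loaded via MapReduce
}
```

Setting `LOADER=echo HBASE_SHELL=cat` before calling `run_all_cases` prints the commands each case would send instead of touching HBase.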





[GitHub] incubator-metron issue #432: METRON-682: Unify and Improve the Flat File Loa...

2017-02-01 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/432
  
I know it seems like a lot of code changed, but a lot of this was 
reorganizing the flat file loader class and splitting it into separate reusable 
components, rather than new code.

