Hey Sumo,

To explain how the ExtractText processor works for this case I created a unit 
test based on your regex and example string:



    final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());
    // The dynamic property name ("regex.result") becomes the attribute prefix.
    testRunner.setProperty("regex.result", "^([^ ]*).+\\[(.+)\\] \"(\\w+) ([^ ]+) .*\" (\\w+) ([^ ]*)");

    testRunner.enqueue("127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] \"GET / HTTP/1.0\" 200 454 \"-\" \"ApacheBench/2.3\"".getBytes("UTF-8"));
    testRunner.run();

    testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
    final MockFlowFile out = testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
    // Index 0 holds the whole match; indexes 1-6 hold the capture groups.
    out.assertAttributeEquals("regex.result.0", "127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] \"GET / HTTP/1.0\" 200 454");
    out.assertAttributeEquals("regex.result.1", "127.0.0.1");
    out.assertAttributeEquals("regex.result.2", "07/Mar/2012:23:21:47 +0100");
    out.assertAttributeEquals("regex.result.3", "GET");
    out.assertAttributeEquals("regex.result.4", "/");
    out.assertAttributeEquals("regex.result.5", "200");
    out.assertAttributeEquals("regex.result.6", "454");


As you can see, your regex does match, and each capture group is written to a 
flowfile attribute named after the property plus the group index. To get the IP 
address, reference the attribute "regex.result.1".
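
If it helps to see the naming outside of NiFi, here is a minimal plain-JDK 
sketch of the same mapping. It is not how ExtractText is implemented; the class 
name CaptureGroupAttributes and the extract helper are made up for illustration, 
and the sketch only mimics the "property name + group index" naming that the 
test above asserts on.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupAttributes {

    // Same pattern as in the unit test above.
    static final Pattern ACCESS_LOG = Pattern.compile(
            "^([^ ]*).+\\[(.+)\\] \"(\\w+) ([^ ]+) .*\" (\\w+) ([^ ]*)");

    // Hypothetical helper (not NiFi code): stores every group of the first
    // match under "<propertyName>.<groupIndex>". Returns an empty map when
    // the line does not match.
    static Map<String, String> extract(String propertyName, String line) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = ACCESS_LOG.matcher(line);
        if (matcher.find()) {
            // Group 0 is the whole match; groups 1..groupCount() are the captures.
            for (int i = 0; i <= matcher.groupCount(); i++) {
                attributes.put(propertyName + "." + i, matcher.group(i));
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] "
                + "\"GET / HTTP/1.0\" 200 454 \"-\" \"ApacheBench/2.3\"";
        extract("regex.result", line)
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

With the example line, the entry "regex.result.1" holds the IP address and 
"regex.result.6" holds the response size, the same values the assertions above 
check.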

Hope this helps,
Joe 
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: [email protected]




On Friday, October 2, 2015 8:22 PM, Sumanth Chinthagunta <[email protected]> 
wrote:



I am building a flow:
GetFile -> SplitText -> ExtractText -> LogAttribute
The input is a standard Apache log file. My ultimate goal is to create JSON for 
each line of the log.

I used to have code like this; now I am trying to build a regex for ExtractText 
to extract the IP, time, method, etc. as attributes.
Somehow I am not able to get it working. Any help will be greatly appreciated.


// 127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] "GET / HTTP/1.0" 200 454 "-" "ApacheBench/2.3"

def regex = ~/^([^ ]*).+\[(.+)\] "(\w+) ([^ ]+) .*" (\w+) ([^ ]*)/

def matcher = regex.matcher(line)
if (matcher.find()) {
    def data = [ip: matcher.group(1), time: matcher.group(2), method: matcher.group(3),
                path: matcher.group(4), result: matcher.group(5), size: matcher.group(6)]
} else {
    println "no match: " + line
}


https://github.com/xmlking/vertx-stalgia/blob/master/grails-app/conf/BootStrap.groovy

Thanks
Sumo
