Hey Sumo,
To show how the ExtractText processor handles this case, I wrote a unit
test based on your regex and example string:
final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());
testRunner.setProperty("regex.result",
    "^([^ ]*).+\\[(.+)\\] \"(\\w+) ([^ ]+) .*\" (\\w+) ([^ ]*)");
testRunner.enqueue(
    "127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] \"GET / HTTP/1.0\" 200 454 \"-\" \"ApacheBench/2.3\""
        .getBytes("UTF-8"));
testRunner.run();
testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
final MockFlowFile out =
    testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
out.assertAttributeEquals("regex.result.0",
    "127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] \"GET / HTTP/1.0\" 200 454");
out.assertAttributeEquals("regex.result.1", "127.0.0.1");
out.assertAttributeEquals("regex.result.2", "07/Mar/2012:23:21:47 +0100");
out.assertAttributeEquals("regex.result.3", "GET");
out.assertAttributeEquals("regex.result.4", "/");
out.assertAttributeEquals("regex.result.5", "200");
out.assertAttributeEquals("regex.result.6", "454");
As you can see, each capture group in your regex is written to its own
flowfile attribute (index 0 holds the whole match). To get the IP address,
reference the attribute "regex.result.1".
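
If you want to try the regex outside NiFi, here is a rough sketch of the same
mapping using only java.util.regex. It only imitates the attribute naming seen
in the test above and is not the processor's actual code. The class name
CaptureGroupAttributes and the extract() helper are made up for illustration,
and the use of find() is my assumption about the matching behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of NiFi.
public class CaptureGroupAttributes {

    // Returns one entry per capture group, named "<propertyName>.<groupIndex>".
    // Index 0 is the whole match. An empty map means the regex did not match.
    static Map<String, String> extract(String propertyName, String regex, String content) {
        Map<String, String> attributes = new LinkedHashMap<>();
        // Assumption: the first match found anywhere in the content is used.
        Matcher matcher = Pattern.compile(regex).matcher(content);
        if (matcher.find()) {
            for (int i = 0; i <= matcher.groupCount(); i++) {
                attributes.put(propertyName + "." + i, matcher.group(i));
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        String regex = "^([^ ]*).+\\[(.+)\\] \"(\\w+) ([^ ]+) .*\" (\\w+) ([^ ]*)";
        String line = "127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] "
                + "\"GET / HTTP/1.0\" 200 454 \"-\" \"ApacheBench/2.3\"";
        // Print every attribute name and value produced for the sample line.
        extract("regex.result", regex, line)
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running main prints one "name = value" line per group, which is a quick way to
check a regex before putting it into the processor property.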
Hope this helps,
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: [email protected]
On Friday, October 2, 2015 8:22 PM, Sumanth Chinthagunta <[email protected]>
wrote:
I am building a flow:
GetFile —> SplitText —> ExtractText —> LogAttribute
The input is a standard Apache log file. My ultimate goal is to create JSON for
each line of the log.
I used to have code like this; now I am trying to build a regex for ExtractText to
extract the IP, time, method, etc. as attributes.
Somehow I am not able to get it working. Any help will be greatly appreciated.
// 127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] "GET / HTTP/1.0" 200 454 "-" "ApacheBench/2.3"
def regex = ~/^([^ ]*).+\[(.+)\] "(\w+) ([^ ]+) .*" (\w+) ([^ ]*)/
def matcher = regex.matcher(line)
if (matcher.find()) {
    def data = [ip: matcher.group(1), time: matcher.group(2), method: matcher.group(3),
                path: matcher.group(4), result: matcher.group(5), size: matcher.group(6)]
} else {
    println "no match: " + line
}
https://github.com/xmlking/vertx-stalgia/blob/master/grails-app/conf/BootStrap.groovy
Thanks
Sumo