Hi Joe,
Thanks, that really helped. It was not clear to me from ExtractText's
documentation where I should set the regex and how I get the matching results
back. I have it working now.
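
For anyone reading this thread later, here is a minimal plain-Java sketch of the mapping shown in Joe's unit test below: the name of the property holding the regex becomes the attribute prefix, and each capture group index becomes the suffix. It uses only java.util.regex and is not NiFi's implementation; the class name CaptureGroupAttributes and the method extract are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupAttributes {

    // Same regex as in the unit test quoted below.
    static final Pattern LOG_LINE = Pattern.compile(
            "^([^ ]*).+\\[(.+)\\] \"(\\w+) ([^ ]+) .*\" (\\w+) ([^ ]*)");

    // Hypothetical helper: returns "<propertyName>.<groupIndex>" -> matched text.
    // Group 0 is the whole match; groups 1..n are the capture groups.
    static Map<String, String> extract(String propertyName, Pattern pattern, String content) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = pattern.matcher(content);
        if (matcher.find()) {
            for (int i = 0; i <= matcher.groupCount(); i++) {
                attributes.put(propertyName + "." + i, matcher.group(i));
            }
        }
        // An empty map means the content did not match.
        return attributes;
    }

    public static void main(String[] args) {
        String line = "127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] "
                + "\"GET / HTTP/1.0\" 200 454 \"-\" \"ApacheBench/2.3\"";
        extract("regex.result", LOG_LINE, line)
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running main prints one line per group, so the IP address appears under regex.result.1, matching the assertions in the test.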

On Fri, Oct 2, 2015 at 5:52 PM, Joe Percivall <[email protected]>
wrote:

> Hey Sumo,
>
> To explain how the ExtractText processor works for this case I created a
> unit test based on your regex and example string:
>
>
>
>     final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());
>     testRunner.setProperty("regex.result", "^([^ ]*).+\\[(.+)\\] \"(\\w+) ([^ ]+) .*\" (\\w+) ([^ ]*)");
>
>     testRunner.enqueue("127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] \"GET / HTTP/1.0\" 200 454 \"-\" \"ApacheBench/2.3\"".getBytes("UTF-8"));
>     testRunner.run();
>
>     testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
>     final MockFlowFile out = testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
>     out.assertAttributeEquals("regex.result.0", "127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] \"GET / HTTP/1.0\" 200 454");
>     out.assertAttributeEquals("regex.result.1", "127.0.0.1");
>     out.assertAttributeEquals("regex.result.2", "07/Mar/2012:23:21:47 +0100");
>     out.assertAttributeEquals("regex.result.3", "GET");
>     out.assertAttributeEquals("regex.result.4", "/");
>     out.assertAttributeEquals("regex.result.5", "200");
>     out.assertAttributeEquals("regex.result.6", "454");
>
>
> You can see that each capture group in your regex is written to its own
> attribute on the flowfile. To get the IP address, reference the attribute
> "regex.result.1".
>
> Hope this helps,
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
>
>
>
>
> On Friday, October 2, 2015 8:22 PM, Sumanth Chinthagunta <
> [email protected]> wrote:
>
>
>
> I am building a flow:
> GetFile -> SplitText -> ExtractText -> LogAttribute
> The input is a standard Apache log file. My ultimate goal is to create JSON
> for each line of the log.
>
> I used to have code like the snippet below. Now I am trying to build a regex
> for ExtractText that extracts the IP, time, method, etc. as attributes, but
> I have not been able to get it working. Any help will be greatly
> appreciated.
>
>
> // 127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] "GET / HTTP/1.0" 200 454 "-" "ApacheBench/2.3"
>
> def regex = ~/^([^ ]*).+\[(.+)\] "(\w+) ([^ ]+) .*" (\w+) ([^ ]*)/
>
> def matcher = regex.matcher(line)
> if (matcher.find()) {
>     def data = [ip: matcher.group(1), time: matcher.group(2), method: matcher.group(3),
>                 path: matcher.group(4), result: matcher.group(5), size: matcher.group(6)]
> } else {
>     println "no match: " + line
> }
>
>
>
> https://github.com/xmlking/vertx-stalgia/blob/master/grails-app/conf/BootStrap.groovy
>
> Thanks
> Sumo
>



-- 
Thanks,
Sumanth
