Hi Joe,

Thanks, that really helped. It was not clear to me from the ExtractText documentation where to set the regex or how to get the matching results back. I have it working now.
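
For anyone who finds this thread later, here is a rough plain-Java sketch of how the capture groups end up under numbered attribute names. It does not use NiFi at all: the class CaptureGroupSketch and its extract method are names made up for this illustration, and the code only imitates the "<property name>.<group number>" naming that the unit test quoted below asserts on. I am assuming java.util.regex is a fair stand-in for what the processor does with this pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT NiFi code: it only imitates the
// "<property name>.<group number>" attribute naming seen in the unit test.
class CaptureGroupSketch {

    static Map<String, String> extract(String propertyName, String regex, String content) {
        Map<String, String> attributes = new LinkedHashMap<>();
        Matcher matcher = Pattern.compile(regex).matcher(content);
        if (matcher.find()) {
            // Group 0 is the whole match; groups 1..N are the parenthesized captures.
            for (int i = 0; i <= matcher.groupCount(); i++) {
                attributes.put(propertyName + "." + i, matcher.group(i));
            }
        }
        return attributes; // stays empty when the regex does not match
    }

    public static void main(String[] args) {
        String regex = "^([^ ]*).+\\[(.+)\\] \"(\\w+) ([^ ]+) .*\" (\\w+) ([^ ]*)";
        String line = "127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] "
                + "\"GET / HTTP/1.0\" 200 454 \"-\" \"ApacheBench/2.3\"";

        // Print every attribute the sketch would produce for this log line.
        extract("regex.result", regex, line)
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running main prints one "name = value" line per group, with regex.result.1 holding the IP address and regex.result.0 holding the whole matched text. A line that does not match simply yields an empty map, which roughly corresponds to the flowfile not being routed to the matched relationship.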

On Fri, Oct 2, 2015 at 5:52 PM, Joe Percivall <[email protected]> wrote:

> Hey Sumo,
>
> To explain how the ExtractText processor works for this case I created a
> unit test based on your regex and example string:
>
> final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());
> testRunner.setProperty("regex.result", "^([^ ]*).+\\[(.+)\\] \"(\\w+) ([^ ]+) .*\" (\\w+) ([^ ]*)");
>
> testRunner.enqueue("127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] \"GET / HTTP/1.0\" 200 454 \"-\" \"ApacheBench/2.3\"".getBytes("UTF-8"));
> testRunner.run();
>
> testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
> final MockFlowFile out = testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
> out.assertAttributeEquals("regex.result.0", "127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] \"GET / HTTP/1.0\" 200 454");
> out.assertAttributeEquals("regex.result.1", "127.0.0.1");
> out.assertAttributeEquals("regex.result.2", "07/Mar/2012:23:21:47 +0100");
> out.assertAttributeEquals("regex.result.3", "GET");
> out.assertAttributeEquals("regex.result.4", "/");
> out.assertAttributeEquals("regex.result.5", "200");
> out.assertAttributeEquals("regex.result.6", "454");
>
> You can see that each capture group in your regex is mapped to an
> attribute on the flowfile. To get the IP address, reference the attribute
> "regex.result.1".
>
> Hope this helps,
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
>
>
> On Friday, October 2, 2015 8:22 PM, Sumanth Chinthagunta <[email protected]> wrote:
>
> I am building a flow:
> GetFile -> SplitText -> ExtractText -> LogAttribute
> The input is a standard Apache log file. My ultimate goal is to create
> JSON for each line of the log.
>
> I used to have code like this; now I am trying to build a regex for
> ExtractText to extract ip, time, method, etc. as attributes.
> Somehow I am not able to get it working. Any help will be greatly
> appreciated.
>
> // 127.0.0.1 - - [07/Mar/2012:23:21:47 +0100] "GET / HTTP/1.0" 200 454 "-" "ApacheBench/2.3"
>
> def regex = ~/^([^ ]*).+\[(.+)\] "(\w+) ([^ ]+) .*" (\w+) ([^ ]*)/
>
> def matcher = regex.matcher(line)
> if (matcher.find()) {
>     def data = [ip: matcher.group(1), time: matcher.group(2), method: matcher.group(3),
>                 path: matcher.group(4), result: matcher.group(5), size: matcher.group(6)]
> } else {
>     println "no match: " + line
> }
>
> https://github.com/xmlking/vertx-stalgia/blob/master/grails-app/conf/BootStrap.groovy
>
> Thanks
> Sumo

--
Thanks,
Sumanth
