Hi,

I don’t think you can do what you want with the ExtractText processor. The relevant section of the code is

if (matcher.find())

at line 320 (v0.4.1) of ExtractText.java (I would have included more of this to put it in context, but got blocked by email filtering).

Because matcher.find() is called only once, inside an if, only the first match is used. To get each match of the repeated group, the call would have to be in a while (matcher.find()) loop, with each match returned by a matcher.group() call.

Unless someone else can suggest anything different, I would say you would have to write your own custom processor for this (or extend the ExtractText processor with another property for repeating groups, and have a different part of the code run if it is set, which uses while (matcher.find())).

HTH,
Conrad

From: John Burns
Date: Thursday, 25 February 2016 at 09:44
Subject: Re: ExtractText Processor

Hi,

Thank you for the reply. I am trying to solve something I thought would be fairly simple, but am not having much success.

Consider the string "my friend and I went for a long walk. It was raining and it was very cold". Testing it against the single Java regex (.{9}and.{9})+ results in two matches: "y friend and I went f" and "raining and it was v".

In NiFi I wish to do something similar, i.e. capture all the matching strings for a given regex (similar to grep). When I run the above regex in NiFi I see only the first match but not the second. Could you advise how I can access all matches for the regex?

The use case here is to monitor websites for a specific word and extract (say) 10 characters either side of the matching word - for all matches on the site.

Thanks again
John

On Mon, Feb 22, 2016 at 7:05 AM, Conrad Crampton wrote:

Hi John,

If you use a property for your regexp, called matches for example, that has many capture groups in it, e.g.
matches (?:^(.+) (\d+)$)

If this matches the incoming flow file, then after processing you will end up with 3 attributes:

matches
matches.1
matches.2

with matches and matches.1 holding the same value (that of the first capture group). If you set ‘Include Capture Group 0’ to true, you get an additional attribute, matches.0, that is the whole match (as with group 0 in Java regular expressions).

HTH,
Conrad

From: John Burns
Date: Sunday, 21 February 2016 at 20:04
Subject: ExtractText Processor

Hi,

I'm using the ExtractText processor to monitor a website for specific content terms and log matches to a database. However, according to the documentation on ExtractText:

"... If the Regular Expression matches more than once, only the first match will be used"

Do I understand this correctly as meaning that only the first regex match of a given term will be captured (as opposed to how grep works, for example)? I want to capture all occurrences of the match, not just the first.

Any help would be appreciated.

Many thanks
John
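For reference, the difference between if (matcher.find()) and while (matcher.find()) described at the top of this thread can be sketched in plain java.util.regex. This is only an illustration using the example string and regex from the thread; it is not NiFi code, and the class and method names (AllMatchesDemo, firstMatch, allMatches) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration; not part of NiFi or ExtractText.
final class AllMatchesDemo {

    // Same shape as "if (matcher.find())": returns at most the first match.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            return matcher.group();
        }
        return null;
    }

    // Same shape as "while (matcher.find())": collects every match, in order.
    static List<String> allMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "my friend and I went for a long walk. "
                + "It was raining and it was very cold";
        String regex = "(.{9}and.{9})+";

        System.out.println("first: [" + firstMatch(regex, text) + "]");
        for (String m : allMatches(regex, text)) {
            System.out.println("match: [" + m + "]");
        }
    }
}
```

Run as-is, this prints the first match once, then both matches: [y friend and I went f] and [ raining and it was v]. The second match starts with a space, because .{9} takes exactly nine characters before "and". A custom processor, or an extension of ExtractText, would presumably need a loop of the second kind, plus some naming scheme for the extra attributes, which is left open here.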
