Thanks, Sven. Could I ask you to open a Jira ticket [1] requesting a boolean property on the ExtractText processor that enables global matching, i.e. capturing every match instead of only the first?
[1] https://issues.apache.org/jira/browse/NIFI

Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Jun 1, 2016, at 3:23 AM, Sven Davison <[email protected]> wrote:
>
> Thanks. I did some more reading in the documentation, and NiFi's documentation
> says it only returns the first one. HOWEVER... the JSON object returned had
> an element of tags already!
>
> $.entities.hashtags.*.text or... something. I got it working late last night!
>
> -Sven Davison
> (sent from my iPhone)
>
> On May 31, 2016, at 10:47 PM, Andy LoPresto <[email protected]> wrote:
>
>> Hi Sven,
>>
>> Are you using an ExtractText processor [1] here? If so, you can extract
>> multiple capture groups which will be stored in flowfile attributes such as
>> “regexattr.1”, “regexattr.2”, etc. when assigned to the regular expression
>> name “regexattr”.
>>
>> Try the regular expression I’ve provided here [2] (explanation available on
>> the site). This captures a literal ‘#’, then any “word” character one or more
>> times until a word boundary, and does this “globally”, i.e. it does not stop
>> searching after the first result. I didn’t check exhaustively whether hashtags
>> can contain special characters like ‘-’, but that should be
>> well-documented by Twitter.
>>
>> /(#[\w]+\b)/g
>>
>> [1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExtractText/index.html
>> [2] https://regex101.com/r/gV3mO5/1
>>
>>> On May 31, 2016, at 3:32 PM, Sven Davison <[email protected]> wrote:
>>>
>>> http://prntscr.com/basrzy
>>>
>>> The above is a screenshot showing a hashtags var only containing the first
>>> instance of a hashtag. I want to get a list of ALL hashtags from
>>> twitter.text, not just the first one. I'm fairly sure my RegEx is wrong...
>>> here's what I have.
>>>
>>> (#{1}[a-zA-Z0-9_]*)
>>>
>>> I'm using https://regex101.com/ to simulate traffic and tests, but
>>> I can't get it to recognize more than the first instance of the regex.
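To illustrate the difference between stopping at the first match and matching "globally", here is a small sketch in Python. It is an illustration under assumptions, not NiFi's behavior verbatim: ExtractText evaluates patterns with Java's `java.util.regex`, where there is no `/g` flag (repeated `Matcher.find()` calls play that role), and Python 3's `\w` also matches non-ASCII letters by default, whereas Java's `\w` does not unless `UNICODE_CHARACTER_CLASS` is set. The function names and the sample tweet text are hypothetical.

```python
import re

# Andy's pattern without the regex101-style /.../g delimiters.
# Python has no "g" flag; global matching means calling findall()/finditer()
# instead of search(), which stops at the first match.
HASHTAG = re.compile(r"(#[\w]+\b)")


def first_hashtag(text):
    """Return only the first match (or None), like a single first-match capture."""
    m = HASHTAG.search(text)
    return m.group(1) if m else None


def all_hashtags(text):
    """Return every non-overlapping match, i.e. 'global' matching."""
    return HASHTAG.findall(text)


# Hypothetical tweet text, for illustration only.
tweet = "Loving #NiFi and #dataflow at #ApacheCon_2016!"
print(first_hashtag(tweet))  # #NiFi
print(all_hashtags(tweet))   # ['#NiFi', '#dataflow', '#ApacheCon_2016']
```

Applied the same way with `findall`, Sven's original pattern `(#{1}[a-zA-Z0-9_]*)` also returns all three tags, which suggests the limitation was first-match evaluation rather than the pattern itself; its `*` quantifier does, however, let it match a lone `#`.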
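Sven's eventual fix read the hashtags from the tweet JSON itself rather than re-parsing the text with a regex. The sketch below shows, in plain Python, roughly what his JsonPath expression `$.entities.hashtags.*.text` selects. Assumptions: the trimmed payload is hypothetical (real Twitter API v1.1 tweets carry many more fields), the helper name is made up for illustration, and inside NiFi such an expression would typically be evaluated by a processor such as EvaluateJsonPath, which the thread does not name explicitly.

```python
import json

# Hypothetical, trimmed tweet payload shaped like a Twitter API v1.1 tweet.
RAW_TWEET = """
{
  "text": "Loving #NiFi and #dataflow",
  "entities": {
    "hashtags": [
      {"text": "NiFi", "indices": [7, 12]},
      {"text": "dataflow", "indices": [17, 26]}
    ]
  }
}
"""


def hashtags_from_entities(tweet):
    """Collect every entities.hashtags[*].text value, returning [] if absent."""
    entities = tweet.get("entities", {})
    return [tag["text"] for tag in entities.get("hashtags", [])]


tweet = json.loads(RAW_TWEET)
print(hashtags_from_entities(tweet))  # ['NiFi', 'dataflow']
```

Note that the entity `text` values omit the leading `#`, and because Twitter has already tokenized the hashtags, this approach sidesteps the question of which characters a hashtag may contain.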
