Hi Sven,

Are you using an ExtractText processor [1] here? If so, you can extract multiple capture groups, which will be stored in flowfile attributes such as “regexattr.1”, “regexattr.2”, etc. when the regular expression is assigned to a property named “regexattr”.
Try the regular expression I’ve provided here [2] (explanation available on the site). This captures a literal ‘#’ followed by one or more “word” characters up to a word boundary, and does this “globally”, i.e. it does not stop searching after the first result. I didn’t check exhaustively whether hashtags can contain special characters like ‘-’, etc., but that should be well-documented by Twitter.

/(#[\w]+\b)/g

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExtractText/index.html
[2] https://regex101.com/r/gV3mO5/1

Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On May 31, 2016, at 3:32 PM, Sven Davison <[email protected]> wrote:
>
> http://prntscr.com/basrzy
>
> the above is a screenshot showing a hashtags var only containing the first
> instance of a hashtag. i want to get a list of ALL hashtags from twitter.text
> not just the first one. i'm fairly sure my RegEx is wrong... here's what i
> have.
>
> (#{1}[a-zA-Z0-9_]*)
>
> i'm using https://regex101.com/ to simulate traffic
> and tests.. but i can't get it to recognize more than the first instance of
> the regex.
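
P.S. To make the "first match" versus "global" distinction concrete outside NiFi, here is a minimal Python sketch. It is only an illustration: the sample text is made up, the variable names are hypothetical, and Python's re module is not the engine ExtractText uses, so treat it as a way to reason about the two patterns rather than a description of the processor's behavior. The /g in the expression above is regex101 notation; in Python the rough equivalent is calling findall instead of search. The re.ASCII flag is my assumption, used to keep \w limited to [a-zA-Z0-9_].

```python
import re

# Hypothetical sample text, not taken from the thread
text = "Loving #NiFi and #BigData at #ApacheCon_2016!"

# Pattern from the reply: '#', one or more word characters, then a word boundary
suggested = re.compile(r"(#[\w]+\b)", re.ASCII)

# Pattern from the original question: '#', then zero or more word characters
original = re.compile(r"(#{1}[a-zA-Z0-9_]*)")

# search() stops at the first match, like a non-global regex
first_only = suggested.search(text).group(1)

# findall() keeps scanning the whole string, like the /g flag
all_tags = suggested.findall(text)

# The '*' quantifier lets the original pattern match a lone '#'
lone_hash_original = original.findall("a # b")
lone_hash_suggested = suggested.findall("a # b")

print(first_only)
print(all_tags)
print(lone_hash_original, lone_hash_suggested)
```

On this sample both patterns find the same hashtags when applied globally, which suggests the original expression was not far off; the main difference is that ‘*’ also accepts a bare ‘#’ while ‘+’ requires at least one character after it.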
