Re: RegEx not catching all tags

Andy LoPresto Tue, 31 May 2016 19:47:51 -0700

Hi Sven,

Are you using an ExtractText processor [1] here? If so, you can extract 
multiple capture groups. When the regular expression is assigned to a property 
named “regexattr”, the groups are stored in flowfile attributes such as 
“regexattr.1”, “regexattr.2”, etc.

Try the regular expression I’ve provided here [2] (explanation available on the 
site). This captures a literal ‘#’ followed by one or more “word” characters 
(letters, digits, and underscore) up to a word boundary, and does this 
“globally”, i.e. it does not stop searching after the first result. I didn’t 
check exhaustively whether hashtags can contain special characters like ‘-’, 
but that should be well documented by Twitter.

/(#[\w]+\b)/g
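
As a minimal sketch of what “globally” means for this pattern, here is the same 
expression run outside NiFi in Python. Python is only chosen for illustration, 
the names HASHTAG and extract_hashtags are hypothetical, and this does not show 
how ExtractText itself evaluates the pattern.

```python
import re

# Same pattern as above, without the /.../g delimiters that regex101 displays.
# The "global" behavior comes from findall(), which keeps scanning after the
# first match instead of stopping there.
HASHTAG = re.compile(r"(#[\w]+\b)")


def extract_hashtags(text):
    """Return every hashtag found in text, in order of appearance."""
    return HASHTAG.findall(text)


if __name__ == "__main__":
    sample = "Loving #NiFi and #ApacheNiFi for #data_flow work"
    print(extract_hashtags(sample))
```

A single-match call such as HASHTAG.search(sample) would return only the first 
hashtag, which is the symptom described in the original question.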

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExtractText/index.html
[2] https://regex101.com/r/gV3mO5/1

Andy LoPresto
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On May 31, 2016, at 3:32 PM, Sven Davison <[email protected]> wrote:
> 
> 
> http://prntscr.com/basrzy
> 
> the above is a screenshot showing a hashtags var only containing the first 
> instance of a hashtag. i want to get a list of ALL hashtags from twitter.text 
> not just the first one. i'm fairly sure my RegEx is wrong... here's what i 
> have.
> 
> (#{1}[a-zA-Z0-9_]*)
> 
> i'm using https://regex101.com/ to simulate traffic 
> and tests.. but i can't get it to recognize more than the first instance of 
> the regex.
