On 08-08-2016 19:25, Bernardo Ezequiel Contreras wrote:
> Hi,
>
> have you tried (World>>Help>>Help Browser>>Regular Expressions
> Framework>>Usage)?
>
> SUBEXPRESSION MATCHES
>
> After a successful match attempt, you can query the specifics of which
> part of the original string has matched which part of the whole
> expression.
>
> (...)

Thanks, but the thing is: my need is a little more complex than finding
sequences. I'm looking for expressions in natural-language text, and they
must be extracted without ambiguity. So I have separate cases for
occurrences at the beginning of the line (e.g. '^(#\w+)([\s.,;\:!?]*)'),
in the middle of the line (e.g. '([\s.,;\:!?]+)(#\w+)([\s.,;\:!?]+)'),
or at the end (which may be simplified to the second case...).

So, if I find several hashtags in a text like:

'A política no Brasil está complicada #FAIL porque a corrupção impera #CRIME. De qualquer forma os #PETRALHAS, que tudo justificam, levam o país ao #CAOS'

I want two things:

1st, and obvious, the hashtags:

    #( '#FAIL' '#CRIME' '#PETRALHAS' '#CAOS')

2nd, the line minus the hashtags:

'A política no Brasil está complicada porque a corrupção impera. De qualquer forma os, que tudo justificam, levam o país ao'

When I use regexps to process the line, for instance:

    bfr := line copyWithRegex: '#\w+' matchesTranslatedUsing: [ :e | '' ].

I run into trouble, because it also matches things like #ANOTAÇÃO#, which
is not a hashtag. And I'm trying to avoid doing the Lex/Yacc thing
here :D

Best regards,

CdAB
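
P.S. To make the two requirements concrete, here is a rough sketch of the behaviour I am after, written without any lookahead in the regex. It is only an illustration under some assumptions of mine: tokens are separated by whitespace, the delimiters are the same '.,;:!?' set used in the patterns above, and the variable names (punctuation, tags, out, stripped) are made up for the example. It also collapses runs of whitespace into single spaces, which may or may not be acceptable.

```smalltalk
"Sketch only: token-based hashtag extraction. Assumes whitespace-separated
 tokens and the delimiter set '.,;:!?'; all variable names are illustrative."
| line punctuation tags out stripped |
line := 'A política no Brasil está complicada #FAIL porque a corrupção impera #CRIME. De qualquer forma os #PETRALHAS, que tudo justificam, levam o país ao #CAOS'.
punctuation := '.,;:!?'.
tags := OrderedCollection new.
out := WriteStream on: String new.
line substrings do: [ :token |
    | stem tail |
    "Peel trailing punctuation off the token; what is left is the candidate tag."
    stem := token.
    [ stem notEmpty and: [ punctuation includes: stem last ] ]
        whileTrue: [ stem := stem allButLast ].
    tail := token allButFirst: stem size.
    "matchesRegex: has to match the whole stem, so '#ANOTAÇÃO#' is rejected
     because of its trailing $#."
    (stem matchesRegex: '#\w+')
        ifTrue: [
            "A hashtag: remember it, keep only its trailing punctuation."
            tags add: stem.
            out nextPutAll: tail ]
        ifFalse: [
            "Ordinary token: copy it, separated by a single space."
            out position = 0 ifFalse: [ out space ].
            out nextPutAll: token ] ].
stripped := out contents.
```

For the sample line, the intent is that tags ends up holding the four hashtags and stripped holds the line as I wrote it above, with '#CRIME.' leaving its '.' attached to 'impera' and '#PETRALHAS,' leaving its ',' attached to 'os'. A token such as '#ANOTAÇÃO#' would stay in the text untouched.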