On 08-08-2016 19:25, Bernardo Ezequiel Contreras wrote:
> Hi,
>
>   have you tried (World>>Help>>Help Browser>>Regular Expressions
> Framework>>Usage)?
>
> SUBEXPRESSION MATCHES
>
> After a successful match attempt, you can query the specifics of which
> part of the original string has matched which part of the whole
> expression.
>
> (...)
Thanks, but the thing is: my need is a little more complex than finding
sequences. I'm looking for expressions in natural-language text. The
expressions must be extracted unambiguously, so I have separate cases
for occurrences at the beginning of a line (e.g. '^(#\w+)([\s.,;\:!?]*)'),
in the middle of a line (e.g. '([\s.,;\:!?]+)(#\w+)([\s.,;\:!?]+)'), and
at the end (which may be reduced to the second case...). So, if I find
several hashtags in a text like:

'A política no Brasil está complicada #FAIL porque a corrupção impera
#CRIME. De qualquer forma os #PETRALHAS, que tudo justificam, levam o
país ao #CAOS'

I want two things:

1st and obvious: #( '#FAIL' '#CRIME' '#PETRALHAS' '#CAOS')
2nd: the line minus hashtags: 'A política no Brasil está complicada
porque a corrupção impera. De qualquer forma os, que tudo justificam,
levam o país ao'
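One way to get both results without writing a parser is to classify whitespace-separated tokens instead of scanning the whole line with a regex. Below is a minimal Pharo Playground sketch; the variable names are made up for illustration. It assumes a hashtag is preceded by whitespace or the start of the line (unlike the regexes above, which also allow punctuation there) and is followed only by the punctuation characters in `punctuation`. A hashtag's trailing punctuation is glued onto the previous word, and the surviving words are rejoined with single spaces, so line breaks in the input are normalized.

```smalltalk
"Pharo Playground sketch. Assumptions: a hashtag is preceded by whitespace
 (or the start of the text) and only the characters in punctuation may
 trail it."
| text punctuation hashtags kept result |
text := 'A política no Brasil está complicada #FAIL porque a corrupção impera
#CRIME. De qualquer forma os #PETRALHAS, que tudo justificam, levam o
país ao #CAOS'.
punctuation := '.,;:!?'.
hashtags := OrderedCollection new.
kept := OrderedCollection new.
"substrings splits on whitespace, including line breaks."
text substrings do: [ :token |
	| last core tail |
	"Split the token into its core and any trailing punctuation."
	last := token size.
	[ last > 0 and: [ punctuation includes: (token at: last) ] ]
		whileTrue: [ last := last - 1 ].
	core := token copyFrom: 1 to: last.
	tail := token copyFrom: last + 1 to: token size.
	"A hashtag is # followed only by letters, digits or underscores."
	(core size > 1 and: [ core first = $# and: [
		core allButFirst allSatisfy: [ :c | c isAlphaNumeric or: [ c = $_ ] ] ] ])
		ifTrue: [
			hashtags add: core.
			"Glue trailing punctuation onto the previous word: 'impera #CRIME.' -> 'impera.'"
			tail isEmpty ifFalse: [
				kept isEmpty
					ifTrue: [ kept add: tail ]
					ifFalse: [ kept addLast: (kept removeLast , tail) ] ] ]
		ifFalse: [ kept add: token ] ].
"Rejoin the remaining words with single spaces."
result := String streamContents: [ :stream |
	kept do: [ :each | stream nextPutAll: each ] separatedBy: [ stream space ] ].
```

With this input, `hashtags` holds the four hashtags in order and `result` is the line without them. A token such as #ANOTAÇÃO# is left in the text, because its trailing # is neither a word character nor one of the allowed punctuation characters.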

When I use regexps to process the line, for instance:

bfr := line copyWithRegex: '#\w+' matchesTranslatedUsing: [ :e | '' ].

I run into trouble because it also matches things like #ANOTAÇÃO#,
which is not a hashtag.
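If you want to stay with regexes, one option (a sketch, assuming Pharo's `String>>matchesRegex:`, which answers true only when the whole receiver matches) is to split the line into tokens first and test each token on its own, instead of searching the whole line. The token list below is hypothetical sample input.

```smalltalk
"Pharo Playground sketch: whole-token matching instead of searching the line."
| tokens verdicts |
tokens := #('#FAIL' '#CRIME.' '#ANOTAÇÃO#' 'impera').
"Pattern: a hashtag core, optionally followed by trailing punctuation.
 matchesRegex: must consume the entire token, so the trailing # in
 '#ANOTAÇÃO#' makes that token fail."
verdicts := tokens collect: [ :each | each matchesRegex: '#\w+[.,;:!?]*' ].
verdicts
```

Here `verdicts` should be `#(true true false false)`: only the first two tokens are hashtags.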

And I'm trying to avoid doing the Lex/Yacc thing here :D

Best regards,

CdAB

-- 

This email may be signed using PGP key *ID: 0x4134A417*

