Hi Merce,
Ah, yes, I see what you mean. The problem with using \s in the stoplist is
that the tokenization prior to checking for stop words does not include a
trailing \s, and so /\s[Ii]n\s/ is never matched.
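To make that concrete, here is a minimal sketch in Python rather than Perl (the regex semantics for \s and \b are the same for this purpose). The token list is made up for illustration, and the code assumes each stop pattern is tested against one token at a time; it is not the package's actual code.

```python
import re

# Hypothetical tokens as they might look after tokenization:
# each token stands alone, with no surrounding whitespace.
tokens = ["in", "In", "in-band", "signalling"]

# Stop pattern that requires whitespace on both sides of the word.
stop_ws = re.compile(r"\s[Ii]n\s")

# Stop pattern that uses word boundaries instead.
stop_wb = re.compile(r"\b[Ii]n\b")

# A bare token contains no whitespace, so this never matches.
ws_hits = [t for t in tokens if stop_ws.search(t)]

# "-" is not a word character, so \b sees a boundary between
# "in" and "-band" and the hyphenated token matches as well.
wb_hits = [t for t in tokens if stop_wb.search(t)]

print(ws_hits)
print(wb_hits)
```

The first list comes out empty, while the second picks up "in-band" along with "in" and "In", which is the hyphen problem in a nutshell.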
The trick here is to redefine the \b word boundary so it doesn't include -.
This
Ted,
Thanks, I've added this regular expression to my tokens file and it works well.
One more comment about that:
In my corpus I have some interesting bigrams such as
in-band signalling
in-call rearrangement
in-slot signalling
If I filter out "in" as a stopword, I can't get these kinds of bigrams from my