Re: [ngram] Re: ngrams with hyphen

2011-04-23 Thread Ted Pedersen
Hi Merce, Ah, yes, I see what you mean. The problem with using \s in the stoplist is that the tokenization done prior to checking for stop words does not include a trailing \s, and so /\s[Ii]n\s/ is never matched. The trick here is to redefine the \b character class so it doesn't include -. This
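A minimal sketch of the behaviour described above, written with Python's re module rather than the Perl regexes the package itself uses, so treat it as an illustration only. The pattern names, the stopped() helper, and the sample tokens are hypothetical, and the lookaround pattern is just one assumed way to get a "\b that ignores -"; it is not necessarily the fix proposed in the rest of this message.

```python
import re

# Hypothetical stop patterns, for illustration only.
WITH_SPACES = re.compile(r"\s[Ii]n\s")             # expects whitespace on both sides
WITH_WORD_BOUNDARY = re.compile(r"\b[Ii]n\b")      # plain \b treats "-" as a boundary
HYPHEN_AWARE = re.compile(r"(?<![\w-])[Ii]n(?![\w-])")  # assumed "boundary" that excludes "-"


def stopped(pattern, tokens):
    """Return the tokens that the given stop pattern would match."""
    return [t for t in tokens if pattern.search(t)]


# Tokens as they might look after tokenization: no surrounding whitespace.
tokens = ["in", "In", "in-band", "plug-in", "signalling"]

for name, pattern in [
    ("with \\s", WITH_SPACES),
    ("with \\b", WITH_WORD_BOUNDARY),
    ("hyphen-aware", HYPHEN_AWARE),
]:
    print(name, stopped(pattern, tokens))
```

In this sketch the \s pattern matches nothing, because no token carries whitespace; the plain \b pattern also catches "in-band" and "plug-in"; the hyphen-aware pattern matches only the standalone "in" and "In".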

[ngram] Re: ngrams with hyphen

2011-04-22 Thread mercevg
Ted, Thanks, I've added this regular expression to my tokens file and it works well. One more comment about that: in my corpus I have some interesting bigrams, such as in-band signalling, in-call rearrangement, and in-slot signalling. If I filter "in" as a stopword, I can't get these kinds of bigrams from my