[ngram] Re: Problem with a token

2008-02-14 Thread mercevg
Patrick, Ted, I added use locale; at line 83, but this doesn't improve my results: words containing the character l·l (like intel·ligència) are not included in the results list. It is important to note, though, that I add as tokens all the accents, diaereses and apostrophes that are used in the Catalan corpus

[ngram] Re: plans for version 1.05

2008-02-14 Thread mercevg
Ted, I have two suggestions to improve the new version. 1. I have problems extracting bigrams using Fisher's exact test - left sided and Fisher's exact test - right sided. Could you fix these two measures? The error message: Can't locate Text/NSP/Measures/2D/left.pm in @INC (@INC contains:

Re: [ngram] plans for version 1.05

2008-02-14 Thread Richard Jelinek
On Thu, Feb 14, 2008 at 03:51:40PM -, Ted Pedersen wrote: 1) Incorporate use locale throughout package (suggested by Patrick Drouin long ago) This will make for more convenient handling of non-English text. Wrong idea, wrong solution. To make handling of non-Latin1 text more convenient,

Re: [ngram] Re: plans for version 1.05

2008-02-14 Thread Björn Wilmsmann
Richard Jelinek wrote: This advantage is illusional - unfortunately. Illusional in the sense that some of the problems it seems to solve rely on a well set up environment on the OS side. Which isn't always the case. Moreover, Well, an improperly set up system locale is bound to give you all kinds