Re: [ngram] Spearman's rank correlation coefficient

2004-05-07 Thread ted pedersen
ersions and > not difficult to implement. > > Thanks in advance. > > Begoña > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Spearman's rank correlation coefficient

2004-05-08 Thread ted pedersen
lue obtained from > the standard normal table. > > > > Back to my question, i looked at the script rank.pl and i think it > returns r_s in the formula above, say, the normal Spearman's rank > correlation coefficient. But, when n > 100 rank.pl should return the > z value mentio

Re: [ngram] incorrect count

2004-05-10 Thread ted pedersen
e discrepancy resolved so that I > can go ahead and describe the circumstances of the incorrect count > in detail. Hopefully the above clarifies that. Go ahead and describe! It would help to know all the command line settings you are using, as well as having a reduced version of your input

[ngram] rank.pl and how to interpret when n is large...

2004-05-17 Thread ted pedersen
z = r_s * sqrt(n - 1), with r_s being Spearman's rho and n the number of pairs (z is identical to M in Agresti, 2002, p. 87). Then, the z value can be compared with the critical value obtained from the standard normal table. ------
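As a rough illustration of the large-sample test described above, here is a minimal Python sketch that computes z = r_s * sqrt(n - 1) and compares it with a standard normal critical value. The function names and the two-sided 0.05 level are assumptions made for the example; this is not the code in rank.pl.

```python
import math
from statistics import NormalDist


def spearman_z(r_s: float, n: int) -> float:
    """Large-sample statistic z = r_s * sqrt(n - 1) for Spearman's rho."""
    if n < 2:
        raise ValueError("need at least two pairs")
    return r_s * math.sqrt(n - 1)


def is_significant(r_s: float, n: int, alpha: float = 0.05) -> bool:
    """Two-sided comparison of |z| with the standard normal critical value."""
    # Critical value from the standard normal distribution (about 1.96 at 0.05).
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(spearman_z(r_s, n)) > critical


if __name__ == "__main__":
    # Hypothetical input: rho = 0.25 over n = 101 pairs.
    print(spearman_z(0.25, 101))
    print(is_significant(0.25, 101))
```

A one-sided test would use 1 - alpha instead of 1 - alpha / 2 when looking up the critical value.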

[ngram] Ngram Statistics Package v0.71 released!

2004-06-18 Thread ted pedersen
ersion of NSP as provided by SourceForge, please note that the current version is now found in NSP69. Please let us know if you have any questions or comments! Enjoy, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] English GigaWord bigram counts via Ngram Statistics Package

2004-06-20 Thread ted pedersen
rd sequences that occur next to each other without regard to any ordering. -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] 4-gram and 5-gram statistical analysis

2004-07-06 Thread ted pedersen
reply. Suffice to say I think this is a very interesting issue. Other thoughts and comments on the above are most welcome (and if I've flubbed up anything *please* let me know!). Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] installation question / PERL5LIB?

2004-09-21 Thread ted pedersen
rhaps, but I'm really wanting to try and do things in the most "standard" way possible in terms of installation, path searching, etc. I'll be looking at this, and if anyone has any insights into this issue I'd be interested to hear them. Cordially, Ted -- Ted Pederse

[ngram] more on PERL5LIB

2004-09-21 Thread ted pedersen
s is a slightly different way of using modules, in that we specify the name of the module from the command line... statistic.pl ll.pm myoutput myinput Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] typo in phi.pm documentation!

2004-11-04 Thread ted pedersen
n22 | n2p ___ np1 np2 npp So the phi coefficient takes the difference between the products of the diagonals, squares that, and then divides by the product of all the marginals. Thanks, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse
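The verbal description above can be sketched in Python as follows. This is only an illustration of that description (difference of the diagonal products, squared, over the product of the four marginals); the function name is hypothetical, and it is not taken from phi.pm.

```python
def phi_squared(n11: int, n12: int, n21: int, n22: int) -> float:
    """(n11*n22 - n12*n21)^2 divided by the product of the four marginals."""
    n1p = n11 + n12  # first row total
    n2p = n21 + n22  # second row total
    np1 = n11 + n21  # first column total
    np2 = n12 + n22  # second column total
    denominator = n1p * n2p * np1 * np2
    if denominator == 0:
        raise ValueError("a marginal total is zero, so the value is undefined")
    return (n11 * n22 - n12 * n21) ** 2 / denominator


if __name__ == "__main__":
    # Hypothetical tables: perfect association, then no association.
    print(phi_squared(10, 0, 0, 10))
    print(phi_squared(5, 5, 5, 5))
```

The zero-marginal check is a design choice for the sketch; a caller could equally decide to report 0 in that case.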

Re: [ngram] OT: newbie question regarding N-gram models

2004-10-17 Thread ted pedersen
"N-gram > model". > > Is it possible that someone on this list might take mercy on me and > either mention the kinds of applications that can be made from N-gram > models or point me to some literature that a literate non-linguist > might understand? If so, I would be gre

[ngram] a user tip on non-English alphabets

2004-11-11 Thread ted pedersen
A user writes: > And apropos to your "note on working with non-English alphabets": Some > students had problems with Norwegian characters, even with the latest > version. The solution was to set their "LANG" environment properly (on > many systems it defaults to English). Could maybe be included

[ngram] Re: [Corpora-List] fisher's exact test

2004-11-11 Thread ted pedersen
sts (for now) require that n22 > n11.

 1   2 |  3
 3  10 | 13
-----------
 4  12   16

Cordially, Ted PS NSP turns 4 years old on November 30. Big party in Duluth, you are all invited. :) -- Ted Pedersen http://www.d.umn.edu/~tpederse
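For readers who want to check numbers by hand, here is a small Python sketch of the hypergeometric probability behind Fisher's exact test, reading the table in the message above as n11 = 1, n1p = 3, np1 = 4, npp = 16. The function names are hypothetical, and the left-sided sum is just one possible convention; this is not the NSP implementation.

```python
from math import comb


def table_probability(n11: int, n1p: int, np1: int, npp: int) -> float:
    """Hypergeometric probability of one 2x2 table with fixed marginals."""
    n2p = npp - n1p  # second row total
    n21 = np1 - n11  # cell below n11
    return comb(n1p, n11) * comb(n2p, n21) / comb(npp, np1)


def left_sided_p(n11: int, n1p: int, np1: int, npp: int) -> float:
    """Sum of table probabilities for all n11 values up to the observed one."""
    lowest = max(0, n1p + np1 - npp)  # smallest n11 the marginals allow
    return sum(table_probability(k, n1p, np1, npp) for k in range(lowest, n11 + 1))


if __name__ == "__main__":
    print(table_probability(1, 3, 4, 16))
    print(left_sided_p(1, 3, 4, 16))
```

A right-sided version would sum from the observed n11 up to min(n1p, np1) instead.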

[ngram] Re: [Corpora-List] fisher's exact test (fwd)

2004-11-12 Thread ted pedersen
Forwarded on behalf of Leonoor... -- Forwarded message -- Date: Fri, 12 Nov 2004 16:37:15 +0100 (CET) From: Beek L.J.van der <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Cc: ted pedersen <[EMAIL PROTECTED]>, [EMAIL PROTECTED] Subject: Re: [Corpora-List] fisher's e

Re: [ngram] dialogue bigrams

2004-11-12 Thread ted pedersen
is rising in priority even as we speak, so hopefully we'll have a fix for it shortly. (A fix has been in fact suggested by a user, and that is posted to the list, but we'll go back and double check things just to be sure). OK, sorry for all this. I'm sure some of this is already

Re: [ngram] dialogue bigrams

2004-11-12 Thread ted pedersen
nt email (from yesterday even!) about this situation, or you can go back into the archives and find: http://groups.yahoo.com/group/ngram/message/15 http://groups.yahoo.com/group/ngram/message/17 This problem is rising in priority even as we speak, so hopefully we'll have a fix f

[ngram] Re: dialogue bigrams

2004-11-30 Thread Ted Pedersen
)? > > And then for x2 and 11, are the higher-ranked bigrams more > strongly dependent (although we'll have to look up their score in the > table to know if their dependence is statistically significant)? > > > Thanks very much, > Kate > > > On Fri, 12 N

[ngram] new year's resolutions/ngram statistics package

2005-01-03 Thread ted pedersen
ing available by Jan 21. Also, if you happen to have code that uses NSP without a related publication, and that code is distributed, we want to know about you too. We'll have a separate section for software systems... Happy New Year! Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

RE: [ngram] new year's resolutions/ngram statistics package

2005-01-12 Thread ted pedersen
Hi Marie, You have confused me (Ted Pedersen) with Jan Pedersen, now of Yahoo. Name ambiguity abounds in the internet, so don't feel badly. In fact, even search engines have a hard time with things like this. :) Enjoy the meeting, I'm sure it will be good, although I don't kno

Re: [ngram] bash: ALL-TESTS.sh: command not found

2005-02-20 Thread ted pedersen
operations you should carry out. I hope this helps. Let us know if it doesn't. I have limited experience with cygwin, but it should be equivalent to Unix and Linux in regards to paths and setting them and so forth (I think). Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpede

Re: [ngram] Re: bash: ALL-TESTS.sh: command not found

2005-02-21 Thread ted pedersen
evolve more nicely than cygwin. This is not to dismiss cygwin, like I say it's a great idea, but I think life will get easier if you are able to run on a Linux machine. Good luck, and let us know what happens! Thanks, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] other alphabets -> use locale;

2005-03-17 Thread ted pedersen
feedback and tips like these are particularly useful and appreciated. Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Extensions to NSP for log-likelihood ratio

2005-05-01 Thread ted pedersen
u have any questions about this. Sorry for not making this available sooner, Bridget did a nice job on this and it just fell through the cracks! Enjoy, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] overflow in fisher's test

2005-07-20 Thread ted pedersen
roblems, etc. in order to make sure we have caught everything. And of course, please feel free to let us know of any other questions or concerns. Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Ngram Statistics Package version 0.73 released

2005-08-09 Thread ted pedersen
regarding the log-likelihood measure and Pearson's chi-squared test, and I also fixed a long-standing bug in our Phi coefficient documentation. Enjoy, and let us know if you have any additional questions or concerns! Ted and Mahesh -- Ted Pedersen http://www.d.umn.edu/~tpe

[ngram] input format

2005-08-16 Thread ted pedersen
A user is wondering about how to manually create input files for statistic.pl ... > I have read your readme file which came with the package. It's well > written and quite understandable even for a person ignorant in the field > of Ngrams. > But unfortunately, although I quickly understood the ge

[ngram] group meeting friday sept 16, nsp reorganization

2005-09-15 Thread ted pedersen
ent to the ngram/NSP mailing list, for example) please feel free to contribute any thoughts you have to the mailing list. We will send a summary of what we plan as soon as final decisions are made. Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] proposed re-design of Measures in Ngram Statistics Package

2005-09-20 Thread ted pedersen
The following is a description of our plan of attack for the first stage of the NSP redesign, that is, to organize the measures in an object-oriented, hierarchical fashion. The description below is written by Saiyam Kohli. Your comments and questions are of course most welcome, especially at this ti

[ngram] NSP bibliography now under construction

2005-09-27 Thread ted pedersen
o the bibliography! Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] suggestion for nsp from user

2005-10-04 Thread ted pedersen
An NSP user has the following idea: -- I just thought it would be nice to have an option in NSP (specifically in statistic.pl) to filter bigrams based on their p-values, like we currently do by rank and score. Very often I need to find "significant" bigrams, and it will be nice if I c

Re: [ngram] Newbie question

2005-10-09 Thread ted pedersen
I'd > like to call NSP's log-likelihood code to do the calculation for me. > How hints on this for me? > > Thanks, > P -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Re: [Corpora-List] Looking for English word lists

2005-10-31 Thread ted pedersen
ase let us know if you have any questions. Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Re: [cpan #15862] Incorrect packaging practices

2005-11-16 Thread ted pedersen
n, etc.) but we made some choices that weren't so good, and clearly very non-standard. Rest assured we want to resolve these asap! We are hoping that the 0.75 release will be ready by mid-December. Cordially, Ted and Saiyam -- Ted Pedersen http://www.d.umn.edu/~tpederse ---

[ngram] Re: [cpan #15861] Installation procedure for documentation is unsafe

2005-11-16 Thread ted pedersen
e'll certainly make sure 0.75 is also more standard with respect to Makefile.PL issues as well. Cordially, Ted and Saiyam -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] two tailed fisher's exact test

2005-11-23 Thread ted pedersen
any case, it strikes me as a minor modification to make this work, so we'll plan on including this in the next release. Questions or comments are of course welcome! Thanks, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] more on fisher's test, paper with details

2005-11-23 Thread ted pedersen
r's Problem). I found it helpful to re-read what I wrote years ago. :) Enjoy, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] question about huge-count.pl

2005-11-23 Thread ted pedersen
d above work with count.pl as well. It is fun to play with tokenization, and it's one of my favorite features about NSP. So, if you have any questions about any of the above, I'd be very happy to describe it in more detail. Meanwhile, have fun. :) Cordially, Ted -- Ted Pedersen http://www.

[ngram] NSPGate v0.02 released! (Gate Plug-In for NSP)

2005-12-15 Thread ted pedersen
d and Mahesh -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] "Can you do this with NSP/Ngram" type question: Name Matching?

2005-12-18 Thread ted pedersen
es for each pair of names. As each > ratio is computed, I'll test it for a threshold and if the pair > exceeds a threshold, I'll push it to an array. Repeat for the 2nd > name in the list, 3rd name in the list, etc. > > Thanks in advance for any wisdom you might have on this

[ngram] Ngram Statistics Package version 0.91 released

2006-04-21 Thread ted pedersen
usion during installation. Here is a more detailed ChangeLog: http://search.cpan.org/src/TPEDERSE/Text-NSP-0.91/CHANGES Please give this new version a try, and let us know what you think! Enjoy, Ted and Saiyam -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Re: Another "Can you do this with NSP/Ngram"

2006-05-01 Thread ted pedersen
979 5 6 25 > > human<>beings<>11 38.0403 3 5 3 > > ... > > ... > > > > This file also contains the expressions but they are sorted in the > > order of their significance and the first number after each bigram is > > the score given by the statistical measure used, which in this case is > > the Log Likelihood(ll) me
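To make the layout of lines such as "human<>beings<>11 38.0403 3 5 3" concrete, here is a small Python sketch of a parser. It assumes the tokens are joined by "<>", and that the last "<>" is followed by the rank, the score, and then the frequency counts; the class and function names are hypothetical and not part of NSP.

```python
from typing import NamedTuple, Tuple


class ScoredNgram(NamedTuple):
    tokens: Tuple[str, ...]
    rank: int
    score: float
    counts: Tuple[int, ...]


def parse_scored_line(line: str) -> ScoredNgram:
    """Split one scored n-gram line into tokens, rank, score and counts."""
    # Everything before the last "<>" is a token; the tail holds the numbers.
    *tokens, tail = line.rstrip("\n").split("<>")
    fields = tail.split()
    return ScoredNgram(
        tokens=tuple(tokens),
        rank=int(fields[0]),
        score=float(fields[1]),
        counts=tuple(int(f) for f in fields[2:]),
    )


if __name__ == "__main__":
    print(parse_scored_line("human<>beings<>11 38.0403 3 5 3"))
```

Because the split is on "<>" rather than on a fixed number of fields, the same sketch would also accept trigram lines under the same assumed layout.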

Re: [ngram] working with large corpora

2006-05-28 Thread ted pedersen
falls in changing that? Is this > something that will come in a future release? > > thanks very much, > ilya > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Re: Another "Can you do this with NSP/Ngram"

2006-05-28 Thread ted pedersen
grams in which some > (i.e. 2) tokens must match some regular expressions, but the others > are also allowed to match some others? I'm trying to get expressions > like "drag and drop" or "press-and-hold", or "create a new \w{4,}" > > Thanks, again!

[ngram] Ngram Statistics Package version 0.95 released

2006-06-18 Thread ted pedersen
/TPEDERSE/Text-NSP-0.95/Docs/NSP-Class-diagram.pdf Please let us know if you have any questions or comments on this new release. Enjoy, Ted and Saiyam -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Ngram Statistics Package version 0.97 released

2006-06-21 Thread ted pedersen
better performance than the last few releases (0.91, 0.93, and 0.95). Please let us know if you have any comments, suggestions, or questions! Cordially, Ted and Saiyam -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] more details on performance issues that led to NSP 0.97 release

2006-06-21 Thread ted pedersen
0.0005 Getopt::Long::BEGIN 0.00 0.000 0.000 4 0. 0. Exporter::Heavy::heavy_export 0.00 - -0.000 1- - Getopt::Long::ConfigDefaults 0.00 - -0.000 1- - Getopt::Long::Configure -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] /Testing directory in NSP 0.97

2006-06-22 Thread ted pedersen
are being added to the /t directory, and are run when you submit : make test So you can do both "make test" and the running of the /Testing scripts to test your installation. I will fix this permission issue with our next release. Enjoy, Ted -- Ted Pedersen http://www

[ngram] Ngram Statistics Package Version 1.01 released

2006-06-24 Thread ted pedersen
e are caused by very minor differences in the scores reported on this system versus our normal testing environment (which includes a number of 64-bit machines, so this puzzles us a little). Please do let us know of any other questions or concerns you may have! Cordially, Ted -- Ted Pedersen

[ngram] help requested - any Mac OS users out there?

2006-07-10 Thread ted pedersen
d any version of NSP on Mac OS, it would be great to know that (and if there was anything at all out of the ordinary). Thank you! Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] NSP 1.01, divide by zero error

2006-08-08 Thread ted pedersen
rtly. If you encounter this divide by zero error please let us know about it. We only encountered it when doing some rather unusual experiments, and have not seen this in our more normal uses of NSP, where count.pl is used to obtain frequency count information from corpora. Thanks, Ted -- Ted P

[ngram] another condition that leads to divide by zero in NSP 1.01

2006-08-08 Thread ted pedersen
est 27 for<>our<>1 9.3073 10 10 19 Please let us know if you have any questions, concerns, or additional information! Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] updates to Ngram Statistics Package bibliography

2006-09-05 Thread ted pedersen
d, or if any work is included there that should not be. The basic criterion for inclusion is that the paper should mention having used NSP in some way, and include a citation either to an NSP URL or paper (most likely the CICLING-2003 paper by Banerjee and Pedersen). Thanks! Ted -- Ted Pedersen

Re: [ngram] chi square test

2006-09-12 Thread ted pedersen
pls help me I hope this helps. There are no right answers to the questions you pose, but I would encourage you to use NSP for whatever you are doing here. I think you'll be able to do a lot of experiments very quickly, and that will help you figure out what makes the most sense for you in terms

[ngram] Ngram Statistics Package version 1.03 released!

2006-09-16 Thread ted pedersen
ou can find the bibliography at: http://www.d.umn.edu/~tpederse/nsp-bib/ Please let us know if you have any questions! Enjoy, Ted and Saiyam -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] use of window size in count.pl

2006-09-22 Thread ted pedersen
Greetings all, I was corresponding with someone about the --window option in count.pl, and realized that this might be of general interest to NSP users, so I have modified that note slightly and sent it here. When you are counting up the bigrams in a corpus, you can specify a --window size tha
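As an illustration of the idea of counting bigrams within a window, here is a short Python sketch. It assumes that a window of size N lets two tokens form a bigram when they appear in order with at most N - 2 tokens between them, so that a window of 2 gives ordinary adjacent bigrams. The function name is hypothetical, and this is a sketch of the concept rather than of count.pl itself.

```python
from collections import Counter
from typing import Iterable, Tuple


def windowed_bigrams(tokens: Iterable[str], window: int = 2) -> Counter:
    """Count ordered token pairs that fall inside a sliding window."""
    if window < 2:
        raise ValueError("window must be at least 2")
    tokens = list(tokens)
    counts: Counter = Counter()
    for i, left in enumerate(tokens):
        # Pair the token at position i with the next (window - 1) tokens.
        for right in tokens[i + 1 : i + window]:
            counts[(left, right)] += 1
    return counts


if __name__ == "__main__":
    text = "a b c d".split()
    print(windowed_bigrams(text, window=2))
    print(windowed_bigrams(text, window=3))
```

With the hypothetical input above, widening the window from 2 to 3 adds the pairs that skip exactly one token.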

Re: [ngram] Pb with tokenisation in nsp

2006-11-24 Thread ted pedersen
> Siham > Hi Siham, Check out sections 2 and 3 in the README. They describe how to set your own tokenization scheme. http://search.cpan.org/src/TPEDERSE/Text-NSP-1.03/README Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Pb with tokenisation in nsp

2006-11-30 Thread ted pedersen
files so as to > avoid editing every single release? ;o) > > BTW, thanks again for making this package available to the community, > this is great stuff. > > Regards, > Patrick > -- -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Major overhaul of TextNSP (fwd)

2006-11-30 Thread ted pedersen
FYI, please keep an eye on this rewrite!! Cordially, Ted -- Forwarded message -- Date: Thu, 30 Nov 2006 22:45:42 +0100 From: Björn Wilmsmann <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Subject: Major overhaul of TextNSP -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Dear Mr. Peder

Re: [ngram] Problems getting the NSP plugin running in GATE

2006-12-05 Thread ted pedersen
hread.run(Unknown Source) > > Suggestions? > > - Bob Futrelle > > Robert P. Futrelle >Associate Professor > Biological Knowledge Laboratory > College of Computer and Information Science > Northeastern University MS WVH202 > 360 Huntington Ave. > Boston, MA

[ngram] Problem with CPANPLUS 0.076 misidentifying versions after installing Text::NSP 1.03 (fwd)

2006-12-23 Thread ted pedersen
-- -- Ted Pedersen http://www.d.umn.edu/~tpederse -- Forwarded message -- Date: Sat, 23 Dec 2006 10:34:03 -0800 From: Jonathan Leffler <[EMAIL PROTECTED]> To: [EMAIL PROTECTED], Bugs in CPANPLUS via RT <[EMAIL PROTECTED]>, [EMAIL PROTECTED] Subject: Problem w

Re: [ngram] installing NSP on Windows

2007-04-26 Thread Ted Pedersen
r.pm not found) at D:/ActivePerl/perl/lib/ExtUtils/MM_Unix.pm line 1676. > Could not open 'lib/Text/NSP.pm': No such file or directory at D:/ActivePerl/perl/lib/ExtUtils/MM_Unix.pm line 2669. > > > if anybody has any idea how it can be installed on Windows instead of Linux? > > regards > Madiha > -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] NSP distributed under General Public License (GPL)

2007-04-26 Thread Ted Pedersen
do that. We would be happy to see NSP modified and redistributed, and we hope the GPL allows you to do that in a way that you are happy with. If you feel that this isn't the case, do let us know what your concerns are and how you think they might best be resolved. Enjoy, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Re: Re-write of Text-NSP

2007-05-15 Thread Ted Pedersen
A PS of sorts... There are various nice discussions of Perl profiling around, I mention one below. I think it's actually perl -d rather than -D...anyway, details here: http://www.perl.com/pub/a/2004/06/25/profiling.html Also, here's an example of how we used Perl profiling to guide us in earlie

Re: [ngram] Re-write of Text-NSP

2007-05-15 Thread Ted Pedersen
> http://www.topicalizer.com/files/TextNSP/Text-NSP-1.10.tar.gz > > Any input would be appreciated. > > -- > Best regards, > Bjoern Wilmsmann > > > > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Re-write of Text-NSP

2007-05-15 Thread Ted Pedersen
motivation for this change? Thanks, Ted On 5/15/07, Ted Pedersen <[EMAIL PROTECTED]> wrote: > Greetings again, > > I was able to run all the way through the Testing/ALL-TESTS.sh script, > and I posted the output from that to : > > http://www.d.umn.edu/~tpederse/nsp110-all

Re: [ngram] Re: Re-write of Text-NSP (RESULTS of Profiling with -d:DProf)

2007-05-15 Thread Ted Pedersen
00 0.000 0.000 3 0. 0. Exporter::Heavy::heavy_export On 5/15/07, Ted Pedersen <[EMAIL PROTECTED]> wrote: > > > > > > > > A PS of sorts... > > There are various nice discussions of Perl profiling around, I mention > one below. I think it's a

[ngram] what the GPL means...

2007-05-16 Thread Ted Pedersen
om you. Finally, if I have messed up any of the above, please don't hesitate to correct or comment on it. I think it's best that we discuss these sorts of issues from time to time, just to make sure everyone is thinking of them. Most of us aren't lawyers, and this sort of stuff gets pr

[ngram] what the GPL means...

2007-05-17 Thread Ted Pedersen
ant to do things in such a way that doesn't significantly impede the use of NSP 2) yes 3) GPL 4) $0 (USD) :) Thanks, Ted On 5/17/07, Richard Jelinek <[EMAIL PROTECTED]> wrote: > > Greetings Ted, > > On Wed, May 16, 2007 at 10:50:59AM -0500, Ted Pedersen wrote: > [..

Re: [ngram] use n-gram in neural network

2007-10-17 Thread Ted Pedersen
s in proteins. > After i want to use the result in neural network. > > Who knows how to use the results of count.pl how input datas of neural > network. > > i need help ........ > > Thanks .. William > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] New user of Ngram Statistics Package

2007-10-26 Thread Ted Pedersen
ll comments above are helpful. Cordially, Ted > > -- Thank you, > Mary D. Taffet > Ph.D. Candidate/Syracuse University, School of Information Studies > Scientist/TextWise LLC > Syracuse, NY > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Re: use n-gram in neural network

2007-10-28 Thread Ted Pedersen
> Gly<>Ala<>76 287 470 > Lys<>Lys<>73 479 479 > > I know what it means each row and column of this file . but how i can use this datas for a neural network ? > Is necessary wich modifications ? What is way ? > > Cordially, > > W

Re: [ngram] Re: New user of Ngram Statistics Package

2007-10-29 Thread Ted Pedersen
nctuation > > for you... > > > > @stop.mode=OR > > /\b[,.?!'"`]+\b/ > > /\b\d+\b/ > > > > Each line in this file after the mode line is a Perl regular expression > > that shows what you would like to remove. > > > > > -- Ted Pedersen http://www.d.umn.edu/~tpederse
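The stop file quoted above (a mode line followed by one Perl regular expression per line) can be mimicked in Python roughly as follows. This sketch assumes that OR mode drops an n-gram when any token matches a stop regex, that AND mode drops it only when every token matches, and that AND is used when no mode line is given; those semantics and the function names are assumptions for illustration, so please check the NSP README before relying on them.

```python
import re
from typing import List, Pattern, Sequence, Tuple

# Example stop file in the style quoted above.
STOPLIST = r"""@stop.mode=OR
/\b[,.?!'"`]+\b/
/\b\d+\b/
"""


def load_stoplist(text: str) -> Tuple[str, List[Pattern[str]]]:
    """Read the mode line and the /regex/ lines of a stop file."""
    mode = "AND"  # assumed default when no mode line is present
    patterns: List[Pattern[str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("@stop.mode="):
            mode = line.split("=", 1)[1].strip().upper()
        elif len(line) > 2 and line.startswith("/") and line.endswith("/"):
            patterns.append(re.compile(line[1:-1]))  # strip the slashes
    return mode, patterns


def is_stopped(ngram: Sequence[str], mode: str, patterns: List[Pattern[str]]) -> bool:
    """Decide whether an n-gram should be removed under the given mode."""
    hits = [any(p.search(token) for p in patterns) for token in ngram]
    return any(hits) if mode == "OR" else all(hits)


if __name__ == "__main__":
    mode, patterns = load_stoplist(STOPLIST)
    print(mode, is_stopped(("route", "66"), mode, patterns))
```

The simple Perl patterns shown here happen to be valid Python regular expressions as well; more elaborate Perl syntax might not carry over unchanged.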

[ngram] installing NSP on Windows, other non-Linux systems

2007-12-11 Thread Ted Pedersen
ially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Re: Problem with a token

2008-02-13 Thread Ted Pedersen
ginning of the Perl > > scripts fixed the problems I had for French. You can have a look at > this > > for more information : > > > > http://tech.groups.yahoo.com/group/ngram/message/159 > > > > Hope this helps... > > > > Regards, > > Patrick > > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Re: Problem with a token

2008-02-14 Thread Ted Pedersen
; > > > > Mercè > > > > > > > > > > > Mercè, > > > > > > > > I have not checked the latest version of NSP to see if count.pl > and the > > > > other files contain "use locale;" as I suggested some time ago. The > > > > simple inclusion of such a statement at the beginning of the Perl > > > > scripts fixed the problems I had for French. You can have a look at > > > this > > > > for more information : > > > > > > > > http://tech.groups.yahoo.com/group/ngram/message/159 > > > > > > > > Hope this helps... > > > > > > > > Regards, > > > > Patrick > > > > > > > > > > > > > -- > > Ted Pedersen > > http://www.d.umn.edu/~tpederse > > > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] plans for version 1.05

2008-02-14 Thread Ted Pedersen
Greetings all, I'm in the process of collecting up the various bug reports that we've gotten since version 1.03 was released in September 2006, and I'll resolve those in 1.05. Here's what I have so far... 1) Incorporate "use locale" throughout package (suggested by Patrick Drouin long ago) This w

Re: [ngram] Re: plans for version 1.05

2008-02-14 Thread Ted Pedersen
f you recall something else, or there is some feature or change you > > are interested in seeing, please let me know. As you can tell NSP > > releases have slowed considerably in recent years, so this is likely > > to be the only release for some time to come, so please do let me know > > asap if there are other issues. Comments and suggestions are of course > > welcome. > > > > Cordially, > > Ted > > > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Re: plans for version 1.05

2008-02-14 Thread Ted Pedersen
TED]> wrote: > > On Thu, Feb 14, 2008 at 03:51:40PM -, Ted Pedersen wrote: > > 1) Incorporate "use locale" throughout package (suggested by Patrick > > Drouin long ago) This will make for more convenient handling of > > non-English text. > > Wrong idea,

Re: [ngram] Re: plans for version 1.05

2008-02-15 Thread Ted Pedersen
-score to 3 dimensions or more then we are certainly interested in discussing that. Thanks, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Re: plans for version 1.05

2008-02-15 Thread Ted Pedersen
c.perl.org/open.html > > Setting up default IN and OUT layers this way could save a lot of > typing/transforming in the migration process. > > > -- > Kind regards, > > Dipl.-Inf. Richard Jelinek > > - The PetaMem Group - Prague/Nuremberg - www.petamem.com - > -= 2007-09-25: 49235653 Mind Units =- > -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Ngram Statistics Package version 1.07 released

2008-03-24 Thread Ted Pedersen
re, some in fact still use 5.8.3, etc. which would have been a problem previously. So, if you are a Windows user or a pre 5.8.5 user of Perl, you'll certainly want to upgrade to 1.07! Please let us know if any questions arise! Enjoy, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Windows Installation

2008-05-02 Thread Ted Pedersen
--- In ngram@yahoogroups.com, "Sayed Magdy" <[EMAIL PROTECTED]> wrote: > > hello all > i have just download Text-NSP-1.09 package and i cannot install it > i have setup active perl,but i donot know how to install the package. > i am using windos XP > please any one help me in this problem > thank

Re: [ngram] Re: need help please (urgent)

2008-05-03 Thread Ted Pedersen
> > thanks in advance > > > > sayed magdy > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Re: NSP-Class-diagram.pdf link kaput

2008-05-30 Thread Ted Pedersen
erarchy? > > And points to: > http://search.cpan.org/perldoc?NSP-Class-diagram.pdf > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Not Ngram

2008-06-21 Thread Ted Pedersen
can surely get frequency counts from count.pl. Good luck! Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Ngram Statistics Package Bibliography

2008-06-22 Thread Ted Pedersen
email with the information. Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Using n-gram package with java

2008-08-22 Thread Ted Pedersen
> how I will proceed > thanks > arezki > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] NSP with java

2008-08-30 Thread Ted Pedersen
séparat file with their decresing scores. > > Best Regards > Arezki > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] problem with statistic.pl

2008-09-09 Thread Ted Pedersen
I have this code with java : > Process p = Runtime.getRuntime (). Exec (perl g:/Text-NSP- > 1.09/bin/statistic.pl mi.pm output_pmi.txt output.txt); > What is the problem? > > The second problem concerns accented letters (é,è,ç,...) how not > conceder them as end of token. b

[ngram] bug when using pmi as fully specified module name

2008-10-07 Thread Ted Pedersen
specified pmi name, and there is an easy work around should that occur. However, this is certainly something we will fix. Please let me know of any questions or concerns Thanks! Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Re: Possible count.pl bug

2008-10-16 Thread Ted Pedersen
29 124 > Obama<>.<>5 13 132 > the<>,<>5 15 124 > .<>The<>5 129 8 > in<>.<>5 7 132 > > Note all those stop words in there. I'd like to get rid of them and I think > that's what that -stop stop.txt should do, no? > > $ egrep '/said/|/the/' stop.txt > /said/ > /the/ > > Is this a bug or am I doing something wrong? > > Thanks, > Otis > -- > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Re: Possible count.pl bug

2008-10-17 Thread Ted Pedersen
k<>2 > houses<>1 > > > For some reason using -stop messed things up. Nothing funky in my stop.txt > (attached) I believe. > > > Otis > -- > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch > > > > - Original Message >> From: Ted P

Re: [ngram] Web interface to play with NSP

2008-11-05 Thread Ted Pedersen
u will find a simple interface to play with NSP. > it could be a tool for NSP beginners, let me know what do you think > about this interface. > > Arnaud. > -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Re: 4-gram and 5-gram statistical analysis

2008-11-19 Thread Ted Pedersen
able to
>> run exactly as before. So unless you want to use a model outside of
>> the independence model, you should not notice any change. If you
>> would like to use a different model, this can be specified in a file
>> similar to how the ordering of the marginal values are specified.
>>
>> For example, with the independence model for trigrams, the expected
>> values would be ordered as such in a
>>
>> 0
>> 1
>> 2
>>
>> For the "p(word1 word2) * p(word3)" model:
>>
>> 0 1
>> 2
>>
>> and so on. Well, I hope this makes sense and if you have any questions
>> or I have not explained something clearly, please let me know.
>>
>> Thanks!
>>
>> Bridget
>>
>> On Tue, 6 Jul 2004, theonomo wrote:
>>
>> > Hello all.
>> >
>> > I would like to calculate statistical measures of association for
>> > 4-grams and 5-grams. Is this possible? Does this even make sense?
>> >
>> > Thanks,
>> >
>> > Jon

-- Ted Pedersen http://www.d.umn.edu/~tpederse
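To illustrate the difference between the two trigram models mentioned in the quoted message, here is a minimal Python sketch of the expected trigram count under each. It assumes the usual maximum-likelihood estimates (a count divided by the total number of trigrams); the function and parameter names are hypothetical and do not come from NSP.

```python
def expected_independence(n_w1: int, n_w2: int, n_w3: int, total: int) -> float:
    """Expected count under p(word1) * p(word2) * p(word3).

    total * (n_w1/total) * (n_w2/total) * (n_w3/total)
    simplifies to n_w1 * n_w2 * n_w3 / total**2.
    """
    return n_w1 * n_w2 * n_w3 / total**2


def expected_pair_then_word(n_w1w2: int, n_w3: int, total: int) -> float:
    """Expected count under p(word1 word2) * p(word3).

    total * (n_w1w2/total) * (n_w3/total) simplifies to n_w1w2 * n_w3 / total.
    """
    return n_w1w2 * n_w3 / total


if __name__ == "__main__":
    # Hypothetical marginal counts out of 1000 trigrams.
    print(expected_independence(100, 50, 20, 1000))
    print(expected_pair_then_word(40, 20, 1000))
```

Comparing an observed trigram count against each expected value shows how much of the association is already explained by the first two words occurring together.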

[ngram] counting longer n-grams with --set_freq_combo

2008-11-26 Thread Ted Pedersen
70.1% 0+0k 0+19144io 0pf+0w

% time count.pl --ngram 3 --set_freq_combo mycomb1.txt output1 wrnpc12.txt
34.518u 0.284s 0:52.53 66.2% 0+0k 0+30504io 0pf+0w

% time count.pl --ngram 3 output wrnpc12.txt
48.435u 0.476s 1:13.20 66.8% 0+0k 0+37096io 0pf+0w

BTW, all of the above is on War and Peace, which is a long novel but a fairly small corpus, so your savings will be more dramatic on larger corpora.

% wc wrnpc12.txt
67403 564514 3285165 wrnpc12.txt

In any case, if you are using longer ngrams, I think it's very likely you will find --set_freq_combo very handy. Please don't hesitate to let us know if you have any questions on how to use or interpret this. Enjoy, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Need a phrase list

2008-12-15 Thread Ted Pedersen
ngrams for contemporary English of popular usage. > > It doesnt have to be clean. I can clean it and give it back to you. > > Excel file would be great. > > Thank you in advance. > > Abhijit, India > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] how to find most frequent words that start sentences with NSP

2008-12-18 Thread Ted Pedersen
<>6542 TOKYO<>6445 AUSTRALIAN<>6433 STOCKS<>6363 IRAN<>6359 FIVE<>6304 GERMANY<>6271 IRAQ<>6124 WHAT<>6076 FROM<>6056 AUSTRALIA<>5933 Q<>5923 PAKISTAN<>5919 GOLD<>5761 PHILIPPINE<>5609 HUNDREDS<>5590 ART<>5463 SRI<>5431 EDITORS<>5350 SINGAPORE<>5338 YOU<>5331 WEATHER<>5325 TODAY<>5305 TOP<>5283 ALL<>5212 FOREIGN<>5154 WE<>5124 BUT<>5104 MALAYSIA<>5062 EVEN<>5020 DOW<>4974 WASHINGTON<>4938 ITALIAN<>4875 JUST<>4854 JOHN<>4852 The real power here is in the --token option, which can do a lot of interesting things like this... Enjoy, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] How not to filter stopwords inside of tri-grams

2009-01-29 Thread Ted Pedersen
ariable length (the operand) is preceded by an opcode." > > I would like to get as a result list "data of variable" and not "Data > variable length". > > Best whishes, > Mercè > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Fishers exact test as a trigram mesure

2009-02-01 Thread Ted Pedersen
trigram mesure. > > Thanks, > Mercè > > -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] No ngram over sentence

2009-02-05 Thread Ted Pedersen
ey > Honey<>Bunny > A<>women > women<>snorts > > so i want that the bigram Bunny<>A is not created (and don't gets counted) > > Is there a way to achieve this? > > I hope my question is understandable and has not been ask bevor. > > If i mis
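The behaviour asked for above (no bigram that spans two sentences) can be sketched in Python as follows, assuming the text has already been split so that each sentence sits on its own line. The function name and the sample sentences are hypothetical, and this is only an illustration of the idea, not a description of any NSP option.

```python
from collections import Counter


def bigrams_within_lines(text: str) -> Counter:
    """Count adjacent token pairs, never pairing tokens from different lines."""
    counts: Counter = Counter()
    for line in text.splitlines():
        tokens = line.split()
        # zip stops at the end of the line, so no pair crosses a line break.
        counts.update(zip(tokens, tokens[1:]))
    return counts


if __name__ == "__main__":
    sample = "Honey Bunny\nA women snorts\n"
    for (left, right), n in bigrams_within_lines(sample).items():
        print(f"{left}<>{right}<>{n}")
```

Treating each line as a separate unit keeps the counting logic simple; the sentence splitting itself is left to whatever preprocessing produced the lines.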
