[ngram] new year's resolutions/ngram statistics package

2005-01-03 Thread ted pedersen
, if you happen to have code that uses NSP without a related publication, and that code is distributed, we want to know about you too. We'll have a separate section for software systems... Happy New Year! Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Re: bash: ALL-TESTS.sh: command not found

2005-02-21 Thread ted pedersen
cygwin, like I say it's a great idea, but I think life will get easier if you are able to run on a Linux machine. Good luck, and let us know what happens! Thanks, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Extensions to NSP for log-likelihood ratio

2005-05-01 Thread ted pedersen
questions about this. Sorry for not making this available sooner, Bridget did a nice job on this and it just fell through the cracks! Enjoy, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] overflow in fisher's test

2005-07-20 Thread ted pedersen
, and some new enhancements and features. As we get closer to starting that work, I'll be posting our list of reported problems, etc. in order to make sure we have caught everything. And of course, please feel free to let us know of any other questions or concerns. Ted -- Ted Pedersen http

[ngram] input format

2005-08-16 Thread ted pedersen
A user is wondering about how to manually create input files for statistic.pl ... I have read your readme file which came with the package. It's well written and quite understandable even for a person ignorant in the field of Ngrams. But unfortunately, although I quickly understood the

[ngram] proposed re-design of Measures in Ngram Statistics Package

2005-09-20 Thread ted pedersen
The following is a description of our plan of attack for the first stage of the NSP redesign, that is, to organize the measures in an object-oriented, hierarchical fashion. The description below is written by Saiyam Kohli. Your comments and questions are of course most welcome, especially at this

[ngram] NSP bibliography now under construction

2005-09-27 Thread ted pedersen
to the bibliography! Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] suggestion for nsp from user

2005-10-04 Thread ted pedersen
An NSP user has the following idea: -- I just thought it would be nice to have an option in NSP (specifically in statistic.pl) to filter bigrams based on their p-values, like we currently do by rank and score. Very often I need to find significant bigrams, and it will be nice if I

[ngram] Re: [cpan #15862] Incorrect packaging practices

2005-11-16 Thread ted pedersen
to resolve these asap! We are hoping that the 0.75 release will be ready by mid-December. Cordially, Ted and Saiyam -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Can you do this with NSP/Ngram type question: Name Matching?

2005-12-18 Thread ted pedersen
you might have on this question :-) -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Re: Another Can you do this with NSP/Ngram

2006-05-28 Thread ted pedersen
expressions, but the others are also allowed to match some others? I'm trying to get expressions like drag and drop or press-and-hold, or create a new \w{4,} Thanks, again! Leonardo F. Fontenelle -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] more details on performance issues that led to NSP 0.97 release

2006-06-21 Thread ted pedersen
0.0005 Getopt::Long::BEGIN 0.00 0.000 0.000 4 0. 0. Exporter::Heavy::heavy_export 0.00 - -0.000 1- - Getopt::Long::ConfigDefaults 0.00 - -0.000 1- - Getopt::Long::Configure -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] use of window size in count.pl

2006-09-22 Thread ted pedersen
Greetings all, I was corresponding with someone about the --window option in count.pl, and realized that this might be of general interest to NSP users, so I have modified that note slightly and sent it here. When you are counting up the bigrams in a corpus, you can specify a --window size
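To make the idea of a window concrete, here is a minimal Python sketch of windowed bigram counting. It is not NSP's implementation: the function name `windowed_bigrams` and the sample sentence are made up for illustration, and it assumes that a window of size N lets the second word of a bigram fall up to N-1 positions after the first, with word order preserved.

```python
from collections import Counter

def windowed_bigrams(tokens, window=2):
    """Count ordered bigrams whose two tokens fit inside `window` consecutive tokens.

    Hypothetical helper for illustration only; window=2 gives plain adjacent bigrams.
    """
    counts = Counter()
    for i, first in enumerate(tokens):
        # The second token may sit up to window-1 positions to the right of the first.
        for j in range(i + 1, min(i + window, len(tokens))):
            counts[(first, tokens[j])] += 1
    return counts

tokens = "the quick brown fox".split()
contiguous = windowed_bigrams(tokens, window=2)  # adjacent pairs only
wider = windowed_bigrams(tokens, window=3)       # also pairs that skip one token

print(sorted(contiguous))
print(sorted(wider))
```

Under these assumptions the wider window adds pairs such as ("the", "brown") that never occur side by side, which is why the number of distinct bigrams (and memory use) grows with the window size.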

Re: [ngram] Pb with tokenisation in nsp

2006-11-24 Thread ted pedersen
sections 2 and 3 in the README. They describe how to set your own tokenization scheme. http://search.cpan.org/src/TPEDERSE/Text-NSP-1.03/README Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Problem with CPANPLUS 0.076 misidentifying versions after installing Text::NSP 1.03 (fwd)

2006-12-23 Thread ted pedersen
-- -- Ted Pedersen http://www.d.umn.edu/~tpederse -- Forwarded message -- Date: Sat, 23 Dec 2006 10:34:03 -0800 From: Jonathan Leffler [EMAIL PROTECTED] To: [EMAIL PROTECTED], Bugs in CPANPLUS via RT [EMAIL PROTECTED], [EMAIL PROTECTED] Subject: Problem with CPANPLUS 0.076

Re: [ngram] Re: plans for version 1.05

2008-02-15 Thread Ted Pedersen
regards, Dipl.-Inf. Richard Jelinek - The PetaMem Group - Prague/Nuremberg - www.petamem.com - -= 2007-09-25: 49235653 Mind Units =- -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] providing commandline options

2009-02-06 Thread Ted Pedersen
On Fri, Feb 6, 2009 at 1:31 AM, reshmijose reshmij...@yahoo.com wrote: can you tell me how command line arguments like 'count.pl output.txt input.txt' are defined? Are they defined within the source code? What should i do if i want to run the program count.pl separately? -- Ted Pedersen

Re: [ngram] No ngram over sentence

2009-02-06 Thread Ted Pedersen
? If it is a running text, how does it identify the end of the sentence? Thanks Jayaram --- On Thu, 2/5/09, Ted Pedersen duluth...@gmail.com wrote: From: Ted Pedersen duluth...@gmail.com Subject: Re: [ngram] No ngram over sentence To: ngram@yahoogroups.com Date: Thursday, February 5, 2009, 9

Re: [ngram] search in file generated by statistic.pl

2009-03-25 Thread Ted Pedersen
this can be quite fun. You could also use egrep to specify regular expression patterns to search for (rather than just strings), but I find grep to be a nice starting point. I hope this is helpful! Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] output files

2009-04-01 Thread Ted Pedersen
should process those with separate runs of count.pl. I hope this helps! Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Re: the NSP trigram calculations don't match mine??

2009-06-10 Thread Ted Pedersen
Hi Gunn, You might be hitting a peculiar bug we noticed late last year (which still hasn't been fixed). http://tech.groups.yahoo.com/group/ngram/message/240 If you run using just pmi in the command line, do your results agree with your Lisp code? If there is still disagreement, let's run some

Re: [ngram] Re: the NSP trigram calculations don't match mine??

2009-06-11 Thread Ted Pedersen
inputfile ) produces the following line atdeter1 6.4127 262744 7073841 9391062 5872364 1234064 647295 1064083 Best, Gunn -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Re: the NSP trigram calculations don't match mine??

2009-06-14 Thread Ted Pedersen
decision to use base-10 logarithms. Note that all versions should still give the same ranking of candidates, so that's a robust test case. Cheers, Stefan (Evert) -- Ted Pedersen http://www.d.umn.edu/~tpederse
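The point about rankings can be checked with a small Python sketch. This is not NSP code: it uses the textbook bigram PMI formula, log(n11 * npp / (n1p * np1)), and the candidate bigrams and their counts are invented for illustration. Changing the logarithm base rescales every score by the same positive constant, so the order of candidates should not change.

```python
import math

def pmi(n11, n1p, np1, npp, base=2.0):
    """Textbook pointwise mutual information for a bigram (illustrative, not NSP's code).

    n11: joint count, n1p / np1: marginal counts of word1 / word2, npp: total bigrams.
    """
    return math.log((n11 * npp) / (n1p * np1), base)

# Made-up counts: (n11, n1p, np1, npp) for three hypothetical candidates.
candidates = {
    "a b": (30, 100, 60, 10000),
    "c d": (5, 200, 300, 10000),
    "e f": (12, 40, 50, 10000),
}

def ranking(base):
    """Return candidate names sorted from highest to lowest PMI for the given log base."""
    return sorted(candidates, key=lambda k: pmi(*candidates[k], base=base), reverse=True)

print(ranking(2))
print(ranking(10))
```

The scores themselves differ between bases, so comparing raw values across implementations only makes sense once the base is known; comparing rankings sidesteps that.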

Re: [ngram] Ngrams without line break

2009-07-01 Thread Ted Pedersen
that are at the beginning of the next sentence. That is, ngrams that do not contain line breaks. Best wishes, Mercè -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Re: Ngrams without line break

2009-07-01 Thread Ted Pedersen
of line markers? Thanks a lot! Mercè --- In ngram@yahoogroups.com, Ted Pedersen duluth...@... wrote: Greetings Merce, To make sure I understand correctly, it sounds like you *only* want to see those ngrams that contain a line break. For example, if you run count.pl as follows on your test file

[ngram] Re: significant collocations

2009-07-15 Thread Ted Pedersen
--- In ngram@yahoogroups.com, Amada Eliseo amadaeli...@... wrote: Hello to all, I appreciate your work. Please. Can someone help me to identify significant collocations in some text. I would like to use text-nsp, but I don't know how to do it. Thank you. If you are just starting

[ngram] NSP stress testing, windowing memory usage

2009-10-22 Thread Ted Pedersen
step is to do this with a larger data file and see if the above rules of thumb continue to hold... More soon, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Fwd: nsp/trigram signficance

2009-12-05 Thread Ted Pedersen
-- Forwarded message -- From: r...@imsc.res.in r...@imsc.res.in Date: Dec 5, 2009 11:59 AM Subject: Re: nsp/trigram signficance To: Ted Pedersen duluth...@gmail.com May I request you to forward this to list ? I don't use Yahoo and it seems posting to the mailing list

[ngram] possible trouble with hugecount.pl in Text-NSP-1.17?

2010-05-02 Thread Ted Pedersen
11583688 xie200108.txt 8867 1728337 10179257 xie200109.txt 9437 1793786 10614699 xie200110.txt 1740339343 1995533 xie200111.txt 679007 132786005 783905863 total -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Re: Extending huge-count to 3 grams.

2011-01-13 Thread Ted Pedersen
/%7Ewestburylab/ University of Alberta 780-492-5843 =[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=} -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] ngrams with hyphen

2011-04-20 Thread Ted Pedersen
- clearback283 1115 733 backsignal157 380 9176 - clearforward632 1115 877 forwardsignal493 1547 9176 Thanks a lot, Mercè -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] Re: ngrams with hyphen

2011-04-23 Thread Ted Pedersen
interest? Thanks for your help, Mercè --- In ngram@yahoogroups.com, Ted Pedersen tpederse@... wrote: Hi Merce, Yes, indeed, you can do as you describe. This gets into some important details about regular expressions that I'm happy to have a chance to mention. In the default stoplist

[ngram] Ngram Statistics Package at ACL 2011 in Portland

2011-05-31 Thread Ted Pedersen
, Portland, Oregon. http://www.d.umn.edu/~tpederse/Pubs/pedersen-disco2011.pdf So, if you are at ACL 2011 please consider attending these events, or catch up with us some other time. Hoping to see you in Portland, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] MI for a 4-gram

2011-06-07 Thread Ted Pedersen
for a 4-gram made of words A B C and D? MI(ABCD) = log(P(ABCD) / (P(A) x P(B) x P(C) x P(D))) If not, what is a better way? Why is this bad? Thanks for your help, Cyrus -- Ted Pedersen http://www.d.umn.edu/~tpederse
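The formula in the question can be written out as a short Python sketch. This only evaluates the expression exactly as the poster gives it (joint probability over the product of the individual word probabilities); it makes no claim about whether that is the right measure for 4-grams or about how NSP computes anything. The function name, the base-2 default, and the counts are assumptions for illustration.

```python
import math

def mi_independence(joint_count, word_counts, total, base=2.0):
    """log( P(ngram) / product of P(word_i) ), as in the formula quoted in the question.

    joint_count: occurrences of the whole n-gram (hypothetical numbers below)
    word_counts: occurrences of each word in its own right
    total:       sample size used to turn counts into probabilities
    """
    p_joint = joint_count / total
    p_independent = math.prod(count / total for count in word_counts)
    return math.log(p_joint / p_independent, base)

# Made-up example: a 4-gram seen 10 times, each word seen 100 times, 1000 observations.
score = mi_independence(10, [100, 100, 100, 100], 1000)
print(round(score, 4))
```

With these numbers the ratio inside the logarithm is 0.01 / 0.0001 = 100, so the score is log2(100).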

[ngram] demo at acl mwe workshop, talk today at disco

2011-06-24 Thread Ted Pedersen
- will attend the DisCo workshop and give a talk showing how I used NSP to participate in the shared task on identifying semantic compositionality. http://disco2011.fzi.de/ Cordially, Ted --- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] NSP home page in Romanian!

2011-07-26 Thread Ted Pedersen
/seremina/edu/nsp-rom.html Enjoy! Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

Re: [ngram] formatting + punctuation removal

2011-08-17 Thread Ted Pedersen
cellularphones96 214 384 mobile phones cellular phones 2. I need to remove punctuation . and , I've tried within my stopword list, but I don't have the tags quite right. How should I enter into my stop file? Thanks! Patrick -- Ted

Re: [ngram] Re: formatting + punctuation removal

2011-08-17 Thread Ted Pedersen
Hi Patrick, NSP makes no real distinction between punctuation and words, so if you do not do anything with tokenization via --token or --nontoken or preprocessing, the punctuation marks will be treated just like words and will affect your results. --token and --nontoken essentially remove them

Re: [ngram] Fwd: -1.1000(sic!) as result from rank.pl

2013-02-06 Thread Ted Pedersen
PhD Student in Computational Linguistics University of Gothenburg, Sweden -- *From:* duluth...@gmail.com [duluth...@gmail.com] on behalf of Ted Pedersen [ tpede...@d.umn.edu] *Sent:* 6 February 2013 03:35 *To:* ngram@yahoogroups.com *Cc:* Karin Cavallin *Subject

[ngram] bug in rank.pl v (0.03) in Text::NSP 1.25

2013-02-14 Thread Ted Pedersen
A user reports a bug in rank.pl. This seems to occur when dealing with smaller files, for example... marimba(49): more x firstbigram1 4.000 1 1 secondbigram2 3.000 2 2 extrabigram13 2.000 3 3 thirdbigram4 1.000 4 4 marimba(50): more y secondbigram1 4.000 2 2 extrabigram22 3.000 4 4 firstbigram3

[ngram] Re: Fwd: ll4 giving me trouble with 4-grams

2013-03-27 Thread Ted Pedersen
, Ted Pedersen tpederse@... wrote: Merce, I got an email error when responding directly to your yahoo.es account. Could you follow up with another email address or use the group...? Thanks, Ted -- Forwarded message -- From: Ted Pedersen tpederse@... Date: Wed, Mar 27, 2013

[ngram] the (apparent) demise of search.cpan.org

2014-07-18 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
/perl/beginners/145nsxqz2w/cpan-unavailable When we started using the search site in about 2002 it was pretty great. The good news is that https://metacpan.org is even better, so this is a positive change. Thanks, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Fwd: Ngrams and Text Similarity deployed as SOAP web services

2014-09-22 Thread Ted Pedersen duluth...@gmail.com [ngram]
Very nice news for users of NSP and Text::Similarity! Please support these resources by giving them a try and letting others know about them too. Cordially, Ted -- Forwarded message -- From: Marta Villegas marta.ville...@upf.edu Date: Mon, Sep 22, 2014 at 4:13 AM Subject: Ngrams

Re: [ngram] accented character

2015-01-19 Thread Ted Pedersen duluth...@gmail.com [ngram]
Hi Arnaud, There is nothing new for more recent versions - the same solutions proposed for earlier versions are still relevant (and still the best available options). You can find some discussion of those here (via the NSP mailing list):

[ngram] Ngram Statistics Package version 1.29 released (minor bug fix release)

2015-10-17 Thread Ted Pedersen duluth...@gmail.com [ngram]
?node_id=1077762 You can download the most current version of NSP from CPAN or Sourceforge by following the links here : http://ngram.sourceforge.net Please let us know if any questions arise. Cordially, Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse

[ngram] Re: Ngram Statistics Package version 1.29 released (minor bug fix release)

2015-10-04 Thread Ted Pedersen duluth...@gmail.com [ngram]
links on this page. http://ngram.sourceforge.net Enjoy, Ted On Sat, Oct 3, 2015 at 5:30 PM, Ted Pedersen <duluth...@gmail.com> wrote: > We are pleased to announce a new release of Text::NSP, the Ngram > Statistics Package. This is a very minor bug fix release, but might be > som

Re: [ngram] simple test using chi-squared

2015-11-23 Thread Ted Pedersen duluth...@gmail.com [ngram]
CHI is a parent class, and not intended to be used as a measure. Rather, the measures x2, pmi, and tscore are the end user measures which you can run (and they all access that CHI class). So, if your goal is to run the chi squared test, you can do that with the x2 measure, as in: statistic.pl x2
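For readers who want to see what a chi-squared score for a bigram involves, here is a minimal Python sketch of Pearson's chi-squared statistic on a 2x2 contingency table. It is not NSP's code and does not use the CHI class mentioned above; the function name and the example counts are made up, and the argument names (n11, n1p, np1, npp) are simply the joint count, the two marginal counts, and the total.

```python
def chi_squared_2x2(n11, n1p, np1, npp):
    """Pearson's chi-squared for a bigram's 2x2 table (illustrative sketch only).

    n11: times word1 and word2 occur together as a bigram
    n1p: times word1 occurs in first position
    np1: times word2 occurs in second position
    npp: total number of bigrams
    """
    # Observed cells: n11, n12, n21, n22, derived from the marginals.
    observed = [
        n11,
        n1p - n11,
        np1 - n11,
        npp - n1p - np1 + n11,
    ]
    row_totals = [n1p, npp - n1p]
    col_totals = [np1, npp - np1]
    # Expected cells under independence, in the same order as `observed`.
    expected = [r * c / npp for r in row_totals for c in col_totals]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

score = chi_squared_2x2(10, 20, 30, 100)  # made-up counts
print(round(score, 4))
```

The sketch assumes every expected cell is non-zero; a real tool would need to guard against empty rows or columns.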

Re: [ngram] Ignoring regex with no delimiters

2016-05-12 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
The regex in token should look like this : /\S+/ I think not having the / / is causing the delimiter errors... On Thu, May 12, 2016 at 2:11 AM, amir.jad...@yahoo.com [ngram] < ngram@yahoogroups.com> wrote: > > > I'm running count.pl on a set of unicode documents. Create a new > file('token')
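As a rough illustration of the two points in this reply (the pattern needs its surrounding slashes, and \S+ treats every run of non-whitespace characters as one token), here is a small Python sketch. It only mimics the idea with Python's `re` module; NSP itself is Perl and reads Perl regexes from the token file, and the helper `strip_delimiters` is hypothetical.

```python
import re

def strip_delimiters(line):
    """Return the pattern between the leading and trailing '/' of a token-file line.

    Hypothetical helper: raises ValueError when the slashes are missing,
    loosely mirroring the delimiter complaint described above.
    """
    line = line.strip()
    if len(line) < 2 or not (line.startswith("/") and line.endswith("/")):
        raise ValueError("token regex must be written as /.../: " + repr(line))
    return line[1:-1]

pattern = strip_delimiters(r"/\S+/")
tokens = re.findall(pattern, "don't stop, believing")
print(tokens)
```

Note that with \S+ punctuation stays attached to the neighbouring word ("stop," is one token), which may or may not be what a given experiment wants.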

Re: [ngram] How to recognize informative n-grams in a corpus?

2016-05-10 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
The Ngram Statistics Package is mostly intended to help you find the most frequent ngrams in a corpus, or the most strongly associated ngrams in a corpus. It doesn't necessarily directly give you informativeness, although you can certainly come up with ways to use frequency and measures of

Re: [ngram] count.pl for unicode documents

2016-05-10 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
Tokenization and the --token option are described here : http://search.cpan.org/~tpederse/Text-NSP/doc/README.pod#2._Tokens On Tue, May 10, 2016 at 8:14 AM, amir.jad...@yahoo.com [ngram] < ngram@yahoogroups.com> wrote: > > [Attachment(s) <#m_-6964475169159201585_TopText> from >

Re: [ngram] Upload files

2017-01-31 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
Text::NSP has a command line interface that allows you to provide a file or a folder/directory for input. There are some simple examples shown below that take a single file as input. That might be a good place to start, just to make sure everything is working as expected.

Re: [ngram] Upload files

2017-04-01 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
I think this mail was somehow delayed, but I hope this response is still useful. NSP has a command line interface. In general you specify the output file first, and the input file second. So if you want to write the output of count.pl to a file called myoutput.txt, and if your input text is

[ngram] Re: PMI Query

2017-05-14 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
Hi Julio, Thanks for your question. In NSP we are always counting ngrams, so the order of the words making up the ngram is considered. When we are counting bigrams (the default case for NSP) word1 is always the first word in a bigram, and word2 is always the second word. I think in other
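The effect of word order on counting can be shown with a few lines of Python. This is only a sketch of the idea that a bigram is an ordered pair; it is not how NSP is implemented, and the sample text is invented.

```python
from collections import Counter

# Made-up sample text in which the same two words appear in both orders.
tokens = "new york is not york new".split()

# Each bigram is (word1, word2): word1 is the first word, word2 the second.
bigram_counts = Counter(zip(tokens, tokens[1:]))

print(bigram_counts[("new", "york")])
print(bigram_counts[("york", "new")])
```

Because ("new", "york") and ("york", "new") are separate keys, their counts, and any association score derived from them, are kept apart rather than pooled.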

Re: [ngram] Re: Using huge-count.pl with lots of files

2018-04-17 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
reat don't as three tokens (don ' t) or one (don't). --stop lets you exclude words from being counted, but there isn't anything that lets you ignore case. On Tue, Apr 17, 2018 at 8:51 AM, Ted Pedersen <tpede...@d.umn.edu> wrote: > Hi Catherine, > > Here are a few answers to your qu

Re: [ngram] Re: Using huge-count.pl with lots of files

2018-04-17 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
issues to address...and of course if you try something and it does or doesn't work I'm very interested in hearing about that... Cordially, Ted On Tue, Apr 17, 2018 at 7:33 AM, Ted Pedersen <tpede...@d.umn.edu> wrote: > The good news is that our documentation is more reliable than

Re: [ngram] Using huge-count.pl with lots of files

2018-04-15 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
I guess my first thought would be to see if there is a simple way to compute the input you are providing to huge count into fewer files. If you have a lot of files that start with the letter 'a', for example, you could concatenate them all together via a (Linux) command like cat a* >

Re: [ngram] Re: Using huge-count.pl with lots of files

2018-04-15 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
Hi Catherine, Just to make sure I'm understanding what you'd like to do, could you send the command you are trying to run, and some idea of the number of files you'd like to process? Thanks! Ted On Sun, Apr 15, 2018 at 6:01 PM, catherine.dejage...@gmail.com [ngram] < ngram@yahoogroups.com>

Re: [ngram] Re: Using huge-count.pl with lots of files

2018-04-16 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
Let me go back and revisit this again, I seem to have confused myself! More soon, Ted On Mon, Apr 16, 2018 at 12:55 PM, catherine.dejage...@gmail.com [ngram] < ngram@yahoogroups.com> wrote: > > > Did I misread the documentation then? > > "huge-count.pl doesn't consider bigrams at file

[ngram] Re: Some questions about Text-NSP

2018-12-06 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
this makes some sense, but please feel free to follow up if it doesn't or if you think I may be misinterpreting something here. Cordially, Ted --- Ted Pedersen http://www.d.umn.edu/~tpederse On Sun, Nov 25, 2018 at 6:28 PM Ted Pedersen wrote: > > Thanks for these questions - all of the d

[ngram] Re: Some questions about Text-NSP

2018-11-25 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
know if you have any other questions. Cordially, Ted --- Ted Pedersen http://www.d.umn.edu/~tpederse On Sun, Nov 25, 2018 at 4:13 AM BLK Serene wrote: > > Hi, I have some questions about the association measures implemented in > Text-NSP: > > The Poisson-Sterlin

[ngram] Re: Some questions about Text-NSP

2018-11-25 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
sely. I'm not sure about the keyword extraction case, but if you have an example I'd be happy to think a little further about that as well! More soon, Ted --- Ted Pedersen http://www.d.umn.edu/~tpederse On Sun, Nov 25, 2018 at 11:32 AM BLK Serene wrote: > > Thanks for the clarification! >

[ngram] yahoo groups going away - ngram - Ngram Statistic Package

2019-10-21 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
interest in NSP over the years, and please do stay in touch. Cordially, Ted --- Ted Pedersen http://www.d.umn.edu/~tpederse