, if you happen to have code that uses NSP without a related
publication, and that code is distributed, we want to know about you too.
We'll have a separate section for software systems...
Happy New Year!
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
cygwin, like I say it's a great idea, but I think life will get
easier if you are able to run on a Linux machine.
Good luck, and let us know what happens!
Thanks,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
questions about this. Sorry for not
making this available sooner, Bridget did a nice job on this and it just
fell through the cracks!
Enjoy,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
, and some new
enhancements and features. As we get closer to starting that work, I'll be
posting our list of reported problems, etc. in order to make sure we
have caught everything. And of course, please feel free to let us know
of any other questions or concerns.
Ted
--
Ted Pedersen
http
A user is wondering about how to manually create input files for
statistic.pl ...
I have read your readme file which came with the package. It's well
written and quite understandable even for a person ignorant in the field
of Ngrams.
But unfortunately, although I quickly understood the
The following is a description of our plan of attack for the first stage
of the NSP redesign, that is to organize the measures in an object
oriented hierarchical fashion. The description below is written by Saiyam
Kohli. Your comments and questions are of course most welcome, especially
at this
to the bibliography!
Cordially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
An NSP user has the following idea:
--
I just thought it would be nice to have an option in NSP (specifically in
statistic.pl) to filter bigrams based on their p-values, like we currently
do by rank and score. Very often I need to find significant bigrams, and
it will be nice if I
to resolve these asap! We are hoping that the 0.75
release will be ready by mid-December.
Cordially,
Ted and Saiyam
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
you might have on this question :-)
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
expressions, but the others
are also allowed to match some others? I'm trying to get expressions
like drag and drop or press-and-hold, or create a new \w{4,}
Thanks, again!
Leonardo F. Fontenelle
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
0.0005 Getopt::Long::BEGIN
0.00 0.000 0.000 4 0. 0. Exporter::Heavy::heavy_export
0.00 - -0.000 1- - Getopt::Long::ConfigDefaults
0.00 - -0.000 1- - Getopt::Long::Configure
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
Greetings all,
I was corresponding with someone about the --window option in count.pl,
and realized that this might be of general interest to NSP users, so
I have modified that note slightly and sent it here.
When you are counting up the bigrams in a corpus, you can specify a
--window size
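
To illustrate the --window idea mentioned above: as I understand the NSP documentation, a window of N lets a bigram be formed from any two tokens that occur, in order, within N consecutive tokens. The following is only a rough Python sketch of that idea under that assumption, not the code count.pl actually uses; the function name windowed_bigrams is made up for the example.

```python
from collections import Counter


def windowed_bigrams(tokens, window=2):
    """Count ordered bigrams whose two tokens fall within `window` tokens.

    window=2 gives ordinary adjacent bigrams; larger windows also pair a
    token with tokens further to its right (order is always preserved).
    """
    if window < 2:
        raise ValueError("window must be at least 2 for bigrams")
    counts = Counter()
    for i, first in enumerate(tokens):
        # Pair the token at position i with each of the next window-1 tokens.
        for second in tokens[i + 1 : i + window]:
            counts[(first, second)] += 1
    return counts


tokens = "the quick brown fox".split()
print(windowed_bigrams(tokens, window=2))  # adjacent pairs only
print(windowed_bigrams(tokens, window=3))  # also pairs that skip one token
```

With window=3 the pair (the, brown) is counted in addition to the adjacent pairs, which is why larger windows produce many more bigram types.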
sections 2 and 3 in the README. They describe how to set your
own tokenization scheme.
http://search.cpan.org/src/TPEDERSE/Text-NSP-1.03/README
Cordially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
--
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
-- Forwarded message --
Date: Sat, 23 Dec 2006 10:34:03 -0800
From: Jonathan Leffler [EMAIL PROTECTED]
To: [EMAIL PROTECTED], Bugs in CPANPLUS via RT [EMAIL PROTECTED],
[EMAIL PROTECTED]
Subject: Problem with CPANPLUS 0.076
regards,
Dipl.-Inf. Richard Jelinek
- The PetaMem Group - Prague/Nuremberg - www.petamem.com -
-= 2007-09-25: 49235653 Mind Units =-
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
On Fri, Feb 6, 2009 at 1:31 AM, reshmijose reshmij...@yahoo.com wrote:
can you tell me how command line arguments like 'count.pl output.txt
input.txt' are defined? Are they defined within the source code?
What should i do if i want to run the program count.pl separately?
--
Ted Pedersen
? If it is a
running text, how does it identify the end of the sentence?
Thanks
Jayaram
--- On Thu, 2/5/09, Ted Pedersen duluth...@gmail.com wrote:
From: Ted Pedersen duluth...@gmail.com
Subject: Re: [ngram] No ngram over sentence
To: ngram@yahoogroups.com
Date: Thursday, February 5, 2009, 9
this can be quite fun. You could also use egrep to
specify regular expression patterns to search for (rather than just
strings), but I find grep to be a nice starting point.
I hope this is helpful!
Cordially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
should process those with separate runs of count.pl.
I hope this helps!
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
Hi Gunn,
You might be hitting a peculiar bug we noticed late last year (which still
hasn't been fixed).
http://tech.groups.yahoo.com/group/ngram/message/240
If you run using just pmi in the command line, do your results agree with your
Lisp code?
If there is still disagreement, let's run some
inputfile )
produces the following line
atdeter1 6.4127 262744 7073841 9391062 5872364 1234064 647295 1064083
Best,
Gunn
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
decision to use base-10 logarithms. Note that all
versions should still give the same ranking of candidates, so that's a
robust test case.
Cheers,
Stefan (Evert)
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
that are at the beginning of the next sentence. That is, ngram without
containing line breaks.
Best wishes,
Mercè
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
of line markers?
Thanks a lot!
Mercè
--- In ngram@yahoogroups.com, Ted Pedersen duluth...@... wrote:
Greetings Merce,
To make sure I understand correctly, it sounds like you *only* want to
see those ngrams that contain a line break. For example, if you run
count.pl as follows on your test file
--- In ngram@yahoogroups.com, Amada Eliseo amadaeli...@... wrote:
Hello to all,
I appreciate your work.
Please. Can someone help me to identify significant collocations in some
text. I would like to use text-nsp, but I don't know how to do it.
Thank you.
If you are just starting
step is to do this with a larger data file and see
if the above rules of thumb continue to hold...
More soon,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
-- Forwarded message --
From: r...@imsc.res.in r...@imsc.res.in
Date: Dec 5, 2009 11:59 AM
Subject: Re: nsp/trigram signficance
To: Ted Pedersen duluth...@gmail.com
May I request you to forward this to list ? I don't use Yahoo and it
seems posting to the mailing list
11583688 xie200108.txt
8867 1728337 10179257 xie200109.txt
9437 1793786 10614699 xie200110.txt
1740339343 1995533 xie200111.txt
679007 132786005 783905863 total
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
/%7Ewestburylab/
University of Alberta
780-492-5843
=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
- clear<>back<>283 1115 733
back<>signal<>157 380 9176
- clear<>forward<>632 1115 877
forward<>signal<>493 1547 9176
Thanks a lot,
Mercè
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
interest?
Thanks for your help,
Mercè
--- In ngram@yahoogroups.com, Ted Pedersen tpederse@... wrote:
Hi Merce,
Yes, indeed, you can do as you describe. This gets into some important
details about regular expressions that I'm happy to have a chance to
mention. In the default stoplist
, Portland, Oregon.
http://www.d.umn.edu/~tpederse/Pubs/pedersen-disco2011.pdf
So, if you are at ACL 2011 please consider attending these events, or
catch up with us some other time.
Hoping to see you in Portland,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
for a
4-gram made up of words A B C and D?
Mi(ABCD) = log(P(ABCD) / (P (A) x P (B) x P (C) x P (D)))
If not, what is a better way? Why is this bad?
Thanks for your help,
Cyrus
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
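
For readers who want to experiment with the formula in the question above, here is a small Python sketch that computes exactly that quantity from raw counts. It is only an illustration of the formula as written, not how NSP computes any of its measures. The function name, the choice of base-2 logarithms, and the use of a single total N for all probabilities are my assumptions. Note that this compares the observed 4-gram only against the model in which all four words are fully independent.

```python
import math


def mi_full_independence(joint_count, word_counts, total):
    """log2( P(w1..wn) / (P(w1) * ... * P(wn)) ) estimated from counts.

    joint_count : how often the whole ngram was observed
    word_counts : how often each individual word was observed
    total       : sample size used to turn counts into probabilities
    """
    p_joint = joint_count / total
    p_independent = 1.0
    for count in word_counts:
        p_independent *= count / total
    return math.log2(p_joint / p_independent)


# Made-up counts, purely to show the call.
print(mi_full_independence(joint_count=5, word_counts=[50, 40, 30, 20], total=10000))
```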
- will attend the DisCo workshop and give a
talk showing how I used NSP to participate in the shared task on identifying
semantic compositionality.
http://disco2011.fzi.de/
Cordially,
Ted
---
Ted Pedersen
http://www.d.umn.edu/~tpederse
/seremina/edu/nsp-rom.html
Enjoy!
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
cellular<>phones<>96 214 384
mobile phones
cellular phones
2. I need to remove punctuation . and , I've tried within my stopword
list, but I don't have the tags quite right. How should I enter into
my stop file?
Thanks!
Patrick
--
Ted
Hi Patrick,
NSP makes no real distinction between punctuation and words, so if you do
not do anything with tokenization via --token or --nontoken or
preprocessing, the punctuation marks will be treated just like words and
will affect your results. --token and --nontoken essentially remove them
PhD Student in Computational Linguistics
University of Gothenburg, Sweden
--
*Från:* duluth...@gmail.com [duluth...@gmail.com] för Ted Pedersen [
tpede...@d.umn.edu]
*Skickat:* den 6 februari 2013 03:35
*Till:* ngram@yahoogroups.com
*Cc:* Karin Cavallin
*Ämne
A user reports a bug in rank.pl. This seems to occur when dealing with
smaller files, for example...
marimba(49): more x
first<>bigram<>1 4.000 1 1
second<>bigram<>2 3.000 2 2
extra<>bigram1<>3 2.000 3 3
third<>bigram<>4 1.000 4 4
marimba(50): more y
second<>bigram<>1 4.000 2 2
extra<>bigram2<>2 3.000 4 4
first<>bigram<>3
, Ted Pedersen tpederse@... wrote:
Merce, I got an email error when responding directly to your yahoo.es
account. Could you follow up with another email address or use the
group...?
Thanks,
Ted
-- Forwarded message --
From: Ted Pedersen tpederse@...
Date: Wed, Mar 27, 2013
/perl/beginners/145nsxqz2w/cpan-unavailable
When we started using the search site in about 2002 it was pretty great.
The good news is that https://metacpan.org is even better, so this is a
positive change.
Thanks,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
Very nice news for users of NSP and Text::Similarity! Please support these
resources by giving them a try and letting others know about them too.
Cordially,
Ted
-- Forwarded message --
From: Marta Villegas marta.ville...@upf.edu
Date: Mon, Sep 22, 2014 at 4:13 AM
Subject: Ngrams
Hi Arnaud,
There is nothing new for more recent versions - the same solutions proposed
for earlier versions are still relevant (and still the best available
options). You can find some discussion of those here (via the NSP mailing
list):
?node_id=1077762
You can download the most current version of NSP from CPAN or
Sourceforge by following the links here :
http://ngram.sourceforge.net
Please let us know if any questions arise.
Cordially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
links on this page.
http://ngram.sourceforge.net
Enjoy,
Ted
On Sat, Oct 3, 2015 at 5:30 PM, Ted Pedersen <duluth...@gmail.com> wrote:
> We are pleased to announce a new release of Text::NSP, the Ngram
> Statistics Package. This is a very minor bug fix release, but might be
> som
CHI is a parent class, and not intended to be used as a measure. Rather,
the measures x2, pmi, and tscore are the end user measures which you can
run (and they all access that CHI class). So, if your goal is to run the
chi squared test, you can do that with the x2 measure, as in:
statistic.pl x2
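
As a rough illustration of what a chi-squared score for a bigram involves, here is a minimal Python sketch of Pearson's chi-squared statistic on a 2x2 table. It is not the code inside the x2 measure. The argument names n11, n1p, np1 and npp follow what I believe is the usual NSP notation (joint count, count of word1 in first position, count of word2 in second position, total bigrams), but treat that mapping as an assumption.

```python
def pearson_chi_squared(n11, n1p, np1, npp):
    """Pearson's chi-squared for a 2x2 bigram contingency table.

    Assumes every marginal total is non-zero; otherwise an expected
    value is 0 and the division below fails.
    """
    # Remaining marginals and observed cells, derived from the inputs.
    n2p = npp - n1p
    np2 = npp - np1
    n12 = n1p - n11
    n21 = np1 - n11
    n22 = npp - n1p - np1 + n11

    observed = [n11, n12, n21, n22]
    # Expected counts if word1 and word2 occurred independently.
    expected = [
        n1p * np1 / npp,
        n1p * np2 / npp,
        n2p * np1 / npp,
        n2p * np2 / npp,
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts, purely to show the call.
print(pearson_chi_squared(n11=20, n1p=20, np1=20, npp=40))
```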
The regex in token should look like this :
/\S+/
I think not having the / / is causing the delimiter errors...
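
To make the point about the / / delimiters concrete, here is a small Python sketch of reading a token file where every line must be a regex wrapped in slashes, and then tokenizing with it. This is only my approximation of the behaviour described in the README (try each regex at the current position, take the first that matches, otherwise skip a character), not NSP's Perl code, and Python's regex syntax differs from Perl's in places. Both function names are made up for the example.

```python
import re


def load_token_regexes(lines):
    """Compile token-file lines of the form /regex/, rejecting bare regexes."""
    regexes = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if len(line) < 2 or not (line.startswith("/") and line.endswith("/")):
            raise ValueError(f"token regex must be wrapped in / /: {line!r}")
        regexes.append(re.compile(line[1:-1]))
    return regexes


def tokenize(text, regexes):
    """Take the first regex that matches at the current position as a token."""
    tokens = []
    pos = 0
    while pos < len(text):
        for regex in regexes:
            match = regex.match(text, pos)
            if match and match.end() > pos:
                tokens.append(match.group())
                pos = match.end()
                break
        else:
            # Nothing matched here: treat this character as a non-token.
            pos += 1
    return tokens


print(tokenize("press-and-hold or drag and drop", load_token_regexes([r"/\S+/"])))
```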
On Thu, May 12, 2016 at 2:11 AM, amir.jad...@yahoo.com [ngram] <
ngram@yahoogroups.com> wrote:
>
>
> I'm running count.pl on a set of unicode documents. Create a new
> file('token')
The Ngram Statistics Package is mostly intended to help you find the most
frequent ngrams in a corpus, or the most strongly associated ngrams in a
corpus. It doesn't necessarily directly give you informativeness, although
you can certainly come up with ways to use frequency and measures of
Tokenization and the --token option are described here :
http://search.cpan.org/~tpederse/Text-NSP/doc/README.pod#2._Tokens
On Tue, May 10, 2016 at 8:14 AM, amir.jad...@yahoo.com [ngram] <
ngram@yahoogroups.com> wrote:
>
> [Attachment(s) <#m_-6964475169159201585_TopText> from
>
Text::NSP has a command line interface that allows you to provide a file or
a folder/directory for input. There are some simple examples shown below
that take a single file as input. That might be a good place to start, just
to make sure everything is working as expected.
I think this mail was somehow delayed, but I hope this response is still
useful.
NSP has a command line interface. In general you specify the output file
first, and the input file second. So if you want to write the output of
count.pl to a file called myoutput.txt, and if your input text is
Hi Julio,
Thanks for your question. In NSP we are always counting ngrams, so the
order of the words making up the ngram is considered. When we are counting
bigrams (the default case for NSP) word1 is always the first word in a
bigram, and word2 is always the second word. I think in other
reat don't as
three tokens (don ' t) or one (don't). --stop lets you exclude words from
being counted, but there isn't anything that lets you ignore case.
On Tue, Apr 17, 2018 at 8:51 AM, Ted Pedersen <tpede...@d.umn.edu> wrote:
> Hi Catherine,
>
> Here are a few answers to your qu
issues to address...and of course if you try something and it does or
doesn't work I'm very interested in hearing about that...
Cordially,
Ted
On Tue, Apr 17, 2018 at 7:33 AM, Ted Pedersen <tpede...@d.umn.edu> wrote:
> The good news is that our documentation is more reliable than
I guess my first thought would be to see if there is a simple way to
compute the input you are providing to huge count into fewer files. If you
have a lot of files that start with the letter 'a', for example, you could
concatenate them all together via a (Linux) command like
cat a* >
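
If you are not on Linux, the same concatenation idea can be sketched in Python. This is just one possible way to do it; the function name and the destination file are hypothetical and you would pick your own.

```python
import shutil
from pathlib import Path


def concatenate_matching(directory, pattern, destination):
    """Append every file in `directory` matching `pattern` to `destination`.

    Files are copied in sorted name order; the destination itself is
    skipped in case it also matches the pattern.
    """
    dest = Path(destination).resolve()
    sources = sorted(
        p for p in Path(directory).glob(pattern)
        if p.is_file() and p.resolve() != dest
    )
    with open(dest, "wb") as out:
        for source in sources:
            with open(source, "rb") as handle:
                shutil.copyfileobj(handle, out)
    return sources
```

For example, concatenate_matching("corpus", "a*", "merged-a.txt") would merge all files in a (hypothetical) corpus directory whose names start with 'a' into one file.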
Hi Catherine,
Just to make sure I'm understanding what you'd like to do, could you send
the command you are trying to run, and some idea of the number of files
you'd like to process?
Thanks!
Ted
On Sun, Apr 15, 2018 at 6:01 PM, catherine.dejage...@gmail.com [ngram] <
ngram@yahoogroups.com>
Let me go back and revisit this again, I seem to have confused myself!
More soon,
Ted
On Mon, Apr 16, 2018 at 12:55 PM, catherine.dejage...@gmail.com [ngram] <
ngram@yahoogroups.com> wrote:
>
>
> Did I misread the documentation then?
>
> "huge-count.pl doesn't consider bigrams at file
this makes some sense, but please feel free to follow up if it
doesn't or if you think I may be misinterpreting something here.
Cordially,
Ted
---
Ted Pedersen
http://www.d.umn.edu/~tpederse
On Sun, Nov 25, 2018 at 6:28 PM Ted Pedersen wrote:
>
> Thanks for these questions - all of the d
know if you have any other
questions.
Cordially,
Ted
---
Ted Pedersen
http://www.d.umn.edu/~tpederse
On Sun, Nov 25, 2018 at 4:13 AM BLK Serene wrote:
>
> Hi, I have some questions about the association measures implemented in
> Text-NSP:
>
> The Poisson-Sterlin
sely.
I'm not sure about the keyword extraction case, but if you have an
example I'd be happy to think a little further about that as well!
More soon,
Ted
---
Ted Pedersen
http://www.d.umn.edu/~tpederse
On Sun, Nov 25, 2018 at 11:32 AM BLK Serene wrote:
>
> Thanks for the clarification!
>
interest in NSP over the years, and please do stay in
touch.
Cordially,
Ted
---
Ted Pedersen
http://www.d.umn.edu/~tpederse