ersions and
> not difficult to implement.
>
> Thanks in advance.
>
> Begoña
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
lue obtained from
> the standard normal table.
>
>
>
> Back to my question: I looked at the script rank.pl and I think it
> returns r_s in the formula above, that is, the ordinary Spearman's rank
> correlation coefficient. But when n > 100, rank.pl should return the
> z value mentio
e discrepancy resolved so that I
> can go ahead and describe the circumstances of the incorrect count
> in detail.
Hopefully the above clarifies that. Go ahead and describe! It would help
to know all the command line settings you are using, as well as having a
reduced version of your input
z = r_s * sqrt(n - 1), with r_s being Spearman's rho and n the number of
pairs (z is identical to M in Agresti, 2002, p. 87).
The z value can then be compared with the critical value obtained from
the standard normal table.
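
As a rough illustration of the formula above, here is a minimal Python
sketch. It is not taken from rank.pl; the function names and the 0.05
significance level are assumptions chosen for the example. It converts
r_s to z and compares it with the two-sided critical value of the
standard normal distribution.

```python
import math
from statistics import NormalDist


def spearman_z(r_s, n):
    """Large-sample z statistic for Spearman's rho: z = r_s * sqrt(n - 1)."""
    if n < 2:
        raise ValueError("need at least two pairs")
    return r_s * math.sqrt(n - 1)


def is_significant(r_s, n, alpha=0.05):
    """Two-sided comparison of z with the standard normal critical value."""
    # inv_cdf(0.975) is roughly 1.96, the usual table value for alpha = 0.05.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(spearman_z(r_s, n)) > critical


# Illustrative values only: rho = 0.5 over 101 pairs gives z = 0.5 * sqrt(100).
print(spearman_z(0.5, 101))
print(is_significant(0.5, 101))
```

Looking the critical value up with NormalDist simply replaces the printed
table; any other alpha can be passed in the same way.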
------
ersion of NSP as provided by
SourceForge, please note that the current version is now found in NSP69.
Please let us know if you have any questions or comments!
Enjoy,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
rd sequences that occur next to each other
without regard to any ordering.
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
reply. Suffice to say I think this is a very
interesting issue. Other thoughts and comments on the above are most
welcome (and if I've flubbed up anything *please* let me know!).
Cordially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
rhaps, but I'm really wanting to try
and do things in the most "standard" way possible in terms of
installation, path searching, etc.
I'll be looking at this, and if anyone has any insights into this issue
I'd be interested to hear them.
Cordially,
Ted
--
Ted Pederse
s is a slightly different way of
using modules, in that we specify the name of the module from the command
line...
statistic.pl ll.pm myoutput myinput
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
n22 | n2p
---------
np1 np2 | npp
So the phi coefficient takes the difference between the products of the
diagonals, squares that, and then divides by the product of all the
marginals.
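
The description above can be sketched in a few lines of Python. This is
only an illustration of the arithmetic, not the code of the NSP module;
the function name is made up, and the variable names just follow the
n11 ... npp notation of the table. Note that the quantity described
(squared difference over the product of the marginals) is phi squared.

```python
def phi_squared(n11, n12, n21, n22):
    """Phi coefficient (squared) of a 2x2 contingency table.

    Numerator: squared difference between the products of the diagonals.
    Denominator: product of the four marginal totals.
    """
    n1p = n11 + n12  # first row total
    n2p = n21 + n22  # second row total
    np1 = n11 + n21  # first column total
    np2 = n12 + n22  # second column total
    denominator = n1p * n2p * np1 * np2
    if denominator == 0:
        raise ValueError("a marginal total is zero, phi is undefined")
    return (n11 * n22 - n12 * n21) ** 2 / denominator


# Illustrative counts only: marginals are 3, 13, 4 and 12.
print(phi_squared(1, 2, 3, 10))
# A table with all mass on one diagonal is perfectly associated.
print(phi_squared(5, 0, 0, 5))
```

The explicit check on the denominator is a design choice of this sketch;
an empty row or column makes the measure undefined rather than zero.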
Thanks,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
"N-gram
> model".
>
> Is it possible that someone on this list might take mercy on me and
> either mention the kinds of applications that can be made from N-gram
> models or point me to some literature that a literate non-linguist
> might understand? If so, I would be gre
A user writes:
> And apropos to your "note on working with non-English alphabets": Some
> students had problems with Norwegian characters, even with the latest
> version. The solution was to set their "LANG" environment variable properly (on
> many systems it defaults to English). Could maybe be included
sts
(for now) require that n22 > n11.
1   2 |  3
3  10 | 13
----------
4  12 | 16
Cordially,
Ted
PS NSP turns 4 years old on November 30. Big party in Duluth, you are
all invited. :)
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
Forwarded on behalf of Leonoor...
-- Forwarded message --
Date: Fri, 12 Nov 2004 16:37:15 +0100 (CET)
From: Beek L.J.van der <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Cc: ted pedersen <[EMAIL PROTECTED]>, [EMAIL PROTECTED]
Subject: Re: [Corpora-List] fisher's e
is rising in priority even as we speak, so hopefully we'll
have a fix for it shortly. (A fix has been in fact suggested by a user,
and that is posted to the list, but we'll go back and double check things
just to be sure).
OK, sorry for all this. I'm sure some of this is already
nt email (from yesterday even!) about this situation, or
you can go back into the archives and find:
http://groups.yahoo.com/group/ngram/message/15
http://groups.yahoo.com/group/ngram/message/17
This problem is rising in priority even as we speak, so hopefully we'll
have a fix f
)?
>
> And then for x2 and 11, are the higher-ranked bigrams more
> strongly dependent (although we'll have to look up their score in the
> table to know if their dependence is statistically significant)?
>
>
> Thanks very much,
> Kate
>
>
> On Fri, 12 N
ing
available by Jan 21.
Also, if you happen to have code that uses NSP without a related
publication, and that code is distributed, we want to know about you too.
We'll have a separate section for software systems...
Happy New Year!
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
Hi Marie,
You have confused me (Ted Pedersen) with Jan Pedersen, now of Yahoo.
Name ambiguity abounds in the internet, so don't feel badly. In fact,
even search engines have a hard time with things like this. :)
Enjoy the meeting, I'm sure it will be good, although I don't kno
operations you should carry out.
I hope this helps. Let us know if it doesn't. I have limited experience
with cygwin, but it should be equivalent to Unix and Linux in regards
to paths and setting them and so forth (I think).
Cordially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpede
evolve more nicely than cygwin. This is not
to dismiss cygwin, like I say it's a great idea, but I think life will get
easier if you are able to run on a Linux machine.
Good luck, and let us know what happens!
Thanks,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
feedback and tips like these
are particularly useful and appreciated.
Cordially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
Yahoo! Groups Links
<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/ngram/
<*> To unsubscribe from this group, send
u have any questions about this. Sorry for not
making this available sooner, Bridget did a nice job on this and it just
fell through the cracks!
Enjoy,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
roblems, etc. in order to make sure we
have caught everything. And of course, please feel free to let us know
of any other questions or concerns.
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
regarding the log -
likelihood measure and Pearson's chi-squared test, and I also
fixed a long standing bug in our Phi coefficient documentation.
Enjoy, and let us know if you have any additional questions or concerns!
Ted and Mahesh
--
Ted Pedersen
http://www.d.umn.edu/~tpe
A user is wondering about how to manually create input files for
statistic.pl ...
> I have read your readme file which came with the package. It's well
> written and quite understandable even for a person ignorant in the field
> of Ngrams.
> But unfortunately, although I quickly understood the ge
ent to the
ngram/NSP mailing list, for example) please feel free to contribute any
thoughts you have to the mailing list. We will send a summary of what we
plan as soon as final decisions are made.
Cordially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
The following is a description of our plan of attack for the first stage
of the NSP redesign, that is to organize the measures in an object
oriented hierarchical fashion. The description below is written by Saiyam
Kohli. Your comments and questions are of course most welcome, especially
at this ti
o the bibliography!
Cordially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
An NSP user has the following idea:
--
I just thought it would be nice to have an option in NSP (specifically in
statistic.pl) to filter bigrams based on their p-values, like we currently
do by rank and score. Very often I need to find "significant" bigrams, and
it will be nice if I c
I'd
> like to call NSP's log-likelihood code to do the calculation for me.
> Any hints on this for me?
>
> Thanks,
> P
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
ase let us know if you have any questions.
Cordially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
n, etc.) but
we made some choices that weren't so good, and clearly very non-standard.
Rest assured we want to resolve these asap! We are hoping that the 0.75
release will be ready by mid-December.
Cordially,
Ted and Saiyam
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
---
e'll certainly make sure 0.75 is also more
standard with respect to Makefile.PL issues as well.
Cordially,
Ted and Saiyam
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
any case, it strikes me as a minor modification to make this work, so
we'll plan on including this in the next release.
Questions or comments are of course welcome!
Thanks,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
r's Problem). I found it helpful to re-read what
I wrote years ago. :)
Enjoy,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
d above work with count.pl as well.
It is fun to play with tokenization, and it's one of my favorite features
about NSP. So, if you have any questions about any of the above, I'd be
very happy to describe it in more detail.
Meanwhile, have fun. :)
Cordially,
Ted
--
Ted Pedersen
http://www.
d and Mahesh
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
es for each pair of names. As each
> ratio is computed, I'll test it for a threshold and if the pair
> exceeds a threshold, I'll push it to an array. Repeat for the 2nd
> name in the list, 3rd name in the list, etc.
>
> Thanks in advance for any wisdom you might have on this
usion during installation.
Here is a more detailed ChangeLog :
http://search.cpan.org/src/TPEDERSE/Text-NSP-0.91/CHANGES
Please give this new version a try, and let us know what you think!
Enjoy,
Ted and Saiyam
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
979 5 6 25
> > human<>beings<>11 38.0403 3 5 3
> > ...
> > ...
> >
> > This file also contains the expressions but they are sorted in the
> > order of their significance and the first number after each bigram is
> > the score given by the statistical measure used, which in this case is
> > the Log Likelihood(ll) me
falls in changing that? Is this
> something that will come in a future release?
>
> thanks very much,
> ilya
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
grams in which some
> (i.e. 2) tokens must match some regular expressions, but the others
> are also allowed to match some others? I'm trying to get expressions
> like "drag and drop" or "press-and-hold", or "create a new \w{4,}"
>
> Thanks, again!
/TPEDERSE/Text-NSP-0.95/Docs/NSP-Class-diagram.pdf
Please let us know if you have any questions or comments on this new
release.
Enjoy,
Ted and Saiyam
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
better
performance than the last few releases (0.91, 0.93, and 0.95).
Please let us know if you have any comments, suggestions, or questions!
Cordially,
Ted and Saiyam
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
0.0005 Getopt::Long::BEGIN
0.00 0.000 0.000 4 0. 0. Exporter::Heavy::heavy_export
0.00 - -0.000 1- - Getopt::Long::ConfigDefaults
0.00 - -0.000 1- - Getopt::Long::Configure
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
are being added to the /t directory, and are run when
you submit :
make test
So you can do both "make test" and the running of the /Testing scripts
to test your installation.
I will fix this permission issue with our next release.
Enjoy,
Ted
--
Ted Pedersen
http://www
e are caused by very minor differences in the scores reported on
this system versus our normal testing environment (which includes a
number of 64 bit machines, so this puzzles us a little).
Please do let us know of any other questions or concerns you may have!
Cordially,
Ted
--
Ted Pedersen
d any version of NSP on
Mac OS, it would be great to know that (and if there was
anything at all out of the ordinary).
Thank you!
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
rtly. If you encounter this divide by zero error please
let us know about it. We only encountered it when doing some rather
unusual experiments, and have not seen this in our more normal uses of
NSP, where count.pl is used to obtain frequency count information from
corpora.
Thanks,
Ted
--
Ted P
est
27
for<>our<>1 9.3073 10 10 19
Please let us know if you have any questions, concerns, or additional
information!
Cordially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
d, or if any work is included there that
should not be. The basic criterion for inclusion is that the paper
should mention having used NSP in some way, and include a citation
either to an NSP URL or paper (most likely the CICLING-2003 paper by
Banerjee and Pedersen).
Thanks!
Ted
--
Ted Pedersen
pls help me
I hope this helps. There are no right answers to the questions you pose,
but I would encourage you to use NSP for whatever you are doing here. I
think you'll be able to do a lot of experiments very quickly, and that
will help you figure out what makes the most sense for you in terms
ou can find the bibliography at:
http://www.d.umn.edu/~tpederse/nsp-bib/
Please let us know if you have any questions!
Enjoy,
Ted and Saiyam
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
Greetings all,
I was corresponding with someone about the --window option in count.pl,
and realized that this might be of general interest to NSP users, so
I have modified that note slightly and sent it here.
When you are counting up the bigrams in a corpus, you can specify a
--window size tha
> Siham
>
Hi Siham,
Check out sections 2 and 3 in the README. They describe how to set your
own tokenization scheme.
http://search.cpan.org/src/TPEDERSE/Text-NSP-1.03/README
Cordially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
files so as to
> avoid editing every single release? ;o)
>
> BTW, thanks again for making this package available to the community,
> this is great stuff.
>
> Regards,
> Patrick
>
--
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
FYI, please keep an eye on this rewrite!!
Cordially,
Ted
-- Forwarded message --
Date: Thu, 30 Nov 2006 22:45:42 +0100
From: Björn Wilmsmann <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: Major overhaul of TextNSP
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
Dear Mr. Peder
hread.run(Unknown Source)
>
> Suggestions?
>
> - Bob Futrelle
>
> Robert P. Futrelle
> Associate Professor
> Biological Knowledge Laboratory
> College of Computer and Information Science
> Northeastern University MS WVH202
> 360 Huntington Ave.
> Boston, MA
--
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
-- Forwarded message --
Date: Sat, 23 Dec 2006 10:34:03 -0800
From: Jonathan Leffler <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], Bugs in CPANPLUS via RT <[EMAIL PROTECTED]>,
[EMAIL PROTECTED]
Subject: Problem w
r.pm not found) at D:/ActivePerl/perl/lib/ExtUtils/MM_Unix.pm line 1676.
> Could not open 'lib/Text/NSP.pm': No such file or directory at
> D:/ActivePerl/perl/lib/ExtUtils/MM_Unix.pm line 2669.
>
>
> Does anybody have any idea how it can be installed on Windows instead of Linux?
>
> regards
> Madiha
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
do
that.
We would be happy to see NSP modified and redistributed, and we hope
the GPL allows you to do that in a way that you are happy with. If you
feel that this isn't the case, do let us know what your concerns are
and how you think they might best be resolved.
Enjoy,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
A PS of sorts...
There are various nice discussions of Perl profiling around, I mention
one below. I think it's actually perl -d rather than -D...anyway,
details here:
http://www.perl.com/pub/a/2004/06/25/profiling.html
Also, here's an example of how we used Perl profiling to guide us in
earlie
> http://www.topicalizer.com/files/TextNSP/Text-NSP-1.10.tar.gz
>
> Any input would be appreciated.
>
> --
> Best regards,
> Bjoern Wilmsmann
>
>
>
>
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
motivation for this change?
Thanks,
Ted
On 5/15/07, Ted Pedersen <[EMAIL PROTECTED]> wrote:
> Greetings again,
>
> I was able to run all the way through the Testing/ALL-TESTS.sh script,
> and I posted the output from that to :
>
> http://www.d.umn.edu/~tpederse/nsp110-all
00 0.000 0.000 3 0. 0. Exporter::Heavy::heavy_export
On 5/15/07, Ted Pedersen <[EMAIL PROTECTED]> wrote:
>
>
>
>
>
>
>
> A PS of sorts...
>
> There are various nice discussions of Perl profiling around, I mention
> one below. I think it's a
om you.
Finally, if I have messed up any of the above, please don't hesitate
to correct or comment on it. I think it's best that we discuss these
sorts of issues from time to time, just to make sure everyone is
thinking of them.
Most of us aren't lawyers, and this sort of stuff gets pr
ant to do things in such a
way that doesn't significantly impede the use of NSP
2) yes
3) GPL
4) $0 (USD) :)
Thanks,
Ted
On 5/17/07, Richard Jelinek <[EMAIL PROTECTED]> wrote:
>
> Greetings Ted,
>
> On Wed, May 16, 2007 at 10:50:59AM -0500, Ted Pedersen wrote:
> [..
s in proteins.
> Afterwards I want to use the result in a neural network.
>
> Does anyone know how to use the results of count.pl as input data for a
> neural network?
>
> I need help ........
>
> Thanks .. William
>
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
ll comments above are helpful.
Cordially,
Ted
>
> -- Thank you,
> Mary D. Taffet
> Ph.D. Candidate/Syracuse University, School of Information Studies
> Scientist/TextWise LLC
> Syracuse, NY
>
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
> Gly<>Ala<>76 287 470
> Lys<>Lys<>73 479 479
>
> I know what each row and column of this file means,
> but how can I use this data for a neural network?
> Which modifications are necessary? What is the way to do it?
>
> Cordially,
>
> W
nctuation
> > for you...
> >
> > @stop.mode=OR
> > /\b[,.?!'"`]+\b/
> > /\b\d+\b/
> >
> > Each line in this file after the mode line is a Perl regular expression
> > that shows what you would like to remove.
> >
> >
>
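
To make the stop list idea concrete, here is a small Python sketch of how
such a file could be read and applied. It is not NSP's own code: the
function names are hypothetical, Python's re module stands in for Perl
regular expressions (the two differ in some details), and the handling of
the modes reflects my reading of the NSP README, namely that in OR mode
an ngram is dropped if any of its tokens matches a stop regex, while in
AND mode (assumed here to be the default) every token must match.

```python
import re


def parse_stop_list(text):
    """Return (mode, compiled_patterns) from stop-list text.

    Each /regex/ line becomes one pattern; '@stop.mode=' sets the mode.
    """
    mode = "AND"  # assumption: AND when no mode line is given
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("@stop.mode="):
            mode = line.split("=", 1)[1].strip().upper()
        elif len(line) >= 2 and line.startswith("/") and line.endswith("/"):
            patterns.append(re.compile(line[1:-1]))
    return mode, patterns


def is_stopped(tokens, mode, patterns):
    """True if the ngram made of these tokens should be removed."""
    if not tokens or not patterns:
        return False
    hits = [any(p.search(tok) for p in patterns) for tok in tokens]
    return any(hits) if mode == "OR" else all(hits)


# Hypothetical stop list: pure punctuation tokens and pure digit tokens.
STOP_TEXT = r"""
@stop.mode=OR
/^[,.?!]+$/
/^\d+$/
"""

mode, patterns = parse_stop_list(STOP_TEXT)
print(is_stopped(["Obama", "."], mode, patterns))
print(is_stopped(["human", "beings"], mode, patterns))
```

With the mode switched to AND, the same bigram Obama<>. would be kept,
since only one of its two tokens matches.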
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
ially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
ginning of the Perl
> > scripts fixed the problems I had for French. You can have a look at
> > this for more information :
> >
> > http://tech.groups.yahoo.com/group/ngram/message/159
> >
> > Hope this helps...
> >
> > Regards,
> > Patrick
> >
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
> >
> > > Mercè
> > >
> > > >
> > > > Mercè,
> > > >
> > > > I have not checked the latest version of NSP to see if count.pl and the
> > > > other files contain "use locale;" as I suggested some time ago. The
> > > > simple inclusion of such a statement at the beginning of the Perl
> > > > scripts fixed the problems I had for French. You can have a look at
> > > > this for more information :
> > > >
> > > > http://tech.groups.yahoo.com/group/ngram/message/159
> > > >
> > > > Hope this helps...
> > > >
> > > > Regards,
> > > > Patrick
> > > >
> > >
> >
> >
> > --
> > Ted Pedersen
> > http://www.d.umn.edu/~tpederse
> >
>
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
Greetings all,
I'm in the process of collecting up the various bug reports that we've
gotten since version 1.03 was released in September 2006, and I'll
resolve those in 1.05. Here's what I have so far...
1) Incorporate "use locale" throughout package (suggested by Patrick
Drouin long ago). This w
f you recall something else, or there is some feature or change you
> > are interested in seeing, please let me know. As you can tell NSP
> > releases have slowed considerably in recent years, so this is likely
> > to be the only release for some time to come, so please do let me know
> > asap if there are other issues. Comments and suggestions are of course
> > welcome.
> >
> > Cordially,
> > Ted
> >
>
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
TED]> wrote:
>
> On Thu, Feb 14, 2008 at 03:51:40PM -, Ted Pedersen wrote:
> > 1) Incorporate "use locale" throughout package (suggested by Patrick
> > Drouin long ago). This will make for more convenient handling of
> > non-English text.
>
> Wrong idea,
-score to 3 dimensions or more then we are certainly interested in
discussing that.
Thanks,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
c.perl.org/open.html
>
> Setting up default IN and OUT layers this way could save a lot of
> typing/transforming in the migration process.
>
>
> --
> Kind regards,
>
> Dipl.-Inf. Richard Jelinek
>
> - The PetaMem Group - Prague/Nuremberg - www.petamem.com -
> -= 2007-09-25: 49235653 Mind Units =-
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
re, some in fact still use 5.8.3, etc. which would
have been a problem previously.
So, if you are a Windows user or a pre 5.8.5 user of Perl, you'll certainly
want to upgrade to 1.07!
Please let us know if any questions arise!
Enjoy,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
--- In ngram@yahoogroups.com, "Sayed Magdy" <[EMAIL PROTECTED]> wrote:
>
> Hello all,
> I have just downloaded the Text-NSP-1.09 package and I cannot install it.
> I have set up ActivePerl, but I do not know how to install the package.
> I am using Windows XP.
> Please, can anyone help me with this problem?
> thank
> > thanks in advance
> >
> > sayed magdy
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
erarchy?
>
> And points to:
> http://search.cpan.org/perldoc?NSP-Class-diagram.pdf
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
can surely get frequency counts from count.pl.
Good luck!
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
email
with the information.
Cordially,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
> how I will proceed
> thanks
> arezki
>
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
separate file with their decreasing scores.
>
> Best Regards
> Arezki
>
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
I have this code with java :
> Process p = Runtime.getRuntime().exec(perl g:/Text-NSP-1.09/bin/statistic.pl mi.pm output_pmi.txt output.txt);
> What is the problem?
>
> The second problem concerns accented letters (é, è, ç, ...): how can I
> keep them from being treated as the end of a token? b
specified pmi name, and there is an easy work around should that
occur. However, this is certainly something we will fix.
Please let me know of any questions or concerns
Thanks!
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
29 124
> Obama<>.<>5 13 132
> the<>,<>5 15 124
> .<>The<>5 129 8
> in<>.<>5 7 132
>
> Note all those stop words in there. I'd like to get rid of them and I think
> that's what that -stop stop.txt should do, no?
>
> $ egrep '/said/|/the/' stop.txt
> /said/
> /the/
>
> Is this a bug or am I doing something wrong?
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
k<>2
> houses<>1
>
>
> For some reason using -stop messed things up. Nothing funky in my stop.txt
> (attached) I believe.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
>
>
> - Original Message
>> From: Ted P
u will find a simple interface to play with NSP.
> It could be a tool for NSP beginners; let me know what you think
> about this interface.
>
> Arnaud.
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
able to
>> run exactly as before. So unless you want to use a model outside of
>> the independence model, you should not notice any change. If you
>> would like to use a different model, this can be specified in a file
>> similar to how the ordering of the marginal values are specified.
>>
>> For example, with the independence model for trigrams, the expected
>> values would be ordered as such in a
>>
>> 0
>> 1
>> 2
>>
>> For the "p(word1 word2) * p(word3)" model:
>>
>> 0 1
>> 2
>>
>> and so on. Well, I hope this makes sense and if you have any questions
>> or I have not explained something clearly, please let me know.
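
One way to read the model files described above: each line names a group
of token positions that is treated as a unit, and the expected count of
an ngram is the total number of ngrams times the product of the
probabilities of those groups. The Python sketch below follows that
reading only; it is not the NSP implementation, and the function names,
the marginals dictionary and the counts are all hypothetical.

```python
from math import prod


def parse_model(spec):
    """Turn model text such as '0 1' / '2' into [(0, 1), (2,)]."""
    return [
        tuple(int(position) for position in line.split())
        for line in spec.splitlines()
        if line.strip()
    ]


def expected_count(model, marginals, total):
    """Expected ngram count: total * product of p(group) over the model.

    marginals maps a tuple of positions to the observed count of the
    tokens at those positions; total is the number of ngrams.
    """
    return total * prod(marginals[group] / total for group in model)


# Made-up counts for one trigram in a sample of 1000 trigrams.
marginals = {(0,): 100, (1,): 50, (2,): 20, (0, 1): 30}

independence = parse_model("0\n1\n2")  # p(word1) * p(word2) * p(word3)
pair_then_word = parse_model("0 1\n2")  # p(word1 word2) * p(word3)

print(expected_count(independence, marginals, 1000))
print(expected_count(pair_then_word, marginals, 1000))
```

The same two functions extend to 4-grams and 5-grams as long as the
marginal counts for every group named in the model are available.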
>>
>> Thanks!
>>
>> Bridget
>>
>>
>>
>> On Tue, 6 Jul 2004, theonomo wrote:
>>
>> > Hello all.
>> >
>> > I would like to calculate statistical measures of association for 4-
>> > grams and 5-grams. Is this possible? Does this even make sense?
>> >
>> > Thanks,
>> >
>> > Jon
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
70.1% 0+0k 0+19144io 0pf+0w
% time count.pl --ngram 3 --set_freq_combo mycomb1.txt output1 wrnpc12.txt
34.518u 0.284s 0:52.53 66.2% 0+0k 0+30504io 0pf+0w
% time count.pl --ngram 3 output wrnpc12.txt
48.435u 0.476s 1:13.20 66.8% 0+0k 0+37096io 0pf+0w
BTW, all of the above is on War and Peace, which is a long novel but a
fairly small corpus, so your savings will be more dramatic on larger
corpora
%wc wrnpc12.txt
67403 564514 3285165 wrnpc12.txt
In any case, if you are using longer ngrams, I think it's very likely
you will find --set_freq_combo very handy. Please don't hesitate to let
us know if you have any questions on how to use or interpret this.
Enjoy,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
ngrams for contemporary English of popular usage.
>
> It doesn't have to be clean. I can clean it and give it back to you.
>
> Excel file would be great.
>
> Thank you in advance.
>
> Abhijit, India
>
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
<>6542
TOKYO<>6445
AUSTRALIAN<>6433
STOCKS<>6363
IRAN<>6359
FIVE<>6304
GERMANY<>6271
IRAQ<>6124
WHAT<>6076
FROM<>6056
AUSTRALIA<>5933
Q<>5923
PAKISTAN<>5919
GOLD<>5761
PHILIPPINE<>5609
HUNDREDS<>5590
ART<>5463
SRI<>5431
EDITORS<>5350
SINGAPORE<>5338
YOU<>5331
WEATHER<>5325
TODAY<>5305
TOP<>5283
ALL<>5212
FOREIGN<>5154
WE<>5124
BUT<>5104
MALAYSIA<>5062
EVEN<>5020
DOW<>4974
WASHINGTON<>4938
ITALIAN<>4875
JUST<>4854
JOHN<>4852
The real power here is in the --token option, which can do a lot of
interesting things like this...
Enjoy,
Ted
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
ariable length (the operand) is preceded by an opcode."
>
> I would like to get as a result list "data of variable" and not "Data
> variable length".
>
> Best wishes,
> Mercè
>
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
trigram measure.
>
> Thanks,
> Mercè
>
>
--
Ted Pedersen
http://www.d.umn.edu/~tpederse
ey
> Honey<>Bunny
> A<>women
> women<>snorts
>
> So I want the bigram Bunny<>A not to be created (and not to be counted).
>
> Is there a way to achieve this?
>
> I hope my question is understandable and has not been asked before.
>
> If i mis