[R] Pointwise Mutual Information

2012-04-12 Thread pl.r...@gmail.com
Hi, I want to calculate pointwise mutual information between labels (2-grams) and words (1-grams) in my corpus. Any suggestions as to how to go about it? l = label, w = word, C = reference collection. I want to calculate the following: p(w, l | C), p(w | C), p(l | C)
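
For reference, pointwise mutual information is conventionally defined as PMI(w, l) = log( p(w, l | C) / (p(w | C) * p(l | C)) ). The following is a minimal sketch of that calculation, written in Python rather than R. It assumes the three probabilities are estimated as document frequencies (the fraction of documents in C containing the word, the label 2-gram, or both); the post does not say how they should be estimated, so that choice, the function name pmi_table, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter

def pmi_table(docs, labels):
    """Return {(word, label): PMI} using document-level co-occurrence.

    docs   -- list of token lists (the reference collection C)
    labels -- list of 2-gram labels, each a tuple of two tokens
    Assumption: p(.|C) is estimated as a document frequency.
    """
    n_docs = len(docs)
    word_df, label_df, joint_df = Counter(), Counter(), Counter()
    for tokens in docs:
        words = set(tokens)
        bigrams = set(zip(tokens, tokens[1:]))  # adjacent token pairs
        present = [l for l in labels if l in bigrams]
        word_df.update(words)
        label_df.update(present)
        for l in present:
            for w in words:
                joint_df[(w, l)] += 1
    scores = {}
    for (w, l), n_wl in joint_df.items():
        p_wl = n_wl / n_docs
        p_w = word_df[w] / n_docs
        p_l = label_df[l] / n_docs
        scores[(w, l)] = math.log(p_wl / (p_w * p_l))
    return scores

# Toy corpus, invented for illustration only.
docs = [
    ["machine", "learning", "model"],
    ["machine", "learning", "data"],
    ["deep", "sea", "data"],
    ["sea", "model"],
]
label = ("machine", "learning")
scores = pmi_table(docs, [label])
```

Pairs that never co-occur are left out of the result rather than assigned a value, since log(0) is undefined; a real application might smooth the counts instead.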

[R] tm package, custom reader

2012-01-13 Thread pl.r...@gmail.com
I need help creating a custom XML reader for use with the tm package. The objective is to create a corpus for analysis. The files I'm working with come from Solr and are in a funky XML format; nevertheless, I'm able to parse the XML files using the solrDocs.R function provided by Duncan Temple

Re: [R] Custom XML Readers

2011-12-29 Thread pl.r...@gmail.com
I found the source of the error: my XML document contains some custom tags, such as <response> and <doc>; if I change those tags to <lst>, the code works. Another source of error is text that does not fit on one line, such as: <str name="fulltext"> MORGANZA, La. (AP) -- Federal officials

Re: [R] Custom XML Readers

2011-12-29 Thread pl.r...@gmail.com
I found the source of the error: my XML document contains some custom tags, such as <response> and <doc>; if I change those tags to <lst>, the code works. Another source of error is text that does not fit on one line, such as: <str name="fulltext"> MORGANZA, La. (AP) -- Federal officials say they are
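
The two failure cases described here (wrapper tags such as <response> and <doc>, and field text that spans several lines) can be handled without renaming tags if the reader looks fields up by their name attribute and collapses whitespace. The sketch below is in Python, not R, and is only an illustration: the sample document is a guess at the structure, built from the words visible in the truncated post, and the function name read_fulltext is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: wrapper tags plus a text field broken across lines.
SAMPLE = """<response>
  <doc>
    <str name="fulltext">MORGANZA, La. (AP) --
      Federal officials say they are</str>
  </doc>
</response>"""

def read_fulltext(xml_text):
    """Find the fulltext field at any depth and collapse its whitespace."""
    root = ET.fromstring(xml_text)
    node = root.find(".//str[@name='fulltext']")
    if node is None:
        return None
    # split() + join() turns newlines and indentation into single spaces.
    return " ".join((node.text or "").split())

fulltext = read_fulltext(SAMPLE)
```

Because the lookup uses a descendant search, it does not matter whether the enclosing elements are called <response>, <doc>, or <lst>.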

Re: [R] Custom XML Readers

2011-12-28 Thread pl.r...@gmail.com
Thanks, all, for the helpful advice; however, I'm still running into an error while trying to run readSolrDoc, provided by Duncan Temple Lang. The documents I'm trying to parse come from Solr and look very much like the example provided at http://www.omegahat.org/RSXML/ I'm not that familiar with the

[R] Custom XML Readers

2011-12-23 Thread pl.r...@gmail.com
I need to construct a custom XML reader; the files I'm working with are in a funky XML format: <str name="author">Paul H</str> <str name="country">USA</str> <date name="created_date">2010-02-16</date> I want to read the file so it looks like: author = Paul H, country = USA, created_date = 2010-02-16. Does anyone
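
The mapping asked for here (each element's name attribute becomes the field name, its text becomes the value) can be sketched as follows. This is a Python illustration of the logic, not the R reader the thread is about. The enclosing <doc> element in the sample is an assumption added so the fragment parses as a single document, and read_fields is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The three fields from the post, wrapped in an assumed <doc> root.
SAMPLE = """<doc>
  <str name="author">Paul H</str>
  <str name="country">USA</str>
  <date name="created_date">2010-02-16</date>
</doc>"""

def read_fields(xml_text):
    """Map each leaf element's name attribute to its text content."""
    root = ET.fromstring(xml_text)
    fields = {}
    for node in root.iter():
        name = node.get("name")
        # Only leaf elements carry values; containers are skipped.
        if name is not None and len(node) == 0:
            fields[name] = (node.text or "").strip()
    return fields

fields = read_fields(SAMPLE)
for key, value in fields.items():
    print(f"{key} = {value}")
```

Keying on the name attribute instead of the tag means <str>, <date>, and any other typed element are all read the same way.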