[R] using XML package to read RSS

2012-05-16 Thread J Toll
Hi,

I'm trying to use the XML package to read an RSS feed.  To get
started, I was trying to use this post as an example:

http://www.r-bloggers.com/how-to-build-a-dataset-in-r-using-an-rss-feed-or-web-page/

I can replicate the beginning of the post, but I run into a problem when I
try a different RSS feed.  The feed I would like to use is:

 URL <- "http://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=&company=&dateb=&owner=include&start=0&count=40&output=atom"

 library(XML)
 doc <- xmlTreeParse(URL)

 src <- xpathApply(xmlRoot(doc), "//entry")

I get an empty list rather than a list of the entry nodes:

 src
list()
attr(,"class")
[1] "XMLNodeSet"

I'm not sure how to fix this.  Any suggestions?  Do I need to provide
a namespace, or is the RSS malformed?
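The empty result can be reproduced without contacting the SEC server with a minimal, self-contained sketch like the one below. The Atom snippet is made up for illustration and stands in for the EDGAR feed; the only assumption is that the real feed declares the Atom namespace as the default namespace on its feed element, as Atom documents normally do. It uses xmlParse() and getNodeSet() from the XML package instead of the xmlTreeParse()/xpathApply() pair above.

```r
library(XML)

# Hypothetical stand-in for the EDGAR Atom feed: the xmlns on <feed>
# puts <feed>, <entry> and <title> in the Atom namespace.
atom <- '<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First entry</title></entry>
  <entry><title>Second entry</title></entry>
</feed>'

doc <- xmlParse(atom, asText = TRUE)

# In XPath 1.0 an unprefixed name only matches elements in no namespace,
# so this finds nothing even though the document has two <entry> nodes.
src <- getNodeSet(doc, "//entry")
length(src)
```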

Thanks,


James

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using XML package to read RSS

2012-05-16 Thread Duncan Temple Lang
Hi James.

 Yes, you need to identify the namespace in the query, e.g.

  getNodeSet(doc, "//x:entry", c(x = "http://www.w3.org/2005/Atom"))

This yields 40 matching nodes.
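A sketch of that fix, run on a made-up Atom snippet rather than the real feed. The name "x" is an arbitrary prefix chosen for the query: it only has to be bound to the Atom namespace URI in the named vector, and it does not need to appear anywhere in the document itself.

```r
library(XML)

# Hypothetical two-entry Atom document with a default namespace.
atom <- '<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First entry</title></entry>
  <entry><title>Second entry</title></entry>
</feed>'
doc <- xmlParse(atom, asText = TRUE)

# Bind the prefix "x" to the Atom namespace URI and use it in the path.
ns <- c(x = "http://www.w3.org/2005/Atom")
entries <- getNodeSet(doc, "//x:entry", ns)
length(entries)
```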

(getNodeSet() is more convenient than xpathApply() when you don't need to apply a
function to each node. Also, you don't need xmlRoot(doc): a query that starts
with // already searches the entire document.)
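To illustrate the getNodeSet()/xpathApply() distinction, the sketch below queries the document object directly (no xmlRoot()) in both styles. It uses xpathSApply(), the simplifying variant of xpathApply(), with xmlValue to pull out each entry's title; the Atom snippet is hypothetical.

```r
library(XML)

# Hypothetical Atom document standing in for the real feed.
atom <- '<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First entry</title></entry>
  <entry><title>Second entry</title></entry>
</feed>'
doc <- xmlParse(atom, asText = TRUE)
ns <- c(x = "http://www.w3.org/2005/Atom")

# No function needed: getNodeSet() simply returns the matching nodes.
nodes <- getNodeSet(doc, "//x:entry/x:title", ns)

# With a function: xpathSApply() calls xmlValue on each node and
# simplifies the results into a character vector.
titles <- xpathSApply(doc, "//x:entry/x:title", xmlValue, namespaces = ns)
titles
```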

 BTW, you want to use xmlParse() and not xmlTreeParse().
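One way to see the difference between the two parsers, again on a hypothetical snippet: xmlParse() returns an internal (libxml2-backed) document, the kind of object the XPath functions are primarily designed for, while xmlTreeParse() by default converts the tree into R-level nodes. Passing useInternalNodes = TRUE to xmlTreeParse() yields the same kind of object as xmlParse().

```r
library(XML)

atom <- '<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First entry</title></entry>
</feed>'

# Internal document, suitable for getNodeSet()/xpathApply().
internal <- xmlParse(atom, asText = TRUE)

# Default xmlTreeParse(): an R-level tree.
rtree <- xmlTreeParse(atom, asText = TRUE)

# xmlTreeParse() asked for internal nodes behaves like xmlParse().
same <- xmlTreeParse(atom, asText = TRUE, useInternalNodes = TRUE)

class(internal)
class(rtree)
class(same)
```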

   D.


On 5/16/12 6:40 PM, J Toll wrote:



Re: [R] using XML package to read RSS

2012-05-16 Thread J Toll
On Wed, May 16, 2012 at 9:02 PM, Duncan Temple Lang
<dun...@wald.ucdavis.edu> wrote:

Brilliant!  Thank you so much.  I never would have figured out
specifying the namespace like that.  I had tried:

src <- xpathApply(xmlRoot(doc), "//entry", namespaces = "http://www.w3.org/2005/Atom")

but that wasn't working.
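Two things differ from Duncan's version: his namespaces argument is a named vector, which binds the prefix x to the Atom URI, and his query actually uses that prefix (//x:entry). The second point is essential: under XPath 1.0 rules an unprefixed name such as //entry only matches elements in no namespace, so supplying the URI alone cannot make it match. The sketch below, on a made-up Atom snippet, shows the working form next to a namespace-agnostic alternative based on the standard XPath local-name() function; that alternative is a swapped-in technique, not something suggested in the thread.

```r
library(XML)

# Hypothetical Atom document with a default namespace.
atom <- '<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First entry</title></entry>
  <entry><title>Second entry</title></entry>
</feed>'
doc <- xmlParse(atom, asText = TRUE)

# Working form: named vector (prefix = URI) plus the prefix in the path.
byPrefix <- getNodeSet(doc, "//x:entry", c(x = "http://www.w3.org/2005/Atom"))

# Namespace-agnostic alternative: match on the element's local name only.
byLocalName <- getNodeSet(doc, "//*[local-name()='entry']")

c(length(byPrefix), length(byLocalName))
```

The prefixed form is stricter, since it only matches entry elements in the Atom namespace; the local-name() form is handy for quick exploration but would also match an entry element from any other namespace.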

Thanks again,


James
