[R] using XML package to read RSS
Hi,

I'm trying to use the XML package to read an RSS feed. To get started, I was trying to use this post as an example:

http://www.r-bloggers.com/how-to-build-a-dataset-in-r-using-an-rss-feed-or-web-page/

I can replicate the beginning section of the post, but when I try to use another RSS feed I have an issue. The RSS feed I would like to use is:

    URL <- "http://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=&company=&dateb=&owner=include&start=0&count=40&output=atom"

    library(XML)
    doc <- xmlTreeParse(URL)
    src <- xpathApply(xmlRoot(doc), "//entry")

I get an empty list rather than a list of the entries:

    > src
    list()
    attr(,"class")
    [1] "XMLNodeSet"

I'm not sure how to fix this. Any suggestions? Do I need to provide a namespace, or is the RSS malformed?

Thanks,

James

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] using XML package to read RSS
Hi James.

Yes, you need to identify the namespace in the query, e.g.

    getNodeSet(doc, "//x:entry", c(x = "http://www.w3.org/2005/Atom"))

This yields 40 matching nodes.

(getNodeSet() is more convenient to use when you don't specify a function to apply to the nodes. Also, you don't need xmlRoot(doc), as the query // works on the entire document.)

BTW, you want to use xmlParse() and not xmlTreeParse().

D.

On 5/16/12 6:40 PM, J Toll wrote:
> I'm not sure how to fix this. Any suggestions? Do I need to provide a
> namespace, or is the RSS malformed?
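For anyone who wants to try the namespace-qualified query without hitting the SEC server, here is a minimal sketch. The inline Atom document and its two entries are made up for illustration (they are not the EDGAR feed), and the prefix "x" is an arbitrary alias: only the URI has to match the one declared in the feed. It uses xmlParse(), which returns an internal document that the XPath functions can query directly.

```r
library(XML)

# Made-up Atom document standing in for the real feed (assumption: the
# real feed declares the same default namespace on its root element).
atom_text <- '<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First filing</title></entry>
  <entry><title>Second filing</title></entry>
</feed>'

doc <- xmlParse(atom_text, asText = TRUE)

# Unprefixed name: XPath 1.0 looks for "entry" in no namespace,
# so nothing in an Atom document matches.
no_ns <- getNodeSet(doc, "//entry")

# Bind the prefix "x" to the Atom namespace URI and use it in the query.
atom_ns <- c(x = "http://www.w3.org/2005/Atom")
entries <- getNodeSet(doc, "//x:entry", namespaces = atom_ns)

# Every step of the path needs the prefix, including child elements.
titles <- xpathSApply(doc, "//x:entry/x:title", xmlValue,
                      namespaces = atom_ns)

print(length(no_ns))
print(length(entries))
print(titles)
```

With the real feed, the same two calls should apply after replacing atom_text with the URL (or its downloaded contents) and dropping asText = TRUE.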
Re: [R] using XML package to read RSS
On Wed, May 16, 2012 at 9:02 PM, Duncan Temple Lang dun...@wald.ucdavis.edu wrote:
> Yes, you need to identify the namespace in the query, e.g.
>
>     getNodeSet(doc, "//x:entry", c(x = "http://www.w3.org/2005/Atom"))
>
> This yields 40 matching nodes.

Brilliant! Thank you so much. I never would have figured out specifying the namespace like that. I had tried:

    src <- xpathApply(xmlRoot(doc), "//entry", namespaces = "http://www.w3.org/2005/Atom")

but that wasn't working.

Thanks again,

James
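A note on why the attempt above probably found nothing: the path "//entry" contains no prefix, so whatever is passed in namespaces, the name test still asks for an "entry" element in no namespace. The sketch below, again on a made-up Atom snippet rather than the real feed, shows the prefixed form next to an alternative that was not suggested in this thread: matching on local-name(), which ignores namespaces entirely. That alternative is standard XPath 1.0, but it will also match "entry" elements from any other namespace, so treat it as a convenience rather than a recommendation.

```r
library(XML)

# Made-up Atom snippet for illustration only.
atom_text <- '<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First filing</title></entry>
  <entry><title>Second filing</title></entry>
</feed>'

doc <- xmlParse(atom_text, asText = TRUE)

# Prefixed query: the alias "a" is arbitrary, the URI is what matters.
by_prefix <- getNodeSet(doc, "//a:entry",
                        c(a = "http://www.w3.org/2005/Atom"))

# Namespace-agnostic query: compare only the local part of the name.
by_local <- getNodeSet(doc, "//*[local-name() = 'entry']")

print(length(by_prefix))
print(length(by_local))
```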