Thanks so much for looking into this for me. Unfortunately, I get an error when I execute your code. Is there a library that you loaded that I haven't?
require(scrapeR)
require(XML)
require(RCurl)
doc <- htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
node <- getNodeSet(doc[[1]], "//link[@rel='alternate']")

Error in UseMethod("xpathApply") :
  no applicable method for 'xpathApply' applied to an object of class "character"

Guidance would be much appreciated.

--JJS

On Wed, August 14, 2013 4:19 am, Jeffrey Dick wrote:
> Hi,
>
> There are many occurrences of the CIK number in the page source. This pulls
> out the first node containing it:
>
> node <- getNodeSet(doc[[1]], "//link[@rel='alternate']" )
>
> From there you can extract the number. Here's one way to do it.
>
> strsplit(strsplit(unlist(node)[[5]], "CIK=")[[1]][2], "&type")[[1]][1]
>
> Jeff
>
>
> On Wed, Aug 14, 2013 at 1:34 PM, Sparks, John James <jspa...@uic.edu> wrote:
>
>> Dear R Helpers,
>>
>> I would like to pull the CIK number from the web page
>>
>> http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany
>>
>> If you put this web page into your browser you will see the CIK number in
>> red on the left side of the page near the top.
>>
>> When I try the basic
>>
>> require(scrapeR)
>> require(XML)
>> require(RCurl)
>> doc <- htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
>> str(doc)
>>
>> I get a large number of items in the data frame that I don't know how to
>> interpret. Both
>>
>> tables <- readHTMLTable(doc)
>>
>> and
>>
>> list <- xmlToList(doc)
>>
>> result in errors.
>>
>> Any (positive) guidance would be much appreciated.
>>
>> --John J. Sparks, Ph.D.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
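
A note on the error above, offered as a guess rather than a confirmed diagnosis: with its default settings, htmlTreeParse() returns an R-level list structure rather than an internal document, and its first element appears to be the file name. That would make doc[[1]] a character string, which is consistent with the xpathApply message. Jeff's doc[[1]] may have come from a different call, for example one that returns a list of parsed documents. Parsing with htmlParse(), or with htmlTreeParse(..., useInternalNodes = TRUE), and passing doc itself to getNodeSet() may avoid the error. The extraction step can be tried without the network. The sketch below applies Jeff's strsplit() idea in base R to a made-up href string; the names href, after_cik, cik, and cik_regex are illustrative, and the href value is an assumption about the attribute's shape, not text copied from the live page.

```r
# Hypothetical href, shaped like the attribute that Jeff's strsplit() call
# expects ("CIK=<digits>" followed by "&type"). Assumption for illustration.
href <- "/cgi-bin/browse-edgar?action=getcompany&CIK=0000789019&type=&owner=exclude&output=atom"

# Jeff's approach in two steps: keep what follows "CIK=",
# then drop everything from "&type" onward.
after_cik <- strsplit(href, "CIK=", fixed = TRUE)[[1]][2]
cik <- strsplit(after_cik, "&type", fixed = TRUE)[[1]][1]

# Alternative: capture the digits after "CIK=" with a regular expression.
cik_regex <- sub(".*CIK=([0-9]+).*", "\\1", href)

print(cik)
print(cik_regex)
```

Both variables should hold the same digit string. The regular-expression form does not depend on "&type" following the number, which may make it a little more robust if the query parameters appear in a different order.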