Re: [R] XML package example code?
I read the tutorials a long time ago, but I think you get those notifications because the HTML is malformed: some opening tags have no corresponding end tags, and so on. The XML package copes well with malformed markup, so I usually just send those notifications to an empty handler function:

    library(RCurl)
    library(XML)
    html <- getURL("http://www.omegahat.org/RSXML/index.html")
    html.tree <- htmlTreeParse(html, useInternalNodes = TRUE, error = function(...){})

HTH,
Tony Breyal

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] XML package example code?
Not sure if my code was attached in that last post:

    library(RCurl)
    library(XML)
    html <- getURL("http://www.omegahat.org/RSXML/index.html")
    html.tree <- htmlTreeParse(html, useInternalNodes = TRUE, error = function(...){})
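A minimal sketch of the empty-handler trick from the post above, run on an inline string so it works without network access. The snippet `bad_html` and the name `quiet` are made up for illustration, and `asText = TRUE` is passed explicitly here (the original code relies on the package recognising the downloaded string as markup).

```r
library(XML)

# Hypothetical malformed page: <li>, <b> and <ul> are never closed.
bad_html <- "<html><body><ul><li>first<li>second <b>bold</body></html>"

# An empty handler discards every parser message, as in the post above.
quiet <- function(...) {}

doc <- htmlTreeParse(bad_html, asText = TRUE,
                     useInternalNodes = TRUE, error = quiet)

# The tree is still usable despite the missing end tags.
items <- xpathSApply(doc, "//li", xmlValue)
print(length(items))
```

Swapping `quiet` for a function that appends its first argument to a vector would be one way to keep the messages for later inspection instead of dropping them.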
Re: [R] XML package example code?
Peng Yu wrote:
> I'm trying the example on the above webpage. But I'm not sure why I
> got the following error. Would you help to take a look?
>
> > doc = xmlInternalTreeParse("index.html")

You are trying to parse an HTML document as if it were XML, but HTML is often not well-formed. Use htmlParse() for a more forgiving parser, or use the RTidyHTML package (www.omegahat.org/RTidyHTML), an interface to libtidy, to make the HTML well-formed before passing it to xmlTreeParse() (aka xmlInternalTreeParse()).

D.
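The difference between the two parsers can be sketched on a small inline document. The string `bad_html` is a made-up stand-in with the same kind of defect as the page in the thread (unclosed `dt` and `dd` tags); `xmlParse()` is used as the strict parser and is expected to print its complaints before signalling the error.

```r
library(XML)

# Hypothetical snippet: <dt> and <dd> are opened but never closed.
bad_html <- "<html><body><dl><dt>term<dd>definition</dl></body></html>"

# The strict XML parser rejects it; tryCatch() turns the error into a flag.
xml_ok <- tryCatch({
  xmlParse(bad_html, asText = TRUE)
  TRUE
}, error = function(e) FALSE)

# The HTML parser repairs the tree instead of stopping.
doc <- htmlParse(bad_html, asText = TRUE)
terms <- xpathSApply(doc, "//dt", xmlValue)

print(xml_ok)
print(terms)
```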
Re: [R] XML package example code?
On Wed, Nov 25, 2009 at 12:19 AM, cls59 wrote:
> Peng Yu wrote:
>> I'm interested in parsing an html page. I should use XML, right? Could
>> somebody show me some example code? Is there a tutorial for this
>> package?
>
> Did you try looking through the help pages for the XML package or browsing
> the Omegahat website?
>
> Look at:
>
> library(XML)
> ?htmlTreeParse
>
> And the relevant web page for documentation and examples is:
>
> http://www.omegahat.org/RSXML/

http://www.omegahat.org/RSXML/shortIntro.html

I'm trying the example on the above webpage, but I'm not sure why I got the following error. Could you help take a look?

$ Rscript main.R
> library(XML)
>
> download.file('http://www.omegahat.org/RSXML/index.html','index.html')
trying URL 'http://www.omegahat.org/RSXML/index.html'
Content type 'text/html; charset=ISO-8859-1' length 3021 bytes
opened URL
==
downloaded 3021 bytes

>
> doc = xmlInternalTreeParse("index.html")
Opening and ending tag mismatch: dd line 68 and dl
Opening and ending tag mismatch: li line 67 and body
Opening and ending tag mismatch: dt line 66 and html
Premature end of data in tag dd line 64
Premature end of data in tag li line 63
Premature end of data in tag dt line 62
Premature end of data in tag dl line 61
Premature end of data in tag body line 5
Premature end of data in tag html line 1
Error: 1: Opening and ending tag mismatch: dd line 68 and dl
2: Opening and ending tag mismatch: li line 67 and body
3: Opening and ending tag mismatch: dt line 66 and html
4: Premature end of data in tag dd line 64
5: Premature end of data in tag li line 63
6: Premature end of data in tag dt line 62
7: Premature end of data in tag dl line 61
8: Premature end of data in tag body line 5
9: Premature end of data in tag html line 1
Execution halted
Re: [R] XML package example code?
Cls59 is correct that there is a lot of example code; look at ?htmlTreeParse and you'll get most of what you need, I think. Here's some simplified code I use a lot (XPath expressions pick the nodes of interest out of the parsed page):

    # libraries
    library(RCurl)
    library(XML)

    # Google URL
    my.url <- "http://www.google.co.uk/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-GB%3Aofficial&hs=6Sd&q=google+wave&btnG=Search&meta=&aq=f&oq="

    # download the page and parse it, discarding parser notifications
    html <- getURL(my.url)
    html.tree <- htmlTreeParse(html, useInternalNodes = TRUE, error = function(...){})

    # the XPath expression selects <a> elements that have an href and class "l"
    nodes <- getNodeSet(html.tree, "//a[@href][@class='l']")
    # read the href attribute by name rather than by position
    links <- sapply(nodes, function(x) xmlGetAttr(x, "href"))

HTH
Tony
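The XPath step can be tried offline on a small inline page. This is only a sketch: the markup in `page` and the example.org URLs are invented, and the class name "l" simply mirrors the expression in the post (what a live search page uses today may well differ).

```r
library(XML)

# Hypothetical results page; only two of the three links carry class "l".
page <- paste0(
  "<html><body>",
  "<a class='l' href='http://example.org/one'>One</a>",
  "<a class='nav' href='http://example.org/skip'>Skip</a>",
  "<a class='l' href='http://example.org/two'>Two</a>",
  "</body></html>")

doc <- htmlParse(page, asText = TRUE)

# Select <a> elements that have an href and whose class is exactly "l".
nodes <- getNodeSet(doc, "//a[@href][@class='l']")

# Read the attribute by name, and the link text alongside it.
links <- sapply(nodes, xmlGetAttr, "href")
titles <- sapply(nodes, xmlValue)

print(data.frame(title = titles, link = links))
```

Reading the attribute with `xmlGetAttr()` avoids depending on the order in which attributes happen to appear in the tag.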
Re: [R] XML package example code?
Peng Yu wrote:
> I'm interested in parsing an html page. I should use XML, right? Could
> somebody show me some example code? Is there a tutorial for this
> package?

Did you try looking through the help pages for the XML package or browsing the Omegahat website?

Look at:

    library(XML)
    ?htmlTreeParse

And the relevant web page for documentation and examples is:

    http://www.omegahat.org/RSXML/

-Charlie

-
Charlie Sharpsteen
Undergraduate Environmental Resources Engineering
Humboldt State University
[R] XML package example code?
I'm interested in parsing an HTML page. I should use the XML package, right? Could somebody show me some example code? Is there a tutorial for this package?