Re: [R] Extracting a website text content using R
Yes, there are. (Please see and follow the posting guide if you wish to obtain something more specific.)

Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Am Stat
Sent: Wednesday, August 01, 2007 2:19 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Extracting a website text content using R

Dear useR,

Just wondering whether there is any function in R that could let me get the text contents of a certain website.

Thanks a lot!

Best,
Leon

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Extracting a website text content using R
All right, my question is: if there are such functions, what are they?

Best,
Leon
Re: [R] Extracting a website text content using R
Work with it as text. For text mining, use:

1- http://wwwpeople.unil.ch/jean-pierre.mueller/
2- tm by Ingo F.
Re: [R] Extracting a website text content using R
Is this what you had in mind?

> foo <- scan(url("http://cran.r-project.org/"), what = "character")
Read 69 items
> paste(unlist(foo), collapse = " ")
[1] "<!DOCTYPE HTML PUBLIC -//IETF//DTD HTML//EN> <html> <head> <title>The Comprehensive R Archive Network</title> <link rel=\"icon\" href=\"favicon.ico\" type=\"image/x-icon\"> <link rel=\"shortcut icon\" href=\"favicon.ico\" type=\"image/x-icon\"> <link rel=\"stylesheet\" type=\"text/css\" href=\"R.css\"> </head> <FRAMESET cols=\"1*, 4*\" border=0> <FRAMESET rows=\"120, 1*\"> <FRAME src=\"logo.html\" name=\"logo\" frameborder=0> <FRAME src=\"navbar.html\" name=\"contents\" frameborder=0> </FRAMESET> <FRAME src=\"banner.shtml\" name=\"banner\" frameborder=0> <noframes> <h1>The Comprehensive R Archive Network</h1> Your browser seems not to support frames, here is the <A href=\"navbar.html\">contents page</A> of CRAN. </noframes> </FRAMESET>"

Try the search phrase "cran scan url" in Google for more hits on info about R functions that can deal with URLs.

In R try

> apropos("URL")
 [1] "contourLines"   "URLdecode"      "URLencode"      "browseURL"      "contrib.url"    "main.help.url"  "url.show"
 [8] "loadURL"        "read.table.url" "scan.url"       "source.url"     "url"

SteveM
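The scan() call above returns the raw markup, not the readable text. As a minimal sketch (not from the thread), the tags can be stripped with base R regular expressions. The helper name html_to_text and the sample page are made up for illustration, and regex stripping is crude: it assumes reasonably simple HTML and only decodes two entities.

```r
# Hypothetical helper: turn a character vector of HTML lines into plain text.
# Crude regex approach; a real HTML parser is more robust.
html_to_text <- function(lines) {
  html <- paste(lines, collapse = " ")
  # Drop <script> and <style> blocks together with their contents
  html <- gsub("(?is)<(script|style)[^>]*>.*?</\\1>", " ", html, perl = TRUE)
  # Replace every remaining tag with a space
  text <- gsub("<[^>]+>", " ", html)
  # Decode two common entities (far from a complete list)
  text <- gsub("&nbsp;", " ", text, fixed = TRUE)
  text <- gsub("&amp;", "&", text, fixed = TRUE)
  # Collapse runs of whitespace and trim the ends
  trimws(gsub("\\s+", " ", text))
}

# Made-up sample page so the example runs without network access
sample_page <- c(
  "<html><head><title>The Comprehensive R Archive Network</title></head>",
  "<body><h1>The Comprehensive R Archive Network</h1>",
  "<p>Your browser seems not to support frames.</p></body></html>"
)
page_text <- html_to_text(sample_page)
print(page_text)

# For a live page, read the lines over the network instead:
# page_text <- html_to_text(readLines("http://cran.r-project.org/", warn = FALSE))
```

readLines() is used in the commented call rather than scan() so that lines are not split on whitespace before being pasted back together.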
Re: [R] Extracting a website text content using R
Perhaps more fun is

> library(XML)
> res = htmlTreeParse("http://www.omegahat.org/RSXML/", useInternalNodes=TRUE)
> xpathApply(res, "//h1", xmlValue)
[[1]]
[1] "An XML package for the S language"

Martin
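The same htmlTreeParse() approach can be pointed at the text of the whole page rather than one heading. The following is a sketch only, assuming the XML package is installed; the HTML string and the variable names are invented for illustration, and the XPath filter is one possible way to skip script and style content.

```r
# Runs only if the XML package is available (assumption: it is installed).
if (requireNamespace("XML", quietly = TRUE)) {
  # Made-up page, parsed from a string so no network access is needed
  page <- paste0(
    "<html><head><title>Demo</title></head>",
    "<body><h1>An XML package</h1><script>var x = 1;</script>",
    "<p>First paragraph.</p><p>Second   paragraph.</p></body></html>"
  )
  doc <- XML::htmlTreeParse(page, asText = TRUE, useInternalNodes = TRUE)

  # All text nodes under <body>, excluding anything inside <script> or <style>
  nodes <- XML::xpathSApply(
    doc,
    "//body//text()[not(ancestor::script) and not(ancestor::style)]",
    XML::xmlValue
  )

  # Join the pieces and normalise whitespace
  body_text <- trimws(gsub("\\s+", " ", paste(nodes, collapse = " ")))
  print(body_text)
}
```

For a live page, pass the URL as the first argument and drop asText = TRUE, as in the htmlTreeParse() call shown earlier in this message.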