Re: [R] Extracting a website text content using R

2007-08-01 Thread Bert Gunter

Yes, there are.

(Please see and follow the posting guide if you wish to obtain something
more specific)


Bert Gunter
Genentech Nonclinical Statistics


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Am Stat
Sent: Wednesday, August 01, 2007 2:19 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Extracting a website text content using R

Dear useR,

Just wondering whether there is any function in R that could let me get
the text content of a certain website.

Thanks a lot!

Best,

Leon

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extracting a website text content using R

2007-08-01 Thread Am Stat

All right, my question is: if there are such functions, what are they?


Best,

Leon





Re: [R] Extracting a website text content using R

2007-08-01 Thread Saeed Abu Nimeh

Work with it as text. For text mining, use:
1- http://wwwpeople.unil.ch/jean-pierre.mueller/
2- the tm package by Ingo F.
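
As a rough illustration of what "working with it as text" might look like before reaching for a dedicated text-mining package, the sketch below counts term frequencies in base R. The sample string `page_text` is a hypothetical stand-in for text already fetched from a page, and the tokenisation rule (split on anything that is not a letter) is an assumption chosen for simplicity, not something either package above prescribes.

```r
# Hypothetical sample text standing in for content fetched from a web page.
page_text <- "R is a language for statistics. R is free, and R is extensible."

# Lower-case, split on anything that is not a letter, and drop empty tokens.
tokens <- unlist(strsplit(tolower(page_text), "[^a-z]+"))
tokens <- tokens[nzchar(tokens)]

# Count each term and sort so the most frequent come first.
term_freq <- sort(table(tokens), decreasing = TRUE)
print(head(term_freq, 3))
```

A real analysis would probably also want stop-word removal and stemming, which is where a text-mining package becomes more convenient than hand-rolled code.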


Re: [R] Extracting a website text content using R

2007-08-01 Thread Steven McKinney


Is this what you had in mind?

> foo <- scan(url("http://cran.r-project.org/"), what = "character")
Read 69 items
> paste(unlist(foo), collapse = " ")
[1] "<!DOCTYPE HTML PUBLIC -//IETF//DTD HTML//EN > <html> <head> <title>The
Comprehensive R Archive Network</title> <link rel=\"icon\" href=\"favicon.ico\"
type=\"image/x-icon\"> <link rel=\"shortcut icon\" href=\"favicon.ico\"
type=\"image/x-icon\"> <link rel=\"stylesheet\" type=\"text/css\"
href=\"R.css\"> </head> <FRAMESET cols=\"1*, 4*\" border=0> <FRAMESET
rows=\"120, 1*\"> <FRAME src=\"logo.html\" name=\"logo\" frameborder=0> <FRAME
src=\"navbar.html\" name=\"contents\" frameborder=0> </FRAMESET> <FRAME
src=\"banner.shtml\" name=\"banner\" frameborder=0> <noframes> <h1>The
Comprehensive R Archive Network</h1> Your browser seems not to support frames,
here is the <A href=\"navbar.html\">contents page</A> of CRAN. </noframes>
</FRAMESET>"
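
The output above is still raw HTML. If only the visible text is wanted, one possible follow-up is to strip the markup with regular expressions. The sketch below is a minimal base-R version under stated assumptions: the `html` vector is a hypothetical inline page (so the example runs without network access), and replacing every `<...>` tag with a space is a crude rule that ignores scripts, comments, and character entities.

```r
# Hypothetical HTML snippet; with network access one could instead use
# readLines(url("http://cran.r-project.org/"), warn = FALSE).
html <- c(
  "<html><head><title>Demo page</title></head>",
  "<body><h1>Hello</h1><p>Some <b>text</b> content.</p></body></html>"
)

# Join the lines, replace every tag with a space, then tidy the whitespace.
page <- paste(html, collapse = " ")
text <- gsub("<[^>]+>", " ", page)
text <- trimws(gsub("[[:space:]]+", " ", text))
print(text)
```

For anything beyond a quick look, a real HTML parser is likely to be more robust than this kind of pattern matching.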

Try the search phrase

cran scan url

in Google for more hits on
info about R functions that
can deal with URLs.

In R try

> apropos("URL")
 [1] "contourLines"   "URLdecode"      "URLencode"      "browseURL"      "contrib.url"    "main.help.url"  "url.show"
 [8] "loadURL"        "read.table.url" "scan.url"       "source.url"     "url"

SteveM


Re: [R] Extracting a website text content using R

2007-08-01 Thread mtmorgan

Perhaps more fun is

> library(XML)
> res = htmlTreeParse("http://www.omegahat.org/RSXML/", useInternalNodes=TRUE)
> xpathApply(res, "//h1", xmlValue)
[[1]]
[1] "An XML package for the S language"
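
The same idea can be pointed at body text rather than headings. The sketch below is only an illustration under assumptions: it requires the XML package to be installed (and does nothing otherwise), it parses a hypothetical inline string via `htmlParse(..., asText = TRUE)` instead of a live URL, and the XPath query `//p` is just one plausible choice of nodes to extract.

```r
# Assumes the XML package is installed; skip quietly otherwise.
if (requireNamespace("XML", quietly = TRUE)) {
  # Hypothetical inline page, parsed from a string rather than a URL.
  html <- "<html><body><h1>Title</h1><p>First.</p><p>Second.</p></body></html>"
  doc <- XML::htmlParse(html, asText = TRUE)

  # Pull the text of every <p> node with an XPath query.
  paragraphs <- XML::xpathSApply(doc, "//p", XML::xmlValue)
  print(paragraphs)
}
```

Swapping `"//p"` for another XPath expression would select different parts of the page.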

Martin

