Re: [R] Scrap java scripts and styles from an html document

2011-04-07 Thread antujsrv
Hi, I am working on developing a web crawler. Removing JavaScript and styles is part of cleaning the HTML document. What I want is a cleaned HTML document with only the HTML tags and textual information, so that I can figure out the pattern of the web page. This is being done to

Re: [R] Scrap java scripts and styles from an html document

2011-04-07 Thread Mike Marchywka
Date: Thu, 7 Apr 2011 04:15:50 -0700 From: antuj...@gmail.com To: r-help@r-project.org Subject: Re: [R] Scrap java scripts and styles from an html document Hi, I am working on developing a web crawler. Comments like this come up

[R] Scrap java scripts and styles from an html document

2011-03-29 Thread antujsrv
Hi, I am working on developing a web crawler in R and I need some help with removing JavaScript and style sheets from the HTML document of a web page. I tried using the XML package, specifically the function xpathApply: library(XML) txt =

Re: [R] Scrap java scripts and styles from an html document

2011-03-29 Thread Duncan Temple Lang
On 3/28/11 11:38 PM, antujsrv wrote: Hi, I am working on developing a web crawler in R and I need some help with removing JavaScript and style sheets from the HTML document of a web page. I tried using the XML package, specifically the function xpathApply: library(XML) txt =
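
For readers following the thread, here is a minimal sketch of one way the task described above (dropping script and style elements while keeping the remaining tags and text) might be approached with the XML package. It is not taken from any of the replies, whose previews are cut off. The sample page and the helper name strip_scripts_and_styles are made up for illustration, and the sketch assumes the CRAN XML package is installed.

```r
library(XML)  # assumption: the CRAN 'XML' package is installed

# Hypothetical sample page, used only for illustration.
html <- paste0(
  "<html><head><title>Demo</title>",
  "<style>p { color: red; }</style>",
  "<script>var x = 1;</script></head>",
  "<body><p>Hello</p><script>alert('hi');</script></body></html>"
)

# Hypothetical helper: parse the HTML text, remove every <script> and
# <style> element, and return the modified document.
strip_scripts_and_styles <- function(html_text) {
  # asText = TRUE tells htmlParse the argument is content, not a file or URL.
  doc <- htmlParse(html_text, asText = TRUE)
  # XPath union: all <script> and <style> elements anywhere in the tree.
  unwanted <- getNodeSet(doc, "//script | //style")
  # Detach the selected nodes from the parsed document in place.
  removeNodes(unwanted)
  doc
}

cleaned <- strip_scripts_and_styles(html)

# Serialize the cleaned tree back to a character string.
cleaned_text <- saveXML(cleaned)
cat(cleaned_text)
```

Working on the parsed tree instead of applying regular expressions to the raw text means the remaining markup stays well-formed, which matters if the goal is to compare page structure afterwards. For a live page, the URL or a downloaded file could be passed to htmlParse in place of the sample string.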