Re: retrieving remote web content

2002-11-10 Thread Bill Barker
There is also the Jakarta HttpClient:
http://jakarta.apache.org/commons/httpclient/.

--
To unsubscribe, e-mail:   mailto:tomcat-user-unsubscribe;jakarta.apache.org
For additional commands, e-mail: mailto:tomcat-user-help;jakarta.apache.org




RE: retrieving remote web content

2002-11-09 Thread Reynir Hübner
Hi, 

I haven't written a servlet to do this, but I have written a JSP tag that does. 

If you don't want to copy the images from one server to another (from Google's to yours), 
as a full proxy would, then you must parse the HTML and rewrite every URL in it (CSS, 
img, hrefs, JavaScript and a lot more) so that each one is fully qualified, such as 
http://google.com/images/logo.gif, rather than a relative path such as /images/logo.gif. 
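
The "fully qualified" step on its own could be sketched like this. It is a minimal 
sketch, assuming the standard java.net.URI class does the resolving; the class name 
UrlResolver, the method absolutize and the page URL in main are made up for illustration 
and are not from the JSP tag described here.

```java
import java.net.URI;

final class UrlResolver {

    /**
     * Resolves a possibly relative reference against the URL of the page it
     * was found in. Throws IllegalArgumentException if either string is not
     * a syntactically valid URI (for example, if it contains spaces).
     */
    static String absolutize(String pageUrl, String reference) {
        return URI.create(pageUrl).resolve(reference.trim()).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page URL, used only to show the three common cases.
        String page = "http://google.com/intl/en/about.html";
        System.out.println(absolutize(page, "/images/logo.gif"));     // server-relative
        System.out.println(absolutize(page, "images/logo.gif"));      // page-relative
        System.out.println(absolutize(page, "http://example.org/a")); // already absolute
    }
}
```

With that page URL, the first call gives http://google.com/images/logo.gif. References 
that are already absolute come back unchanged.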

This is usually not very complicated, but it can be a little tricky, especially with 
JavaScript. 
I used regular expressions to do this, more specifically the jakarta-oro package. I 
still recommend some server-side caching of the parsed pages, as the parsing can be 
quite a processing-intensive procedure. 
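
A rough sketch of the regular-expression rewriting and the caching together, under some 
loud assumptions: it uses java.util.regex from the JDK in place of jakarta-oro, it only 
looks at quoted src and href attributes (so URLs built inside scripts or stylesheets are 
missed), and the cache is a plain in-memory map that never expires. The names 
LinkRewriter, rewrite and rewriteCached are invented for the example.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkRewriter {

    // Matches src="..." or href="..." with single or double quotes.
    // Group 1: attribute name, group 2: quote character, group 3: the URL.
    private static final Pattern URL_ATTR = Pattern.compile(
            "\\b(src|href)\\s*=\\s*([\"'])(.*?)\\2", Pattern.CASE_INSENSITIVE);

    // Rewritten pages keyed by page URL. Assumption: entries are never evicted.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    /** Returns the HTML with every matched src/href value made absolute. */
    static String rewrite(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        Matcher m = URL_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(3);
            String absolute;
            try {
                absolute = base.resolve(value.trim()).toString();
            } catch (IllegalArgumentException e) {
                absolute = value; // not a valid URI: leave it untouched
            }
            String replacement = m.group(1) + "=" + m.group(2) + absolute + m.group(2);
            // quoteReplacement stops '$' and '\' in URLs being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Rewrites a page only the first time its URL is seen, then reuses the result. */
    static String rewriteCached(String pageUrl, String html) {
        return CACHE.computeIfAbsent(pageUrl, key -> rewrite(key, html));
    }
}
```

A real proxy would also need to expire cache entries when the remote page changes, and 
a proper HTML parser is likely to be more robust than a pattern like this on messy 
markup.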

If you find some library to do this, please tell us about it.

There are some libraries that might help with the HTTP requests. One to check out 
is HTTPClient:
http://www.innovation.ch/java/HTTPClient/

Hope it helps, 
-reynir

 -Original Message-
 From: Jason Novotny [mailto:jdnovotny;lbl.gov] 
 Sent: 9 November 2002 22:44
 To: Tomcat Users List; Jetspeed Developers List
 Subject: retrieving remote web content
 
 
 
 Hi,
 
 I'm trying to develop a servlet that can act as a proxy for another web site. Let's 
 say I'm trying to provide the content of www.google.com. It seems I can retrieve and 
 cache the HTML using a URLConnection, but what about the resources used by the HTML, 
 like GIFs and JPGs? Do I somehow need to parse the HTML and fetch those separately? 
 Is there a library out there for doing what I describe? Maybe I'm missing something 
 really simple...
 
 
 Thanks, Jason
 
 