If you go the Java HttpClient route, you will probably find TagSoup (http://home.ccil.org/~cowan/XML/tagsoup/) helpful.
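For what it's worth, here is a rough sketch of the fetch side (form post plus cookies). Note the assumptions: it uses the JDK's built-in java.net.http client (Java 11+) as a stand-in for Jakarta HttpClient, purely so it runs without extra jars, and the class name, URL and form field names are all made up for illustration.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: JDK client standing in for Jakarta HttpClient.
final class FormScrapeSketch {

    // Encode form fields as application/x-www-form-urlencoded.
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Build a POST request carrying the encoded form body.
    static HttpRequest formPost(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    // A client that keeps cookies between requests, so a login sticks.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user", "jdoe");      // made-up field names
        fields.put("query", "a b&c");
        HttpRequest request = formPost("http://example.com/search", fields);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(encodeForm(fields));

        // Only touch the network when asked to (pass any argument).
        if (args.length > 0) {
            HttpResponse<String> response =
                    newClient().send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            // response.body() is the raw HTML you would feed to an HTML parser.
        }
    }
}
```

The HTML string that comes back is what you would then hand to TagSoup: its org.ccil.cowan.tagsoup.Parser is a SAX XMLReader, so the usual SAX handlers work on the cleaned-up markup.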
Also, you might want to check out Open QA's Selenium (http://www.openqa.org/selenium/). It's intended to be used as a test tool, but you might find it useful if you are faced with particularly nasty JavaScript in the web pages you intend to scrape.

Josh

On 10/16/06, Phillip Rhodes <[EMAIL PROTECTED]> wrote:
Owen Berry wrote:
> If you can write Perl code, take a look at LWP::UserAgent,
> HTML::TreeBuilder and HTTP::Cookies (if you need cookies) for it to
> work. I've used this to bulk retrieve information off a website (with
> permission) using forms, cookies etc.
>

Or if Java appeals to you, take a look at Jakarta HTTPClient:
<http://jakarta.apache.org/commons/httpclient/>

TTYL,
Phil
--
TriLUG mailing list        : http://www.trilug.org/mailman/listinfo/trilug
TriLUG Organizational FAQ  : http://trilug.org/faq/
TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
