Andrzej Bialecki wrote:
Philipp Suter wrote:
Does anybody know how to crawl frames, or how to extend Nutch so that
it can crawl them? We are using the API.
The development version (available from SVN) should handle frames just
fine, i.e. it should follow the src=... attributes in frames in order
to retrieve the frame contents. Please download the nightly snapshot
and try it out.
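
To illustrate what following the src=... attributes means, here is a
minimal sketch in Python. It is not Nutch's code and says nothing about
how Nutch's parser is implemented; the names FrameSrcParser and
extract_frame_urls, the sample page, and the example.com URLs are all
made up for illustration. It collects the src targets of frame and
iframe tags and resolves them against the page URL, so a crawler could
queue them for fetching.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Hypothetical helper: collects src targets of <frame>/<iframe> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frame_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative src values against the page URL.
                self.frame_urls.append(urljoin(self.base_url, src))


def extract_frame_urls(html, base_url):
    """Return absolute URLs of all frames referenced by the page."""
    parser = FrameSrcParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.frame_urls


# Made-up sample page with one relative and one root-relative frame src.
page = """
<html><frameset cols="20%,80%">
  <frame src="menu.html">
  <frame src="/content/main.html">
</frameset></html>
"""
frame_urls = extract_frame_urls(page, "http://example.com/site/index.html")
```

A crawler would then fetch each entry of frame_urls like any other
outlink of the page.
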
When do you think it will be released officially? We have some
mission-critical stuff running on Nutch, so I don't know whether the
nightly snapshot will work for us, but I'll try it out.
Have you ever thought about integrating a JavaScript interpreter into
Nutch? This could be another big step towards a wider range of
crawlable websites. If you need any help with this, I would be very
interested in supporting (with my time) anybody implementing such functionality.
Have you evaluated Flash as well? Is it possible to parse it?
cheers
ph
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general