Andrzej Bialecki wrote:

Philipp Suter wrote:

Does anybody know how to crawl frames, or how to extend Nutch so that it can crawl them? We are using the API.


The development version (available from SVN) should handle frames just fine, i.e. it should follow the src=... attributes in frames in order to retrieve the frame contents. Please download the nightly snapshot and try it out.
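For readers unfamiliar with the idea, here is a rough sketch of what "following the src attribute of frames" amounts to. It is not Nutch's code (Nutch is written in Java and its parser is not shown in this thread); it is a standalone Python illustration using only the standard library. The names FrameLinkExtractor and extract_frame_urls, and the example.com URLs, are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameLinkExtractor(HTMLParser):
    """Collects absolute URLs from the src attribute of <frame> and <iframe> tags.

    Hypothetical helper for illustration only; not part of Nutch.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frame_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative references against the page's own URL.
                self.frame_urls.append(urljoin(self.base_url, src))


def extract_frame_urls(html, base_url):
    """Return the frame target URLs found in html, in document order."""
    parser = FrameLinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.frame_urls
```

A crawler would then queue each returned URL for fetching, just as it does for ordinary links. Frames without a src attribute are skipped.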


When do you think it will be released officially? We have some mission-critical stuff running on Nutch, so I don't know whether the nightly snapshot will work for us, but I'll try it out.

Have you ever thought about integrating a JavaScript interpreter into Nutch? This could be another big step towards a wider range of crawlable websites. If you need any help with this, I would be very interested in supporting (time-wise) anybody implementing such functionality.

Have you evaluated Flash as well? Is it possible to parse it?

cheers
ph


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
