Andrzej Bialecki wrote:

Philipp Suter wrote:

Does anybody know how to crawl frames? Or how to extend Nutch so that it can crawl frames? We are using the API.


The development version (available from SVN) should handle frames just fine, i.e. it should follow the src=... attributes in frames in order to retrieve the frame contents. Please download the nightly snapshot and try it out.
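To illustrate what "following the src=... attributes" means in practice, here is a minimal sketch in plain Java. It is not Nutch's actual parser code: the class name FrameLinkExtractor and the method extractFrameUrls are hypothetical, and the regex approach is an assumption chosen for brevity (a real crawler would use a proper HTML parser). It pulls the src value out of each frame or iframe tag and resolves it against the page URL, so the resulting URLs could be queued for fetching.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: collects the URLs referenced by
// <frame> and <iframe> tags so that a crawler could fetch the frame contents.
public class FrameLinkExtractor {

    // Matches <frame ...> or <iframe ...> and captures a quoted src value.
    // The \b after "frame" keeps <frameset> from matching.
    private static final Pattern FRAME_SRC = Pattern.compile(
        "<i?frame\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']*)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Returns the absolute URLs of all frames found in the given HTML.
    // Assumes src values are valid URI references; URI.resolve throws
    // IllegalArgumentException on malformed input such as unescaped spaces.
    public static List<String> extractFrameUrls(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        List<String> urls = new ArrayList<>();
        Matcher m = FRAME_SRC.matcher(html);
        while (m.find()) {
            // Resolve relative references against the URL of the page.
            urls.add(base.resolve(m.group(1).trim()).toString());
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<frameset cols=\"20%,80%\">"
            + "<frame src=\"menu.html\">"
            + "<frame src=\"/content/main.html\">"
            + "</frameset>";
        for (String url : extractFrameUrls("http://example.com/site/index.html", html)) {
            System.out.println(url);
        }
    }
}
```

The sketch only handles quoted src values and ignores unquoted attributes, comments, and scripts, which is one reason a real HTML parser is the better choice in production.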


When do you think it will be released officially? We have some mission-critical stuff running with Nutch, so I don't know whether the nightly snapshot will work for us, but I'll try it out.

Have you ever thought about integrating a JavaScript interpreter into Nutch? This could be another big step towards a wider range of crawlable websites. If you need any help with this, I would be very interested in supporting (time-wise) anybody implementing such functionality.

Have you evaluated Flash as well? Is it possible to parse it?

cheers
ph
