Andrzej Bialecki wrote:
Philipp Suter wrote:
Does anybody know how to crawl frames? Or how to extend Nutch to be
able to crawl frames? We are using the API.
The development version (available from SVN) should handle frames just
fine, i.e. it should follow the src=... attributes in frames in order
to retrieve the frame contents. Please download the nightly snapshot
and try it out.
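
For illustration only, here is a minimal sketch of what "following the
src=... attributes" amounts to: find each frame's src value and resolve it
against the URL of the page that contains it. This is not Nutch's parser
code. The class name FrameLinkExtractor and its extract method are
hypothetical, and a regular expression stands in for a real HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch.
public class FrameLinkExtractor {
    // Matches a quoted src attribute inside a <frame> or <iframe> tag.
    // The \b after "frame" keeps <frameset> from matching.
    private static final Pattern FRAME_SRC = Pattern.compile(
        "<i?frame\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the absolute URLs of all frames found in the given HTML.
    // Assumes every src value is a syntactically valid URI reference;
    // URI.resolve(String) throws IllegalArgumentException otherwise.
    public static List<String> extract(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = FRAME_SRC.matcher(html);
        while (m.find()) {
            // Resolve relative references against the containing page.
            links.add(base.resolve(m.group(2).trim()).toString());
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<frameset cols=\"20%,80%\">"
            + "<frame src=\"menu.html\"><frame src='/docs/main.html'>"
            + "</frameset>";
        for (String link : extract("http://example.com/site/index.html", html)) {
            System.out.println(link);
        }
    }
}
```

A crawler would then queue the returned URLs for fetching like any other
outlink. The regex ignores unquoted src values, so treat it as a sketch
rather than a robust extractor.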
When do you think it will be released officially? We have some
mission-critical stuff running with Nutch, so I don't know whether the
nightly snapshot will work for us, but I'll try it out.
Have you ever thought about integrating a JavaScript interpreter into
Nutch? This could be another big step towards a wider range of
crawlable websites. If you need any help with this, I would be very
interested in supporting (timewise) anybody implementing such functionality.
Have you evaluated Flash as well? Is it possible to parse it?
cheers
ph