Hello,

Looking at my crawler output, I noticed that some pages are not
captured because they load content with JavaScript on pageLoad().
These are not, say, AJAX requests that fetch some JSON and render it
into the DOM with JS; they are XHR calls that return plain HTML.

Could Nutch crawl these as well? The fetcher would have to figure out
which calls a page makes and then fetch those too. Suppose there were
a mechanism to extract outlinks from the target page, and these
outlinks could include the GET requests: how could the fetched
content then be associated with the original page?
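
To make the idea concrete, here is a rough sketch of what I mean. It is
plain Python, not Nutch code, and everything in it is an assumption on
my part: the name extract_xhr_outlinks, the two regex patterns (which
only catch literal URLs in xhr.open("GET", ...) and jQuery-style
.get()/.load() calls), and the "parent" field used to remember which
page an XHR URL came from.

```python
import re
from urllib.parse import urljoin

# Hypothetical patterns: they only catch URLs written as string literals.
XHR_PATTERNS = [
    # xhr.open("GET", "/some/fragment.html")
    re.compile(r"""\.open\(\s*['"]GET['"]\s*,\s*['"]([^'"]+)['"]""", re.I),
    # $.get("/some/fragment.html") or $("#box").load("fragment.html")
    re.compile(r"""\.(?:get|load)\(\s*['"]([^'"]+)['"]"""),
]


def extract_xhr_outlinks(page_url, html):
    """Return XHR GET targets found in html, each tagged with its parent page."""
    seen = set()
    links = []
    for pattern in XHR_PATTERNS:
        for match in pattern.finditer(html):
            # Resolve relative URLs against the page that issued the call.
            fragment_url = urljoin(page_url, match.group(1))
            if fragment_url not in seen:
                seen.add(fragment_url)
                # "parent" records the association with the original page.
                links.append({"url": fragment_url, "parent": page_url})
    return links
```

Each returned entry keeps the original page URL next to the fragment
URL, so whatever fetches the fragment later could attach its content to
the parent instead of treating it as an independent page.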

Best Regards,
C.B.
