Have you tried using the protocol-selenium plugin? I've had luck using it to fetch pages with dynamically loaded content.
https://github.com/apache/nutch/tree/trunk/src/plugin/protocol-selenium

--
Jimmy

On Fri, Jun 5, 2015 at 4:16 AM, Imtiaz Shakil Siddique <[email protected]> wrote:

> Hi,
>
> I am using apache-nutch-1.9. My configuration ignores external links.
>
> I have some URLs in my seed file, but the problem is that the Nutch crawler
> doesn't find the links in those pages, because the site populates its content
> using AJAX calls. I've removed all possible regex filters inside Nutch's conf
> folder.
>
> How can I collect those links? Any advice?
> Thanks in advance.
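Why the links go missing, roughly: a fetcher that only downloads the raw HTML and parses it without executing JavaScript never sees links that a script adds after page load, whereas a browser-driven fetcher such as protocol-selenium renders the page first. The sketch below is a toy illustration of that difference using Python's standard html.parser, not Nutch's actual parser; the sample page, the /api/items endpoint and the /item/ paths are hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags in static HTML.

    Like any non-rendering parser, it never runs the page's JavaScript,
    so it only sees links present in the HTML as served.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one static link, plus a script that would add more
# links from an AJAX response, but only when executed in a browser.
PAGE = """
<html><body>
  <a href="/about">About</a>
  <div id="results"></div>
  <script>
    fetch('/api/items')
      .then(function (r) { return r.json(); })
      .then(function (items) {
        items.forEach(function (item) {
          var a = document.createElement('a');
          a.href = '/item/' + item.id;
          document.getElementById('results').appendChild(a);
        });
      });
  </script>
</body></html>
"""

print(extract_links(PAGE))  # ['/about']: the AJAX-added /item/ links are absent
```

Only the static link is found; the links the script would create are invisible to a parser that does not execute JavaScript, which is why a rendering fetcher is needed for such sites.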

