Re: How to Collect dynamically created anchors from a page

Michael Joyce Fri, 05 Jun 2015 08:40:45 -0700

Have you tried using the protocol-selenium plugin? I've had luck using it to
fetch pages with dynamically loaded content.


https://github.com/apache/nutch/tree/trunk/src/plugin/protocol-selenium
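
As a rough sketch of what enabling it might look like, assuming a Nutch build
that actually includes the plugin (it lives in trunk at the URL above) and a
working Selenium/browser setup on the crawl machine: swap protocol-selenium in
for protocol-http in the plugin.includes property of conf/nutch-site.xml. The
other plugins in the value below are only an assumed, typical set, so keep
whatever your own configuration already lists.

```xml
<!-- conf/nutch-site.xml (sketch; adjust to your own setup) -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- protocol-selenium takes the place of protocol-http here.
         The remaining entries are an assumed example list, not a requirement. -->
    <value>protocol-selenium|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
  </property>
</configuration>
```

With that in place the fetcher should hand each page to a real browser, so
links added by JavaScript are present in the content that gets parsed. The
plugin's README in the directory linked above covers the remaining setup steps.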


-- Jimmy

On Fri, Jun 5, 2015 at 4:16 AM, Imtiaz Shakil Siddique <
[email protected]> wrote:

> Hi,
>
> I am using apache-nutch-1.9. My configuration ignores external links.
>
> I have some URLs in my seed file. The problem is that the Nutch crawler
> doesn't find the links in those pages, because the site populates its
> content using AJAX calls. I've removed all possible regex filters inside
> the conf folder of Nutch.
>
> How can I collect those links? Any advice?
> Thanks in advance.
>
