Re: Nutch use a Browser or phantomjs as fetcher

Julien Nioche Tue, 10 Jun 2014 01:53:24 -0700

Hi

You can do that as a custom protocol implementation. The fetcher code would
stay the same, but the byte content returned for a given URL would be
produced by PhantomJS or whichever Selenium backend you'd like to use.
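To illustrate the idea, here is a minimal Java sketch. It does not use
Nutch's real plugin API: ContentProtocol, PageRenderer, PlainHttpProtocol,
RenderingProtocol and SimpleFetcher are hypothetical names made up for this
example. The point is that the fetch loop depends only on a "give me the
bytes for this URL" contract, so a renderer backed by PhantomJS or Selenium
can be swapped in without changing the loop.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a protocol plugin: given a URL, return the bytes
// the rest of the crawler will store and parse. NOT Nutch's real Protocol API.
interface ContentProtocol {
    byte[] fetch(String url) throws IOException, InterruptedException;
}

// Hypothetical abstraction over a headless browser (PhantomJS, or a Selenium
// WebDriver backend): load the URL, run its JavaScript, return the final DOM.
interface PageRenderer {
    String renderedHtml(String url) throws IOException;
}

// Plain protocol: returns the raw HTTP response body; no JavaScript is executed.
final class PlainHttpProtocol implements ContentProtocol {
    private final HttpClient client = HttpClient.newHttpClient();

    @Override
    public byte[] fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofByteArray()).body();
    }
}

// Rendering protocol: same contract, but the bytes come from the browser's
// rendered DOM, so content injected by JavaScript is included.
final class RenderingProtocol implements ContentProtocol {
    private final PageRenderer renderer;

    RenderingProtocol(PageRenderer renderer) {
        this.renderer = renderer;
    }

    @Override
    public byte[] fetch(String url) throws IOException {
        return renderer.renderedHtml(url).getBytes(StandardCharsets.UTF_8);
    }
}

// Simplified fetch loop: it only knows about ContentProtocol, so switching
// from raw HTTP to a headless browser does not touch this code.
final class SimpleFetcher {
    private final ContentProtocol protocol;

    SimpleFetcher(ContentProtocol protocol) {
        this.protocol = protocol;
    }

    Map<String, byte[]> fetchAll(List<String> urls) throws IOException, InterruptedException {
        Map<String, byte[]> results = new LinkedHashMap<>();
        for (String url : urls) {
            results.put(url, protocol.fetch(url));
        }
        return results;
    }
}
```

To use it, you would wrap your headless browser in a PageRenderer (for
example, an adapter returning Selenium WebDriver's getPageSource() after the
page has loaded) and pass new RenderingProtocol(renderer) to the fetcher in
place of new PlainHttpProtocol().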


HTH

Julien


On 7 June 2014 11:35, remi tassing <[email protected]> wrote:

> I'm currently looking at those separately, but an integrated option would be
> more efficient.
>
> Looking forward to hearing about anyone's experience.
>
>
> On Sat, Jun 7, 2014 at 6:25 PM, Patrick Kirsch <[email protected]> wrote:
>
> > Hey list,
> >  I'm sure this question has been asked several times, but a quick look in
> > the Nutch user archive did not help, so:
> >
> > Does anyone have documentation on, or experience with, using a browser
> > (like Chromium) or PhantomJS etc. for fetching web pages?
> >
> > Because the site is heavily JavaScript-driven, Nutch needs to see the
> > fully rendered page.
> >
> > Second question: would it be better to implement this as a plugin or
> > natively in the fetcher class?
> >
> > Regards,
> >  Patrick
> >
> >
>



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
