Hi,

I'm planning on modifying protocol-httpclient (HttpResponse.java) based on
this PhantomJSDriver tutorial:
http://assertselenium.com/2013/03/25/getting-started-with-ghostdriver-phantomjs/

I will let you know how it works out
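
For reference, here is a rough sketch of the shape I have in mind, not
working plugin code. It combines Julien's point (the protocol returns the
rendered bytes for a URL) with the check Patrick described ("html" and "head"
in the first 500 characters). All names below (RenderedContentSketch,
PageRenderer, looksLikeHtml, contentFor) are made up for illustration and are
not Nutch or Selenium APIs. In the real plugin the renderer would presumably
wrap a WebDriver such as PhantomJSDriver, calling get(url) and then
getPageSource(). Matching case-insensitively and encoding as UTF-8 are my
assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Sketch only: none of these names come from Nutch or Selenium. */
final class RenderedContentSketch {

    /**
     * Hypothetical seam for the browser backend. A real implementation
     * might call WebDriver.get(url) and return WebDriver.getPageSource().
     */
    interface PageRenderer {
        String render(String url);
    }

    /**
     * Patrick's check: "html" and "head" both occur in the first 500
     * characters. Case-insensitive matching is an assumption.
     */
    static boolean looksLikeHtml(byte[] raw) {
        int n = Math.min(raw.length, 500);
        // ISO-8859-1 maps one byte to one char, so n bytes give n characters.
        String start = new String(raw, 0, n, StandardCharsets.ISO_8859_1)
                .toLowerCase(Locale.ROOT);
        return start.contains("html") && start.contains("head");
    }

    /** Returns the bytes the protocol would hand back for this URL. */
    static byte[] contentFor(String url, byte[] raw, PageRenderer renderer) {
        if (!looksLikeHtml(raw)) {
            return raw; // leave images, PDFs, etc. untouched
        }
        String rendered = renderer.render(url);
        // Fall back to the raw bytes if the renderer produced nothing.
        return rendered == null ? raw : rendered.getBytes(StandardCharsets.UTF_8);
    }
}
```

The fetcher itself would not change; only the content returned by the
protocol plugin would differ for pages that look like HTML.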

Remi


On Wed, Jun 11, 2014 at 5:25 AM, Julien Nioche <[email protected]> wrote:

> Hi Patrick
>
> You could look at the protocol-http plugin as an example.
>
> Julien
>
>
> On 10 June 2014 10:22, Patrick Kirsch <[email protected]> wrote:
>
> > Hey,
> >
> > On 06/10/2014 10:52 AM, Julien Nioche wrote:
> >
> >> Hi
> >>
> >> You can do that as a custom protocol implementation. The fetcher code
> >> would stay the same, but the byte content returned for a given URL would
> >> be produced by PhantomJS or whichever Selenium backend you'd like to use.
> >>
> > Do you have a documentation/wiki link or an example to start from?
> >
> > Currently I have implemented it in
> > src/java/org/apache/nutch/fetcher/Fetcher.java
> > as a hook that runs if the fetched content contains "html" and "head" in
> > the first 500 characters.
> >
> > Regards,
> >  Patrick
> >
> >
> >> HTH
> >>
> >> Julien
> >>
> >>
> >> On 7 June 2014 11:35, remi tassing <[email protected]> wrote:
> >>
> >>> I'm currently looking at those separately, but an integrated option would
> >>> be more efficient.
> >>>
> >>> Looking forward to hearing about any experience with this.
> >>>
> >>>
> >>> On Sat, Jun 7, 2014 at 6:25 PM, Patrick Kirsch <[email protected]> wrote:
> >>>
> >>>> Hey list,
> >>>> I'm sure this has been asked several times, but a quick look in the
> >>>> Nutch user archive did not help, so:
> >>>>
> >>>> Does anyone have documentation on, or experience with, using a browser
> >>>> (like Chromium) or PhantomJS etc. for fetching web pages?
> >>>>
> >>>> Because the site relies heavily on JavaScript, Nutch needs to see the
> >>>> fully rendered page.
> >>>>
> >>>> Second question: would it be better to implement it as a plugin or
> >>>> rather natively in the fetcher class?
> >>>>
> >>>> Regards,
> >>>>   Patrick
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >
>
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
