Hi, I'm planning on modifying protocol-httpclient (HttpResponse.java) based on this PhantomJSDriver tutorial: http://assertselenium.com/2013/03/25/getting-started-with-ghostdriver-phantomjs/
I will let you know how it works out.

Remi

On Wed, Jun 11, 2014 at 5:25 AM, Julien Nioche <[email protected]> wrote:

> Hi Patrick
>
> You could look at the protocol-http plugin as an example.
>
> Julien
>
> On 10 June 2014 10:22, Patrick Kirsch <[email protected]> wrote:
>
> > Hey,
> >
> > On 06/10/2014 10:52 AM, Julien Nioche wrote:
> >
> >> Hi
> >>
> >> You can do that as a custom protocol implementation. The fetcher code
> >> would stay the same but the byte content returned for a given URL would
> >> be produced by phantomjs or whichever selenium backend you'd like to use.
> >
> > Do you have a documentation/wiki link or example to start from?
> >
> > Currently I implemented it in
> > src/java/org/apache/nutch/fetcher/Fetcher.java
> > as a hook, if it contains "html" and "head" in the first 500 characters.
> >
> > Regards,
> > Patrick
> >
> >> HTH
> >>
> >> Julien
> >>
> >> On 7 June 2014 11:35, remi tassing <[email protected]> wrote:
> >>
> >>> I'm currently looking at those separately but an integrated option
> >>> would be more efficient.
> >>>
> >>> Looking forward to any experience sharing
> >>>
> >>> On Sat, Jun 7, 2014 at 6:25 PM, Patrick Kirsch <[email protected]> wrote:
> >>>
> >>>> Hey list,
> >>>> I'm sure this issue was asked several times, but a quick look in the
> >>>> nutch user archive did not help, so:
> >>>>
> >>>> Has anyone documentation or tried to use a browser (like chromium) or
> >>>> phantomjs etc. for fetching web pages?
> >>>>
> >>>> Due to a heavily loaded javascript site, nutch needs to see the fully
> >>>> rendered page.
> >>>>
> >>>> Second question, would it be better to implement it as plugin or
> >>>> rather native in the fetcher class?
> >>>>
> >>>> Regards,
> >>>> Patrick
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
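
For reference, the check Patrick describes in the quoted thread (hand a page to the browser backend only if "html" and "head" appear in the first 500 characters) could be sketched roughly as below. This is only an illustration, not code from Nutch or from Patrick's patch: the class name HtmlSniffer, the method looksLikeHtml, the case-insensitive match, and decoding the first 500 bytes as ISO-8859-1 (one byte per character) are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: decides whether fetched content
// looks like an HTML page that might need browser-side rendering.
public class HtmlSniffer {

    // Only the beginning of the content is inspected (500, as in the thread).
    static final int SNIFF_LIMIT = 500;

    public static boolean looksLikeHtml(byte[] content) {
        if (content == null) {
            return false;
        }
        int len = Math.min(content.length, SNIFF_LIMIT);
        // Assumption: ISO-8859-1 maps each byte to one char, so the first
        // 500 bytes stand in for the first 500 characters.
        String prefix = new String(content, 0, len, StandardCharsets.ISO_8859_1)
                .toLowerCase(Locale.ROOT);
        // Assumption: the match is case-insensitive.
        return prefix.contains("html") && prefix.contains("head");
    }

    public static void main(String[] args) {
        byte[] page = "<!DOCTYPE html><html><head><title>t</title></head></html>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] json = "{\"key\": \"value\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeHtml(page)); // prints "true"
        System.out.println(looksLikeHtml(json)); // prints "false"
    }
}
```

In a custom protocol plugin along the lines Julien suggests, a check like this would presumably run on the raw response bytes before deciding whether to re-fetch the URL through phantomjs; where exactly it belongs depends on the plugin and is not something the thread settles.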

