Hi Patrick

You could look at the protocol-http plugin as an example.

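To make the custom-protocol idea a little more concrete, here is a minimal, hypothetical sketch in plain Java. It deliberately does not use the Nutch plugin API: the class name RenderedFetchSketch, the methods looksLikeHtml and fetchContent, and the two Function parameters (standing in for a plain HTTP fetch and for a phantomjs/selenium-backed renderer) are all invented for illustration. The check follows the one described in the quoted message ("html" and "head" in the first 500 characters); making it case-insensitive is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch, not Nutch code: shows where a rendering step could sit
// inside a protocol-level fetch so that the fetcher itself stays unchanged.
class RenderedFetchSketch {

    // Number of leading characters inspected, as in the quoted message.
    static final int SNIFF_LENGTH = 500;

    // Returns true if both "html" and "head" occur in the first 500 characters.
    // Lower-casing before the check is an assumption, not part of the original hook.
    static boolean looksLikeHtml(String content) {
        int end = Math.min(SNIFF_LENGTH, content.length());
        String prefix = content.substring(0, end).toLowerCase(Locale.ROOT);
        return prefix.contains("html") && prefix.contains("head");
    }

    // Produces the byte content for a URL. plainFetcher stands in for an
    // ordinary HTTP fetch; renderer stands in for a headless browser backend.
    // Both are placeholders supplied by the caller.
    static byte[] fetchContent(String url,
                               Function<String, String> plainFetcher,
                               Function<String, String> renderer) {
        String raw = plainFetcher.apply(url);
        // Only pay the rendering cost for documents that look like HTML.
        String body = looksLikeHtml(raw) ? renderer.apply(url) : raw;
        return body.getBytes(StandardCharsets.UTF_8);
    }
}
```

In a real plugin the two functions would be replaced by the actual HTTP client and by whichever browser backend you pick, and the returned bytes would be wrapped in whatever the protocol interface expects. Fetching the page twice (once plain, once rendered) is a simplification of this sketch.
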
Julien

On 10 June 2014 10:22, Patrick Kirsch <[email protected]> wrote:

> Hey,
>
> On 06/10/2014 10:52 AM, Julien Nioche wrote:
>
>> Hi
>>
>> You can do that as a custom protocol implementation. The fetcher code
>> would stay the same but the byte content returned for a given URL would
>> be produced by phantomjs or whichever selenium backend you'd like to use.
>>
> Do you have a documentation/wiki link or example to start from?
>
> Currently I implemented it in
> src/java/org/apache/nutch/fetcher/Fetcher.java
> as a hook, if it contains "html" and "head" in the first 500 characters.
>
> Regards,
> Patrick
>
>> HTH
>>
>> Julien
>>
>> On 7 June 2014 11:35, remi tassing <[email protected]> wrote:
>>
>>> I'm currently looking at those separately but an integrated option
>>> would be more efficient.
>>>
>>> Looking forward to any experience sharing
>>>
>>> On Sat, Jun 7, 2014 at 6:25 PM, Patrick Kirsch <[email protected]> wrote:
>>>
>>>> Hey list,
>>>> I'm sure this issue was asked several times, but a quick look in the
>>>> nutch user archive did not help, so:
>>>>
>>>> Does anyone have documentation on, or has anyone tried, using a
>>>> browser (like chromium) or phantomjs etc. for fetching web pages?
>>>>
>>>> Due to a heavily loaded javascript site, nutch needs to see the fully
>>>> rendered page.
>>>>
>>>> Second question, would it be better to implement it as a plugin or
>>>> rather natively in the fetcher class?
>>>>
>>>> Regards,
>>>> Patrick

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

