We did this using Selenium: we turned off protocol-http and used a
custom protocol-selenium plugin, with 'http' bound to it.

A simple way is to let the page render and then get the entire text, i.e.
what becomes the ParseText in Nutch terminology.
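
To illustrate the "let the page render, then take the text" step, here is a
minimal sketch using the Selenium Python bindings. It is not our plugin code
and it does not use the Nutch protocol plugin API; the function names
fetch_rendered_text and make_firefox_driver are made up for this example,
and the choice of Firefox is only an assumption.

```python
def fetch_rendered_text(driver, url):
    """Load url in a browser driver and return (rendered_html, visible_text).

    `driver` is assumed to be a Selenium WebDriver (or anything with the
    same get / page_source / find_element interface).
    """
    driver.get(url)  # the browser loads the page and runs its JavaScript
    rendered_html = driver.page_source  # DOM serialized after scripts ran
    # "tag name" is the locator string behind selenium's By.TAG_NAME
    body = driver.find_element("tag name", "body")
    return rendered_html, body.text


def make_firefox_driver():
    """Hypothetical helper: start a local Firefox session via Selenium."""
    # Imported here so fetch_rendered_text can be tried without Selenium.
    from selenium import webdriver
    return webdriver.Firefox()
```

Typical use would be something like
html, text = fetch_rendered_text(make_firefox_driver(), url), followed by
driver.quit() when done. Note that get() returns once the page has loaded;
content that scripts insert later may need an explicit wait before the text
is read.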


On Sat, Jun 22, 2013 at 1:28 AM, Julien Nioche <
[email protected]> wrote:

> One way around this is to have a custom protocol implementation and get it
> to fetch via Selenium
>
> J.
>
> On 21 June 2013 19:54, Lewis John Mcgibbney <[email protected]
> >wrote:
>
> > Hi,
> > Nearly all of this page is generated by JS right?
> > Right now my answer is no. We fetch then parse page source... which in this
> > case is mostly all JS. The magic happens in the browser.
> > ...
> > Lewis
> >
> >
> > On Tue, Jun 18, 2013 at 10:59 PM, Deals Collect <[email protected]
> > >wrote:
> >
> > > Hi all,
> > >
> > > Can Nutch get the HTML content generated by Javascript? For example, this
> > > job site
> > >
> > >
> >
> https://schneiderele.taleo.net/careersection/2/jobdetail.ftl?job=72522&lang=en
> > >
> > >
> > > Many thanks,
> > >
> >
> >
> >
> > --
> > *Lewis*
> >
>
>
>
> --
> *
> *Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
>
