Re: nutch javascript capabilities

Tejas Patil Tue, 15 Jan 2013 02:43:42 -0800

AFAIK, you cannot configure the Fetcher to use Firefox or HtmlUnit. You
would probably have to modify the Nutch source yourself.
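
To illustrate why a form submit triggered from JavaScript is out of reach
for an extractor that only pattern-matches script text, here is a minimal
Python sketch. It is not the parse-js plugin's code: the pattern
URL_IN_STRING and the function extract_links are hypothetical names, and
the regex is an assumption chosen only to show the idea.

```python
import re

# Hypothetical pattern: quoted string literals that look like absolute
# or root-relative URLs. NOT the pattern set used by parse-js.
URL_IN_STRING = re.compile(r"""["']((?:https?://|/)[^"'\s]+)["']""")


def extract_links(script_text):
    """Return URL-like string literals found in a chunk of JavaScript."""
    return URL_IN_STRING.findall(script_text)


# A URL written literally in the script can be found by pattern matching.
static_js = 'window.open("http://example.com/result.html");'

# A click handler that submits a form contains no URL literal at all;
# the target only exists once a browser actually runs the script.
dynamic_js = '$("#go").click(function () { document.forms[0].submit(); });'

print(extract_links(static_js))
print(extract_links(dynamic_js))
```

The first call finds the literal URL; the second finds nothing, because the
destination is only produced when the script is executed.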


Thanks,
Tejas Patil


On Tue, Jan 15, 2013 at 12:02 AM, Michael Gang <[email protected]> wrote:

> Hi,
>
> I understand.
> Is there a way to use another browser as the fetcher for a set of
> predefined pages?
> For example, would it be possible to tell nutch to use firefox
> or htmlunit as a fetcher?
> Many internet sites load content via ajax, or submit a form on click,
> so the links never appear in the html itself.
>
> Thanks,
> David
>
>
> On Sun, Jan 13, 2013 at 8:08 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
> > This should be correct yes.
> > If you look at the plugin source you can see the patterns it uses to
> > extract links.
> > Also you can check what's in your crawldb using the readdb command.
> > Hth
> > Lewis
> >
> > On Saturday, January 12, 2013, Michael Gang <[email protected]>
> wrote:
> > > Hi,
> > >
> > > So if there is a javascript which actually submits a form, nutch won't
> > > follow the link, because it just deals with urls.
> > > Is this correct?
> > >
> > > Thanks,
> > > David
> > >
> > >
> > > On Tue, Jan 8, 2013 at 5:15 PM, Michael Gang <[email protected]>
> > wrote:
> > >
> > >> Hi all,
> > >>
> > >> From the features of nutch
> > >> http://wiki.apache.org/nutch/Features
> > >> i understand that there is a sort of javascript support
> > >>
> > >> JavaScript (for extracting links only?) (parse-js)
> > >>
> > >> I don't understand what this exactly means.
> > >> Let's say I have a link
> > >> <a onclick="do_something">
> > >> or a jquery binding in onready,
> > >> and in this code I open a new window and show the result of a form
> > >> submit there.
> > >> Will nutch extract the resulting page as a link for me?
> > >>
> > >> Thanks,
> > >> David
> > >>
> > >>
> > >
> >
> > --
> > *Lewis*
> >
>
