Hi, I understand. Is there a way to use another browser as the fetcher for a set of predefined pages? For example, would it be possible to tell Nutch that it should use Firefox or HtmlUnit as a fetcher? There are many sites that load content via AJAX, or where a click triggers a form submit, so no real HTML links exist in the markup.
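
Something along these lines is what I have in mind, roughly sketched below: fetch the page with HtmlUnit so that the JavaScript runs before the HTML is handed to a parser. This is only an illustration; I don't know whether it can be wired into Nutch as a protocol plugin, and the exact HtmlUnit calls may differ between versions.

// Minimal sketch (not Nutch code): fetch a page with HtmlUnit so that
// JavaScript executes before the markup is read.
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitFetchSketch {
    public static void main(String[] args) throws Exception {
        WebClient webClient = new WebClient();
        try {
            webClient.getOptions().setJavaScriptEnabled(true);
            webClient.getOptions().setThrowExceptionOnScriptError(false);

            HtmlPage page = webClient.getPage("http://www.example.com/");
            // Give AJAX requests a chance to finish before reading the DOM.
            webClient.waitForBackgroundJavaScript(5000);

            // Rendered markup, including content added by scripts, which a
            // plain HTTP fetcher would never see.
            String renderedHtml = page.asXml();
            System.out.println(renderedHtml.length() + " characters of rendered HTML");
        } finally {
            webClient.closeAllWindows();
        }
    }
}
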
Thanks,
David

On Sun, Jan 13, 2013 at 8:08 PM, Lewis John Mcgibbney <[email protected]> wrote:

> This should be correct, yes.
> If you look at the plugin source you can see the patterns it uses to
> extract links.
> Also, you can check what's in your crawldb using the readdb command.
> Hth
> Lewis
>
> On Saturday, January 12, 2013, Michael Gang <[email protected]> wrote:
> > Hi,
> >
> > So if there is JavaScript which actually submits a form, Nutch won't
> > follow the link, because it only deals with URLs.
> > Is this correct?
> >
> > Thanks,
> > David
> >
> > On Tue, Jan 8, 2013 at 5:15 PM, Michael Gang <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> From the features of Nutch
> >> http://wiki.apache.org/nutch/Features
> >> I understand that there is a sort of JavaScript support:
> >>
> >> JavaScript (for extracting links only?) (parse-js)
> >>
> >> I don't understand what this means exactly.
> >> Let's say I have a link
> >> <a onclick="do_something">
> >> or a jQuery binding in onready,
> >> and in this code I open a new window and show there the result of a
> >> form submit.
> >> Will Nutch extract the resulting page as a link for me?
> >>
> >> Thanks,
> >> David
>
> --
> *Lewis*
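
To illustrate the point in the thread above: a pattern-based extractor along the lines of parse-js can only pick up URL-like string literals that appear in the script text; a form submit wired up in an onclick handler gives it nothing to extract. A rough, simplified sketch (illustrative only, not the actual parse-js source):

// Illustrative sketch of pattern-based link extraction: URL literals inside
// JavaScript are found, a scripted form submit yields no outlink at all.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsLinkExtractionSketch {
    // Simplified pattern for absolute URLs inside quoted strings.
    private static final Pattern URL_IN_STRING =
        Pattern.compile("[\"'](https?://[^\"'\\s]+)[\"']");

    public static void main(String[] args) {
        String extractable =
            "<a onclick=\"window.open('http://example.com/report.html')\">report</a>";
        String notExtractable =
            "<a onclick=\"document.forms['search'].submit()\">search</a>";

        printLinks("literal URL in the script", extractable);
        printLinks("form submit, no URL literal", notExtractable);
    }

    private static void printLinks(String label, String snippet) {
        Matcher m = URL_IN_STRING.matcher(snippet);
        System.out.println(label + ":");
        boolean found = false;
        while (m.find()) {
            System.out.println("  outlink -> " + m.group(1));
            found = true;
        }
        if (!found) {
            System.out.println("  no outlink extracted");
        }
    }
}
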

