Hi Lewis,

The idea is to move some of the processing from indexing to parsing, hoping
to limit the latency on Solr.

I've looked at the wiki, and it may just be me, but I am having a difficult
time understanding the process as a whole. I am very unfamiliar with crawling,
parsing and indexing. I'm just trying to understand how everything works
together and at which point the plugins are run.

Thanks,
Jim

On Thu, Apr 26, 2012 at 4:49 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> Hi Jim,
>
> On Thu, Apr 26, 2012 at 2:23 PM, Jim Chandler <[email protected]>
> wrote:
> > I am in the
> > process of trying to change a plugin from an IndexingFilter to a Parser.
>
> Personally I wouldn't do this; I would pick up an existing parser and
> edit it into another parser! Do you have any specific reasons for
> doing this the other way around?
>
> > am having difficulty understanding where in the Nutch process each one
> > of these is run.
>
> Well the parser is run once you have fetched your pages and you wish
> to extract content from them.
> The IndexingFilter is used when you wish to send things to be indexed
> in some sort of custom manner.
>
>
> > Does anyone have any recommendations of sites or books that would be
> > helpful?
> >
>
> What I think you're speaking about is getting up to speed with plugins:
> how they are used, what they consist of, and how they can be built to
> solve your domain-specific problems.
>
> Check out our wiki, it's the best source of Nutch info on the web...
>
> http://wiki.apache.org/nutch/
> http://wiki.apache.org/nutch/PluginCentral
>
> hth
>
> Lewis
>
>
>
> --
> Lewis
>
