Richard.

> So would I do something like
>
> 1. parse out the citation
> 2. metadata.put(<citation>, <citation>);



Yes, I think that is the way to proceed. Then go on to implement the
indexing and query filters, all as described in the WritingPluginExample
tutorial:
http://wiki.apache.org/nutch/WritingPluginExample
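
To make the two steps concrete, here is a rough, self-contained sketch of
the hand-off from parse time to index time. It does not use the Nutch API:
the plain Maps stand in for the metadata object and for the index document,
and the class name, the "citation" key and the citation pattern are all
assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CitationMetadataSketch {

    // Hypothetical metadata key; the thread does not name one.
    public static final String CITATION_KEY = "citation";

    // Hypothetical citation format ("<volume> U.S. <page>", e.g. "410 U.S. 113").
    // Replace it with whatever pattern matches your own documents.
    private static final Pattern CITATION = Pattern.compile("\\b\\d+ U\\.S\\. \\d+\\b");

    // Step 1 (parse time): pull the first citation out of the page text, if any.
    public static Optional<String> parseCitation(String text) {
        Matcher m = CITATION.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    // Step 2 (parse time): store the citation under a fixed key.
    // The Map is only a stand-in for the metadata object.
    public static void storeCitation(String text, Map<String, String> metadata) {
        parseCitation(text).ifPresent(c -> metadata.put(CITATION_KEY, c));
    }

    // Index time: read the value back and copy it into the document fields.
    // The second Map is only a stand-in for the index document.
    public static void indexCitation(Map<String, String> metadata,
                                     Map<String, String> docFields) {
        String citation = metadata.get(CITATION_KEY);
        if (citation != null) {
            docFields.put(CITATION_KEY, citation);
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        Map<String, String> docFields = new HashMap<>();
        storeCitation("See 410 U.S. 113 for details.", metadata);
        indexCitation(metadata, docFields);
        System.out.println(docFields);
    }
}
```

The sketch stores the citation under a fixed key rather than using the
citation itself as the key, so that the index-time code can look it up by
name. In a real plugin the same put and get calls would live in your parse
filter and indexing filter respectively.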

Rgrds, Thomas

> ?
>
> Thanks for your help on this.
>
>
> -----Original Message-----
> From: Raghavendra Prabhu [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 09, 2006 2:53 AM
> To: [email protected]
> Subject: Re: writing a metadata content tag
>
>
> Hi Howie
>
> That is what I am looking at
>
> But as you said generalize for all requirements including intranet
> requirement
>
> I am better off doing what you said
>
> Rgds
> Prabu
>
>
> On 3/9/06, Howie Wang <[EMAIL PROTECTED]> wrote:
> >
> > >What I want to do is add some header info in parse-filter,
> > >which will be used by index-filter to add my own kind of new
> > >FIELD
> > >
> > >Rgds
> > >Prabhu
> >
> > I would recommend doing it at the index phase if possible. If the end
> > goal is to have it searchable from the index, ask if you really need
> > to have the information at the parsing stage. If you decide you want
> > to tweak your keywords, it's easy to re-index. If you do it at the
> > parsing stage, it will take twice as long since you have to re-parse
> > and then re-index. Plus re-parsing is not complicated, but involves
> > kind of a hack with renaming a bunch of directories.
> >
> > One reason to do your analysis at parse time is that it's easier to
> > get the entire page contents like HTML tags in case you need that for
> > categorization. If you don't need this stuff, you probably don't need
> > to categorize at the parsing phase.
> >
> > If you really want to do it at parse time, it's not difficult. Take a
> > look at parse-html. You can use the metadata object to store your
> > category. Look in HtmlParseFilter.java in getParse. Just do:
> >
> > metadata.put("myfield", "sports");
> >
> > In your index filter, you can then do a metadata.get to get your
> > category and then index it.
> >
> > Howie
> >
> >
> >
>
>
