Hi Howie
That is what I am looking at.
But as you said, to generalize for all requirements, including the intranet
requirement, I am better off doing what you said.
Rgds
Prabu
On 3/9/06, Howie Wang <[EMAIL PROTECTED]> wrote:
>
> >What I want to do is add some header info in the parse-filter, which
> >will then be used by the index-filter to add a new FIELD of my own.
> >
> >Rgds
> >Prabhu
>
> I would recommend doing it at the index phase if possible. If the end
> goal is to have it searchable from the index, ask whether you really need
> the information at the parsing stage. If you decide you want to
> tweak your keywords, it's easy to re-index. If you do it at the parsing
> stage, it will take twice as long, since you have to re-parse and then
> re-index. Plus, re-parsing is not complicated, but it involves a bit of a
> hack with renaming a bunch of directories.
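To illustrate the index-time approach, here is a minimal sketch, assuming the category can be derived from the already-parsed page text with a simple keyword table. The class name `IndexTimeCategorizer`, the keyword lists, and the `"unknown"` fallback are hypothetical, and this is plain Java rather than the Nutch indexing-filter API. The point is that retuning the table only requires a re-index, not a re-parse.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical index-time categorizer (not part of Nutch).
 * It only looks at already-parsed text, so editing KEYWORDS
 * means re-indexing, with no re-parse needed.
 */
public class IndexTimeCategorizer {

    // Category -> trigger words (illustrative assumption); order decides ties.
    private static final Map<String, List<String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("sports", Arrays.asList("football", "cricket", "score"));
        KEYWORDS.put("finance", Arrays.asList("stock", "market", "dividend"));
    }

    /** Returns the first category with a keyword found in the text (plain substring match), else "unknown". */
    public static String categorize(String parsedText) {
        String lower = parsedText.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, List<String>> entry : KEYWORDS.entrySet()) {
            for (String word : entry.getValue()) {
                if (lower.contains(word)) {
                    return entry.getKey();
                }
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(categorize("Cricket score updates")); // sports
        System.out.println(categorize("Weather report"));        // unknown
    }
}
```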
>
> One reason to do your analysis at parse time is that it's easier to
> get the entire page contents, such as HTML tags, in case you need them
> for categorization. If you don't need this stuff, you probably don't
> need to categorize at the parsing phase.
>
> If you really want to do it at parse time, it's not difficult. Take a
> look at parse-html. You can use the metadata object to store
> your category. Look in HtmlParseFilter.java in getParse. Just do:
>
> metadata.put("myfield", "sports");
>
> In your index filter, you can then call metadata.get to retrieve your
> category and then index it.
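The parse-filter to index-filter hand-off described above can be modeled roughly as follows. This is a self-contained sketch, not Nutch's real plugin interfaces: plain `HashMap`s stand in for the parse metadata and the index document, and `MetadataHandoff`, `parseFilter`, `indexFilter`, and the trivial "football" check are hypothetical. Only the `"myfield"` key and the put-then-get pattern come from the message.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical model of the hand-off: the parse phase stores a value
 * under "myfield", the index phase reads it back and adds it as a field.
 * Maps stand in for Nutch's metadata object and the index document.
 */
public class MetadataHandoff {

    /** Parse phase: decide the category and stash it in the parse metadata. */
    static void parseFilter(String html, Map<String, String> parseMetadata) {
        String lower = html.toLowerCase(Locale.ROOT);
        String category = lower.contains("football") ? "sports" : "general";
        parseMetadata.put("myfield", category);
    }

    /** Index phase: read the stored value back and add it as a document field. */
    static void indexFilter(Map<String, String> parseMetadata, Map<String, String> docFields) {
        String category = parseMetadata.get("myfield");
        // Pages parsed before the parse filter existed carry no value; skip them.
        if (category != null) {
            docFields.put("myfield", category);
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        Map<String, String> doc = new HashMap<>();
        parseFilter("<html><body>Football results</body></html>", metadata);
        indexFilter(metadata, doc);
        System.out.println(doc); // {myfield=sports}
    }
}
```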
>
> Howie