Hi group,
I want to index only certain documents, based on their URL, and
reject all others. I understand that I can specify URL patterns in
crawl-urlfilter.txt, but it is difficult to write patterns for so many URLs,
so I thought I would maintain a separate properties file listing those URLs
and not add the corresponding documents to the index. In my custom filter, I
added a metadata entry:
parse.getData().getParseMeta().set("indexit", Boolean.toString(shouldIndex));
I then check this metadata value in the write method of the RecordWriter, but
that does not seem to work.
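
To make the idea concrete, here is a minimal sketch of the lookup that produces
shouldIndex. It uses only the JDK and no Nutch classes. The class name
UrlIndexDecider, the skip.N keys and the example.com URLs are made up for
illustration, and it assumes the properties file lists the URLs to exclude,
matched as exact strings. The URLs are stored as property values rather than
keys, because an unescaped ':' in a key is treated as the key/value separator
by java.util.Properties.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper, not part of Nutch: decides whether a URL should be indexed.
class UrlIndexDecider {
    // URLs that must not be added to the index (exact string match).
    private final Set<String> excluded = new HashSet<>();

    // Every property value is treated as one excluded URL; keys are arbitrary labels.
    UrlIndexDecider(Properties props) {
        for (String key : props.stringPropertyNames()) {
            excluded.add(props.getProperty(key).trim());
        }
    }

    // Reads the properties text from any Reader (a FileReader in real use).
    static UrlIndexDecider load(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        return new UrlIndexDecider(props);
    }

    // True unless the URL appears in the exclusion list.
    boolean shouldIndex(String url) {
        return !excluded.contains(url);
    }

    public static void main(String[] args) throws IOException {
        // Inline sample standing in for the separate properties file.
        String sample = "skip.1 = http://example.com/private.html\n"
                      + "skip.2 = http://example.com/drafts/a.html\n";
        UrlIndexDecider decider = load(new StringReader(sample));
        String url = "http://example.com/private.html";
        // The resulting string is what would be stored under the "indexit" key.
        System.out.println(url + " -> indexit=" + Boolean.toString(decider.shouldIndex(url)));
    }
}
```

The result of shouldIndex(url) is the boolean that gets stored under "indexit"
in the filter.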
Any ideas? I think I have to check for this metadata value somewhere in the
Indexer class, but I am not sure where. Any guidance would be great.
- BR