Hi,

 

I have a Nutch and Solr based crawling setup up and running, and there is a
use case I would like to implement.

I want to exclude the content of the meta description tag (e.g. <meta
name="description" content="some page content"/>) from the parsed content
of the crawled page.

How do I achieve that?

I have already written an indexing plugin that extracts the meta description
into a separate field for Solr to index.

Any pointers will be much appreciated.

 

Regards,

Swaraj Yadav
