You may want to check out the headings plugin. It reads the content of those
elements and writes it to a field. Very basic.
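
To illustrate the idea only (this is not the plugin's code, and it does not
touch Nutch or Solr), here is a rough sketch using Python's standard
html.parser. The class name HeadingExtractor, the helper extract_headings and
the field names "h1"/"h2" are made up for the example:

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collect the text of selected heading tags into separate fields.

    Hypothetical illustration; field names simply mirror the tag names.
    """

    def __init__(self, tags=("h1", "h2")):
        super().__init__()
        self.tags = set(tags)
        self.fields = {tag: [] for tag in tags}  # one list of values per tag
        self._current = None  # heading tag we are currently inside, if any
        self._buffer = []     # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        # Start buffering when a wanted heading opens (nested headings ignored).
        if tag in self.tags and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Keep text only while inside a wanted heading.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # On the matching close tag, normalise whitespace and store the value.
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.fields[tag].append(text)
            self._current = None


def extract_headings(html, tags=("h1", "h2")):
    """Return a dict mapping each tag name to the list of its text values."""
    parser = HeadingExtractor(tags)
    parser.feed(html)
    parser.close()
    return parser.fields


page = (
    "<html><body><h1>Main <em>title</em></h1><p>Body text</p>"
    "<h2>First</h2><h2>Second</h2></body></html>"
)
print(extract_headings(page))
# {'h1': ['Main title'], 'h2': ['First', 'Second']}
```

The paragraph text never reaches the h1/h2 fields, which is the separation
asked about; in Nutch the plugin does this at parse time instead.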

 
 
-----Original message-----
> From: Vishal Sharma <[email protected]>
> Sent: Thursday 27th November 2014 17:59
> To: user <[email protected]>
> Subject: How to parse specific html tag in nutch+solr while crawling
> 
> I searched Google for this as well, but found nothing useful. I would
> appreciate any help.
> 
> Is there a way to parse specific HTML tags while crawling with Nutch and
> then index them into Solr?
> 
> For example, I don't want the whole HTML page to go into the content node.
> I would like to parse the h1 and h2 tags into separate nodes.
> 
