You may want to check out the headings plugin: it reads the content of those elements and writes it to a separate field. It is very basic.
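To make "reads content from those elements and writes it to a field" concrete, here is a minimal sketch in Python using only the standard library's `html.parser`. It is not the Nutch plugin's code, which is Java and runs inside Nutch's parse step. The class `HeadingExtractor`, the function `extract_headings` and the output shape (one list of strings per tag) are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Hypothetical helper: collects the text of selected heading tags, one list per tag."""

    def __init__(self, tags=("h1", "h2")):
        super().__init__()
        # HTMLParser reports tag names in lowercase, so normalise ours too.
        self.tags = [t.lower() for t in tags]
        self.fields = {tag: [] for tag in self.tags}
        self._current = None  # heading tag we are currently inside, if any
        self._buffer = []     # text fragments seen inside the current heading

    def handle_starttag(self, tag, attrs):
        # Start collecting when we enter a wanted heading (ignore nested ones).
        if tag in self.tags and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse whitespace so inline markup like <em> joins cleanly.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.fields[tag].append(text)
            self._current = None

    def handle_data(self, data):
        # Text inside inline children (e.g. <em>) still belongs to the heading.
        if self._current is not None:
            self._buffer.append(data)


def extract_headings(html, tags=("h1", "h2")):
    """Return {tag: [heading text, ...]} for each requested heading tag."""
    parser = HeadingExtractor(tags)
    parser.feed(html)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    page = """
    <html><body>
      <h1>Welcome</h1>
      <p>Body text that would normally go to the content field.</p>
      <h2>Pricing</h2>
      <h2>Contact <em>us</em></h2>
    </body></html>
    """
    print(extract_headings(page))
    # prints {'h1': ['Welcome'], 'h2': ['Pricing', 'Contact us']}
```

In Nutch itself you would normally enable the plugin through configuration rather than write a parser: add it to `plugin.includes` and check `nutch-default.xml` for your version to see which properties it reads and how its output reaches the Solr index.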
-----Original message-----
> From: Vishal Sharma <[email protected]>
> Sent: Thursday 27th November 2014 17:59
> To: user <[email protected]>
> Subject: How to parse specific html tag in nutch+solr while crawling
>
> I tried searching Google for this as well, but found nothing useful. I'd appreciate any help.
>
> Is there a way to parse specific HTML tags while crawling with Nutch and then index them into Solr?
>
> For example, I don't want the whole HTML page to go into the content node. I would like to parse the h1 and h2 tags into separate nodes.
>
> Vishal Sharma
> TL, SFDC
> T: +1 650 288 6711
> E: [email protected]
> www.grazitti.com

