On Tue, 20 Dec 2011, Swapna Vuppala wrote:
Can someone please suggest a method to capture the content within a "div" tag of a particular class?

You'll likely have more luck asking on the Solr list, as this looks to be a Solr-specific query rather than a Tika-related one.

Nick

From: Swapna Vuppala [mailto:[email protected]]
Sent: Thursday, December 15, 2011 12:30 PM
To: [email protected]
Subject: Capture and map div tags

Hi,

I understand that we can specify parameters for the ExtractingRequestHandler in
solrconfig.xml to capture HTML tags of a particular type and map them to
desired Solr fields, something like the following:

<str name="capture">div</str>
<str name="fmap.div">mysolrfield</str>

The above setting will capture the content of "div" tags and copy it to the Solr
field "mysolrfield".
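
For context, here is a minimal sketch of where these two parameters would usually sit in solrconfig.xml. The handler name "/update/extract" and the placement inside a "defaults" list are assumptions based on the stock Solr example configuration, not something confirmed for this particular setup.

```xml
<!-- Sketch only: handler name and the "defaults" placement are assumed -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- capture the content of every div element separately -->
    <str name="capture">div</str>
    <!-- map the captured div content to the field "mysolrfield" -->
    <str name="fmap.div">mysolrfield</str>
  </lst>
</requestHandler>
```

With this in place, the capture applies to every "div" element regardless of its class attribute.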

What I am interested in is capturing div tags with a particular class name into a
Solr field. When extracting content from Outlook messages, I would like the
content within <div class="message-body"> to go into one Solr field and the
content within <div class="attachment-entry"> to go into another Solr field.

Can someone please let me know how to achieve this?

Thanks and Regards,
Swapna.
