FW: Extract footer/header text out of Word docs

Alex Cougarman Thu, 30 Aug 2012 06:30:36 -0700

Dear friends. Sorry, I originally posted this to the Solr list. Any ideas on this question?

Sincerely,
Alex



-----Original Message-----
From: Otis Gospodnetic [mailto:[email protected]] 
Sent: 30 August 2012 4:28 PM
To: [email protected]
Subject: Re: Extract footer/header text out of Word docs

Hi Alex,

I think you may get better help on the Tika mailing list - Solr uses Tika to 
parse rich text docs and extract text from them.  I don't know if Tika can 
tell which text comes from a header or a footer...

Otis 
----
Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 



----- Original Message -----
> From: Alex Cougarman <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc: 
> Sent: Thursday, August 30, 2012 9:25 AM
> Subject: Extract footer/header text out of Word docs
> 
> Hi. Is it possible to specifically extract footer/header and body text out of
> a Word document using Solr? In other words, we'd like to index/store those
> items in different Solr fields.
> 
> Also, is it possible to search on specific styles within a Word document? Can 
> these attributes be indexed? Thanks.
> 
> Sincerely,
> Alex
>
