Hello.

Try using EmptyParser in your tika-config file for the document types whose
content you don't want indexed.
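
For example, to stop text extraction for PDFs, a tika-config.xml along these
lines maps application/pdf to Tika's EmptyParser. EmptyParser returns an empty
document without attempting to parse the stream. This is only a sketch. It
assumes Jackrabbit 2.x and the Tika 1.x parser-list format used by the
tika-config.xml bundled with Jackrabbit, and the exact elements may differ for
your versions. The safest approach is to start from a copy of the bundled file
and swap only the PDF entry:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a tika-config.xml (assumes the Tika 1.x format bundled with Jackrabbit 2.x) -->
<properties>
  <parsers>
    <!-- Map PDF to EmptyParser: no text is extracted, so PDF bodies stay out of the full-text index -->
    <parser class="org.apache.tika.parser.EmptyParser">
      <mime>application/pdf</mime>
    </parser>
    <!-- Keep the other <parser> entries from your existing tika-config.xml unchanged -->
  </parsers>
</properties>
```

Next, point the workspace's SearchIndex at this file. One way is a
<param name="tikaConfigPath" value="..."/> entry in workspace.xml, where the
value is your own path to the file.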

Thanks and regards,
Nilay Parmar


-----Original Message-----
From: Patrick Welfringer [mailto:[email protected]] 
Sent: Wednesday, December 18, 2013 3:21 PM
To: [email protected]
Subject: Can Lucene be configured to avoid downloading file contents?

Hi,

*Can anyone familiar with Lucene please share their insight?*

The question is this: *is there any way to configure Lucene to index only
certain whitelisted metadata*, or to exclude blacklisted metadata?

Indeed, we believe that excluding the “file” metadata could dramatically
reduce the time it takes Lucene to download and process the large number of
PDF files in our particular setup.

We don’t need file contents to be indexed, only other metadata like
“creation date”, “keywords” etc.

The “Luke” tool tells us that none of the file contents are indexed. Yet
during the hour-long indexing run, we see all of the metadata, including the
document contents, being downloaded and written to disk.

If you can help us find a way to prevent Lucene from indexing the entire
Jackrabbit repository, you’ll cheer up many mailing list subscribers who
have similar issues!

Cheers,

Patrick