I think we looked into Lucene a while ago, and it seemed like a good
option except that it only indexes plain text, which makes it tough to
index PDF or Word docs.

Is there any open-source tool out there that can read formats like Word,
Excel, PDF, etc. and turn them into text so they can be handled by Lucene
indexing? If not, does anyone know what the options are for handling
these other formats?

d



