I think we looked into Lucene a while ago and it seemed like a good option except that it only indexes plain text, which makes it tough to index PDF or Word docs.
Is there any open-source tool out there that can read formats like Word, Excel, PDF, etc. and turn them into text so they can be handled by Lucene indexing? If not, does anyone know what the options are for handling these other formats?
d
