On 7/1/11 6:08 PM, Victor Villa wrote:
> I'm in the market for an indexer that can index doc, docx, xls, xlsx,
> pdf, html and mysql fields.
> 
> the file formats are important, if i have to search in mysql and
> aggregate the result set, that's fine too.

You can check out Zend Framework Lucene:

http://framework.zend.com/manual/en/zend.search.lucene.html

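Apache Lucene itself is a Java library, but Zend_Search_Lucene is a pure PHP
port of it, so you don't need Java on the server. It ships with document
classes for HTML, docx and xlsx. For the older doc and xls formats and for pdf
you have to extract the text yourself before adding it to the index, and mysql
fields are indexed by reading the rows and adding each one as a document.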

With both Lucene and Sphinx you have to get the text of each document into the
indexer yourself. Sphinx can index MySQL tables directly; however, for the MS
Office formats and pdf you will have to provide a way of extracting the text
and feeding it to Sphinx. It is not impossible to do, but it might take some
work on your part, as that functionality is not provided out of the box.
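
To give an idea of the extra work Sphinx needs for the MS formats, here is a
rough sketch of pulling the text out of a docx file and wrapping it in the
xmlpipe2 format that Sphinx can read from a script. It is in Python only to
keep it short and dependency-free; the same approach works from PHP. The
function names (docx_text, xmlpipe2) and the field names (title, content) are
made up for this example, and it only covers docx. The older doc and xls
formats and pdf need an external converter.

```python
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the plain text of a .docx file, one line per paragraph."""
    # A .docx file is a zip archive; the body lives in word/document.xml.
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each paragraph is split into runs; the text sits in <w:t> nodes.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def xmlpipe2(docs):
    """Build an xmlpipe2 stream from (id, title, content) tuples.

    The field names "title" and "content" are arbitrary choices for
    this sketch, not something Sphinx requires.
    """
    out = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="title"/>',
        '<sphinx:field name="content"/>',
        "</sphinx:schema>",
    ]
    for doc_id, title, content in docs:
        out.append('<sphinx:document id="%d">' % doc_id)
        out.append("<title>%s</title>" % escape(title))
        out.append("<content>%s</content>" % escape(content))
        out.append("</sphinx:document>")
    out.append("</sphinx:docset>")
    return "\n".join(out)
```

A script that prints xmlpipe2([(1, "manual.docx", docx_text("manual.docx"))])
to stdout (manual.docx being a placeholder file name) could then be used as
the xmlpipe_command of a Sphinx source with type = xmlpipe2.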

Let me know how it goes or if you have any questions. I built a help system
doing exactly the same type of indexing that you are doing and it turned out
really well. I even added pictures and videos to it.


-- 
thebigdog

_______________________________________________

UPHPU mailing list
[email protected]
http://uphpu.org/mailman/listinfo/uphpu
IRC: #uphpu on irc.freenode.net