: In that respect I agree with the original posting that Solr lacks
: some desired functionality. One can argue that more or less random
: data should be structured by the user writing a decent application.
: However, an easier-to-use and more configurable plugin
: architecture
: the text out of these types of documents. You could borrow the
: document parsing pieces from Lucene's contrib and Nutch and glue them
: together into your client that speaks to Solr, or perhaps Solr isn't
: the right approach for your needs? It certainly is possible to add
: these capabilities
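A rough illustration of the "glue it into your client" approach follows. It is a minimal sketch, assuming Java 11 or later. It walks a directory tree, keeps the HTML, DOC, PPT and PDF files mentioned in the thread, and builds a Solr XML <add> update message. The class FileSystemFeeder and its methods are hypothetical, as are the field names id and text, which must match whatever schema.xml you actually use. The extractText method is only a placeholder that strips tags from HTML and returns nothing for the binary formats. That is exactly where the parsing pieces from Lucene's contrib or Nutch would be plugged in.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical crawler-to-Solr glue; not part of Solr, Lucene or Nutch.
public class FileSystemFeeder {

    // Document types named in the thread.
    static final Set<String> EXTENSIONS = Set.of("html", "doc", "ppt", "pdf");

    // True if the file name ends in one of EXTENSIONS (case-insensitive).
    static boolean isIndexable(Path p) {
        Path fileName = p.getFileName();
        if (fileName == null) return false;
        String name = fileName.toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot >= 0 && EXTENSIONS.contains(name.substring(dot + 1));
    }

    // Recursively collect indexable regular files below root.
    static List<Path> crawl(Path root) throws IOException {
        List<Path> found = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .filter(FileSystemFeeder::isIndexable)
                 .forEach(found::add);
        }
        return found;
    }

    // Placeholder parser: strips tags from HTML, returns "" for doc/ppt/pdf,
    // which need a real parser (e.g. one borrowed from Nutch).
    static String extractText(Path p) throws IOException {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        if (!name.endsWith(".html")) {
            return "";
        }
        String raw = Files.readString(p, StandardCharsets.UTF_8);
        return raw.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Escape characters that are special in XML content and attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Build one <add> message; "id" and "text" are assumed schema field names.
    static String toAddXml(List<Path> files) throws IOException {
        StringBuilder sb = new StringBuilder("<add>");
        for (Path p : files) {
            sb.append("<doc>")
              .append("<field name=\"id\">").append(escape(p.toString())).append("</field>")
              .append("<field name=\"text\">").append(escape(extractText(p))).append("</field>")
              .append("</doc>");
        }
        return sb.append("</add>").toString();
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Prints the update message; sending it to Solr is left to the caller.
        System.out.println(toAddXml(crawl(root)));
    }
}
```

Running main against a mounted share prints the message. It could then be POSTed, followed by a <commit/>, to the update handler, for example with curl. In the stock Solr example that handler is http://localhost:8983/solr/update. Keeping the crawling and parsing on the client side leaves Solr itself unchanged, which is the trade-off the reply is pointing at.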
On Aug 30, 2006, at 2:42 AM, Bruno wrote:
Browsing through the message threads, I tried to find a discussion of
file system crawls. I want to implement an enterprise search over a
networked filesystem, crawling all sorts of documents, such as HTML,
DOC, PPT, and PDF files.
Nutch provides plugins