You might try remote streaming with Solr (see http://wiki.apache.org/solr/SolrConfigXml ). Otherwise, look into a crawler such as Nutch, Droids, or Heritrix.
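
As a rough sketch of what remote streaming could look like: with it enabled, Solr fetches the document itself from the address given in the stream.url request parameter, so nothing is uploaded from the client. This assumes enableRemoteStreaming="true" is set on the requestParsers element in solrconfig.xml; extract_remote is a hypothetical helper name, and the host and file are the ones from the question below.

```shell
#!/bin/sh
# Assumption: remote streaming is enabled in solrconfig.xml, e.g.
#   <requestParsers enableRemoteStreaming="true" ... />
SOLR_EXTRACT_URL="http://localhost:8983/solr/update/extract"

# extract_remote (hypothetical helper): ask Solr to fetch the document
# itself via the stream.url parameter instead of uploading a local file.
# -G makes curl append the URL-encoded parameters to the query string.
extract_remote() {
    curl -G "$SOLR_EXTRACT_URL" \
        --data-urlencode "extractOnly=true" \
        --data-urlencode "stream.url=$1"
}

# Usage (needs a running Solr instance):
#   extract_remote "http://myweb.com/mylocalfile.htm"
```

Note that the Solr server, not the machine running curl, must be able to reach the remote URL.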

-Grant

On Oct 27, 2009, at 11:14 AM, Insight 49, LLC wrote:

Hi,

If I use the ExtractingRequestHandler <http://wiki.apache.org/solr/ExtractingRequestHandler > on a local file (as shown in http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput ), all works well, but how do I do this for files located on a server?

e.g. (works)
curl "http://localhost:8983/solr/update/extract?extractOnly=true" --data-binary @mylocalfile.htm -H "Content-type:text/html"

e.g. (doesn't work)
curl "http://localhost:8983/solr/update/extract?extractOnly=true" --data-binary @http://myweb.com/mylocalfile.htm -H "Content-type:text/html"

Thanks,

Dan


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
