You can send PDF files via SolrJ:  
http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/

I'm sure the various other clients could do the same thing. All you really need is a way to upload the files.

Still, sending lots of rich docs over the wire isn't always the best way, either. You may want to write your own client-side code that uses Tika to extract the text locally and sends only the result to Solr.

-Grant

On Oct 27, 2009, at 6:49 AM, <markus.rietz...@rzf.fin-nrw.de> wrote:

thanks,
i know that page and have read it. sending additional meta tags with the curl call is no problem. i only thought that there might be a way to use the xml approach
for PDF files as well. i'll go the "curl" way for those files.
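
As an illustration of the "curl" way, here is a minimal Python sketch of the same request, assuming the extract handler is mapped at /update/extract and accepts literal.<fieldname> parameters as described on the ExtractingRequestHandler wiki page. The base URL, the function name build_extract_request and the field names are assumptions for illustration only; the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_base, pdf_bytes, doc_id, metadata):
    """Build (but do not send) a POST of a PDF to the extract handler.

    Each metadata entry is passed as a literal.<fieldname> URL parameter,
    so it is stored alongside the text that Tika extracts from the PDF.
    """
    params = [("literal.id", doc_id)]
    params += [(f"literal.{name}", value) for name, value in metadata.items()]
    params.append(("commit", "true"))
    url = f"{solr_base}/update/extract?{urlencode(params)}"
    return Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


# Assumed local Solr instance and placeholder PDF bytes.
req = build_extract_request(
    "http://localhost:8983/solr",
    b"%PDF-1.4 placeholder",
    "1",
    {"title": "this is the title"},
)
print(req.full_url)
```

Passing req to urllib.request.urlopen would send it; every field used in a literal.* parameter has to exist in the Solr schema.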

--
Kind regards

Markus Rietzler - <rietzler_software/>
Rechenzentrum der Finanzverwaltung NRW
0211/4572-2130


-----Original Message-----
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Tuesday, 27 October 2009 11:43
To: solr-user@lucene.apache.org
Subject: Re: solr cell/tika: pdf import with xml metatags


On Oct 27, 2009, at 6:36 AM, <markus.rietz...@rzf.fin-nrw.de> wrote:

hi,

we want to use SOLR as our intranet search engine.
i downloaded the nightly build of solr 1.4. pdf extraction is done via
Solr Cell/Tika. i can send the pdf to solr via curl.

we have a large set of meta tags for all our intranet documents,
including PDF, PPT etc. to import html files from our CMS, i have access
to all of these meta tags and create an xml document which i send to SOLR,

eg.

<?xml version='1.0' encoding='UTF-8'?>
<add>
<doc>
<field name="id">1</field>
<field name="title">this is the title</field>
</doc>
<doc>
<field name="id">2</field>
<field name="title">this is another title</field>
</doc>
<doc>
<field name="id">3</field>
<field name="title">this is the third title</field>
</doc>
</add>
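
An <add> document of this shape could be generated from the CMS metadata roughly as follows. This is only a sketch: the function name build_add_xml and the input format (one dict of field name to value per document) are assumptions, not something from the setup described here.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of {field name: value} dicts as a Solr <add> message."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            # ElementTree escapes &, < and > in the text when serializing.
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


xml_payload = build_add_xml([
    {"id": 1, "title": "this is the title"},
    {"id": 2, "title": "this is another title"},
])
print(xml_payload)
```

Building the message with an XML library instead of string concatenation keeps titles that contain characters such as & or < from producing invalid XML.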

this works fine with html files, where i can grab all the meta tags,
including "body".

so my question is: can i also use this xml document to send a pdf file?

I'm not sure what you mean here, can you clarify?  PDF and other
"rich" documents can't be sent by XML.

ok, one way would be to use the extract handler in extract-only mode
and put the extracted data in the "body" field.
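
A rough sketch of that two-step idea, assuming the handler's extractOnly=true parameter from the ExtractingRequestHandler wiki page: the PDF goes to the extract-only URL first, and the returned text is then merged with the CMS meta tags before the usual XML update. The helper names, the base URL and the sample text are made up for illustration, and parsing the handler's response is left out because its exact format is not shown in this thread.

```python
from urllib.parse import urlencode


def extract_only_url(solr_base):
    """URL that asks the extract handler to return the text without indexing it."""
    return f"{solr_base}/update/extract?{urlencode({'extractOnly': 'true'})}"


def with_body(metadata, extracted_text, body_field="body"):
    """Return a copy of the meta tags with the extracted text as the body field."""
    doc = dict(metadata)
    doc[body_field] = extracted_text.strip()
    return doc


# Assumed local Solr instance; the extracted text is a stand-in value.
url = extract_only_url("http://localhost:8983/solr")
doc = with_body({"id": "4", "title": "a pdf document"}, "  text from the pdf\n")
print(url)
print(doc)
```

The resulting dict holds only plain text fields, so it can be sent in the same <add> XML that already works for the html files.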

I guess all I can point you at right now is the wiki:
http://wiki.apache.org/solr/ExtractingRequestHandler

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search



