Re: Indexing pdf files - question.

Adam Estrada Mon, 13 Dec 2010 08:59:57 -0800

Hi,

I use the following commands to post PDF and Word files.


$ curl "http://localhost:8983/solr/update/extract?stream.file=C:\temp\document.docx&stream.contentType=application/msword&literal.id=esc.doc&commit=true"
$ curl "http://localhost:8983/solr/update/extract?stream.file=C:\temp\features.pdf&stream.contentType=application/pdf&literal.id=esc2.doc&commit=true"
$ curl "http://localhost:8983/solr/update/extract?stream.file=C:\temp\Memo_ocrd.pdf&stream.contentType=application/pdf&literal.id=Memo_ocrd.pdf&defaultField=text&commit=true"

The PDFs have to be OCR'd first, so that they contain text that can be extracted.
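
For the Linux case in the question, where the file name should become the unique id, a minimal sketch could look like the following. It uploads the file in the request body with curl's -F option instead of stream.file, so Solr does not have to read the file from its own disk. The helper names extract_url and post_file, the SOLR_URL variable, and the "myfile" form field name are assumptions made for illustration, as is the default host and port; the sketch also assumes file names that need no URL-encoding.

```shell
#!/bin/sh
# Assumed default location of the Solr instance; override via the environment.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Hypothetical helper: build the extract URL, using the file's base name
# as the value of literal.id (the unique id field).
extract_url() {
    id=$(basename -- "$1")
    printf '%s/update/extract?literal.id=%s&commit=true' "$SOLR_URL" "$id"
}

# Hypothetical helper: send the file as a multipart upload.
# "myfile" is an arbitrary form field name chosen for this sketch.
post_file() {
    curl "$(extract_url "$1")" -F "myfile=@$1"
}
```

With these definitions loaded, something like `post_file docs/sample.pdf` (a hypothetical path) would post the file with literal.id=sample.pdf.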

Adam

On Mon, Dec 13, 2010 at 11:01 AM, Siebor, Wlodek [USA] <
siebor_wlo...@bah.com> wrote:

> Hi,
> Can somebody please send me a command for indexing a sample PDF file,
> available in the /docs directory, with ExtractingRequestHandler? I have
> LucidWorks Solr installed on Linux, with the standard schema.xml and
> solrconfig.xml files (unchanged). I want to pass the name of the file as
> the unique id.
> I’m trying various curl commands and so far I get either “… missing
> required field: id” or “… missing content stream” errors.
> Thanks for your help,
> Wlodek
>
