Be sure to direct the extracted content to a stored field (such as "content"), which you can then add to your "fl" field list so that it is returned with the results. Then use a copyField to copy that stored field to the "text" field for searching.
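As a rough sketch of the two requests involved: this assumes your schema has a stored "content" field, a stored "filename" field, and a copyField from "content" to "text", none of which is confirmed in this thread. The query term "sometext" is only a placeholder. The script prints the curl commands rather than running them, so you can check them first.

```shell
# Sketch only. Assumed (not confirmed in this thread): the schema defines
# stored "content" and "filename" fields and a copyField content -> text.
SOLR="http://localhost:8983/solr"

# Index the PDF: map the extracted body to the stored "content" field
# instead of "text", and set the id and file name as literals.
EXTRACT_URL="$SOLR/update/extract?literal.id=doc11&literal.filename=markus&fmap.content=content&commit=true"

# Search the "text" field and request the stored fields back via fl.
# "sometext" is a placeholder for a word that occurs in the PDF.
SELECT_URL="$SOLR/select?q=text:sometext&fl=id,filename,content"

# Print the commands for review; paste them into a shell to run them.
printf 'curl "%s" -F "myfile=@markus.pdf"\n' "$EXTRACT_URL"
printf 'curl "%s"\n' "$SELECT_URL"
```

The only change from the command quoted below is fmap.content=content in place of fmap.content=text; searching still works through "text" if the copyField is in place.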

Again, this is all simplified in Solr 4.0-BETA.

-- Jack Krupansky

-----Original Message----- From: Alexander Troost
Sent: Sunday, September 16, 2012 11:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing PDF-Files using Solr Cell

Hi, first of all: Thank you for that quick response!

But I am not sure if I am doing this right.

From my point of view, the command now has to look like:

curl "http://localhost:8983/solr/update/extract?literal.id=doc11&literal.filename=markus&fmap.content=text&commit=true" -F "myfile=@markus.pdf"

When I now search for text in the PDF, I get this result:

<result name="response" numFound="1" start="0">
<doc>
<str name="author">A28240</str>
<arr name="content_type"><str>application/pdf</str></arr>
<str name="id">doc11</str>
<date name="last_modified">2012-09-17T03:49:39Z</date>
</doc>
</result>

SORRY for being such a newbie and sorry for my bad English. It's 6 AM here
and I spent the whole night at the computer :-)

Greetz

A


2012/9/17 Jack Krupansky <j...@basetechnology.com>

The content will be sent to the "content" field, which you can redirect
using the &fmap.content=some-field request parameter. You need to
explicitly set the file name field yourself, using the
&literal.your-file-name-field=file-name request parameter.

Also, if using Solr 4.0-BETA, you can simply use the SimplePostTool
(post.jar) to send documents to SolrCell, which will automatically take
care of these extra steps.

-- Jack Krupansky

-----Original Message----- From: Alexander Troost
Sent: Sunday, September 16, 2012 10:16 PM
To: solr-user@lucene.apache.org
Subject: Indexing PDF-Files using Solr Cell


Hello *,

I've got a problem indexing and searching PDF-Files.

It seems like Solr doesn't index the name of the file.

In return I only get
<result name="response" numFound="1" start="0">
<doc>
<str name="author">A28240</str>
<arr name="content_type"><str>application/pdf</str></arr>
<str name="id">doc5</str>
<date name="last_modified">2012-09-17T01:45:39Z</date>
</doc>
</result>

It finds the right document, but no content or title is displayed in the
XML response. Where do I configure that?

I index my documents (right now) via curl

e.g.:

curl "http://localhost:8983/solr/update/extract?literal.id=doc7&commit=true" -F "myfile=@xyz.pdf"


Where is my mistake?

Greeting

Alex

