Be sure to direct the "content" to a "stored" field (such as "content")
which you can add to your "fl" field list to return. Then use a copyField to
copy that stored field to the "text" field for searching.
Again, this is all simplified in Solr 4.0-BETA.
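As a rough sketch of the advice above, the two requests could look like the following. It assumes schema.xml defines stored fields named "content" and "filename" and has a copyField from "content" to "text"; the search term "invoice" is made up for illustration. By default the script only prints the commands; set DRY_RUN=0 to send them to a running Solr.

```shell
#!/bin/sh
# Sketch only: host, port, id, and file name are taken from the thread.
# Assumption: schema.xml has stored "content" and "filename" fields and
# a copyField from "content" to "text".
SOLR_URL="http://localhost:8983/solr"
DOC_ID="doc11"
PDF_FILE="markus.pdf"

# Index: send the extracted body to the stored "content" field
# and set the file name explicitly with a literal.* parameter.
EXTRACT_URL="${SOLR_URL}/update/extract?literal.id=${DOC_ID}&literal.filename=${PDF_FILE}&fmap.content=content&commit=true"

# Search: query the "text" field and list the stored fields to return in "fl".
# "invoice" is a hypothetical search term.
QUERY_URL="${SOLR_URL}/select?q=text:invoice&fl=id,filename,content"

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Print the commands instead of running them.
  echo "curl \"$EXTRACT_URL\" -F \"myfile=@$PDF_FILE\""
  echo "curl \"$QUERY_URL\""
else
  curl "$EXTRACT_URL" -F "myfile=@$PDF_FILE"
  curl "$QUERY_URL"
fi
```

If the "content" field is not stored, the document will still match searches through "text" but the response will not include the body.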
-- Jack Krupansky
-----Original Message-----
From: Alexander Troost
Sent: Sunday, September 16, 2012 11:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing PDF-Files using Solr Cell
Hi, first of all: thank you for that quick response!
But I am not sure if I am doing this right.
From my point of view, the command now has to look like:
curl "http://localhost:8983/solr/update/extract?literal.id=doc11&literal.filename=markus&fmap.content=text&commit=true" -F "myfile=@markus.pdf"
When I now search for text in the PDF, I get this result:
<result name="response" numFound="1" start="0">
<doc>
<str name="author">A28240</str>
<arr name="content_type"><str>application/pdf</str></arr>
<str name="id">doc11</str>
<date name="last_modified">2012-09-17T03:49:39Z</date>
</doc>
</result>
SORRY for being such a newbie and sorry for my bad English. It's 6 AM here
and I spent the whole night at the computer :-)
Greetz
A
2012/9/17 Jack Krupansky <j...@basetechnology.com>
The content will be sent to the "content" field, which you can redirect
using the &fmap.content=some-field request parameter. You need to
explicitly set the file name field yourself, using the
&literal.your-file-name-field=file-name request parameter.
Also, if using Solr 4.0-BETA, you can simply use the SimplePostTool
(post.jar) to send documents to SolrCell, which will automatically take
care of these extra steps.
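For the Solr 4.0-BETA route, a sketch might look like this. It assumes the post.jar from that release supports the -Dauto system property (content-type detection from the file extension) and that the command is run from the directory containing post.jar; running "java -jar post.jar -h" should list the options your version actually supports. By default the script only prints the command; set DRY_RUN=0 to run it.

```shell
#!/bin/sh
# Sketch only: the file name is taken from the thread.
# Assumption: this post.jar supports -Dauto and sits in the current directory.
PDF_FILE="markus.pdf"
POST_CMD="java -Dauto -jar post.jar $PDF_FILE"

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Print the command instead of running it.
  echo "$POST_CMD"
else
  java -Dauto -jar post.jar "$PDF_FILE"
fi
```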
-- Jack Krupansky
-----Original Message-----
From: Alexander Troost
Sent: Sunday, September 16, 2012 10:16 PM
To: solr-user@lucene.apache.org
Subject: Indexing PDF-Files using Solr Cell
Hello *,
I've got a problem indexing and searching PDF files.
It seems like Solr doesn't index the name of the file.
In return I only get
<result name="response" numFound="1" start="0">
<doc>
<str name="author">A28240</str>
<arr name="content_type"><str>application/pdf</str></arr>
<str name="id">doc5</str>
<date name="last_modified">2012-09-17T01:45:39Z</date>
</doc>
</result>
It finds the right document, but no content or title is displayed in the
XML response. Where do I configure that?
I index my documents (right now) via curl
e.g.:
curl "http://localhost:8983/solr/update/extract?literal.id=doc7&commit=true" -F "myfile=@xyz.pdf"
Where is my mistake?
Greeting
Alex