On 16 February 2012 14:33, alessio crisantemi
<alessio.crisant...@gmail.com> wrote:
> Hi all,
> I have a problem configuring PDF indexing from a directory in my Solr
> with DIH:
>
> with this data-config
>
>
> <dataConfig>
>  <dataSource type="BinFileDataSource" />
>  <document>
>  <entity
>    name="tika-test"
>    processor="FileListEntityProcessor"
>    baseDir="D:\gioconews_archivio\marzo2011"
>    fileName=".*pdf"
>    recursive="true"
>    rootEntity="false"
>    dataSource="null"/>
>  <entity processor="FileListEntityProcessor"
> url="D:\gioconews_archivio\marzo2011" format="text" >
>   <field column="author"  name="author" meta="true"/>
>   <field column="title" name="title" meta="true"/>
>     <field column="description" name="description" />
>     <field column="comments" name="comments" />
>
>     <field column="content_type" name="content_type" />
>     <field column="last_modified" name="last_modified" />
>  </entity>
>  </document>
> </dataConfig>
[...]

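You should look in your Solr logs for more details about
the exception, but as things stand, the above setup will
not work for indexing PDF files. Both entities use
FileListEntityProcessor, which only lists files and does
not extract their content, and the first entity is closed
before the second one starts, so the two are not nested.
You need Tika. Searching Google for "solr tika index pdf"
turns up many possibilities, e.g.,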
http://www.abcseo.com/tech/search/integrating-solr-and-tika
http://solr.pl/en/2011/04/04/indexing-files-like-doc-pdf-solr-and-tika-integration/
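
For illustration, here is a minimal sketch of what a Tika-based
data-config might look like. It is untested against your setup and
rests on some assumptions: the entity name "files" and the data
source name "bin" are made up for this example, the "text" field is
assumed to exist in your schema.xml, and the DIH extras jar
(which provides TikaEntityProcessor) and the Tika libraries are
assumed to be on Solr's classpath.

```xml
<dataConfig>
  <!-- Hands each file to Tika as a binary stream; "bin" is a name chosen for this sketch -->
  <dataSource type="BinFileDataSource" name="bin" />
  <document>
    <!-- Outer entity only lists matching files; it needs no data source -->
    <entity name="files"
            processor="FileListEntityProcessor"
            baseDir="D:\gioconews_archivio\marzo2011"
            fileName=".*pdf"
            recursive="true"
            rootEntity="false"
            dataSource="null">
      <!-- Inner entity is nested inside the outer one and parses each listed file with Tika -->
      <entity name="tika-test"
              processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}"
              dataSource="bin"
              format="text">
        <!-- meta="true" reads the value from the document metadata Tika extracts -->
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <!-- "text" holds the extracted body; assumes a matching field in schema.xml -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The point of the nesting is that the outer entity produces one row
per file, and the inner entity picks up that row's absolute path
through ${files.fileAbsolutePath} and passes the file to Tika.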

Regards,
Gora
