On 16 February 2012 14:33, alessio crisantemi
<alessio.crisant...@gmail.com> wrote:
> Hi all,
> I have a problem to configure a pdf indexing from a directory in my solr
> wit DIH:
>
> with this data-config
>
>
> <dataConfig>
> <dataSource type="BinFileDataSource" />
> <document>
> <entity
> name="tika-test"
> processor="FileListEntityProcessor"
> baseDir="D:\gioconews_archivio\marzo2011"
> fileName=".*pdf"
> recursive="true"
> rootEntity="false"
> dataSource="null"/>
> <entity processor="FileListEntityProcessor"
> url="D:\gioconews_archivio\marzo2011" format="text" >
> <field column="author" name="author" meta="true"/>
> <field column="title" name="title" meta="true"/>
> <field column="description" name="description" />
> <field column="comments" name="comments" />
>
> <field column="content_type" name="content_type" />
> <field column="last_modified" name="last_modified" />
> </entity>
> </document>
> </dataConfig>
[...]

You should look in your Solr logs for more details about the
exception, but as things stand, the above setup will not work for
indexing PDF files. Both entities use FileListEntityProcessor, which
only lists the files in a directory and does not extract their
contents. You need Tika for that.

Searching Google for "solr tika index pdf" turns up many
possibilities, e.g.,
http://www.abcseo.com/tech/search/integrating-solr-and-tika
http://solr.pl/en/2011/04/04/indexing-files-like-doc-pdf-solr-and-tika-integration/

Regards,
Gora
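
For reference, a rough sketch of what a Tika-based data-config could
look like is given below. It is not a tested configuration: it assumes
a Solr version that ships TikaEntityProcessor (part of the
dataimporthandler-extras contrib), that the Tika jars are on Solr's
classpath, and that schema.xml defines the fields "author", "title"
and "text". The entity name "files" and the data source name "bin" are
made up for illustration. The outer entity lists the PDF files, and
the nested entity passes each file to Tika.

```xml
<dataConfig>
  <!-- Binary data source that the Tika entity reads file contents from -->
  <dataSource type="BinFileDataSource" name="bin" />
  <document>
    <!-- Outer entity: only enumerates files, creates no documents itself -->
    <entity name="files"
            processor="FileListEntityProcessor"
            baseDir="D:\gioconews_archivio\marzo2011"
            fileName=".*\.pdf"
            recursive="true"
            rootEntity="false"
            dataSource="null">
      <!-- Inner entity: runs Tika on each file found by the outer entity -->
      <entity name="tika-test"
              processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}"
              format="text"
              dataSource="bin">
        <!-- meta="true" reads the value from the document metadata -->
        <field column="Author" name="author" meta="true" />
        <field column="title" name="title" meta="true" />
        <!-- "text" is the extracted body of the document -->
        <field column="text" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

The main difference from the quoted config is the nesting: the file
listing entity stays open and wraps the Tika entity, so that
${files.fileAbsolutePath} is resolved once per file.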