On Thu, 2012-02-09 at 23:45 +0100, alessio crisantemi wrote:
> hi all,
> I would like to index in Solr the PDF files contained in my directory c:\myfile\
> 
> So I added the file data-config.xml to my solr/conf directory, with the
> following content:
> 
> 
> <dataConfig>
> <dataSource type="BinFileDataSource" />
> <document>
> <entity name="f" dataSource="null" rootEntity="false"

Why do you set rootEntity="false" on the root entity?
This looks odd to me - but I can be wrong, of course.

If DIH shows this:
"""
<str name="Total Requests made to DataSource">0</str>
"""

then DIH hasn't even retrieved any data from your data source. Check that the
call you have configured really returns any documents.
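
One way to run that check outside of Solr is a small script that lists the
files the outer entity should find. This is only a minimal sketch: the
helper name list_matching_files is made up for illustration, and it assumes
that FileListEntityProcessor treats fileName as a regular expression (not a
shell glob) and searches for it in each file name. If that assumption holds,
a pattern like "*.pdf" would not even compile as a regex, and something like
".*\.pdf$" may be what is intended.

```python
import os
import re


def list_matching_files(base_dir, file_name_regex, recursive=True):
    """Return the paths under base_dir whose file name matches the regex.

    Hypothetical helper: a rough approximation of what the outer DIH
    entity does with baseDir / fileName / recursive, not Solr's own code.
    """
    # re.compile raises re.error if the pattern is not a valid regex,
    # which is what happens with a glob-style pattern such as "*.pdf".
    pattern = re.compile(file_name_regex)
    matches = []
    for root, _dirs, files in os.walk(base_dir):
        for name in files:
            if pattern.search(name):
                matches.append(os.path.join(root, name))
        if not recursive:
            break  # only look at the top-level directory
    return sorted(matches)


if __name__ == "__main__":
    # baseDir is taken from the quoted data-config.xml; the regex is a
    # suggested replacement for the glob-style "*.pdf" (an assumption).
    for path in list_matching_files(r"c:\myfile", r".*\.pdf$"):
        print(path)
```

If this prints nothing, the problem is with the directory or the pattern
rather than with Tika or the request handler.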


Chantal




> processor="FileListEntityProcessor"
> baseDir="c:\myfile\" fileName="*.pdf"
> recursive="true">
> <entity name="tika-test" processor="TikaEntityProcessor"
> url="${f.fileAbsolutePath}" format="text">
> <field column="author" name="author" meta="true"/>
> <field column="title" name="title" meta="true"/>
>  <field column="content_type" name="content_type" meta="true"/>
> </entity>
> </entity>
> </document>
> </dataConfig>
> 
> Before that, I added this part to solrconfig.xml:
> 
> 
> <requestHandler name="/dataimport"
> class="org.apache.solr.handler.dataimport.DataImportHandler">
>     <lst name="defaults">
>       <str name="config">c:\solr\conf\data-config.xml</str>
>     </lst>
>   </requestHandler>
> 
> 
> but this is the result:
> 
> ....
> <str name="command">delta-import</str>
> <str name="status">idle</str>
> <str name="importResponse" />
>
> <http://pc-alessio:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=delta-import#>
> <lst name="statusMessages">
>   <str name="Time Elapsed">0:0:2.512</str>
>   <str name="Total Requests made to DataSource">0</str>
>   <str name="Total Rows Fetched">0</str>
>   <str name="Total Documents Processed">0</str>
>   <str name="Total Documents Skipped">0</str>
>   <str name="Full Dump Started">2012-02-09 23:37:07</str>
>   <str name="">Indexing failed. Rolled back all changes.</str>
>   <str name="Rolledback">2012-02-09 23:37:07</str>
> </lst>
> <str name="WARNING">This response format is experimental. It is
> likely to change in the future.</str>
> </response>
> 
> suggestions?
> thanks
> alessio
