On Thu, 2012-02-09 at 23:45 +0100, alessio crisantemi wrote:
> hi all,
> I would like to index in Solr the PDF files in my directory c:\myfile\
>
> so, I added the file data-config.xml to my solr/conf directory, as
> follows:
>
> <dataConfig>
>   <dataSource type="BinFileDataSource" />
>   <document>
>     <entity name="f" dataSource="null" rootEntity="false"

Why do you set rootEntity="false" on the root entity? This looks odd to
me, but I could be wrong, of course.

If DIH shows this:
"""
<str name="Total Requests made to DataSource">0</str>
"""
then DIH hasn't even retrieved any data from your data source. Check that
the call you have configured really returns any documents.

Chantal

>       processor="FileListEntityProcessor"
>       baseDir="c:\myfile\" fileName="*.pdf"
>       recursive="true">
>       <entity name="tika-test" processor="TikaEntityProcessor"
>         url="${f.fileAbsolutePath}" format="text">
>         <field column="author" name="author" meta="true"/>
>         <field column="title" name="title" meta="true"/>
>         <field column="content_type" name="content_type" meta="true"/>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> before that, I added this part to solr-config.xml:
>
> <requestHandler name="/dataimport"
>   class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">c:\solr\conf\data-config.xml</str>
>   </lst>
> </requestHandler>
>
> but this is the result:
>
> ....
>   <str name="command">delta-import</str>
>   <str name="status">idle</str>
>   <str name="importResponse" />
> <http://pc-alessio:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=delta-import#>
>   <lst name="statusMessages">
>     <str name="Time Elapsed">0:0:2.512</str>
>     <str name="Total Requests made to DataSource">0</str>
>     <str name="Total Rows Fetched">0</str>
>     <str name="Total Documents Processed">0</str>
>     <str name="Total Documents Skipped">0</str>
>     <str name="Full Dump Started">2012-02-09 23:37:07</str>
>     <str name="">Indexing failed. Rolled back all changes.</str>
>     <str name="Rolledback">2012-02-09 23:37:07</str>
>   </lst>
>   <str name="WARNING">This response format is experimental. It is
> likely to change in the future.</str>
> </response>
>
> suggestions?
> thanks
> alessio
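
One more thing that may be worth checking in the quoted config. This is a sketch, not a confirmed diagnosis: as far as I know, FileListEntityProcessor reads fileName as a regular expression rather than a shell-style glob, so "*.pdf" may not even compile as a pattern, whereas ".*\.pdf" would match file names ending in .pdf. The fragment below reuses the entity names and paths from the quoted message and changes only the fileName value. That change is an assumption about the cause; the status output above does not confirm it.

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <!-- Outer entity: lists the files under baseDir. -->
    <!-- Assumption: fileName is treated as a regex, so ".*\.pdf" replaces "*.pdf". -->
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="c:\myfile\" fileName=".*\.pdf"
            recursive="true">
      <!-- Inner entity: unchanged from the quoted config. -->
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text">
        <field column="author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="content_type" name="content_type" meta="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If the import still reports "Indexing failed", the Solr log should contain the underlying exception, and that is probably the quickest way to see whether the file listing or something else is the problem.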