Hi,

We are trying to import MEDLINE into a Solr core. Everything works fine except for one problem: in the XML files from MEDLINE, certain tags are sometimes missing. If we define fields for them in the data-config.xml file for our core, the DataImportHandler throws an exception for every tag that is missing:

SCHWERWIEGEND: Exception while solr commit.
java.lang.IllegalArgumentException: no such field ChemicalNameOfSubstance
    at org.apache.solr.core.DefaultCodecFactory$1.getPostingsFormatForField(DefaultCodecFactory.java:49)
    at org.apache.lucene.codecs.lucene40.Lucene40Codec$1.getPostingsFormatForField(Lucene40Codec.java:52)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:94)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:480)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:554)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2547)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2683)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2663)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:414)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
    at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
    at org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:107)
    at org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:304)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:256)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:399)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:380)

Can we tell the DataImportHandler that it should write a default value if the tag is missing?

Here is our data-config.xml (most of the lines that work are omitted for simplicity):

<dataConfig>
    <dataSource name="medline" type="FileDataSource" encoding="UTF-8" />
    <document name="MedlineCitations">
        <entity name="file" processor="FileListEntityProcessor" baseDir="/home/" fileName=".*xml"
                recursive="true" rootEntity="false" dataSource="null">
            <entity name="MedlineCitation" processor="XPathEntityProcessor"
                    stream="true"
                    forEach="/MedlineCitationSet/MedlineCitation"
                    url="${file.fileAbsolutePath}">

                <field column="PMID" xpath="/MedlineCitationSet/MedlineCitation/PMID" />

                <field column="CreationYear" xpath="/MedlineCitationSet/MedlineCitation/DateCreated/Year" />
                <field column="CreationMonth" xpath="/MedlineCitationSet/MedlineCitation/DateCreated/Month" />
                <field column="CreationDay" xpath="/MedlineCitationSet/MedlineCitation/DateCreated/Day" />

                <!-- These cause DataImportHandler exceptions!
                <field column="RevisionYear" xpath="/MedlineCitationSet/MedlineCitation/DateRevised/Year" />
                <field column="RevisionMonth" xpath="/MedlineCitationSet/MedlineCitation/DateRevised/Month" />
                <field column="RevisionDay" xpath="/MedlineCitationSet/MedlineCitation/DateRevised/Day" />
                -->
            </entity>
        </entity>
    </document>
</dataConfig>
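
To make the default-value question concrete, here is a rough, untested sketch of the behavior we have in mind, assuming the DataImportHandler's ScriptTransformer (which needs a JavaScript engine in the JVM) can be used for this. The function name addRevisionDefaults and the placeholder value 'unknown' are our own inventions for illustration. Whether this would avoid the "no such field" exception above is not something we have verified.

```xml
<dataConfig>
    <!-- Hypothetical helper: puts a placeholder into the row for every
         revision column that the XPath fields did not populate. -->
    <script><![CDATA[
        function addRevisionDefaults(row) {
            var columns = ['RevisionYear', 'RevisionMonth', 'RevisionDay'];
            for (var i = 0; i < columns.length; i++) {
                if (row.get(columns[i]) == null) {
                    row.put(columns[i], 'unknown'); // placeholder value, an assumption
                }
            }
            return row;
        }
    ]]></script>
    <dataSource name="medline" type="FileDataSource" encoding="UTF-8" />
    <document name="MedlineCitations">
        <entity name="file" processor="FileListEntityProcessor" baseDir="/home/" fileName=".*xml"
                recursive="true" rootEntity="false" dataSource="null">
            <!-- Same inner entity as in our configuration, plus the transformer attribute. -->
            <entity name="MedlineCitation" processor="XPathEntityProcessor"
                    transformer="script:addRevisionDefaults"
                    stream="true"
                    forEach="/MedlineCitationSet/MedlineCitation"
                    url="${file.fileAbsolutePath}">
                <field column="PMID" xpath="/MedlineCitationSet/MedlineCitation/PMID" />
                <field column="RevisionYear" xpath="/MedlineCitationSet/MedlineCitation/DateRevised/Year" />
                <field column="RevisionMonth" xpath="/MedlineCitationSet/MedlineCitation/DateRevised/Month" />
                <field column="RevisionDay" xpath="/MedlineCitationSet/MedlineCitation/DateRevised/Day" />
            </entity>
        </entity>
    </document>
</dataConfig>
```

In this sketch the transformer would run once per MedlineCitation row, so a citation without a DateRevised element would still carry all three revision columns. The columns would presumably also need matching field definitions in schema.xml.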




With kind regards,
Konrad Lötzsch.

--
*Konrad Loetzsch*
Dipl. Math

*antibodies-online GmbH*
Schloß-Rahe-Str. 15
DE-52072 Aachen

Tel.: +49(0)241 9367-2544
konrad.loetz...@antibodies-online.com
www.antikoerper-online.de | www.antibodies-online.com

Registered at the Amtsgericht Aachen (district court) under HRB 13919
Managing directors: Dr. Tim Hiddemann, Dr. Andreas Kessell
