Hi,
We are trying to import MEDLINE into a Solr core. Everything works fine
except for one problem: in the MEDLINE XML files, certain tags are
sometimes missing. If we define fields for them in the data-config.xml
file of our core, the DataImportHandler throws an exception for every
missing tag:
SEVERE: Exception while solr commit.
java.lang.IllegalArgumentException: no such field ChemicalNameOfSubstance
    at org.apache.solr.core.DefaultCodecFactory$1.getPostingsFormatForField(DefaultCodecFactory.java:49)
    at org.apache.lucene.codecs.lucene40.Lucene40Codec$1.getPostingsFormatForField(Lucene40Codec.java:52)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.addField(PerFieldPostingsFormat.java:94)
    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:335)
    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:480)
    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:554)
    at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2547)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2683)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2663)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:414)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:919)
    at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
    at org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:107)
    at org.apache.solr.handler.dataimport.DocBuilder.finish(DocBuilder.java:304)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:256)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:333)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:399)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:380)
Can we tell the DataImportHandler that it should write a default value
if the tag is missing?
Here is our data-config.xml (for simplicity, we have omitted most of the
lines that work):
<dataConfig>
  <dataSource name="medline" type="FileDataSource" encoding="UTF-8" />
  <document name="MedlineCitations">
    <entity name="file" processor="FileListEntityProcessor" baseDir="/home/" fileName=".*xml"
            recursive="true" rootEntity="false" dataSource="null">
      <entity name="MedlineCitation"
              processor="XPathEntityProcessor"
              stream="true"
              forEach="/MedlineCitationSet/MedlineCitation"
              url="${file.fileAbsolutePath}">
        <field column="PMID"
               xpath="/MedlineCitationSet/MedlineCitation/PMID" />
        <field column="CreationYear"
               xpath="/MedlineCitationSet/MedlineCitation/DateCreated/Year" />
        <field column="CreationMonth"
               xpath="/MedlineCitationSet/MedlineCitation/DateCreated/Month" />
        <field column="CreationDay"
               xpath="/MedlineCitationSet/MedlineCitation/DateCreated/Day" />
        <!-- These cause DataImportHandler exceptions!
        <field column="RevisionYear"
               xpath="/MedlineCitationSet/MedlineCitation/DateRevised/Year" />
        <field column="RevisionMonth"
               xpath="/MedlineCitationSet/MedlineCitation/DateRevised/Month" />
        <field column="RevisionDay"
               xpath="/MedlineCitationSet/MedlineCitation/DateRevised/Day" />
        -->
      </entity>
    </entity>
  </document>
</dataConfig>
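For comparison, a schema-side default would be one way to express "write a default value if the tag is missing": Solr's schema.xml `<field>` element accepts a `default` attribute whose value is used when an added document has no value for that field. The sketch below is hypothetical; the field type `int` and the default value `0` are assumptions for illustration, not taken from the actual schema, and it is not clear from the stack trace alone whether declaring the fields this way avoids the exception above.

```xml
<!-- Hypothetical schema.xml excerpt (assumed type "int" and default "0").
     "default" is the value Solr stores when a document omits the field. -->
<field name="RevisionYear"  type="int" indexed="true" stored="true" default="0" />
<field name="RevisionMonth" type="int" indexed="true" stored="true" default="0" />
<field name="RevisionDay"   type="int" indexed="true" stored="true" default="0" />
```

The default is applied by Solr at indexing time, so the DataImportHandler configuration itself would not need to change.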
With kind regards,
Konrad Lötzsch.
--
*Konrad Loetzsch*
Dipl. Math
*antibodies-online GmbH*
Schloß-Rahe-Str. 15
DE-52072 Aachen
Tel.: +49(0)241 9367-2544
konrad.loetz...@antibodies-online.com
www.antikoerper-online.de | www.antibodies-online.com
Registered at the Local Court of Aachen (Amtsgericht Aachen) under HRB 13919
Managing Directors: Dr. Tim Hiddemann, Dr. Andreas Kessell