[ https://issues.apache.org/jira/browse/SOLR-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761124#action_12761124 ]
Fergus McMenemie commented on SOLR-1437:
----------------------------------------

I am quite pleased with it as far as it goes and think it would be good for 1.4. I have tested it against my test set of 3000 XML documents, replacing:

{code}
<field column="para1" name="text" xpath="/record/sect1/para" flatten="true"/>
<field column="para2" name="text" xpath="/record/list/listitem/para" flatten="true"/>
<field column="para32" name="text" xpath="/record/address/para" flatten="true" />
<field column="para40" name="text" xpath="/record/authoredBy/para" flatten="true" />
<field column="para43" name="text" xpath="/record/dataGroup/address/para" flatten="true" />
<field column="para47" name="text" xpath="/record/dataGroup/keyPersonnel/doubleList/first/para" flatten="true" />
<field column="para49" name="text" xpath="/record/dataGroup/keyPersonnel/doubleList/second/para" flatten="true" />
<field column="para50" name="text" xpath="/record/dataGroup/keyPersonnel/para" flatten="true" />
<field column="para51" name="text" xpath="/record/dataGroup/para" flatten="true" />
<field column="para57" name="text" xpath="/record/doubleList/first/para" flatten="true" />
<field column="para59" name="text" xpath="/record/doubleList/second/para" flatten="true" />
<field column="para63" name="text" xpath="/record/keyPersonnel/doubleList/first/para" flatten="true" />
<field column="para65" name="text" xpath="/record/keyPersonnel/doubleList/second/para" flatten="true" />
<field column="para68" name="text" xpath="/record/list/listItem/para" flatten="true" />
<field column="para75" name="text" xpath="/record/mediaBlock/doubleList/first/para" flatten="true" />
<field column="para77" name="text" xpath="/record/mediaBlock/doubleList/second/para" flatten="true" />
<field column="para172" name="text" xpath="/record/noteGroup/note/para" flatten="true" />
<field column="para174" name="text" xpath="/record/para" flatten="true" />
<field column="para179" name="text" xpath="/record/relatedInfo/list/listItem/relatedArticle/para" flatten="true" />
<field column="para184" name="text" xpath="/record/sect1/address/dataGroup/para" flatten="true" />
<field column="para185" name="text" xpath="/record/sect1/address/para" flatten="true" />
<field column="para195" name="text" xpath="/record/sect1/dataGroup/address/para" flatten="true" />
<field column="para199" name="text" xpath="/record/sect1/dataGroup/keyPersonnel/doubleList/first/para" flatten="true" />
<field column="para201" name="text" xpath="/record/sect1/dataGroup/keyPersonnel/doubleList/second/para" flatten="true" />
<field column="para202" name="text" xpath="/record/sect1/dataGroup/keyPersonnel/para" flatten="true" />
<field column="para203" name="text" xpath="/record/sect1/dataGroup/para" flatten="true" />
<field column="para208" name="text" xpath="/record/sect1/doubleList/first/para" flatten="true" />
<field column="para212" name="text" xpath="/record/sect1/doubleList/second/list/listItem/para" flatten="true" />
<field column="para213" name="text" xpath="/record/sect1/doubleList/second/para" flatten="true" />
<field column="para217" name="text" xpath="/record/sect1/keyPersonnel/doubleList/first/para" flatten="true" />
<field column="para219" name="text" xpath="/record/sect1/keyPersonnel/doubleList/second/para" flatten="true" />
<field column="para220" name="text" xpath="/record/sect1/keyPersonnel/para" flatten="true" />
<field column="para225" name="text" xpath="/record/sect1/list/listItem/list/listItem/para" flatten="true" />
<field column="para226" name="text" xpath="/record/sect1/list/listItem/para" flatten="true" />
<field column="para240" name="text" xpath="/record/sect1/para" flatten="true" />
<field column="para244" name="text" xpath="/record/sect1/sect2/doubleList/first/para" flatten="true" />
<field column="para246" name="text" xpath="/record/sect1/sect2/doubleList/second/para" flatten="true" />
<field column="para251" name="text" xpath="/record/sect1/sect2/list/listItem/list/listItem/para" flatten="true" />
<field column="para252" name="text" xpath="/record/sect1/sect2/list/listItem/para" flatten="true" />
<field column="para258" name="text" xpath="/record/sect1/sect2/noteGroup/note/para" flatten="true" />
<field column="para259" name="text" xpath="/record/sect1/sect2/para" flatten="true" />
<field column="para265" name="text" xpath="/record/sect1/sect2/sect3/list/listItem/list/listItem/para" flatten="true" />
<field column="para266" name="text" xpath="/record/sect1/sect2/sect3/list/listItem/para" flatten="true" />
<field column="para271" name="text" xpath="/record/sect1/sect2/sect3/para" flatten="true" />
<field column="para275" name="text" xpath="/record/sect1/sect2/sect3/sect4/list/listItem/para" flatten="true" />
<field column="para279" name="text" xpath="/record/sect1/sect2/sect3/sect4/para" flatten="true" />
<field column="para284" name="text" xpath="/record/sect1/sect2/sect3/sect4/sect5/para" flatten="true" />
<field column="para295" name="text" xpath="/record/sect1/sect2/sect3/table/tgroup/tbody/row/entry/noteGroup/note/para" flatten="true" />
<field column="para297" name="text" xpath="/record/sect1/sect2/sect3/table/tgroup/tbody/row/entry/para" flatten="true" />
<field column="para301" name="text" xpath="/record/sect1/sect2/sect3/table/tgroup/thead/row/entry/para" flatten="true" />
<field column="para312" name="text" xpath="/record/sect1/sect2/table/tgroup/tbody/row/entry/list/listItem/para" flatten="true" />
<field column="para315" name="text" xpath="/record/sect1/sect2/table/tgroup/tbody/row/entry/noteGroup/note/para" flatten="true" />
<field column="para316" name="text" xpath="/record/sect1/sect2/table/tgroup/tbody/row/entry/noteGroup/para" flatten="true" />
<field column="para318" name="text" xpath="/record/sect1/sect2/table/tgroup/tbody/row/entry/para" flatten="true" />
<field column="para322" name="text" xpath="/record/sect1/sect2/table/tgroup/thead/row/entry/para" flatten="true" />
<field column="para341" name="text" xpath="/record/sect1/table/tgroup/tbody/row/entry/noteGroup/note/para" flatten="true" />
<field column="para342" name="text" xpath="/record/sect1/table/tgroup/tbody/row/entry/noteGroup/para" flatten="true" />
<field column="para344" name="text" xpath="/record/sect1/table/tgroup/tbody/row/entry/para" flatten="true" />
<field column="para348" name="text" xpath="/record/sect1/table/tgroup/thead/row/entry/para" flatten="true" />
<field column="para371" name="text" xpath="/record/table/tgroup/tbody/row/entry/noteGroup/note/para" flatten="true" />
<field column="para373" name="text" xpath="/record/table/tgroup/tbody/row/entry/para" flatten="true" />
<field column="para377" name="text" xpath="/record/table/tgroup/thead/row/entry/para" flatten="true" />
{code}

with

{code}
<field column="text" xpath="//para" flatten="true"/>
{code}

The indexes seemed equivalent, and the time to index was also equivalent.

I have one concern which should be addressed before any 1.4 release. I still do not understand the purpose of the HashSet childrenFound and putNulls. If it is important, then I suspect that whatever is done to childNodes when an end_element is parsed also needs to be done to descNodes; but I have a feeling the whole lot may be unnecessary and can be removed. If it is required, we need to explain it.

The last change I would like to see, which I am happy to leave to 1.5, involves making sure emitted records do not contain tags from parent nodes unless they are stipulated by "commonField".

> DIH: Enhance XPathRecordReader to deal with //tagname and other improvments.
> ----------------------------------------------------------------------------
>
> Key: SOLR-1437
> URL: https://issues.apache.org/jira/browse/SOLR-1437
> Project: Solr
> Issue Type: Improvement
> Components: contrib - DataImportHandler
> Affects Versions: 1.4
> Reporter: Fergus McMenemie
> Assignee: Noble Paul
> Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1437.patch, SOLR-1437.patch
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> As per
> http://www.nabble.com/Re%3A-Extract-info-from-parent-node-during-data-import-%28redirect%3A%29-td25471162.html
> it would be nice to be able to use expressions such as //tagname when
> parsing XML documents.
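
As a rough illustration of why a single //tagname rule can stand in for a long list of explicit paths, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is not Solr's XPathRecordReader and says nothing about how the patch is implemented. The sample record, the flatten() and texts() helpers, and the three explicit paths are hypothetical, chosen only to echo the element names used in the comment above. ElementTree has no absolute paths, so ".//para" stands in for "//para" and the explicit paths are written relative to the record root.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; element names echo the xpaths in the comment.
SAMPLE = """<record>
  <para>top <b>level</b></para>
  <sect1>
    <para>in sect1</para>
    <list><listItem><para>in a list item</para></listItem></list>
  </sect1>
</record>"""


def flatten(elem):
    """Join all text beneath elem, roughly what flatten="true" asks for."""
    return "".join(elem.itertext())


def texts(root, paths):
    """Collect the flattened text of every element matched by any path."""
    found = []
    for path in paths:
        found.extend(flatten(e) for e in root.findall(path))
    return found


root = ET.fromstring(SAMPLE)

# One rule per location, relative to the <record> root element.
explicit = texts(root, ["para", "sect1/para", "sect1/list/listItem/para"])

# A single descendant search: the ElementTree analogue of //para.
wildcard = texts(root, [".//para"])

print(sorted(explicit) == sorted(wildcard))
```

The comparison only holds while the explicit list covers every place a para can occur; the descendant form keeps matching if a new nesting level turns up in the data.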