Hi Lance, you are right: XPathEntityProcessor has the attribute "xsl", so I can use XSLT to generate an XML file "in the form of the standard Solr update schema". I will check how this performs.
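Below is a minimal sketch of such a stylesheet. It assumes the input looks like the /html/body/h1 example quoted below and that its elements are not in a namespace. The file name html2solr.xsl is hypothetical. An XSLT processor sees the whole document, so it can extract both the title and the flattened body text. That is the nested-XPath case XPathRecordReader cannot handle.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical html2solr.xsl: sketch that turns an XHTML-like document
     into Solr update ("add") XML. Assumes the input elements are NOT in a
     namespace; real XHTML would need a namespace prefix in the paths. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <xsl:template match="/">
    <add>
      <doc>
        <!-- One "title" value per h1 element; overlapping with the
             body path below is fine here, unlike in XPathRecordReader. -->
        <xsl:for-each select="/html/body/h1">
          <field name="title"><xsl:value-of select="normalize-space(.)"/></field>
        </xsl:for-each>
        <!-- All non-blank text nodes under body, in document order, joined
             by single spaces so that <h1>Foo</h1><p>Bar</p> does not
             collapse into "FooBar"; the original token order is kept. -->
        <field name="alltext">
          <xsl:for-each select="/html/body//text()[normalize-space()]">
            <xsl:value-of select="normalize-space(.)"/>
            <xsl:if test="position() != last()">
              <xsl:text> </xsl:text>
            </xsl:if>
          </xsl:for-each>
        </field>
      </doc>
    </add>
  </xsl:template>
</xsl:stylesheet>
```

The stylesheet would then be referenced from the entity in data-config.xml, for example `<entity name="page" processor="XPathEntityProcessor" url="http://example.com/page.xhtml" xsl="/path/to/html2solr.xsl" useSolrAddSchema="true"/>`. Here `useSolrAddSchema="true"` tells the processor that its input is already in the Solr add format, so no `field` mappings are needed. The entity name, `url` and path are placeholders. If the schema declares a uniqueKey, the stylesheet must also emit that field. For the performance check, keep in mind that XPathRecordReader parses in a streaming fashion, whereas an XSLT 1.0 processor typically builds the whole input tree in memory.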
Best regards
Karsten

btw. "flatten" is an attribute of the "field" tag, not of XPathEntityProcessor (as the wiki wrongly states).

-------- Lance
> There is an option somewhere to use the full XML DOM implementation
> for evaluating XPaths. The purpose of the XPathEP is to be as simple and
> dumb as possible and to handle most cases: RSS feeds and other open
> standards.
>
> Search for xsl (optional):
>
> http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
>
-------- Karsten
> On Sat, Apr 9, 2011 at 5:32 AM
> > Hi Folks,
> >
> > has anyone improved the DIH XPathRecordReader to deal with nested XPaths?
> > E.g. a data-config.xml with
> > <entity .. processor="XPathEntityProcessor" .. >
> > <field column="title" xpath="//body/h1"/>
> > <field column="alltext" xpath="//body" flatten="true"/>
> > and an XML stream containing
> > /html/body/h1...
> > will only fill the field "alltext"; the field "title" stays empty.
> >
> > This is a known issue from 2009:
> > https://issues.apache.org/jira/browse/SOLR-1437#commentauthor_12756469_verbose
> >
> > So, three questions:
> > 1. How can a "search over all" field be filled without nested XPaths?
> > (schema.xml <copyField source="*" dest="alltext"/> will not help,
> > because we lose the original token order.)
> > 2. Has anyone tried to improve XPathRecordReader to deal with nested
> > XPaths?
> > 3. Does anyone else need this feature?
> >
> > Best regards
> > Karsten
> > http://lucene.472066.n3.nabble.com/DIH-Enhance-XPathRecordReader-to-deal-with-body-FLATTEN-true-and-body-h1-td2799005.html