Hi Lance,

I used XPathEntityProcessor with the "xsl" attribute and generated an XML file "in the form of the standard Solr update schema". This cost a lot of performance; it is a pity that XPathEntityProcessor uses only one thread.
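To illustrate the approach, here is a minimal sketch of such a stylesheet. It is not the stylesheet used in the tests. It assumes un-namespaced input of the form /html/body/h1, and it takes the field names "title" and "alltext" from the example quoted further down in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>

  <!-- Emit one Solr <doc> per input document.
       Assumption: the input root is an un-namespaced <html> element. -->
  <xsl:template match="/">
    <add>
      <doc>
        <!-- title: text of the first h1 inside body -->
        <field name="title">
          <xsl:value-of select="normalize-space(/html/body/h1[1])"/>
        </field>
        <!-- alltext: string value of body, i.e. all descendant text
             (including the h1) in document order -->
        <field name="alltext">
          <xsl:value-of select="normalize-space(/html/body)"/>
        </field>
      </doc>
    </add>
  </xsl:template>
</xsl:stylesheet>
```

Because both fields are filled inside the stylesheet, the nested xpaths never reach XPathRecordReader. The entity then only needs xsl="..." pointing at this file (the path is up to you) and, if I read the wiki correctly, useSolrAddSchema="true" so that no field definitions are required.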
My tests with a collection of 350,000 documents:
1. use of XPathRecordReader without xslt: 28min
2. use of XPathEntityProcessor with xslt (standard solr-war / Xalan): 44min
3. use of XPathEntityProcessor with saxon-xslt: 36min

Best regards
Karsten

-------- Lance
> There is an option somewhere to use the full XML DOM implementation
> for using xpaths. The purpose of the XPathEP is to be as simple and
> dumb as possible and handle most cases: RSS feeds and other open
> standards.
>
> Search for xsl(optional)
>
> http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
>
----------karsten
> > Hi Folks,
> >
> > does anyone improve DIH XPathRecordReader to deal with nested xpaths?
> > e.g.
> > data-config.xml with
> > <entity .. processor="XPathEntityProcessor" ..
> > <field column="title" xpath="//body/h1"/>
> > <field column="alltext" xpath="//body" flatten="true"/>
> > and the XML stream contains
> > /html/body/h1...
> > will only fill field “alltext” but field “title” will be empty.
> >
> > This is a known issue from 2009
> > https://issues.apache.org/jira/browse/SOLR-1437#commentauthor_12756469_verbose
> >
> > So three questions:
> > 1. How to fill a “search over all”-Field without nested xpaths?
> > (schema.xml <copyField source="*" dest="alltext"/> will not help,
> > because we lose the original token order)
> > 2. Does anyone try to improve XPathRecordReader to deal with nested xpaths?
> > 3. Does anyone else need this feature?
> >
> >
> > Best regards
> > Karsten
> > http://lucene.472066.n3.nabble.com/DIH-Enhance-XPathRecordReader-to-deal-with-body-FLATTEN-true-and-body-h1-td2799005.html