The DIH supports multi-threading. You can have one thread fetch the files and then hand them off to other threads for processing.
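As a rough sketch of that setup, assuming a Solr 3.x release where the entity-level "threads" attribute is available: an outer FileListEntityProcessor entity lists the files and an inner XPathEntityProcessor entity parses each one. The entity names, baseDir, file pattern, thread count and xpath below are made-up placeholders, not taken from your config.

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Outer entity: walks the directory and emits one row per file.
         threads="4" is an assumption (Solr 3.x multi-threaded DIH);
         baseDir and fileName are hypothetical. -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/xml" fileName=".*\.xml" recursive="true"
            rootEntity="false" dataSource="null" threads="4">
      <!-- Inner entity: parses each file found by the outer entity. -->
      <entity name="doc" processor="XPathEntityProcessor"
              url="${files.fileAbsolutePath}" forEach="/html" stream="true">
        <field column="title" xpath="/html/body/h1"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Whether this helps in your case depends on where the time goes; if the XSLT step dominates, the extra threads may not buy much.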
On Mon, Apr 11, 2011 at 11:40 AM, <[email protected]> wrote:
> Hi Lance,
>
> I used XPathEntityProcessor with the attribute "xsl" and generated an XML file "in
> the form of the standard Solr update schema".
> I lost a lot of performance; it is a pity that XPathEntityProcessor uses only
> one thread.
>
> My tests with a collection of 350T documents:
> 1. use of XPathRecordReader without xslt: 28min
> 2. use of XPathEntityProcessor with xslt (Standard solr-war / Xalan): 44min
> 3. use of XPathEntityProcessor with saxon-xslt: 36min
>
>
> Best regards
> Karsten
>
>
>
> -------- Lance
>> There is an option somewhere to use the full XML DOM implementation
>> for using xpaths. The purpose of the XPathEP is to be as simple and
>> dumb as possible and handle most cases: RSS feeds and other open
>> standards.
>>
>> Search for xsl(optional)
>>
>> http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
>>
> ----------karsten
>> > Hi Folks,
>> >
>> > has anyone improved the DIH XPathRecordReader to deal with nested xpaths?
>> > e.g.
>> > data-config.xml with
>> > <entity .. processor="XPathEntityProcessor" ..
>> > <field column="title" xpath="//body/h1"/>
>> > <field column="alltext" xpath="//body" flatten="true"/>
>> > and the XML stream contains
>> > /html/body/h1...
>> > will only fill the field "alltext", but the field "title" will be empty.
>> >
>> > This is a known issue from 2009
>> > https://issues.apache.org/jira/browse/SOLR-1437#commentauthor_12756469_verbose
>> >
>> > So three questions:
>> > 1. How to fill a "search over all" field without nested xpaths?
>> > (schema.xml <copyField source="*" dest="alltext"/> will not help,
>> > because we lose the original token order)
>> > 2. Is anyone trying to improve XPathRecordReader to deal with nested
>> > xpaths?
>> > 3. Does anyone else need this feature?
>> >
>> >
>> > Best regards
>> > Karsten
>> >
>
> http://lucene.472066.n3.nabble.com/DIH-Enhance-XPathRecordReader-to-deal-with-body-FLATTEN-true-and-body-h1-td2799005.html
>

--
Lance Norskog
[email protected]
