Hi Lance,

You are right:
XPathEntityProcessor has the attribute "xsl", so I can use XSLT to generate an
XML file "in the form of the standard Solr update schema".
I will check the performance of this.
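Here is a rough sketch of what this could look like. It assumes the `useSolrAddSchema="true"` option of XPathEntityProcessor, which reads the XSLT output as a Solr `<add><doc>` document so that no `<field>` mappings are needed. The file paths, the stylesheet name `html-to-solr-add.xsl`, and the field names are hypothetical. The stylesheet also assumes that the input has no default XML namespace, as in the plain `/html/body/h1` example from my first mail; XHTML with `xmlns="http://www.w3.org/1999/xhtml"` would need a namespace prefix in the XPaths. It fills "title" and "alltext" from the same `body` element, which is exactly what the nested xpaths cannot do. Because it emits every text node of `body` in document order, separated by a space, it also keeps the original token order.

```xml
<!-- data-config.xml: hypothetical paths; xsl takes a full filesystem path or URL -->
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="page"
            processor="XPathEntityProcessor"
            url="/path/to/page.html"
            xsl="/path/to/solr/conf/xslt/html-to-solr-add.xsl"
            useSolrAddSchema="true"/>
  </document>
</dataConfig>

<!-- html-to-solr-add.xsl (hypothetical name): turns one page into a Solr add document -->
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <add>
      <doc>
        <!-- title: string value of the first body/h1, only if one exists -->
        <xsl:if test="//body/h1">
          <field name="title"><xsl:value-of select="//body/h1"/></field>
        </xsl:if>
        <!-- alltext: all text nodes under body in document order,
             separated by a space so adjacent elements do not glue tokens together -->
        <field name="alltext">
          <xsl:for-each select="//body//text()">
            <xsl:value-of select="."/>
            <xsl:text> </xsl:text>
          </xsl:for-each>
        </field>
      </doc>
    </add>
  </xsl:template>
</xsl:stylesheet>
```

The two parts are separate files; they appear in one listing only for brevity. Running the XSLT for every record adds cost, so this is the part whose performance needs checking.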


Best regards
  Karsten


BTW, "flatten" is an attribute of the "field" tag, not of XPathEntityProcessor
(the wiki wrongly lists it there).


-------- Lance
> There is an option somewhere to use the full XML DOM implementation
> for using xpaths. The purpose of the XPathEP is to be as simple and
> dumb as possible and handle most cases: RSS feeds and other open
> standards.
> 
> Search for xsl(optional)
> 
> http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
> 
-------- Karsten
> On Sat, Apr 9, 2011 at 5:32 AM
> > Hi Folks,
> >
> > has anyone improved the DIH XPathRecordReader to handle nested xpaths?
> > e.g.
> > data-config.xml with
> >  <entity .. processor="XPathEntityProcessor" ..
> >  <field column="title" xpath="//body/h1"/>
> >  <field column="alltext" xpath="//body" flatten="true"/>
> > and the XML stream contains
> >  /html/body/h1...
> > only the field “alltext” will be filled; the field “title” stays empty.
> >
> > This is a known issue from 2009:
> > https://issues.apache.org/jira/browse/SOLR-1437#commentauthor_12756469_verbose
> >
> > So, three questions:
> > 1. How can I fill a “search over all” field without nested xpaths?
> >   (<copyField source="*" dest="alltext"/> in schema.xml will not help,
> >   because we lose the original token order.)
> > 2. Has anyone tried to improve XPathRecordReader to handle nested xpaths?
> > 3. Does anyone else need this feature?
> >
> >
> > Best regards
> >  Karsten
> >
http://lucene.472066.n3.nabble.com/DIH-Enhance-XPathRecordReader-to-deal-with-body-FLATTEN-true-and-body-h1-td2799005.html
