The problem is that XPathEntityProcessor implements XPath on its own, and
implements only a subset of XPath.  So, if the input document is small
enough, it makes no sense to fight it.  One possibility is to apply an XSLT
to the file before processing it.
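
For illustration, a minimal stylesheet along these lines could do the record
selection with full XPath 1.0 (including // and predicates, which an XSLT
processor supports) and emit flat records that only need simple paths
afterwards.  The file name flatten-topics.xsl, the <topics>/<topic>/<url>
output elements, and the <title> child element are my own assumptions for
the sketch, not part of the real input schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical flatten-topics.xsl: reduce each English health-topic
     to a flat <topic> record so DIH's limited XPath only needs simple paths. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <xsl:template match="/">
    <topics>
      <!-- Full XPath 1.0 is available here, so // and predicates work -->
      <xsl:for-each select="//health-topic[@language='English']">
        <topic>
          <!-- Copy the url attribute of the matched health-topic -->
          <url><xsl:value-of select="@url"/></url>
          <!-- Assumes a <title> child element; adjust to the real schema -->
          <title><xsl:value-of select="title"/></title>
        </topic>
      </xsl:for-each>
    </topics>
  </xsl:template>
</xsl:stylesheet>
```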

This blog post
<http://www.andornot.com/blog/post/Sample-Solr-DataImportHandler-for-XML-Files.aspx>
shows a worked example.   The XSL transform takes place before the forEach
or field specifications, which is the principal question I had about it
from the documentation.  This is also illustrated in the initQuery()
private method of XPathEntityProcessor.    You can see the transformation
being applied before the forEach.  This will not scale to extremely large
XML documents with millions of rows; that is why there is a stream="true"
option, so that you don't preprocess the document.  In my case, the entire
XML file is 29 MB, so I think I can do the XSL transformation and then
apply the forEach to each resulting document.
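
A minimal data-config.xml sketch of that flow, assuming a stylesheet
(hypothetical flatten-topics.xsl) that emits
<topics><topic><url/><title/></topic></topics>.  The file paths, the entity
name and the title column are placeholders.  xsl, forEach, url and the
per-field xpath are XPathEntityProcessor attributes; stream="true" is left
out because, as described above, the transform processes the whole file
before the forEach runs.

```xml
<dataConfig>
  <!-- Reads the input from the local filesystem -->
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- The xsl transform is applied to the whole file first; forEach and
         the field xpaths then run against the transformed output -->
    <entity name="topic"
            processor="XPathEntityProcessor"
            url="/path/to/health-topics.xml"
            xsl="/path/to/flatten-topics.xsl"
            forEach="/topics/topic">
      <!-- Simple absolute paths that the DIH XPath subset handles -->
      <field column="url"   xpath="/topics/topic/url"/>
      <field column="title" xpath="/topics/topic/title"/>
    </entity>
  </document>
</dataConfig>
```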

This potentially shortens my time frame for moving to Apache Solr
substantially, because the common case with our previous indexer is to run
XSLT to transform to the document format desired by the indexer.

On Mon, Dec 8, 2014 at 5:10 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> I don't believe there are any alternatives. At least I could not get
> anything but the full path to work.
>
> Regards,
>    Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 8 December 2014 at 17:01, Dan Davis <dansm...@gmail.com> wrote:
> > In experimentation with a much simpler and smaller XML file, it looks
> > like neither '//health-topic/@url' nor '//@url' will work.  So far, only
> > spelling it all out works.
> > With child elements, such as <title>, an xpath of "//title" works fine,
> > but it is beginning to seem dangerous.
> >
> > Is there any short-hand for the current node or the match?
> >
> > On Mon, Dec 8, 2014 at 4:42 PM, Dan Davis <dansm...@gmail.com> wrote:
> >
> >> When I have a forEach attribute like the following:
> >>
> >>      forEach="/medical-topics/medical-topic/health-topic[@language='English']"
> >>
> >> And then need to match an attribute of that, is there any alternative to
> >> spelling it all out:
> >>
> >>      <field column="url"
> >>          xpath="/medical-topics/medical-topic/health-topic[@language='English']/@url"/>
> >>
> >> I suppose I could do "//health-topic/@url" since the document should
> >> then have a single health-topic (as long as I know they don't nest).
> >>
> >>
>
