The problem is that XPathEntityProcessor has its own XPath implementation, and that implementation supports only a subset of XPath. So, if the input document is small enough, it makes no sense to fight it. One possibility is to apply an XSLT to the file before processing it.
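
As a rough sketch of what such a data-config.xml might look like: the processor, url, xsl, forEach and xpath attributes are the ones XPathEntityProcessor documents, but the file paths, the stylesheet name, and the flattened /topics/topic structure are assumptions made up for illustration. The stylesheet is assumed to rewrite each English health-topic into a simple <topic> element with <url> and <title> children, so that only plain full paths are needed afterwards.

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Hypothetical paths: adjust url and xsl to your own files. -->
    <!-- The xsl stylesheet is applied first; forEach and the field
         xpaths then run against the transformed output, not the
         original document. -->
    <entity name="topics"
            processor="XPathEntityProcessor"
            url="/data/medical-topics.xml"
            xsl="/data/flatten-topics.xsl"
            forEach="/topics/topic">
      <!-- Assumed output structure of the (hypothetical) stylesheet. -->
      <field column="url"   xpath="/topics/topic/url"/>
      <field column="title" xpath="/topics/topic/title"/>
    </entity>
  </document>
</dataConfig>
```

The idea is to push the attribute matching and the [@language='English'] filtering into the XSLT, where full XPath is available, and keep the xpath expressions in the entity trivial.
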
This blog post
<http://www.andornot.com/blog/post/Sample-Solr-DataImportHandler-for-XML-Files.aspx>
shows a worked example. The XSL transform takes place before the forEach or
field specifications are applied, which was the principal question I had
after reading the documentation. The initQuery() private method of
XPathEntityProcessor shows the same thing: you can see the transformation
being applied before the forEach.

This will not scale to extremely large XML documents with millions of rows.
That is why they have the stream="true" argument there, so that you don't
preprocess the document. In my case, the entire XML file is 29M, so I think
I can do the XSL transformation first and then apply the forEach to the
result.

This potentially shortens my time frame for moving to Apache Solr
substantially, because the common case with our previous indexer is to run
XSLT to transform to the document format desired by the indexer.

On Mon, Dec 8, 2014 at 5:10 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> I don't believe there are any alternatives. At least I could not get
> anything but the full path to work.
>
> Regards,
>    Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 8 December 2014 at 17:01, Dan Davis <dansm...@gmail.com> wrote:
> > In experimentation with a much simpler and smaller XML file, it doesn't
> > look like '//health-topic/@url' will work, nor will '//@url' etc. So
> > far, only spelling it all out will work.
> > With child elements, such as <title>, an xpath of "//title" works fine,
> > but it is beginning to seem dangerous.
> >
> > Is there any short-hand for the current node or the match?
> >
> > On Mon, Dec 8, 2014 at 4:42 PM, Dan Davis <dansm...@gmail.com> wrote:
> >
> >> When I have a forEach attribute like the following:
> >>
> >>     forEach="/medical-topics/medical-topic/health-topic[@language='English']"
> >>
> >> And then need to match an attribute of that, is there any alternative to
> >> spelling it all out:
> >>
> >>     <field column="url"
> >>            xpath="/medical-topics/medical-topic/health-topic[@language='English']/@url"/>
> >>
> >> I suppose I could do "//health-topic/@url" since the document should then
> >> have a single health-topic (as long as I know they don't nest).