Granted, properly parsing the XML in the field is better than using a regex.
I didn't see an obvious way to do that at first, but I did run across this:
"Use a fielddatasource for reading field from database and then use
xpathentityprocessor. Field datasource will give you the stream that is
needed by xpathentity processor."
See:
http://osdir.com/ml/solr-user.lucene.apache.org/2011-02/msg00769.html
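For reference, the approach described in that post might look roughly like
the data-config.xml below. This is only a sketch: the JDBC settings, the
table and column names (docs, id, xml_col), the root element (/doc), and the
Solr field names are placeholders I made up, so adjust them to the real
schema.

```xml
<dataConfig>
  <!-- JDBC source for the table rows; driver, url and credentials are placeholders -->
  <dataSource name="db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="user" password="pass"/>
  <!-- Exposes a column of the parent entity as a stream for the XPath processor -->
  <dataSource name="fld" type="FieldReaderDataSource"/>
  <document>
    <!-- Outer entity: one row per database record -->
    <entity name="row" dataSource="db" query="SELECT id, xml_col FROM docs">
      <field column="id" name="id"/>
      <!-- Inner entity: parses the XML held in row.xml_col -->
      <entity name="xml" dataSource="fld" processor="XPathEntityProcessor"
              dataField="row.xml_col" forEach="/doc">
        <field column="content" xpath="/doc/content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The dataField attribute takes the form parentEntityName.columnName, and
forEach should point at the root element of the stored XML.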
-- Jack Krupansky
-----Original Message-----
From: Michael Della Bitta
Sent: Monday, May 14, 2012 5:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Index an xml filed that is saved in a database
That answer may serve the OP well, but I can't help but propagate this
link when the idea of parsing XML with regex comes up:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
:)
Michael
On Mon, 2012-05-14 at 17:03 -0400, Jack Krupansky wrote:
A regex transformer should do the trick:
http://wiki.apache.org/solr/DataImportHandler#RegexTransformer
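Something along these lines might work as a starting point. It is a sketch
only: the table and column names (docs, id, xml_col) and the target field
name are assumptions, and it will not cope with CDATA sections, escaped
entities, or nested <content> elements.

```xml
<entity name="row" transformer="RegexTransformer"
        query="SELECT id, xml_col FROM docs">
  <field column="id" name="id"/>
  <!-- Capture group 1 becomes the value of "content"; (?s) lets . span newlines.
       The angle brackets are escaped because the regex sits in an XML attribute. -->
  <field column="content" sourceColName="xml_col"
         regex="(?s)&lt;content&gt;(.*?)&lt;/content&gt;"/>
</entity>
```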
-- Jack Krupansky
-----Original Message-----
From: Ramo Karahasan
Sent: Monday, May 14, 2012 4:54 PM
To: solr-user@lucene.apache.org
Subject: Index an xml filed that is saved in a database
Hi,
I have an XML document saved in a column of a database table. Is it possible
to index just one part of that XML string, e.g. <content>.</content>, with
the DataImportHandler (DIH), or is it necessary to extract this information
beforehand?
Thanks,
Ramo