Granted, a proper XML parse of the input field is better. I didn't see an obvious solution at first, but I did run across this:

"Use a fielddatasource for reading field from database and then use xpathentityprocessor. Field datasource will give you the stream that is needed by xpathentity processor."

See:
http://osdir.com/ml/solr-user.lucene.apache.org/2011-02/msg00769.html
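For illustration only: the quoted suggestion refers to DIH's FieldReaderDataSource feeding XPathEntityProcessor, both configured in data-config.xml. The standalone Java sketch below does not use DIH at all. It shows the extraction step that approach performs (parse the column value as XML, then pull out <content>) using the JDK's javax.xml.xpath API. The sample document and the /doc/content path are hypothetical. XPathEntityProcessor is documented as supporting only a subset of XPath, so an expression that works here is not guaranteed to work in DIH. The same method could also be used to extract the field before indexing, which is the alternative raised in the original question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XPathFieldExtract {
    // Parses an XML string (e.g. the value of a database column) and returns the
    // XPath string value of the expression: the concatenated text of the first
    // matching node, or "" if nothing matches.
    static String extract(String xml, String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(xpathExpr, doc);
        } catch (ParserConfigurationException | SAXException | IOException
                 | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + xpathExpr, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical column value; the real schema is not given in the thread.
        String column = "<doc><title>T</title><content>Hello <b>world</b></content></doc>";
        System.out.println(extract(column, "/doc/content"));
    }
}
```

Because the element's string value concatenates all descendant text, markup nested inside <content> (the <b> above) is flattened into plain text.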

-- Jack Krupansky

-----Original Message-----
From: Michael Della Bitta
Sent: Monday, May 14, 2012 5:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Index an xml field that is saved in a database

That answer may serve the OP well, but I can't help but propagate this
link when the idea of parsing XML with regex comes up:

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

:)
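To make the regex-versus-parser point concrete, the Java sketch below runs a non-greedy <content> regex and the JDK's DOM parser over two hypothetical column values. The regex returns the entity &amp; undecoded and stops at the </content> that sits inside a CDATA section, while the parser returns the decoded text in both cases. The tag name and sample strings are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class RegexVsParser {
    private static final Pattern CONTENT = Pattern.compile("(?s)<content>(.*?)</content>");

    // Naive regex extraction: returns the raw characters between the first pair of tags.
    static String viaRegex(String xml) {
        Matcher m = CONTENT.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    // Parser-based extraction: returns the decoded text of the first <content> element,
    // or null if there is none.
    static String viaParser(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node n = doc.getElementsByTagName("content").item(0);
            return n == null ? null : n.getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String entity = "<doc><content>Fish &amp; Chips</content></doc>";
        String cdata = "<doc><content><![CDATA[a </content> b]]></content></doc>";
        System.out.println(viaRegex(entity) + " | " + viaParser(entity));
        System.out.println(viaRegex(cdata) + " | " + viaParser(cdata));
    }
}
```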

Michael


On Mon, 2012-05-14 at 17:03 -0400, Jack Krupansky wrote:
A regex transformer should do the trick:

http://wiki.apache.org/solr/DataImportHandler#RegexTransformer
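A minimal sketch, assuming the text to extract sits between literal <content> and </content> tags with no attributes, CDATA sections or nested <content> elements. The Java below applies the kind of single-capture-group pattern one might put in a RegexTransformer field, using plain java.util.regex outside of DIH; it is not DIH's own code. The (?s) flag is included so the capture can span line breaks. If such a pattern goes into a regex attribute in data-config.xml, each < must be written as &lt;, because that file is itself XML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFieldExtract {
    // (?s) lets '.' match newlines, so multi-line content is captured;
    // the reluctant '.*?' stops at the first closing tag.
    private static final Pattern CONTENT =
            Pattern.compile("(?s)<content>(.*?)</content>");

    // Returns the text between the first <content> and </content>, or null if absent.
    static String extractContent(String xml) {
        Matcher m = CONTENT.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical column value.
        String column = "<doc><title>T</title>\n<content>line one\nline two</content></doc>";
        System.out.println(extractContent(column));
    }
}
```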

-- Jack Krupansky

-----Original Message-----
From: Ramo Karahasan
Sent: Monday, May 14, 2012 4:54 PM
To: solr-user@lucene.apache.org
Subject: Index an xml field that is saved in a database

Hi,

I have an XML document saved in a column of a database table. Is it possible
to index just one part of that XML string, e.g. <content>...</content>, with
the DataImportHandler (DIH), or is it necessary to extract this information
beforehand?

Thanks,
Ramo