Hi,

I am trying to index an XML document as a field in Lucene; see the example
below:

<add>
 <doc>
  <field name="title">As You Like it</field>
  <field name="author">Shakespeare, William</field>
  <field name="record"><myxml>here goes the xml...</myxml></field>
 </doc>
</add>

I can index the title and author fields because they are plain strings, but
the record field is itself XML, and that causes problems: I cannot post the
file as is with the post.sh script (Solr complains).


I wonder what the correct (and relatively simple) way of doing this would be.
Ideally, I would like to store the XML as is and index only its text content,
with the XML tags removed (I believe there is HTMLStripWhitespaceAnalyzer for
that). The search result should also return the record as XML, so simple
escaping alone does not work for me.
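
To illustrate what I mean by indexing only the content, here is a minimal
sketch in Python (standard library only). It is not Solr code; the function
name text_only and the sample record are made up for illustration, and it
only shows the text I would expect to reach the index once the tags are
stripped.

```python
import xml.etree.ElementTree as ET


def text_only(record_xml: str) -> str:
    """Return the text content of an XML fragment with all tags removed.

    Hypothetical helper, for illustration only.
    """
    root = ET.fromstring(record_xml)
    # itertext() yields every text node in document order.
    raw = " ".join(root.itertext())
    # Collapse runs of whitespace left behind by the removed tags.
    return " ".join(raw.split())


if __name__ == "__main__":
    print(text_only("<myxml>here goes <b>the</b> xml...</myxml>"))
```

The stored value would stay the original XML string; only this stripped text
would be analyzed.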


So far, my idea is to escape the XML record in the update message, unescape
it for internal storage, and use the analyzer for indexing (which would
possibly require creating a class like XMLField or similar).

Thanks,
mirko
