Is it possible to configure Solr Cell to extract and store only the body of
a document when indexing? I'm currently doing the following, which I
thought would work:

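    import java.io.File;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest.ACTION;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.util.ContentStreamBase.FileStream;

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("defaultField", "content");
    // Intended to restrict extraction to the XHTML body that Tika produces
    params.set("xpath", "/xhtml:html/xhtml:body/descendant::node()");

    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
    up.setParams(params);

    FileStream f = new FileStream(new File(".."));
    up.addContentStream(f);

    // Commit, waiting for the flush and for a new searcher
    up.setAction(ACTION.COMMIT, true, true);

    solrServer.request(up);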


But the resulting content field looks like this:

<arr name="content_mvtxt">
<str/>
<str>null</str>
<str>ISO-8859-1</str>
<str>text/plain; charset=ISO-8859-1</str>
<str>Just a little test</str>
</arr>


What I had hoped for was just this:

<arr name="content_mvtxt">
<str>Just a little test</str>
</arr>
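To check the XPath expression in isolation, here is a sketch that evaluates it with the JDK's own javax.xml.xpath API rather than Solr. The XHTML string is only a hand-written, assumed approximation of what Tika produces for a small text file, not output captured from Solr, and the class and method names (BodyXPathDemo, selectText) are hypothetical. The sketch binds the xhtml prefix to the XHTML namespace, as the xpath parameter requires, and concatenates the text nodes the expression selects.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BodyXPathDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Binds the "xhtml" prefix used in the xpath parameter to the XHTML namespace.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "xhtml" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            return XHTML_NS.equals(uri)
                    ? Collections.singletonList("xhtml").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    /** Hypothetical helper: evaluates expr against xhtml and concatenates the selected text nodes. */
    static String selectText(String xhtml, String expr) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // prefixed XPath steps only match namespaced elements
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(XHTML_CONTEXT);
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);

            // descendant::node() matches elements as well as their text children,
            // so only text nodes are kept to avoid repeating the same text.
            StringBuilder text = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                if (n.getNodeType() == Node.TEXT_NODE) {
                    text.append(n.getNodeValue());
                }
            }
            return text.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Assumed, hand-written approximation of Tika's XHTML for a small text file.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><meta name=\"Content-Type\" content=\"text/plain; charset=ISO-8859-1\"/>"
                + "<title/></head>"
                + "<body><p>Just a little test</p></body></html>";
        String expr = "/xhtml:html/xhtml:body/descendant::node()";
        System.out.println(selectText(xhtml, expr)); // prints: Just a little test
    }
}
```

Because descendant::node() matches the p element as well as its text child, the sketch collects text nodes only; joining the string value of every matched node would print the text twice. On this input nothing from the head is selected, so if the real Tika output is similar, the extra values in content_mvtxt are presumably not coming from the XPath match itself.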
