I think I may have identified a bug with the FastVectorHighlighter (FVH), so I have two questions:
1) Does anyone know how to make FVH return a highlighted snippet when the
query matches all of one string in a multivalued field?
2) If not, does anyone know how to make DIH concatenate all the values in a
multivalued field into one single field?
Imagine a document which looks like this:
<doc>
<str name="department_name">Obstetrics and Gynaecology</str>
<arr name="node_names">
<str>Refer to specialist</str>
<str>Identify adverse psycho social factors</str>
</arr>
</doc>
If I search the document and ask for matches to be highlighted with the
original highlighter, I get 'node_names' in the highlighting results:
q=node_names:("Refer to specialist")&hl=true&hl.fl=*
But if I repeat the search using the FVH, 'node_names' does not appear in
the highlighting results:
q=node_names:("Refer to specialist")&hl=true&hl.fl=*&hl.useFastVectorHighlighter=true
A search for something less than the full string (e.g. "Refer to") works in
both cases.
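For anyone wanting to reproduce this, here is a minimal Python sketch that builds the two requests with proper URL encoding. The endpoint in SOLR_SELECT and the helper name highlight_url are assumptions for illustration only; the query parameters are the ones shown above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint (an assumption); substitute your own host, port and core.
SOLR_SELECT = "http://localhost:8983/solr/select"

def highlight_url(phrase, use_fvh=False):
    """Build a highlighting request for a phrase query on node_names."""
    params = [
        ("q", 'node_names:("%s")' % phrase),
        ("hl", "true"),
        ("hl.fl", "*"),
    ]
    if use_fvh:
        # Switch from the original highlighter to the FVH.
        params.append(("hl.useFastVectorHighlighter", "true"))
    # urlencode percent-encodes the quotes, parentheses and spaces in q.
    return SOLR_SELECT + "?" + urlencode(params)

original_url = highlight_url("Refer to specialist")
fvh_url = highlight_url("Refer to specialist", use_fvh=True)
print(original_url)
print(fvh_url)
```

Comparing the "highlighting" section of the two responses should show whether 'node_names' is missing only in the FVH case.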
I have tried every combination of hl.requireFieldMatch and
hl.usePhraseHighlighter, with no effect.
node_names is defined as either:
<field name="node_names" type="text_en_splitting" indexed="true"
stored="true" multiValued="true" termVectors="true" termPositions="true"
termOffsets="true"/>
OR:
<field name="node_names" type="text_en" indexed="true"
stored="true" multiValued="true" termVectors="true" termPositions="true"
termOffsets="true"/>
And I have tried setting preserveOriginal="1" on the
WordDelimiterFilterFactory.
FVH seems to work fine with single-valued fields, so the query
q=department_name:("Obstetrics and Gynaecology") works as expected. Given
that, I have tried, unsuccessfully, to use either a JavaScript or a native Java
transformer to merge the contents of node_names into a single
node_names_flat field during data import. This fails because child entities
have no access to their parent entity:
<entity name="pathway">
  <entity name="pages">
    <entity name="nodes">
      <!-- produces multiple node_names and there seems to be no way to push
           them up into 'pages' or 'pathway' -->
    </entity>
  </entity>
</entity>
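To make the intended result concrete, here is a rough Python sketch of the merge I am after, applied to a document represented as a plain dict. It is not a DIH transformer; the function name flatten_node_names and the idea of doing this outside DIH are assumptions for illustration only.

```python
def flatten_node_names(doc, separator=" "):
    """Return a copy of doc with node_names joined into node_names_flat."""
    flat = dict(doc)
    # Join every value of the multivalued field into one string.
    flat["node_names_flat"] = separator.join(doc.get("node_names", []))
    return flat

doc = {
    "department_name": "Obstetrics and Gynaecology",
    "node_names": [
        "Refer to specialist",
        "Identify adverse psycho social factors",
    ],
}

flat_doc = flatten_node_names(doc)
print(flat_doc["node_names_flat"])
```

With a plain space as the separator, a phrase query could match across the boundary between two original values, so a different separator may be preferable.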
Duncan.