Yes, that sounds about right. I also have in mind an optimization for
highlighting so it doesn't need to pull the whole field value. The fast
vector highlighter is working with offsets into the field, and should
work better w/random access into the field value(s). But that should
come as a later optimization.
Another thing that bugs me about the fast vector highlighter (fvh) is
that it seems to recompute all the terms that matched the query for each
retrieved field value. It seems like it ought to be able to reuse
information gleaned during the actual query process, but that probably
involves some deep change to cache that info during query scoring, and
that is beyond my ken at the moment.
-Mike
On 06/20/2011 10:00 AM, Jamie Johnson wrote:
Perhaps getFieldContent should return an array, to be consistent with
getValues(fieldName).
On Mon, Jun 20, 2011 at 9:59 AM, Jamie Johnson <jej2...@gmail.com> wrote:
Yes, in that case the code becomes
    if (!schemaField.stored()) {
        SchemaField keyField = schema.getUniqueKeyField();
        String key = doc.getValues(keyField.getName())[0];
        docTexts = doc.getValues(fieldName);
        if (key != null && key.length() > 0) {
            for (int x = 0; x < docTexts.length; x++) {
                docTexts[x] = docTexts[x] + " some added text";
            }
        }
    }
I'd imagine that we'd want some type of interface to actually pull
the text, so you can plug in different providers. Something like:
    public interface ISolrExternalFieldProvider {
        public String getFieldContent(String key, SchemaField field);
    }
Not sure if there is anything else that interface should include,
but that's all I would need at present.
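A minimal sketch of what such a provider could look like. The names ExternalFieldProvider and MapFieldProvider are made up for illustration, the field is passed as a plain String rather than a SchemaField so the example compiles without Solr, and the return type is an array purely to line up with getValues(fieldName). None of this is an existing Solr API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the interface proposed above. It takes the field
// name as a String (an assumption, to avoid depending on Solr's SchemaField)
// and returns an array so callers can treat it like doc.getValues(fieldName).
interface ExternalFieldProvider {
    String[] getFieldContent(String key, String fieldName);
}

// Toy provider backed by an in-memory map. A real provider would read from
// a file, database, or whatever the external store happens to be.
class MapFieldProvider implements ExternalFieldProvider {
    private final Map<String, String[]> store = new HashMap<>();

    // Registers the values of one field for one document key.
    void put(String key, String fieldName, String... values) {
        store.put(key + "/" + fieldName, values);
    }

    // Returns the stored values, or an empty array when nothing is known,
    // so the highlighter never has to deal with null.
    @Override
    public String[] getFieldContent(String key, String fieldName) {
        String[] values = store.get(key + "/" + fieldName);
        return values != null ? values : new String[0];
    }
}
```

The highlighter would then call getFieldContent(key, fieldName) at the point where it currently calls doc.getValues(fieldName) for a field that is not stored.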
On Mon, Jun 20, 2011 at 9:54 AM, Mike Sokolov <soko...@ifactory.com> wrote:
Another option for determining whether to go to external
storage would be to examine the SchemaField, see if it is
stored, and if not, try to fetch from a file or whatever.
That way you won't have to configure anything.
-Mike
On 06/20/2011 09:46 AM, Jamie Johnson wrote:
In my case chucking the external storage is simply not an
option. I'll definitely share anything I find. The
following is a very simple example of adding text in the
default Solr highlighter (I had to copy a large portion of the
class, along with some other classes, to get this to run, since
the method that actually does the highlighting is private). If
you look at the source it should hopefully make sense.
    String[] docTexts = null;
    if (fieldName.equals("title")) {
        SchemaField keyField = schema.getUniqueKeyField();
        // I know this field exists and is not multivalued
        String key = doc.getValues(keyField.getName())[0];
        // this would be loaded from the external store, but below
        // just appends some information
        docTexts = doc.getValues(fieldName);
        if (key != null && key.length() > 0) {
            for (int x = 0; x < docTexts.length; x++) {
                docTexts[x] = docTexts[x] + " some added text";
            }
        }
    }
I have cheated, since I know the name of the field (title) that
I am doing this for, but it would probably be useful to allow
this to be set on the highlighter class through configuration
in solrconfig (I'm not familiar at all with doing this and have
spent 0 time looking into it). Once configured, the
if(fieldName.equals("title")) line would be replaced with
something like if(externalFields.contains(fieldName)){...}.
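A rough sketch of the externalFields check, assuming the field names arrive as a single comma-separated string. How that string would be read out of solrconfig is not shown here, and ExternalFieldConfig is a made-up name rather than anything in Solr.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder for the set of fields whose text lives outside the
// index. The comma-separated input format is an assumption.
class ExternalFieldConfig {
    private final Set<String> externalFields = new HashSet<>();

    ExternalFieldConfig(String commaSeparated) {
        if (commaSeparated == null) {
            return; // nothing configured: no field is treated as external
        }
        for (String name : commaSeparated.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                externalFields.add(trimmed);
            }
        }
    }

    // Replaces the hard-coded fieldName.equals("title") test.
    boolean isExternal(String fieldName) {
        return externalFields.contains(fieldName);
    }
}
```

With this in place the highlighter would build one ExternalFieldConfig at init time and call isExternal(fieldName) per field.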
Thoughts/comments?
On Mon, Jun 20, 2011 at 9:05 AM, Mike Sokolov <soko...@ifactory.com> wrote:
I'd be very interested in this, as well, if you do it
before me and are willing to share...
A related question I have tried to ask on this list, and
have never really gotten a good answer to, is whether it
makes sense to just chuck the external storage and treat
the lucene index as the primary storage for documents. I
have a feeling the answer is no; perhaps because of
increased I/O costs for lucene and solr, but I don't
really know. I've been considering doing some
experimentation, but would really love an expert opinion...
-Mike
On 06/20/2011 08:41 AM, Jamie Johnson wrote:
I am trying to index data where I'm concerned that storing the
contents of a specific field will be a bit of a hog, so we are
planning to retrieve this information as needed for highlighting
from an external source. I am looking to extend the default Solr
highlighting capability to work with information pulled from this
external source, and it looks like this is possible by extending
DefaultSolrHighlighter (line 418 to pull a particular field from
external source) for standard highlighting and BaseFragmentsBuilder
(line 99) for FastVectorHighlighter. I could just hard code this to
say if the field name is a specific value look into the external
source; is this the best way to accomplish this? Are there any
other extension points to do what I'm suggesting?