Re: Extending Solr Highlighter to pull information from external source

Mike Sokolov Mon, 20 Jun 2011 07:18:47 -0700

Yes, that sounds about right. I also have in mind an optimization for highlighting so it doesn't need to pull the whole field value. The fast vector highlighter is working with offsets into the field, and should work better w/random access into the field value(s). But that should come as a later optimization.
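To make the random-access idea concrete, here is a rough sketch of pulling only a slice of an externally stored value by offset. It assumes the value is kept in a file encoded as UTF-16BE, so that a char offset n sits at byte offset 2 * n, and that the highlighter's offsets are Java char offsets. OffsetSliceReader, storeTemp and slice are made-up names for illustration, not anything in Solr or Lucene.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Solr or Lucene.
class OffsetSliceReader {

    // Writes the text to a temp file as UTF-16BE (assumed storage format),
    // so that char offset n maps to byte offset 2 * n.
    static Path storeTemp(String text) {
        try {
            Path file = Files.createTempFile("field-value", ".utf16");
            Files.write(file, text.getBytes(StandardCharsets.UTF_16BE));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads only the chars in [startOffset, endOffset) rather than the whole value.
    static String slice(Path file, int startOffset, int endOffset) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate((endOffset - startOffset) * 2);
            long position = 2L * startOffset;
            while (buffer.hasRemaining()) {
                int read = channel.read(buffer, position + buffer.position());
                if (read < 0) {
                    break; // requested range runs past the end of the file
                }
            }
            buffer.flip();
            return StandardCharsets.UTF_16BE.decode(buffer).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A fragment would then be fetched with something like OffsetSliceReader.slice(file, start, end), padded with whatever surrounding context the fragment builder wants.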

Another thing that bugs me about fvh is that it seems to need to recompute all the terms that matched the query for each retrieved field value when it seems like it ought to be able to make use of information gleaned during the actual query process, but that probably involves some deep change to cache that info during query scoring, and that is beyond my ken at the moment.


-Mike

On 06/20/2011 10:00 AM, Jamie Johnson wrote:

Perhaps it should be an array that gets returned, to be consistent with getValues(fieldName);

On Mon, Jun 20, 2011 at 9:59 AM, Jamie Johnson <jej2...@gmail.com> wrote:


    Yes, in that case the code becomes

            if(!schemaField.stored()){
                SchemaField keyField = schema.getUniqueKeyField();
                String key = doc.getValues(keyField.getName())[0];
                docTexts = doc.getValues(fieldName);

                if(key != null && key.length() > 0){
                    for(int x = 0; x < docTexts.length; x++){
                        docTexts[x] = docTexts[x] + " some added text";
                    }
                }
            }


    I'd imagine that we'd want some type of interface to actually pull
    the text so you can plug in different providers, something like

    public interface ISolrExternalFieldProvider {
          public String getFieldContent(String key, SchemaField field);
    }

    I'm not sure if there is anything else that interface should
    include, but that's all I would need at present.
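As a rough illustration of the pluggable-provider idea, here is one way it might look with an in-memory stand-in for the external store. It returns an array, in line with the getValues(fieldName) suggestion made elsewhere in this thread, and takes a plain field name instead of a SchemaField so that it compiles without Solr on the classpath. ExternalFieldProvider and MapFieldProvider are hypothetical names; nothing like them ships with Solr.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface; nothing with this name exists in Solr.
interface ExternalFieldProvider {
    // Returns every value of fieldName for the document with the given unique key,
    // or an empty array when the external source has nothing for it.
    String[] getFieldContent(String key, String fieldName);
}

// In-memory stand-in for a real external store (file system, database, ...).
class MapFieldProvider implements ExternalFieldProvider {
    // document key -> (field name -> values)
    private final Map<String, Map<String, String[]>> values = new HashMap<>();

    void put(String key, String fieldName, String... content) {
        values.computeIfAbsent(key, k -> new HashMap<>()).put(fieldName, content.clone());
    }

    @Override
    public String[] getFieldContent(String key, String fieldName) {
        Map<String, String[]> fields = values.get(key);
        String[] content = fields == null ? null : fields.get(fieldName);
        return content == null ? new String[0] : content.clone();
    }
}
```

Returning an empty array rather than null keeps the highlighter loop free of null checks; whether that is the right contract for a real provider is an open question.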



    On Mon, Jun 20, 2011 at 9:54 AM, Mike Sokolov
    <soko...@ifactory.com> wrote:

        Another option for determining whether to go to external
        storage would be to examine the SchemaField, see if it is
        stored, and if not, try to fetch from a file or whatever.
        That way you won't have to configure anything.
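The decision described here could be sketched roughly as follows. The boolean and the array stand in for schemaField.stored() and doc.getValues(fieldName) from the snippets in this thread, and the lookup function is an assumed placeholder for whatever actually reads the file or other store. HighlightTextSource and textsFor are made-up names.

```java
import java.util.function.Function;

// Hypothetical helper; the boolean and the array stand in for
// schemaField.stored() and doc.getValues(fieldName).
class HighlightTextSource {

    static String[] textsFor(boolean stored, String[] storedValues,
                             String key, Function<String, String[]> externalLookup) {
        if (stored) {
            return storedValues; // the index already holds the text
        }
        if (key == null || key.isEmpty()) {
            return new String[0]; // no way to address the external source
        }
        String[] external = externalLookup.apply(key);
        return external == null ? new String[0] : external;
    }
}
```

With this shape, nothing needs to be configured per field: any field that is not stored falls through to the external lookup.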


        -Mike


        On 06/20/2011 09:46 AM, Jamie Johnson wrote:

        In my case chucking the external storage is simply not an
        option.  I'll definitely share anything I find.  The
        following is a very simple example of adding text to the
        default solr highlighter (I had to copy a large portion of the
        class, along with some other classes, to get this to run, since
        the method that actually does the highlighting is private).  If
        you look at the source it should hopefully make sense.


                String[] docTexts = null;

                if(fieldName.equals("title")){

                    SchemaField keyField = schema.getUniqueKeyField();
                    // I know this field exists and is not multivalued
                    String key = doc.getValues(keyField.getName())[0];
                    // this would be loaded from external store, but
                    // below just appends some information
                    docTexts = doc.getValues(fieldName);
                    if(key != null && key.length() > 0){
                        for(int x = 0; x < docTexts.length; x++){
                            docTexts[x] = docTexts[x] + " some added text";
                        }
                    }
                }

        I have cheated since I know the name of the field (title)
        that I am doing this for, but it would probably be
        useful to allow this to be set on the highlighter class
        through configuration in solrconfig (I'm not familiar at all
        with doing this and have spent 0 time looking into it).  Once
        configured, the if(fieldName.equals("title")) line would be
        replaced with something like
        if(externalFields.contains(fieldName)){...}.
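A minimal sketch of the externalFields check, assuming the field names arrive as a single comma-separated string. How that string would actually be read from solrconfig is left open here, and the parameter format is an assumption rather than an existing Solr setting. ExternalFieldConfig is a made-up name.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; the parameter format is an assumption, not a Solr setting.
class ExternalFieldConfig {
    private final Set<String> externalFields;

    // Parses a comma-separated list such as "title, body" into a set of field names.
    ExternalFieldConfig(String commaSeparated) {
        Set<String> names = new LinkedHashSet<>();
        if (commaSeparated != null) {
            for (String name : commaSeparated.split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    names.add(trimmed);
                }
            }
        }
        this.externalFields = Collections.unmodifiableSet(names);
    }

    // Replaces the hard-coded fieldName.equals("title") test.
    boolean isExternal(String fieldName) {
        return externalFields.contains(fieldName);
    }
}
```

The highlighter would build one ExternalFieldConfig at init time and call isExternal(fieldName) per field.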

        Thoughts/comments?

        On Mon, Jun 20, 2011 at 9:05 AM, Mike Sokolov
        <soko...@ifactory.com> wrote:

            I'd be very interested in this, as well, if you do it
            before me and are willing to share...

            A related question I have tried to ask on this list, and
            have never really gotten a good answer to, is whether it
            makes sense to just chuck the external storage and treat
            the lucene index as the primary storage for documents.  I
            have a feeling the answer is no; perhaps because of
            increased I/O costs for lucene and solr, but I don't
            really know.  I've been considering doing some
            experimentation, but would really love an expert opinion...

            -Mike


            On 06/20/2011 08:41 AM, Jamie Johnson wrote:

                I am trying to index data where I'm concerned that
                storing the contents of a specific field will be a bit
                of a hog, so we are planning to retrieve this
                information as needed for highlighting from an external
                source.  I am looking to extend the default solr
                highlighting capability to work with information pulled
                from this external source, and it looks like this is
                possible by extending DefaultSolrHighlighter (line 418
                to pull a particular field from external source) for
                standard highlighting and BaseFragmentsBuilder (line 99)
                for FastVectorHighlighter.  I could just hard code this
                to say if the field name is a specific value, look into
                the external source.  Is this the best way to accomplish
                this?  Are there any other extension points to do what
                I'm suggesting?
