HighlighterTest.java

Otis Gospodnetic Fri, 23 May 2008 20:16:34 -0700

Ah, sorry about that - I was looking at that issue trying to remember why I 
made a reference to Lucene.  Now I remember. :)


I assume nobody minds getting 2.4-dev in there, right?  Shall I move all 
lib/lucene*jar to 2.4-dev or just the highlighter?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Grant Ingersoll <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, May 23, 2008 6:47:28 PM
> Subject: Re: svn commit: r659664 - in /lucene/solr/trunk: CHANGES.txt 
> src/java/org/apache/solr/common/params/HighlightParams.java 
> src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java 
> src/test/org/apache/solr/highlight/HighlighterTest.java
> 
> I'm getting compile errors on clean :
>   [mkdir] Created dir:/solr-trunk/build/core
>      [javac] Compiling 314 source files to ...solr-trunk/build/core
>      [javac] ...solr-trunk/src/java/org/apache/solr/highlight/ 
> DefaultSolrHighlighter.java:45: cannot find symbol
>      [javac] symbol  : class SpanScorer
>      [javac] location: package org.apache.lucene.search.highlight
>      [javac] import org.apache.lucene.search.highlight.SpanScorer;
>      [javac]                                           ^
>      [javac] ...solr-trunk/src/java/org/apache/solr/highlight/ 
> DefaultSolrHighlighter.java:144: cannot find symbol
>      [javac] symbol  : class SpanScorer
>      [javac] location: class  
> org.apache.solr.highlight.DefaultSolrHighlighter
>      [javac]   private SpanScorer getSpanQueryScorer(Query query,  
> String fieldName, CachingTokenFilter tokenStream, SolrQueryRequest  
> request) throws IOException {
>      [javac]           ^
>      [javac] ...solr-trunk/src/java/org/apache/solr/highlight/ 
> DefaultSolrHighlighter.java:147: cannot find symbol
>      [javac] symbol  : class SpanScorer
>      [javac] location: class  
> org.apache.solr.highlight.DefaultSolrHighlighter
>      [javac]       return new SpanScorer(query, fieldName, tokenStream);
>      [javac]                  ^
>      [javac] ...solr-trunk/src/java/org/apache/solr/highlight/ 
> DefaultSolrHighlighter.java:150: cannot find symbol
>      [javac] symbol  : class SpanScorer
>      [javac] location: class  
> org.apache.solr.highlight.DefaultSolrHighlighter
>      [javac]       return new SpanScorer(query, null, tokenStream);
>      [javac]                  ^
>      [javac] Note: Some input files use or override a deprecated API.
>      [javac] Note: Recompile with -Xlint:deprecation for details.
>      [javac] Note: Some input files use unchecked or unsafe operations.
>      [javac] Note: Recompile with -Xlint:unchecked for details.
>      [javac] 4 errors
> 
> SVN Info:
> 
> >svn info
> Path: .
> URL: https://svn.apache.org/repos/asf/lucene/solr/trunk
> Repository Root: https://svn.apache.org/repos/asf
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 659696
> Node Kind: directory
> Schedule: normal
> Last Changed Author: otis
> Last Changed Rev: 659668
> Last Changed Date: 2008-05-23 17:32:45 -0400 (Fri, 23 May 2008)
> 
> Doesn't this require new Lucene jars?
> 
> -Grant
> 
> 
> 
> On May 23, 2008, at 5:23 PM, [EMAIL PROTECTED] wrote:
> 
> > Author: otis
> > Date: Fri May 23 14:23:25 2008
> > New Revision: 659664
> >
> > URL: http://svn.apache.org/viewvc?rev=659664&view=rev
> > Log:
> > SOLR-553 Use SpanScorer when highlighting phrase terms and  
> > hl.usePhraseHighlighter=true
> >
> > Modified:
> >    lucene/solr/trunk/CHANGES.txt
> >    lucene/solr/trunk/src/java/org/apache/solr/common/params/ 
> > HighlightParams.java
> >    lucene/solr/trunk/src/java/org/apache/solr/highlight/ 
> > DefaultSolrHighlighter.java
> >    lucene/solr/trunk/src/test/org/apache/solr/highlight/ 
> > HighlighterTest.java
> >
> > Modified: lucene/solr/trunk/CHANGES.txt
> > URL: 
> http://svn.apache.org/viewvc/lucene/solr/trunk/CHANGES.txt?rev=659664&r1=659663&r2=659664&view=diff
> > = 
> > = 
> > = 
> > = 
> > = 
> > = 
> > = 
> > = 
> > ======================================================================
> > --- lucene/solr/trunk/CHANGES.txt (original)
> > +++ lucene/solr/trunk/CHANGES.txt Fri May 23 14:23:25 2008
> > @@ -409,7 +409,14 @@
> >
> > 31. SOLR-514: Added explicit media-type with UTF* charset to *.xsl  
> > files that
> >     don't already have one. (hossman)
> > -
> > +
> > +32. SOLR-505: Give RequestHandlers the possiblity to suppress the  
> > generation
> > +    of HTTP caching headers.  (Thomas Peuss via Otis Gospodnetic)
> > +
> > +33. SOLR-553: Handle highlighting of phrase terms better when
> > +    hl.usePhraseHighligher=true URL param is used.
> > +    (Bojan Smid via Otis Gospodnetic)
> > +
> > Other Changes
> >  1. SOLR-135: Moved common classes to org.apache.solr.common and  
> > altered the
> >     build scripts to make two jars: apache-solr-1.3.jar and
> >
> > Modified: lucene/solr/trunk/src/java/org/apache/solr/common/params/ 
> > HighlightParams.java
> > URL: 
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/common/params/HighlightParams.java?rev=659664&r1=659663&r2=659664&view=diff
> > = 
> > = 
> > = 
> > = 
> > = 
> > = 
> > = 
> > = 
> > ======================================================================
> > --- lucene/solr/trunk/src/java/org/apache/solr/common/params/ 
> > HighlightParams.java (original)
> > +++ lucene/solr/trunk/src/java/org/apache/solr/common/params/ 
> > HighlightParams.java Fri May 23 14:23:25 2008
> > @@ -33,6 +33,8 @@
> >   public static final String FIELD_MATCH = HIGHLIGHT 
> > +".requireFieldMatch";
> >   public static final String ALTERNATE_FIELD = HIGHLIGHT 
> > +".alternateField";
> >   public static final String ALTERNATE_FIELD_LENGTH = HIGHLIGHT 
> > +".maxAlternateFieldLength";
> > +
> > +  public static final String USE_PHRASE_HIGHLIGHTER = HIGHLIGHT 
> > +".usePhraseHighlighter";
> >
> >   public static final String MERGE_CONTIGUOUS_FRAGMENTS = HIGHLIGHT  
> > + ".mergeContiguous";
> >   // Formatter
> >
> > Modified: lucene/solr/trunk/src/java/org/apache/solr/highlight/ 
> > DefaultSolrHighlighter.java
> > URL: 
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java?rev=659664&r1=659663&r2=659664&view=diff
> > = 
> > = 
> > = 
> > = 
> > = 
> > = 
> > = 
> > = 
> > ======================================================================
> > --- lucene/solr/trunk/src/java/org/apache/solr/highlight/ 
> > DefaultSolrHighlighter.java (original)
> > +++ lucene/solr/trunk/src/java/org/apache/solr/highlight/ 
> > DefaultSolrHighlighter.java Fri May 23 14:23:25 2008
> > @@ -32,6 +32,7 @@
> > import javax.xml.xpath.XPathConstants;
> >
> > import org.apache.lucene.analysis.Analyzer;
> > +import org.apache.lucene.analysis.CachingTokenFilter;
> > import org.apache.lucene.analysis.Token;
> > import org.apache.lucene.analysis.TokenFilter;
> > import org.apache.lucene.analysis.TokenStream;
> > @@ -41,6 +42,7 @@
> > import org.apache.lucene.search.highlight.Fragmenter;
> > import org.apache.lucene.search.highlight.Highlighter;
> > import org.apache.lucene.search.highlight.QueryScorer;
> > +import org.apache.lucene.search.highlight.SpanScorer;
> > import org.apache.lucene.search.highlight.TextFragment;
> > import org.apache.lucene.search.highlight.TokenSources;
> > import org.apache.solr.common.SolrException;
> > @@ -55,7 +57,6 @@
> > import org.apache.solr.search.DocIterator;
> > import org.apache.solr.search.DocList;
> > import org.apache.solr.search.SolrIndexSearcher;
> > -import org.apache.solr.util.SolrPluginUtils;
> > import org.apache.solr.util.plugin.NamedListPluginLoader;
> > import org.w3c.dom.NodeList;
> >
> > @@ -92,6 +93,27 @@
> >     formatters.put( null, fmt );
> >   }
> >
> > +  /**
> > +   * Return a phrase Highlighter appropriate for this field.
> > +   * @param query The current Query
> > +   * @param fieldName The name of the field
> > +   * @param request The current SolrQueryRequest
> > +   * @param tokenStream document text CachingTokenStream
> > +   * @throws IOException
> > +   */
> > +  protected Highlighter getPhraseHighlighter(Query query, String  
> > fieldName, SolrQueryRequest request, CachingTokenFilter tokenStream)  
> > throws IOException {
> > +    SolrParams params = request.getParams();
> > +    Highlighter highlighter = null;
> > +
> > +    highlighter = new Highlighter(getFormatter(fieldName, params),  
> > getSpanQueryScorer(query, fieldName, tokenStream, request));
> > +
> > +    highlighter.setTextFragmenter(getFragmenter(fieldName, params));
> > +    highlighter.setMaxDocBytesToAnalyze(params.getFieldInt(
> > +        fieldName, HighlightParams.MAX_CHARS,
> > +        Highlighter.DEFAULT_MAX_DOC_BYTES_TO_ANALYZE));
> > +
> > +    return highlighter;
> > +  }
> >
> >   /**
> >    * Return a Highlighter appropriate for this field.
> > @@ -112,6 +134,24 @@
> >   }
> >
> >   /**
> > +   * Return a SpanScorer suitable for this Query and field.
> > +   * @param query The current query
> > +   * @param tokenStream document text CachingTokenStream
> > +   * @param fieldName The name of the field
> > +   * @param request The SolrQueryRequest
> > +   * @throws IOException
> > +   */
> > +  private SpanScorer getSpanQueryScorer(Query query, String  
> > fieldName, CachingTokenFilter tokenStream, SolrQueryRequest request)  
> > throws IOException {
> > +    boolean reqFieldMatch =  
> > request.getParams().getFieldBool(fieldName,  
> > HighlightParams.FIELD_MATCH, false);
> > +    if (reqFieldMatch) {
> > +      return new SpanScorer(query, fieldName, tokenStream);
> > +    }
> > +    else {
> > +      return new SpanScorer(query, null, tokenStream);
> > +    }
> > +  }
> > +
> > +  /**
> >    * Return a QueryScorer suitable for this Query and field.
> >    * @param query The current query
> >    * @param fieldName The name of the field
> > @@ -230,32 +270,59 @@
> >           fieldName = fieldName.trim();
> >           String[] docTexts = doc.getValues(fieldName);
> >           if (docTexts == null) continue;
> > +
> > +          TokenStream tstream = null;
> > +
> > +          // create TokenStream
> > +          if (docTexts.length == 1) {
> > +            // single-valued field
> > +            try {
> > +              // attempt term vectors
> > +              tstream =  
> > TokenSources.getTokenStream(searcher.getReader(), docId, fieldName);
> > +            }
> > +            catch (IllegalArgumentException e) {
> > +              // fall back to anaylzer
> > +              tstream = new  
> > TokenOrderingFilter(schema.getAnalyzer().tokenStream(fieldName, new  
> > StringReader(docTexts[0])), 10);
> > +            }
> > +          }
> > +          else {
> > +            // multi-valued field
> > +            tstream = new MultiValueTokenStream(fieldName,  
> > docTexts, schema.getAnalyzer(), true);
> > +          }
> > +
> > +          Highlighter highlighter;
> > +
> > +          if  
> > (Boolean 
> > .valueOf 
> > (req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER))) {
> > +            // wrap CachingTokenFilter around TokenStream for reuse
> > +            tstream = new CachingTokenFilter(tstream);
> > +
> > +            // get highlighter
> > +            highlighter = getPhraseHighlighter(query, fieldName,  
> > req, (CachingTokenFilter) tstream);
> > +
> > +            // after highlighter initialization, reset tstream  
> > since construction of highlighter already used it
> > +            tstream.reset();
> > +          }
> > +          else {
> > +            // use "the old way"
> > +            highlighter = getHighlighter(query, fieldName, req);
> > +          }
> >
> > -          // get highlighter, and number of fragments for this field
> > -          Highlighter highlighter = getHighlighter(query,  
> > fieldName, req);
> >           int numFragments = getMaxSnippets(fieldName, params);
> >           boolean mergeContiguousFragments =  
> > isMergeContiguousFragments(fieldName, params);
> >
> >            String[] summaries = null;
> >            TextFragment[] frag;
> >            if (docTexts.length == 1) {
> > -              // single-valued field
> > -              TokenStream tstream;
> > -              try {
> > -                 // attempt term vectors
> > -                 tstream =  
> > TokenSources.getTokenStream(searcher.getReader(), docId, fieldName);
> > -              }
> > -              catch (IllegalArgumentException e) {
> > -                 // fall back to analyzer
> > -                 tstream = new  
> > TokenOrderingFilter(schema.getAnalyzer().tokenStream(fieldName, new  
> > StringReader(docTexts[0])), 10);
> > -              }
> >               frag = highlighter.getBestTextFragments(tstream,  
> > docTexts[0], mergeContiguousFragments, numFragments);
> >            }
> >            else {
> > -              // multi-valued field
> > -              MultiValueTokenStream tstream;
> > -              tstream = new MultiValueTokenStream(fieldName,  
> > docTexts, schema.getAnalyzer(), true);
> > -              frag = highlighter.getBestTextFragments(tstream,  
> > tstream.asSingleValue(), false, numFragments);
> > +               StringBuilder singleValue = new StringBuilder();
> > +
> > +               for (String txt:docTexts) {
> > +                   singleValue.append(txt);
> > +               }
> > +
> > +              frag = highlighter.getBestTextFragments(tstream,  
> > singleValue.toString(), false, numFragments);
> >            }
> >            // convert fragments back into text
> >            // TODO: we can include score and position information in  
> > output as snippet attributes
> > @@ -303,12 +370,8 @@
> >   }
> > }
> >
> > -
> > -
> > -
> > /**
> > - * Helper class which creates a single TokenStream out of values  
> > from a
> > - * multi-valued field.
> > + * Creates a single TokenStream out multi-value field values.
> >  */
> > class MultiValueTokenStream extends TokenStream {
> >   private String fieldName;
> > @@ -378,7 +441,6 @@
> >       sb.append(str);
> >     return sb.toString();
> >   }
> > -
> > }
> >
> >
> > @@ -424,5 +486,3 @@
> >     return queue.isEmpty() ? null : queue.removeFirst();
> >   }
> > }
> > -
> > -
> >
> > Modified: lucene/solr/trunk/src/test/org/apache/solr/highlight/ 
> > HighlighterTest.java
> > URL: 
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java?rev=659664&r1=659663&r2=659664&view=diff
> > = 
> > = 
> > = 
> > = 
> > = 
> > = 
> > = 
> > = 
> > ======================================================================
> > --- lucene/solr/trunk/src/test/org/apache/solr/highlight/ 
> > HighlighterTest.java (original)
> > +++ lucene/solr/trunk/src/test/org/apache/solr/highlight/ 
> > HighlighterTest.java Fri May 23 14:23:25 2008
> > @@ -481,4 +481,59 @@
> >             "//[EMAIL PROTECTED]'highlighting']/[EMAIL PROTECTED]'1']/ 
> > [EMAIL PROTECTED]'t_text']/str[.='a piece of text']"
> >             );
> >   }
> > +
> > +  public void testPhraseHighlighter() {
> > +    HashMapargs = new HashMap();
> > +    args.put("hl", "true");
> > +    args.put("hl.fl", "t_text");
> > +    args.put("hl.fragsize", "40");
> > +    args.put("hl.snippets", "10");
> > +
> > +    TestHarness.LocalRequestFactory sumLRF = h.getRequestFactory(
> > +      "standard", 0, 200, args);
> > +
> > +    // String borrowed from Lucene's HighlighterTest
> > +    String t = "This piece of text refers to Kennedy at the  
> > beginning then has a longer piece of text that is very long in the  
> > middle and finally ends with another reference to Kennedy";
> > +
> > +    assertU(adoc("t_text", t, "id", "1"));
> > +    assertU(commit());
> > +    assertU(optimize());
> > +
> > +    String oldHighlight1 = "//[EMAIL PROTECTED]'1']/[EMAIL 
> > PROTECTED]'t_text']/ 
> > str[.='This piece of text refers to Kennedy']";
> > +    String oldHighlight2 = "//[EMAIL PROTECTED]'1']/[EMAIL 
> > PROTECTED]'t_text']/ 
> > str[.=' at the beginning then has a longer piece of text']";
> > +    String oldHighlight3 = "//[EMAIL PROTECTED]'1']/[EMAIL 
> > PROTECTED]'t_text']/ 
> > str[.=' with another reference to Kennedy']";
> > +    String newHighlight1 = "//[EMAIL PROTECTED]'1']/[EMAIL 
> > PROTECTED]'t_text']/ 
> > str[.='This piece of text refers to Kennedy']";
> > +
> > +    // check if old functionality is still the same
> > +    assertQ("Phrase highlighting - old",
> > +        sumLRF.makeRequest("t_text:\"text refers\""),
> > +        "//[EMAIL PROTECTED]'highlighting']/[EMAIL PROTECTED]'1']",
> > +        oldHighlight1, oldHighlight2, oldHighlight3
> > +        );
> > +
> > +    assertQ("Phrase highlighting - old",
> > +        sumLRF.makeRequest("t_text:text refers"),
> > +        "//[EMAIL PROTECTED]'highlighting']/[EMAIL PROTECTED]'1']",
> > +        oldHighlight1, oldHighlight2, oldHighlight3
> > +        );
> > +
> > +    // now check if Lucene-794 highlighting works as expected
> > +    args.put("hl.usePhraseHighlighter", "true");
> > +
> > +    sumLRF = h.getRequestFactory("standard", 0, 200, args);
> > +
> > +    // check phrase highlighting
> > +    assertQ("Phrase highlighting - Lucene-794",
> > +        sumLRF.makeRequest("t_text:\"text refers\""),
> > +        "//[EMAIL PROTECTED]'highlighting']/[EMAIL PROTECTED]'1']",
> > +        newHighlight1
> > +        );
> > +
> > +    // non phrase queries should be highlighted as they were before  
> > this fix
> > +    assertQ("Phrase highlighting - Lucene-794",
> > +        sumLRF.makeRequest("t_text:text refers"),
> > +        "//[EMAIL PROTECTED]'highlighting']/[EMAIL PROTECTED]'1']",
> > +        oldHighlight1, oldHighlight2, oldHighlight3
> > +        );
> > +  }
> > }
> >
> >
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
> 
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ

Re: svn commit: r659664 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/common/params/HighlightParams.java src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java src/test/org/apache/solr/highlight/HighlighterTest.java

Reply via email to