Re: distributed search components

Yonik Seeley Fri, 21 Aug 2009 16:08:09 -0700

On Fri, Aug 21, 2009 at 12:52 PM, Mike Anderson<mik...@mit.edu> wrote:
> I'm trying to make my way through learning how to modify and write
> distributed search components.


The whole ResponseBuilder stuff is really a first pass - it obviously
could use refinement.  As you go through, it would be great if you
could keep in mind how things could be improved, in addition to how it
currently works.  Don't try to make sense of this as anyone's idea of
"ideal code" but rather "code that currently works".

> A few questions
>
> 1. in SearchHandler, when the query is broken down and sent to each shard,
> will this request make its way to the process() method of the component
> (because it will look like a non-distributed request to the SearchHandler of
> the shard)?

Yes.

> 2. the comment above the response handling loop (in SearchHandler) says that
> if any requests are added while in the loop, the loop will break and make
> the request immediately. I see that the loop will exit if there is an
> exception or if there are no more responses, but I don't see how the new
> requests will be called unless it goes through the entire loop again.

Here's the code.
          // now wait for replies, but if anyone puts more requests on
          // the outgoing queue, send them out immediately (by exiting
          // this loop)
          while (rb.outgoing.size() == 0) {
            // [ receive a response, then let the components process it ]
          }
If any code processing the response adds another request to the
outgoing queue, then the loop will break and the new outgoing requests
will be sent.

So it's not *quite* immediate... it's after components have processed
the response.
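To make the ordering concrete, here is a toy model of that send/wait loop. It is a sketch under stated assumptions, not Solr code: the class OutgoingLoopSketch, its run() method, and the string "requests" are all hypothetical, and the BiConsumer handler stands in for the components' handleResponses() callbacks. It only mimics the control flow described above: replies are handled one at a time, and as soon as handling one reply queues another request, the inner loop exits and the new request is sent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.BiConsumer;

// Hypothetical toy model of the SearchHandler send/wait loop; not Solr code.
public class OutgoingLoopSketch {

    // Runs the loop and returns a log of "send ..." and "handle ..." events.
    // The handler stands in for the components' handleResponses(): it may
    // append new requests to the outgoing list.
    static List<String> run(List<String> initial,
                            BiConsumer<String, List<String>> handler) {
        List<String> outgoing = new ArrayList<>(initial); // plays the role of rb.outgoing
        Queue<String> pending = new ArrayDeque<>();        // replies not yet received
        List<String> log = new ArrayList<>();

        while (true) {
            // Send everything queued; pretend each request produces one reply.
            for (String req : outgoing) {
                log.add("send " + req);
                pending.add(req);
            }
            outgoing.clear();

            // Wait for replies, but stop waiting as soon as a handler queues more work.
            while (outgoing.isEmpty()) {
                String reply = pending.poll();
                if (reply == null) break;                  // nothing left to wait for
                log.add("handle " + reply);
                handler.accept(reply, outgoing);
            }

            if (outgoing.isEmpty()) return log;            // no new requests: finished
        }
    }

    public static void main(String[] args) {
        // Handling shard1's ids reply triggers a follow-up request for stored fields.
        List<String> log = run(Arrays.asList("shard1:ids", "shard2:ids"),
                (reply, out) -> {
                    if (reply.equals("shard1:ids")) out.add("shard1:fields");
                });
        log.forEach(System.out::println);
    }
}
```

Running main() logs "send shard1:fields" right after "handle shard1:ids" and before "handle shard2:ids": the follow-up goes out once the current reply has been fully processed, not in the middle of processing it.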

> 3. if one adds a request to rb in the handleResponses method, this wouldn't
> necessarily be called, namely in the event that none of the components
> override the distributedProcess method, and the loop only goes through once.
>
> 4. where can I learn more about the shard.purpose variable? Where in the
> component should this be set, if anywhere?

  public final static int PURPOSE_PRIVATE         = 0x01;
  public final static int PURPOSE_GET_TERM_DFS    = 0x02;
  public final static int PURPOSE_GET_TOP_IDS     = 0x04;
  public final static int PURPOSE_REFINE_TOP_IDS  = 0x08;
  public final static int PURPOSE_GET_FACETS      = 0x10;
  public final static int PURPOSE_REFINE_FACETS   = 0x20;
  public final static int PURPOSE_GET_FIELDS      = 0x40;
  public final static int PURPOSE_GET_HIGHLIGHTS  = 0x80;
  public final static int PURPOSE_GET_DEBUG       =0x100;
  public final static int PURPOSE_GET_STATS       =0x200;

  public int purpose;  // the purpose of this request

It's for declaring what a request is for, so other components can
piggyback on that request if they want and avoid sending a separate
request. For example, the highlighting component chooses to request
highlighting only by piggybacking on requests to retrieve stored
fields.

    // Turn on highlighting only when retrieving fields
    if ((sreq.purpose & ShardRequest.PURPOSE_GET_FIELDS) != 0) {
        sreq.purpose |= ShardRequest.PURPOSE_GET_HIGHLIGHTS;
        // should already be true...
        sreq.params.set(HighlightParams.HIGHLIGHT, "true");
    }
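Because each PURPOSE_* constant is a distinct power of two, a single request can carry several purposes at once: "|=" adds a purpose and "&" tests for one, which is what the highlighting check relies on. A minimal sketch of that bitmask idiom follows; PurposeFlagsSketch, has() and describe() are hypothetical helpers, and the four constant values are copied from the ShardRequest list quoted earlier.

```java
// Hypothetical standalone copy of a few ShardRequest constants, to show the bitmask idiom.
public class PurposeFlagsSketch {
    static final int PURPOSE_GET_TOP_IDS    = 0x04;
    static final int PURPOSE_GET_FACETS     = 0x10;
    static final int PURPOSE_GET_FIELDS     = 0x40;
    static final int PURPOSE_GET_HIGHLIGHTS = 0x80;

    // True if the given purpose bit is set in the bitmask.
    static boolean has(int purpose, int flag) {
        return (purpose & flag) != 0;
    }

    // Lists which of the four purposes above are set, in declaration order.
    static String describe(int purpose) {
        StringBuilder sb = new StringBuilder();
        if (has(purpose, PURPOSE_GET_TOP_IDS))    sb.append("GET_TOP_IDS ");
        if (has(purpose, PURPOSE_GET_FACETS))     sb.append("GET_FACETS ");
        if (has(purpose, PURPOSE_GET_FIELDS))     sb.append("GET_FIELDS ");
        if (has(purpose, PURPOSE_GET_HIGHLIGHTS)) sb.append("GET_HIGHLIGHTS ");
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int purpose = PURPOSE_GET_FIELDS;           // a request created to fetch stored fields
        if (has(purpose, PURPOSE_GET_FIELDS)) {     // same test as the highlighting snippet
            purpose |= PURPOSE_GET_HIGHLIGHTS;      // piggyback highlighting on it
        }
        System.out.println(Integer.toHexString(purpose)); // c0
        System.out.println(describe(purpose));            // GET_FIELDS GET_HIGHLIGHTS
    }
}
```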

The facet component will also look for other suitable outgoing
requests to piggyback on and modify; if it can't find any, it
creates a new request.  See FacetComponent.java:134
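The "piggyback or create" pattern can be sketched roughly as below. This is an illustration under assumptions, not FacetComponent's actual code: the Req class is a hypothetical stand-in for ShardRequest, the choice to piggyback on a top-ids request is assumed for the example, and the field name "cat" is made up. The parameter names facet, facet.field and rows are ordinary Solr request parameters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "piggyback or create" pattern; not FacetComponent's code.
public class PiggybackSketch {
    static final int PURPOSE_GET_TOP_IDS = 0x04;   // values as in ShardRequest
    static final int PURPOSE_GET_FACETS  = 0x10;

    // Minimal stand-in for ShardRequest: a purpose bitmask plus request parameters.
    static class Req {
        int purpose;
        final Map<String, String> params = new HashMap<>();
        Req(int purpose) { this.purpose = purpose; }
    }

    // Adds faceting to an outgoing top-ids request if one exists; otherwise
    // queues a separate facet-only request. Returns the request that carries it.
    static Req addFacets(List<Req> outgoing, String field) {
        Req target = null;
        for (Req r : outgoing) {
            if ((r.purpose & PURPOSE_GET_TOP_IDS) != 0) { target = r; break; }
        }
        if (target == null) {
            target = new Req(0);
            target.params.put("rows", "0");         // only the counts are wanted
            outgoing.add(target);
        }
        target.purpose |= PURPOSE_GET_FACETS;
        target.params.put("facet", "true");
        target.params.put("facet.field", field);
        return target;
    }

    public static void main(String[] args) {
        List<Req> outgoing = new ArrayList<>();
        outgoing.add(new Req(PURPOSE_GET_TOP_IDS));
        addFacets(outgoing, "cat");
        System.out.println(outgoing.size());        // 1: facets rode along on the ids request

        List<Req> empty = new ArrayList<>();
        addFacets(empty, "cat");
        System.out.println(empty.size());           // 1: a new facet-only request was queued
    }
}
```

Piggybacking saves a round trip to every shard; the cost is that each component has to inspect, and possibly modify, requests built by other components.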

Some of these are currently unused. PURPOSE_GET_TERM_DFS, for example,
would be for getting the per-shard document frequencies needed to
implement a global idf.
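The arithmetic behind a global idf is simple once the per-shard numbers are collected: sum the document frequencies and the document counts across shards, then apply the idf formula once. The sketch below assumes the classic Lucene DefaultSimilarity formula, idf = 1 + ln(numDocs / (docFreq + 1)); GlobalIdfSketch and its methods are hypothetical names, and the shard statistics in main() are invented for illustration.

```java
// Hypothetical sketch of combining per-shard statistics into a global idf.
public class GlobalIdfSketch {
    // Same shape as Lucene's classic DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Global idf: sum the per-shard doc freqs and doc counts, then apply the same formula.
    static double globalIdf(long[] shardDocFreqs, long[] shardNumDocs) {
        long df = 0, n = 0;
        for (long d : shardDocFreqs) df += d;
        for (long c : shardNumDocs) n += c;
        return idf(df, n);
    }

    public static void main(String[] args) {
        long[] df = {99, 1};      // the term is common on shard 1, rare on shard 2
        long[] n  = {1000, 1000};
        System.out.printf("shard1 local idf = %.3f%n", idf(df[0], n[0]));
        System.out.printf("shard2 local idf = %.3f%n", idf(df[1], n[1]));
        System.out.printf("global idf       = %.3f%n", globalIdf(df, n));
    }
}
```

With these made-up numbers, shard 2 would weight the term at roughly 7.2 using only its local statistics, versus about 4.0 globally, so its matches would be over-scored relative to shard 1's when results are merged.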

-Yonik
http://www.lucidimagination.com


> I've taken a look at the wiki page, but if there is more documentation
> elsewhere please point me towards it.
>
> Thanks in advance,
> Mike
