On 6/10/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
>
> Looking toward the future, and distributed search, this might be a
> natural place to add hooks to implement that distributed logic. This
> would allow other people to efficiently support their custom
> functionality in a distributed environment.
>
> Thoughts?
>
I like it. As things stand, the prospect of adding field collapsing separately
to standard, dismax, *and* MLT is ugly; it shouldn't have to be.
Is this the basic architecture you are suggesting? A single handler
that chooses which components are used in the request pipeline,
something like:
// maybe debug
debug = Debug( req )

// choose one query method
docs = Query( req, debug )
  - standard
  - dismax
  - mlt (as input)
  - ...

// zero or more...
info[] = Info( req, docs, debug )
  + facet
  + mlt (on each result)
  + ...

// zero or more (passed as a chain)
docs = Transform( req, docs, debug )
  + collapse
  + ???

// zero or more
fmt[] = Format( req, docs )
  + highlight

// Build the response
rsp.add( docs );
rsp.add( info );
rsp.add( fmt );
rsp.add( debug );
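To make the outline concrete, here is a minimal Java sketch of a single handler with pluggable stages, assuming docs are just id strings and req/rsp are plain maps. The interface names (QueryStage, InfoStage, TransformStage, FormatStage) and the toy collapse/highlight lambdas are hypothetical, not existing Solr classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a single handler with pluggable pipeline stages.
// Docs are plain id strings; req/rsp are plain maps, not Solr types.
public class PipelineHandler {
    interface QueryStage { List<String> query(Map<String, String> req); }
    interface InfoStage { Object info(Map<String, String> req, List<String> docs); }
    interface TransformStage { List<String> transform(Map<String, String> req, List<String> docs); }
    interface FormatStage { Object format(Map<String, String> req, List<String> docs); }

    private final QueryStage query;                                    // choose exactly one
    private final List<InfoStage> infos = new ArrayList<>();           // zero or more
    private final List<TransformStage> transforms = new ArrayList<>(); // zero or more, chained
    private final List<FormatStage> formats = new ArrayList<>();       // zero or more

    PipelineHandler(QueryStage query) { this.query = query; }

    PipelineHandler addInfo(InfoStage s) { infos.add(s); return this; }
    PipelineHandler addTransform(TransformStage s) { transforms.add(s); return this; }
    PipelineHandler addFormat(FormatStage s) { formats.add(s); return this; }

    Map<String, Object> handle(Map<String, String> req) {
        boolean debug = "true".equals(req.get("debug"));
        List<String> docs = query.query(req);
        int queryCount = docs.size();

        // Info stages all see the same (untransformed) query result.
        List<Object> info = new ArrayList<>();
        for (InfoStage s : infos) info.add(s.info(req, docs));

        // Transform stages form a chain: each one receives the previous output.
        for (TransformStage s : transforms) docs = s.transform(req, docs);

        // Format stages run on the final doc list.
        List<Object> fmt = new ArrayList<>();
        for (FormatStage s : formats) fmt.add(s.format(req, docs));

        Map<String, Object> rsp = new LinkedHashMap<>();
        rsp.put("docs", docs);
        rsp.put("info", info);
        rsp.put("fmt", fmt);
        if (debug) rsp.put("debug", "query=" + queryCount + " docs, after transforms=" + docs.size());
        return rsp;
    }

    public static void main(String[] args) {
        PipelineHandler handler = new PipelineHandler(req -> List.of("a1", "a2", "b1"))
            .addInfo((req, docs) -> "numFound=" + docs.size())
            // Toy stand-in for field collapsing: keep the first doc per leading letter.
            .addTransform((req, docs) -> {
                Map<Character, String> firstPerGroup = new LinkedHashMap<>();
                for (String d : docs) firstPerGroup.putIfAbsent(d.charAt(0), d);
                return new ArrayList<>(firstPerGroup.values());
            })
            // Toy stand-in for highlighting.
            .addFormat((req, docs) -> docs.isEmpty() ? "" : "<em>" + docs.get(0) + "</em>");
        System.out.println(handler.handle(Map.of("q", "x", "debug", "true")));
    }
}
```

Whether info stages (e.g. facets) should see the pre- or post-transform docs is a real design choice; this sketch just follows the order in the outline.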
I'm not sure how well this would work for distributed queries...
doesn't formatting need to happen in the same place as the query?
The most efficient approach to distributed queries seems to be multi-pass
and specific to the task at hand, rather than trying to hide the
fact that the environment is distributed.
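A rough sketch of what multi-pass could look like for a plain top-N query: pass 1 asks each shard only for ids and scores, the candidates are merged globally, and pass 2 fetches stored fields from just the shards that own the winners. Shard, Hit and search() are hypothetical in-memory stand-ins for remote calls (the record needs Java 16+), not any existing API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a two-pass distributed query. Shard is an in-memory
// stand-in for a remote index; none of these names are existing Solr APIs.
public class TwoPassSearch {
    record Hit(String shard, String id, float score) {}

    static final Comparator<Hit> BY_SCORE_DESC =
        Comparator.comparingDouble((Hit h) -> h.score()).reversed();

    static class Shard {
        final String name;
        final Map<String, Float> scores;   // id -> score for the current query
        final Map<String, String> stored;  // id -> a stored field (e.g. title)

        Shard(String name, Map<String, Float> scores, Map<String, String> stored) {
            this.name = name;
            this.scores = scores;
            this.stored = stored;
        }

        // Pass 1: ship only ids and scores of this shard's top n docs.
        List<Hit> topIds(int n) {
            List<Hit> hits = new ArrayList<>();
            for (Map.Entry<String, Float> e : scores.entrySet()) {
                hits.add(new Hit(name, e.getKey(), e.getValue()));
            }
            hits.sort(BY_SCORE_DESC);
            return hits.subList(0, Math.min(n, hits.size()));
        }

        // Pass 2: return stored fields only for the ids that won the global merge.
        Map<String, String> fetch(List<String> ids) {
            Map<String, String> out = new HashMap<>();
            for (String id : ids) out.put(id, stored.get(id));
            return out;
        }
    }

    // Assumes doc ids are unique across shards.
    static List<String> search(List<Shard> shards, int n) {
        // Pass 1: collect each shard's candidates and merge them by score.
        List<Hit> merged = new ArrayList<>();
        for (Shard s : shards) merged.addAll(s.topIds(n));
        merged.sort(BY_SCORE_DESC);
        List<Hit> top = merged.subList(0, Math.min(n, merged.size()));

        // Pass 2: group the global winners by shard and fetch their fields.
        Map<String, List<String>> idsByShard = new HashMap<>();
        for (Hit h : top) idsByShard.computeIfAbsent(h.shard(), k -> new ArrayList<>()).add(h.id());
        Map<String, String> fields = new HashMap<>();
        for (Shard s : shards) {
            List<String> ids = idsByShard.get(s.name);
            if (ids != null) fields.putAll(s.fetch(ids));
        }

        // Assemble "id:field" strings in global score order.
        List<String> results = new ArrayList<>();
        for (Hit h : top) results.add(h.id() + ":" + fields.get(h.id()));
        return results;
    }

    public static void main(String[] args) {
        Shard s1 = new Shard("s1", Map.of("d1", 0.9f, "d2", 0.3f), Map.of("d1", "alpha", "d2", "beta"));
        Shard s2 = new Shard("s2", Map.of("d3", 0.7f, "d4", 0.1f), Map.of("d3", "gamma", "d4", "delta"));
        System.out.println(search(List.of(s1, s2), 2)); // prints [d1:alpha, d3:gamma]
    }
}
```

One consequence of splitting it this way is that the expensive per-document work (stored fields, and presumably highlighting) only touches the N global winners, on the shard that owns each one.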
Are there other general categories I'm missing?
That's the right type of question... It will result in a better
architecture if we brainstorm now.
- add or replace an existing component
- add some kind of security filters
- replace or augment how a query is created
- manipulate which stored fields are returned, or fetch document fields
from a remote data source?
- multiple queries?
My first thought was not something "typed" (formatters, transformers,
etc.), but something more generic...
for (Component component : components) {
  // a chance to communicate needs to other components?
  // highlighting needs a Query or terms, and a list of fields
  // faceting needs a base DocSet
  component.prepare(req, rsp);
}
boolean allDone = false;
while (!allDone) {
  allDone = true;
  for (Component component : components) {
    // every component must report done before the loop exits
    allDone &= component.process(req, rsp);
  }
}
One interesting (and difficult) part would be standardizing the
communication between components.
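Here is one possible way to flesh out the generic loop and the communication question, assuming components share a plain context map: prepare() lets a component announce what it needs, and process() returns false until its inputs show up. The Component interface, the key names ("needs.docSet", "docSet", "facetCount"), and the stall guard are all hypothetical choices, not an existing API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the generic component loop. Components communicate
// only through a shared context map; the key names are made up for illustration.
public class ComponentLoop {
    interface Component {
        // A chance to announce needs to other components before any work happens.
        void prepare(Map<String, String> req, Map<String, Object> ctx);
        // Returns true once this component has finished its work.
        boolean process(Map<String, String> req, Map<String, Object> ctx);
    }

    static void run(List<Component> components, Map<String, String> req, Map<String, Object> ctx) {
        for (Component c : components) c.prepare(req, ctx);
        Set<Component> done = new HashSet<>();
        while (done.size() < components.size()) {
            boolean progress = false;
            for (Component c : components) {
                if (done.contains(c)) continue;  // skip components that already finished
                if (c.process(req, ctx)) {
                    done.add(c);
                    progress = true;
                }
            }
            // If no component finished in a full pass, the rest are waiting on
            // something nobody will supply; fail instead of looping forever.
            if (!progress) {
                throw new IllegalStateException((components.size() - done.size()) + " component(s) stalled");
            }
        }
    }

    // Stand-in for a query component: produces the result list, and the base
    // doc set only if some other component asked for it during prepare().
    static class QueryComponent implements Component {
        public void prepare(Map<String, String> req, Map<String, Object> ctx) {}
        public boolean process(Map<String, String> req, Map<String, Object> ctx) {
            List<String> docs = List.of("d1", "d2", "d3");
            ctx.put("docs", docs);
            if (Boolean.TRUE.equals(ctx.get("needs.docSet"))) ctx.put("docSet", new HashSet<>(docs));
            return true;
        }
    }

    // Stand-in for faceting: declares that it needs a base doc set, then waits for it.
    static class FacetComponent implements Component {
        public void prepare(Map<String, String> req, Map<String, Object> ctx) {
            ctx.put("needs.docSet", true);
        }
        public boolean process(Map<String, String> req, Map<String, Object> ctx) {
            Object docSet = ctx.get("docSet");
            if (docSet == null) return false;  // not available yet; try again next pass
            ctx.put("facetCount", ((Set<?>) docSet).size());
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> ctx = new HashMap<>();
        // Facet is listed first on purpose, so it needs a second pass.
        run(List.of(new FacetComponent(), new QueryComponent()), Map.of(), ctx);
        System.out.println(ctx.get("facetCount")); // prints 3
    }
}
```

A string-keyed map is the loosest possible contract between components; typed keys would catch mismatches earlier at the cost of more coupling, which is exactly the standardization problem.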
This is all still very early and at a high level, so I'm not saying
I favor this approach over any other... It's just the first one that
occurred to me.
-Yonik