Yonik Seeley wrote:
Thanks Andrew!

I still think that being able to configure how a field is highlighted
on a per-query basis is almost as basic as being able to ask for which
fields to highlight or which fields to return.  Regardless, per-query
overriding can be added later, but I think it should be kept in mind
at least while designing the configuration.


I'm not sure I agree, as I think that it's unlikely to have many different variations on how the highlighting should be configured, and multiple handlers can be configured to cope with a small number of variations. But that's getting into "matters of opinion", and I'm the Solr newbie...

One question that would need addressing if these configuration options were supported via the request is whether top-level settings on the request override per-field settings.

The other big question is the exact syntax, and without a clear idea for that I didn't want to tackle it.

A couple of points about the config interface:
I think the proliferation of parameter names may be confusing either
in solrconfig.xml or in query args.

I think we should be using a convention like namespaces, much like
java property files do (think what a property file would be like
without that).

Parameter names like "formatter" or "fields" are pretty confusing if
you don't know the context is highlighting.... people could easily
think formatter specifies the output format (XML, JSON, etc), and
could very easily think that "fields" were the stored fields to
return.


Point taken - although I couldn't think of a better name for "fields" apart from something verbose like "perFieldConfiguration". Of course it could just be "f" as kind of suggested below.

Also, parameters like formatterPre, formatterPost, formatterMinFgCl,
etc aren't global... they only apply to specific highlighter
formatters, and you have to understand a lot about those formatters to
understand which apply.  Some hierarchy could be added to reflect that
also.

If we put more of this info into our parameter names, it would ease
the burden of understanding for new users (and experienced ones too
perhaps)

So query args could be:

hl.formatter=simple
hl.fragsize=100
hl.simple.pre=<em>
hl.simple.post=</em>
hl.color.minBg=#FFFF99
hl.color.maxBg=#FF9900

And solrconfig.xml config could be the simple form

<str name="hl.formatter">simple</str>
<int name="hl.fragsize">100</int>

OR, maybe it could be more structured and put the hierarchy in the XML:
<lst name="hl">
 <str name="formatter">simple</str>
 <int name="fragsize">100</int>
 <lst name="simple">
   <str name="pre">&lt;em></str>
   <str name="post">&lt;/em></str>
 </lst>
</lst>


I think that starts to get a bit confusing if you're already in a nested list of lists for a per-field configuration.

In either XML config schema, it makes sense to try and keep it easy to
figure out how to override it via a query param (simple name mapping).

Of course I've only addressed global defaults, not per-field defaults,
but the same style could be used.

After going through this exercise and thinking it through, I think I'm
coming back to the same preference that Mike Klaas had from the list
of examples: field properties.  That way, if any other parameters need
per-field config, someone doesn't end up re-inventing the wheel all
over again in a non-consistent manner.  It's also relatively easy to
explain to someone.

http://www.nabble.com/Support-for-custom-Fragmenter-when-Highlighting-tf1962395.html#a5386994
: : #model things as properties on fields (with f. being the field namespace)
: :
: : f.foo.fragsize=0
: : f.bar.fragsize=1000
: : f.*.fragsize=100   #the default
:
: I like this option the best, though the wildcard specification might
: get out of hand.
:
: There could be a top-level namespace:
: hl.fragsize = 100 #default
: And field-level overrides precisely matching the top-level general params:
: f.foo.hl.fragsize = 0

Plugins could then do something like: getFieldProperty("title","hl.formatter")
and get the built-in standard mechanism for checking a hierarchy of
places this property could be defined (handler defaults, handler field
specific config, query defaults, query field specific config).
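
A minimal sketch of such a lookup, assuming plain string maps stand in for the handler configuration and the request parameters, and assuming the "f.<field>." prefix convention quoted above. The class name FieldPropertyLookup and the precedence order (query field-specific, then query default, then handler field-specific, then handler default) are illustrative assumptions, not existing Solr code:

```java
import java.util.Map;

/** Hypothetical helper: resolves a property through a hierarchy of config levels. */
public class FieldPropertyLookup {
    private final Map<String, String> handlerParams; // e.g. defaults from solrconfig.xml
    private final Map<String, String> queryParams;   // e.g. parameters on the request

    public FieldPropertyLookup(Map<String, String> handlerParams,
                               Map<String, String> queryParams) {
        this.handlerParams = handlerParams;
        this.queryParams = queryParams;
    }

    /**
     * Returns the most specific value found, or null if the property is not
     * defined at any level. The precedence order is an assumption.
     */
    public String getFieldProperty(String field, String name) {
        String fieldKey = "f." + field + "." + name;
        String[] candidates = {
            queryParams.get(fieldKey),   // query, field-specific
            queryParams.get(name),       // query, default
            handlerParams.get(fieldKey), // handler, field-specific
            handlerParams.get(name)      // handler, default
        };
        for (String value : candidates) {
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

With this, getFieldProperty("title", "hl.fragsize") would pick up "f.title.hl.fragsize" when it is set and otherwise fall back to the plain "hl.fragsize".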


It would be good to have a generic way for checking values set at field/handler/request/default levels, but what stopped me from doing this was the way CommonParams parses the NamedList it's given and sets a number of variables for the values it understands.

It would be a lot easier if CommonParams just kept the NamedList and we pulled values from that by name - but that was more radical a change than I wanted to make.

I'll be honest - I only have so much patience for working on configuration issues (I can never find a good solution for this kind of thing), and I'm reluctant to spend any more time on this unless I feel there is a consensus amongst the people on this mailing list, so that I can come up with a patch that is likely to be accepted.

I guess what it comes down to is that we need a clear specification of whether there are going to be namespaces or something similar in query parameters, and also how per-field query parameters should look. And then how this maps to configuration in solrconfig.xml and how that is stored and accessed in the code.

We could take something like "." in a query parameter name to be equivalent to nested lists in solrconfig.xml, so "foo.bar.fubar=..." on the query string is comparable to
<lst name="foo">
  <lst name="bar">
     <str name="fubar">...</str>
  </lst>
</lst>

But if you also use "." as a way of introducing grouping between properties, then you could get a lot of nesting - e.g. specifying the pre tag for a simple formatter for the title field might be "f.title.hl.simple.pre=<em>" which is a bit more nesting than I'd be comfortable with (or am prepared to type out!).

If instead we separated on ":" and used "." just as part of a naming convention, then "f:title:hl.simple.pre=<em>" would be equivalent to:

<lst name="f">
  <lst name="title">
     <str name="hl.simple.pre">&lt;em></str>
  </lst>
</lst>

If something is specified globally, there are no wildcards, just a top-level
property.
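
To make the ":" proposal concrete, here is a rough sketch of how a parameter name could be split into nested lists. It uses plain nested maps rather than Solr's NamedList, and the class name ColonParamParser is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser: ":" separates nesting levels, "." is just part of a name. */
public class ColonParamParser {

    /**
     * Stores value under the path described by key, creating intermediate
     * maps as needed. A non-map value already sitting on the path is replaced.
     */
    @SuppressWarnings("unchecked")
    public static void put(Map<String, Object> root, String key, String value) {
        String[] path = key.split(":");
        Map<String, Object> node = root;
        for (int i = 0; i < path.length - 1; i++) {
            Object child = node.get(path[i]);
            if (!(child instanceof Map)) {
                child = new LinkedHashMap<String, Object>();
                node.put(path[i], child);
            }
            node = (Map<String, Object>) child;
        }
        // The last segment keeps its dots, e.g. "hl.simple.pre".
        node.put(path[path.length - 1], value);
    }
}
```

Calling put(root, "f:title:hl.simple.pre", "<em>") yields the same shape as the XML above, while put(root, "hl.fragsize", "100") stays a single top-level entry.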

How does that sound?

That leaves the issue of what to do with CommonParams. If we don't test the types in solrconfig.xml when Solr starts then it introduces the possibility of type errors occurring at runtime. Perhaps we could keep everything in the named list, but define the known typed properties elsewhere (e.g. an Enum) and validate when the handler is initialised?
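
Roughly what I have in mind for the Enum idea, as a sketch only. The class HighlightParamValidator, the two enums, and the use of a string map in place of the named list are all made up for illustration; only the parameter names come from this thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical init-time check of known typed properties. */
public class HighlightParamValidator {

    enum Type { STRING, INT }

    /** The known properties and their expected types (illustrative subset). */
    enum KnownParam {
        FORMATTER("hl.formatter", Type.STRING),
        FRAGSIZE("hl.fragsize", Type.INT);

        final String key;
        final Type type;

        KnownParam(String key, Type type) {
            this.key = key;
            this.type = type;
        }
    }

    /** Returns one message per badly typed value; an empty list means the config is fine. */
    public static List<String> validate(Map<String, String> config) {
        List<String> errors = new ArrayList<String>();
        for (KnownParam param : KnownParam.values()) {
            String raw = config.get(param.key);
            if (raw == null) {
                continue; // unset properties are not an error here
            }
            if (param.type == Type.INT) {
                try {
                    Integer.parseInt(raw.trim());
                } catch (NumberFormatException e) {
                    errors.add(param.key + " must be an integer but was '" + raw + "'");
                }
            }
        }
        return errors;
    }
}
```

The handler could call validate once when it is initialised and refuse to start if the list is non-empty, so a bad value fails early rather than on the first request.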

Sigh. I just don't know how to write short technical emails.

-Andrew
