I'm moving a content management system's Lucene-based search code over to Solr
to take advantage of what Solr adds, but along the way I'm running into some
problems which I imagine affect everyone doing the same kind of thing.
 
I realise that Solr began life as something which was primarily designed to
be used over HTTP or with non-Java clients.  That explains the use of String
name-value maps and other String representations in the outer API.  However,
constructs like NamedList are used right down in the core code - e.g. in clever
classes like SimpleFacets - which means that you might as well not be using
an object-oriented language at all.  Instead of being able to use an IDE to
tell at a glance what's inside facetInfo or highlightingInfo, for example,
you have to resort to reading the Wiki or to searching code for all
instances of "rsp.add(...)".  Having written software like this in the past,
in which object structures are in developers' heads rather than in the code,
I bet it's made things more difficult along the way.
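
To make the contrast concrete, here is a minimal sketch of the two styles side by side. It deliberately does not use Solr's own classes: a nested Map stands in for a NamedList-style structure, and FacetField and FacetCount are hypothetical typed equivalents invented for illustration. The key names and counts are made-up sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UntypedVsTyped {

    /** Untyped style: nested name-value maps whose shape lives only in docs and heads. */
    static Map<String, Object> buildUntyped() {
        Map<String, Object> categoryCounts = new LinkedHashMap<>();
        categoryCounts.put("news", 12);   // sample data, not real output
        categoryCounts.put("blog", 7);

        Map<String, Object> facetFields = new LinkedHashMap<>();
        facetFields.put("category", categoryCounts);

        Map<String, Object> facetInfo = new LinkedHashMap<>();
        facetInfo.put("facet_fields", facetFields);
        return facetInfo;
    }

    /** Reading it back needs string keys and unchecked casts; a typo fails only at run time. */
    @SuppressWarnings("unchecked")
    static int untypedCount(Map<String, Object> facetInfo, String field, String value) {
        Map<String, Object> fields = (Map<String, Object>) facetInfo.get("facet_fields");
        Map<String, Object> counts = (Map<String, Object>) fields.get(field);
        return (Integer) counts.get(value);
    }

    /** Hypothetical typed model: one facet value and its count. */
    static final class FacetCount {
        private final String value;
        private final int count;

        FacetCount(String value, int count) {
            this.value = value;
            this.count = count;
        }

        String getValue() { return value; }
        int getCount() { return count; }
    }

    /** Hypothetical typed model: a faceted field and its counts. */
    static final class FacetField {
        private final String name;
        private final List<FacetCount> counts;

        FacetField(String name, List<FacetCount> counts) {
            this.name = name;
            this.counts = Collections.unmodifiableList(new ArrayList<>(counts));
        }

        String getName() { return name; }
        List<FacetCount> getCounts() { return counts; }
    }

    /** Typed lookup: the compiler and the IDE both know what is inside a FacetField. */
    static int typedCount(FacetField field, String value) {
        for (FacetCount c : field.getCounts()) {
            if (c.getValue().equals(value)) {
                return c.getCount();
            }
        }
        return 0; // value not present in this facet
    }

    public static void main(String[] args) {
        System.out.println(untypedCount(buildUntyped(), "category", "news"));

        FacetField category = new FacetField("category",
                Arrays.asList(new FacetCount("news", 12), new FacetCount("blog", 7)));
        System.out.println(typedCount(category, "news"));
    }
}
```

Both lookups return the same number, but only the second one can be discovered through autocompletion or checked by the compiler.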
 
I get the impression that Solr is ready for a bit of refactoring to give it
a more Java-friendly API.  This API should be the primary means of access
into Solr functionality; it should explicitly model searches (i.e. filters
plus queries plus sorts plus facet and highlighting cues), search results,
hits (SolrHit which has a SolrDocument plus scoring info, by way of analogy
with Lucene Hit) and hit documents (i.e. SolrDocument, so that's already
fine).  This API should be used _by_ the String-oriented request handlers,
not the other way round; request handlers (and all uses of NamedList) should
be reserved for implementations of that API which deal with non-Java-native
clients.  At the moment, the non-Java use cases are calling the shots in the
Java implementation, and that seems a pity.
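
As a rough sketch of the layering I have in mind, something like the following. Every type here (SearchSpec, Hit, SearchResults, SearchService, StringParamAdapter) is hypothetical and none of it is existing Solr API; the document is reduced to a plain field map, and the "*:*" default is just an assumption for the example. The point is only the direction of the dependency: the String-oriented adapter parses q, fq and sort and then calls the typed API, rather than the typed code being built on top of name-value lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TypedSearchSketch {

    /** Hypothetical explicit model of a search: query plus filters plus sort. */
    static final class SearchSpec {
        final String query;
        final List<String> filters;
        final String sort; // null means "by relevance"

        SearchSpec(String query, List<String> filters, String sort) {
            this.query = query;
            this.filters = Collections.unmodifiableList(new ArrayList<>(filters));
            this.sort = sort;
        }
    }

    /** Hypothetical hit: a document (here just a field map) plus scoring info. */
    static final class Hit {
        final Map<String, Object> document;
        final float score;

        Hit(Map<String, Object> document, float score) {
            this.document = document;
            this.score = score;
        }
    }

    /** Hypothetical result set. */
    static final class SearchResults {
        final long totalHits;
        final List<Hit> hits;

        SearchResults(long totalHits, List<Hit> hits) {
            this.totalHits = totalHits;
            this.hits = Collections.unmodifiableList(new ArrayList<>(hits));
        }
    }

    /** The primary, Java-native entry point that embedded callers would use directly. */
    interface SearchService {
        SearchResults search(SearchSpec spec);
    }

    /** String-oriented layer for non-Java clients: parse parameters, then delegate. */
    static final class StringParamAdapter {
        private final SearchService service;

        StringParamAdapter(SearchService service) {
            this.service = service;
        }

        SearchResults handle(Map<String, List<String>> params) {
            List<String> q = params.getOrDefault("q", Collections.<String>emptyList());
            List<String> fq = params.getOrDefault("fq", Collections.<String>emptyList());
            List<String> sort = params.getOrDefault("sort", Collections.<String>emptyList());
            SearchSpec spec = new SearchSpec(
                    q.isEmpty() ? "*:*" : q.get(0),   // assumed default: match everything
                    fq,
                    sort.isEmpty() ? null : sort.get(0));
            return service.search(spec);
        }
    }

    public static void main(String[] args) {
        // Stub implementation standing in for the real search engine.
        SearchService stub = spec -> {
            Map<String, Object> doc = new HashMap<>();
            doc.put("title", "Matched " + spec.query);
            return new SearchResults(1, Collections.singletonList(new Hit(doc, 1.0f)));
        };

        Map<String, List<String>> params = new HashMap<>();
        params.put("q", Collections.singletonList("lucene"));
        params.put("fq", Collections.singletonList("type:article"));

        SearchResults results = new StringParamAdapter(stub).handle(params);
        for (Hit hit : results.hits) {
            System.out.println(hit.score + " " + hit.document.get("title"));
        }
    }
}
```

A Java caller would skip the adapter entirely and pass a SearchSpec straight to the SearchService, which is the sense in which no separate Java "client" would be needed.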
 
Some of these considerations are clearly driving the implementation of
org.apache.solr.client.solrj, which is an important development - I bet
that's where most people start with Solr now.  But I think two things need
to happen here: (i) the work here should be moved into org.apache.solr,
because with the right API at the server end you don't _need_ any code for a
Java client - it would just call into the API, and would be a 'client' only
in the sense that any caller of a method is that method's client. And (ii)
the API which is currently in org.apache.solr.client.solrj should be using
the kinds of classes I listed above, with UpdateResponse etc. containing
fields and getters which model what's actually returned (and do so without
recourse to NamedList).

I realise that some of this is already happening, but I think with 1.3 still
in its early stages, now might be a good time to go the whole way.  With a
more heavily modelled and self-documenting API in place, people would find
it a lot easier to develop Solr integrations, and I expect it would speed up
the process of developing new core Solr functionality.
 
Any thoughts?
 
Jon
