On 8/9/07, Jonathan Woods <[EMAIL PROTECTED]> wrote:
> I'm moving a content management system's Lucene library over to Solr to reap
> the benefits, but along the way I'm meeting some problems which I imagine
> affect everyone doing the same kind of thing.
>
> I realise that Solr began life as something which was primarily designed to
> be used over HTTP or with non-Java clients.
I think that's still the primary use: a search server, like a database,
that can be accessed from many clients in differing languages. I think of
HTTP as the primary interface, and plugins or embedding as a last resort
(again, similar to most databases).

> That explains the use of String
> name-value maps and other String representations in the outer API. However,
> constructs like NamedList are used right down into the code - e.g. in clever
> classes like SimpleFacets - which means that you might as well not be using
> an object oriented language at all. Instead of being able to use an IDE to
> tell at a glance what's inside facetInfo or highlightingInfo, for example,
> you have to resort to reading the Wiki or to searching code for all
> instances of "rsp.add(...)".

Most users would need to make sense of the structure of the serialized
response (XML or JSON), and I think it's relatively self-documenting on
that point, but some clients do construct objects. But I understand where
you are coming from as an integrator/embedder rather than a user.

Much of Solr's internals were written as the fastest way to get from point
A to B, and were not meant as a Java interface. The end goal was to get the
bits on the wire quickly. It was also meant to enable query plugins to add
info to a response, or create their own response info, w/o having to worry
about the details of serialization into XML/JSON, etc.

Say we were to come up with a FacetResult class... it would be undesirable
to have to add support in the response writers for a set of specific
classes that will keep growing over time. So I guess a FacetResult class
would have to tell the writers how to access that info, or export some sort
of generic interface that served the same purpose as the NamedList does now.

> Having written software like this in the past,
> in which object structures are in developers' heads rather than in the code,
> I bet it's made things more difficult along the way.
>
> I get the impression that Solr is ready for a bit of refactoring to give it
> a more Java-friendly API. This API should be the primary means of access
> into Solr functionality;

You mean primary for embedded use or for some kind of integration, right?

> it should explicitly model searches (i.e. filters
> plus queries plus sorts plus facet and highlighting cues), search results,
> hits (SolrHit which has a SolrDocument plus scoring info, by way of analogy
> with Lucene Hit) and hit documents (i.e. SolrDocument, so that's already
> fine). This API should be used _by_ the String-oriented request handlers,
> not the other way round; request handlers (and all uses of NamedList) should
> be reserved for implementations of that API which deal with non-Java-native
> clients. At the moment, the non-Java use cases are calling the shots in the
> Java implementation, and that seems a pity.
>
> Some of these considerations are clearly driving the implementation of
> org.apache.solr.client.solrj, which is an important development - I bet
> that's where most people start with Solr now.
> But I think two things need
> to happen here: (i) the work here should be moved into org.apache.solr,
> because with the right API at the server end you don't _need_ any code for a
> Java client - it would just call into the API,

The Java client was meant primarily for remote querying of Solr... but it
can be transparently used locally.

> and would be a 'client' only
> in the sense that any caller of a method is that method's client. And (ii)
> the API which is currently in org.apache.solr.client.solrj should be using
> the kinds of classes I listed above, with UpdateResponse etc containing
> fields and getters which model what's actually returned (and do so without
> recourse to NamedList).
>
> I realise that some of this is already happening, but I think with 1.3 still
> in its early stages now might be a good time to go the whole way.
> With a more heavily modelled and self-documenting API in place, people
> would find it a lot easier to develop Solr integrations, and I expect it
> would speed up the process of developing new core Solr functionality.
>
> Any thoughts?

I'm certainly not against moving toward nicer internal Java APIs (though
back compatibility is an issue here), but I think the external APIs are
much more important.

-Yonik
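
P.S. To make the FacetResult idea above a bit more concrete, here is a rough
sketch of what "export some sort of generic interface" might look like. None
of these names exist in Solr: GenericResponse, FacetResult and GenericWriter
are all hypothetical, and a plain LinkedHashMap stands in for NamedList. The
point is only that a writer can stay ignorant of specific result classes as
long as each one can describe itself as ordered name/value pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical: anything that can expose itself as ordered name/value pairs. */
interface GenericResponse {
    Map<String, Object> toGeneric();
}

/** Hypothetical typed facet result (an assumption for illustration, not a Solr class). */
final class FacetResult implements GenericResponse {
    private final String field;
    private final Map<String, Integer> counts = new LinkedHashMap<String, Integer>();

    FacetResult(String field) {
        this.field = field;
    }

    void addCount(String value, int count) {
        counts.put(value, count);
    }

    // Typed accessors: this is what an IDE can show at a glance.
    String getField() {
        return field;
    }

    Map<String, Integer> getCounts() {
        return counts;
    }

    // Generic view: the only thing a writer needs to know about.
    @Override
    public Map<String, Object> toGeneric() {
        Map<String, Object> out = new LinkedHashMap<String, Object>();
        out.put("field", field);
        out.put("counts", new LinkedHashMap<String, Integer>(counts));
        return out;
    }
}

/** Stand-in for a response writer: it understands maps and scalars only. */
final class GenericWriter {
    static String write(Object value) {
        if (value instanceof GenericResponse) {
            // Ask the typed object for its generic view; no per-class code here.
            value = ((GenericResponse) value).toGeneric();
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                sb.append(e.getKey()).append('=').append(write(e.getValue()));
                first = false;
            }
            return sb.append('}').toString();
        }
        return String.valueOf(value);
    }
}
```

Java callers would use getField() and getCounts() directly, while the writer
only ever calls toGeneric(), so adding another result class would not require
touching the writer. The toy output format here is made up; it is not Solr's
XML or JSON.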