[ https://issues.apache.org/jira/browse/SOLR-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525231 ]
Hoss Man commented on SOLR-344:
-------------------------------

I've only had a chance to skim the attached PDF ... I've printed it out in the hopes that I'll find some time to read in depth your specific ideas about what the ideal Solr API should be; but there are a few things that jumped out at me that I wanted to address while they were on my mind...

-- Motivation --

- Direct Java is "better" -

A key assumption in this proposal seems to be that "if you are writing a Java app, and you want to use Solr, you should not use the HTTP interface"

I would argue strongly against this assumption. there are *lots* of reasons why it makes sense to treat Solr as a webservice and interact with it over HTTP instead of having a tight coupling with your Java application: redundancy, load balancing, ...

Even if someone had a situation where they only had one machine in their entire operation, and all of their applications ran on that machine, i would still suggest installing a servlet container and using Solr that way because it's likely they will have more than one application that will want to deal with their index. Solr can make a lot of good optimizations and assumptions that go right out the window if you try to embed Solr in 2 different apps reading and writing to the same physical index directory.

Even if compelling stats can be presented that the HTTP+XML/JSON overhead is in fact a bottleneck, i would still think that pursuing something like an RMI based client/server API in addition to the HTTP API would make more sense than encouraging people to use Solr directly in the JVM of their other applications.

Even the Plugin model (for embedding your custom Java code into Solr) is something i only recommend in situations where it makes a lot of sense for that logic to be tied closely with the Solr or Lucene internals (ie: as part of the TokenStream, or dealing with the DocSets before they are cached, etc...)

The #1 "Value Add" that Solr has over Lucene is the Client/Server abstraction ... there are certainly other value adds -- some small (like added TokenFilters) and some big (like the IndexSchema concept) -- and many of these could probably be refactored into the Lucene core (or a Lucene contrib) so they could be reused by other Lucene applications in addition to Solr ... but Solr *is* an application. Arguing that you shouldn't bother using a client/server relationship to deal with Solr if your application is written in Java is like arguing that you shouldn't bother using a client/server relationship to deal with MySQL if your application is written in C.

- Demand for direct access -

the statement "a significant proportion of questions on the mailing lists are clearly from people who are attempting such integrations right now." does not serve as a clear call to action ... even if a significant number of recent questions have related to embedded Solr (and I'm not convinced the number is that significant), that one data point alone does not clearly indicate that it is important/urgent to make this easier to do. It just indicates that the people who are attempting to do this have questions about how to do it ... which isn't that surprising considering it's a relatively new concept that hasn't really been documented. Some of these people may just be assuming that they *need* to embed Solr in their existing Java applications because they don't realize it's intended to be used as a server.

The [EMAIL PROTECTED] list gets lots of questions from people who misunderstand the demo code in the Lucene distribution and think Lucene is an application that they can run on the command line to index files and search them -- that doesn't mean that the Lucene-Java project should revamp itself to focus on producing an application instead of a Library, it means the Lucene-Java community has to help educate users about: A) how they can use the Lucene library to build their own apps; and B) what apps are built on top of the Lucene library that might be useful to them.

I think it would probably be more beneficial for the community as a whole if people spent more time/energy documenting the benefits/mechanisms of using Solr as a server, or improving the client APIs to make communicating with a Solr server faster/easier, than it would be to dedicate a lot of resources solely towards making Solr more of a library and less of an application.

-- Strategy for making changes --

All that said -- i agree with you that a lot of improvements can and should be made to the internal APIs. Not because i think we need to make it easier to embed Solr, but to make it easier for new developers to work on the Solr internals (or to write plugins). if embedding Solr gets easier as a result -- great, but I don't see that as a compelling reason for change.

Somewhere in your doc, you advocated the importance of a top down complete API overhaul instead of approaching things piecemeal (forgive me for not remembering exactly how you put it, I'm not trying to put words in your mouth, i just remember there being a sentiment like this) ... while i think it would definitely make sense to have some discussions on solr-dev about what the big problems are with the internal APIs and come up with a high level picture of what the ideal API might be so we can aim for it, the best way to get there is with small patches that each focus on a single area.

I say this from experience as someone who has submitted patches to projects, and as a committer who has to review patches: Big patches that change a lot of things take a lot more work/discussion/thought to review and generally spend a lot longer sitting in Jira than shorter, more focused patches (some day I'll sit down and do the math and write out "Hoss's Patch Size Theorem" but for now take my word for it that there's an exponential factor in there somewhere).

The best way to proceed is probably to start by tackling individual pieces of functionality, adding the API you think there should be, and refactoring the current code to implement/use that API (leaving the old one around as deprecated).

-- Loose APIs vs tight APIs --

While i agree there are a lot of places where things like NamedList are overused, don't discount the value add that this kind of "pass through" API allows ... the decision to use things like the SolrParams class in some utility classes was made consciously in a lot of cases, in order to make it easier for these utilities to grow and evolve without their callers needing to be aware of these new changes ... SimpleFacets for example takes in a generic SolrParams and returns a NamedList so that as new functionality is added, and new params are added to control that functionality, existing request handlers don't have to be specifically aware of all those param names in order to get that functionality. They can be if they want: they can construct a SolrParams instance just for driving SimpleFacets behavior instead of passing through the main request params, it's their choice ...
but a very specific API, where every query param was mapped to a constructor arg or a setter method or a command pattern object or something else that had a tighter coupling, would require changes in RequestHandlers anytime something like Date faceting was added (or even facet.mincount)

if i remember correctly, you pointed out in the mailing list that things like SimpleFacets or the Highlighting utils shouldn't return NamedLists -- they should return more specific FacetResults/HighlightResults objects ... i would definitely be on board with patches like that. Refactoring the code to use a well typed response object certainly would make the code easier to understand, and new getters can always be added for pulling out new types of information as it is added -- the important thing is that Result objects like this would need to be able to translate themselves back into simple objects that can be understood by ResponseWriters so that the various RequestHandlers/ResponseWriters don't *need* to be aware of their details.

> New Java API
> ------------
>
> Key: SOLR-344
> URL: https://issues.apache.org/jira/browse/SOLR-344
> Project: Solr
> Issue Type: Improvement
> Components: clients - java, search, update
> Affects Versions: 1.3
> Reporter: Jonathan Woods
> Attachments: New Java API for Solr.pdf
>
>
> The core Solr codebase urgently needs to expose a new Java API designed for
> use by Java running in Solr's JVM and ultimately by core Solr code itself.
> This API must be (i) object-oriented ('typesafe'), (ii) self-documenting,
> (iii) at the right level of granularity, (iv) designed specifically to expose
> the value which Solr adds over and above Lucene.
> This is an urgent issue for two reasons:
> - Java-Solr integrations represent a use-case which is nearly as important as
> the core Solr use-case in which non-Java clients interact with Solr over HTTP
> - a significant proportion of questions on the mailing lists are clearly from
> people who are attempting such integrations right now.
> This point in Solr development - some way out from the 1.3 release - might be
> the right time to do the development and refactoring necessary to produce
> this API. We can do this without breaking any backward compatibility from
> the point of view of XML/HTTP and JSON-like clients, and without altering the
> core Solr algorithms which make it so efficient. If we do this work now, we
> can significantly speed up the spread of Solr.
> Eventually, this API should be part of core Solr code, not hived off into
> some separate project nor in a non-first-class package space. It should be
> capable of forming the foundation of any new Solr development which doesn't
> need to delve into low level constructs like DocSet and so on - and any new
> development which does need to do just that should be a candidate for
> incorporation into the API at the some level. Whether or not it will ever be
> worth re-writing existing code is a matter of opinion; but the Java API
> should be such that if it had existed before core plug-ins were written, it
> would have been natural to use it when writing them.
> I've attached a PDF which makes the case for this API. Apologies for
> delivering it as an attachment, but I wanted to embed pics and a bit of
> formatting.
> I'll update this issue in the next few days to give a prototype of this API
> to suggest what it might look like at present. This will build on the work
> already done in Solrj and SearchComponents
> (https://issues.apache.org/jira/browse/SOLR-281), and will be a patch on an
> up-to-date revision of Solr trunk.
> [PS:
> 1.
> Having written most of this, I then properly looked at
> SearchComponents/SOLR-281 and read
> http://www.nabble.com/forum/ViewPost.jtp?post=11050274&framed=y, which says
> much the same thing albeit more quickly! And weeks ago, too. But this
> proposal is angled slightly differently:
> - it focusses on the value of creating an API not only for internal Solr
> consumption, but for local Java clients
> - it focusses on designing a Java API without constantly being hobbled by
> HTTP-Java
> - it's suggesting that the SearchComponents work should result in a Java API
> which can be used as much by third party Java as by ResponseBuilder.
> 2. I've made some attempt to address Hoss's point
> (http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#6551097579454875774)
> - that an API like this would need to maintain enough state e.g. to allow an
> initial search to later be faceted, highlighted etc without going back to the
> start each time - but clearly the proof of the pudding will be in the
> prototype.
> 3. Again, I've just discovered SOLR-212 (DirectSolrConnection). I think all
> my comments about Solrj apply to this, useful though it clearly is.]
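
To make the "Loose APIs vs tight APIs" point in the comment concrete, here is a minimal Java sketch, not actual Solr code. It assumes plain java.util maps as stand-ins for SolrParams and NamedList, and the class names FacetResultsSketch and FacetSketch are hypothetical. The facet method takes pass-through params and reads only the ones it understands (facet.field and facet.mincount), while the typed result can translate itself back into simple objects, so a response writer would not need to know about the class.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical typed result object; not a real Solr class.
final class FacetResultsSketch {
    private final Map<String, Map<String, Integer>> countsByField = new LinkedHashMap<>();

    void add(String field, String value, int count) {
        countsByField.computeIfAbsent(field, f -> new LinkedHashMap<>()).put(value, count);
    }

    // Typed getter for callers that want compile-time checking.
    Map<String, Integer> getCounts(String field) {
        return countsByField.getOrDefault(field, Collections.emptyMap());
    }

    // Translate back into simple nested maps (assumed stand-in for a NamedList)
    // so a generic writer can serialize the result without knowing this class.
    Map<String, Object> toSimpleObject() {
        Map<String, Object> out = new LinkedHashMap<>();
        countsByField.forEach((field, counts) ->
                out.put(field, new LinkedHashMap<String, Integer>(counts)));
        return out;
    }
}

// Hypothetical utility in the "pass through" style; not a real Solr class.
final class FacetSketch {
    // params is an assumed stand-in for SolrParams: the caller passes all
    // request params through, and unknown keys are simply ignored here.
    static FacetResultsSketch facet(Map<String, String> params,
                                    List<Map<String, String>> docs) {
        FacetResultsSketch results = new FacetResultsSketch();
        String field = params.get("facet.field");
        if (field == null) {
            return results; // faceting not requested
        }
        // Supporting this param later required no change to any caller.
        int minCount = Integer.parseInt(params.getOrDefault("facet.mincount", "0"));

        // Count how many docs carry each value of the facet field.
        Map<String, Integer> tally = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            if (value != null) {
                tally.merge(value, 1, Integer::sum);
            }
        }
        tally.forEach((value, count) -> {
            if (count >= minCount) {
                results.add(field, value, count);
            }
        });
        return results;
    }
}
```

A caller that wants the loose behavior hands its whole request param map to FacetSketch.facet and forwards toSimpleObject() to whatever writes the response; a caller that wants tighter control builds a small param map just for faceting and uses getCounts directly.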