Hi Doug et al.,

I've been working with my partner at the University of Southern California,
Rami Al-Ghanmi, to develop such an RSS/XML service for Nutch, and am happy to
report that we are nearly done with it. Our project proposal is available
here:

http://nunki.usc.edu:8088/599/presentations/Indexing%20and%20Presenting%20RSS%20Feeds%20with%20Nutch.doc

If you like, we can submit it as a JIRA issue, and then folks can vote on it
if they like it.

Thanks much.

Cheers,
Chris

On 3/30/05 10:21 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> Andrzej Bialecki wrote:
>> This is yet another case that speaks in favor of adding an
>> "out-of-the-box" XML API to Nutch.
>
> Yes, I agree.
>
>> * REST - HTTP GET or POST request, with query parameters contained in
>> GET or POST parameters. An XML data document with results is a response.
>> Lightweight, easy to implement and create, and relatively easy to
>> consume. Lack of high-level API-s in most programming languages could be
>> a problem, though.
>
> In particular, I would love to see a REST contribution. It should not
> require more than a simple servlet or jsp page that uses NutchBean.
> This logic should be much the same as the current search.jsp, but the
> output would be xml instead of html. Also this would need to provide
> documentation of both the url parameters and the xml result schema.
>
> Once this is implemented, search.jsp can be replaced with a filter that
> applies a stylesheet to XML search results.
>
>> * RSS - a special case of the above, where the response follows a
>> standard schema. A big advantage to use this is its popularity and a
>> large base of tools (libraries, readers, aggregators).
>
> This would also be very useful. This could even be the primary API. We
> can use namespaces to provide, e.g., non-standard item elements.
>
>> * SOAP - SOAP-encoded request and response. Well integrated into most
>> programming languages, but certainly less efficient (consumes more
>> bandwidth, CPU and memory to create and consume).
>>
>> * XML-RPC - more lightweight than SOAP, but follows a similar RPC paradigm.
>
> These are a lower priority for me, but such contributions would be welcome.
>
>> AFAIK, there is a specification called OpenSearch, an extension to RSS,
>> created by Amazon/A9. However, I was unable to find the terms of use for
>> that specification, so it might be encumbered. As I wrote above, using
>> RSS gives strong advantages, so it would be nice to figure out if we can
>> use it.
>
> I have written to folks at A9 asking about this. I will report back if
> I hear anything. I agree that it would be great if Nutch spoke RSS out
> of the box.
>
>> I believe that Nutch community is uniquely positioned to propose and
>> promote an open, unencumbered XML API for search results syndication.
>> Let's have a discussion about this - I already implemented a REST
>> interface, which I could clean up and contribute, there were other
>> people on the list who planned to implement the SOAP interface.
>
> Do you think there is a need for a non-RSS REST interface?
>
> Doug
>

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop: 171-246
Phone: 818-354-8810
_______________________________________________________

Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
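
For readers of the thread, here is a rough sketch of what the RSS idea quoted above might look like: search hits rendered as an RSS 2.0 document, with a namespace carrying the non-standard elements Doug mentions. Everything in it is an assumption made for illustration: the class and method names, the `Hit` stand-in type, the `ext` prefix, its namespace URI, and the `totalHits` and `rank` elements are hypothetical. It does not call NutchBean and is not taken from the proposal linked above.

```java
import java.util.List;

/** Illustrative sketch only: renders search hits as RSS 2.0. Not Nutch code. */
final class RssSearchResults {

    /** Hypothetical namespace URI for the non-standard elements (an assumption). */
    static final String EXT_NS = "http://example.org/search-ext#";

    /** Minimal stand-in for one search hit; the real Nutch types are not used. */
    static final class Hit {
        final String title;
        final String url;
        final String summary;

        Hit(String title, String url, String summary) {
            this.title = title;
            this.url = url;
            this.summary = summary;
        }
    }

    /** Escapes the five characters that are special in XML text and attributes. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds the RSS document; searchUrl is the URL that produced these results. */
    static String render(String query, String searchUrl, long totalHits, List<Hit> hits) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<rss version=\"2.0\" xmlns:ext=\"").append(escape(EXT_NS)).append("\">\n");
        xml.append("<channel>\n");
        // title, link and description are the required RSS 2.0 channel elements.
        xml.append("<title>Search results for ").append(escape(query)).append("</title>\n");
        xml.append("<link>").append(escape(searchUrl)).append("</link>\n");
        xml.append("<description>Search results rendered as RSS</description>\n");
        // Non-standard channel element, kept apart from RSS by its namespace prefix.
        xml.append("<ext:totalHits>").append(totalHits).append("</ext:totalHits>\n");
        int rank = 1;
        for (Hit hit : hits) {
            xml.append("<item>\n");
            xml.append("<title>").append(escape(hit.title)).append("</title>\n");
            xml.append("<link>").append(escape(hit.url)).append("</link>\n");
            xml.append("<description>").append(escape(hit.summary)).append("</description>\n");
            // Non-standard item element: 1-based position of the hit in this page.
            xml.append("<ext:rank>").append(rank++).append("</ext:rank>\n");
            xml.append("</item>\n");
        }
        xml.append("</channel>\n");
        xml.append("</rss>\n");
        return xml.toString();
    }
}
```

A servlet or jsp along the lines described in the thread could write the string returned by `render` to the response with an XML content type. Keeping the extra elements in their own namespace means ordinary RSS readers and aggregators can ignore them, while clients that know the namespace can still read them.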
