Hi,
The CGI use case could be treated as a special case of integrating Nutch with another largely incompatible environment, in a loosely-coupled system. A popular way to do this would be to use an XML-based API from a CGI script.
This is yet another case that speaks in favor of adding an "out-of-the-box" XML API to Nutch. There are only a few ways to do it that make sense, IMHO:
* REST - HTTP GET or POST request, with the query passed in the GET or POST parameters. The response is an XML document with the results. Lightweight, easy to implement, and relatively easy to consume. Lack of high-level APIs in most programming languages could be a problem, though.
* RSS - a special case of the above, where the response follows a standard schema. A big advantage of using it is its popularity and the large base of existing tools (libraries, readers, aggregators).
* SOAP - SOAP-encoded request and response. Well integrated into most programming languages, but certainly less efficient (consumes more bandwidth, CPU and memory to create and consume).
* XML-RPC - more lightweight than SOAP, but follows a similar RPC paradigm.
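To make the REST option a bit more concrete, here is a rough client-side sketch in Python. The endpoint URL, the parameter names (query, start, hits) and the layout of the result document are all made up for illustration; nothing here is an existing Nutch interface.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint; not an actual Nutch URL.
BASE_URL = "http://localhost:8080/search"


def build_query_url(query, start=0, hits=10):
    """Encode the search parameters (assumed names) into a GET request URL."""
    params = {"query": query, "start": start, "hits": hits}
    return BASE_URL + "?" + urlencode(params)


def parse_results(xml_text):
    """Turn an XML result document into a list of (title, url) pairs."""
    root = ET.fromstring(xml_text)
    return [(hit.findtext("title"), hit.findtext("url"))
            for hit in root.iter("hit")]


# A made-up response document, standing in for what a server might return.
SAMPLE_RESPONSE = """<results total="1">
  <hit><title>Nutch</title><url>http://www.nutch.org/</url></hit>
</results>"""

url = build_query_url("open source search")
hits = parse_results(SAMPLE_RESPONSE)
```

A CGI script in any language would only need an HTTP client and an XML parser to do the same, which is the main attraction of this approach.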
AFAIK, there is a specification called OpenSearch, an extension to RSS, created by Amazon/A9. However, I was unable to find the terms of use for that specification, so it might be encumbered. As I wrote above, using RSS gives strong advantages, so it would be nice to figure out if we can use it.
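For comparison, a plain RSS 2.0 response without any OpenSearch extensions could be produced roughly as follows. This is only a sketch: the function name, the channel link and the shape of the input tuples are my assumptions, and only the standard RSS elements (channel, item, title, link, description) are used.

```python
import xml.etree.ElementTree as ET


def results_to_rss(query, hits):
    """Render search hits as a plain RSS 2.0 document.

    `hits` is assumed to be a list of (title, url, summary) tuples.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = "Search results for: " + query
    ET.SubElement(channel, "link").text = "http://localhost:8080/search"  # hypothetical
    ET.SubElement(channel, "description").text = "Results for query: " + query
    for title, url, summary in hits:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "description").text = summary
    return ET.tostring(rss, encoding="unicode")


feed = results_to_rss("nutch", [("Nutch", "http://www.nutch.org/", "Search & crawl")])
```

Any existing RSS reader or aggregator should be able to consume such a response as-is.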
Existing APIs from other search engines are unfortunately encumbered by restrictive terms of use, so it is dangerous to re-use them.
I believe that the Nutch community is uniquely positioned to propose and promote an open, unencumbered XML API for search results syndication. Let's have a discussion about this. I have already implemented a REST interface, which I could clean up and contribute, and there were other people on the list who planned to implement a SOAP interface.
-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
