[ https://issues.apache.org/jira/browse/NUTCH-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928909#action_12928909 ]
Andrzej Bialecki commented on NUTCH-880:
-----------------------------------------

Thanks - this issue is already fixed in NUTCH-932, to be committed soon.

> REST API for Nutch
> ------------------
>
>                 Key: NUTCH-880
>                 URL: https://issues.apache.org/jira/browse/NUTCH-880
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 2.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>             Fix For: 2.0
>
>         Attachments: API-2.patch, API.patch
>
>
> This issue is for discussing a REST-style API for accessing Nutch.
> Here's an initial idea:
> * I propose to use org.restlet for handling requests and returning
> JSON/XML/whatever responses.
> * hook up all regular tools so that they can be driven via this API. This
> would have to be an async API, since all Nutch operations take a long time
> to execute. It follows then that we also need to be able to list running
> operations, retrieve their current status, and possibly
> abort/cancel/stop/suspend/resume/...? This also means that we would have to
> potentially create & manage many threads in a servlet - AFAIK this is frowned
> upon by J2EE purists...
> * package this in a webapp (that includes all deps, essentially nutch.job
> content), with the restlet servlet as an entry point.
> Open issues:
> * how to implement the reading of crawl results via this API
> * should we manage only crawls that use a single configuration per webapp, or
> should we have a notion of crawl contexts (sets of crawl configs) with CRUD
> ops on them? This would be nice, because it would allow managing several
> different crawls, with different configs, in a single webapp - but it
> complicates the implementation a lot.
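
As a rough illustration of the async part of the proposal quoted above (start an operation, list running operations, query status, cancel), here is a minimal Java sketch. It assumes a plain java.util.concurrent thread pool sitting behind whatever REST layer ends up being used. JobManager, its methods, the State enum and the "tool-N" job id format are hypothetical names made up for this example; none of them come from the attached patches, from Nutch, or from org.restlet.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical job registry for long-running operations (illustration only). */
class JobManager {
    /** RUNNING also covers jobs that are still queued in the pool. */
    public enum State { RUNNING, FINISHED, FAILED, CANCELLED }

    private final ExecutorService pool;
    private final Map<String, Future<Object>> jobs = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    public JobManager(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Starts the task in the background and returns a job id immediately. */
    public String submit(String tool, Callable<Object> task) {
        String id = tool + "-" + nextId.incrementAndGet();
        jobs.put(id, pool.submit(task));
        return id;
    }

    /** Reports the current state of one job without blocking on unfinished work. */
    public State status(String id) {
        Future<Object> f = jobs.get(id);
        if (f == null) {
            throw new IllegalArgumentException("unknown job: " + id);
        }
        if (f.isCancelled()) {
            return State.CANCELLED;
        }
        if (!f.isDone()) {
            return State.RUNNING;
        }
        try {
            f.get(); // the job is done, so this returns or throws right away
            return State.FINISHED;
        } catch (ExecutionException e) {
            return State.FAILED; // the task itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading job state", e);
        }
    }

    /** Lists every known job with its state, sorted by job id. */
    public Map<String, State> list() {
        Map<String, State> out = new TreeMap<>();
        for (String id : jobs.keySet()) {
            out.put(id, status(id));
        }
        return out;
    }

    /** Requests cancellation; returns false if the job is unknown or already done. */
    public boolean cancel(String id) {
        Future<Object> f = jobs.get(id);
        return f != null && f.cancel(true);
    }

    /** Interrupts running jobs and stops the pool. */
    public void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        JobManager manager = new JobManager(2);
        manager.submit("inject", () -> "ok");
        String slow = manager.submit("fetch", () -> { Thread.sleep(60_000); return "ok"; });
        manager.cancel(slow);
        System.out.println(manager.list());
        manager.shutdown();
    }
}
```

In a sketch like this, a REST resource would only call submit, list, status and cancel and return right away, so no HTTP request stays open for the duration of a crawl step. Suspend/resume and per-crawl configurations are left out on purpose, since the issue text treats them as open questions.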