Hi all,

Regarding NUTCH-251: judging by the votes, it is a relatively significant issue. Stefan has written a good plugin for the admin GUI, and I have updated it to work with nutch-0.8 and hadoop 0.4.
Some of the features in the patch are not appropriate for our use cases, and it requires changes to hadoop. I am therefore working on an alternative implementation of the administration GUI. It consists of a hadoop server (like JobTracker) that listens for submitted jobs, a web GUI to submit and track jobs from the browser, and a job runner.

The architecture of the patch is as follows:

- An abstract class, AdminJob, representing a job in nutch.
- Various classes extending AdminJob, for example FetchAdminJob and IndexAdminJob.
- A queue which sorts the jobs in priority order using a modified topological sort (jobs can depend on other jobs).
- An interface to submit jobs.
- An RPC server to listen for job submissions.
- An extension point (basically the same as the previous one).
- A web server to serve the plugin JSPs.

The features will be:

- submitting jobs from code, the command line, or the web interface
- tracking jobs from the command line or the web interface
- scheduling jobs

I can send the code or further details if anyone is interested in pre-testing, and I would appreciate any comments and suggestions. I am planning to complete the patch and submit it to Jira ASAP.

Sami Siren wrote:
> Hello,
>
> It has been a while from a previous release (0.8.1) and looking at the
> great fixes done in trunk I'd start thinking about baking a new release
> soon.
>
> Looking at the jira roadmaps there are 1 blocking issues (fixing the
> license headers) for 0.8.2 and two other blocking issues for 0.9.0 of
> which I think NUTCH-233 is safe to put in.
>
> The top 10 voted issues are currently:
>
> NUTCH-61 Adaptive re-fetch interval. Detecting umodified content
> NUTCH-48 "Did you mean" query enhancement/refignment feature
> NUTCH-251 Administration GUI
> NUTCH-289 CrawlDatum should store IP address
> NUTCH-36 Chinese in Nutch
> NUTCH-185 XMLParser is configurable xml parser plugin.
> NUTCH-59 meta data support in webdb
> NUTCH-92 DistributedSearch incorrectly scores results
> NUTCH-68 A tool to generate arbitrary fetchlists
> NUTCH-87 Efficient site-specific crawling for a large number of sites
>
> Are there any opinions about issues that should go in before the next
> release (Answering yes means that you are willing to provide a patch for
> it).
>
> --
> Sami Siren

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
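
To make the job queue idea from the architecture list more concrete, here is a minimal sketch of one way to order jobs by priority while still respecting dependencies. It is not the code from the patch. The names AdminJobQueueSketch, Job, and order are hypothetical, job names are assumed to be unique, and Kahn's algorithm with a priority queue for the ready set is only one possible reading of "modified topological sort". It uses current Java syntax for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: none of these names are taken from the actual patch.
class AdminJobQueueSketch {

    /** A job with a unique name, a priority (higher runs first) and prerequisite job names. */
    static final class Job {
        final String name;
        final int priority;
        final List<String> dependsOn;

        Job(String name, int priority, String... dependsOn) {
            this.name = name;
            this.priority = priority;
            this.dependsOn = Arrays.asList(dependsOn);
        }
    }

    /**
     * Returns job names in execution order: a job never precedes a job it depends on,
     * and among jobs that are ready at the same time the highest priority goes first.
     */
    static List<String> order(List<Job> jobs) {
        Map<String, Job> byName = new HashMap<>();
        Map<String, Integer> unmet = new HashMap<>();            // unfinished prerequisites per job
        Map<String, List<String>> dependents = new HashMap<>();  // prerequisite -> jobs waiting on it
        for (Job job : jobs) {
            byName.put(job.name, job);
            unmet.put(job.name, job.dependsOn.size());
        }
        for (Job job : jobs) {
            for (String dep : job.dependsOn) {
                if (!byName.containsKey(dep)) {
                    throw new IllegalArgumentException("Unknown dependency: " + dep);
                }
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(job.name);
            }
        }

        // Ready jobs, highest priority first; ties are broken by name for a stable order.
        PriorityQueue<Job> ready = new PriorityQueue<>(
                Comparator.comparingInt((Job j) -> j.priority).reversed()
                          .thenComparing(j -> j.name));
        for (Job job : jobs) {
            if (unmet.get(job.name) == 0) {
                ready.add(job);
            }
        }

        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            Job next = ready.poll();
            result.add(next.name);
            // Finishing this job may unblock the jobs that were waiting on it.
            for (String name : dependents.getOrDefault(next.name, Collections.<String>emptyList())) {
                if (unmet.merge(name, -1, Integer::sum) == 0) {
                    ready.add(byName.get(name));
                }
            }
        }
        if (result.size() != jobs.size()) {
            throw new IllegalStateException("Dependency cycle detected");
        }
        return result;
    }

    public static void main(String[] args) {
        List<Job> jobs = Arrays.asList(
                new Job("index", 9, "fetch"),
                new Job("fetch", 1, "generate"),
                new Job("generate", 1),
                new Job("stats", 5));
        System.out.println(order(jobs)); // prints [stats, generate, fetch, index]
    }
}
```

In this example the high-priority "index" job still runs last, because it has to wait for "fetch", which in turn waits for "generate". The independent "stats" job runs first, since it is the highest-priority job that is ready at the start.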