Hi all, for NUTCH-251:

Judging by the votes, NUTCH-251 seems to be a fairly significant issue.
Stefan has written a good plugin for the admin GUI, and I have updated it
to work with Nutch 0.8 and Hadoop 0.4.

Some of the features in the patch are not appropriate for our use cases,
and it requires changes to Hadoop. Thus I am currently working on an
alternative implementation of the administration GUI, which consists of a
Hadoop server (like the JobTracker) that listens for submitted jobs, a web
GUI to submit and track jobs from the browser, and a job runner.

The architecture details of the patch are as follows:

  - AdminJob, an abstract class representing a job in Nutch.
  - Various classes extending AdminJob, e.g. FetchAdminJob, IndexAdminJob.
  - A queue which sorts the jobs in priority order using a modified
topological sort (jobs can depend on each other).
  - An interface to submit jobs.
  - An RPC server to listen for job submissions.
  - An extension point (basically the same as the previous one).
  - A web server to serve plugin JSPs.
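As a rough illustration of the first three items, here is a minimal Java sketch. The `priority` field, the `after()` dependency method and the `order()` method are hypothetical names, and since the exact modification to the topological sort is not described, the sketch assumes Kahn's algorithm in which, among jobs whose dependencies are all placed, the highest-priority job goes first.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical sketch of AdminJob, two subclasses and a dependency-aware priority queue. */
public class AdminJobQueue {

    /** Abstract Nutch job; priority and dependencies are illustrative assumptions. */
    public abstract static class AdminJob {
        public final String name;
        public final int priority; // assumed convention: higher value runs earlier
        public final List<AdminJob> dependsOn = new ArrayList<>();

        protected AdminJob(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }

        /** Declares that this job must run after {@code other}. */
        public AdminJob after(AdminJob other) {
            dependsOn.add(other);
            return this;
        }

        /** A real job would launch the corresponding Nutch/Hadoop tool here. */
        public abstract void run() throws Exception;
    }

    public static class FetchAdminJob extends AdminJob {
        public FetchAdminJob(String name, int priority) { super(name, priority); }
        @Override public void run() { /* would start a fetch */ }
    }

    public static class IndexAdminJob extends AdminJob {
        public IndexAdminJob(String name, int priority) { super(name, priority); }
        @Override public void run() { /* would start indexing */ }
    }

    /**
     * Kahn's topological sort: a job becomes "ready" once all its dependencies
     * are placed; among ready jobs the highest priority (then name) goes first.
     */
    public static List<AdminJob> order(Collection<? extends AdminJob> jobs) {
        Set<AdminJob> batch = new HashSet<>(jobs);
        Map<AdminJob, Integer> unmet = new HashMap<>();          // unplaced dependencies per job
        Map<AdminJob, List<AdminJob>> dependents = new HashMap<>();
        for (AdminJob job : batch) {
            unmet.put(job, job.dependsOn.size());
            for (AdminJob dep : job.dependsOn) {
                if (!batch.contains(dep)) {
                    throw new IllegalArgumentException("dependency not submitted: " + dep.name);
                }
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(job);
            }
        }

        PriorityQueue<AdminJob> ready = new PriorityQueue<>(
            Comparator.comparingInt((AdminJob j) -> j.priority).reversed()
                      .thenComparing((AdminJob j) -> j.name));
        for (AdminJob job : batch) {
            if (unmet.get(job) == 0) ready.add(job);
        }

        List<AdminJob> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            AdminJob next = ready.poll();
            result.add(next);
            for (AdminJob d : dependents.getOrDefault(next, new ArrayList<>())) {
                if (unmet.merge(d, -1, Integer::sum) == 0) ready.add(d);
            }
        }
        if (result.size() != batch.size()) {
            throw new IllegalStateException("dependency cycle among submitted jobs");
        }
        return result;
    }
}
```

In this sketch priority only breaks ties among ready jobs: a high-priority index job still waits for the lower-priority fetch job it depends on.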

Once complete, the features will be:
    - submitting jobs from code, the command line or the web interface,
    - tracking jobs from the command line or the web interface,
    - scheduling jobs.
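The submit/track/schedule features could sit behind one small interface that code, the command line and the web GUI all call. The sketch below is in-process only; `JobSubmitter`, `SimpleAdminJobService` and the status values are hypothetical names, and "scheduling" is reduced to a start delay on a standard `ScheduledExecutorService`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical entry point shared by code, command line and web GUI. */
interface JobSubmitter {
    enum Status { SCHEDULED, RUNNING, SUCCEEDED, FAILED }

    /** Schedules {@code job} to start after {@code delayMillis} and returns its id. */
    long submit(Runnable job, long delayMillis);

    /** Returns the last known status, or null for an unknown id. */
    Status status(long jobId);
}

/** In-process sketch: one runner thread executes jobs in order of scheduled start time. */
public class SimpleAdminJobService implements JobSubmitter {
    private final ScheduledExecutorService runner = Executors.newSingleThreadScheduledExecutor();
    private final Map<Long, Status> statuses = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    @Override
    public long submit(Runnable job, long delayMillis) {
        long id = nextId.getAndIncrement();
        statuses.put(id, Status.SCHEDULED);
        runner.schedule(() -> {
            statuses.put(id, Status.RUNNING);
            try {
                job.run();
                statuses.put(id, Status.SUCCEEDED);
            } catch (RuntimeException e) {
                statuses.put(id, Status.FAILED); // a real tracker would also keep the error
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
        return id;
    }

    @Override
    public Status status(long jobId) {
        return statuses.get(jobId);
    }

    /** Rejects new submissions, lets already scheduled jobs run, then waits for them. */
    public boolean shutdownAndWait(long timeoutMillis) {
        runner.shutdown();
        try {
            return runner.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

An RPC server could expose the same two methods to remote clients; that layer is left out of this sketch.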

I can send the code or details if anyone is interested in pretesting,
and I would appreciate any comments and suggestions on this. I am
planning to complete the patch and submit it to Jira ASAP.

Sami Siren wrote:
> Hello,
>
> It has been a while since the previous release (0.8.1), and looking at
> the great fixes done in trunk I'd start thinking about baking a new
> release soon.
>
> Looking at the Jira roadmaps, there is 1 blocking issue (fixing the
> license headers) for 0.8.2 and two other blocking issues for 0.9.0, of
> which I think NUTCH-233 is safe to put in.
>
> The top 10 voted issues are currently:
>
> NUTCH-61      Adaptive re-fetch interval. Detecting unmodified content
> NUTCH-48      "Did you mean" query enhancement/refinement feature
> NUTCH-251     Administration GUI
> NUTCH-289     CrawlDatum should store IP address
> NUTCH-36      Chinese in Nutch
> NUTCH-185     XMLParser is configurable xml parser plugin.
> NUTCH-59      Meta data support in webdb
> NUTCH-92      DistributedSearch incorrectly scores results
> NUTCH-68      A tool to generate arbitrary fetchlists
> NUTCH-87      Efficient site-specific crawling for a large number of sites
>
> Are there any opinions about issues that should go in before the next
> release? (Answering yes means that you are willing to provide a patch
> for it.)
>
> --
>  Sami Siren
>
>   


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
