Hi,

Great to hear people are still working on this. It shows once more that getting something in early would have saved some effort. :)
Just some random comments.

We run the GUI in several production environments with patched Hadoop code, since from our point of view this is the clean approach. Everything else feels like a workaround for some strange Hadoop behaviors. It may be a long time since I spoke to Doug and some other Hadoop developers, but at that time I understood that there was general interest in having a Nutch GUI and in supporting the required functionality in Hadoop.
I'm not sure if that is still the case or if I had a wrong impression.
In any case, from my p.o.v. the clean way would be to get the required minor changes into Hadoop (not critical, simple stuff from my point of view) instead of implementing workarounds in Nutch. Since Hadoop is a kind of child of Nutch, there should be a close relationship, at least to discuss things. Anyway, no strong opinion, just my 2 cents. In any case I'm very happy that people now see the need for a GUI as well and that someone is working on it, since I'm kind of busy with other projects.

Thanks.
Stefan


On 17.01.2007, at 06:42, Enis Soztutar wrote:

Hi all, for NUTCH-251:

I suppose that NUTCH-251 is a relatively significant issue, judging by the votes. Stefan has written a good plugin for the admin GUI, and I have updated it to work with nutch-0.8 and hadoop 0.4.

Some of the features in the patch are not appropriate for our use cases, and it requires Hadoop changes. I am therefore working on an alternative implementation of the administration GUI, which consists of a Hadoop server (like JobTracker) that listens for submitted jobs, a web GUI to submit and track jobs from the browser, and a job runner.

The architecture details of the patch are as follows:

 - AdminJob, an abstract class representing a job in Nutch
 - various classes extending AdminJob, e.g. FetchAdminJob, IndexAdminJob
 - a queue which sorts the jobs in priority order, using a modified topological sort (jobs can be dependent)
 - an interface to submit jobs
 - an RPC server to listen to job submissions
 - an extension point (basically the same as the previous one)
 - a web server to serve plugin JSPs
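
A minimal sketch of how the job model and the dependency-aware queue from the list above could look. This is not the code from the patch: the class name AdminJobSketch, the fields name, priority and dependsOn, and the methods after() and order() are all hypothetical, and the ordering shown is a plain Kahn-style topological sort with priority as the tie-breaker, which may differ from the modified sort used in the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class AdminJobSketch {

    /** Hypothetical base class; the real AdminJob in the patch may look different. */
    public static abstract class AdminJob {
        public final String name;
        public final int priority; // assumption: higher value runs earlier among ready jobs
        public final List<AdminJob> dependsOn = new ArrayList<>();

        protected AdminJob(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }

        /** Declares that this job may only run after the given job. */
        public AdminJob after(AdminJob other) {
            dependsOn.add(other);
            return this;
        }

        public abstract void run() throws Exception;
    }

    /** Placeholder jobs; a real one would start the corresponding Nutch tool. */
    public static class FetchAdminJob extends AdminJob {
        public FetchAdminJob(int priority) { super("fetch", priority); }
        @Override public void run() { System.out.println("fetch (placeholder)"); }
    }

    public static class IndexAdminJob extends AdminJob {
        public IndexAdminJob(int priority) { super("index", priority); }
        @Override public void run() { System.out.println("index (placeholder)"); }
    }

    /** Orders jobs so dependencies come first; ready jobs are picked by priority. */
    public static List<AdminJob> order(Collection<AdminJob> jobs) {
        Map<AdminJob, Integer> pending = new HashMap<>();          // unmet dependency count
        Map<AdminJob, List<AdminJob>> dependents = new HashMap<>(); // reverse edges
        for (AdminJob job : jobs) {
            pending.put(job, job.dependsOn.size());
            for (AdminJob dep : job.dependsOn) {
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(job);
            }
        }
        PriorityQueue<AdminJob> ready = new PriorityQueue<>(
            Comparator.comparingInt((AdminJob j) -> j.priority).reversed()
                      .thenComparing(j -> j.name));
        for (AdminJob job : jobs) {
            if (pending.get(job) == 0) ready.add(job);
        }
        List<AdminJob> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            AdminJob next = ready.poll();
            result.add(next);
            for (AdminJob d : dependents.getOrDefault(next, Collections.emptyList())) {
                if (pending.merge(d, -1, Integer::sum) == 0) ready.add(d);
            }
        }
        if (result.size() != jobs.size()) {
            throw new IllegalStateException("cyclic or unsubmitted dependency");
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        AdminJob fetch = new FetchAdminJob(1);
        AdminJob index = new IndexAdminJob(10).after(fetch);
        // index has the higher priority but still has to wait for fetch
        for (AdminJob job : order(Arrays.asList(index, fetch))) {
            job.run();
        }
    }
}
```

The RPC server and the web GUI would then only need to hand submitted jobs to such a queue, and the job runner would take them in the computed order.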

Once complete, the features will be:
   - submitting jobs from code, command line or web interface,
   - tracking jobs from the command line or web interface
   - scheduling jobs

I could send the code or details if anyone is interested in pretesting, and I would appreciate any comments and suggestions. I am planning to complete the patch and submit it to Jira ASAP.

Sami Siren wrote:
Hello,

It has been a while since the previous release (0.8.1), and looking at the
great fixes done in trunk I'd start thinking about baking a new release soon.

Looking at the Jira roadmaps, there is 1 blocking issue (fixing the
license headers) for 0.8.2 and two other blocking issues for 0.9.0, of
which I think NUTCH-233 is safe to put in.

The top 10 voted issues are currently:

NUTCH-61        Adaptive re-fetch interval. Detecting unmodified content
NUTCH-48        "Did you mean" query enhancement/refinement feature
NUTCH-251       Administration GUI
NUTCH-289       CrawlDatum should store IP address
NUTCH-36        Chinese in Nutch
NUTCH-185       XMLParser is configurable xml parser plugin.
NUTCH-59        meta data support in webdb
NUTCH-92        DistributedSearch incorrectly scores results
NUTCH-68        A tool to generate arbitrary fetchlists
NUTCH-87        Efficient site-specific crawling for a large number of sites

Are there any opinions about issues that should go in before the next
release? (Answering yes means that you are willing to provide a patch for
it.)

--
 Sami Siren





~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California
http://www.101tec.com



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
