Hi,
great to hear people are still working on this. It shows once more that
getting something in early would have saved some effort. :)
Just some random comments.
We run the gui in several production environments with patched hadoop
code, since from our point of view this is the clean approach.
Everything else feels like a workaround for some strange hadoop
behaviors. It may be a long time since I spoke to Doug and some
other Hadoop developers, but at that time my understanding was that
there was a general interest in having a nutch gui and in supporting
the required functionality in hadoop.
I'm not sure if that is still the case or if I had the wrong impression.
In any case, from my p.o.v. the clean way would be to get the
required minor changes into hadoop (simple, non-critical stuff from my
point of view) instead of implementing workarounds in nutch. Since
hadoop is a kind of child of nutch, the relation should be close enough
to at least discuss things.
Anyway, no strong opinion, just my 2 cents. In any case I'm very happy
that people now see the need for a gui as well and that someone is
working on it, since I'm kind of busy with other projects.
Thanks.
Stefan
On 17.01.2007, at 06:42, Enis Soztutar wrote:
Hi all, for NUTCH-251:
I suppose that NUTCH-251 is a relatively significant issue, judging by
the votes. Stefan has written a good plugin for the admin gui and I
have updated it to work with nutch-0.8, hadoop 0.4.
Some of the features in the patch are not appropriate for our use
cases and it requires hadoop changes, thus I am currently working
on an alternative implementation of the administration gui, which
consists of a hadoop server (like JobTracker) that listens for
submitted jobs, a web gui to submit and track the jobs from the
browser, and a job runner.
The architecture details of the patch are as follows:
- AdminJob, an abstract class representing a job in nutch.
- various classes extending AdminJob, for example FetchAdminJob,
IndexAdminJob.
- a queue which sorts the jobs in priority order, using a modified
topological sort (jobs can depend on each other).
- an interface to submit Jobs
- an rpc server to listen to job submissions
- an extension point (basically the same as the previous one)
- a web server to serve plugin jsp's
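To make the job and queue pieces above a bit more concrete, here is a minimal sketch of how they could fit together. Only the names AdminJob, FetchAdminJob and IndexAdminJob come from the description above. AdminJobQueue, the fields, the method names and the "higher number runs first" convention are assumptions for illustration, not the actual patch. The ordering is a topological sort (Kahn's algorithm) that picks the highest-priority ready job at each step, which is one possible reading of "modified topological sort".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical shape of AdminJob; only the class name is from the message. */
abstract class AdminJob {
    private final String name;
    private final int priority; // assumption: a higher value runs earlier
    private final List<AdminJob> dependencies = new ArrayList<>();

    protected AdminJob(String name, int priority) {
        this.name = name;
        this.priority = priority;
    }

    public String getName() { return name; }
    public int getPriority() { return priority; }
    public List<AdminJob> getDependencies() { return dependencies; }

    /** Declares that this job must run after {@code other}. */
    public AdminJob dependsOn(AdminJob other) {
        dependencies.add(other);
        return this;
    }

    public abstract void run() throws Exception;
}

class FetchAdminJob extends AdminJob {
    FetchAdminJob(String name, int priority) { super(name, priority); }
    @Override public void run() { /* placeholder: would start a fetch */ }
}

class IndexAdminJob extends AdminJob {
    IndexAdminJob(String name, int priority) { super(name, priority); }
    @Override public void run() { /* placeholder: would start indexing */ }
}

/** Hypothetical queue: dependencies first, then priority among ready jobs. */
final class AdminJobQueue {
    static List<AdminJob> order(List<AdminJob> jobs) {
        Map<AdminJob, Integer> unmet = new HashMap<>();
        Map<AdminJob, List<AdminJob>> dependents = new HashMap<>();
        for (AdminJob job : jobs) {
            unmet.put(job, job.getDependencies().size());
            for (AdminJob dep : job.getDependencies()) {
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(job);
            }
        }

        // Ready jobs are ordered by descending priority, then by name.
        Comparator<AdminJob> byPriority = Comparator.comparingInt(AdminJob::getPriority);
        PriorityQueue<AdminJob> ready =
            new PriorityQueue<>(byPriority.reversed().thenComparing(AdminJob::getName));
        for (AdminJob job : jobs) {
            if (unmet.get(job) == 0) {
                ready.add(job);
            }
        }

        List<AdminJob> ordered = new ArrayList<>();
        while (!ready.isEmpty()) {
            AdminJob next = ready.poll();
            ordered.add(next);
            // Finishing a job may unblock the jobs that were waiting on it.
            for (AdminJob waiting : dependents.getOrDefault(next, Collections.emptyList())) {
                if (unmet.merge(waiting, -1, Integer::sum) == 0) {
                    ready.add(waiting);
                }
            }
        }
        if (ordered.size() != jobs.size()) {
            throw new IllegalArgumentException("cyclic or unsubmitted dependency");
        }
        return ordered;
    }
}
```

In this sketch a high-priority index job still waits for the fetch job it depends on, while independent jobs are taken in priority order. Each job is assumed to be submitted once, together with all of its dependencies.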
Once complete, the features will be:
- submitting jobs from code, command line or web interface,
- tracking jobs from the command line or web interface
- scheduling jobs
I could send the code or details if anyone is interested in
pretesting, and I would appreciate any comments and suggestions on
this. I am planning to complete the patch and submit it to Jira ASAP.
Sami Siren wrote:
Hello,
It has been a while since the previous release (0.8.1), and looking at
the great fixes done in trunk I'd start thinking about baking a new
release soon.
Looking at the jira roadmaps there is 1 blocking issue (fixing the
license headers) for 0.8.2 and two other blocking issues for 0.9.0, of
which I think NUTCH-233 is safe to put in.
The top 10 voted issues are currently:
NUTCH-61 Adaptive re-fetch interval. Detecting unmodified content
NUTCH-48 "Did you mean" query enhancement/refinement feature
NUTCH-251 Administration GUI
NUTCH-289 CrawlDatum should store IP address
NUTCH-36 Chinese in Nutch
NUTCH-185 XMLParser is configurable xml parser plugin.
NUTCH-59 meta data support in webdb
NUTCH-92 DistributedSearch incorrectly scores results
NUTCH-68 A tool to generate arbitrary fetchlists
NUTCH-87 Efficient site-specific crawling for a large number of sites
Are there any opinions about issues that should go in before the next
release? (Answering yes means that you are willing to provide a patch
for it.)
--
Sami Siren
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California
http://www.101tec.com
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers