[ http://issues.apache.org/jira/browse/NUTCH-251?page=all ]
Stefan Groschupf updated NUTCH-251:
-----------------------------------
Attachment: hadoop_nutch_gui_v1.patch
nutch_gui_v1.patch
nutch_gui_plugins_v1.zip
This is an early preview patch of the nutch gui.
There are known issues, however it is a starting point from where we can
continue building a solid administration user interface.
This patch introduces the following functionality:
+ web based administration gui via an embedded web container
+ gui is fully based on the plugin system, so it is customizable and
extensible using plugins
+ all plugins can be internationalized
+ introduces the concept of nutch instances, a mechanism to have separately
configurable nutch deployments using the same code base (e.g. intranet search,
webpage search)
+ pluggable authentication; currently it comes with a default user/password
tuple based on the configuration, but LDAP integration, for example, can be
easily realized.
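To illustrate the pluggable authentication idea, here is a minimal sketch. It is written in Python for brevity, not in the Java the patch uses, and every name in it (Authenticator, ConfigAuthenticator, the configuration keys) is hypothetical and not taken from the patch.

```python
import hmac
from abc import ABC, abstractmethod


class Authenticator(ABC):
    """Hypothetical extension point: accept or reject a login attempt."""

    @abstractmethod
    def authenticate(self, user: str, password: str) -> bool:
        ...


class ConfigAuthenticator(Authenticator):
    """Default backend: one user/password tuple read from the configuration."""

    def __init__(self, config: dict):
        # The key names below are assumptions made for this illustration.
        self._user = config["admin.gui.user"]
        self._password = config["admin.gui.password"]

    def authenticate(self, user: str, password: str) -> bool:
        # compare_digest avoids leaking information through timing differences.
        user_ok = hmac.compare_digest(user.encode(), self._user.encode())
        pass_ok = hmac.compare_digest(password.encode(), self._password.encode())
        return user_ok and pass_ok


auth = ConfigAuthenticator({"admin.gui.user": "admin", "admin.gui.password": "admin"})
```

An LDAP backend would then be one more subclass of the same interface, selected through configuration, without touching the code that calls authenticate().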
The patch comes with the following plugins:
+ admin-listing
++ required by the web ui to show all deployed plugins as tabs on a webpage
+ admin-instance
++ lists all instances and allows to create a new instance
+ admin-configuration
++ configure a nutch instance (configuration will be written as nutch-site.xml
to disk)
+ admin-inject
++ inject urls in a crawlDb
+ admin-system
++ shows status of system
+ admin-job
++ shows status of jobs
+ admin-crawldb-status
++ shows crawldb entries filtered by status or shows the status of a given url
(useful to check if a page was already fetched)
+ admin-management
++ generate segment
++ fetch segment
++ parse segment (if required)
++ update crawldb
++ invert links
++ index segment
++ delete segment, parse, index etc.
+ admin-scheduling
++ quartz based cron job management to run a time driven "generate - fetch -
updatedb - invertlinks - index" job
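The ordering of the scheduled job can be sketched as follows. This is only an illustration in Python, not code from the patch: the step names come from the list above, while run_cycle and the idea of stopping at the first failing step are assumptions. The timing itself would be handled by quartz and is not shown.

```python
from typing import Callable, Dict, List

# Step order as named in the scheduling plugin description.
STEPS = ["generate", "fetch", "updatedb", "invertlinks", "index"]


def run_cycle(actions: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each step in order and return the names of the steps that completed.

    Assumption: a step returning False aborts the rest of the cycle, since
    every later step depends on the output of the earlier ones.
    """
    completed = []
    for name in STEPS:
        if not actions[name]():
            break
        completed.append(name)
    return completed


# Usage example with stand-in actions; the fetch step is made to fail.
demo_actions = {name: (lambda: True) for name in STEPS}
demo_actions["fetch"] = lambda: False
demo_result = run_cycle(demo_actions)
```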
Known issues
+ requires hadoop changes
+ locally running jobs cannot be stopped, but distributed jobs can be
stopped
+ the index searcher does not use index folders inside of segment folders as in
nutch 0.7, but the gui places the index folder in the segment folder
++ the searcher is unable to find indices
+ "put to search" does not work since the searcher does not support dynamically
adding index folders
+ the linkdb inverter does not update but overwrites a linkdb - this is a general
nutch bug but affects the gui as well.
+ the nutch gui introduces locking by storing lock files in folders; this
mechanism is ignored by the nutch command line tools.
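As a rough illustration of folder locking via lock files, here is a small Python sketch. It is not the patch's implementation: the file name ".lock" and the function names are hypothetical. It also shows why the last issue arises: such a lock is purely advisory, so any tool that never calls acquire_lock() simply bypasses it.

```python
import os

LOCK_NAME = ".lock"  # hypothetical lock file name


def acquire_lock(folder: str) -> bool:
    """Try to lock a folder by creating a lock file inside it.

    O_CREAT | O_EXCL makes the creation fail if the file already exists,
    so only one caller can win.
    """
    path = os.path.join(folder, LOCK_NAME)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_lock(folder: str) -> None:
    """Remove the lock file; a missing lock file is not treated as an error."""
    try:
        os.remove(os.path.join(folder, LOCK_NAME))
    except FileNotFoundError:
        pass
```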
It would be great if users could test the gui, report bugs and help to
improve the patch.
This is a very complex patch and it is difficult to stay in sync with the
latest changes, so in case we missed something while generating this patch
and the patch does not work as expected, please don't blame us but give us
some time and hints to fix the problems.
Help is welcome with the following tasks:
+ fixing language issues in javadoc, api and bundle files
+ translating bundles into more languages (currently it comes with english and
german bundles)
+ heavily test and find bugs and provide fixes :)
+ write help texts and documentation
How to:
+ checkout latest nutch sources
+ checkout hadoop sources
+ patch hadoop with the hadoop patch
+ build hadoop jar
+ remove old hadoop jar from nutch/lib
+ place new hadoop jar in nutch/lib
+ uncompress plugin zip file
+ place plugins in nutch/src/plugins (patch not possible since svn does not
support binary patches)
+ patch nutch with nutch patch
+ start gui with bin/nutch gui <folderWhereYourInstanceDataWillBeStored>
+ point your browser to: http://localhost:50060/general/
+ username and password are "admin" (can be changed in nutch-default.xml).
+ select the "default" instance or create a new instance.
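The two jar steps above (remove the old hadoop jar from nutch/lib, place the new one there) can be sketched as a small Python helper. This is only a convenience sketch: the function name and the "hadoop-*.jar" file name pattern are assumptions, so check the actual jar names in your checkout before using anything like it.

```python
import glob
import os
import shutil


def swap_hadoop_jar(nutch_lib: str, new_jar: str) -> str:
    """Replace the hadoop jar in nutch/lib with a freshly built one.

    Assumes the old jar matches "hadoop-*.jar" and that new_jar lives
    outside nutch_lib. Returns the path of the copied jar.
    """
    for old in glob.glob(os.path.join(nutch_lib, "hadoop-*.jar")):
        os.remove(old)
    return shutil.copy(new_jar, nutch_lib)
```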
Thanks to everybody who helped to get this implemented and did the first beta
tests, and especially to Marko for hacking all the jsp's!
I suggest adding this patch to a nutch 0.9 branch and adding a gui component
in the jira to go from there.
I really hope I didn't miss anything or upload the wrong files now. :-O
> Administration GUI
> ------------------
>
> Key: NUTCH-251
> URL: http://issues.apache.org/jira/browse/NUTCH-251
> Project: Nutch
> Type: Improvement
> Versions: 0.8-dev
> Reporter: Stefan Groschupf
> Priority: Minor
> Fix For: 0.8-dev
> Attachments: hadoop_nutch_gui_v1.patch, nutch_gui_plugins_v1.zip,
> nutch_gui_v1.patch
>
> Having a web based administration interface would help to make nutch
> administration and management much more user friendly.