Nitin Borwankar wrote:
Hi all,
First an intro. I am another Nutch newbie and am finding 0.7.2 to be
quite an effective single machine crawler.
[..]
The ability to keep db formats compatible would be nice to allow reuse
of existing results but is not necessary.
That's probably not going to happen - each branch has specific
requirements from the db and segment formats, which are incompatible.
However, given enough interest we could implement converters, even
bi-directional.
As a potential developer I would like to volunteer for the ongoing
maintenance and evolution of 0.7.2 as an effective single machine
crawler.
That's excellent! I imagine the procedure to get you involved would be
something like this:
* start collecting issues related to maintenance, bugfixes or
improvements of that branch,
* create JIRA issues, plus start collecting patches, tested and ready
for committing. One of the existing developers will commit them on your
behalf.
* after a while we would consider giving you committer rights so that
you could work directly with the code.
Consider this a proposal to maintain two separate versions by continuing
bug fix versions of 0.7 until one of two things happens:
a) 0.8 evolves into something satisfactory for use also as a single
machine search engine and everyone is happy moving to it
b) a critical mass of developers steps forward to support the ongoing
development of 0.7.2 into, say, Nutch-lite, always and only meant for
single-machine use.
I do hope that option a) becomes a reality sooner rather than later. But if
there is sufficient interest (and enough developers) in developing 0.7 branch,
then go for it - keeping in mind, though, that eventually these code bases will
diverge so much that maintaining them will require two mostly separate teams ...
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com