Hi Phil,

Thanks for your comments. Mine below:
>> Unfortunately some parts of the documentation on Nutch (namely the tutorial,
>> and other parts of the static site) have been out of date for a while. This
>> has occurred really independent of the releases, and independent of the wiki
>> [1], which hasn't really fallen out of date as quick.
>
> While documentation may not be part of the code, it's certainly part of the
> project. And it's just as important as the code. Yes, I know that
> documentation is the bane of programmers everywhere. I'm a coder. I get it.
> But when you change the way things work in a fundamental way that leaves all
> of your documentation behind, it's time to spend some time on it.

Sure. So, what fundamental way has Nutch changed from 1.0 to 1.1? Can you
elaborate? Also, in terms of spending time on Nutch's documentation, I'll try
to as I get more time (as I'm sure other committers will as well), but I'd
also say: if there's something to be improved, by all means, go for it, and
patches welcome to contribute it back.

>>> For example, my find of broken code in bin/nutch crawl, a most basic way of
>>> getting it running.
>>
>> Can you elaborate on your find of broken code? Did you file a JIRA issue for
>> this in the Nutch JIRA system [2] ?
>
> Yes, it led to another release. The bug fix I contributed was incorporated.

Great!

>> The more information you provide here about your environment and your
>> situation that caused the error, as well as e.g., detailed information (a
>> stack trace, an exception, something), the easier it is to track down what
>> you're seeing.
>
> Yes, that was all in the unanswered emails. it would be easier for you to
> search your inbox than for me to send it all over again.

I wouldn't assume that the inboxes of folks watching the list are always
centered on the Nutch mailing lists. Realize that many of us are subscribed to
several mailing lists, and sometimes, emails go unanswered for a while.
>
>> That said, one thing to realize is that this is open source software, so in
>> the end, as they say in Apache, "those that do, decide", or "patches
>> welcome!" In other words, if there are things that you see that could be
>> fixed, improved, made more configurable, etc., including the code, but *also
>> the documentation*, then by all means we'd appreciate your feedback and
>> contribution. Nutch is not simply a product of the developers that
>> contribute their (potentially and often unsalaried) time to work on it, but
>> of its user community as well.
>
> I've been the leader of a major open source project for over 10 years. Last
> fall I relinquished the reins of that project to a new project leader, so I
> think I know how it works. We wrote an open source cross platform compiler
> for xBase (Clipper) code named Harbour Project, now in release 2.0.
>
> That would be why I not only raised the flag that it's not ready to release,
> but I tracked down a bug and submitted a bug fix.
>
> And I'm still saying it's not ready to release. There's still another bug
> that I have found that goes unanswered.

Right, so then you know that bugs aren't just "bugs" -- they must come with a
priority. There are several categories, "High", "Medium", "Critical", or
"Blocker", just to name a few. When I cut a release as the Release Manager
(RM), I always run the unit tests and try to at least run a basic crawl before
cutting the RC. Hopefully that catches anything that would be a big problem,
but sometimes even that process breaks down, since not everyone has, e.g.,
large scale deployments, or maybe we're missing a unit test we need, etc. I'd
say at ~10 releases of Nutch to date, and many many features, etc., we have
fairly decent regression.

>> In certain cases you are right, but I would take your above comments as
>> verbatim across the board.
>> For example, if you believe there is
>> documentation lacking, then the first step is typically to file JIRA issues
>> to alert committers and other users of Nutch of your concern and then have
>> discussion on the lists regarding the issues. At some point a patch is
>> produced, and then attached to the issue, where the committers can review
>> the patches and then work to get them committed to the code base.
>>
>> Nutch has a number of unit tests for regression that ship with the product
>> that tell me that it's not broken, and users that are able to make it work
>> in their environments. There have been some recent bug fixes in the 1.1 RC
>> that we caught which have been fixed (NUTCH-812, NUTCH-814, etc.), but
>> that's natural.
>
> No, not we. Me. I found a bug, told you about it and provided the fix.
> Before I did that, I told you that your release candidate was broken. Just
> like I'm still saying, unless I'm doing something grossly wrong, it's still
> broken.

Right, gotcha. I didn't map that you had been the guy that contributed the
patch. Thanks for that.

>> Good question. I'm not super familiar with the nightly tests, but my guess
>> is that the scripts are outside the context of the tests since most of the
>> tests use Junit and are testing the Java API and classes. I may be wrong
>> though.
>
> Then that means that you need more unit and process tests that are run
> before a release candidate. If the nightly build tests are this weak, you
> can't depend on them to tell you all you need to know. It would keep you
> from creating a release candidate that was plainly broken in a most
> fundamental way.

Hmmm...I'm not sure anything about Nutch is weak, and that's really a
subjective/qualitative judgment. If you have ideas about how to improve the
tests, we'd welcome them. Until then, the 100+ tests that exist are fairly
decent, at least in my experience using Nutch.
Furthermore, in my software development experience, I've never seen 100%
coverage on tests -- it simply doesn't work that way.

>> Ready in the sense of the release is a consensus decision made by the
>> developers and community based on a variety of things:
>>
>> * issues being resolved in JIRA of a particular priority
>> * time in-between last release
>> * community requesting a release
>> * according to some pre-defined schedule
>> * making a feature release to get out new interesting features
>> etc etc.
>
> Most of the above are Marketing issues, not release issues, but I'm not on
> the staff here, so I won't critique. You have your priorities, that's good
> enough for me.

Marketing issues? Huh? They are in fact release criteria, in just about every
software development job I've worked in.

> One of the pleasures of Open Source is that there is no marketing department
> forcing you to release a product that is not yet ready. We've all lived with
> products like that. In the short run it's not fun. And in the long run it
> will give you a bad reputation.

That's probably why the Nutch 1.1 RC hasn't turned into the Nutch 1.1 release.
We work with the community during the release process, just like we do during
development.

> I have found at least two bugs, one of them I tracked down and fixed and
> submitted code. The other I don't even know where to start the hunt and that
> is what lead me to post some questions here.
>
> I'd appreciate it if someone knowledgeable would look at those questions
> from last week and give me some feedback.

Sure, hopefully you'll find the answer you're looking for. In the meantime,
it's my job as the RM to keep cutting release candidates that at least pass
the basic criteria for release, and right now that involves what I mentioned
above.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++