Re: Add ".settings" to svn:ignore on root Nutch folder?
> An end-to-end unit test would help coverage a lot: something that > performs a simple crawl and runs queries against it. Ideally this would > start an in-process web server serving test content, crawl that content, > then start a web server serving queries. This could be run in both > local and pseudo-distributed mode. I agree this kind of functional tests are needed. But having a lot of unit tests on basis classes has a lot of advantages: 1. If limit cases (null params, and so on) are tested it provides a kind of specification. So if a developper changes such limit cases then unit tests fails, and we immediatly know that we are not backward compatible. 2. Finding class level bugs is really more easy with unit tests rather than with functional tests (it is really painful to find a bug in a low level class such as StringUtil with functional tests, whereas it can be quickly checked by a good unitary test) In other words the most code is covered by unit tests, the more it is easy to find problems that occurs during functional tests. But sure, some tests that requires a lot of contextual objects are not easy to implement and functional tests are the solution. (we can perhaps plan to integrate such tests for 1.0) Jérôme
Re: Add ".settings" to svn:ignore on root Nutch folder?
Jérôme Charron wrote: I absolutely agree Dawid. But I don't think Nutch has enought human resources to have a Q&A person. I will make a try to integrate a code coverage tool, and see if it gives us some good indices on unit tests needed efforts. I think more unit tests could go a long way towards improving things. Ideally we'd also run nightly builds on Windows. An end-to-end unit test would help coverage a lot: something that performs a simple crawl and runs queries against it. Ideally this would start an in-process web server serving test content, crawl that content, then start a web server serving queries. This could be run in both local and pseudo-distributed mode. Doug
Re: Add ".settings" to svn:ignore on root Nutch folder?
> > My feeling was simply that the closest we are to Nutch-1.0, the more be > need > > some Q&A metrics (for us and for nutch users). No? > I absolutely agree Jérôme, really. It's just that developers usually > tend to hook up dozens of Q&A plugins and never look at what they output > (that's the usual scenario with Maven-built projects that I observed). Yes, that's right...;-) What I think we need is a Q&A _person_ rather than just tools. But I'm > always a bit skeptical, don't take it personally ;) I absolutely agree Dawid. But I don't think Nutch has enought human resources to have a Q&A person. I will make a try to integrate a code coverage tool, and see if it gives us some good indices on unit tests needed efforts. Jérôme -- http://motrech.free.fr/ http://www.frutch.org/
Re: Add ".settings" to svn:ignore on root Nutch folder?
My feeling was simply that the closest we are to Nutch-1.0, the more be need some Q&A metrics (for us and for nutch users). No? I absolutely agree Jérôme, really. It's just that developers usually tend to hook up dozens of Q&A plugins and never look at what they output (that's the usual scenario with Maven-built projects that I observed). What I think we need is a Q&A _person_ rather than just tools. But I'm always a bit skeptical, don't take it personally ;) D.
Re: Add ".settings" to svn:ignore on root Nutch folder?
> With code coverage... I don't know. It's > up to you guys -- you spend much more time on Nutch code than I do and > you know best what is needed and what isn't. My feeling was simply that the closest we are to Nutch-1.0, the more be need some Q&A metrics (for us and for nutch users). No? Jérôme -- http://motrech.free.fr/ http://www.frutch.org/
PMD integration (was: Re: Add ".settings" to svn:ignore on root Nutch folder?)
Hi, I have downloaded the patches and generally like them (I had only read them not applied yet). I have one question - am I reading it correctly that right now it is checking only main code (without plugins?). P. Dawid Weiss wrote: All right, I though I'd give it a go since I have a spare few minutes. Jura is off, so I made the patches available here -- http://ophelia.cs.put.poznan.pl/~dweiss/nutch/ pmd.patch is the build file patch and libraries (binaries are in a separate zip file pmd-ext.zip). pmd-fixes.patch fixes the current core code to go through pmd smoothly. I removed obvious unused code, but left FIXME comments where I wasn't sure if the removal can cause side effects (in these places PMD warnings are suppressed with NOPMD comments). I also discovered a bug in PMD... eh... nothing's perfect. https://sourceforge.net/tracker/?func=detail&atid=479921&aid=1465574&group_id=56262 D. Piotr Kosiorowski wrote: +1 - I offer my help - we can coordinate it and I can do a part of work. I will also try to commit your patches quickly. Piotr On 4/6/06, Dawid Weiss <[EMAIL PROTECTED]> wrote: Other options (raised on the Hadoop list) are Checkstyle: PMD seems to be the best choice for an Apache project and they all seem to perform at a similar level. Anything that generates a lot of false positives is bad: it either causes us to skip analysis of lots of files, or ignore the warnings. Skipping the JavaCC-generated classes is reasonable, but I'm wary of skipping much else. I thought a bit about this. The warnings PMD may actually make sense to fix. Take a look at maxDoc here: class LuceneQueryOptimizer { private static class LimitExceeded extends RuntimeException { private int maxDoc; public LimitExceeded(int maxDoc) { this.maxDoc = maxDoc; } } ... maxDoc is accessed from LuceneQueryOptimizer which requires a synthetic accessor in LimitExceeded. It also may look confusing because you declare a field private to a class, but use it from the outside... changing declarations to something like this: class LuceneQueryOptimizer { private static class LimitExceeded extends RuntimeException { final int maxDoc; public LimitExceeded(int maxDoc) { this.maxDoc = maxDoc; } } ... removes the warning and also seems to make more sense (note that package scope of maxDoc doesn't really expose it much more than before because the entire class is private). So... if you agree to change existing warnings as shown above (there's not that many) then integrating PMD with a set of sensible rules may help detecting bad smells in the future (I couldn't resist -- it really is called like this in software engineering :). I only used dead code detection ruleset for now, other rulesets can be checked and we will see if they help or quite the contrary. If developers agree to the above I'll create a patch together with what needs to be fixed to cleanly compile. Otherwise I see little sense in integrating PMD. D.
Re: Add ".settings" to svn:ignore on root Nutch folder?
All right, I though I'd give it a go since I have a spare few minutes. Jura is off, so I made the patches available here -- http://ophelia.cs.put.poznan.pl/~dweiss/nutch/ pmd.patch is the build file patch and libraries (binaries are in a separate zip file pmd-ext.zip). pmd-fixes.patch fixes the current core code to go through pmd smoothly. I removed obvious unused code, but left FIXME comments where I wasn't sure if the removal can cause side effects (in these places PMD warnings are suppressed with NOPMD comments). I also discovered a bug in PMD... eh... nothing's perfect. https://sourceforge.net/tracker/?func=detail&atid=479921&aid=1465574&group_id=56262 D. Piotr Kosiorowski wrote: +1 - I offer my help - we can coordinate it and I can do a part of work. I will also try to commit your patches quickly. Piotr On 4/6/06, Dawid Weiss <[EMAIL PROTECTED]> wrote: Other options (raised on the Hadoop list) are Checkstyle: PMD seems to be the best choice for an Apache project and they all seem to perform at a similar level. Anything that generates a lot of false positives is bad: it either causes us to skip analysis of lots of files, or ignore the warnings. Skipping the JavaCC-generated classes is reasonable, but I'm wary of skipping much else. I thought a bit about this. The warnings PMD may actually make sense to fix. Take a look at maxDoc here: class LuceneQueryOptimizer { private static class LimitExceeded extends RuntimeException { private int maxDoc; public LimitExceeded(int maxDoc) { this.maxDoc = maxDoc; } } ... maxDoc is accessed from LuceneQueryOptimizer which requires a synthetic accessor in LimitExceeded. It also may look confusing because you declare a field private to a class, but use it from the outside... changing declarations to something like this: class LuceneQueryOptimizer { private static class LimitExceeded extends RuntimeException { final int maxDoc; public LimitExceeded(int maxDoc) { this.maxDoc = maxDoc; } } ... removes the warning and also seems to make more sense (note that package scope of maxDoc doesn't really expose it much more than before because the entire class is private). So... if you agree to change existing warnings as shown above (there's not that many) then integrating PMD with a set of sensible rules may help detecting bad smells in the future (I couldn't resist -- it really is called like this in software engineering :). I only used dead code detection ruleset for now, other rulesets can be checked and we will see if they help or quite the contrary. If developers agree to the above I'll create a patch together with what needs to be fixed to cleanly compile. Otherwise I see little sense in integrating PMD. D.
Re: Add ".settings" to svn:ignore on root Nutch folder?
+1 - I offer my help - we can coordinate it and I can do a part of work. I will also try to commit your patches quickly. Piotr On 4/6/06, Dawid Weiss <[EMAIL PROTECTED]> wrote: > > > > Other options (raised on the Hadoop list) are Checkstyle: > > PMD seems to be the best choice for an Apache project and they all seem > to perform at a similar level. > > > Anything that generates a lot of false positives is bad: it either > > causes us to skip analysis of lots of files, or ignore the warnings. > > Skipping the JavaCC-generated classes is reasonable, but I'm wary of > > skipping much else. > > I thought a bit about this. The warnings PMD may actually make sense to > fix. Take a look at maxDoc here: > > class LuceneQueryOptimizer { > >private static class LimitExceeded extends RuntimeException { > private int maxDoc; > public LimitExceeded(int maxDoc) { this.maxDoc = maxDoc; } >} > ... > > maxDoc is accessed from LuceneQueryOptimizer which requires a synthetic > accessor in LimitExceeded. It also may look confusing because you > declare a field private to a class, but use it from the outside... > changing declarations to something like this: > > class LuceneQueryOptimizer { > >private static class LimitExceeded extends RuntimeException { > final int maxDoc; > public LimitExceeded(int maxDoc) { this.maxDoc = maxDoc; } >} > ... > > removes the warning and also seems to make more sense (note that package > scope of maxDoc doesn't really expose it much more than before because > the entire class is private). > > So... if you agree to change existing warnings as shown above (there's > not that many) then integrating PMD with a set of sensible rules may > help detecting bad smells in the future (I couldn't resist -- it really > is called like this in software engineering :). I only used dead code > detection ruleset for now, other rulesets can be checked and we will see > if they help or quite the contrary. > > If developers agree to the above I'll create a patch together with what > needs to be fixed to cleanly compile. Otherwise I see little sense in > integrating PMD. > > D. > > > >
Re: Add ".settings" to svn:ignore on root Nutch folder?
> Other options (raised on the Hadoop list) are Checkstyle: PMD seems to be the best choice for an Apache project and they all seem to perform at a similar level. Anything that generates a lot of false positives is bad: it either causes us to skip analysis of lots of files, or ignore the warnings. Skipping the JavaCC-generated classes is reasonable, but I'm wary of skipping much else. I thought a bit about this. The warnings PMD may actually make sense to fix. Take a look at maxDoc here: class LuceneQueryOptimizer { private static class LimitExceeded extends RuntimeException { private int maxDoc; public LimitExceeded(int maxDoc) { this.maxDoc = maxDoc; } } ... maxDoc is accessed from LuceneQueryOptimizer which requires a synthetic accessor in LimitExceeded. It also may look confusing because you declare a field private to a class, but use it from the outside... changing declarations to something like this: class LuceneQueryOptimizer { private static class LimitExceeded extends RuntimeException { final int maxDoc; public LimitExceeded(int maxDoc) { this.maxDoc = maxDoc; } } ... removes the warning and also seems to make more sense (note that package scope of maxDoc doesn't really expose it much more than before because the entire class is private). So... if you agree to change existing warnings as shown above (there's not that many) then integrating PMD with a set of sensible rules may help detecting bad smells in the future (I couldn't resist -- it really is called like this in software engineering :). I only used dead code detection ruleset for now, other rulesets can be checked and we will see if they help or quite the contrary. If developers agree to the above I'll create a patch together with what needs to be fixed to cleanly compile. Otherwise I see little sense in integrating PMD. D.
Re: Add ".settings" to svn:ignore on root Nutch folder?
Other options (raised on the Hadoop list) are Checkstyle: http://checkstyle.sourceforge.net/ and FindBugs: http://findbugs.sourceforge.net/ Although these are both under LGPL and thus harder to include in Apache projects. Anything that generates a lot of false positives is bad: it either causes us to skip analysis of lots of files, or ignore the warnings. Skipping the JavaCC-generated classes is reasonable, but I'm wary of skipping much else. Sigh. Doug Dawid Weiss wrote: Ok, PMD seems like a good idea. I've added it to the build file. Unused code detection shows a few catches (javacc-generated classes need to be ignored because they contain a lot of junk), but unfortunately it also displays false positives such as in: MapWritable.java 429 {Avoid unused private fields such as 'fKeyClassId'} This field is private but is used in an outside class (through a synthetic accessor I presume, so a simple syntax tree analysis PMD does is insufficient to catch it). These things would need to be marked in the code as ignorable... Do you want me to create a JIRA issue for this, Doug? Or should we drop the subject? Oh, I forgot to say this: PMD's jars add a minimum of 1MB to the codebase (Xerces can be reused). D.
Re: Add ".settings" to svn:ignore on root Nutch folder?
I'm a fan of automated testing and code analysis utilities, but I must say they only make sense if people actually use them and look at their results. So it's not really just about integration -- it's about looking at the results of these tools. PMD is neat because it can simply interrupt your build process so you'll have to either fix the warning or explicitly mark it as ignored. With code coverage... I don't know. It's up to you guys -- you spend much more time on Nutch code than I do and you know best what is needed and what isn't. Let me know about PMD. I'll create the patch tomorrow if there's a consensus on if and how we should use it. For those impatient, the patch is in the attachment. Place the required PMD JARs in lib/pmd-ext/ and run 'ant pmd'. D. Jérôme Charron wrote: I would not be opposed to integrating PMD or something similar into Nutch's build.xml. What do others think? Any volunteers? I'll do it. I meant to see PMD anyway so it'll be a good exercise. Dawid, what about integrating a Code Coverage Tool like EMMA ( http://emma.sourceforge.net/) while integrating PMD ? Jérôme Index: build.xml === --- build.xml (revision 391739) +++ build.xml (working copy) @@ -198,6 +198,34 @@ + + + + + + + + + + + unusedcode + + + + + + + + + + + + + FAILURE: PMD shows ${pmd.failures} rule violations. See ${pmd.report} for details. + + +
Re: Add ".settings" to svn:ignore on root Nutch folder?
> > I would not be opposed to integrating PMD or something similar into > > Nutch's build.xml. What do others think? Any volunteers? > I'll do it. I meant to see PMD anyway so it'll be a good exercise. Dawid, what about integrating a Code Coverage Tool like EMMA ( http://emma.sourceforge.net/) while integrating PMD ? Jérôme
Re: Add ".settings" to svn:ignore on root Nutch folder?
Ok, PMD seems like a good idea. I've added it to the build file. Unused code detection shows a few catches (javacc-generated classes need to be ignored because they contain a lot of junk), but unfortunately it also displays false positives such as in: MapWritable.java 429 {Avoid unused private fields such as 'fKeyClassId'} This field is private but is used in an outside class (through a synthetic accessor I presume, so a simple syntax tree analysis PMD does is insufficient to catch it). These things would need to be marked in the code as ignorable... Do you want me to create a JIRA issue for this, Doug? Or should we drop the subject? Oh, I forgot to say this: PMD's jars add a minimum of 1MB to the codebase (Xerces can be reused). D.
Re: Add ".settings" to svn:ignore on root Nutch folder?
> PMD looks like a useful such tool: > http://pmd.sourceforge.net/ant-task.html > I would not be opposed to integrating PMD or something similar into > Nutch's build.xml. What do others think? Any volunteers? +1 (Very configurable, very good tool!)
Re: Add ".settings" to svn:ignore on root Nutch folder?
One can presumably disable such minor warnings in Eclipse. Arguably the bug is that Eclipse warns about such things by default, rather than in a 'pedantic' mode. I agree -- some of them are really annoying. Plus, Eclipse has been having notorious problems showing warnings for unused parameters in overriden methods... But I still think some of the warnings can be valuable and your idea with PMD is a good one. One caution: we have run into problems where includes were removed because a tool said they were unused, but they were required for the Javadoc. So code-analysis tools are not infallible! Eclipse deals with these properly -- I use it all the time. I believe it also shows warnings for classes referenced in JavaDocs and not imported. I would not be opposed to integrating PMD or something similar into Nutch's build.xml. What do others think? Any volunteers? I'll do it. I meant to see PMD anyway so it'll be a good exercise. D.
Re: Add ".settings" to svn:ignore on root Nutch folder?
Dawid Weiss wrote: Eclipse shows a good few warnings in the present codebase. They are usually minor things like malformed JavaDocs, unused variables and such. One can presumably disable such minor warnings in Eclipse. Arguably the bug is that Eclipse warns about such things by default, rather than in a 'pedantic' mode. Does it make sense to fix them? I can correct at least some of them when I go through the code, but such patches are tedious to review and need to be applied promptly to avoid conflicts. With folks using a variety of tools to edit their code, some such warnings are inevitable. Fixing these can clutter patches whose intent is to fix something else. One way to deal with these is to ocassionally make a big commit which fixes all of them across the whole project (according to a particular warning system, anyway) and makes no other changes. One caution: we have run into problems where includes were removed because a tool said they were unused, but they were required for the Javadoc. So code-analysis tools are not infallible! Alternately, we could try to integrate some tool into the ant build process that checks for unused variables, etc. Then we could make "warning-free" a requirement for commits. PMD looks like a useful such tool: http://pmd.sourceforge.net/ant-task.html I would not be opposed to integrating PMD or something similar into Nutch's build.xml. What do others think? Any volunteers? Doug
Re: Add ".settings" to svn:ignore on root Nutch folder?
It works fine Doug, thanks. Please tell me if it is correct, since I don't use Eclipse. I'm at the vi (or rather vim) level very often, but emacs is still ahead of me ;) And on a more serious note, Eclipse shows a good few warnings in the present codebase. They are usually minor things like malformed JavaDocs, unused variables and such. Does it make sense to fix them? I can correct at least some of them when I go through the code, but such patches are tedious to review and need to be applied promptly to avoid conflicts. D.
Re: Add ".settings" to svn:ignore on root Nutch folder?
Dawid Weiss wrote: Would it be a problem to add Eclipse's ".settings" folder to ignored files (since Eclipse project files are already there anyway). This file is used when one wants to override default project configuration (code formatting, specific JVM etc). I just made this change. Please tell me if it is correct, since I don't use Eclipse. Thanks, Doug
Add ".settings" to svn:ignore on root Nutch folder?
Would it be a problem to add Eclipse's ".settings" folder to ignored files (since Eclipse project files are already there anyway). This file is used when one wants to override default project configuration (code formatting, specific JVM etc). Dawid