Marija, great. I have a small piece of advice regarding your regular expressions, just FYI. Java tools for parsing code already exist and can be reused. For now, I suggest leaving things as they are and just taking a look at ANTLR and JavaCC [1], [2], [3], [4]. I believe our parser design does not prevent us from plugging in these popular grammar compilers later.
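
To illustrate where a grammar-based tool might eventually help, here is a minimal sketch (not part of the commit). It copies the updated JAVA_COMMENT_REGEX from the quoted revision; the class name CommentRegexDemo, the helper findComments and the sample input are hypothetical and exist only for this illustration. On that input the pattern also reports the "//..." part of a string literal as a comment, which a real lexer would not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the project sources.
public class CommentRegexDemo {

    // Same pattern as the updated JAVA_COMMENT_REGEX in the quoted commit.
    static final String JAVA_COMMENT_REGEX =
            "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/[\\n\\r]*)|(//.*[\\n\\r])";

    // Returns every substring of the source that the pattern treats as a comment.
    static List<String> findComments(String source) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(JAVA_COMMENT_REGEX).matcher(source);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up input: a string literal containing "//" followed by a real comment.
        String source = "String url = \"http://example.org\";\n/* real comment */\n";
        for (String comment : findComments(source)) {
            // trim() only removes the trailing line break for display.
            System.out.println("[" + comment.trim() + "]");
        }
    }
}
```

The first match starts inside the string literal, the second is the real block comment. A generated lexer tracks string literals as tokens, so it would avoid this kind of false positive.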

[1] http://www.antlr.org/ (please note that the latest version of this tool has a license which is incompatible with APL)
[2] http://www.antlr.org/grammar/list
[3] http://javacc.dev.java.net/
[4] http://javacc.dev.java.net/servlets/ProjectDocumentList?folderID=110

On Mon, Jul 13, 2009 at 12:35 AM, <[email protected]> wrote:
> Author: maka82
> Date: Sun Jul 12 13:34:46 2009
> New Revision: 39
>
> Modified:
>    trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java
>    trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java
>
> Log:
> Decomposer of words is improved.
>
> Modified: trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java
> ==============================================================================
> --- trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java (original)
> +++ trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java Sun Jul 12 13:34:46 2009
> @@ -201,10 +201,11 @@
>     private StringBuffer combineTokens(String[] tokens, int start, int end) {
>
>         StringBuffer sb = new StringBuffer();
> -        for (int k = start; k <= end; k++) {
> +        for (int k = start; k < end; k++) {
>             sb.append(tokens[k]);
>             sb.append(" ");
>         }
> +        sb.append(tokens[end]);
>         return sb;
>     }
>
> @@ -212,6 +213,7 @@
>      * extract tokens
>      */
>     private String[] tokeniseString(String file) {
> +        file = file.replaceAll("\\n", "\n ");
>         String[] tokens = file.split(STRING_DELIMETER_REGEX);
>         // this simple tokeniser returns array {""} when "" is tokenised
>         // I must avoid that behavior
>
> Modified: trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java
> ==============================================================================
> --- trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java (original)
> +++ trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java Sun Jul 12 13:34:46 2009
> @@ -33,7 +33,7 @@
>      * This regular expression match comments in Java. More info on:{...@link}
>      * http://ostermiller.org/findcomment.html
>      */
> -    private static final String JAVA_COMMENT_REGEX = "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(//.*[\\n\\r])";
> +    private static final String JAVA_COMMENT_REGEX = "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/[\\n\\r]*)|(//.*[\\n\\r])";
>
>     public JavaCommentHeuristicChecker(int limit) {
>         super(JAVA_COMMENT_REGEX, limit);

--
With best regards / с наилучшими пожеланиями,
Alexei Fedotov / Алексей Федотов,
http://www.telecom-express.ru/
http://harmony.apache.org/
http://code.google.com/p/openmeetings/
