Marija, great.
I have a small piece of advice about your regular expressions, JFYI.
Java tools for parsing code already exist and can be reused. For now,
I suggest leaving things as they are and just taking a look at ANTLR
and JavaCC [1], [2], [3], [4]. I believe our parser design does not
prevent us from plugging in these popular grammar compilers later.
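
To make the concern concrete: a regular expression has no notion of
string literals, so the JAVA_COMMENT_REGEX from r39 (quoted below)
reports a "comment" inside an ordinary URL literal. A lexer generated
by ANTLR or JavaCC tracks literals and would not. A minimal standalone
check follows. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentRegexPitfall {
    // JAVA_COMMENT_REGEX as committed in r39, copied from the quoted diff
    static final Pattern JAVA_COMMENT = Pattern.compile(
            "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/[\\n\\r]*)|(//.*[\\n\\r])");

    // Hypothetical helper: returns every substring the regex reports as a comment
    static List<String> findComments(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = JAVA_COMMENT.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // This line has no comment at all, only a string literal containing "//"
        String source = "String url = \"http://rat.apache.org\";\n";
        for (String c : findComments(source)) {
            System.out.println("false positive: " + c.trim());
        }
    }
}
```

The regex starts its "comment" at the "//" inside the literal and runs
to the end of the line, closing quote and semicolon included.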

[1] http://www.antlr.org/ (please note that the latest version of this
tool has a license which is incompatible with APL)
[2] http://www.antlr.org/grammar/list
[3] http://javacc.dev.java.net/
[4] http://javacc.dev.java.net/servlets/ProjectDocumentList?folderID=110



On Mon, Jul 13, 2009 at 12:35 AM, <[email protected]> wrote:
> Author: maka82
> Date: Sun Jul 12 13:34:46 2009
> New Revision: 39
>
> Modified:
>   trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java
>   trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java
>
> Log:
> Decomposer of words is improved.
>
> Modified: trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java
> ==============================================================================
> --- trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java  (original)
> +++ trunk/src/main/java/org/apache/rat/pd/core/SourceCodeAnalyser.java  Sun Jul 12 13:34:46 2009
> @@ -201,10 +201,11 @@
>        private StringBuffer combineTokens(String[] tokens, int start, int end) {
>
>                StringBuffer sb = new StringBuffer();
> -               for (int k = start; k <= end; k++) {
> +               for (int k = start; k < end; k++) {
>                        sb.append(tokens[k]);
>                        sb.append(" ");
>                }
> +               sb.append(tokens[end]);
>                return sb;
>        }
>
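
If I read the combineTokens change right, the method now joins
tokens[start..end], both ends inclusive, with single spaces and no
trailing space. A standalone copy to check that (the class name is
hypothetical; the method body is the one from the diff):

```java
class CombineTokensSketch {
    // Copy of combineTokens after r39: joins tokens[start..end], both inclusive,
    // separated by single spaces, with no trailing space
    static StringBuffer combineTokens(String[] tokens, int start, int end) {
        StringBuffer sb = new StringBuffer();
        for (int k = start; k < end; k++) {
            sb.append(tokens[k]);
            sb.append(" ");
        }
        // The last token is appended without a separator
        sb.append(tokens[end]);
        return sb;
    }

    public static void main(String[] args) {
        String[] tokens = {"int", "a", "=", "0;"};
        System.out.println("[" + combineTokens(tokens, 1, 3) + "]");
        System.out.println("[" + combineTokens(tokens, 2, 2) + "]");
    }
}
```

Note that when start > end the loop is skipped and only tokens[end] is
returned, so callers should keep start <= end.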
> @@ -212,6 +213,7 @@
>         * extract tokens
>         */
>        private String[] tokeniseString(String file) {
> +               file = file.replaceAll("\\n", "\n ");
>                String[] tokens = file.split(STRING_DELIMETER_REGEX);
>                // this simple tokeniser returns array {""} when "" is tokenised
>                // I must avoid that behavior
>
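
As far as I can tell, the replaceAll line puts a space after every line
break so that a newline always ends a token; that only matters if
STRING_DELIMETER_REGEX does not itself match "\n". The real delimiter is
not shown in the diff, so the sketch below assumes a hypothetical " +"
(one or more spaces), and the empty-input guard is only my guess at the
fix the comment talks about:

```java
import java.util.Arrays;

class TokeniseSketch {
    // Hypothetical stand-in: the real STRING_DELIMETER_REGEX is not in the diff
    static final String STRING_DELIMETER_REGEX = " +";

    static String[] tokenise(String file, boolean padNewlines) {
        if (padNewlines) {
            // r39 change: a space after each newline makes the break end a token
            file = file.replaceAll("\\n", "\n ");
        }
        String[] tokens = file.split(STRING_DELIMETER_REGEX);
        // "".split(...) returns {""}; return an empty array instead (assumed fix)
        if (tokens.length == 1 && tokens[0].isEmpty()) {
            return new String[0];
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "int a;\nint b;";
        // Escape newlines so each token is visible on one output line
        System.out.println(Arrays.toString(tokenise(src, false)).replace("\n", "\\n"));
        System.out.println(Arrays.toString(tokenise(src, true)).replace("\n", "\\n"));
        System.out.println(tokenise("", true).length);
    }
}
```

Without the padding, "a;\nint" comes out as a single token; with it, the
newline stays attached to "a;" and "int" becomes a token of its own.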
> Modified: trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java
> ==============================================================================
> --- trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java  (original)
> +++ trunk/src/main/java/org/apache/rat/pd/heuristic/comment/JavaCommentHeuristicChecker.java  Sun Jul 12 13:34:46 2009
> @@ -33,7 +33,7 @@
>         * This regular expression match comments in Java. More info on:{...@link}
>         * http://ostermiller.org/findcomment.html
>         */
> -       private static final String JAVA_COMMENT_REGEX = "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(//.*[\\n\\r])";
> +       private static final String JAVA_COMMENT_REGEX = "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/[\\n\\r]*)|(//.*[\\n\\r])";
>
>        public JavaCommentHeuristicChecker(int limit) {
>                super(JAVA_COMMENT_REGEX, limit);
>
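
For the record, my reading of the regex change: the block-comment branch
now also swallows any line breaks that follow "*/", while the "//"
branch still requires a line break after the comment, so a line comment
on the last line of a file with no final newline is not matched. A
standalone comparison of the two patterns from the diff (class and
method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentRegexChange {
    // JAVA_COMMENT_REGEX before r39
    static final String OLD_REGEX =
            "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(//.*[\\n\\r])";
    // JAVA_COMMENT_REGEX after r39: a block comment also eats following line breaks
    static final String NEW_REGEX =
            "(/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/[\\n\\r]*)|(//.*[\\n\\r])";

    // Returns the first comment match, or null when the regex finds none
    static String firstComment(String regex, String source) {
        Matcher m = Pattern.compile(regex).matcher(source);
        return m.find() ? m.group() : null;
    }

    // Makes newlines visible when printing
    static String show(String s) {
        return s == null ? "no match" : s.replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String block = "/* header */\n\nclass A {}";
        System.out.println(show(firstComment(OLD_REGEX, block)));
        System.out.println(show(firstComment(NEW_REGEX, block)));
        // A "//" comment at end of input without a newline is missed
        System.out.println(show(firstComment(NEW_REGEX, "int x; // end")));
    }
}
```

That last case is one more thing a generated lexer would handle for free.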



-- 
With best regards,
Alexei Fedotov,
http://www.telecom-express.ru/
http://harmony.apache.org/
http://code.google.com/p/openmeetings/
