Hi, 

I'm posting this here before entering a bug in bugzilla just to make sure
that it's not related to bug #125...

My results were the same in both oro-dev-2.0.2-dev-2 and oro-dev-2.0.3.


Patterns compiled with MULTILINE_MASK that use the end anchor '$' are not
matching at line endings in non-UNIX files. (I don't have a Macintosh to test
with, but I've confirmed this on Windows NT.)


To reproduce with the regular expression
   /<matching pattern>$/m

   - Slurp a file into a string (using the system's line.separator between lines)

   - Try to match the file string (which contains a line that matches)
   
     + On Solaris, the pattern matches without any problems.
     + On WinNt, the pattern doesn't match.


The workaround I'm using now is to write the regular expression as
   /<matching pattern>([\r\n]|$)/sm
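
As a rough illustration of why the workaround helps, here is a sketch that
uses java.util.regex as a stand-in rather than ORO itself. It assumes that
Pattern.UNIX_LINES (only '\n' counts as a line terminator) approximates the
behaviour described above; the class name EndAnchorSketch and the helper
found() are made up for this example.

```java
import java.util.regex.Pattern;

public class EndAnchorSketch {

    // Assumption: MULTILINE + UNIX_LINES mimics an engine that only
    // recognises '\n' as a line ending. This is java.util.regex, not ORO.
    static final int NEWLINE_ONLY = Pattern.MULTILINE | Pattern.UNIX_LINES;

    // Returns true if the regex is found anywhere in the text.
    static boolean found(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String unixText = "stops at x\nnext line";
        String winText  = "stops at x\r\nnext line";

        // 'x' is directly followed by '\n', so the end anchor matches.
        System.out.println(found("x$", NEWLINE_ONLY, unixText)); // true

        // 'x' is followed by '\r', which is not treated as a line ending.
        System.out.println(found("x$", NEWLINE_ONLY, winText));  // false

        // The workaround accepts '\r' or '\n' explicitly after the 'x'.
        System.out.println(found("x([\\r\\n]|$)",
                                 NEWLINE_ONLY | Pattern.DOTALL,
                                 winText));                       // true
    }
}
```

Note that the workaround consumes the line-ending character as part of the
match, which a bare '$' would not do.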


I see that there are many checks in the oro code that compare a character
to '\n'. My suggested fix is to create a helper class (I don't know which
package it belongs in) with a static method "boolean isLineEnding( char )"
or similar that could replace all the current "<char> == '\n'" code. I'd be
happy to implement this (with a little guidance as to where it belongs).
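
A minimal sketch of what such a helper might look like. The class name
LineEndings is hypothetical and nothing here is taken from the ORO sources;
it only shows the idea of replacing a bare "== '\n'" comparison.

```java
/**
 * Hypothetical helper; the class name and its placement are assumptions,
 * not part of the ORO code base.
 */
public final class LineEndings {

    private LineEndings() { }

    /** Returns true for '\n' or '\r', the characters used in UNIX, Mac and Win32 line endings. */
    public static boolean isLineEnding(char c) {
        return c == '\n' || c == '\r';
    }

    public static void main(String[] args) {
        String win = "ends with x\r\nnext line";
        char afterX = win.charAt(win.indexOf('x') + 1);

        // The character after 'x' is '\r', so the bare comparison rejects it.
        System.out.println(afterX == '\n');        // false
        System.out.println(isLineEnding(afterX));  // true
    }
}
```

One thing a real implementation would have to decide is how to treat the
"\r\n" pair, since a per-character check sees two line endings there.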



Notes on test results using the attached TestEndAnchor code:

  On a Solaris 2.6 machine, there were two failures when trying to match the
  first pattern. On a WinNT 4.0 machine, there were three failures when trying
  to match the same pattern. The added failure was from the string that uses
  the System.getProperty( "line.separator" ) for its line ending.


Thanks,
Ed.


P.S.

Daniel, I would have liked to give you the regular expressions I used in my
timing test code (from mid-May), but I was advised against doing that because
the REs were very application-specific.

import org.apache.oro.text.regex.Pattern;
import org.apache.oro.text.regex.PatternMatcher;
import org.apache.oro.text.regex.Perl5Matcher;
import org.apache.oro.text.regex.Perl5Compiler;
import org.apache.oro.text.regex.PatternMatcherInput;
import org.apache.oro.text.regex.MalformedPatternException;


/**
 * Test the regular expression end anchor using different line endings.
 *
 * @author Ed Chidester
 */
public class TestEndAnchor {

    /**
     * An array of Perl-syntax regular expression strings for testing.
     * Regular expressions are tested against both the
     * {@link #failString__  failing  string} and the
     * {@link #matchString__ matching string} arrays.
     */
    private static String [ ] reString__    = {
            "/x$/m"             ,
            "/x([\\r\\n]|$)/sm" ,
        };

    /**
     * An array of matching strings for testing.
     * This array is used for the tests performed in the
     * {@link #main main testing method}.
     * Each element is expected to match every pattern in
     * the {@link #reString__ regular expression array}.
     */
    private static String [ ] matchString__ = {
            "This line ends with x"                                        ,
            "This also stops at x\r"
            + "but it uses a \\r char like a Macintosh file would"         ,
            "This also stops at x\n"
            + "but it uses a \\n char like a Solaris file would"           ,
            "This also stops at x\r\n"
            + "but it uses both \\r and \\n like Win32 files would"        ,
            "This line also stops with x"
            + System.getProperty( "line.separator" )
            + "and it uses the system-dependent line ending character(s)." ,
        };

    /**
     * An array of failing strings for testing.
     * This array is used for the tests performed in the
     * {@link #main main testing method}.
     * Each element is expected to fail against every pattern in
     * the {@link #reString__ regular expression array}.
     */
    private static String [ ] failString__  = {
            "This line ends with the wrong character"     ,
            "This one also ends with the wrong character" ,
            "Wrong characters abound in the failString__\r"
            + "array"                                     ,
            "Wrong characters abound in the failString__\r"
            + "array"                                     ,
            "Neither tubas nor xylophones should cause"
            + System.getProperty( "line.separator" )
            + "the regular expressions to match."
        };

    /**
     * <p>
     * Main test method
     * </p>
     *
     */
    public static void main( String [ ] args ) {

        int        i;
        int        x;
        int        size           = matchString__.length;
        int        oroMatchFlags;
        int        firstIndex;
        int        lastIndex;
        int        finalIndex;
        Pattern [] oroObject      = new Pattern [ reString__.length ];
        Perl5Matcher  oroMatcher;
        Perl5Compiler oroCompiler;

        if ( failString__.length  < size ) {
            size = failString__.length;
        }

        try {
            // Initialize the Oro objects
            oroMatcher  = new Perl5Matcher( );
            oroCompiler = new Perl5Compiler( );

            // ---------------------------------------------
            // Initialize all the regular expression objects
            // ---------------------------------------------
            for ( i = 0 ; i < reString__.length ; i++ ) {
                oroMatchFlags = Perl5Compiler.DEFAULT_MASK;
                firstIndex    = reString__[ i ].indexOf(     '/' );
                lastIndex     = reString__[ i ].lastIndexOf( '/' );
                finalIndex    = reString__[ i ].length( );
                if ( lastIndex <= firstIndex ) {
                    System.err.println( "Error reading regular expression \""
                                        + reString__[ i ] + "\"" );
                    lastIndex  = reString__[ i ].length( );
                }
                // Apply any trailing option flags (i, m, s)
                for ( int j = lastIndex + 1 ; j < finalIndex ; j++ ) {
                    if      ( reString__[ i ].charAt( j ) == 'i' ) {
                        // // Testing printout...
                        // System.out.println( "Case independent\t"
                        //                     + reString__[ i ] );

                        oroMatchFlags |= Perl5Compiler.CASE_INSENSITIVE_MASK;
                    }
                    else if ( reString__[ i ].charAt( j ) == 'm' ) {
                        // // Testing printout...
                        // System.out.println( "Multiline match\t"
                        //                     + reString__[ i ] );

                        oroMatchFlags |= Perl5Compiler.MULTILINE_MASK;
                    }
                    else if ( reString__[ i ].charAt( j ) == 's' ) {
                        // // Testing printout...
                        // System.out.println( "Singleline match\t"
                        //                     + reString__[ i ] );
                        oroMatchFlags |= Perl5Compiler.SINGLELINE_MASK;
                    }
                    else {
                        System.err.println( "Regular expression option \"/"
                                            + reString__[ i ].charAt( j )
                                            + "\" is being ignored" );
                    }
                } // End for j
                oroObject[ i ] = oroCompiler.compile(
                    reString__[i].substring( (firstIndex + 1), lastIndex ) ,
                    oroMatchFlags                                          );
            } // End for i

            // -------------
            // Begin testing
            // -------------

            // Testing printout...
            System.out.println( "About to begin testing" );


            for ( x = 0 ; x < reString__.length ; x++ ) {
                for ( i = 0 ; i < size ; i++ ) {
                    int beginIndex = 0;
                    boolean legitimateMatch   = false;
                    boolean illegitimateMatch = false;

                    PatternMatcherInput pmiMatch =
                        new PatternMatcherInput( matchString__[ i ] );

                    PatternMatcherInput pmiFail  =
                        new PatternMatcherInput( failString__[  i ] );

                    if ( ! oroMatcher.contains( pmiMatch , oroObject[x] ) ) {

                        System.err.println("Error with Perl5 match["+i+"]");
                        System.err.println( reString__[    x ] );
                        System.err.println( matchString__[ i ] );
                    }
                    else {
                        // Testing printout...
                        System.out.println( "match[ " + i + " ] okay" );
                    }

                    if (   oroMatcher.contains( pmiFail  , oroObject[x] ) ) {

                        System.err.println("Error with Perl5  fail["+i+"]");
                        System.err.println( reString__[   x ] );
                        System.err.println( failString__[ i ] );
                    }
                    else {
                        // Testing printout...
                        System.out.println( "fail[ " + i + " ] okay" );
                    }

                } // End for i
            } // End for x
        }
        catch ( Exception e ) {
            System.err.println( e + " caught while running test" );
            e.printStackTrace( System.err );
            System.exit( 1 );
        }
        // Testing printout...
        System.out.println( "Finished testing" );


    } // End main method
    
} // TestEndAnchor class
