But how can it get a value larger than 255? Even if a value does not fit
in one byte, it should be interpreted as two consecutive characters, not
as a single one. Speed is essential for the problem at hand. So what can
I do to make it either skip Unicode files altogether or ignore the
high-order bits of each character (this should work correctly for UTF-8)?
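
To make the question concrete, here is a rough sketch of the two
workarounds I have in mind. It uses only the standard library; the class
and method names (EightBitInput, decodeLatin1, latin1Reader, maskTo8Bits)
are made up for illustration and are not part of ORO. The assumption is
that a Java char is 16 bits wide, so a file decoded with the platform
default charset can yield chars above 255, whereas ISO-8859-1 maps every
byte to exactly one char in the range 0-255.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ORO: two ways to keep chars within 0-255.
class EightBitInput {

    // Option 1: decode raw bytes as ISO-8859-1, so each byte becomes
    // exactly one char with the same numeric value (0-255).
    public static char[] decodeLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1).toCharArray();
    }

    // Streaming variant of option 1 for large files.
    public static Reader latin1Reader(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.ISO_8859_1);
    }

    // Option 2: text was already decoded some other way; drop the high
    // byte of every char. This changes the characters, so matches on
    // non-Latin-1 text are no longer meaningful.
    public static void maskTo8Bits(char[] buf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            buf[i] &= 0xFF;
        }
    }

    public static void main(String[] args) throws Exception {
        // Bytes as they might appear in a JAR (ZIP) file.
        byte[] raw = {0x50, 0x4B, (byte) 0x80, (byte) 0xFF};

        int max = 0;
        for (char c : decodeLatin1(raw)) {
            max = Math.max(max, c);
        }
        System.out.println("largest char value: " + max);

        // The streaming variant yields the same values one char at a time.
        Reader r = latin1Reader(new ByteArrayInputStream(raw));
        System.out.println("first char from reader: " + r.read());
    }
}
```

If the matcher's stream input can be built from a Reader (I believe
AwkStreamInput can, but I have not checked the javadoc), the Reader from
latin1Reader could be passed to it directly, which seems cheaper than
masking every buffer by hand.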

----- Original Message -----
From: "Daniel F. Savarese" <[EMAIL PROTECTED]>
To: "ORO Developers List" <[EMAIL PROTECTED]>
Sent: Monday, January 21, 2002 1:06 PM
Subject: Re: Qusetion


>
> In message <005e01c1a242$f74f5fc0$[EMAIL PROTECTED]>,
> "Hardeep Singh" writes:
> >I have had this problem for a long time now:
> ...
> >However, when I try to use this to search into a binary file (esp. a JAR
> >file), it gives me
> >
> >Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
>         at org.apache.oro.text.awk.AwkMatcher._search(AwkMatcher.java:717)
>
> The awk package and AwkMatcher are implemented to only work with input
> containing characters with 8-bit values (0-255).  This is because it is
> a straight-up DFA implementation, which results in fast matches (no
> backtracking) but extremely large state transition tables if the range
> of input is expanded beyond 8 bits.  This will be documented more
> obviously in the future.  At any rate, the reason you're getting the
> exception is because a char value greater than 255 is being encountered,
> for which no state transition is defined.  For full Unicode, use the
> Perl or glob matchers.
>
> daniel
>
>
>
> --
> To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
>
>

