Hello Again,

After trying several of the excellent suggestions from all you revolutionaries, I realized I hadn't quite explained myself... go figure. So here is another attempt to explain what I am after.

I am actually after "low" ASCII and "high" ASCII characters that may have snuck into a text file. So I need to look at every character, but I don't need to count every character. I just want the ones that have ASCII values below 32 or above 255 and that are not in a small set of allowed control characters.

Based on the suggestions I got on the other thread, I came up with the following, which produces the results I am after. SPEED is critical here, since the files I am scanning may be many megabytes. I am wondering if any of you can improve on the design. I feel the need, the need for SPEED. :>)

put field 1 into tString
-- control characters that are allowed and should not be reported
put "10 11 12 29" into charsToIgnore

-- tally every character whose code is below 32 or above 255
REPEAT for each char tChar in tString
    put charToNum(tChar) into tASCII
    IF (tASCII < 32 OR tASCII > 255) THEN
        IF tASCII is not among the words of charsToIgnore THEN
            add 1 to tCounts[tASCII]
        END IF
    END IF
END REPEAT

-- build a numerically sorted list of "code TAB count" lines
put the keys of tCounts into tChars
sort lines of tChars numeric

REPEAT for each line thisLine in tChars
    put thisLine & TAB & tCounts[thisLine] & Return after newList
END REPEAT

put newList into field "Chars"

Thanks in Advance

Todd

--

Todd Geist
______________________________________
g e i s t   i n t e r a c t i v e
