Re: Counting Chars By ASCII Part 2

Mark Smith Wed, 01 Mar 2006 08:57:37 -0800

Exactly why I wondered...

from the Docs:

<If the useUnicode property is set to true, the numToChar function returns a double-byte character. If the useUnicode is false and you specify an ASCIIValue greater than 255, the numToChar function returns the character corresponding to the ASCIIValue mod 256.>
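
A small worked example of the wraparound described in that passage, as a sketch only. It assumes the useUnicode property is left at its default of false and that the lines run inside a handler. The value 321 and the variable names are picked purely for illustration.

```livecode
-- Sketch: assumes the useUnicode is false (the default).
put 321 mod 256 into tWrappedValue        -- 321 - 256 = 65
put numToChar(tWrappedValue) into tChar   -- ASCII value 65 is "A"
-- Per the documentation quoted above, numToChar(321) should
-- return this same character when the useUnicode is false.
```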

So if we're dealing with Unicode text, and really need to count the instances of each character outside of the 0 - 255 range, then we've got a real lot of tests to do since we have to consider values up to 65535...

My 0 - 255 test took a couple of seconds on about 3 megabytes of data, testing for only 27 characters... Todd may be taking some longish coffee breaks!


Mark

On 1 Mar 2006, at 15:34, Jim Ault wrote:


On 3/1/06 7:20 AM, "Todd Geist" <[EMAIL PROTECTED]> wrote:

Question 1:

         IF (tASCII < 31 OR tASCII > 255) THEN

Why would you test for > 255 since no ASCII would be higher than this?

Question 2:

Are you trying to strip the characters, or just count them and report the
result, like a histogram?

Could you show exactly what you are starting with and what you want to end
up with?

Thanks.

Jim Ault
Las Vegas

On 3/1/06 7:20 AM, "Todd Geist" <[EMAIL PROTECTED]> wrote:

Hello Again,

After trying several of the excellent suggestions from all you
revolutionaries, I realized I hadn't quite explained myself... go
figure.  So here is another attempt to explain what I am after.

I am actually after "low" ASCII and "high" ASCII characters that may
have snuck into a text file. So I need to look at every character,
but I don't need to count every character.  I just want the ones that
have ASCII values below 32 and above 255 and that are not in a small
set of allowed control characters.

Based on the suggestions I got on the other thread, I came up with
the following that produces the results I am after.  SPEED is
critical here, since the files I am scanning may be many MB. I am
wondering if any of you can improve on the design.  I feel the need,
the need for SPEED.  :>)

put field 1 into tString
put "10 11 12 29" into charsToIgnore

     REPEAT for each char tChar in tString
         put charToNum(tChar) into tASCII
         IF (tASCII < 31 OR tASCII > 255) THEN
             IF tASCII is not among the words of charsToIgnore THEN
                 add 1 to tCounts[tASCII]
             END IF
         END IF
     END REPEAT
     put the keys of tCounts into tChars
     sort lines of tChars numeric

     REPEAT for each line thisLine in tChars
         put thisLine & TAB & tCounts[thisLine] & Return after newList
     END REPEAT

put newList into field "Chars"
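
One possible variant of the counting loop above, offered as an untimed sketch rather than a measured improvement. It swaps the "is not among the words of" test for an array lookup, on the assumption that a keyed lookup costs less per character than scanning a word list. The array name tIgnore is hypothetical, and the range test is kept exactly as in the script above. The sorting and reporting steps would stay unchanged.

```livecode
put field 1 into tString

-- Hypothetical lookup array: one key per allowed control character
repeat for each word tCode in "10 11 12 29"
   put true into tIgnore[tCode]
end repeat

repeat for each char tChar in tString
   put charToNum(tChar) into tASCII
   -- Same range test as the original script
   if tASCII < 31 or tASCII > 255 then
      -- An unset array element is empty, so it "is not true"
      if tIgnore[tASCII] is not true then
         add 1 to tCounts[tASCII]
      end if
   end if
end repeat
```

Whether this is actually faster would need timing on a real multi-megabyte file, for example by comparing the milliseconds before and after each version of the loop.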

Thanks in Advance

Todd



_______________________________________________
use-revolution mailing list
[email protected]

Please visit this url to subscribe, unsubscribe and manage your subscription preferences:

http://lists.runrev.com/mailman/listinfo/use-revolution

