Re: [MacRuby-devel] Scanning Unicode strings for non-ascii characters

2009-03-03 Thread Rich Morin
At 21:30 -0500 3/3/09, Robert Schaaf wrote:
> a_string.tr('^ -~', ' ') Any comments on efficiency?

That's pretty much equivalent to this code:

  a.gsub(/[^\x20-\x7e]/, ' ')

It may or may not be faster, more to your taste, etc. Before using it, be sure that you don't want to preserve characters
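The equivalence described above can be checked with a short sketch. It assumes Ruby 1.9 or later with UTF-8 strings, where both calls work per character rather than per byte; the method names and the sample string are made up for illustration. Note that both forms also blank out tabs and newlines, which is the "preserve characters" caveat.

```ruby
# Hypothetical sample: an accented letter, a tab, another accented letter, a newline.
sample = "na\u00efve\tcaf\u00e9\n"

# The tr form from the thread: '^' negates the range ' '..'~' (0x20..0x7e).
def clean_with_tr(str)
  str.tr('^ -~', ' ')
end

# The gsub form from the thread: same range, written as a negated character class.
def clean_with_gsub(str)
  str.gsub(/[^\x20-\x7e]/, ' ')
end

puts clean_with_tr(sample).inspect    # => "na ve caf  "
puts clean_with_gsub(sample).inspect  # => "na ve caf  "
```

Each character outside the range becomes exactly one space, so the result keeps the same character count as the input.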

Re: [MacRuby-devel] Scanning Unicode strings for non-ascii characters

2009-03-03 Thread Robert Schaaf
Well, my medication has finally worn off, and I came up with this:

  a_string.tr('^ -~', ' ')

Any comments on efficiency? God bless ascii for being contiguous. All this is to clean up imperfectly mapped EBCDIC (eeeww!)

Thanks for the suggestions.

Bob Schaaf

On Mar 3, 2009, at 10:34 AM, Manf

Re: [MacRuby-devel] Scanning Unicode strings for non-ascii characters

2009-03-03 Thread Manfred Stienstra
On Mar 3, 2009, at 4:18 PM, Rich Morin wrote:
> It looks to me like this is a solution for a different problem; that is, discarding characters outside of the specified range. Also, do we want to map newlines, etc? Anyway, irb sez:

Oops, I misread that. Yeah, gsub is probably faster. string.u
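Whether gsub really is faster is easy to measure rather than guess. The following is only a sketch using the standard Benchmark module: the helper name, the input string, and the iteration count are all assumptions, and the timings will vary by Ruby implementation and version, so no results are claimed here.

```ruby
require 'benchmark'

# Hypothetical helper: runs each approach from the thread over the same input
# and returns the elapsed wall-clock seconds for each one.
def time_cleaners(input, iterations = 1_000)
  {
    tr: Benchmark.realtime { iterations.times { input.tr('^ -~', ' ') } },
    gsub: Benchmark.realtime { iterations.times { input.gsub(/[^\x20-\x7e]/, ' ') } },
    unpack: Benchmark.realtime do
      iterations.times do
        # Replacing variant of the unpack/pack idea: out-of-range code points become 0x20.
        input.unpack('U*').map { |c| (0x20..0x7e).include?(c) ? c : 0x20 }.pack('U*')
      end
    end
  }
end

# Made-up test input mixing ASCII, accented letters, and newlines.
timings = time_cleaners("caf\u00e9 \u00fcber\n" * 100)
timings.each { |name, seconds| puts format('%-6s %.4fs', name, seconds) }
```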

Re: [MacRuby-devel] Scanning Unicode strings for non-ascii characters

2009-03-03 Thread Rich Morin
At 12:45 +0100 3/3/09, Manfred Stienstra wrote:
> On Mar 3, 2009, at 12:37 PM, Robert Schaaf wrote:
>> string.unpack('U*').
>>   select { |c| (0x20..0x7e).include?(c) }.
>>   pack('U*')

It looks to me like this is a solution for a different problem; that is, discarding characters outsid
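The distinction raised here, discarding versus replacing, can be illustrated with a small sketch. The `select` version is the one quoted above; the `map` version is an assumed variation (not from the thread) that substitutes a space, which is what the original question asked for. Method names are hypothetical.

```ruby
# Quoted approach: keeps only code points in 0x20..0x7e, so the string gets shorter.
def discard_outside_range(str)
  str.unpack('U*').select { |c| (0x20..0x7e).include?(c) }.pack('U*')
end

# Assumed variation: maps every out-of-range code point to 0x20 (space),
# so the character count is preserved.
def blank_outside_range(str)
  str.unpack('U*').map { |c| (0x20..0x7e).include?(c) ? c : 0x20 }.pack('U*')
end

input = "caf\u00e9\n"
puts discard_outside_range(input).inspect  # => "caf"
puts blank_outside_range(input).inspect    # => "caf  "
```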

Re: [MacRuby-devel] Scanning Unicode strings for non-ascii characters

2009-03-03 Thread Manfred Stienstra
On Mar 3, 2009, at 12:37 PM, Robert Schaaf wrote:
> This may be obvious, but in a Unicode world it's driving me nuts. Given an arbitrary string, which may contain unicode characters, how do I replace all characters not in the range 0x20..0x7e with spaces?

This isn't really a MacRuby related

[MacRuby-devel] Scanning Unicode strings for non-ascii characters

2009-03-03 Thread Robert Schaaf
Hello all,

This may be obvious, but in a Unicode world it's driving me nuts. Given an arbitrary string, which may contain Unicode characters, how do I replace all characters not in the range 0x20..0x7e with spaces?

Thanks for any guidance,

Bob Schaaf
AIU Holdings