On Feb 23, 2007, at 03:10 UTC, David Glass wrote:

> At the risk of hijacking my own thread, would the fact that the  
> debugger reports this string as UTF8 when it apparently isn't (2x as 
> long as it should be) be a bug?

No.

> When I read the file in it is reported as UTF-8 (in the debugger),  
> can I not trust this information?

Of course not.  RB isn't a mind-reader; it can't know what the encoding
of the text actually is.  It only knows what you've told it.  When you
read a file in, you tell it what the encoding is, either explicitly, or
by default (the default for a TextInputStream is UTF-8).  If what you
tell it is wrong, then it's wrong.  In such a case, the .Encoding
property of the string is whatever you've told it, but RB will not be
able to display or convert this text as you would expect (it'll be
converting or drawing it *as if* it were what you said it is).
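
As a rough illustration of the "2x as long" symptom, here is a small
sketch in Python (not RB code).  It assumes the file really holds
UTF-16 little-endian text with no byte-order mark and only ASCII-range
characters; the sample text and variable names are made up for the
example.

```python
# Illustration in Python, not REALbasic: same bytes, two different labels.
text = "Hello"
raw = text.encode("utf-16-le")   # assumed file contents: 2 bytes per character

# Claiming the bytes are UTF-8 is accepted, because a NUL byte is valid
# UTF-8, but every second character is now a NUL.
mislabeled = raw.decode("utf-8")

# Using the encoding the bytes were actually written in recovers the text.
correct = raw.decode("utf-16-le")

print(len(mislabeled), repr(mislabeled))
print(len(correct), repr(correct))
```

The mislabeled string has 10 characters instead of 5, which matches
the kind of doubling described in the question.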

> As I watch this in the debugger, as the lines come in from the text  
> file they are reported as UTF-8, but when I drag them out of the  
> listbox they are reported as UTF-16.

This is normal.  Certain parts of the system require text to be in a
particular encoding.  For example, the clipboard and the drag manager
on OS X both require Unicode text to be in UTF-16.

> I know you can't tell me what *specifically* in my code would cause  
> that, but what *generically* would cause that?

Nothing in your code.  It's normal for drags and copy/paste.
  
> Would the concatenation be changing the encoding?

Yes, if you concatenate two strings of different encodings, then RB
will convert one or both into some encoding that can represent the
combined text.  (Of course it doesn't change the strings you're
concatenating; I'm just talking about the combined result.)
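
The general idea can be sketched in Python (again, not RB code, and
not a description of how RB chooses the result encoding internally).
The two sample strings and the choice of UTF-8 for the result are
assumptions made for the example.

```python
# Two pieces of text stored as bytes in different encodings.
latin1_bytes = "café ".encode("latin-1")
utf8_bytes = "日本語".encode("utf-8")

# Joining the raw bytes would leave one half mislabeled whichever
# encoding is claimed, so decode each piece with its own encoding first.
combined = latin1_bytes.decode("latin-1") + utf8_bytes.decode("utf-8")

# Store the result in an encoding that can represent all of it
# (UTF-8 here; Latin-1 could not hold the Japanese characters).
combined_utf8 = combined.encode("utf-8")

print(combined)
```

The two input byte strings are left unchanged; only the combined
result ends up in the common encoding.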
 
> I don't use DefineEncoding anywhere, as I learned my lesson the last 
> time I tried to figure out encodings.

That's good.  From your description, I think there's nothing very
complex going on here; the data you're reading in simply isn't UTF-8,
as you're (probably by default) claiming it is.  Change your
TextInputStream.Encoding to reflect whatever the text actually is
(perhaps UTF-16?), and all will work fine.
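
The same fix, sketched in Python rather than RB: state the encoding
the file actually uses when reading it.  The file name and contents
are hypothetical, and UTF-16 is only the guess made above, not a
confirmed fact about the real file.

```python
import os
import tempfile

# Hypothetical sample file written as UTF-16 (Python adds a byte-order mark).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-16", newline="") as f:
    f.write("Hello\n")

# Reading with the encoding the file really uses gives the expected text.
with open(path, "r", encoding="utf-16", newline="") as f:
    contents = f.read()

# Reading with the wrong claim (UTF-8) fails in Python, because the
# byte-order mark is not valid UTF-8.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    wrong_claim_accepted = True
except UnicodeDecodeError:
    wrong_claim_accepted = False

print(repr(contents), wrong_claim_accepted)
```

Python is stricter here than the behaviour described above: it raises
an error on the mismatch, whereas a mislabeled string in RB simply
displays or converts incorrectly.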

Best,
- Joe

--
Joe Strout -- [EMAIL PROTECTED]
Verified Express, LLC     "Making the Internet a Better Place"
http://www.verex.com/
