On Feb 23, 2007, at 03:10 UTC, David Glass wrote:

> At the risk of hijacking my own thread, would the fact that the
> debugger reports this string as UTF8 when it apparently isn't (2x as
> long as it should be) be a bug?

No.

> When I read the file in it is reported as UTF-8 (in the debugger),
> can I not trust this information?

Of course not. RB isn't a mind-reader; it can't know what the encoding
of the text actually is. It only knows what you've told it. When you
read a file in, you tell it what the encoding is, either explicitly or
by default (the default for a TextInputStream is UTF-8). If what you
tell it is wrong, then it's wrong. In such a case, the .Encoding
property of the string is whatever you've told it, but RB will not be
able to display or convert this text as you would expect (it'll be
converting or drawing it *as if* it were what you said it is).

> As I watch this in the debugger, as the lines come in from the text
> file they are reported as UTF-8, but when I drag them out of the
> listbox they are reported as UTF-16.

This is normal. Some parts of the system require text to be in a
particular encoding. For example, the clipboard and the drag manager on
OS X both require Unicode text to be in UTF-16.

> I know you can't tell me what *specifically* in my code would cause
> that, but what *generically* would cause that?

Nothing in your code. It's normal for drags and copy/paste.

> Would the concatenation be changing the encoding?

Yes, if you concatenate two strings of different encodings, then RB
will convert one or both into some encoding that can represent the
combined text. (Of course it doesn't change the strings you're
concatenating; I'm just talking about the combined result.)

> I don't use DefineEncoding anywhere, as I learned my lesson the last
> time I tried to figure out encodings.

That's good. From your description, I think there's nothing very
complex going on here; the data you're reading in simply isn't UTF-8,
as you're (probably by default) claiming it is. Change your
TextInputStream.Encoding to reflect whatever the text actually is
(perhaps UTF-16?), and all will work fine.
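
To make the "2x as long" symptom concrete, here is a minimal sketch of what happens when UTF-16 bytes are labeled as UTF-8. It is written in Python rather than REALbasic, purely as an illustration; the sample text and variable names are hypothetical, and it assumes the file holds plain ASCII-range text stored as little-endian UTF-16 without a byte-order mark.

```python
# Illustration only: Python stands in for REALbasic here.
# The sample text and all variable names are hypothetical.

text = "Hello"

# What is actually in the file: UTF-16 (little-endian, no BOM),
# two bytes per character for ASCII-range text.
raw = text.encode("utf-16-le")

# Reading those bytes while claiming they are UTF-8. No error is raised,
# because every byte here happens to be valid UTF-8, but each letter is
# now followed by a NUL character.
mislabeled = raw.decode("utf-8")

# Reading the same bytes with the encoding they really have.
correct = raw.decode("utf-16-le")

print(len(raw), len(mislabeled), len(correct))  # 10 10 5
print(repr(mislabeled))  # 'H\x00e\x00l\x00l\x00o\x00'
```

The mislabeled string carries a UTF-8 label and is twice as long as the real text, which matches what the debugger showed. Nothing was converted along the way; the bytes were simply interpreted under the wrong label.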
Best,
- Joe

--
Joe Strout -- [EMAIL PROTECTED]
Verified Express, LLC     "Making the Internet a Better Place"
http://www.verex.com/
