On Dec 17, 2006, at 23:05 UTC, Steve Roy wrote:

> I'm wondering how to read a text file in UTF-16 encoding that has an
> FFFE BOM (little endian).

If you're running on a little-endian machine, then you just set TextInputStream.Encoding to Encodings.UTF16 and read it in. If you're running on a big-endian machine, then you need to swap every two bytes. There are various ways to do this -- probably something involving a BinaryStream and a MemoryBlock would be best.

> so it seems that the solution is to use a BinaryStream like so:
>
> dim bstr as BinaryStream = File.OpenAsBinaryFile()
> dim b as UInt16 = bstr.ReadUInt16()
> bstr.LittleEndian = (b = &hFFFE)

Nope. BinaryStream.LittleEndian specifies how integers are read and written, and has nothing to do with text.

> However now the compiler tells me that the method ReadAll does not
> exist in this class, although the documentation says that
> BinaryStream supports the Readable interface.

Yes, that's correct (ReadAll is not part of the Readable interface). Of course reading a whole file with a BinaryStream is easy (bstr.Read(bstr.Length)), but that's not relevant in this case.

On Dec 17, 2006, at 23:27 UTC, Charles Yeomans wrote:

> TextInputStream allows you to specify an encoding. If you specify
> UTF-16, then the BOM is part of the encoding and so the
> TextInputStream should do the right thing. If it doesn't, then you
> should consider a feedback report.

The BOM is not part of the encoding. A feature request for RB to respect the BOM and automatically do some byte-flipping isn't a completely ridiculous idea, but I'm not sure it's practical. When would you expect the TextInputStream to read the first two bytes of the file to check for this BOM? On the first read? On every read? The first read after you change the encoding? Perhaps these details could be worked out in a sensible way, but the correct behavior isn't obvious to me, and of course it still wouldn't help you read wrong-endian files when there is no BOM.

A better feature request might be to extend BinaryStream.LittleEndian to apply to text read/writes as well.
But that's a little problematic too, since a BinaryStream doesn't really read or write text; it reads and writes binary data. That still seems a little simpler to work out to me, though.

On Dec 18, 2006, at 05:01 UTC, Steve Roy wrote:

> I guess that's part of the question here. Is the BOM part of the
> encoding? I would think not since it's only a trick used to hint at
> the endianness of a file contents and we can't expect a BOM to be
> prepended to every string.

You're correct.

> > data = inp.ReadAll
> > if left(data,1) = Encodings.UTF8.Chr(&hFEFF) then
> > data = Mid(data,2)
> > end if
>
> This code is not helpful in dealing with endianness because I need to
> know the endianness before I read in the data, not after.

That's true, but of course you can simply add an "else if" clause comparing the first character to UTF8.Chr(&hFFFE) to detect a BOM in the opposite byte order. So it's easy to tell you have a problem (when there happens to be a BOM); it's harder to actually read in the text under such conditions.

> This still leaves the question of whether BinaryStream implements the
> Readable interface

It does.

> which is what the documentation says but the
> compiler says it doesn't.

No, the compiler says nothing of the sort.

...I see the problem. The documentation on Readable is wrong. Readable defines only Read, ReadError, and EOF; it does not define a ReadAll method. You can see this by typing:

  Dim r as Readable
  r.

and then pressing Tab after the dot. (Be sure to also check "x = r." when you think there might be functions defined that return a value.)

I found this (closed) bug report for this problem:

<http://www.realsoftware.com/feedback/viewreport.php?reportid=dkzzadvg>

...but I'm using 2006R4 and the LR is still wrong. Hopefully RS will reopen it, or (better yet) just fix the problem.

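For what it's worth, here is a rough sketch of the "swap every two bytes" approach mentioned earlier in this message. It is written in Python rather than REALbasic, purely so the byte handling is explicit and easy to check; the function names (swap_byte_pairs, read_utf16_native) are made up for this illustration, and it assumes the whole file has already been read in as raw bytes and starts with a BOM. In RB the same steps would presumably be done with a BinaryStream and a MemoryBlock.

```python
import sys

BOM_LE = b"\xff\xfe"  # U+FEFF as stored in a little-endian file
BOM_BE = b"\xfe\xff"  # U+FEFF as stored in a big-endian file


def swap_byte_pairs(raw: bytes) -> bytes:
    """Swap every two bytes, flipping UTF-16 data to the other byte order."""
    if len(raw) % 2:
        raise ValueError("UTF-16 data must contain an even number of bytes")
    swapped = bytearray(len(raw))
    swapped[0::2] = raw[1::2]  # second byte of each pair moves to the front
    swapped[1::2] = raw[0::2]  # first byte of each pair moves to the back
    return bytes(swapped)


def read_utf16_native(raw: bytes) -> str:
    """Decode BOM-prefixed UTF-16 bytes after converting to this machine's byte order."""
    little = sys.byteorder == "little"
    native_bom = BOM_LE if little else BOM_BE
    if raw[:2] not in (BOM_LE, BOM_BE):
        # Without a BOM there is nothing to detect, as noted above.
        raise ValueError("no BOM found; byte order cannot be detected")
    if raw[:2] != native_bom:
        raw = swap_byte_pairs(raw)  # wrong-endian file: flip every pair
    # Drop the two BOM bytes, then decode in the machine's own byte order.
    return raw[2:].decode("utf-16-le" if little else "utf-16-be")
```

Note that the BOM check happens on the raw bytes before any text decoding, which is the ordering Steve was asking for: the byte order is known before the data is interpreted as text.
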
Best,
- Joe

--
Joe Strout -- [EMAIL PROTECTED]
Verified Express, LLC     "Making the Internet a Better Place"
http://www.verex.com/
