On Dec 17, 2006, at 23:05 UTC, Steve Roy wrote:

> I'm wondering how to read a text file in UTF-16 encoding that has an  
> FFFE BOM (little endian).

If you're running on a little-endian machine, then you just set
TextInputStream.Encoding to Encodings.UTF16 and read it in.

If you're running on a big-endian machine, then you need to swap the
two bytes of every 16-bit code unit.  There are various ways to do this
-- probably something involving a BinaryStream and a MemoryBlock would
be best.
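
To make the byte swap concrete, here is a minimal sketch.  It is written in Python rather than REALbasic, purely so the logic is easy to run and check; the function name swap_byte_pairs is made up for this illustration.  The same idea should carry over to a MemoryBlock by exchanging the bytes at offsets i and i+1 for every even i.

```python
def swap_byte_pairs(data: bytes) -> bytes:
    """Exchange the two bytes of every 16-bit code unit."""
    if len(data) % 2 != 0:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(data)
    # Move the odd-offset bytes to even offsets and vice versa.
    swapped[0::2] = data[1::2]
    swapped[1::2] = data[0::2]
    return bytes(swapped)


little = "Hi".encode("utf-16-le")   # bytes 48 00 69 00
big = swap_byte_pairs(little)       # bytes 00 48 00 69
print(big.decode("utf-16-be"))      # decodes back to the original text
```

Swapping the whole buffer in one pass is simple, but it assumes the file fits in memory and contains nothing but UTF-16 data.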

> so it seems that the solution is to use a BinaryStream like so:
> 
>      dim bstr as BinaryStream = File.OpenAsBinaryFile()
>      dim b as UInt16 = bstr.ReadUInt16()
>      bstr.LittleEndian = (b = &hFFFE)

Nope.  BinaryStream.LittleEndian specifies how integers are read and
written, and has nothing to do with text.
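
As an illustration of what an integer byte-order setting does and does not affect, here is a small Python sketch (not REALbasic; the variable names are illustrative).  The same two bytes yield different 16-bit integers depending on the byte order used to read them, but nothing about that choice changes how the text bytes that follow get decoded.

```python
import struct

first_two = b"\xff\xfe"  # bytes at the start of a little-endian UTF-16 file

as_big = struct.unpack(">H", first_two)[0]     # read as big-endian: 0xFFFE
as_little = struct.unpack("<H", first_two)[0]  # read as little-endian: 0xFEFF
print(hex(as_big), hex(as_little))
```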

> However now the compiler tells me that the method ReadAll does not  
> exist in this class, although the documentation says that  
> BinaryStream supports the Readable interface.

Yes, that's correct (ReadAll is not part of the Readable interface). 
Of course reading a whole file with a BinaryStream is easy
(bstr.Read(bstr.Length)), but that's not relevant in this case.

On Dec 17, 2006, at 23:27 UTC, Charles Yeomans wrote:

> TextInputStream allows you to specify an encoding.  If you specify  
> UTF-16, then the BOM is part of the encoding and so the  
> TextInputStream should do the right thing.  If it doesn't, then you  
> should consider a feedback report.

The BOM is not part of the encoding.  A feature request for RB to
respect the BOM and automatically do some byte-flipping isn't a
completely ridiculous idea, but I'm not sure it's practical.  When
would you expect the TextInputStream to read the first two bytes of the
file to check for this BOM?  On the first read?  On every read?  The
first read after you change the encoding?  Perhaps these details could
be worked out in a sensible way, but the correct behavior isn't obvious
to me, and of course it still wouldn't help you read wrong-endian files
when there is no BOM.

A better feature request might be to extend BinaryStream.LittleEndian
to apply to text read/writes as well.  But that's a little problematic
too, since a BinaryStream doesn't really read or write text; it reads
and writes binary data.  That still seems a little simpler to work out
to me, though.

On Dec 18, 2006, at 05:01 UTC, Steve Roy wrote:

> I guess that's part of the question here. Is the BOM part of the  
> encoding? I would think not since it's only a trick used to hint at  
> the endianness of a file contents and we can't expect a BOM to be  
> prepended to every string.

You're correct.

> >         data = inp.ReadAll
> >         if left(data,1) = Encodings.UTF8.Chr(&hFEFF) then
> >            data = Mid(data,2)
> >         end if
> 
> This code is not helpful in dealing with endianness because I need to
> know the endianness before I read in the data, not after.

That's true, but of course you can simply add an "else if" clause
comparing the first character to UTF8.Chr(&hFFFE) to detect a BOM in
the opposite byte order.

So it's easy to tell you have a problem (when there happens to be a
BOM); it's harder to actually read in the text under such conditions.
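
One way around the chicken-and-egg problem is to look at the raw bytes before decoding anything.  The following is a rough sketch in Python, not REALbasic; the function name decode_utf16_with_bom and its default parameter are assumptions made for this example.  It checks the first two bytes for a BOM and picks the byte order from that, falling back to a caller-supplied default when there is no BOM.

```python
import codecs


def decode_utf16_with_bom(raw: bytes, default: str = "utf-16-le") -> str:
    """Decode UTF-16 bytes, honoring a leading BOM if one is present."""
    if raw.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return raw[2:].decode("utf-16-le")
    if raw.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return raw[2:].decode("utf-16-be")
    # No BOM: the byte order has to come from somewhere else.
    return raw.decode(default)


print(decode_utf16_with_bom(b"\xff\xfeH\x00i\x00"))
print(decode_utf16_with_bom(b"\xfe\xff\x00H\x00i"))
```

This still cannot guess the byte order of a file that has no BOM; in that case the default is only as good as whatever the caller knows about the file.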

> This still leaves the question of whether BinaryStream implements the
> Readable interface

It does.

> which is what the documentation says but the  
> compiler says it doesn't.

No, the compiler says nothing of the sort.  ...I see the problem.  The
documentation on Readable is wrong.  Readable defines only Read,
ReadError, and EOF; it does not define a ReadAll method.  You can see
this by typing:

  Dim r as Readable
  r.

and then pressing Tab after the dot.  (Be sure to also check "x = r."
when you think the interface might define functions that return a
value.)

I found this (closed) bug report for this problem:
<http://www.realsoftware.com/feedback/viewreport.php?reportid=dkzzadvg>

...but I'm using 2006R4 and the LR is still wrong.  Hopefully RS will
reopen it, or (better yet) just fix the problem.

Best,
- Joe

--
Joe Strout -- [EMAIL PROTECTED]
Verified Express, LLC     "Making the Internet a Better Place"
http://www.verex.com/
