On 12/17/2010 3:25 PM, Mike Cowlishaw wrote:
>
>> I have a Rexx program that merges several small files onto
>> one large one. As it turned out a few of the small files were
>> prefixed with a UTF8 BOM, |0xEFBBBF|. Should the BOM have
>> been recognized and discarded?
> How could Rexx (or any other processor) decide that some particular
> prefix/content/suffix of a file is worthless and should be discarded?
>
> ("darn it, this file ends in 'ILY'; delete that!").
It would handle it as any other text processor does: open the file and
read the first three or four bytes. If no BOM is present, reposition to
the beginning; otherwise position to the first character after the BOM.
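
To make the steps concrete, here is a rough sketch of that check. It is
written in Python rather than Rexx purely for illustration, and the names
skip_utf8_bom and merge are made up for this example; it is not a proposal
for how the interpreter itself should do it.

```python
import io

UTF8_BOM = b"\xef\xbb\xbf"  # the three bytes 0xEF 0xBB 0xBF


def skip_utf8_bom(stream):
    """Leave a seekable binary stream positioned just past a leading
    UTF-8 BOM, or back where it started if there is none.
    Returns True when a BOM was found and skipped."""
    start = stream.tell()
    head = stream.read(len(UTF8_BOM))
    if head == UTF8_BOM:
        return True          # already positioned after the BOM
    stream.seek(start)       # no BOM: reposition to the beginning
    return False


def merge(streams, out):
    """Append each input stream to out, dropping any leading UTF-8 BOM."""
    for s in streams:
        skip_utf8_bom(s)
        out.write(s.read())
```

With this, merging a file that starts with the BOM and one that does not
yields output containing neither copy of the BOM.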

I realize that Rexx cannot handle wide characters, and that use of the
UTF8 BOM is discouraged and, at least on *ix systems, can lead to
problems with some apps.

But the use of UTF8 is not forbidden. So when processing text files, it
seems to me that a BOM should be checked for, even if it is then ignored,
or an error issued for an unsupported encoding. For UTF8 I would ignore
the BOM and process the file as ASCII.
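
A sketch of what "ignore UTF8, error on anything else" might look like,
again in Python only as an illustration. The function name check_bom and
its return values are my own invention; the BOM constants come from
Python's standard codecs module. Note that the UTF-32LE mark begins with
the same two bytes as the UTF-16LE mark, so the longer marks have to be
tested first, and the reported name is only a best guess in that case.

```python
import codecs
import io

# Encodings this sketch refuses to process. Longer marks come first
# because the UTF-32LE BOM starts with the UTF-16LE BOM.
UNSUPPORTED_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def check_bom(stream):
    """Inspect the first four bytes of a seekable binary stream.

    Raises ValueError for a UTF-16 or UTF-32 BOM, skips a UTF-8 BOM and
    returns "utf-8-sig", or rewinds and returns None if no BOM is found.
    """
    start = stream.tell()
    head = stream.read(4)
    for bom, name in UNSUPPORTED_BOMS:
        if head.startswith(bom):
            stream.seek(start)
            raise ValueError(f"unsupported encoding: {name}")
    if head.startswith(codecs.BOM_UTF8):
        # Position to the first byte after the three-byte UTF-8 BOM.
        stream.seek(start + len(codecs.BOM_UTF8))
        return "utf-8-sig"
    stream.seek(start)
    return None
```

Whether an unsupported encoding should raise an error or merely be
reported is a design choice; the sketch raises so the caller cannot
silently merge bytes it does not understand.
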
James Johnson
_______________________________________________
Oorexx-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oorexx-devel