James Johnson wrote:
> On 12/17/2010 3:25 PM, Mike Cowlishaw wrote:
>
>>>I have a Rexx program that merges several small files into
>>>one large one. As it turned out a few of the small files were
>>>prefixed with a UTF8 BOM, |0xEFBBBF|. Should the BOM have
>>>been recognized and discarded?
>>
>>How could Rexx (or any other processor) decide that some particular
>>prefix/content/suffix of a file is worthless and should be discarded?
>>
>>("darn it, this file ends in 'ILY'; delete that!").
>
> It would handle it as any other text processor. Open the file, read the
> first three or four bytes. If no BOM is present reposition to the
> beginning, else position to the first char after the BOM.
>
> I realize that Rexx cannot handle wide characters and use of the UTF8
> BOM is discouraged, and at least on *ix systems can lead to problems
> with some apps. But the use of UTF8 is not forbidden. So when
> processing text files, it seems to me that a BOM should be checked for,
> even if it is ignored. Or an error issued for an unsupported encoding.
> For UTF8 I would ignore it and process the file as ASCII.
>
>
> James Johnson
>
>
>
> ------------------------------------------------------------------------------
> Lotusphere 2011
> Register now for Lotusphere 2011 and learn how
> to connect the dots, take your collaborative environment
> to the next level, and enter the era of Social Business.
> http://p.sf.net/sfu/lotusphere-d2d
> _______________________________________________
> Oorexx-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/oorexx-devel
>
James,
Here is a bit of code that will "transparently" handle files with and
without a UTF8 BOM. Include the following class and method at the end of
your program (ooRexx directives must follow the main code) and then,
instead of using
infile = .stream~new(<file name here>)
use
infile = .instrm.UTF8?~new(<file name here>)
to create the stream object. Any leading BOM in the file will be
skipped when you "linein" the file.
-- code follows
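::class instrm.UTF8? subclass stream
::method init
  -- let the stream superclass set up the file name
  self~init:super(arg(1))
  -- read the first three bytes (this implicitly opens the stream)
  BOM? = self~charin( , 3)
  -- no UTF8 BOM: move the read position back to the first character
  if BOM? <> 'EFBBBF'x then
    self~seek('1 read')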
-- end of code
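For the merge case described in the original question, usage might look roughly like the sketch below. The file names are invented for illustration, and the sketch assumes the class and method above are placed after this code in the same program. It has not been run against real data, so treat it as a starting point rather than a finished solution.

```rexx
-- Hypothetical usage sketch: merge several small files into one large one.
-- 'merged.txt', 'part1.txt' and 'part2.txt' are made-up names.
outfile = .stream~new('merged.txt')
outfile~open('write replace')

do name over .array~of('part1.txt', 'part2.txt')
  -- the subclass skips a leading UTF8 BOM, if there is one
  infile = .instrm.UTF8?~new(name)
  do while infile~lines > 0
    outfile~lineout(infile~linein)
  end
  infile~close
end

outfile~close
```

Only a BOM at the very start of each input file is skipped; the rest of each file is copied line by line as is.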
--
Gil Barmwater