James Johnson wrote:
> On 12/17/2010 3:25 PM, Mike Cowlishaw wrote:
> 
>>>I have a Rexx program that merges several small files into
>>>one large one. As it turned out a few of the small files were
>>>prefixed with a UTF8 BOM, |0xEFBBBF|. Should the BOM have
>>>been recognized and discarded?
>>
>>How could Rexx (or any other processor) decide that some particular
>>prefix/content/suffix of a file is worthless and should be discarded?
>>
>>("darn it, this file ends in 'ILY'; delete that!").
> 
> It would handle it as any other text processor. Open the file, read
> the first three or four bytes. If no BOM is present reposition to the
> beginning, else position to the first char after the BOM.
> 
> I realize that Rexx cannot handle wide characters and use of the UTF8
> BOM is discouraged, and at least on *ix systems can lead to problems
> with some apps. But the use of UTF8 is not forbidden. So when
> processing text files, it seems to me that a BOM should be checked
> for, even if it is ignored. Or an error issued for an unsupported
> encoding. For UTF8 I would ignore it and process the file as ASCII.
> 
> 
> James Johnson
> 
> 
> 
> ------------------------------------------------------------------------------
> Lotusphere 2011
> Register now for Lotusphere 2011 and learn how
> to connect the dots, take your collaborative environment
> to the next level, and enter the era of Social Business.
> http://p.sf.net/sfu/lotusphere-d2d
> _______________________________________________
> Oorexx-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/oorexx-devel
> 
James,

Here is a bit of code that will "transparently" handle files w/ and w/o 
a UTF8 BOM.  Include the following Class and Method in your program and 
then, instead of using

infile = .stream~new(<file name here>)

use

infile = .instrm.UTF8?~new(<file name here>)

to create the stream object.  Any leading BOM in the file will be 
disregarded when you "linein" the file.

-- code follows
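::class instrm.UTF8? subclass stream

::method init
     self~init:super(arg(1))        -- pass the file name on to the stream class
     BOM? = self~charin( ,3)        -- read the first three bytes of the file
     if BOM? <> 'EFBBBF'x then      -- no UTF8 BOM found, so go back to the start
         self~seek('1 read')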

-- end of code
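
For the merge case in the original question, usage might look roughly like the sketch below. The file names (merged.txt, part1.txt, part2.txt) are made up for illustration, and the class is repeated at the end only so the example stands on its own. I am assuming a plain line-by-line copy is what is wanted.

```rexx
-- Usage sketch: merge several small files into one, skipping any UTF8 BOM.
-- All file names here are hypothetical placeholders.
outfile = .stream~new('merged.txt')
outfile~open('write replace')           -- start with an empty output file

do name over .array~of('part1.txt', 'part2.txt')
    infile = .instrm.UTF8?~new(name)    -- a leading BOM, if any, is skipped in init
    do while infile~lines > 0           -- copy whatever lines remain
        outfile~lineout(infile~linein)
    end
    infile~close
end

outfile~close

::class instrm.UTF8? subclass stream

::method init
     self~init:super(arg(1))
     BOM? = self~charin( ,3)
     if BOM? <> 'EFBBBF'x then
         self~seek('1 read')
```

Only the three BOM bytes are dropped; everything after them is copied unchanged, so any multi-byte UTF8 characters in the data still pass through as raw bytes.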
-- 
Gil Barmwater
