Re: [Oorexx-devel] UTF8 BOM

Mike Cowlishaw Fri, 17 Dec 2010 23:13:01 -0800

> >> I have a Rexx program that merges several small files into one large
> >> one. As it turned out, a few of the small files were prefixed with a
> >> UTF8 BOM, 0xEFBBBF. Should the BOM have been recognized and
> >> discarded?
> > How could Rexx (or any other processor) decide that some particular
> > prefix/content/suffix of a file is worthless and should be discarded?
> >
> > ("darn it, this file ends in 'ILY'; delete that!").
> It would handle it as any other text processor does: open the
> file and read the first three or four bytes. If no BOM is
> present, reposition to the beginning; otherwise, position to the
> first character after the BOM.
> 
> I realize that Rexx cannot handle wide characters, that use of
> the UTF8 BOM is discouraged, and that, at least on *ix systems, it
> can lead to problems with some apps.
> But the use of UTF8 is not forbidden. So when processing text
> files, it seems to me that a BOM should be checked for, even
> if it is then ignored, or an error issued for an unsupported
> encoding. For UTF8 I would ignore it and process the file as ASCII.
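A minimal sketch of the check described in the quoted message, applied to the file-merging case. It is written in Python rather than Rexx purely for illustration; `skip_utf8_bom` and `merge_files` are hypothetical names, not ooRexx routines. Each input is opened in binary mode, its first three bytes are compared with the UTF-8 BOM, and the stream is repositioned to where it started if no BOM is found. It assumes the inputs are ordinary seekable files.

```python
import codecs
from typing import BinaryIO, Iterable

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def skip_utf8_bom(stream: BinaryIO) -> bool:
    """Position a seekable binary stream just after a leading UTF-8 BOM.

    Returns True if a BOM was found and skipped, False otherwise.
    """
    start = stream.tell()
    if stream.read(len(UTF8_BOM)) == UTF8_BOM:
        return True
    stream.seek(start)  # no BOM: reposition to the beginning
    return False


def merge_files(paths: Iterable[str], out_path: str) -> None:
    """Concatenate the given files into out_path (hypothetical helper),
    dropping a leading UTF-8 BOM from each input."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                skip_utf8_bom(src)
                out.write(src.read())
```

Working in binary mode keeps the merge byte-for-byte faithful apart from the stripped BOMs; no decoding or newline translation happens along the way.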


I wasn't clear -- sorry.  I meant: what if the program *wants* to be able to
read the BOM and then process the file appropriately?  Such a program would be
broken if the standard file read discarded the BOM.

Perhaps what you need is a wrapper function/class that will do exactly that
('readUTF8' ... etc.).
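A sketch of such a wrapper, again in Python for illustration only; `read_utf8` is a hypothetical name echoing the 'readUTF8' suggestion, not an existing ooRexx routine. Rather than silently discarding the BOM, it reports whether one was present, so a program that cares about the BOM can still act on it while the standard file read stays untouched.

```python
import codecs
from typing import List, Tuple


def read_utf8(path: str) -> Tuple[bool, List[str]]:
    """Read a UTF-8 text file (hypothetical wrapper).

    Returns (had_bom, lines). A leading BOM is removed from the text
    but reported to the caller instead of being silently thrown away.
    """
    with open(path, "rb") as f:
        data = f.read()
    had_bom = data.startswith(codecs.BOM_UTF8)
    if had_bom:
        data = data[len(codecs.BOM_UTF8):]
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError instead of
    # being passed through as if it were ASCII.
    return had_bom, data.decode("utf-8").splitlines()
```

A caller that only wants the text can ignore the flag (`_, lines = read_utf8(path)`). Python's built-in `utf-8-sig` codec would also strip a leading BOM, but it does not tell the caller whether one was there, which is exactly the information a BOM-aware program needs.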

Mike


_______________________________________________
Oorexx-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oorexx-devel