> >> I have a Rexx program that merges several small files into one large
> >> one. As it turned out, a few of the small files were prefixed with a
> >> UTF8 BOM, |0xEFBBBF|. Should the BOM have been recognized and
> >> discarded?
> > How could Rexx (or any other processor) decide that some particular
> > prefix/content/suffix of a file is worthless and should be discarded?
> >
> > ("darn it, this file ends in 'ILY'; delete that!").
> It would handle it as any other text processor. Open the
> file, read the first three or four bytes. If no BOM is
> present reposition to the beginning, else position to the
> first char after the BOM.
>
> I realize that Rexx cannot handle wide characters and use of
> the UTF8 BOM is discouraged, and at least on *ix systems can
> lead to problems with some apps.
> But the use of UTF8 is not forbidden. So when processing text
> files, it seems to me that a BOM should be checked for, even
> if it is ignored. Or an error issued for an unsupported
> encoding. For UTF8 I would ignore it and process the file as ASCII.
I wasn't clear -- sorry. I meant: what if the program *wants* to be able to
read the BOM and then process the file appropriately? Such a program would be
broken if the standard file read discarded the BOM.
Perhaps what you need is a wrapper function/class that will do exactly that
('readUTF8' ... etc.).
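
A minimal sketch of such a wrapper, written in Python rather than Rexx and
meant only as an illustration. The name read_utf8 and its (had_bom, data)
return value are assumptions, not an existing ooRexx or library API. It
follows the steps quoted above: read the first three bytes, and rewind to the
start if they are not the UTF-8 BOM.

```python
import codecs


def read_utf8(path):
    """Hypothetical wrapper: read a file, skipping a leading UTF-8 BOM.

    Returns (had_bom, data), where data is the file content as bytes
    with the BOM (EF BB BF) removed if it was present.
    """
    with open(path, "rb") as f:
        head = f.read(3)                  # the UTF-8 BOM is exactly 3 bytes
        had_bom = head == codecs.BOM_UTF8
        if not had_bom:
            f.seek(0)                     # no BOM: reposition to the beginning
        return had_bom, f.read()          # read from the current position on
```

Because the skipping is opt-in, an ordinary read still sees the BOM, so a
program that wants to inspect it is not affected. A merge program could call
the wrapper on each small file and write only the returned data to the output.
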
Mike
_______________________________________________
Oorexx-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oorexx-devel