Re: Several BOMs in the same file

2003-03-26 Thread Jungshik Shin
Marco Cimarosti wrote: Kent Karlsson wrote: I'm not going into the implementation part; just pointing out that this issue is not something an operating system can ignore. "cat" and "cp" can and shall ignore it. They are octet-level file operations, attaching no semantics to the octets.

RE: Several BOMs in the same file

2003-03-25 Thread Marco Cimarosti
Kent Karlsson wrote: > > I'm not going into the implementation part; just pointing out that > > this issue is not something an operating system can ignore. > > "cat" and "cp" can and shall ignore it. They are octet-level > file operations, attaching no semantics to the octets. Try "iconv". This

RE: Several BOMs in the same file

2003-03-25 Thread Marco Cimarosti
I (Marco Cimarosti) wrote: > As a minimum, option "-v" must know the semantics of NL and > LF control codes, of the digits, and of white space. Sorry, I meant: option "-n". _ Marco

RE: Several BOMs in the same file

2003-03-25 Thread Kent Karlsson
> In that case, removing the BOM that would end up somewhere in the > middle is the natural thing to do, just as removing the EOF marker > at the end of the first file is. There is no "EOF marker" at the end of a file. At least not in modern file systems. There is no NULL, CTRL-Z, or CTRL-D

Re: Several BOMs in the same file

2003-03-25 Thread Doug Ewell
Kent Karlsson wrote: > To avoid "flag bloat", one can instead use the "iconv" command, > and apply that to the source files. Since "head" and "tail" assumes > an ASCII compatible singlebyte or multibyte encoding, where any > state is reset at LF, the target encoding for the iconv command > must,

Re: Several BOMs in the same file

2003-03-25 Thread Pim Blokland
Marco Cimarosti wrote: > > Is this in accordance with the Unicode standard, or do I have > > to remove the second BOM? > > IMHO, Unicode should not specify such a behavior. Deciding what a shell IMHO, it should. The guideline that says a text file can have a U+FEFF at the beginning, but it real

RE: Several BOMs in the same file

2003-03-25 Thread Kent Karlsson
> Your command above would now expand to something like this: > > cat -R UTF-16 -F UTF-16LE file1 -F Big-5 file2 > file3 > > Provided with information about the input encodings and the > expected output > encoding, "cat" could now correctly handle BOMs, endianness, new-line > conventions,
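The encoding-aware concatenation proposed in the quoted text can be sketched in Python. This is only an illustration under stated assumptions: the function name `transcode_concat` and its parameters are hypothetical, the caller is assumed to know each input's encoding, and new-line conventions are not handled. It is not how any real "cat" behaves.

```python
def transcode_concat(sources, out_encoding):
    """Decode each (data, encoding) pair, drop a leading U+FEFF, re-encode once.

    Hypothetical helper: `sources` is a list of (bytes, encoding-name) pairs
    and `out_encoding` is the encoding of the combined output.
    """
    pieces = []
    for data, encoding in sources:
        text = data.decode(encoding)
        # Codecs with a fixed byte order (e.g. utf-16-le) keep the BOM as
        # U+FEFF in the decoded text, so remove it explicitly.
        if text.startswith("\ufeff"):
            text = text[1:]
        pieces.append(text)
    # Encoding the joined text once means at most one BOM is written,
    # and only if the output codec writes one (Python's "utf-16" does).
    return "".join(pieces).encode(out_encoding)


file1 = "\ufeffabc".encode("utf-16-le")   # UTF-16LE with a BOM
file2 = "中文".encode("big5")              # Big5, no BOM
file3 = transcode_concat([(file1, "utf-16-le"), (file2, "big5")], "utf-16")
```

Decoding before joining is what lets the inner BOM be treated as a signature instead of a character; the cost is that the tool is no longer an octet-level operation.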

RE: Several BOMs in the same file

2003-03-25 Thread Kent Karlsson
> Let's say that I have two files, namely file1 & file2, in any Unicode > encoding, both starting with a BOM, and I compile them into > one by using > > cat file1 file2 > file3 For POSIX implementations, this concatenates the octets (bytes) in the two files, whether either of them is text in U
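The octet-level behaviour described here can be reproduced in a few lines of Python. This is a minimal sketch that assumes both inputs are UTF-8 with a signature; the name `concat_bytes` is hypothetical and simply stands in for what "cat" does.

```python
import codecs


def concat_bytes(*chunks):
    """Join byte strings without attaching any meaning to them."""
    return b"".join(chunks)


file1 = codecs.BOM_UTF8 + "first\n".encode("utf-8")
file2 = codecs.BOM_UTF8 + "second\n".encode("utf-8")
file3 = concat_bytes(file1, file2)

# The "utf-8-sig" codec strips only a leading BOM; the second one
# survives in the decoded text as U+FEFF.
text = file3.decode("utf-8-sig")
```

After this runs, `file3` contains two byte-order marks, and a reader that strips only the leading signature is left with a U+FEFF in the middle of the text.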

RE: Several BOMs in the same file

2003-03-25 Thread Marco Cimarosti
Stefan Persson wrote: > Let's say that I have two files, namely file1 & file2, in any Unicode > encoding, both starting with a BOM, and I compile them into > one by using > > cat file1 file2 > file3 > > in Unix or > > copy file1 + file2 file3 > > in MS-DOS, file3 will have the following conte

RE: Several BOMs in the same file

2003-03-24 Thread Lars Kristan
Michael (michka) Kaplan wrote: > But if you do not, what is the harm of a character that you cannot see > and which does not even have width or cause line breaking behavior? > Realistically, what would the problem be? The fact that the 0xFEFF character will not affect the display does not mean tha

Re: Several BOMs in the same file

2003-03-23 Thread Pim Blokland
Chris Jacobs wrote: > > guidelines for the situation where two files are joined, and the > > second one has a BOM, but the first one hasn't. Should the resulting > > file have a BOM? > > In that case you should seriously consider the possibility that the byte > order for both files is different!
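One way to check for the mismatch raised in the quoted text is to inspect each input's leading bytes before joining. The sketch below is an assumption-laden illustration: `sniff_bom` is a hypothetical helper, and a missing BOM tells you nothing about the actual encoding.

```python
import codecs

# UTF-32 must be tested before UTF-16: the UTF-32LE BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


little = codecs.BOM_UTF16_LE + "a".encode("utf-16-le")
big = codecs.BOM_UTF16_BE + "a".encode("utf-16-be")
same_order = sniff_bom(little) == sniff_bom(big)
```

Here `same_order` is false, which is the case where plain concatenation would yield a file whose second half is read with the wrong byte order.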

Re: Several BOMs in the same file

2003-03-23 Thread Chris Jacobs
Pim Blokland wrote (Sunday, March 23, 2003 2:43 PM): [ ... ] > But now you've got me wondering whether there are any

Re: Several BOMs in the same file

2003-03-23 Thread Pim Blokland
Note on the COPY command: some versions of Windows seem to be BOM-aware; at least Windows 2000, when concatenating two text files, does remove the second file's BOM. Pim Blokland

Re: Several BOMs in the same file

2003-03-23 Thread Michael \(michka\) Kaplan
Stefan Persson wrote (Sunday, March 23, 2003 4:14 AM): > Hi! > > Let's say that I have two files, namely file1 & file2, in any Unicode > encoding, bot

Re: Several BOMs in the same file

2003-03-23 Thread Pim Blokland
> in MS-DOS, file3 will have the following contents: > > BOM > contents from file1 > BOM > contents from file2 > > Is this in accordance with the Unicode standard Nope. When concatenating two files (or any streams) of which the second one has a BOM, that second BOM should be deleted. However, there
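The rule stated in this message (keep the first BOM, delete any later one) might look like this in Python. It is a sketch that assumes all inputs share one known encoding and byte order; the function name `concat_dropping_inner_boms` and its `bom` parameter are hypothetical.

```python
import codecs


def concat_dropping_inner_boms(chunks, bom=codecs.BOM_UTF8):
    """Concatenate byte strings, removing a leading BOM from all but the first.

    Assumes every chunk uses the encoding that `bom` belongs to; the bytes
    are otherwise left untouched.
    """
    out = []
    for index, chunk in enumerate(chunks):
        if index > 0 and chunk.startswith(bom):
            chunk = chunk[len(bom):]
        out.append(chunk)
    return b"".join(out)


file1 = codecs.BOM_UTF8 + b"contents from file1\n"
file2 = codecs.BOM_UTF8 + b"contents from file2\n"
file3 = concat_dropping_inner_boms([file1, file2])
```

If the first input has no BOM and the second does, this sketch produces output with no BOM at all; whether that is the right outcome is exactly the open question discussed elsewhere in the thread.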

Several BOMs in the same file

2003-03-23 Thread Stefan Persson
Hi! Let's say that I have two files, namely file1 & file2, in any Unicode encoding, both starting with a BOM, and I compile them into one by using cat file1 file2 > file3 in Unix or copy file1 + file2 file3 in MS-DOS, file3 will have the following contents: BOM contents from file1 BOM conten