Marco Cimarosti wrote:
Kent Karlsson wrote:
I'm not going into the implementation part; just pointing out that
this issue is not something an operating system can ignore.
"cat" and "cp" can and shall ignore it. They are octet-level
file operations, attaching no semantics to the octets.
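Here is a minimal Python sketch of the octet-level point. The file contents are invented, and Python only stands in for what cat does at the byte level. Two UTF-16LE files that each start with a BOM have their bytes appended unchanged, so the second BOM ends up in the middle. A BOM-aware decoder then reads it as an ordinary U+FEFF character.

```python
import codecs

# Two hypothetical UTF-16LE files, each starting with a BOM (bytes FF FE).
file1 = codecs.BOM_UTF16_LE + "abc\n".encode("utf-16-le")
file2 = codecs.BOM_UTF16_LE + "def\n".encode("utf-16-le")

# Octet-level concatenation, as cat does it: the bytes are appended as-is.
file3 = file1 + file2

# The "utf-16" codec consumes only the leading BOM; the second one
# survives as a U+FEFF character in the middle of the decoded text.
text = file3.decode("utf-16")
print(repr(text))
```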
Kent Karlsson wrote:
> > I'm not going into the implementation part; just pointing out that
> > this issue is not something an operating system can ignore.
>
> "cat" and "cp" can and shall ignore it. They are octet-level
> file operations, attaching no semantics to the octets. Try "iconv".
This
I (Marco Cimarosti) wrote:
> As a minimum, option "-v" must know the semantics of NL and
> LF control codes, of the digits, and of white space.
Sorry, I meant: option "-n".
_ Marco
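The following Python sketch shows why "-n" cannot stay purely octet-level. The sample character U+010A and both helper names are hypothetical and were chosen only for illustration. In UTF-16LE, U+010A is stored as the bytes 0A 01, so a counter that looks for 0x0A octets finds a line feed that is not there.

```python
def count_lines_bytewise(data: bytes) -> int:
    # Octet-level view: every 0x0A byte is taken to be a line feed.
    return data.count(b"\n")


def count_lines_decoded(data: bytes, encoding: str) -> int:
    # Encoding-aware view: decode first, then count U+000A characters.
    return data.decode(encoding).count("\n")


# U+010A followed by LF, encoded as UTF-16LE: bytes 0A 01 0A 00.
sample = "\u010a\n".encode("utf-16-le")

print(count_lines_bytewise(sample))              # 2 (one of them spurious)
print(count_lines_decoded(sample, "utf-16-le"))  # 1
```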
> In that case, removing the BOM that would end up somewhere in the
> middle is the natural thing to do, just as removing the EOF marker
> at the end of the first file is.
There is no "EOF marker" at the end of a file, at least not in
modern file systems. There is no NULL, CTRL-Z, or CTRL-D
Kent Karlsson wrote:
> To avoid "flag bloat", one can instead use the "iconv" command,
> and apply that to the source files. Since "head" and "tail" assume
> an ASCII-compatible single-byte or multibyte encoding, where any
> state is reset at LF, the target encoding for the iconv command
> must,
Marco Cimarosti wrote:
> > Is this in accordance with the Unicode standard, or do I have
> > to remove the second BOM?
>
> IMHO, Unicode should not specify such a behavior. Deciding what a shell
IMHO, it should. The guideline that says a text file can have a
U+FEFF at the beginning, but it real
> Your command above would now expand to something like this:
>
> cat -R UTF-16 -F UTF-16LE file1 -F Big-5 file2 > file3
>
> Provided with information about the input encodings and the expected
> output encoding, "cat" could now correctly handle BOMs, endianness,
> new-line conventions,
> Let's say that I have two files, namely file1 & file2, in any Unicode
> encoding, both starting with a BOM, and I compile them into
> one by using
>
> cat file1 file2 > file3
For POSIX implementations, this concatenates the octets (bytes)
in the two files, whether either of them is text in U
Stefan Persson wrote:
> Let's say that I have two files, namely file1 & file2, in any Unicode
> encoding, both starting with a BOM, and I compile them into
> one by using
>
> cat file1 file2 > file3
>
> in Unix or
>
> copy file1 + file2 file3
>
> in MS-DOS, file3 will have the following conte
Michael (michka) Kaplan wrote:
> But if you do not, what is the harm of a character that you cannot see
> and which does not even have width or cause line breaking behavior?
> Realistically, what would the problem be?
The fact that the 0xFEFF character will not affect the display does not mean
tha
Chris Jacobs wrote:
> > guidelines for the situation where two files are joined, and the
> > second one has a BOM, but the first one hasn't. Should the resulting
> > file have a BOM?
>
> In that case you should seriously consider the possibility that the byte
> order for both files is different!
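Following that suggestion, each file's signature can be checked separately before the files are joined. The Python sketch below reports which encoding form and byte order a leading BOM indicates. The function name is hypothetical, and the sketch recognizes only the standard BOM byte sequences exposed by Python's codecs module.

```python
import codecs

# Longest signatures first, so that a UTF-32LE BOM (FF FE 00 00)
# is not mistaken for a UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> tuple:
    """Return (encoding, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

Longest-match is only a heuristic. A UTF-16LE file whose first character after the BOM is U+0000 would be reported as UTF-32LE, and a file without a BOM gives no byte-order information at all.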
- Original Message -
From: "Pim Blokland" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Sunday, March 23, 2003 2:43 PM
Subject: Re: Several BOMs in the same file
[ ... ]
> But now you've got me wondering whether there are any
Note on the COPY command: some versions of Windows seem to be
BOM-aware; at least Windows 2000, when concatenating two text
files, does remove the second file's BOM.
Pim Blokland
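The following Python sketch shows that BOM-aware behavior. It is not how Windows implements COPY. The helper name is hypothetical, and the sketch assumes every input is UTF-16 and begins with a BOM. Each input is decoded in its own byte order, its leading BOM is dropped, and a single BOM is written at the front of the result.

```python
import codecs


def concat_utf16(*blobs: bytes) -> bytes:
    # Hypothetical helper: assumes every input is UTF-16 starting with a BOM.
    parts = []
    for blob in blobs:
        # The "utf-16" codec consumes a leading BOM and decodes in the byte
        # order that BOM indicates, so the inputs may differ in endianness.
        parts.append(blob.decode("utf-16"))
    text = "".join(parts)
    # Emit exactly one BOM, at the very start, in a fixed byte order.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


file1 = codecs.BOM_UTF16_LE + "abc\n".encode("utf-16-le")
file2 = codecs.BOM_UTF16_BE + "def\n".encode("utf-16-be")
file3 = concat_utf16(file1, file2)
```

The decoder removes only the BOM at the very start of each input. A U+FEFF that occurs later inside a file is kept as text. Little-endian output is an arbitrary choice here.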
From: "Stefan Persson" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Sunday, March 23, 2003 4:14 AM
Subject: Several BOMs in the same file
> Hi!
>
> Let's say that I have two files, namely file1 & file2, in any Unicode
> encoding, bot
> in MS-DOS, file3 will have the following contents:
>
> BOM
> contents from file1
> BOM
> contents from file2
>
> Is this in accordance with the Unicode standard
Nope. When concatenating two files (or any streams) of which the
second one has a BOM, that second BOM should be deleted.
However, there
Hi!
Let's say that I have two files, namely file1 & file2, in any Unicode
encoding, both starting with a BOM, and I compile them into one by using
cat file1 file2 > file3
in Unix or
copy file1 + file2 file3
in MS-DOS, file3 will have the following contents:
BOM
contents from file1
BOM
conten