John Cowan <jcowan at reutershealth dot com> wrote:
A 32-bit length count, followed by an array of N arbitrary Unicode characters, would probably be the best implementation today.
Which is essentially what the Java String class has, if you unwrap it.
Then why do the DataInput and DataOutput interfaces perform this special conversion? The page whose URL Theodore originally provided makes no mention of compatibility with C strings. If a Java String consists of a count followed by the data, why would "embedded nulls" in the data make any difference?
Needed by the class loader, to read the string constants in the constant pool of compiled classes.
Needed in the JNI interface to C, which has a legacy 8-bit string interface inherited from old versions of Java (that interface has no separate string-length indicator and relies on null-terminated strings).
But not needed with the newer JNI interfaces for C, where strings are arrays of 16-bit "char" code units with a separate, explicit 32-bit string-length indicator (so there is no need to escape nulls).
Not needed and not used for file or stream I/O, where *true* UTF-8 is supported by the Charset instance named "UTF-8" (which fully complies with the Unicode definition of UTF-8); see the sketch below.
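To make the difference concrete, here is a minimal self-contained sketch (the class name Utf8Demo and the sample string are just illustrative) contrasting the modified UTF-8 written by DataOutputStream.writeUTF with the true UTF-8 produced by the standard Charset:

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public class Utf8Demo {
        public static void main(String[] args) throws IOException {
            // A string containing an embedded U+0000 and a supplementary
            // character (U+1F600, written as a surrogate pair).
            String s = "A\u0000\uD83D\uDE00";

            // DataOutputStream.writeUTF: 2-byte length prefix, then modified
            // UTF-8 (U+0000 becomes C0 80; the supplementary character is
            // encoded as two 3-byte sequences, one per surrogate).
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeUTF(s);
            System.out.println("writeUTF : " + toHex(bos.toByteArray()));

            // The "UTF-8" Charset: true UTF-8 (U+0000 is a single 00 byte;
            // the supplementary character is one 4-byte sequence).
            System.out.println("Charset  : " + toHex(s.getBytes(StandardCharsets.UTF_8)));

            // The String itself is just a counted sequence of 16-bit code
            // units, so the embedded null needs no special treatment there.
            System.out.println("length() : " + s.length());
        }

        private static String toHex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) sb.append(String.format("%02X ", b & 0xFF));
            return sb.toString().trim();
        }
    }

Running it shows that the writeUTF output contains no 00 byte at all (the embedded null comes out as C0 80, and the supplementary character as six bytes), while the Charset output contains a literal 00 byte and a single 4-byte sequence, which is exactly why the two encodings cannot be interchanged blindly.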

