Re: Is the binaryness/textness of a data format a property?

2020-03-20 Thread J Decker via Unicode
On Fri, Mar 20, 2020 at 5:48 AM Adam Borowski via Unicode < unicode@unicode.org> wrote: > On Fri, Mar 20, 2020 at 12:21:26PM +, Costello, Roger L. via Unicode > wrote: > > [Definition] Property: an attribute, quality, or characteristic of > something. > > > > JPEG is a binary data format. > >

Re: Unicode "no-op" Character?

2019-06-24 Thread J Decker via Unicode
On Mon, Jun 24, 2019 at 5:35 PM David Starner via Unicode < unicode@unicode.org> wrote: > On Sun, Jun 23, 2019 at 10:41 PM Shawn Steele via Unicode > wrote: > > IMO, since it's unlikely that anyone expects > that they can transmit a NUL through an arbitrary channel, unlike a > random private use

Re: Unicode "no-op" Character?

2019-06-22 Thread J Decker via Unicode
On Sat, Jun 22, 2019 at 2:04 PM Sławomir Osipiuk via Unicode < unicode@unicode.org> wrote: > I see there is no such character, which I pretty much expected after > Google didn’t help. > > > > The original problem I had was solved long ago but the recent article > about watermarking reminded me of

Re: Unicode "no-op" Character?

2019-06-21 Thread J Decker via Unicode
Sounds like a great use for ZWNBSP (ZERO WIDTH NO-BREAK SPACE) U+FEFF (also used as BOM) or that doesn't break; maybe 'ZERO WIDTH SPACE' (U+200B) On Fri, Jun 21, 2019 at 9:48 PM Sławomir Osipiuk via Unicode < unicode@unicode.org> wrote: > Does Unicode include a character that does nothing
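
A minimal sketch, not taken from the thread, of how the two characters suggested above behave in practice. It uses Python's standard `unicodedata` module; `CANDIDATES` and `strip_candidates` are hypothetical names introduced only for illustration. Both characters are invisible when rendered but still count as code points, so a receiver that wants to treat them as "no-op" has to remove them explicitly.

```python
import unicodedata

# The two characters suggested in the message above (assumed set, for illustration).
CANDIDATES = ["\uFEFF", "\u200B"]

# Print the official name and general category of each candidate.
for ch in CANDIDATES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} category={unicodedata.category(ch)}")

def strip_candidates(text: str) -> str:
    """Remove the candidate characters so the text compares equal to the unmarked original."""
    return "".join(c for c in text if c not in CANDIDATES)

marked = "no\u200Bop"           # looks like "noop" when rendered
print(len(marked))              # the invisible character still counts toward the length
print(marked == "noop")         # and it makes a plain comparison fail
print(strip_candidates(marked) == "noop")
```

Whether an arbitrary channel preserves or drops these characters is outside what this sketch can show.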

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-12 Thread J Decker via Unicode
On Fri, Oct 12, 2018 at 9:23 AM Doug Ewell via Unicode wrote: > J Decker wrote: > > >> How about the opposite direction: If m is base64 encoded to yield t > >> and then t is base64 decoded to yield n, will it always be the case > >> that m equals n? > > > > False. > > Canonical translation may

Re: Base64 encoding applied to different unicode texts always yields different base64 texts ... true or false?

2018-10-12 Thread J Decker via Unicode
On Fri, Oct 12, 2018 at 3:57 AM Costello, Roger L. via Unicode < unicode@unicode.org> wrote: > Hi Unicode Experts, > > Suppose base64 encoding is applied to m to yield base64 text t. > > Next, suppose base64 encoding is applied to m' to yield base64 text t'. > > If m is not equal to m', then t
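
A rough sketch of the base64 questions raised in this thread, under the assumption that the text is first encoded to bytes (UTF-8 here) and base64 is applied to those bytes. `b64_of_text` and `text_of_b64` are hypothetical helpers, not anything proposed on the list. On bytes the round trip is exact; the subtlety is that two canonically equivalent strings (NFC vs. NFD) are different byte sequences, so they yield different base64 texts unless something normalizes them along the way.

```python
import base64
import unicodedata

def b64_of_text(text: str, encoding: str = "utf-8") -> str:
    """Encode text to bytes, then base64-encode those bytes."""
    return base64.b64encode(text.encode(encoding)).decode("ascii")

def text_of_b64(b64: str, encoding: str = "utf-8") -> str:
    """Inverse of b64_of_text: base64-decode, then decode the bytes to text."""
    return base64.b64decode(b64).decode(encoding)

# The same user-perceived character in two canonically equivalent forms.
nfc = unicodedata.normalize("NFC", "\u00e9")   # precomposed e-acute
nfd = unicodedata.normalize("NFD", "\u00e9")   # "e" followed by a combining acute accent

t_nfc = b64_of_text(nfc)
t_nfd = b64_of_text(nfd)

print(t_nfc != t_nfd)                  # different code points give different base64 texts
print(text_of_b64(t_nfc) == nfc)       # round trip preserves the exact code points
print(text_of_b64(t_nfd) == nfd)
print(unicodedata.normalize("NFC", text_of_b64(t_nfd)) == nfc)  # equal only after normalizing
```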

Re: Unicode String Models

2018-09-11 Thread J Decker via Unicode
On Tue, Sep 11, 2018 at 3:15 PM Hans Åberg via Unicode wrote: > > > On 11 Sep 2018, at 23:48, Richard Wordingham via Unicode < > unicode@unicode.org> wrote: > > > > On Tue, 11 Sep 2018 21:10:03 +0200 > > Hans Åberg via Unicode wrote: > > > >> Indeed, before UTF-8, in the 1990s, I recall some

Re: UCD in XML or in CSV? (is: UCD in YAML)

2018-09-07 Thread J Decker via Unicode
On Fri, Sep 7, 2018 at 10:58 AM Philippe Verdy via Unicode < unicode@unicode.org> wrote: > > > Le jeu. 6 sept. 2018 à 19:11, Doug Ewell via Unicode > a écrit : > >> Marcel Schneider wrote: >> >> > BTW what I conjectured about the role of line breaks is true for CSV >> > too, and any file

Re: Fwd: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode

2018-04-02 Thread J Decker via Unicode
On Mon, Apr 2, 2018 at 5:42 PM, Mark E. Shoulson via Unicode < unicode@unicode.org> wrote: > For unique identifiers for every person, place, thing, etc, consider > https://en.wikipedia.org/wiki/Universally_unique_identifier which are > indeed 128 bits. > > What makes you think a single "glyph"

Re: Fwd: RFC 8369 on Internationalizing IPv6 Using 128-Bit Unicode

2018-04-02 Thread J Decker via Unicode
I was really hoping this was a joke... it didn't hit me it was April 1... https://en.wikipedia.org/wiki/Plane_(Unicode) Plane | Allocated code points | Assigned characters

Re: Interesting UTF-8 decoder

2017-10-09 Thread J Decker via Unicode
that's interesting; however it will segfault if the string ends on a memory allocation boundary. will have to make sure strings are always allocated with 3 extra bytes. 2017-10-09 1:37 GMT-07:00 Martin J. Dürst via Unicode : > A friend of mine sent me a pointer to >
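
A small sketch of the padding concern in the message above, assuming the decoder in question always reads four bytes per step regardless of sequence length (the linked decoder itself is not shown in this snippet, so this is an illustration, not its code). Python cannot segfault, so an `IndexError` stands in for the out-of-bounds read. `PAD`, `pad_for_decoder`, `decode_one`, and `decode_all` are hypothetical names, and the decoder deliberately skips validation of continuation bytes, overlong forms, and surrogates.

```python
from typing import Tuple

PAD = 3  # a 4-byte read starting at the last real byte touches 3 bytes past the end

def pad_for_decoder(data: bytes) -> bytes:
    """Append PAD zero bytes so an unconditional 4-byte read never runs off the buffer."""
    return data + b"\x00" * PAD

def decode_one(buf: bytes, i: int) -> Tuple[int, int]:
    """Decode one code point at buf[i], always touching four bytes.

    Returns (code point, sequence length). No validation is performed.
    """
    # Unconditional 4-byte read: raises IndexError if the buffer is not padded.
    b0, b1, b2, b3 = buf[i], buf[i + 1], buf[i + 2], buf[i + 3]
    if b0 < 0x80:
        return b0, 1
    if b0 >> 5 == 0b110:
        return ((b0 & 0x1F) << 6) | (b1 & 0x3F), 2
    if b0 >> 4 == 0b1110:
        return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F), 3
    if b0 >> 3 == 0b11110:
        return (((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12)
                | ((b2 & 0x3F) << 6) | (b3 & 0x3F)), 4
    raise ValueError("invalid lead byte")

def decode_all(data: bytes) -> str:
    """Decode well-formed UTF-8 by stepping decode_one over a padded copy of data."""
    buf = pad_for_decoder(data)
    out = []
    i = 0
    while i < len(data):
        cp, n = decode_one(buf, i)
        out.append(chr(cp))
        i += n
    return "".join(out)

sample = "a\u00e9\u20ac\U0001F600"
print(decode_all(sample.encode("utf-8")) == sample)
```

Calling `decode_one(b"a", 0)` on an unpadded buffer raises `IndexError`, which is the Python analogue of reading past the end of an allocation.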

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread J Decker via Unicode
On Mon, Jul 24, 2017 at 1:50 PM, Philippe Verdy <verd...@wanadoo.fr> wrote: > 2017-07-24 21:12 GMT+02:00 J Decker via Unicode <unicode@unicode.org>: > >> >> >> If you don't have that last position in a variable, just use 3 tests but > NO loop at all: if all

Re: Split a UTF-8 multi-octet sequence such that it cannot be unambiguously restored?

2017-07-24 Thread J Decker via Unicode
On Mon, Jul 24, 2017 at 10:57 AM, Costello, Roger L. via Unicode < unicode@unicode.org> wrote: > Hi Folks, > > 2. (Bug) The sending application performs the folding process - inserts > CRLF plus white space characters - and the receiving application does the > unfolding process but doesn't

Database missing/erroneous information

2017-07-12 Thread J Decker via Unicode
I started looking more deeply at the javascript specification. Identifiers are defined as starting with characters with ID_Start and continued with ID_Continue attributes. I grabbed the xml database (ucd.all.grouped.xml ) in which I was able to find IDS, IDC flags ( also OIDS,OIDC, XIDS,XIDC of

Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

2017-05-16 Thread J Decker via Unicode
On Mon, May 15, 2017 at 11:50 PM, Henri Sivonen via Unicode < unicode@unicode.org> wrote: > On Tue, May 16, 2017 at 1:16 AM, Shawn Steele via Unicode > wrote: > > I’m not sure how the discussion of “which is better” relates to the > > discussion of ill-formed UTF-8 at all. >