bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes

2022-11-08 Ludovic Courtès
Hi,

Rob Browning wrote:

> Oh, and unless I'm missing something, I remembered why we may need to
> keep the standalone C test program -- there's no straightforward way to
> call scm_from_utf8_symbol() from scheme?

Ah yes, you're probably right!

Ludo'.

2022-11-07 Rob Browning
Rob Browning writes:

> OK, so unfortunately I don't actually recall how I came up with that
> number, but I can start over with some canonical approach to compute the
> value if we like.

I hacked up hash.c to let me call wide_string_hash() directly and printed the hash for wchar_t {0x3A0,

2022-11-07 Ludovic Courtès
Rob Browning wrote:

> So this change *could* alter results, but only for non-ASCII strings,
> and those results would have been wrong (i.e. relying on uninitialized
> memory).

OK, that was my understanding too.

> That leaves the size_t -> long change in scm_i_str2symbol(), and I don't

2022-11-07 Ludovic Courtès
Rob Browning wrote:

>> Is this a documented example of Jenkins? Or did you use a reference
>> implementation?
>
> Jenkins?

That's the name of the hash function in question. If not, where did you get that example from? :-)

Thanks,
Ludo'.

2022-11-06 Rob Browning
Ludovic Courtès writes:

> Rob Browning wrote:
>> + // Make sure a utf-8 symbol has the expected hash. In addition to
>> + // catching algorithmic regressions, this would have caught a
>> + // long-standing buffer overflow.
>> +
>> + // περί
>> + char about_u8[] = {0xce, 0xa0, 0xce,

2022-11-06 Rob Browning
Rob Browning writes:

> Jenkins?

Oh, right (after looking back at the code). I'll get back to you regarding this and the other questions after I finish reviewing/remembering.

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0

2022-11-06 Rob Browning
Ludovic Courtès writes:

> For the final patch please add a ChangeLog-style entry.

Will do.

> Is this a documented example of Jenkins? Or did you use a reference
> implementation?

Jenkins?

> Yes, it may be nicer to have it in ‘test-suite/tests/hash.test’.
>
> AFAICS this will only change the

2022-11-05 Ludovic Courtès
Hi,

Rob Browning wrote:

> Noticed while investigating a migration to utf-8 strings. After making
> changes that routed non-ascii symbol hashing through this function,
> encoding-iso88597.test began intermittently failing because it would
> traverse trailing garbage when u8_strnlen reported 8

2022-07-05 Rob Browning
Rob Browning writes:

> Noticed while investigating a migration to utf-8 strings. After making
> changes that routed non-ascii symbol hashing through this function,
> encoding-iso88597.test began intermittently failing because it would
> traverse trailing garbage when u8_strnlen reported 8 chars

2022-07-05 Rob Browning
Noticed while investigating a migration to utf-8 strings. After making changes that routed non-ascii symbol hashing through this function, encoding-iso88597.test began intermittently failing because it would traverse trailing garbage when u8_strnlen reported 8 chars instead of 4. Change the