[M.-A. Lemburg]
>>> Could you please make this fix apply only on Solaris,
>>> e.g. using an #ifdef ?!

[Martin v. Löwis]
>> That shouldn't be done. The code, as it was before, had
>> undefined behaviour in C. With the fix, it is now correct.

[Marc-Andre]
> I don't understand - what's undefined in:
>
>     const char *s;
>     Py_UNICODE *p;
>     ...
>     *p = *(Py_UNICODE *)s;

The pointer cast:

    A pointer to an object or incomplete type may be converted to a
    pointer to a different object or incomplete type.  If the
    resulting pointer is not correctly aligned for the pointed-to
    type, the behavior is undefined.

Since Py_UNICODE has a stricter alignment requirement than char,
there's no guarantee that _the content_ of s is correctly aligned for
Py_UNICODE after the cast.  Indeed, that's why the code segfaulted on
the Solaris box.  On other architectures it may not segfault but
"just" take much longer for the HW and SW to hide improperly aligned
access.

>> If you want to drop usage of memcpy on systems where you
>> think it isn't needed, you should make a positive list of
>> such systems, e.g. through an autoconf test (although such
>> a test is difficult to formulate).

> I don't want to drop memcpy() - just keep the existing
> working code on platforms where the memcpy() is not
> needed.

There's no clear way I know of to guess which platforms that may be.
Is it possible to fiddle _PyUnicode_DecodeUnicodeInternal's _callers_
so that the char* `s` argument passed to it is always properly aligned
for Py_UNICODE?  Then the pointer cast would be fine.

> ...
> A modern compiler should know the alignment requirements
> of Py_UNICODE* on the platform and generate appropriate
> code.

The trend in modern compilers and architectures is to be less
forgiving of standard violations, not more.

> AFAICT, only 64-bit platforms are subject to any
> such problems due to their requirement to have pointers
> aligned on 8-byte boundaries.

It's not the alignment of the pointer but of what the pointer points
_at_ that's at issue here.
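
For concreteness, here is a minimal sketch of the two approaches
discussed above: reading through memcpy(), and checking whether a
char* happens to be aligned before casting it.  This is not the code
from the Python source.  unit_t is a hypothetical stand-in for
Py_UNICODE (assumed here to be 16 bits), read_unit() and
is_aligned_for_unit() are made-up names, and _Alignof assumes a C11
compiler.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Py_UNICODE; the real typedef depends on
   how Python was configured. */
typedef uint16_t unit_t;

/* Alignment-safe read: copy the bytes into a properly aligned local
   instead of dereferencing a cast pointer.  Works for any address. */
static unit_t read_unit(const char *s)
{
    unit_t u;
    memcpy(&u, s, sizeof u);
    return u;
}

/* Returns non-zero if the address held in s is suitably aligned for
   unit_t, i.e. if *(const unit_t *)s would not be a misaligned access.
   The pointer-to-integer conversion is implementation-defined, but
   this is the conventional way to test alignment. */
static int is_aligned_for_unit(const char *s)
{
    return ((uintptr_t)s % _Alignof(unit_t)) == 0;
}
```

A caller that can't prove is_aligned_for_unit(s) holds for every s it
passes along has to use something like read_unit(); that is the
situation the memcpy() fix addresses.
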
While the effect of the pointer cast is undefined, it's not the
pointer cast that blows up.  It's dereferencing the _result_ of the
pointer cast that blows up:  it was trying to read up a Py_UNICODE
from an address that wasn't properly aligned for Py_UNICODE.  That can
blow up (or be very slow, or return gibberish -- it's undefined) even
if Py_UNICODE has an alignment requirement of "just" 2 (which I expect
was actually the case on the Solaris box).

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev