    Date:        Wed, 2 Nov 2022 02:39:39 +0000
    From:        David Holland <dholland-t...@netbsd.org>
    Message-ID:  <y2hyaxfvsydjf...@netbsd.org>

  | (a) It follows from the observations so far that if I set ->tm_gmtoff
  | to 101 and ->tm_zone to "abracadabra" (as well as populating the rest
  | of the structure), and ask strftime to print those fields, it will not
  | print them but print something else. I continue to fail to see how
  | this is anything other than a bug.

OK, I have looked into that now, and it turns out not to be a bug, but
by design.

The whole story wrt these two is a little complex...

First, the basic definition of %Z (%z is essentially the same issue,
just with different data, I will concentrate on %Z so I can give
specifics):

    %Z    Replaced by the timezone name or abbreviation, or by no
          bytes if no timezone information exists. [tm_isdst]

That's from POSIX, but should be more or less identical to what the
C standard says (I don't have a copy of, any version of, the C standard
to verify that, but it is generally possible from POSIX to tell when
something is intended to agree with C, and when things are being
modified).

That by itself is not all that illuminating. But the standard also says

    Local timezone information shall be set as though strftime()
    called tzset().

That's POSIX text, but I believe the C standard has a similar
requirement, worded a different way (as it does not have tzset()).

tzset() accesses "TZ" if set (or the system's default local timezone
if not) and loads the data needed to access that timezone.

So, from just this, it is clear (I believe) that %Z is intended to
print the local timezone name or abbreviation, always - with the only
variation being whether the standard or summer time name/abbr is
printed (which is based upon tm_isdst).

Note that the '[tm_isdst]' suffix on the definition of %Z says that
that field is the only one from the struct tm passed in that strftime()
will access to provide %Z information - and hence if only %Z were in
the format string, tm_isdst would be the only field the application
need set.

At first glance that's why you're seeing EST/EDT (in an east coast US
timezone) despite having created the struct tm with gmtime().

But this is where things start getting more interesting. POSIX also
says:

    If a struct tm broken-down time structure is created or modified
    by gmtime() or gmtime_r(), it is unspecified whether the result
    of the %Z and %z conversion specifiers shall refer to UTC or the
    current local timezone, when strftime() is called with such a
    broken-down time structure.

Note that this is a variation from the C standard - but it would seem
to allow the behaviour that you want (at least for gmtime(), not for
simply setting the tm_zone field to an arbitrary string then expecting
%Z to print that string).

Further, at first glance, that's what it looks like the
libc/time/strftime.c code that we have should do:

    case 'Z':
    #ifdef TM_ZONE
            pt = _add(t->TM_ZONE, pt, ptlim);
    #elif HAVE_TZNAME

(the _add() internal function just adds string data to the output
buffer, nothing interesting about that one). 't' is the struct tm *
passed to strftime().

And in "private.h", which is included by strftime.c, we have:

    /* NetBSD defaults */
    #define TM_GMTOFF       tm_gmtoff
    #define TM_ZONE         tm_zone

So, it would appear, at first glance anyway, that we should be adding
t->tm_zone to the buffer when we see a %Z conversion. Yet that clearly
is not happening.

The reason why appears earlier in strftime.c, after private.h is
included, but before there is any code:

    /*
    ** We don't use these extensions in strftime operation even when
    ** supported by the local tzcode configuration.  A strictly
    ** conforming C application may leave them in undefined state.
    */

    #ifdef _LIBC
    #undef TM_ZONE
    #undef TM_GMTOFF
    #endif

That is, inside strftime.c TM_ZONE is not defined after all, and in
the %Z code, we instead fall into that #elif (HAVE_TZNAME is defined,
so the code there is executed).

The HAVE_TZNAME code extracts the zone name from the current local
timezone, as loaded by tzset(), and inserts that (standard or summer,
depending upon tm_isdst, the one (and only) field of the struct tm that
strftime() is permitted to access when converting %Z).

That is, what is happening is what the C standard requires, and which
is clearly at least permitted by POSIX, if not required (there must be
some other implementation of the *time* functions which alters tzname[]
when gmtime() is called, I would guess - but that is pure speculation).

Or to be more blunt, it must be possible to write this code (which I
have not compiled, so is missing required #include, and could have
other immaterial errors)

    int main(int argc, char **argv)
    {
            struct tm t;
            char buf[128];

            /* only tm_isdst is set, the rest is deliberately left alone */
            t.tm_isdst = 0;
            (void) strftime(buf, sizeof buf, "%Z", &t);
            (void) printf("The local standard timezone name is: %s\n", buf);
            return 0;
    }

The standards promise that will work, and applications are entitled to
take advantage of that. strftime() must not cause the program to abort
by accessing the uninitialised (random stack garbage) tm_zone field of
the struct tm.

This is really why you're getting EST when you use %Z on the results of
gmtime() (as gmtime() sets tm_isdst to 0, and EST is the abbreviation
for standard time in the timezone you're using). (Similarly -0500 as
the %z offset).

This is not a bug -- it is doing both what it is designed, and what it
is required by the standards, to do. That might not have been what you
would have designed the functions to do, but that doesn't make it
broken, just different from what you want, and so you need to use
something different.

I would note in concluding this issue, that our strftime.3 says in the
STANDARDS section:

    STANDARDS
         The strftime() function conforms to ISO/IEC 9899:1999 (?ISO C99?).

(The '?' are fancy opening+closing double quotes, that my cut&paste
using just ASCII cannot duplicate).

Note that we do not claim that strftime() complies with any POSIX
standard.

[If you have a 9.99.x version that is recent enough, but not very
recent, there might be noise in that STANDARDS section about phases of
the moon - that was some kind of merge error, just ignore that
gibberish, the same line appears again later, where it belongs (or at
least where it was put) in the BUGS section. When you ignore the
interloper text, the text quoted here is still there.]

What that means is that we promise that we will do what the C standard
(C99 version) requires (of course we can add extensions, but cannot
break anything which is required to work).

Finally, one concluding remark about all of this:

campbell+netbsd-tech-userle...@mumble.net said:
  | It seems to me either we need a new API, or we risk breaking existing
  | programs.

What's most amazing here is that it appears that no-one participating
in this debate has even bothered to go look at our man page. If that
was done (the man page for strftime() in this case, though you can do
the same for mktime()) you would see:

    size_t
    strftime_z(const timezone_t tz, char * restrict buf, size_t maxsize,
        const char * restrict format, const struct tm * restrict timeptr);

This is an addition to both C and POSIX, neither has this, even the
tzcode reference implementation doesn't (though it does have
mktime_z() I think .. not certain about that one).

That's a variation of strftime() where you can tell it which time zone
you want to use, instead of local time, for the conversions, so if you
were to do

    z = tzalloc("UTC");
    t = gmtime(&some_time_t_variable);
    strftime_z(z, buf, sizeof buf, "whatever .. including %s %z and %Z", t);

then you'd get %s/%z/%Z values as specified by the UTC timezone,
instead of current local time. (Don't forget to tzfree(z) somewhere -
or just exit()).

That is, we don't need a new API, we already have the new API, just
none of the participants here seem to have bothered to notice it.

We also have strftime_l() (which is a POSIX, but not C, function, but
added in a later version of POSIX than that which we claim to support,
still we seem to support it anyway), which allows the locale to be used
to be specified (rather than just using the default) and even
strftime_lz() (which allows both, and is neither POSIX nor C).

[Aside: a truly bizarre factoid - when we do eventually claim, in
unistd.h, that we support Posix.7 (ie: 2008) rather than Posix.6 (2001)
the support we have for strftime_l() will actually vanish - the
function will still be there, but will simply be a clone of strftime(),
ignoring the passed in locale. That wacky logic comes from tzcode, as
many users of it have no locale handling at all, so our implementation
will need local fixing sometime.]

All the real code is in strftime_lz(); the others just call it with the
appropriate locale and timezone parameters.

We do not, however, seem to have documentation for strftime_l() or
strftime_lz(), and that probably should get fixed (by the proverbial
someone).

kre