Re: [PATCH] [PR77760] [libstdc++] encode __time_get_state in tm

2023-02-23 Thread Jonathan Wakely via Gcc-patches
On Thu, 23 Feb 2023 at 17:55, Alexandre Oliva wrote:
>
> On Feb 22, 2023, Alexandre Oliva wrote:
>
> >> Just curious, why doesn't the pmf hack work on arm-vxworks7?
>
> > At first, I thought we were running into this just because we have to
> > define __clang__ because of some vxworks system headers aimed at clang.
> > But even as I tried to drop the #ifndef, the test still failed; I
> > suspected it had to do with ARM's encoding of ptrmemfunc_vbit_in_delta,
> > but I did not confirm that this was the case.
>
> It was much simpler than that: patching locale_facets_nonio.tcc did not
> affect the code generated for the tests, even though the modified
> definition was present in the preprocessed version, and the patch didn't
> cause locale-inst.o to be rebuilt.
>
>
> With a build from scratch, the following patchlet is enough for time_get
> tests to pass for us, and I assume we'll have to keep on carrying such
> local changes, but I wonder if it would make sense to submit a patch to
> adjust all preprocessor tests for __clang__ in libstdc++ to also test
> for __clang_major__.
>
> --- a/libstdc++-v3/include/bits/locale_facets_nonio.tcc
> +++ b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
> @@ -1465,7 +1465,7 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
>    ctype<_CharT> const& __ctype = use_facet<ctype<_CharT> >(__loc);
>    __err = ios_base::goodbit;
>    bool __use_state = false;
> -#if __GNUC__ >= 5 && !defined(__clang__)
> +#if __GNUC__ >= 5 && !(defined(__clang__) && defined(__clang_major__))
>  #pragma GCC diagnostic push
>  #pragma GCC diagnostic ignored "-Wpmf-conversions"
>    // Nasty hack.  The C++ standard mandates that get invokes the do_get

Yeah, we can do that ... but it would be annoying.

We can't rely on __GNUC__ because other compilers pretend to be GCC
(and clang now allows you to fake any value of __GNUC__ with the
-fgnuc-version flag), and we can't use __clang__, because other
compilers now pretend to be clang ... where does it end?
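
To make the macro problem above concrete, here is a minimal sketch, not
part of the patch, that only reports what the identification macros
claim.  The function name describe_compiler_macros is hypothetical.  It
distinguishes the case the patchlet targets (__clang__ defined by hand,
without __clang_major__) from a compiler that defines both.

```cpp
#include <string>

// Hypothetical helper: reports what the identification macros claim.
// None of this proves which compiler is actually in use, which is the
// point being made in the message above.
std::string describe_compiler_macros()
{
  std::string s;
#if defined(__GNUC__)
  s += "__GNUC__=" + std::to_string(__GNUC__);
#else
  s += "__GNUC__ undefined";
#endif
#if defined(__clang__) && defined(__clang_major__)
  s += ", __clang__ and __clang_major__ both defined";
#elif defined(__clang__)
  // The arm-vxworks7 situation described in the thread: __clang__ is
  // defined for the sake of system headers, but not __clang_major__.
  s += ", __clang__ defined without __clang_major__";
#else
  s += ", __clang__ undefined";
#endif
  return s;
}
```

Printing the returned string under different compilers and flags shows
how little each macro establishes on its own.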



Re: [PATCH] [PR77760] [libstdc++] encode __time_get_state in tm

2023-02-23 Thread Alexandre Oliva via Gcc-patches
On Feb 22, 2023, Alexandre Oliva wrote:

>> Just curious, why doesn't the pmf hack work on arm-vxworks7?

> At first, I thought we were running into this just because we have to
> define __clang__ because of some vxworks system headers aimed at clang.
> But even as I tried to drop the #ifndef, the test still failed; I
> suspected it had to do with ARM's encoding of ptrmemfunc_vbit_in_delta,
> but I did not confirm that this was the case.

It was much simpler than that: patching locale_facets_nonio.tcc did not
affect the code generated for the tests, even though the modified
definition was present in the preprocessed version, and the patch didn't
cause locale-inst.o to be rebuilt.


With a build from scratch, the following patchlet is enough for time_get
tests to pass for us, and I assume we'll have to keep on carrying such
local changes, but I wonder if it would make sense to submit a patch to
adjust all preprocessor tests for __clang__ in libstdc++ to also test
for __clang_major__.

--- a/libstdc++-v3/include/bits/locale_facets_nonio.tcc
+++ b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
@@ -1465,7 +1465,7 @@ _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
   ctype<_CharT> const& __ctype = use_facet<ctype<_CharT> >(__loc);
   __err = ios_base::goodbit;
   bool __use_state = false;
-#if __GNUC__ >= 5 && !defined(__clang__)
+#if __GNUC__ >= 5 && !(defined(__clang__) && defined(__clang_major__))
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Wpmf-conversions"
   // Nasty hack.  The C++ standard mandates that get invokes the do_get


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] [PR77760] [libstdc++] encode __time_get_state in tm

2023-02-22 Thread Alexandre Oliva via Gcc-patches
On Feb 17, 2023, Jakub Jelinek wrote:

> My worry is that people often invoke the time_get APIs on uninitialized
> struct tm.

Yeah.  I thought you meant get(), but it looks like you meant do_get()
as well.  I seem to have overread the permissions to overwrite tm
members, to update its current contents, and to write unspecified values
in its members.

When I started down this path, I thought we might be able to hold the
state bits only in fields that were to be updated, but the century field
and allowing for 16-bit ints made that a bit challenging.


So, back to the drawing board, the other possibility that occurs to me
is to use a thread-local state-keeper chained stack, set up by get, and
that enabled state to be recovered and updated by do_get if relevant
incoming arguments match those recorded at the top of the stack (e.g.,
at least that we're looking at the same tm object).
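
A rough sketch of that thread-local chained stack, under assumptions:
parse_state is a hypothetical stand-in for __time_get_state, and
frame_guard and find_state are made-up names, not libstdc++ interfaces.
A get-like function would create the guard; a do_get-like function
would look the state up by tm address.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical stand-in for __time_get_state; only two flags modelled.
struct parse_state
{
  bool have_I = false;
  bool is_pm = false;
};

// One frame per active get() call, chained through 'prev'.
struct state_frame
{
  const std::tm* owner;   // the tm object the enclosing get() is filling
  parse_state state;
  state_frame* prev;
};

// Top of the per-thread chain; null when no get() is in progress.
thread_local state_frame* top_frame = nullptr;

// RAII guard that a get()-like function would hold around do_get calls.
class frame_guard
{
  state_frame frame_;
public:
  explicit frame_guard(const std::tm* tm)
    : frame_{tm, parse_state{}, top_frame}
  { top_frame = &frame_; }

  ~frame_guard() { top_frame = frame_.prev; }

  frame_guard(const frame_guard&) = delete;
  frame_guard& operator=(const frame_guard&) = delete;

  parse_state& state() { return frame_.state; }
};

// What a do_get-like function would call: the shared state is returned
// only if the innermost frame was set up for this very tm object.
parse_state* find_state(const std::tm* tm)
{
  if (top_frame && top_frame->owner == tm)
    return &top_frame->state;
  return nullptr;   // called outside get(), or for a different tm
}
```

When find_state returns null, the callee would fall back to stateless
parsing, which covers a do_get reached from a derived class or from an
older inlined get.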


Another approach is to encode state in tm in-place and progressively,
and compute derived fields when (i) we've just set one of the input
fields, and (ii) any other input fields are in range.  E.g., if we parse
%p, we make sure tm_hour will be in the 00..11 range for am or 12..23
for pm, using the value previously-held in tm_hour, modulo 12, only if
it was in the 00..23 range.  If it wasn't in range, we record 00 or 12.
When we parse %I, if we find that tm_hour is in the 12..23 range, we
take that as meaning a "pm" was parsed for %p before, and add 12 to the
parsed value.  For century, we'd encode it immediately in tm_year
(preserving the previous modulo-100 value), and when parsing %Y we'd use
the century data, both at the risk of using garbage in case of
uninitialized data.  That appears to be in line with the
standard-specified behavior.  Maybe this is what the specification
expects implementations to do?
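
The %p and %I rules above can be sketched as follows.  This is only an
illustration of the idea, not the proposed libstdc++ code; apply_ampm
and apply_hour12 are hypothetical names, and the century handling is
left out.

```cpp
#include <cassert>
#include <ctime>

// Called after %p has been parsed; is_pm says which marker was read.
void apply_ampm(std::tm& tm, bool is_pm)
{
  // Reuse the previous hour, modulo 12, only if it was in 00..23;
  // otherwise record plain 00 (am) or 12 (pm).
  int base = (tm.tm_hour >= 0 && tm.tm_hour <= 23) ? tm.tm_hour % 12 : 0;
  tm.tm_hour = base + (is_pm ? 12 : 0);
}

// Called after %I has been parsed; hour12 is the parsed 1..12 value.
void apply_hour12(std::tm& tm, int hour12)
{
  assert(hour12 >= 1 && hour12 <= 12);
  // An hour already in 12..23 is taken to mean "pm" was parsed before.
  bool pm_seen = tm.tm_hour >= 12 && tm.tm_hour <= 23;
  tm.tm_hour = hour12 % 12 + (pm_seen ? 12 : 0);
}
```

With a tm_hour that starts at 0, parsing "7" then "pm" and parsing "pm"
then "7" both end at 19.  With an uninitialized tm_hour the result of
apply_hour12 depends on garbage, which is the risk noted above.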

> For the encoding in get, the questions are:
> 1) if it is ok if we clear some normally unused bits of certain tm_*
>    fields even if they wouldn't be otherwise touched; perhaps nothing
>    will care, but it is still a user visible change

Rereading the relevant passages in standards and drafts, ISTM that this
unconditional zeroing would be unexpected and not permitted by the
standard, indeed.

> 2) when those tm_* fields we want to encode the state into are uninitialized,
>    don't we effectively invoke UB by using the uninitialized member
>    (admittedly we just overwrite some bits in it and leave others as is,
>    but won't that e.g. cause -Wuninitialized warnings)?

ISTM that https://eel.is/c++draft/locale.time.get#virtuals-15 allows
do_get to read values of members, and even to expect them to have been
zero-initialized.  That, and the allowance for unspecified values to be
stored in tm in case of error, seem to me to hint at using out-of-range
values of struct tm.

However, at the end of do_get, we *should* have specified values in
place in the just-read member, and get rules out modifying other fields.
Could get() build a temporary, automatic tm object, zero-initialize it,
set a bit in each field that indicates it is not set, use it throughout
multiple do_get calls to hold parsed values and internal state, and then
copy to the user-supplied tm object only fields that are marked as set
(the bit is clear) and in range?
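
A minimal sketch of that temporary-object idea, under assumptions: the
marker bit (1 << 14) and the names make_scratch, commit_field and commit
are hypothetical.  tm_year and tm_isdst are left out because they can
legitimately be negative, which would look like a set marker bit.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical marker: above every in-range value of the fields below,
// and still representable when int is only 16 bits wide.
constexpr int not_set = 1 << 14;

// Zero-initialized scratch object with every tracked field flagged.
std::tm make_scratch()
{
  std::tm t{};
  t.tm_sec = t.tm_min = t.tm_hour = not_set;
  t.tm_mday = t.tm_mon = not_set;
  t.tm_wday = t.tm_yday = not_set;
  return t;
}

// Copy one field back only if its marker is clear and it is in [lo, hi].
void commit_field(int& dst, int src, int lo, int hi)
{
  assert(lo <= hi);
  if ((src & not_set) == 0 && src >= lo && src <= hi)
    dst = src;
}

// Publish the parsed fields; untouched user fields keep their contents.
void commit(std::tm& user, const std::tm& scratch)
{
  commit_field(user.tm_sec, scratch.tm_sec, 0, 60);
  commit_field(user.tm_min, scratch.tm_min, 0, 59);
  commit_field(user.tm_hour, scratch.tm_hour, 0, 23);
  commit_field(user.tm_mday, scratch.tm_mday, 1, 31);
  commit_field(user.tm_mon, scratch.tm_mon, 0, 11);
  commit_field(user.tm_wday, scratch.tm_wday, 0, 6);
  commit_field(user.tm_yday, scratch.tm_yday, 0, 365);
}
```

The user-supplied tm is never read in this scheme, only written, so it
would not matter whether the caller initialized it.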

> More importantly, in the do_get the questions are:
> 3) while do_get is protected, are we guaranteed that it will never be called
>    except from the public members of time_get?

That's IMHO the main element of risk in using bits from unrelated fields
to hold state.  Storing the expected values only implies encoding state
in tm where it would ultimately end up (i.e., am/pm in tm_hour / 12,
century in tm_year / 100 + 19), using a zero (or an equal-to-sign) bit
to flag "set", and leaving it for get() to compute derived fields (if
that's even standard-compliant; it's not clear that it is, desirable as
it seems).

> 4) I think we even shouldn't __state._M_finalize_state(__tm); in do_get
>    if it is nested, then you wouldn't need to tweak src/c++98/locale_facets.cc;
>    generally state should be finalized once everything is parsed together,
>    not many times in between (of course if user code will parse stuff
>    separately that will not work, say get with "%I" in one call, then get
>    with "%h" in another one etc., but that is user's decision).

Parsing "%h%p%I%p%h" should get into tm_hour the value that time_put
read to format with the same format string, so there's an argument for
"finalizing" tm_hour based on more than the state fields we hold atm,
perhaps even for progressive in-place encoding, as I have suggested
above.

> Just curious, why doesn't the pmf hack work on arm-vxworks7?

At first, I thought we were running into this just because we have to
define __clang__ because of some vxworks system headers aimed at clang.
But even as I tried to drop the #ifndef, the test still failed; I
suspected it had to do with ARM's encoding of ptrmemfunc_vbit_in_delta,
but I did not confirm that this was the case.

Re: [PATCH] [PR77760] [libstdc++] encode __time_get_state in tm

2023-02-17 Thread Jakub Jelinek via Gcc-patches
On Fri, Feb 17, 2023 at 04:47:45AM -0300, Alexandre Oliva wrote:
> 
> On platforms that fail the ptrtomemfn-cast-to-pfn hack, such as
> arm-*-vxworks*, time_get fails with %I and %p because the state is not
> preserved across do_get calls.
> 
> This patch introduces an alternate hack, that encodes the state in
> unused bits of struct tm before calling do_get, extracts them in
> do_get, does the processing, and encodes it back, so that get extracts
> it.
> 
> The finalizer is adjusted for idempotence, because both do_get and get
> may call it.

As I said in the PR, I'm worried about this approach, but Jonathan
is the expert on the standard...
My worry is that people often invoke the time_get APIs on uninitialized
struct tm.
https://eel.is/c++draft/locale.time.get#general-1 says that corresponding
members of the struct tm are set, in the past we used to set mostly just
a single tm_* for ultimate format specifiers, with the addition of state
we try to set some further ones as well in the spirit of what strptime
does in glibc.  But still, say if the format specifiers only mention
hours/minutes/seconds, we don't try to update year/day/month related stuff
and vice versa.  Or say minute format specifier will only update tm_min
and nothing else.
For the encoding in get, the questions are:
1) if it is ok if we clear some normally unused bits of certain tm_*
   fields even if they wouldn't be otherwise touched; perhaps nothing
   will care, but it is still a user visible change
2) when those tm_* fields we want to encode the state into are uninitialized,
   don't we effectively invoke UB by using the uninitialized member
   (admittedly we just overwrite some bits in it and leave others as is,
   but won't that e.g. cause -Wuninitialized warnings)?
More importantly, in the do_get the questions are:
3) while do_get is protected, are we guaranteed that it will never be called
   except from the public members of time_get?  I mean, if we are called
   from the libstdc++ time_get::get, then there is a state and it is already
   encoded in struct tm, so those bits are initialized and all is fine;
   but if it is called from elsewhere (either a class derived from
   time_get which just calls do_get alone on uninitialized struct tm,
   or say get in a derived class which calls do_get but doesn't encode the
   state, or even mixing of older GCC compiled get with newer GCC compiled
   do_get - get can be inlined into user code, while do_get is called
   through vtable and so could come up from another shared library), those
   struct tm bits could be effectively random and we'd finalize the state
   on the random stuff
4) I think we even shouldn't __state._M_finalize_state(__tm); in do_get
   if it is nested, then you wouldn't need to tweak src/c++98/locale_facets.cc;
   generally state should be finalized once everything is parsed together,
   not many times in between (of course if user code will parse stuff
   separately that will not work, say get with "%I" in one call, then get
   with "%h" in another one etc., but that is user's decision).  That
   would be another ABI issue, if the _M_finalize_state is called in
   do_get and the containing get calls it as well, then depending on which
   _M_finalize_state is used at runtime it would either cope well with the
   double finalization or not; now, if users never called time_get stuff
   on uninitialized struct tm, we could just add another bit to the state,
   whether the state has been actually encoded there, and call
   _M_finalize_state in do_get only if it hasn't been.

Just curious, why doesn't the pmf hack work on arm-vxworks7?
Sure, clang++ doesn't support it and if one derives a class from time_get
and overrides do_get and/or get things will not work properly either.

Jakub



[PATCH] [PR77760] [libstdc++] encode __time_get_state in tm

2023-02-16 Thread Alexandre Oliva via Gcc-patches


On platforms that fail the ptrtomemfn-cast-to-pfn hack, such as
arm-*-vxworks*, time_get fails with %I and %p because the state is not
preserved across do_get calls.

This patch introduces an alternate hack, that encodes the state in
unused bits of struct tm before calling do_get, extracts them in
do_get, does the processing, and encodes it back, so that get extracts
it.

The finalizer is adjusted for idempotence, because both do_get and get
may call it.
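
For readers who want the encoding idea without the macro, here is a
simplified sketch of storing a few state bits in the high bits of an int
field and taking them back out.  It is not the patch's code: make_mask,
store_bits and extract_bits are hypothetical names, and the sketch omits
the compile-time check against the in-range bits.

```cpp
#include <cassert>
#include <climits>
#include <ctime>

constexpr unsigned field_bits = sizeof(int) * CHAR_BIT;

// Mask for 'width' bits starting 'msb' bits below the top bit
// (msb == 0 is the most significant bit of the field).
unsigned make_mask(unsigned msb, unsigned width)
{
  assert(width > 0 && msb + width < field_bits);
  unsigned shift = field_bits - msb - width;
  return ((1u << width) - 1u) << shift;
}

// Encode 'value' into the chosen bits, leaving the other bits alone.
void store_bits(int& field, unsigned msb, unsigned width, unsigned value)
{
  unsigned shift = field_bits - msb - width;
  unsigned mask = make_mask(msb, width);
  unsigned raw = static_cast<unsigned>(field);
  raw = (raw & ~mask) | ((value << shift) & mask);
  // Conversion back to int wraps on GCC and, since C++20, everywhere.
  field = static_cast<int>(raw);
}

// Read the chosen bits and clear them, restoring the plain field value.
unsigned extract_bits(int& field, unsigned msb, unsigned width)
{
  unsigned shift = field_bits - msb - width;
  unsigned mask = make_mask(msb, width);
  unsigned raw = static_cast<unsigned>(field);
  field = static_cast<int>(raw & ~mask);
  return (raw & mask) >> shift;
}
```

A get-like caller would use store_bits before do_get and extract_bits
after it, with do_get doing the reverse around its own parsing.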

Regstrapped on x86_64-linux-gnu.
Tested on arm-vxworks7 (gcc-12) and arm-eabi (trunk).  Ok to install?

for  libstdc++-v3/ChangeLog

	PR libstdc++/77760
	* include/bits/locale_facets_nonio.h (__time_get_state): Add
	_M_state_tm, _M_save_to and _M_restore_from.
	* include/bits/locale_facets_nonio.tcc (time_get::get): Drop
	do_get-overriding hack.  Use state unconditionally, and encode
	it in tm around do_get.
	(time_get::do_get): Extract state from tm, and encode it back,
	around parsing and finalizing.
	* src/c++98/locale_facets.cc
	(__time_get_state::_M_finalize_state): Make tm_hour and
	tm_year idempotent.
---
 libstdc++-v3/include/bits/locale_facets_nonio.h   |   80 +
 libstdc++-v3/include/bits/locale_facets_nonio.tcc |   43 ++-
 libstdc++-v3/src/c++98/locale_facets.cc   |8 ++
 3 files changed, 93 insertions(+), 38 deletions(-)

diff --git a/libstdc++-v3/include/bits/locale_facets_nonio.h b/libstdc++-v3/include/bits/locale_facets_nonio.h
index 372cf0429501d..711bede158427 100644
--- a/libstdc++-v3/include/bits/locale_facets_nonio.h
+++ b/libstdc++-v3/include/bits/locale_facets_nonio.h
@@ -361,6 +361,86 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 void
 _M_finalize_state(tm* __tm);
 
+  private:
+void
+_M_state_tm(tm* __tm, bool __totm)
+{
+  // Check we don't invade the in-range tm bits, even if int is
+  // 16-bits wide.
+#define _M_min_shift_tm_sec 6
+#define _M_min_shift_tm_min 6
+#define _M_min_shift_tm_hour 5
+#define _M_min_shift_tm_mday 5
+#define _M_min_shift_tm_mon 4
+#define _M_min_shift_tm_year 16 // 14, but signed, so avoid it.
+#define _M_min_shift_tm_wday 3
+#define _M_min_shift_tm_yday 9
+#define _M_min_shift_tm_isdst 1
+  // Represent __STF in __WDT bits of __TMF up to the __MSB bit.
+  // In __MSB, 0 stands for the most significant bit of __TMF,
+  // 1 the bit next to it, and so on.
+#define _M_time_get_state_bitfield_inout(__tmf, __msb, __wdt, __stf)   \
+  do   \
+  {\
+const unsigned __shift = (sizeof (__tm->__tmf) * __CHAR_BIT__  \
+ - (__msb) - (__wdt)); \
+static char __attribute__ ((__unused__))   \
+  __check_parms_##__tmf[(__msb) >= 0 && (__wdt) > 0   \
+   && __shift >= (_M_min_shift_##__tmf \
+  + (sizeof (__tm->__tmf)  \
+ * __CHAR_BIT__) - 16) \
+   ? 1 : -1];  \
+const unsigned __mask = ((1 << (__wdt)) - 1) << __shift;   \
+if (!__totm)   \
+  this->__stf = (__tm->__tmf & __mask) >> __shift; \
+__tm->__tmf &= ~__mask;\
+if (__totm)   \
+  __tm->__tmf |= ((unsigned)this->__stf << __shift) & __mask;  \
+}  \
+  while (0)
+
+  _M_time_get_state_bitfield_inout (tm_hour,  0, 1, _M_have_I);
+  _M_time_get_state_bitfield_inout (tm_wday,  0, 1, _M_have_wday);
+  _M_time_get_state_bitfield_inout (tm_yday,  0, 1, _M_have_yday);
+  _M_time_get_state_bitfield_inout (tm_mon,   0, 1, _M_have_mon);
+  _M_time_get_state_bitfield_inout (tm_mday,  0, 1, _M_have_mday);
+  _M_time_get_state_bitfield_inout (tm_yday,  1, 1, _M_have_uweek);
+  _M_time_get_state_bitfield_inout (tm_yday,  2, 1, _M_have_wweek);
+  _M_time_get_state_bitfield_inout (tm_isdst, 0, 1, _M_have_century);
+  _M_time_get_state_bitfield_inout (tm_hour,  1, 1, _M_is_pm);
+  _M_time_get_state_bitfield_inout (tm_isdst, 1, 1, _M_want_century);
+  _M_time_get_state_bitfield_inout (tm_yday,  3, 1, _M_want_xday);
+  // _M_pad1
+  _M_time_get_state_bitfield_inout (tm_wday,  1, 6, _M_week_no);
+  // _M_pad2
+  _M_time_get_state_bitfield_inout (tm_mon,   1, 8, _M_century);
+  // _M_pad3
+
+#undef _M_min_shift_tm_hour
+#undef _M_min_shift_tm_sec
+#undef _M_min_shift_tm_min
+#undef _M_min_shift_tm_hour
+#undef _M_min_shift_tm_mday
+#undef _M_min_shift_tm_mon
+#undef _M_min_shift_tm_year