[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-24 Thread kurt at garloff dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

--- Comment #18 from Kurt Garloff  ---
dd_rescue-1.99.13 has been released including the fix to XORN.
(Fix uses uint* casts rather than uchar*.)

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-23 Thread kurt at garloff dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

--- Comment #17 from Kurt Garloff  ---
jIt fixes it. Fix committed to git (on bad old sf.net).
Will release new dd_rescue tomorrow ...

Thanks, Martin, Jakub, Andrew for analyzing this!

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-23 Thread kurt at garloff dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

kurt at garloff dot de changed:

   What|Removed |Added

 CC||kurt at garloff dot de

--- Comment #16 from kurt at garloff dot de ---
Using u32 instead of long to access the arrays in XORN should work as well, no?
Should result in better performance than doing it byte by byte...

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-08 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

--- Comment #15 from Martin Liška  ---
(In reply to Jakub Jelinek from comment #14)
> (In reply to Martin Liška from comment #10)
> > > where the XOR16 is implemented as:
> > > 
> > > #define XORN(in1,in2,out,len) \
> > > do {  \
> > >   uint _i;\
> > >   for (_i = 0; _i < len/sizeof(ulong); ++_i)  \
> > >   *((ulong*)(out)+_i) = *((ulong*)(in1)+_i) ^ 
> > > *((ulong*)(in2)+_i);\
> > > } while(0)
> > 
> > I can confirm that changing that to:
> > 
> > #define XORN(in1, in2, out, len)  \
> > do\
> > { \
> > uint _i;  \
> > for (_i = 0; _i < len; ++_i)  \
> > *(out + _i) = *(in1 + _i) ^ *(in2 + _i); \
> > } while (0)
> > 
> > fixes the problem. It seems very close to what I saw here:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201#c13
> 
> It depends on if those arrays were stored as ulong or will be later read as
> ulong or something else.

Yes, that's what happens, they are stored with the aforementioned XORN function
as ulong types. And later are read with 
#define GETU32(p) ntohl(*((u32*)(p)))
as u32. That's the aliasing violation.

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-08 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

--- Comment #14 from Jakub Jelinek  ---
(In reply to Martin Liška from comment #10)
> > where the XOR16 is implemented as:
> > 
> > #define XORN(in1,in2,out,len)   \
> > do {\
> > uint _i;\
> > for (_i = 0; _i < len/sizeof(ulong); ++_i)  \
> > *((ulong*)(out)+_i) = *((ulong*)(in1)+_i) ^ 
> > *((ulong*)(in2)+_i);\
> > } while(0)
> 
> I can confirm that changing that to:
> 
> #define XORN(in1, in2, out, len)  \
>   do\
>   { \
>   uint _i;  \
>   for (_i = 0; _i < len; ++_i)  \
>   *(out + _i) = *(in1 + _i) ^ *(in2 + _i); \
>   } while (0)
> 
> fixes the problem. It seems very close to what I saw here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201#c13

It depends on if those arrays were stored as ulong or will be later read as
ulong or something else.
One could also use typedef ulong ulong_alias __attribute__((may_alias));
and use ulong_alias* above, or memcpy to/out of ulong temporaries.

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-08 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

--- Comment #13 from Martin Liška  ---
Got it:
https://sourceforge.net/p/ddrescue/tickets/6/

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-08 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

--- Comment #12 from Martin Liška  ---
(In reply to Sam James from comment #11)
> Can you drop a link in here if/when reported upstream? Thanks

I was unable to find a bugzilla insntance for the dd_rescue project. Is there
any?

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-08 Thread sam at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

--- Comment #11 from Sam James  ---
Can you drop a link in here if/when reported upstream? Thanks

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-08 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

Martin Liška  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|WAITING |RESOLVED

--- Comment #10 from Martin Liška  ---
> where the XOR16 is implemented as:
> 
> #define XORN(in1,in2,out,len) \
> do {  \
>   uint _i;\
>   for (_i = 0; _i < len/sizeof(ulong); ++_i)  \
>   *((ulong*)(out)+_i) = *((ulong*)(in1)+_i) ^ 
> *((ulong*)(in2)+_i);\
> } while(0)

I can confirm that changing that to:

#define XORN(in1, in2, out, len)  \
do\
{ \
uint _i;  \
for (_i = 0; _i < len; ++_i)  \
*(out + _i) = *(in1 + _i) ^ *(in2 + _i); \
} while (0)

fixes the problem. It seems very close to what I saw here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201#c13

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-08 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

--- Comment #9 from Martin Liška  ---
Actually, looking at the tree dumps before and after the revision, it's leading
to a different place:

First difference happens in:
test_aes.ltrans0.ltrans.116t.dse2

[local count: 8687547526]:
-  _118 = MEM[(ulong *)iv_4(D)];
-  _120 = MEM[(ulong *)input_19];
-  _121 = _118 ^ _120;
-  MEM[(ulong *)iv_4(D)] = _121;
-  _129 = MEM[(ulong *)iv_4(D) + 8B];
-  _131 = MEM[(ulong *)input_19 + 8B];
-  _132 = _129 ^ _131;
-  MEM[(ulong *)iv_4(D) + 8B] = _132;

(there's one more optimized out block like this. Which maps to:

int  AES_Gen_CBC_Enc(AES_Crypt_Blk_fn *cryptfn,
 const uchar* rkeys, uint rounds,
 uchar *iv, uint pad,
 const uchar *input, uchar *output,
 ssize_t len, ssize_t *olen)
{
*olen = len;
while (len >= 16) {
XOR16(iv, input, iv);
cryptfn(rkeys, rounds, iv, iv);
memcpy(output, iv, 16);
len -= 16; input += 16; output += 16;
}
if (len || pad == PAD_ALWAYS) {
uchar *in = crypto->blkbuf2;
fill_blk(input, in, len, pad);
XOR16(iv, in, iv);
cryptfn(rkeys, rounds, iv, output);
/* Store last IV */
memcpy(iv, output, 16);
*olen += 16-(len&15);
//memset(in, 0, 16);
//LFENCE;
}
return (pad == PAD_ALWAYS || (len&15))? 16-(len&15): 0;
}

where the XOR16 is implemented as:

#define XORN(in1,in2,out,len)   \
do {\
uint _i;\
for (_i = 0; _i < len/sizeof(ulong); ++_i)  \
*((ulong*)(out)+_i) = *((ulong*)(in1)+_i) ^
*((ulong*)(in2)+_i);\
} while(0)

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-08 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

--- Comment #8 from Martin Liška  ---
Ok, one suspicious casting of memory comes from:

typedef union _roundkey {
unsigned char data[16];
unsigned int data32[4];
} roundkey;

...

typedef struct _sec_fields {
...
roundkey ekeys[40];

...
uchar *rkeys = (uchar*)crypto->ekeys;   //malloc(alg->ctx_size);

It's then passed to rijndaelEncrypt(const u8 *rkeys /*u32 rk[4*(Nr + 1)]*/,
uint Nr, const u8 pt[16], u8 ct[16])
 as the first argument, where it's later used as:

const u32 *rk = (u32*)rkeys;

Might be the violation we face? Here are mixed 'roundkey *' and 'u32*'.

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-08 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
(In reply to Martin Liška from comment #6)
> Yes, I'm also suspecting this code and I can verify that using optimize("O0")
> for rijndaelEncrypt fixes the issue.
> 
> The thing below is cast from 'const u8 *' and I thought it's valid to case
> to 'u32 *' and then access it. Can you explain to me how exactly the
> violation happens?

A cast is fine, what matters are the accesses.  If the object has u32 type,
then casting
its address to const u8 * and later on back to u32 * and accessing through that
type
is fine, but if it has an incompatible type, it is not.  Similarly, if it is
heap allocated, the type is given to it through the stores to it and later
reads from it should be done with a compatible (aliasing wise) type.
See section 6.5 in e.g. C17 for the details (definition of effective type and
the aliasing requirements).

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-08 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

--- Comment #6 from Martin Liška  ---
(In reply to Andrew Pinski from comment #5)
> I am 99% sure there is aliasing violations in this code too:
> #if _MSC_VER
> #define GETU32(p) SWAP(*((u32 *)(p)))
> #define PUTU32(ct, st)  \
> {   \
> *((u32 *)(ct)) = SWAP((st));\
> }

Yes, I'm also suspecting this code and I can verify that using optimize("O0")
for rijndaelEncrypt fixes the issue.

The thing below is cast from 'const u8 *' and I thought it's valid to case
to 'u32 *' and then access it. Can you explain to me how exactly the violation
happens?

> #else   /* _MSC_VER */
> # if __BYTE_ORDER == __BIG_ENDIAN
> #define GETU32(p) *((u32*)(p))
> #define PUTU32(ct, st) *((u32*)(ct)) = (st)
> #else   /* BIG_ENDIAN */
> #if 0 //def HAVE_LINUX_SWAB_H
> #define GETU32(p) __swab32(*((u32*)(p)))
> #define PUTU32(ct, st) *((u32*)(ct)) = __swab32((st))
> #else   /* __GNUC__ */
> #include 
> #define GETU32(p) ntohl(*((u32*)(p)))
> #define PUTU32(ct, st) *((u32*)(ct)) = htonl((st))
> #endif  /* __GNUC__ */
> #endif  /* BIG_ENDIAN */
> #endif  /*_MSC_VER */
> 

This part below is guarded with '#if 0'..

> #if 0
> #define GETU32(pt) (((u32)(pt)[0] << 24) ^ ((u32)(pt)[1] << 16) ^
> ((u32)(pt)[2] << 8) ^ ((u32)(pt)[3]))
> #define PUTU32(ct, st)  \
> {   \
> (ct)[0] = (u8)((st) >> 24); \
> (ct)[1] = (u8)((st) >> 16); \
> (ct)[2] = (u8)((st) >> 8);  \
> (ct)[3] = (u8)(st); \
> }
> #endif
> 
> A few other places too ...

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

--- Comment #5 from Andrew Pinski  ---
I am 99% sure there is aliasing violations in this code too:
#if _MSC_VER
#define GETU32(p) SWAP(*((u32 *)(p)))
#define PUTU32(ct, st)  \
{   \
*((u32 *)(ct)) = SWAP((st));\
}
#else   /* _MSC_VER */
# if __BYTE_ORDER == __BIG_ENDIAN
#define GETU32(p) *((u32*)(p))
#define PUTU32(ct, st) *((u32*)(ct)) = (st)
#else   /* BIG_ENDIAN */
#if 0 //def HAVE_LINUX_SWAB_H
#define GETU32(p) __swab32(*((u32*)(p)))
#define PUTU32(ct, st) *((u32*)(ct)) = __swab32((st))
#else   /* __GNUC__ */
#include 
#define GETU32(p) ntohl(*((u32*)(p)))
#define PUTU32(ct, st) *((u32*)(ct)) = htonl((st))
#endif  /* __GNUC__ */
#endif  /* BIG_ENDIAN */
#endif  /*_MSC_VER */

#if 0
#define GETU32(pt) (((u32)(pt)[0] << 24) ^ ((u32)(pt)[1] << 16) ^ ((u32)(pt)[2]
<< 8) ^ ((u32)(pt)[3]))
#define PUTU32(ct, st)  \
{   \
(ct)[0] = (u8)((st) >> 24); \
(ct)[1] = (u8)((st) >> 16); \
(ct)[2] = (u8)((st) >> 8);  \
(ct)[3] = (u8)(st); \
}
#endif

A few other places too ...

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-07 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

--- Comment #4 from Martin Liška  ---
(In reply to Andrew Pinski from comment #2)
> Also does adding -fno-strict-aliasing help?

Yes, it helps.

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-07 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

--- Comment #3 from Martin Liška  ---
(In reply to Andrew Pinski from comment #1)
> We need a better testcase than the direction on how to build a full package.

Sure, I'll try to reduce it.

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||lto

--- Comment #2 from Andrew Pinski  ---
Also does adding -fno-strict-aliasing help?

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |WAITING
   Keywords||needs-reduction

--- Comment #1 from Andrew Pinski  ---
We need a better testcase than the direction on how to build a full package.

[Bug ipa/108695] [13 Regression] Wrong code since r13-5215-gb1f30bf42d8d47 for dd_rescue package

2023-02-07 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695

Martin Liška  changed:

   What|Removed |Added

   Last reconfirmed||2023-02-07
   Target Milestone|--- |13.0
 Status|UNCONFIRMED |NEW
   Priority|P3  |P1
 Ever confirmed|0   |1