Re: [PATCH][store merging][RFA] Re-implement merging code
On 10/10/16 12:15, Richard Biener wrote: On Mon, 10 Oct 2016, Richard Biener wrote: On Mon, 10 Oct 2016, Kyrill Tkachov wrote: On 10/10/16 11:22, Richard Biener wrote: On Mon, 10 Oct 2016, Kyrill Tkachov wrote: Hi Richard, As I mentioned, here is the patch applying to the main store merging patch to re-implement encode_tree_to_bitpos to operate on the bytes directly. This works fine on little-endian but breaks on big-endian, even for merging bitfields within a single byte. Consider the code snippet from gcc.dg/store_merging_6.c: struct bar { int a : 3; unsigned char b : 4; unsigned char c : 1; char d; char e; char f; char g; }; void foo1 (struct bar *p) { p->b = 3; p->a = 2; p->c = 1; p->d = 4; p->e = 5; } The correct GIMPLE for these merged stores on big-endian is: MEM[(voidD.49 *)p_2(D)] = 18180; MEM[(charD.8 *)p_2(D) + 2B] = 5; whereas with this patch we emit: MEM[(voidD.49 *)p_2(D)] = 39428; MEM[(charD.8 *)p_2(D) + 2B] = 5; The dump for merging the individual stores without this patch (using the correct but costly wide_int approach in the base patch) is: After writing 3 of size 4 at position 3 the merged region contains: 6 0 0 0 0 0 After writing 2 of size 3 at position 0 the merged region contains: 46 0 0 0 0 0 After writing 1 of size 1 at position 7 the merged region contains: 47 0 0 0 0 0 After writing 4 of size 8 at position 8 the merged region contains: 47 4 0 0 0 0 After writing 5 of size 8 at position 16 the merged region contains: 47 4 5 0 0 0 And with this patch it is: After writing 3 of size 4 at position 3 the merged region contains: 18 0 0 0 0 0 After writing 2 of size 3 at position 0 the merged region contains: 1a 0 0 0 0 0 After writing 1 of size 1 at position 7 the merged region contains: 9a 0 0 0 0 0 After writing 4 of size 8 at position 8 the merged region contains: 9a 4 0 0 0 0 After writing 5 of size 8 at position 16 the merged region contains: 9a 4 5 0 0 0 (Note the dump just dumps the byte array from index 0 to so the first thing printed is the lowest numbered byte. Also, each byte is dumped in hex.) The code as included here doesn't do any byte swapping for big-endian but as seen from the dump even writing a sub-byte bitfield goes wrong so it would be nice to resolve that before going forward. Any help with debugging this is hugely appreciated. I've included an ASCII diagram of the steps in the algorithm in the patch itself. Ah, I think you need to account for BITS_BIG_ENDIAN in shift_bytes_in_array. You have to shift towards MSB which means changing left to right shifts for BITS_BIG_ENDIAN. Thanks, I'll try it out. But this is on aarch64 where BITS_BIG_ENDIAN is 0 even when BYTES_BIG_ENDIAN is 1 so there's something else bad here. Maybe I'm confusing all the macros, so maybe it's BYTES_BIG_ENDIAN (vs. WORDS_BIG_ENDIAN -- in theory this approach should work for pdp11 as well). Or maybe I'm confusing how get_inner_reference numbers "bits" when it returns bitpos... (and how a multi-byte value in target memory representation has to be "shifted" by bitpos). I really thought BITS_BIG_ENDIAN is the only thing that matters... Btw, I reproduced on ppc64-linux (which has BITS_BIG_ENDIAN). having looked around the documentation and codebase it looks like BITS_BIG_ENDIAN is just used to define how bitfield instructions on a target operate and so only apply to the RTL level, so I think we don't have to worry about that at GIMPLE. As a hack, adjusting bitpos for BYTES_BIG_ENDIAN to: bitpos = byte_size * BITS_PER_UNIT - bitlen - (bitpos % BITS_PER_UNIT) "fixes" the dg.exp=store_merging* testcases but is still wrong for multi-byte testcases (gcc.c-torture/execute/20040629-1.c is a good one that exercises all that). I'm trying to wrap my head around the byte layout and what should be shifted where... Thanks, Kyrill Richard. Richard. You also seem to miss to account for amnt / BITS_PER_UNIT != 0. Independently of BYTES_BIG_ENDIAN it would be ptr[i + (amnt / BITS_PER_UNIT)] = ptr[i] << amnt; ... doh, yes. I'll fix that. (so best use a single load / store and operate on a temporary). Thanks, Kyrill Richard. Thanks, Kyrill
Re: [PATCH][store merging][RFA] Re-implement merging code
On Mon, 10 Oct 2016, Richard Biener wrote: > On Mon, 10 Oct 2016, Kyrill Tkachov wrote: > > > > > On 10/10/16 11:22, Richard Biener wrote: > > > On Mon, 10 Oct 2016, Kyrill Tkachov wrote: > > > > > > > Hi Richard, > > > > > > > > As I mentioned, here is the patch applying to the main store merging > > > > patch > > > > to > > > > re-implement encode_tree_to_bitpos > > > > to operate on the bytes directly. > > > > > > > > This works fine on little-endian but breaks on big-endian, even for > > > > merging > > > > bitfields within a single byte. > > > > Consider the code snippet from gcc.dg/store_merging_6.c: > > > > > > > > struct bar { > > > >int a : 3; > > > >unsigned char b : 4; > > > >unsigned char c : 1; > > > >char d; > > > >char e; > > > >char f; > > > >char g; > > > > }; > > > > > > > > void > > > > foo1 (struct bar *p) > > > > { > > > >p->b = 3; > > > >p->a = 2; > > > >p->c = 1; > > > >p->d = 4; > > > >p->e = 5; > > > > } > > > > > > > > The correct GIMPLE for these merged stores on big-endian is: > > > >MEM[(voidD.49 *)p_2(D)] = 18180; > > > >MEM[(charD.8 *)p_2(D) + 2B] = 5; > > > > > > > > whereas with this patch we emit: > > > >MEM[(voidD.49 *)p_2(D)] = 39428; > > > >MEM[(charD.8 *)p_2(D) + 2B] = 5; > > > > > > > > The dump for merging the individual stores without this patch (using the > > > > correct but costly wide_int approach in the base patch) is: > > > > After writing 3 of size 4 at position 3 the merged region contains: > > > > 6 0 0 0 0 0 > > > > After writing 2 of size 3 at position 0 the merged region contains: > > > > 46 0 0 0 0 0 > > > > After writing 1 of size 1 at position 7 the merged region contains: > > > > 47 0 0 0 0 0 > > > > After writing 4 of size 8 at position 8 the merged region contains: > > > > 47 4 0 0 0 0 > > > > After writing 5 of size 8 at position 16 the merged region contains: > > > > 47 4 5 0 0 0 > > > > > > > > > > > > And with this patch it is: > > > > After writing 3 of size 4 at position 3 the merged region contains: > > > > 18 0 0 0 0 0 > > > > After writing 2 of size 3 at position 0 the merged region contains: > > > > 1a 0 0 0 0 0 > > > > After writing 1 of size 1 at position 7 the merged region contains: > > > > 9a 0 0 0 0 0 > > > > After writing 4 of size 8 at position 8 the merged region contains: > > > > 9a 4 0 0 0 0 > > > > After writing 5 of size 8 at position 16 the merged region contains: > > > > 9a 4 5 0 0 0 > > > > > > > > (Note the dump just dumps the byte array from index 0 to so the > > > > first > > > > thing printed is the lowest numbered byte. > > > > Also, each byte is dumped in hex.) > > > > > > > > The code as included here doesn't do any byte swapping for big-endian > > > > but > > > > as > > > > seen from the dump even writing a sub-byte > > > > bitfield goes wrong so it would be nice to resolve that before going > > > > forward. > > > > Any help with debugging this is hugely appreciated. I've included an > > > > ASCII > > > > diagram of the steps in the algorithm > > > > in the patch itself. > > > Ah, I think you need to account for BITS_BIG_ENDIAN in > > > shift_bytes_in_array. You have to shift towards MSB which means changing > > > left to right shifts for BITS_BIG_ENDIAN. > > > > Thanks, I'll try it out. But this is on aarch64 where > > BITS_BIG_ENDIAN is 0 even when BYTES_BIG_ENDIAN is 1 > > so there's something else bad here. > > Maybe I'm confusing all the macros, so maybe it's BYTES_BIG_ENDIAN > (vs. WORDS_BIG_ENDIAN -- in theory this approach should work for > pdp11 as well). Or maybe I'm confusing how get_inner_reference numbers "bits" when it returns bitpos... (and how a multi-byte value in target memory representation has to be "shifted" by bitpos). I really thought BITS_BIG_ENDIAN is the only thing that matters... Btw, I reproduced on ppc64-linux (which has BITS_BIG_ENDIAN). Richard. > Richard. > > > > You also seem to miss to account for amnt / BITS_PER_UNIT != 0. > > > Independently of BYTES_BIG_ENDIAN it would be > > > > > >ptr[i + (amnt / BITS_PER_UNIT)] = ptr[i] << amnt; > > > ... > > > > doh, yes. I'll fix that. > > > > > (so best use a single load / store and operate on a temporary). > > > > Thanks, > > Kyrill > > > > > Richard. > > > > > > > Thanks, > > > > Kyrill > > > > > > > > > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH][store merging][RFA] Re-implement merging code
On Mon, 10 Oct 2016, Kyrill Tkachov wrote: > > On 10/10/16 11:22, Richard Biener wrote: > > On Mon, 10 Oct 2016, Kyrill Tkachov wrote: > > > > > Hi Richard, > > > > > > As I mentioned, here is the patch applying to the main store merging patch > > > to > > > re-implement encode_tree_to_bitpos > > > to operate on the bytes directly. > > > > > > This works fine on little-endian but breaks on big-endian, even for > > > merging > > > bitfields within a single byte. > > > Consider the code snippet from gcc.dg/store_merging_6.c: > > > > > > struct bar { > > >int a : 3; > > >unsigned char b : 4; > > >unsigned char c : 1; > > >char d; > > >char e; > > >char f; > > >char g; > > > }; > > > > > > void > > > foo1 (struct bar *p) > > > { > > >p->b = 3; > > >p->a = 2; > > >p->c = 1; > > >p->d = 4; > > >p->e = 5; > > > } > > > > > > The correct GIMPLE for these merged stores on big-endian is: > > >MEM[(voidD.49 *)p_2(D)] = 18180; > > >MEM[(charD.8 *)p_2(D) + 2B] = 5; > > > > > > whereas with this patch we emit: > > >MEM[(voidD.49 *)p_2(D)] = 39428; > > >MEM[(charD.8 *)p_2(D) + 2B] = 5; > > > > > > The dump for merging the individual stores without this patch (using the > > > correct but costly wide_int approach in the base patch) is: > > > After writing 3 of size 4 at position 3 the merged region contains: > > > 6 0 0 0 0 0 > > > After writing 2 of size 3 at position 0 the merged region contains: > > > 46 0 0 0 0 0 > > > After writing 1 of size 1 at position 7 the merged region contains: > > > 47 0 0 0 0 0 > > > After writing 4 of size 8 at position 8 the merged region contains: > > > 47 4 0 0 0 0 > > > After writing 5 of size 8 at position 16 the merged region contains: > > > 47 4 5 0 0 0 > > > > > > > > > And with this patch it is: > > > After writing 3 of size 4 at position 3 the merged region contains: > > > 18 0 0 0 0 0 > > > After writing 2 of size 3 at position 0 the merged region contains: > > > 1a 0 0 0 0 0 > > > After writing 1 of size 1 at position 7 the merged region contains: > > > 9a 0 0 0 0 0 > > > After writing 4 of size 8 at position 8 the merged region contains: > > > 9a 4 0 0 0 0 > > > After writing 5 of size 8 at position 16 the merged region contains: > > > 9a 4 5 0 0 0 > > > > > > (Note the dump just dumps the byte array from index 0 to so the > > > first > > > thing printed is the lowest numbered byte. > > > Also, each byte is dumped in hex.) > > > > > > The code as included here doesn't do any byte swapping for big-endian but > > > as > > > seen from the dump even writing a sub-byte > > > bitfield goes wrong so it would be nice to resolve that before going > > > forward. > > > Any help with debugging this is hugely appreciated. I've included an ASCII > > > diagram of the steps in the algorithm > > > in the patch itself. > > Ah, I think you need to account for BITS_BIG_ENDIAN in > > shift_bytes_in_array. You have to shift towards MSB which means changing > > left to right shifts for BITS_BIG_ENDIAN. > > Thanks, I'll try it out. But this is on aarch64 where > BITS_BIG_ENDIAN is 0 even when BYTES_BIG_ENDIAN is 1 > so there's something else bad here. Maybe I'm confusing all the macros, so maybe it's BYTES_BIG_ENDIAN (vs. WORDS_BIG_ENDIAN -- in theory this approach should work for pdp11 as well). Richard. > > You also seem to miss to account for amnt / BITS_PER_UNIT != 0. > > Independently of BYTES_BIG_ENDIAN it would be > > > >ptr[i + (amnt / BITS_PER_UNIT)] = ptr[i] << amnt; > > ... > > doh, yes. I'll fix that. > > > (so best use a single load / store and operate on a temporary). > > Thanks, > Kyrill > > > Richard. > > > > > Thanks, > > > Kyrill > > > > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH][store merging][RFA] Re-implement merging code
On 10/10/16 12:06, Kyrill Tkachov wrote: On 10/10/16 11:22, Richard Biener wrote: On Mon, 10 Oct 2016, Kyrill Tkachov wrote: Hi Richard, As I mentioned, here is the patch applying to the main store merging patch to re-implement encode_tree_to_bitpos to operate on the bytes directly. This works fine on little-endian but breaks on big-endian, even for merging bitfields within a single byte. Consider the code snippet from gcc.dg/store_merging_6.c: struct bar { int a : 3; unsigned char b : 4; unsigned char c : 1; char d; char e; char f; char g; }; void foo1 (struct bar *p) { p->b = 3; p->a = 2; p->c = 1; p->d = 4; p->e = 5; } The correct GIMPLE for these merged stores on big-endian is: MEM[(voidD.49 *)p_2(D)] = 18180; MEM[(charD.8 *)p_2(D) + 2B] = 5; whereas with this patch we emit: MEM[(voidD.49 *)p_2(D)] = 39428; MEM[(charD.8 *)p_2(D) + 2B] = 5; The dump for merging the individual stores without this patch (using the correct but costly wide_int approach in the base patch) is: After writing 3 of size 4 at position 3 the merged region contains: 6 0 0 0 0 0 After writing 2 of size 3 at position 0 the merged region contains: 46 0 0 0 0 0 After writing 1 of size 1 at position 7 the merged region contains: 47 0 0 0 0 0 After writing 4 of size 8 at position 8 the merged region contains: 47 4 0 0 0 0 After writing 5 of size 8 at position 16 the merged region contains: 47 4 5 0 0 0 And with this patch it is: After writing 3 of size 4 at position 3 the merged region contains: 18 0 0 0 0 0 After writing 2 of size 3 at position 0 the merged region contains: 1a 0 0 0 0 0 After writing 1 of size 1 at position 7 the merged region contains: 9a 0 0 0 0 0 After writing 4 of size 8 at position 8 the merged region contains: 9a 4 0 0 0 0 After writing 5 of size 8 at position 16 the merged region contains: 9a 4 5 0 0 0 (Note the dump just dumps the byte array from index 0 to so the first thing printed is the lowest numbered byte. Also, each byte is dumped in hex.) The code as included here doesn't do any byte swapping for big-endian but as seen from the dump even writing a sub-byte bitfield goes wrong so it would be nice to resolve that before going forward. Any help with debugging this is hugely appreciated. I've included an ASCII diagram of the steps in the algorithm in the patch itself. Ah, I think you need to account for BITS_BIG_ENDIAN in shift_bytes_in_array. You have to shift towards MSB which means changing left to right shifts for BITS_BIG_ENDIAN. Thanks, I'll try it out. But this is on aarch64 where BITS_BIG_ENDIAN is 0 even when BYTES_BIG_ENDIAN is 1 so there's something else bad here. You also seem to miss to account for amnt / BITS_PER_UNIT != 0. Independently of BYTES_BIG_ENDIAN it would be ptr[i + (amnt / BITS_PER_UNIT)] = ptr[i] << amnt; ... doh, yes. I'll fix that. Scratch that, just read your other reply. The precondition for that function is that the shift amount is less than BITS_PER_UNIT. I'll clarify that in the comment. Kyril (so best use a single load / store and operate on a temporary). Thanks, Kyrill Richard. Thanks, Kyrill
Re: [PATCH][store merging][RFA] Re-implement merging code
On 10/10/16 11:22, Richard Biener wrote: On Mon, 10 Oct 2016, Kyrill Tkachov wrote: Hi Richard, As I mentioned, here is the patch applying to the main store merging patch to re-implement encode_tree_to_bitpos to operate on the bytes directly. This works fine on little-endian but breaks on big-endian, even for merging bitfields within a single byte. Consider the code snippet from gcc.dg/store_merging_6.c: struct bar { int a : 3; unsigned char b : 4; unsigned char c : 1; char d; char e; char f; char g; }; void foo1 (struct bar *p) { p->b = 3; p->a = 2; p->c = 1; p->d = 4; p->e = 5; } The correct GIMPLE for these merged stores on big-endian is: MEM[(voidD.49 *)p_2(D)] = 18180; MEM[(charD.8 *)p_2(D) + 2B] = 5; whereas with this patch we emit: MEM[(voidD.49 *)p_2(D)] = 39428; MEM[(charD.8 *)p_2(D) + 2B] = 5; The dump for merging the individual stores without this patch (using the correct but costly wide_int approach in the base patch) is: After writing 3 of size 4 at position 3 the merged region contains: 6 0 0 0 0 0 After writing 2 of size 3 at position 0 the merged region contains: 46 0 0 0 0 0 After writing 1 of size 1 at position 7 the merged region contains: 47 0 0 0 0 0 After writing 4 of size 8 at position 8 the merged region contains: 47 4 0 0 0 0 After writing 5 of size 8 at position 16 the merged region contains: 47 4 5 0 0 0 And with this patch it is: After writing 3 of size 4 at position 3 the merged region contains: 18 0 0 0 0 0 After writing 2 of size 3 at position 0 the merged region contains: 1a 0 0 0 0 0 After writing 1 of size 1 at position 7 the merged region contains: 9a 0 0 0 0 0 After writing 4 of size 8 at position 8 the merged region contains: 9a 4 0 0 0 0 After writing 5 of size 8 at position 16 the merged region contains: 9a 4 5 0 0 0 (Note the dump just dumps the byte array from index 0 to so the first thing printed is the lowest numbered byte. Also, each byte is dumped in hex.) The code as included here doesn't do any byte swapping for big-endian but as seen from the dump even writing a sub-byte bitfield goes wrong so it would be nice to resolve that before going forward. Any help with debugging this is hugely appreciated. I've included an ASCII diagram of the steps in the algorithm in the patch itself. Ah, I think you need to account for BITS_BIG_ENDIAN in shift_bytes_in_array. You have to shift towards MSB which means changing left to right shifts for BITS_BIG_ENDIAN. Thanks, I'll try it out. But this is on aarch64 where BITS_BIG_ENDIAN is 0 even when BYTES_BIG_ENDIAN is 1 so there's something else bad here. You also seem to miss to account for amnt / BITS_PER_UNIT != 0. Independently of BYTES_BIG_ENDIAN it would be ptr[i + (amnt / BITS_PER_UNIT)] = ptr[i] << amnt; ... doh, yes. I'll fix that. (so best use a single load / store and operate on a temporary). Thanks, Kyrill Richard. Thanks, Kyrill
Re: [PATCH][store merging][RFA] Re-implement merging code
On Mon, 10 Oct 2016, Richard Biener wrote: > On Mon, 10 Oct 2016, Kyrill Tkachov wrote: > > > Hi Richard, > > > > As I mentioned, here is the patch applying to the main store merging patch > > to > > re-implement encode_tree_to_bitpos > > to operate on the bytes directly. > > > > This works fine on little-endian but breaks on big-endian, even for merging > > bitfields within a single byte. > > Consider the code snippet from gcc.dg/store_merging_6.c: > > > > struct bar { > > int a : 3; > > unsigned char b : 4; > > unsigned char c : 1; > > char d; > > char e; > > char f; > > char g; > > }; > > > > void > > foo1 (struct bar *p) > > { > > p->b = 3; > > p->a = 2; > > p->c = 1; > > p->d = 4; > > p->e = 5; > > } > > > > The correct GIMPLE for these merged stores on big-endian is: > > MEM[(voidD.49 *)p_2(D)] = 18180; > > MEM[(charD.8 *)p_2(D) + 2B] = 5; > > > > whereas with this patch we emit: > > MEM[(voidD.49 *)p_2(D)] = 39428; > > MEM[(charD.8 *)p_2(D) + 2B] = 5; > > > > The dump for merging the individual stores without this patch (using the > > correct but costly wide_int approach in the base patch) is: > > After writing 3 of size 4 at position 3 the merged region contains: > > 6 0 0 0 0 0 > > After writing 2 of size 3 at position 0 the merged region contains: > > 46 0 0 0 0 0 > > After writing 1 of size 1 at position 7 the merged region contains: > > 47 0 0 0 0 0 > > After writing 4 of size 8 at position 8 the merged region contains: > > 47 4 0 0 0 0 > > After writing 5 of size 8 at position 16 the merged region contains: > > 47 4 5 0 0 0 > > > > > > And with this patch it is: > > After writing 3 of size 4 at position 3 the merged region contains: > > 18 0 0 0 0 0 > > After writing 2 of size 3 at position 0 the merged region contains: > > 1a 0 0 0 0 0 > > After writing 1 of size 1 at position 7 the merged region contains: > > 9a 0 0 0 0 0 > > After writing 4 of size 8 at position 8 the merged region contains: > > 9a 4 0 0 0 0 > > After writing 5 of size 8 at position 16 the merged region contains: > > 9a 4 5 0 0 0 > > > > (Note the dump just dumps the byte array from index 0 to so the first > > thing printed is the lowest numbered byte. > > Also, each byte is dumped in hex.) > > > > The code as included here doesn't do any byte swapping for big-endian but as > > seen from the dump even writing a sub-byte > > bitfield goes wrong so it would be nice to resolve that before going > > forward. > > Any help with debugging this is hugely appreciated. I've included an ASCII > > diagram of the steps in the algorithm > > in the patch itself. > > Ah, I think you need to account for BITS_BIG_ENDIAN in > shift_bytes_in_array. You have to shift towards MSB which means changing > left to right shifts for BITS_BIG_ENDIAN. > > You also seem to miss to account for amnt / BITS_PER_UNIT != 0. > Independently of BYTES_BIG_ENDIAN it would be Ok, that would matter only if you'd merge shift_bytes_in_array, clear_bit_region and the |-ring of that into the final buffer (which should be possible). Richard.
Re: [PATCH][store merging][RFA] Re-implement merging code
On Mon, 10 Oct 2016, Kyrill Tkachov wrote: > Hi Richard, > > As I mentioned, here is the patch applying to the main store merging patch to > re-implement encode_tree_to_bitpos > to operate on the bytes directly. > > This works fine on little-endian but breaks on big-endian, even for merging > bitfields within a single byte. > Consider the code snippet from gcc.dg/store_merging_6.c: > > struct bar { > int a : 3; > unsigned char b : 4; > unsigned char c : 1; > char d; > char e; > char f; > char g; > }; > > void > foo1 (struct bar *p) > { > p->b = 3; > p->a = 2; > p->c = 1; > p->d = 4; > p->e = 5; > } > > The correct GIMPLE for these merged stores on big-endian is: > MEM[(voidD.49 *)p_2(D)] = 18180; > MEM[(charD.8 *)p_2(D) + 2B] = 5; > > whereas with this patch we emit: > MEM[(voidD.49 *)p_2(D)] = 39428; > MEM[(charD.8 *)p_2(D) + 2B] = 5; > > The dump for merging the individual stores without this patch (using the > correct but costly wide_int approach in the base patch) is: > After writing 3 of size 4 at position 3 the merged region contains: > 6 0 0 0 0 0 > After writing 2 of size 3 at position 0 the merged region contains: > 46 0 0 0 0 0 > After writing 1 of size 1 at position 7 the merged region contains: > 47 0 0 0 0 0 > After writing 4 of size 8 at position 8 the merged region contains: > 47 4 0 0 0 0 > After writing 5 of size 8 at position 16 the merged region contains: > 47 4 5 0 0 0 > > > And with this patch it is: > After writing 3 of size 4 at position 3 the merged region contains: > 18 0 0 0 0 0 > After writing 2 of size 3 at position 0 the merged region contains: > 1a 0 0 0 0 0 > After writing 1 of size 1 at position 7 the merged region contains: > 9a 0 0 0 0 0 > After writing 4 of size 8 at position 8 the merged region contains: > 9a 4 0 0 0 0 > After writing 5 of size 8 at position 16 the merged region contains: > 9a 4 5 0 0 0 > > (Note the dump just dumps the byte array from index 0 to so the first > thing printed is the lowest numbered byte. > Also, each byte is dumped in hex.) > > The code as included here doesn't do any byte swapping for big-endian but as > seen from the dump even writing a sub-byte > bitfield goes wrong so it would be nice to resolve that before going forward. > Any help with debugging this is hugely appreciated. I've included an ASCII > diagram of the steps in the algorithm > in the patch itself. Ah, I think you need to account for BITS_BIG_ENDIAN in shift_bytes_in_array. You have to shift towards MSB which means changing left to right shifts for BITS_BIG_ENDIAN. You also seem to miss to account for amnt / BITS_PER_UNIT != 0. Independently of BYTES_BIG_ENDIAN it would be ptr[i + (amnt / BITS_PER_UNIT)] = ptr[i] << amnt; ... (so best use a single load / store and operate on a temporary). Richard. > Thanks, > Kyrill > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)