> At the end of this loop, key[b] contains two copies of the cyclically
> permuted skey next to each other. When building the cache, you scan
> through the bits of val, xor the corresponding keys in if they're set
> and then throw away half of the 32 bits when assigning
> 	scache->bytes[val] = res;
>
> So I think you can use "uint16_t keys[NBBY];" and "uint16_t res = 0;",
> replace j < 32 by j < 16 and 31 - j by 15 - j and you'll get the exact
> same result.

In other words, the first nested loop can be simplified to this:

	for (b = 0; b < NBBY; ++b)
		key[b] = skey << b | skey >> (NBSK - b);

and instead of populating the key[] array up front, you could do:

void
stoeplitz_cache_init(struct stoeplitz_cache *scache, stoeplitz_key skey)
{
	unsigned int b, shift, val;

	/*
	 * Cache the results of all possible bit combinations of
	 * one byte.
	 */
	for (val = 0; val < 256; ++val) {
		uint16_t res = 0;

		for (b = 0; b < NBBY; ++b) {
			shift = NBBY - b - 1;
			if (val & (1 << shift))
				res ^= skey << b | skey >> (NBSK - b);
		}
		scache->bytes[val] = res;
	}
}
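
The equivalence claimed above is easy to check in userland. The following is only a sketch, not code from the diff. It assumes stoeplitz_key is a uint16_t and NBSK is 16, and it models the 32-bit key[b] as a rotation of the doubled key (skey in both halves), which is how the quoted text describes it. rot16(), rot32_doubled(), check_keys() and check_cache() are names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NBBY	8
#define NBSK	16	/* assumption: bits in a 16-bit stoeplitz_key */

typedef uint16_t stoeplitz_key;	/* assumption: matches the real typedef */

/* Rotate the 16-bit key left by b bits, 0 <= b < NBBY. */
static uint16_t
rot16(stoeplitz_key skey, unsigned int b)
{
	assert(b < NBBY);
	/* skey promotes to int, so the shift by NBSK at b == 0 is defined. */
	return ((uint16_t)(skey << b | skey >> (NBSK - b)));
}

/* Rotate the doubled key (skey in both halves) left by b bits. */
static uint32_t
rot32_doubled(stoeplitz_key skey, unsigned int b)
{
	uint32_t k = (uint32_t)skey << 16 | skey;

	assert(b < NBBY);
	/* Avoid shifting a uint32_t by 32, which would be undefined. */
	return (b == 0 ? k : (k << b | k >> (32 - b)));
}

/* Count shifts b where the 32-bit key is not two copies of rot16(). */
unsigned int
check_keys(stoeplitz_key skey)
{
	unsigned int b, bad = 0;

	for (b = 0; b < NBBY; ++b) {
		uint32_t r = rot16(skey, b);

		if (rot32_doubled(skey, b) != (r << 16 | r))
			++bad;
	}
	return (bad);
}

/* Count byte values where the truncated 32-bit result differs. */
unsigned int
check_cache(stoeplitz_key skey)
{
	unsigned int b, val, bad = 0;

	for (val = 0; val < 256; ++val) {
		uint32_t res32 = 0;
		uint16_t res16 = 0;

		for (b = 0; b < NBBY; ++b) {
			if (val & (1U << (NBBY - b - 1))) {
				res32 ^= rot32_doubled(skey, b);
				res16 ^= rot16(skey, b);
			}
		}
		/* Throwing away the upper half must give the same value. */
		if ((uint16_t)res32 != res16)
			++bad;
	}
	return (bad);
}
```

Both check_keys() and check_cache() should return 0 for any key. That is just another way of saying that rotating a 16-bit-periodic 32-bit word keeps it periodic, so the upper half carries no extra information.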