1/ How do you measure performance? Anything other than the median of 1-5K runs is meaningless.
2/ Don't use contexts; transforms are usually better optimized by compilers.

3/ Are you using gcc on a 64-bit system? On this configuration, a gcc bug prevents proto code from being inlined.
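
For point 1, here is a minimal sketch of what "median of N runs" could look like. The names `median` and `median_ns` are hypothetical helpers, not part of proto or any library, and the default of 5000 runs is just one value in the 1-5K range mentioned above.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of a set of samples.
// For an even count this returns the upper of the two middle values.
inline double median(std::vector<double> samples)
{
    if (samples.empty())
        return 0.0;
    // nth_element is enough here: only the middle sample is needed,
    // not a fully sorted range.
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Hypothetical helper: call f() `runs` times and return the median
// duration of a single call, in nanoseconds.
template <class F>
double median_ns(F&& f, std::size_t runs = 5000)
{
    using clock = std::chrono::steady_clock;

    std::vector<double> samples;
    samples.reserve(runs);

    for (std::size_t i = 0; i < runs; ++i)
    {
        auto const start = clock::now();
        f();
        auto const stop = clock::now();
        samples.push_back(
            std::chrono::duration<double, std::nano>(stop - start).count());
    }
    return median(std::move(samples));
}
```

Typical use would be something like `double t = median_ns([&] { result = evaluate(expr); });`, where `evaluate` and `expr` stand for whatever you are timing. Make sure the result is actually used afterwards (printed, or stored in a volatile), otherwise the compiler may remove the code under test entirely.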
