GPU offload.

Some work on this was already done as part of the AXLE project, but there is
still a lot more to do before anything can be usefully integrated into

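This likely ties in with the batching work: without batching, the fixed cost
of handing work to the GPU (data transfer, kernel launch) cannot be amortized
over many items, so GPU offload is unlikely to bring much benefit.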
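
The batching argument above can be made concrete with a rough cost model.
This is only a sketch: the function names and all numbers are hypothetical,
chosen for illustration and not measured on any real system. It assumes each
GPU call pays one fixed overhead (transfer plus launch) per batch, plus a
small cost per item.

```python
import math


def per_item_cost(batch_size, fixed_overhead_us, per_item_us):
    """Average cost per item when a fixed overhead is paid once per batch."""
    return fixed_overhead_us / batch_size + per_item_us


def break_even_batch_size(fixed_overhead_us, gpu_per_item_us, cpu_per_item_us):
    """Smallest batch size at which the GPU path is cheaper per item.

    Returns None if the GPU is not faster per item, since no batch size
    can then make up for the fixed overhead.
    """
    saving_per_item = cpu_per_item_us - gpu_per_item_us
    if saving_per_item <= 0:
        return None
    # GPU wins when fixed/n + gpu < cpu, i.e. n > fixed / saving.
    return math.floor(fixed_overhead_us / saving_per_item) + 1


if __name__ == "__main__":
    # Hypothetical costs in microseconds (assumptions, not measurements).
    FIXED, GPU_ITEM, CPU_ITEM = 20.0, 1.0, 3.0
    print("break-even batch size:",
          break_even_batch_size(FIXED, GPU_ITEM, CPU_ITEM))
    for n in (1, 10, 100, 1000):
        print(n, per_item_cost(n, FIXED, GPU_ITEM))
```

Under these assumed numbers, a batch of one item is several times more
expensive on the GPU than on the CPU, and the GPU path only pays off once the
batch is large enough to spread the fixed overhead.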

