You've proven my point completely.  This process is bottlenecked in the CPU.
The only way to improve it would be to optimize the system (libc) functions
like "fread" where it is spending most of its time.

Or to optimize its IO handling to be more efficient. (E.g., use larger
blocks to reduce the number of syscalls.)
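As a rough sketch of the "larger blocks" idea, here is a read loop whose block size is a parameter. The function name read_in_blocks and the block sizes used below are hypothetical, picked only for illustration. Raising block_size cuts the number of fread calls. It typically also means fewer underlying read() syscalls, though exactly how stdio maps fread onto read() depends on the libc.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire stream in chunks of block_size
 * bytes using fread. Returns the total number of bytes read, or -1 on
 * allocation or read error. If calls is non-NULL, it receives the number
 * of fread calls that returned data. A larger block_size means fewer
 * calls into libc (and usually fewer read() syscalls underneath). */
long read_in_blocks(FILE *in, size_t block_size, long *calls)
{
    char *buf = malloc(block_size);
    if (buf == NULL)
        return -1;

    long total = 0;
    long n_calls = 0;
    size_t n;
    /* fread returns fewer than block_size bytes only at EOF or on error. */
    while ((n = fread(buf, 1, block_size, in)) > 0) {
        total += (long)n;
        n_calls++;
        /* Real processing of buf[0..n) would go here. */
    }

    int failed = ferror(in);
    free(buf);
    if (failed)
        return -1;
    if (calls != NULL)
        *calls = n_calls;
    return total;
}
```

For a 100,000-byte input, 4096-byte blocks take 25 fread calls, while 65536-byte blocks take 2.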
