For the cumsum example, the profiler says 100% of the time is spent in 
theano.scan_module.scan_op.Scan, of type Py:

Scan Op profiling ( scan_fn )
  Message: None
  Time in 1 calls of the op (for a total of 100000 steps) 1.166187e+00s

  Total time spent in calling the VM 3.082674e-01s (26.434%)
  Total overhead (computing slices..) 8.579197e-01s (73.566%)

The 100000 calls of add inside the loop take 0.076s in total (still 10x 
worse than pure C):

0.076s       7.62e-07s     C     100000        1   Elemwise{add,no_inplace}
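
To put those numbers on a per-step basis, here is a rough back-of-the-envelope sketch in plain Python. It only re-uses the figures copied from the profiler output above; the helper name per_step_us is made up for illustration, and I am assuming the add time is counted as part of the VM time.

```python
# Figures copied from the profiler output above (seconds).
total_s = 1.166187        # time in the single call of the Scan op
vm_s = 3.082674e-01       # time spent calling the VM
overhead_s = 8.579197e-01 # slicing and other bookkeeping overhead
add_s = 0.076             # Elemwise{add,no_inplace} inside the loop
steps = 100000            # number of scan steps


def per_step_us(seconds, n=steps):
    """Convert a total time in seconds to microseconds per scan step."""
    return seconds / n * 1e6


# Print the per-step cost and the share of the total Scan time.
for label, t in [
    ("Scan total", total_s),
    ("calling the VM", vm_s),
    ("overhead (slices..)", overhead_s),
    ("add (assumed inside VM)", add_s),
]:
    print(f"{label:<26s} {per_step_us(t):6.2f} us/step  {t / total_s:6.1%} of Scan")
```

So each step costs on the order of ten microseconds, and the add itself is well under one microsecond of that.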

However, the real overhead seems to be in managing the slices and calling 
the VM, which is no wonder if it's done in Python. I tried to force 
compilation into C using:


but I got:
AttributeError: 'Scan' object has no attribute 'c_compile_args'

Is there no C implementation of the Scan operation?


