I agree with Josh: PyTuple_New() can be faster than PyMem_Malloc() thanks to the
tuple free list. small_stack increases C stack consumption even for calls without
keyword arguments. This is a serious problem since we can't control stack


