Hi Everyone,
I've recently run into some trouble with PySpark
when using a pandas DataFrame *inside* a mapPartitions
function. I've traced the error to numexpr (which pandas uses) and
filed a bug report here:
https://code.google.com/p/numexpr/issues/detail?id=123
Bottom line: the os.popen call inside platform.machine() fails on close
when run under PySpark. Out of curiosity, has anyone run into something
similar and found a solution? For now, I've had to patch numexpr to
prevent the call to platform.machine(), as described in the bug report
above.
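One possible alternative to editing numexpr's source, sketched here under the assumption that the failure comes only from the subprocess that platform.machine() may spawn: replace platform.machine() in the worker with a version that reads os.uname() directly, and do it before pandas (and therefore numexpr) is first imported in that worker. The helper names patch_platform_machine and process_partition are hypothetical, and this has not been verified against the numexpr bug above.

```python
import os
import platform


def patch_platform_machine():
    """Hypothetical workaround: make platform.machine() read os.uname()
    directly so that no subprocess (os.popen) is spawned."""
    if not hasattr(os, "uname"):
        return  # non-POSIX platform: leave platform.machine() untouched
    machine = os.uname()[4]  # the 'machine' field, e.g. 'x86_64'
    platform.machine = lambda: machine


def process_partition(rows):
    # Patch first: it only helps if it runs before numexpr is imported
    # in this worker process (pandas imports numexpr when available).
    patch_platform_machine()
    import pandas as pd

    df = pd.DataFrame(list(rows))
    # ... per-partition pandas work would go here ...
    for record in df.to_dict("records"):
        yield record
```

It would be used as `rdd.mapPartitions(process_partition)`. If the worker process has already imported pandas for an earlier task, the patch comes too late to affect numexpr's import-time call.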
Thanks,
Mike