Hi Everyone,
I've recently run into some unpleasantness with PySpark when trying to use a pandas DataFrame *inside* a mapPartitions function. I've traced the error to numexpr (which pandas uses) and submitted a bug here:
       https://code.google.com/p/numexpr/issues/detail?id=123

Bottom line: the os.popen call made by platform.machine() fails on close when run under PySpark. Out of curiosity, has anyone run into something similar and found a solution? For now, I've had to patch numexpr to skip the call to platform.machine(), as described in the bug report above.
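
A minimal sketch of the same idea without editing the numexpr source: stub out platform.machine() on the worker before pandas (and therefore numexpr) is imported. This is an assumption-laden illustration, not a verified fix. The helper names stub_platform_machine and process_partition are hypothetical, the "x86_64" default is a guess at the workers' architecture, and it can only help if numexpr has not already been imported in the worker process.

```python
import platform


def stub_platform_machine(arch="x86_64"):
    """Replace platform.machine with a stub so no subprocess is spawned.

    `arch` is an assumed value; set it to the workers' real architecture.
    Returns the original function so it can be restored later.
    """
    original = platform.machine
    platform.machine = lambda: arch
    return original


def process_partition(rows):
    """Hypothetical mapPartitions function that builds a pandas DataFrame."""
    # Stub first, then import pandas (and with it numexpr) inside the worker.
    stub_platform_machine()
    import pandas as pd

    df = pd.DataFrame(list(rows))
    # ... real per-partition work goes here ...
    return df.to_dict("records")
```

It would be used as rdd.mapPartitions(process_partition), where rdd stands for whatever RDD is being processed.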

Thanks,
Mike
