Hi All,
I have some data output source which can only be written to by a specific Python API. For that I am (ab)using foreachPartition(writing_func) from PySpark which works pretty well. I wonder if its possible to somehow update the task metrics - specifically setBytesWritten - at the end of every partition. On the surface it seems impossible to me, for 2 reasons: 1. I don't think there is an open py4j gateway in a task context 2. TaskMetrics is accessed via ThreadLocal, so even with an open gateway I don't think it'll hit the specific thread Does anyone know of existing solution or a workaround? Thanks Shay