Hi,

I'm attempting to implement a generic error handling ProcessFunction in
pyflink.  Given a user provided function, I want to invoke that function
for each element in the DataStream, catch any errors thrown by
the function, convert those errors into events, and then emit those event
errors to a different DataStream sink.

I'm trying to do this by reusing the same OutputTag in each of my
ProcessFunctions.
However, this does not work, I believe because I am using the same
error_output_tag in two different functions, which causes it to have a
reference(?)  to _thread.Rlock, which causes the ProcessFunction instance
to be un-pickleable.

Here's a standalone example
<https://gist.github.com/ottomata/cba55f2c65cf584ffdb933410f3b4237> of the
failure using the canonical word_count example.

My question is.
1. Does Flink support re-use of the same OutputTag instance in multiple
ProcessFunctions?
2. If so, is my problem pyflink / python / pickle specific?

Thanks!
-Andrew Otto
 Wikimedia Foundation

Reply via email to