Re: Failure Threshold in Spark Structured Streaming?

2020-07-02 Thread Jungtaek Lim
Structured Streaming is basically following SQL semantic, which doesn't have such a semantic of "max allowance of failures". If you'd like to tolerate malformed data, please read with raw format (string or binary) which won't fail with such data, and try converting. e.g. from_json() will produce

Failure Threshold in Spark Structured Streaming?

2020-07-02 Thread Eric Beabes
Currently my job fails even on a single failure. In other words, even if one incoming message is malformed the job fails. I believe there's a property that allows us to set an acceptable number of failures. I Googled but couldn't find the answer. Can someone please help? Thanks.