https://stackoverflow.com/questions/53938967/writing-corrupt-data-from-kafka-json-datasource-in-spark-structured-streaming
On Wed, Dec 26, 2018 at 2:42 PM Colin Williams wrote:
From my initial impression it looks like I'd need to create my own
`from_json` using `jsonToStructs` as a reference, but try to handle
`case _: BadRecordException => null` or similar, so that the
non-matching string can be written to a corrupt-records column.
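The exception-to-null routing described above can be sketched in plain Python. This is not the Spark internals (`jsonToStructs` is a Catalyst expression), and `BadRecordException` is stood in for by `json.JSONDecodeError`; column names are illustrative:

```python
import json

def parse_or_corrupt(raw):
    """Mimic a from_json variant that, instead of raising on a bad
    record, returns None for the parsed column and keeps the raw
    string in a corrupt-records column."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Analogue of `case _: BadRecordException => null`:
        # swallow the parse failure, preserve the raw input.
        return {"data": None, "_corrupt_record": raw}
    return {"data": parsed, "_corrupt_record": None}
```

Rows whose `_corrupt_record` field is non-null could then be filtered out and written to a separate sink.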
On Wed, Dec 26, 2018 at 1:55 PM Colin Williams wrote:
Hi,
I'm trying to figure out how I can write records that don't match a
JSON read schema, via Spark Structured Streaming, to an output sink /
parquet location. Previously I did this in batch using the
corrupt-record column feature of the batch reader. But in this Spark
Structured Streaming job I'm reading from Kafka a string a
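One way to approximate the batch corrupt-column behavior in streaming is to split each micro-batch yourself (for example inside a `foreachBatch` callback) rather than relying on the reader. The routing itself, shown here independent of Spark with an illustrative helper name, is just a partition of the incoming strings:

```python
import json

def split_batch(raw_strings):
    """Partition raw Kafka values into (valid, corrupt) lists.
    In a real job each list would be written to its own sink,
    e.g. separate parquet paths for good and bad records."""
    valid, corrupt = [], []
    for s in raw_strings:
        try:
            valid.append(json.loads(s))
        except json.JSONDecodeError:
            corrupt.append(s)
    return valid, corrupt
```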
Hi Fawze,
Thank you for the link. But that is exactly what I am doing.
I think this is related to
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
setting.
When the disk utilization exceeds this setting, the node is marked
unhealthy.
Other than increasing the default 9
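For reference, that property is set in `yarn-site.xml`. The value below is purely illustrative, not a recommendation:

```
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>
```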
Hi, thanks. This is part of the solution I found after writing the
question. The other part is that I needed to write the input stream
to a temporary file. I would prefer not to write any temporary file,
but the `ssl.keystore.location` property seems to expect a file
path.
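The temporary-file workaround can be sketched as below. The helper name and the idea of carrying the keystore as in-memory bytes are assumptions for illustration; only the fact that `ssl.keystore.location` expects a filesystem path comes from the thread:

```python
import tempfile

def keystore_to_path(keystore_bytes):
    """Write in-memory keystore bytes to a temp file, since
    ssl.keystore.location expects a file path, and return a Kafka
    option dict pointing at it. Caller is responsible for deleting
    the file when the stream shuts down."""
    f = tempfile.NamedTemporaryFile(suffix=".jks", delete=False)
    f.write(keystore_bytes)
    f.close()
    return {"kafka.ssl.keystore.location": f.name}
```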