I'm trying to extract data from a Debezium CDC source, in which one of the 
backing tables has an open schema nested JSON field like this:


"objectives": {
    "items": [
        {
            "id": 1,
            "label": "test 1"
            "size": 1000.0
        },
        {
            "id": 2,
            "label": "test 2"
            "size": 500.0
        }
    ],
    "threshold": 10.0,
    "threshold_period": "hourly",
    "max_ms": 30000.0
}


Any of these fields can be missing at any time, and there can also be 
additional, different fields. It is guaranteed that a field will have the same 
data type for all occurrences.

For now, I really need to get only the `threshold` and `threshold_period` 
fields. For which I'm using a field as the following:


CREATE TABLE probes (
  `objectives` ROW(`threshold` FLOAT, `threshold_period` STRING)
  ...
) WITH (
     ...
      'format' = 'debezium-json',
      'debezium-json.schema-include' = 'true',
      'debezium-json.ignore-parse-errors' = 'true'
)


However I keep getting `NULL` values in my `objectives` column, or corrupt JSON 
message exceptions when I disable the `ignore-parse-errors` option.

Does JSON parsing need to match 100% the schema of the field or is it lenient?

Is there any option or syntactic detail I'm missing?

Best Regards,

Reply via email to