Hi Guodong,
 
> I am wondering if Flink SQL can/will support the flexible schema in the 
> future,

It's an interesting topic. This feature is closer to the scope of schema
inference, which should come in the next few releases.

Best,
Leonard Xu




> for example, by registering the table without defining a specific schema for
> each field, and letting the user define a generic map or array for one
> field, where the values of the map/array can be any object. The type
> conversion cost might then be saved.
> 
> Guodong
> 
> 
> On Thu, May 28, 2020 at 7:43 PM Benchao Li <libenc...@gmail.com> wrote:
> Hi Guodong,
> 
> I think you have almost found the answer:
> 1. Map type: this does not work in the current implementation. For example,
> with map<varchar, varchar>, if the value is a non-string JSON object, then
> `JsonNode.asText()` may not work as you wish.
> 2. List all the fields you care about. IMO, this can fit your scenario. And
> you can set format.fail-on-missing-field = false, so that non-existent
> fields are set to null.
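> 
> A minimal sketch of option 2, assuming a Flink 1.10-style DDL and a Kafka
> source. The table name, topic, and broker address are hypothetical
> placeholders and do not come from this thread; only the field names and
> types mirror the sample event. Keys that are not listed in the schema are
> simply not exposed as columns:
> 
> ```sql
> -- Hypothetical table; all connector settings below are placeholders.
> CREATE TABLE json_events (
>   top_level_key1 VARCHAR,
>   nested_object ROW<
>     nested_key1 VARCHAR,
>     nested_key2 INT,
>     nested_key3 ARRAY<VARCHAR>
>   >
> ) WITH (
>   'connector.type' = 'kafka',
>   'connector.version' = 'universal',
>   'connector.topic' = 'events',
>   'connector.properties.bootstrap.servers' = 'localhost:9092',
>   'update-mode' = 'append',
>   'format.type' = 'json',
>   -- false: a missing field is read as NULL instead of failing the job
>   'format.fail-on-missing-field' = 'false'
> );
> ```
> 
> A nested value could then be read with, e.g.,
> `SELECT nested_object.nested_key1 FROM json_events`.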
> 
> For 1, I think maybe we can support it in the future, and I've created a
> Jira issue [1] to track this.
> 
> [1] https://issues.apache.org/jira/browse/FLINK-18002
> On Thu, May 28, 2020 at 6:32 PM Guodong Wang <wangg...@gmail.com> wrote:
> Hi!
> 
> I want to use Flink SQL to process some json events. It is quite challenging 
> to define a schema for the Flink SQL table. 
> 
> My data source's format is JSON like this:
> {
>     "top_level_key1": "some value",
>     "nested_object": {
>         "nested_key1": "abc",
>         "nested_key2": 123,
>         "nested_key3": ["element1", "element2", "element3"]
>     }
> }
> 
> The big challenges for me in defining a schema for the data source are:
> 1. The keys in nested_object are flexible; there might be 3 unique keys or
> more. If I enumerate all the keys in the schema, I think my code will be
> fragile. How do I handle an event that contains more nested keys in
> nested_object?
> 2. I know the Table API supports the Map type, but I am not sure whether I
> can put a generic object as the value of the map, because the values in
> nested_object are of different types: some are ints, some are strings or
> arrays.
> 
> So, how can I expose this kind of JSON data as a table in Flink SQL without
> enumerating all the nested_keys?
> 
> Thanks.
> 
> Guodong
> 
> 
> -- 
> 
> Best,
> Benchao Li
