Re: Re: Schema Evolution & Json Schemas

2024-03-10 Thread Jensen
Unsubscribe

At 2024-02-26 20:55:19, "Salva Alcántara"  wrote:

Awesome Andrew, thanks a lot for the info!


On Sun, Feb 25, 2024 at 4:37 PM Andrew Otto  wrote:

>  the following code generator
Oh, and FWIW we avoid code generation and POJOs, and instead rely on Flink's 
Row or RowData abstractions.










On Sun, Feb 25, 2024 at 10:35 AM Andrew Otto  wrote:

Hi! 


I'm not sure if this totally is relevant for you, but we use JSONSchema and 
JSON with Flink at the Wikimedia Foundation. 
We explicitly disallow the use of additionalProperties, unless it is to define 
Map type fields (where additionalProperties itself is a schema).


We have JSONSchema converters and JSON Serdes to be able to use our JSONSchemas 
and JSON records with both the DataStream API (as Row) and Table API (as 
RowData).


See:
- 
https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json
- 
https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/#managing-a-object


State schema evolution is supported via the EventRowTypeInfo wrapper.


Less directly about Flink: I gave a talk at Confluent's Current conf in 2022 
about why we use JSONSchema. See also this blog post series if you are 
interested!


-Andrew Otto
 Wikimedia Foundation




On Fri, Feb 23, 2024 at 1:58 AM Salva Alcántara  wrote:

I'm facing some issues related to schema evolution in combination with the 
usage of Json Schemas and I was just wondering whether there are any 
recommended best practices.


In particular, I'm using the following code generator:


- https://github.com/joelittlejohn/jsonschema2pojo



Main gotchas so far relate to the `additionalProperties` field. When setting 
that to true, the resulting POJO is not valid according to Flink rules because 
the generated getter/setter methods don't follow the java beans naming 
conventions, e.g., see here:


- https://github.com/joelittlejohn/jsonschema2pojo/issues/1589


This means that the Kryo fallback is used for serialization purposes, which is 
not only bad for performance but also breaks state schema evolution.


So, because of that, setting `additionalProperties` to `false` looks like a 
good idea but then your job will break if an upstream/producer service adds a 
property to the messages you are reading. To solve this problem, the POJOs for 
your job (as a reader) can be generated to ignore the `additionalProperties` 
field (via the `@JsonIgnore` Jackson annotation). This seems to be a good 
overall solution to the problem, but looks a bit convoluted to me / didn't come 
without some trial & error (= pain & frustration).


Is there anyone here facing similar issues? It would be good to hear your 
thoughts on this!


BTW, this is a very interesting article that touches on the
above-mentioned difficulties:
- 
https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html
 



Re: Schema Evolution & Json Schemas

2024-02-26 Thread Salva Alcántara
Awesome Andrew, thanks a lot for the info!

On Sun, Feb 25, 2024 at 4:37 PM Andrew Otto  wrote:

> >  the following code generator
> Oh, and FWIW we avoid code generation and POJOs, and instead rely on
> Flink's Row or RowData abstractions.
>
>
>
>
>
> On Sun, Feb 25, 2024 at 10:35 AM Andrew Otto  wrote:
>
>> Hi!
>>
>> I'm not sure if this totally is relevant for you, but we use JSONSchema
>> and JSON with Flink at the Wikimedia Foundation.
>> We explicitly disallow the use of additionalProperties,
>> unless it is to define Map type fields
>> (where additionalProperties itself is a schema).
>>
>> We have JSONSchema converters and JSON Serdes to be able to use our
>> JSONSchemas and JSON records with both the DataStream API (as Row) and
>> Table API (as RowData).
>>
>> See:
>> -
>> https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json
>> -
>> https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/#managing-a-object
>>
>> State schema evolution is supported via the EventRowTypeInfo wrapper.
>>
>> Less directly about Flink: I gave a talk at Confluent's Current conf in
>> 2022 about why we use JSONSchema.
>> See also this blog post series if you are interested!
>>
>> -Andrew Otto
>>  Wikimedia Foundation
>>
>>
>> On Fri, Feb 23, 2024 at 1:58 AM Salva Alcántara 
>> wrote:
>>
>>> I'm facing some issues related to schema evolution in combination with
>>> the usage of Json Schemas and I was just wondering whether there are any
>>> recommended best practices.
>>>
>>> In particular, I'm using the following code generator:
>>>
>>> - https://github.com/joelittlejohn/jsonschema2pojo
>>>
>>> Main gotchas so far relate to the `additionalProperties` field. When
>>> setting that to true, the resulting POJO is not valid according to Flink
>>> rules because the generated getter/setter methods don't follow the java
>>> beans naming conventions, e.g., see here:
>>>
>>> - https://github.com/joelittlejohn/jsonschema2pojo/issues/1589
>>>
>>> This means that the Kryo fallback is used for serialization purposes,
>>> which is not only bad for performance but also breaks state schema
>>> evolution.
>>>
>>> So, because of that, setting `additionalProperties` to `false` looks
>>> like a good idea but then your job will break if an upstream/producer
>>> service adds a property to the messages you are reading. To solve this
>>> problem, the POJOs for your job (as a reader) can be generated to ignore
>>> the `additionalProperties` field (via the `@JsonIgnore` Jackson
>>> annotation). This seems to be a good overall solution to the problem, but
>>> looks a bit convoluted to me / didn't come without some trial & error (=
>>> pain & frustration).
>>>
>>> Is there anyone here facing similar issues? It would be good to hear
>>> your thoughts on this!
>>>
>>> BTW, this is a very interesting article that touches on the
>>> above-mentioned difficulties:
>>> -
>>> https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html
>>>
>>>
>>>


Re: Schema Evolution & Json Schemas

2024-02-25 Thread Andrew Otto
>  the following code generator
Oh, and FWIW we avoid code generation and POJOs, and instead rely on
Flink's Row or RowData abstractions.
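
For anyone unfamiliar with that approach, here is a minimal sketch using plain
Flink APIs (field names and values are invented for illustration; this is not
the eventutilities wrapper itself):

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.types.Row;

public class RowInsteadOfPojo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The record shape is declared explicitly as type information
        // (in Wikimedia's case derived from the JSONSchema), instead of
        // being reflected from a generated POJO class.
        TypeInformation<Row> eventType = Types.ROW_NAMED(
                new String[] {"meta_id", "value"},
                Types.STRING, Types.DOUBLE);

        DataStream<Row> events = env
                .fromElements(Row.of("event-123", 42.0))
                .returns(eventType);

        events.print();
        env.execute("row-instead-of-pojo");
    }
}
```

One design consequence is that the record shape is described by data (the
TypeInformation) rather than by generated classes, so schema changes do not
require regenerating and recompiling POJOs.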





On Sun, Feb 25, 2024 at 10:35 AM Andrew Otto  wrote:

> Hi!
>
> I'm not sure if this totally is relevant for you, but we use JSONSchema
> and JSON with Flink at the Wikimedia Foundation.
> We explicitly disallow the use of additionalProperties,
> unless it is to define Map type fields
> (where additionalProperties itself is a schema).
>
> We have JSONSchema converters and JSON Serdes to be able to use our
> JSONSchemas and JSON records with both the DataStream API (as Row) and
> Table API (as RowData).
>
> See:
> -
> https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json
> -
> https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/#managing-a-object
>
> State schema evolution is supported via the EventRowTypeInfo wrapper.
>
> Less directly about Flink: I gave a talk at Confluent's Current conf in
> 2022 about why we use JSONSchema.
> See also this blog post series if you are interested!
>
> -Andrew Otto
>  Wikimedia Foundation
>
>
> On Fri, Feb 23, 2024 at 1:58 AM Salva Alcántara 
> wrote:
>
>> I'm facing some issues related to schema evolution in combination with
>> the usage of Json Schemas and I was just wondering whether there are any
>> recommended best practices.
>>
>> In particular, I'm using the following code generator:
>>
>> - https://github.com/joelittlejohn/jsonschema2pojo
>>
>> Main gotchas so far relate to the `additionalProperties` field. When
>> setting that to true, the resulting POJO is not valid according to Flink
>> rules because the generated getter/setter methods don't follow the java
>> beans naming conventions, e.g., see here:
>>
>> - https://github.com/joelittlejohn/jsonschema2pojo/issues/1589
>>
>> This means that the Kryo fallback is used for serialization purposes,
>> which is not only bad for performance but also breaks state schema
>> evolution.
>>
>> So, because of that, setting `additionalProperties` to `false` looks like
>> a good idea but then your job will break if an upstream/producer service
>> adds a property to the messages you are reading. To solve this problem, the
>> POJOs for your job (as a reader) can be generated to ignore the
>> `additionalProperties` field (via the `@JsonIgnore` Jackson annotation).
>> This seems to be a good overall solution to the problem, but looks a bit
>> convoluted to me / didn't come without some trial & error (= pain &
>> frustration).
>>
>> Is there anyone here facing similar issues? It would be good to hear your
>> thoughts on this!
>>
>> BTW, this is a very interesting article that touches on the
>> above-mentioned difficulties:
>> -
>> https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html
>>
>>
>>


Re: Schema Evolution & Json Schemas

2024-02-25 Thread Andrew Otto
Hi!

I'm not sure if this totally is relevant for you, but we use JSONSchema and
JSON with Flink at the Wikimedia Foundation.
We explicitly disallow the use of additionalProperties,
unless it is to define Map type fields
(where additionalProperties itself is a schema).
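
For illustration, a made-up schema fragment following that convention could
look like the one below (not one of our real schemas): the `tags` field maps
to a Map of string values, while unknown top-level properties stay disallowed.

```json
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "tags": {
      "type": "object",
      "additionalProperties": { "type": "string" }
    }
  }
}
```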

We have JSONSchema converters and JSON Serdes to be able to use our
JSONSchemas and JSON records with both the DataStream API (as Row) and
Table API (as RowData).

See:
-
https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json
-
https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/#managing-a-object

State schema evolution is supported via the EventRowTypeInfo wrapper.

Less directly about Flink: I gave a talk at Confluent's Current conf in
2022 about why we use JSONSchema.
See also this blog post series if you are interested!

-Andrew Otto
 Wikimedia Foundation


On Fri, Feb 23, 2024 at 1:58 AM Salva Alcántara 
wrote:

> I'm facing some issues related to schema evolution in combination with the
> usage of Json Schemas and I was just wondering whether there are any
> recommended best practices.
>
> In particular, I'm using the following code generator:
>
> - https://github.com/joelittlejohn/jsonschema2pojo
>
> Main gotchas so far relate to the `additionalProperties` field. When
> setting that to true, the resulting POJO is not valid according to Flink
> rules because the generated getter/setter methods don't follow the java
> beans naming conventions, e.g., see here:
>
> - https://github.com/joelittlejohn/jsonschema2pojo/issues/1589
>
> This means that the Kryo fallback is used for serialization purposes,
> which is not only bad for performance but also breaks state schema
> evolution.
>
> So, because of that, setting `additionalProperties` to `false` looks like
> a good idea but then your job will break if an upstream/producer service
> adds a property to the messages you are reading. To solve this problem, the
> POJOs for your job (as a reader) can be generated to ignore the
> `additionalProperties` field (via the `@JsonIgnore` Jackson annotation).
> This seems to be a good overall solution to the problem, but looks a bit
> convoluted to me / didn't come without some trial & error (= pain &
> frustration).
>
> Is there anyone here facing similar issues? It would be good to hear your
> thoughts on this!
>
> BTW, this is a very interesting article that touches on the
> above-mentioned difficulties:
> -
> https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html
>
>
>


Schema Evolution & Json Schemas

2024-02-22 Thread Salva Alcántara
I'm facing some issues related to schema evolution in combination with the
usage of Json Schemas and I was just wondering whether there are any
recommended best practices.

In particular, I'm using the following code generator:

- https://github.com/joelittlejohn/jsonschema2pojo

Main gotchas so far relate to the `additionalProperties` field. When
setting that to true, the resulting POJO is not valid according to Flink
rules because the generated getter/setter methods don't follow the java
beans naming conventions, e.g., see here:

- https://github.com/joelittlejohn/jsonschema2pojo/issues/1589
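
Roughly, the generated catch-all looks like the sketch below (paraphrased from
memory rather than copied from real generator output; see the issue above for
the exact shape). The getter is `getAdditionalProperties` but the matching
setter is `setAdditionalProperty(String, Object)`, so Flink's type extractor
sees no valid getter/setter pair for the field:

```java
import com.fasterxml.jackson.annotation.JsonAnyGetter;
import com.fasterxml.jackson.annotation.JsonAnySetter;
import java.util.LinkedHashMap;
import java.util.Map;

public class GeneratedEvent {
    // Catch-all for properties not declared in the schema.
    private Map<String, Object> additionalProperties = new LinkedHashMap<>();

    @JsonAnyGetter
    public Map<String, Object> getAdditionalProperties() {
        return this.additionalProperties;
    }

    // Not a JavaBeans setter for "additionalProperties": different name and
    // arity, so Flink cannot treat this class as a POJO.
    @JsonAnySetter
    public void setAdditionalProperty(String name, Object value) {
        this.additionalProperties.put(name, value);
    }
}
```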

This means that the Kryo fallback is used for serialization purposes, which
is not only bad for performance but also breaks state schema evolution.
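
As a side note, one way to catch the fallback early (rather than discovering
it through poor performance or a failed state restore) is to disallow generic
types altogether; a minimal sketch, assuming a reasonably recent Flink version:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class NoKryoFallback {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Fail during job construction if any type would fall back to
        // Kryo/generic serialization, instead of silently accepting it.
        env.getConfig().disableGenericTypes();
        // ... build and execute the pipeline as usual ...
    }
}
```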

So, because of that, setting `additionalProperties` to `false` looks like a
good idea but then your job will break if an upstream/producer service adds
a property to the messages you are reading. To solve this problem, the
POJOs for your job (as a reader) can be generated to ignore the
`additionalProperties` field (via the `@JsonIgnore` Jackson annotation).
This seems to be a good overall solution to the problem, but looks a bit
convoluted to me / didn't come without some trial & error (= pain &
frustration).
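
To make the end state concrete, here is a minimal hand-written sketch of what
such a reader-side POJO can boil down to (field names are invented, and the
real classes would come from the generator). The class keeps only the declared
properties, satisfies Flink's POJO rules, and Jackson is told to silently drop
any new properties an upstream producer adds:

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

// Unknown/extra JSON properties are ignored on deserialization instead of
// failing the reader or being captured in a non-POJO-friendly map.
@JsonIgnoreProperties(ignoreUnknown = true)
public class ReaderEvent {
    private String id;
    private double value;

    // Public no-arg constructor and conventional getters/setters keep this a
    // valid Flink POJO, so state schema evolution stays available.
    public ReaderEvent() {}

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public double getValue() { return value; }
    public void setValue(double value) { this.value = value; }
}
```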

Is there anyone here facing similar issues? It would be good to hear your
thoughts on this!

BTW, this is a very interesting article that touches on the
above-mentioned difficulties:
-
https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html