Re: Temporal join on rolling aggregate

2024-02-25 Thread Ron liu
+1,
But I think this should be a more general requirement: support for
declaring watermarks in a query, for any type of source, such as a table
or a view. Databricks provides something similar [1]. This needs a FLIP.

[1]
https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-qry-select-watermark.html

Best,
Ron


Re: Schema Evolution & Json Schemas

2024-02-25 Thread Andrew Otto
>  the following code generator
Oh, and FWIW we avoid code generation and POJOs, and instead rely on
Flink's Row or RowData abstractions.

On Sun, Feb 25, 2024 at 10:35 AM Andrew Otto wrote:

> Hi!
>
> I'm not sure if this is totally relevant for you, but we use JSONSchema
> and JSON with Flink at the Wikimedia Foundation.
> We explicitly disallow the use of additionalProperties, unless it is to
> define Map type fields (where additionalProperties itself is a schema).
>
> We have JSONSchema converters and JSON Serdes to be able to use our
> JSONSchemas and JSON records with both the DataStream API (as Row) and
> Table API (as RowData).
>
> See:
> -
> https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json
> -
> https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/#managing-a-object
>
> State schema evolution is supported via the EventRowTypeInfo wrapper.
>
> Less directly about Flink: I gave a talk at Confluent's Current conf in
> 2022 about why we use JSONSchema.
> See also this blog post series if you are interested!
>
> -Andrew Otto
>  Wikimedia Foundation
>
>
> On Fri, Feb 23, 2024 at 1:58 AM Salva Alcántara wrote:
>
>> I'm facing some issues related to schema evolution in combination with
>> the usage of Json Schemas and I was just wondering whether there are any
>> recommended best practices.
>>
>> In particular, I'm using the following code generator:
>>
>> - https://github.com/joelittlejohn/jsonschema2pojo
>>
>> Main gotchas so far relate to the `additionalProperties` field. When
>> setting that to true, the resulting POJO is not valid according to Flink
>> rules because the generated getter/setter methods don't follow the
>> JavaBeans naming conventions, e.g., see here:
>>
>> - https://github.com/joelittlejohn/jsonschema2pojo/issues/1589
>>
>> This means that the Kryo fallback is used for serialization purposes,
>> which is not only bad for performance but also breaks state schema
>> evolution.
>>
>> So, because of that, setting `additionalProperties` to `false` looks like
>> a good idea but then your job will break if an upstream/producer service
>> adds a property to the messages you are reading. To solve this problem, the
>> POJOs for your job (as a reader) can be generated to ignore the
>> `additionalProperties` field (via the `@JsonIgnore` Jackson annotation).
>> This seems to be a good overall solution to the problem, but looks a bit
>> convoluted to me / didn't come without some trial & error (= pain &
>> frustration).
>>
>> Is there anyone here facing similar issues? It would be good to hear your
>> thoughts on this!
>>
>> BTW, this is a very interesting article that touches on the
>> above-mentioned difficulties:
>> -
>> https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html
>>
>>
>>


Re: Schema Evolution & Json Schemas

2024-02-25 Thread Andrew Otto
Hi!

I'm not sure if this is totally relevant for you, but we use JSONSchema and
JSON with Flink at the Wikimedia Foundation.
We explicitly disallow the use of additionalProperties, unless it is to
define Map type fields (where additionalProperties itself is a schema).

We have JSONSchema converters and JSON Serdes to be able to use our
JSONSchemas and JSON records with both the DataStream API (as Row) and
Table API (as RowData).

See:
-
https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/src/main/java/org/wikimedia/eventutilities/flink/formats/json
-
https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities-flink/#managing-a-object
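
To make the Map type convention and the schema-to-row idea above a bit more
concrete, here is a minimal sketch in plain Java. It is not the Wikimedia
converter linked above: the class `SchemaTypeSketch`, its methods, and the
type mapping (for example `integer` to `BIGINT`) are assumptions made only
for illustration. The schema is assumed to be already parsed into nested
Maps, and the output is a type string loosely modeled on Flink SQL type
names. Treating `additionalProperties: false` or an absent keyword as an
ordinary closed object is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: maps a parsed JSON Schema node to a Flink-SQL-like type string.
class SchemaTypeSketch {

    /** Converts one schema node; throws IllegalArgumentException on unsupported input. */
    static String toType(Map<String, Object> schema) {
        Object type = schema.get("type");
        if (!(type instanceof String)) {
            throw new IllegalArgumentException("Schema node needs a string \"type\"");
        }
        switch ((String) type) {
            case "string":  return "STRING";
            case "integer": return "BIGINT";   // assumed mapping
            case "number":  return "DOUBLE";   // assumed mapping
            case "boolean": return "BOOLEAN";
            case "array":   return "ARRAY<" + toType(asSchema(schema.get("items"), "items")) + ">";
            case "object":  return objectType(schema);
            default: throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    /** Objects become MAP when additionalProperties is a schema, otherwise ROW. */
    static String objectType(Map<String, Object> schema) {
        Object additional = schema.get("additionalProperties");
        if (additional instanceof Map) {
            // additionalProperties is itself a schema: a map from string keys to that type.
            return "MAP<STRING, " + toType(asSchema(additional, "additionalProperties")) + ">";
        }
        if (Boolean.TRUE.equals(additional)) {
            // Open-ended objects have no fixed set of fields, so they are rejected.
            throw new IllegalArgumentException("additionalProperties: true is not allowed");
        }
        List<String> fields = new ArrayList<>();
        for (Map.Entry<String, Object> e : asSchema(schema.get("properties"), "properties").entrySet()) {
            fields.add(e.getKey() + " " + toType(asSchema(e.getValue(), e.getKey())));
        }
        return "ROW<" + String.join(", ", fields) + ">";
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> asSchema(Object node, String where) {
        if (!(node instanceof Map)) {
            throw new IllegalArgumentException("Expected a schema object at: " + where);
        }
        return (Map<String, Object>) node;
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("id", Map.of("type", "integer"));
        props.put("tags", Map.of("type", "object",
                "additionalProperties", Map.of("type", "string")));
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("type", "object");
        event.put("properties", props);
        System.out.println(toType(event));
    }
}
```

The point of the convention, as read here, is that every object either has a
fixed list of fields (a row) or a single value type for all keys (a map), so
a row type can always be derived from the schema alone.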

State schema evolution is supported via the EventRowTypeInfo wrapper.

Less directly about Flink: I gave a talk at Confluent's Current conf in
2022 about why we use JSONSchema.
See also this blog post series if you are interested!

-Andrew Otto
 Wikimedia Foundation


On Fri, Feb 23, 2024 at 1:58 AM Salva Alcántara wrote:

> I'm facing some issues related to schema evolution in combination with the
> usage of Json Schemas and I was just wondering whether there are any
> recommended best practices.
>
> In particular, I'm using the following code generator:
>
> - https://github.com/joelittlejohn/jsonschema2pojo
>
> Main gotchas so far relate to the `additionalProperties` field. When
> setting that to true, the resulting POJO is not valid according to Flink
> rules because the generated getter/setter methods don't follow the
> JavaBeans naming conventions, e.g., see here:
>
> - https://github.com/joelittlejohn/jsonschema2pojo/issues/1589
>
> This means that the Kryo fallback is used for serialization purposes,
> which is not only bad for performance but also breaks state schema
> evolution.
>
> So, because of that, setting `additionalProperties` to `false` looks like
> a good idea but then your job will break if an upstream/producer service
> adds a property to the messages you are reading. To solve this problem, the
> POJOs for your job (as a reader) can be generated to ignore the
> `additionalProperties` field (via the `@JsonIgnore` Jackson annotation).
> This seems to be a good overall solution to the problem, but looks a bit
> convoluted to me / didn't come without some trial & error (= pain &
> frustration).
>
> Is there anyone here facing similar issues? It would be good to hear your
> thoughts on this!
>
> BTW, this is a very interesting article that touches on the
> above-mentioned difficulties:
> -
> https://www.creekservice.org/articles/2024/01/09/json-schema-evolution-part-2.html
>
>
>
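
The JavaBeans naming problem described in the quoted message can be
reproduced without Flink, using only `java.beans.Introspector` from the JDK.
The sketch below is an illustration under assumptions: `GeneratedEvent` is a
hand-written stand-in, assumed to have roughly the shape of a class generated
with `additionalProperties` enabled (a `getAdditionalProperties()` getter
paired with a two-argument `setAdditionalProperty(...)` method), not actual
jsonschema2pojo output. `BeanCheck` and `hasGetterAndSetter` are hypothetical
names, and the check only approximates Flink's own POJO analysis, which
requires non-public fields to have a conventional getter and setter.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.HashMap;
import java.util.Map;

class BeanCheck {

    // Hypothetical stand-in for a generated class; not real generator output.
    public static class GeneratedEvent {
        private String name;
        private Map<String, Object> additionalProperties = new HashMap<>();

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        // The getter follows the JavaBeans pattern for "additionalProperties" ...
        public Map<String, Object> getAdditionalProperties() { return additionalProperties; }

        // ... but this method has a different name and two parameters,
        // so it is not a JavaBeans setter for that property.
        public void setAdditionalProperty(String key, Object value) {
            additionalProperties.put(key, value);
        }
    }

    /** Returns true if the class exposes both a JavaBeans getter and setter for the property. */
    static boolean hasGetterAndSetter(Class<?> type, String property) {
        try {
            PropertyDescriptor[] descriptors =
                    Introspector.getBeanInfo(type, Object.class).getPropertyDescriptors();
            for (PropertyDescriptor pd : descriptors) {
                if (pd.getName().equals(property)) {
                    return pd.getReadMethod() != null && pd.getWriteMethod() != null;
                }
            }
            return false;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + type, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("name: " + hasGetterAndSetter(GeneratedEvent.class, "name"));
        System.out.println("additionalProperties: "
                + hasGetterAndSetter(GeneratedEvent.class, "additionalProperties"));
    }
}
```

With this stand-in class, the check is expected to pass for `name` and fail
for `additionalProperties`, which is the kind of mismatch that would make a
type fall back to generic (Kryo) serialization as described above.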