Re: What is the state of Scala wrappers?

2023-02-13 Thread Alexey Novakov via user
I think this blog article is good enough for raising awareness of the Scala support situation:
https://flink.apache.org/2022/02/22/scala-free.html
In my opinion, it would be much better if one of the Scala-wrapper
community projects moved under the umbrella of *org.apache.flink* to have a
better chance of surviving longer.

Alexey


On Wed, Feb 8, 2023 at 8:59 AM Martijn Visser 
wrote:

> Hi all,
>
> Is there anything that the Flink community could do to raise awareness?
> Perhaps it would be interesting for the maintainers to write a short blog
> post about it, which potentially could drive traffic?
>
> Best regards,
>
> Martijn
>
> On Sun, Feb 5, 2023 at 4:39 PM Alexey Novakov via user <
> user@flink.apache.org> wrote:
>
>> Hi Erwan,
>>
>> I think those 2 projects you mentioned are pretty much the options we
>> have at the moment if you want to use Scala 2.13 or 3.
>> I believe your contribution to upgrade one of them to Flink 1.16 will be
>> very welcome.
>>
>> Best regards,
>> Alex
>>
>> On Thu, Feb 2, 2023 at 9:32 AM Erwan Loisant  wrote:
>>
>>> Hi,
>>>
>>> Back in October, the Flink team announced that the Scala API was to be
>>> deprecated and then removed. I think that is perfectly fine; having third
>>> parties develop Scala wrappers is a good approach.
>>>
>>> With the announcement I expected those wrapper projects to gain steam,
>>> however both projects linked in the announcement (
>>> https://github.com/findify/flink-adt and
>>> https://github.com/ariskk/flink4s) don't seem to be actively maintained,
>>> and are stuck on Flink 1.15.
>>>
>>> Are any teams here using Flink with Scala moving away from the official
>>> Scala API? Maybe there is a project I'm missing that is getting more
>>> attention than the two linked above?
>>>
>>>
>>> Thank you!
>>> Erwan
>>>
>>


Unable to take a memory dump of a Flink program

2023-02-13 Thread lxk
Flink version: 1.16
Java version: jdk1.8.0_251
The problem: a recently deployed Flink job triggers frequent young GC, once
every few seconds. After adjusting the size of the young generation the
problem remains; the whole JVM heap is currently 3.57 GB. We therefore wanted
to analyze the job's memory to locate the problem. Using the applicationId on
YARN, we run ps -ef | grep 1666758697316_2639108 to find the corresponding
pid, and then run jmap -dump:format=b,file=user.dump 26326 to generate a dump
file. But with every job we tested, as soon as the dump starts it affects the
running job: the job's container dies for no obvious reason and the job
restarts. The error after running the command is:
sun.jvm.hotspot.debugger.UnmappedAddressException: 7f74efa5d410
https://pic.imgdb.cn/item/63eb2a46f144a010071899ba.png
Has anyone run into this problem? Are we using the tool the wrong way, or is
there an issue with the version we are on? Any suggestions or opinions would
be appreciated.
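
One plausible explanation (an assumption, not a confirmed diagnosis): jmap
freezes the whole JVM at a safepoint while it writes the dump, so the
TaskManager misses its heartbeats and the container is declared dead and
released. A minimal PyFlink sketch of the relevant knob, assuming a longer
heartbeat timeout is acceptable while debugging; in a real deployment
heartbeat.timeout is normally set in flink-conf.yaml rather than in code:

    from pyflink.common import Configuration
    from pyflink.datastream import StreamExecutionEnvironment

    config = Configuration()
    # Give a long jmap safepoint pause room to finish before the TaskManager
    # is declared dead (the default heartbeat.timeout is 50000 ms).
    config.set_string("heartbeat.timeout", "300000")
    env = StreamExecutionEnvironment.get_execution_environment(config)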

Re: Could savepoints contain in-flight data?

2023-02-13 Thread Hang Ruan
Hi Alexis,

If we change an operator's uid and restart the job, the job will not
start successfully [1]. We have to use --allowNonRestoredState to start
it. This means that the state for the old uid will not be used by the
operator with the new uid. I think the data in that state will be lost.

Best,
Hang

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/state/savepoints/#what-happens-if-i-delete-an-operator-that-has-state-from-my-job
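
For illustration, a minimal PyFlink sketch (the toy pipeline and the uid are
hypothetical); the string passed to uid() is what ties checkpoint/savepoint
state to an operator, so changing it orphans that operator's state:

    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    counts = (
        env.from_collection([("a", 1), ("a", 2), ("b", 3)])
           .key_by(lambda t: t[0])
           .reduce(lambda x, y: (x[0], x[1] + y[1]))
           .uid("word-count-reduce")  # renaming this orphans the reduce state
    )
    counts.print()
    env.execute("uid-demo")

Restoring after a uid rename then requires flink run -s <savepointPath>
--allowNonRestoredState, and the state stored under the old uid is simply not
mapped to anything.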

Alexis Sarda-Espinosa  wrote on Mon, Feb 13, 2023 at 19:56:

> Hi Hang,
>
> Thanks for the confirmation. One follow-up question with a somewhat
> convoluted scenario:
>
>1. An unaligned checkpoint is created.
>2. I stop the job *without* savepoint.
>3. I want to start a modified job from the checkpoint, but I changed
>one of the operator's uids.
>
> If the operator whose uid changed had in-flight data as part of the
> checkpoint, it will lose said data after starting, right?
>
> I imagine this is not good practice, but it's just a hypothetical scenario
> I wanted to understand better.
>
> Regards,
> Alexis.
>
>
> On Mon, Feb 13, 2023 at 12:33, Hang Ruan <
> ruanhang1...@gmail.com> wrote:
>
>> ps: the savepoint will also not contain in-flight data.
>>
>> Best,
>> Hang
>>
>> Hang Ruan  wrote on Mon, Feb 13, 2023 at 19:31:
>>
>>> Hi Alexis,
>>>
>>> No, an aligned checkpoint will not contain in-flight data. An aligned
>>> checkpoint makes sure that the data before the barrier has been processed,
>>> so there is no need to store in-flight data for a checkpoint.
>>>
>>> I think these documents[1][2] will help you to understand it.
>>>
>>>
>>> Best,
>>> Hang
>>>
>>> [1]
>>> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/state/checkpointing_under_backpressure/
>>> [2]
>>> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/concepts/stateful-stream-processing/#checkpointing
>>>
>>> Alexis Sarda-Espinosa  wrote on Sat, Feb 11, 2023 at 06:00:
>>>
 Hello,

 One feature of unaligned checkpoints is that the checkpoint barriers
 can overtake in-flight data, so the buffers are persisted as part of the
 state.

 The documentation for savepoints doesn't mention anything explicitly,
 so just to be sure, will savepoints always wait for in-flight data to be
 processed before they are completed, or could they also persist buffers in
 certain situations (e.g. when there's backpressure)?

 Regards,
 Alexis.




Re: Updating Scala package names while preserving state

2023-02-13 Thread Thomas Eckestad
My conclusions. First, I think it would be good to clarify the background. The
class for which I changed the package/namespace is a POJO class which is part
of the application's state. According to the official Flink documentation on
state evolution:

Class name of the POJO type cannot change, including the namespace of the class.

So that is quite clear.

I also investigated using the state-processor-api to create a new Savepoint
with an updated namespace for the class, but that is not trivial, so I decided
to give up on that, since the state of the application in question was not
considered that important.

Using the state-processor-api to handle a namespace change would require both
the moved and the original class to be on the Java class path when running the
state processor, to enable both de-serializing the old state (old namespace)
and serializing the new state (new namespace). So it could have been done that
way, I guess.

I did not find any other option for migrating the state after a
namespace/package name change. Performing a text replace with sed does not work.

On 7 Feb 2023, at 12:03, Thomas Eckestad
<thomas.eckes...@niradynamics.se> wrote:

Hi,

I would like to change the package name of our Scala code from 
com.company.foo.something to com.company.bar.something while preserving the 
state. How can I make a Savepoint from an application built with 
com.company.foo.something and make that Savepoint compatible with new code 
built from com.company.bar.something?

In a Savepoint directory from one of our Flink jobs, running egrep
com\.company\.bar produces a lot of hits. Could it be expected to work to just
replace the strings with sed? Or is there binary, non-text data as well that
would need to be updated?

We are currently on Flink 1.13.6.

Thanks,
Thomas
Thomas Eckestad
Systems Engineer
Development RSI

NIRA Dynamics AB
Wallenbergs gata 4
58330 Linköping, Sweden
Mobile: +46 701 447 279
thomas.eckes...@niradynamics.se
www.niradynamics.se




Re: Pyflink Side Output Question and/or suggested documentation change

2023-02-13 Thread Andrew Otto
Thank you!

On Mon, Feb 13, 2023 at 5:55 AM Dian Fu  wrote:

> Thanks Andrew, I think this is valid advice. I will update the
> documentation~
>
> Regards,
> Dian
>
>
> On Fri, Feb 10, 2023 at 10:08 PM Andrew Otto  wrote:
>
>> Question about side outputs and OutputTags in pyflink.  The docs
>> 
>> say we are supposed to
>>
>> yield output_tag, value
>>
>> Docs then say:
>> > For retrieving the side output stream you use getSideOutput(OutputTag) on
>> the result of the DataStream operation.
>>
>> From this, I'd expect that calling datastream.get_side_output would be
>> optional. However, it seems that if you do not call
>> datastream.get_side_output, then the main datastream will have the record
>> destined to the output tag still in it, as a Tuple(output_tag, value).
>> This caused me great confusion for a while, as my downstream tasks would
>> break because of the unexpected Tuple type of the record.
>>
>> Here's an example of the failure using side output and ProcessFunction
>> in the word count example.
>> 
>>
>> I'd expect that just yielding an output_tag would make those records be
>> in a different datastream, but apparently this is not the case unless you
>> call get_side_output.
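
For reference, a minimal PyFlink 1.16 sketch of the behavior described above
(the tag name and pipeline are illustrative):

    from pyflink.common.typeinfo import Types
    from pyflink.datastream import OutputTag, StreamExecutionEnvironment
    from pyflink.datastream.functions import ProcessFunction

    errors_tag = OutputTag("errors", Types.STRING())  # hypothetical tag name

    class Router(ProcessFunction):
        def process_element(self, value, ctx):
            if value.startswith("err"):
                yield errors_tag, value  # destined for the side output
            else:
                yield value

    env = StreamExecutionEnvironment.get_execution_environment()
    main = env.from_collection(["ok-1", "err-1"], Types.STRING()) \
              .process(Router(), output_type=Types.STRING())
    errors = main.get_side_output(errors_tag)  # without this call, the
                                               # (tag, value) tuples stay in main
    main.print()
    errors.print()
    env.execute("side-output-demo")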
>>
>> If this is the expected behavior, perhaps the docs should be updated to
>> say so?
>>
>> -Andrew Otto
>>  Wikimedia Foundation
>>
>>
>>
>>
>>


Re: Could savepoints contain in-flight data?

2023-02-13 Thread Alexis Sarda-Espinosa
Hi Hang,

Thanks for the confirmation. One follow-up question with a somewhat
convoluted scenario:

   1. An unaligned checkpoint is created.
   2. I stop the job *without* savepoint.
   3. I want to start a modified job from the checkpoint, but I changed one
   of the operator's uids.

If the operator whose uid changed had in-flight data as part of the
checkpoint, it will lose said data after starting, right?

I imagine this is not good practice, but it's just a hypothetical scenario
I wanted to understand better.

Regards,
Alexis.


On Mon, Feb 13, 2023 at 12:33, Hang Ruan  wrote:

> ps: the savepoint will also not contain in-flight data.
>
> Best,
> Hang
>
> Hang Ruan  wrote on Mon, Feb 13, 2023 at 19:31:
>
>> Hi Alexis,
>>
>> No, an aligned checkpoint will not contain in-flight data. An aligned checkpoint
>> makes sure that the data before the barrier has been processed, so there is
>> no need to store in-flight data for a checkpoint.
>>
>> I think these documents[1][2] will help you to understand it.
>>
>>
>> Best,
>> Hang
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/state/checkpointing_under_backpressure/
>> [2]
>> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/concepts/stateful-stream-processing/#checkpointing
>>
>> Alexis Sarda-Espinosa  wrote on Sat, Feb 11, 2023 at 06:00:
>>
>>> Hello,
>>>
>>> One feature of unaligned checkpoints is that the checkpoint barriers can
>>> overtake in-flight data, so the buffers are persisted as part of the state.
>>>
>>> The documentation for savepoints doesn't mention anything explicitly, so
>>> just to be sure, will savepoints always wait for in-flight data to be
>>> processed before they are completed, or could they also persist buffers in
>>> certain situations (e.g. when there's backpressure)?
>>>
>>> Regards,
>>> Alexis.
>>>
>>>


Re: Could savepoints contain in-flight data?

2023-02-13 Thread Hang Ruan
ps: the savepoint will also not contain in-flight data.

Best,
Hang

Hang Ruan  wrote on Mon, Feb 13, 2023 at 19:31:

> Hi Alexis,
>
> No, an aligned checkpoint will not contain in-flight data. An aligned checkpoint
> makes sure that the data before the barrier has been processed, so there is
> no need to store in-flight data for a checkpoint.
>
> I think these documents[1][2] will help you to understand it.
>
>
> Best,
> Hang
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/state/checkpointing_under_backpressure/
> [2]
> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/concepts/stateful-stream-processing/#checkpointing
>
> Alexis Sarda-Espinosa  wrote on Sat, Feb 11, 2023 at 06:00:
>
>> Hello,
>>
>> One feature of unaligned checkpoints is that the checkpoint barriers can
>> overtake in-flight data, so the buffers are persisted as part of the state.
>>
>> The documentation for savepoints doesn't mention anything explicitly, so
>> just to be sure, will savepoints always wait for in-flight data to be
>> processed before they are completed, or could they also persist buffers in
>> certain situations (e.g. when there's backpressure)?
>>
>> Regards,
>> Alexis.
>>
>>


Re: Could savepoints contain in-flight data?

2023-02-13 Thread Hang Ruan
Hi Alexis,

No, an aligned checkpoint will not contain in-flight data. An aligned checkpoint
makes sure that the data before the barrier has been processed, so there is
no need to store in-flight data for a checkpoint.

I think these documents[1][2] will help you to understand it.


Best,
Hang

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/state/checkpointing_under_backpressure/
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/concepts/stateful-stream-processing/#checkpointing
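
To make the contrast concrete: persisting in-flight buffers only happens when
unaligned checkpoints are explicitly enabled. A minimal PyFlink sketch (the
interval is illustrative):

    from pyflink.datastream import CheckpointingMode, StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    env.enable_checkpointing(60000, CheckpointingMode.EXACTLY_ONCE)  # every 60 s
    # Only with this switched on can checkpoint barriers overtake buffered
    # records, storing those in-flight buffers as part of the checkpoint.
    env.get_checkpoint_config().enable_unaligned_checkpoints()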

Alexis Sarda-Espinosa  wrote on Sat, Feb 11, 2023 at 06:00:

> Hello,
>
> One feature of unaligned checkpoints is that the checkpoint barriers can
> overtake in-flight data, so the buffers are persisted as part of the state.
>
> The documentation for savepoints doesn't mention anything explicitly, so
> just to be sure, will savepoints always wait for in-flight data to be
> processed before they are completed, or could they also persist buffers in
> certain situations (e.g. when there's backpressure)?
>
> Regards,
> Alexis.
>
>


Re: Pyflink Side Output Question and/or suggested documentation change

2023-02-13 Thread Dian Fu
Thanks Andrew, I think this is valid advice. I will update the
documentation~

Regards,
Dian


On Fri, Feb 10, 2023 at 10:08 PM Andrew Otto  wrote:

> Question about side outputs and OutputTags in pyflink.  The docs
> 
> say we are supposed to
>
> yield output_tag, value
>
> Docs then say:
> > For retrieving the side output stream you use getSideOutput(OutputTag) on
> the result of the DataStream operation.
>
> From this, I'd expect that calling datastream.get_side_output would be
> optional. However, it seems that if you do not call
> datastream.get_side_output, then the main datastream will have the record
> destined to the output tag still in it, as a Tuple(output_tag, value).
> This caused me great confusion for a while, as my downstream tasks would
> break because of the unexpected Tuple type of the record.
>
> Here's an example of the failure using side output and ProcessFunction in
> the word count example.
> 
>
> I'd expect that just yielding an output_tag would make those records be in
> a different datastream, but apparently this is not the case unless you call
> get_side_output.
>
> If this is the expected behavior, perhaps the docs should be updated to
> say so?
>
> -Andrew Otto
>  Wikimedia Foundation
>
>
>
>
>


Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-13 Thread Theodor Wübker
Hey Hector,

thanks for your reply. Your assumption is entirely correct, I have a few
million records on the topic already to test a streaming use case. I am
planning to test it with a variety of settings, but the problems occur with
any cluster configuration, for example parallelism 1 with 1 TaskManager and 1
slot. I plan to scale it up to 10 slots and parallelism 10 for testing
purposes.

I do not think that any events are kept on hold, since the output always
contains windows with the latest timestamp (but not enough of them; there
should be many more). Nevertheless, I will try your suggestion.

Maybe my configuration is wrong? The only “out-of-orderness”-related thing I
have configured is the watermark, in the way I sent previously. The docs [1]
mention per-Kafka-partition watermarks; perhaps this would help me? Sadly,
they do not say how to activate them.

Best,
Theo

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/#watermark-strategies-and-the-kafka-connector
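
For what it's worth, in the DataStream API the per-partition behavior comes
with the source itself: when the watermark strategy is handed to from_source(),
each Kafka split tracks its own watermark and the source emits the minimum
across splits. A minimal sketch (broker, topic, and group id are placeholders;
whether the Table API Kafka connector behaves identically is an assumption I
have not verified):

    from pyflink.common import Duration, WatermarkStrategy
    from pyflink.common.serialization import SimpleStringSchema
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.datastream.connectors.kafka import KafkaOffsetsInitializer, KafkaSource

    env = StreamExecutionEnvironment.get_execution_environment()

    source = (KafkaSource.builder()
              .set_bootstrap_servers("broker:9092")
              .set_topics("data.v1")
              .set_group_id("theo-test")
              .set_starting_offsets(KafkaOffsetsInitializer.earliest())
              .set_value_only_deserializer(SimpleStringSchema())
              .build())

    # Watermarks are generated per Kafka split and merged downstream; idleness
    # keeps an empty partition from stalling the overall watermark.
    watermarks = (WatermarkStrategy
                  .for_bounded_out_of_orderness(Duration.of_seconds(10))
                  .with_idleness(Duration.of_minutes(1)))

    stream = env.from_source(source, watermarks, "kafka-source")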


> On 13. Feb 2023, at 10:42, Hector Rios  wrote:
> 
> Hi Theo
> 
> In your initial email, you mentioned that you have "a bit of Data on it" when
> referring to your topic with ten partitions. Correct me if I'm wrong, but
> that sounds like the data in your topic is bounded and you are trying to test
> a streaming use case. What kind of parallelism do you have configured for this
> job? Is there a configuration to set the number of slots per task manager?
> 
> I've seen varying results based on the amount of parallelism configured on a 
> job. In the end, it usually boils down to the fact that events might be 
> ingested into Flink out of order. If the event time on an event is earlier 
> than the current watermark, then the event might be discarded unless you've 
> configured some level of out-of-orderness. Even with out-of-orderness
> configured, if your data is bounded, you might have events with later event 
> times arriving earlier, which will remain in the state waiting for the 
> watermark to progress. As you can imagine, if there are no more events, then 
> your records are on hold. 
> 
> As a test, after all your events have been ingested from the topic, try
> producing one last event with an event time one or two hours later than your
> latest event and see if the held records show up.
> 
> Hope it helps
> -Hector
> 
> On Mon, Feb 13, 2023 at 8:47 AM Theodor Wübker  wrote:
> Hey,
> 
> so one more thing, the query looks like this:
> 
> SELECT window_start, window_end, a, b, c, count(*) as x FROM 
> TABLE(TUMBLE(TABLE data.v1, DESCRIPTOR(timeStampData), INTERVAL '1' HOUR)) 
> GROUP BY window_start, window_end, a, b, c
> 
> When the non-determinism occurs, the topic is not keyed at all. When I key it
> by the attribute “a”, I get incorrect but deterministic results. Maybe
> in the second case only one partition out of the 10 is consumed at a time?
> 
> Best,
> Theo
> 
>> On 13. Feb 2023, at 08:15, Theodor Wübker  wrote:
>> 
>> Hey Yuxia,
>> 
>> thanks for your response. I figured too, that the events arrive in a
>> (somewhat) random order and thus cause non-determinism. I used a watermark
>> like this: "timeStampData - INTERVAL '10' SECOND". Increasing the watermark
>> interval does not solve the problem though; the results are still not
>> deterministic. Instead I keyed the 10-partition topic. Now results are
>> deterministic, but they are incorrect (way too few). Am I doing something
>> fundamentally wrong? I just need the messages to be somewhat in order
>> (just so they don't violate the watermark).
>> 
>> Best,
>> Theo
>> 
>> (sent again, sorry, I previously only responded to you, not the mailing
>> list, by accident)
>> 
>>> On 13. Feb 2023, at 08:14, Theodor Wübker  wrote:
>>> 
>>> Hey Yuxia,
>>> 
>>> thanks for your response. I figured too, that the events arrive in a
>>> (somewhat) random order and thus cause non-determinism. I used a watermark
>>> like this: "timeStampData - INTERVAL '10' SECOND". Increasing the
>>> watermark interval does not solve the problem though; the results are still
>>> not deterministic. Instead I keyed the 10-partition topic. Now results are
>>> deterministic, but they are incorrect (way too few). Am I doing something
>>> fundamentally wrong? I just need the messages to be somewhat in order
>>> (just so they don't violate the watermark).
>>> 
>>> Best,
>>> Theo
>>> 
 On 13. Feb 2023, at 04:23, yuxia  wrote:
 
 Hi, Theo.
 I'm wondering what the event-time-windowed query you are using looks like.
 For example, how do you define the watermark?
 Considering you read records from the 10 partitions, it may well be that
 the records arrive at the window process operator out of order.
 Is it possible that the watermark has already passed, but there are still
 records arriving behind it?

Re: Non-Determinism in Table-API with Kafka and Event Time

2023-02-13 Thread Hector Rios
Hi Theo

In your initial email, you mentioned that you have "a bit of Data on it"
when referring to your topic with ten partitions. Correct me if I'm wrong,
but that sounds like the data in your topic is bounded and you are trying to
test a streaming use case. What kind of parallelism do you have configured for
this job? Is there a configuration to set the number of slots per task
manager?

I've seen varying results based on the amount of parallelism configured on
a job. In the end, it usually boils down to the fact that events might be
ingested into Flink out of order. If the event time on an event is earlier
than the current watermark, then the event might be discarded unless you've
configured some level of out-of-orderness. Even with out-of-orderness
configured, if your data is bounded, you might have events with later event
times arriving earlier, which will remain in the state waiting for the
watermark to progress. As you can imagine, if there are no more events,
then your records are on hold.

As a test, after all your events have been ingested from the topic, try
producing one last event with an event time one or two hours later than your
latest event and see if the held records show up.

Hope it helps
-Hector

On Mon, Feb 13, 2023 at 8:47 AM Theodor Wübker 
wrote:

> Hey,
>
> so one more thing, the query looks like this:
>
> SELECT window_start, window_end, a, b, c, count(*) AS x
> FROM TABLE(TUMBLE(TABLE data.v1, DESCRIPTOR(timeStampData), INTERVAL '1' HOUR))
> GROUP BY window_start, window_end, a, b, c
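
(For context, a minimal sketch of the source table such a query needs, with
the watermark declared in the DDL; the table name, column types, and connector
options are assumptions:)

    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    t_env.execute_sql("""
        CREATE TABLE data_v1 (
            a STRING,
            b STRING,
            c STRING,
            timeStampData TIMESTAMP(3),
            -- rows may arrive up to 10 seconds late before the watermark passes them
            WATERMARK FOR timeStampData AS timeStampData - INTERVAL '10' SECOND
        ) WITH (
            'connector' = 'kafka'
            -- remaining Kafka options omitted
        )
    """)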
>
> When the non-determinism occurs, the topic is not keyed at all. When I key
> it by the attribute “a”, I get incorrect but deterministic results.
> Maybe in the second case only one partition out of the 10 is consumed at
> a time?
>
> Best,
> Theo
>
> On 13. Feb 2023, at 08:15, Theodor Wübker 
> wrote:
>
> Hey Yuxia,
>
> thanks for your response. I figured too, that the events arrive in a
> (somewhat) random order and thus cause non-determinism. I used a
> watermark like this: "timeStampData - INTERVAL '10' SECOND". Increasing
> the watermark interval does not solve the problem though; the results are
> still not deterministic. Instead I keyed the 10-partition topic. Now
> results are deterministic, but they are incorrect (way too few). Am I doing
> something fundamentally wrong? I just need the messages to be somewhat
> in order (just so they don't violate the watermark).
>
> Best,
> Theo
>
> (sent again, sorry, I previously only responded to you, not the mailing
> list, by accident)
>
> On 13. Feb 2023, at 08:14, Theodor Wübker 
> wrote:
>
> Hey Yuxia,
>
> thanks for your response. I figured too, that the events arrive in a
> (somewhat) random order and thus cause non-determinism. I used a
> watermark like this: "timeStampData - INTERVAL '10' SECOND".
> Increasing the watermark interval does not solve the problem though; the
> results are still not deterministic. Instead I keyed the 10-partition
> topic. Now results are deterministic, but they are incorrect (way too few).
> Am I doing something fundamentally wrong? I just need the messages to be
> somewhat in order (just so they don't violate the watermark).
>
> Best,
> Theo
>
> On 13. Feb 2023, at 04:23, yuxia  wrote:
>
> Hi, Theo.
> I'm wondering what the event-time-windowed query you are using looks like.
> For example, how do you define the watermark?
> Considering you read records from the 10 partitions, it may well be that
> the records arrive at the window process operator out of order.
> Is it possible that the watermark has already passed, but there are still
> records arriving behind it?
>
> If that's the case, the set of records used to calculate the result may
> well differ from run to run, producing non-deterministic results.
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "Theodor Wübker" 
> To: "User" 
> Sent: Sunday, February 12, 2023, 4:25:45 PM
> Subject: Non-Determinism in Table-API with Kafka and Event Time
>
> Hey everyone,
>
> I am experiencing non-determinism in my Table API program at the moment and (as
> a relatively inexperienced Flink and Kafka user) I can't really explain to
> myself why it happens. So, I have a topic with 10 partitions and a bit of
> data on it. Now I run a simple SELECT * query on this, which moves some
> attributes around and writes everything to another topic with 10
> partitions. Then, on this topic I run an event-time-windowed query. Now I
> experience non-determinism: the results of the windowed query differ with
> every execution.
> I thought this might be because the SELECT query wrote the data to the
> partitioned topic without keys. So I tried it again with the same key I
> used for the original topic. It resulted in the exact same topic structure.
> Now when I run the event-time-windowed query, I get incorrect results (too
> few result entries).
>
> I have already read a lot of the docs on this and can't seem to figure it
> out. I would much appreciate it if someone could shed a bit of light on this.
> Is 

Clearing the state of a running job

2023-02-13 Thread Jason_H
The problem: in Flink, TTL is already set everywhere state is used. The
upstream and downstream data was cleared and we wanted to re-test, but found
that the results Flink computes do not start accumulating from zero; they
keep accumulating on top of the previous results. Is there any way to handle
this without restarting the job?

Concretely: without stopping the Flink job, how can the state data of the
running job be cleared, so that the computed results start accumulating from
zero again?

And without restarting the job, would clearing the state data have the same
effect as restarting the job, i.e. keep the accumulated state from affecting
the results?

Jason_H
hyb_he...@163.com
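
For reference, a minimal PyFlink sketch of the kind of TTL configuration
described above (the state name and TTL value are illustrative). As far as I
know there is no built-in call that wipes a running job's state on demand;
TTL only expires entries per key after the configured time since the last
write:

    from pyflink.common.time import Time
    from pyflink.common.typeinfo import Types
    from pyflink.datastream.state import StateTtlConfig, ValueStateDescriptor

    ttl = (StateTtlConfig
           .new_builder(Time.hours(24))  # expire 24 h after the last write
           .set_update_type(StateTtlConfig.UpdateType.OnCreateAndWrite)
           .set_state_visibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
           .build())

    # Typically applied inside a KeyedProcessFunction's open() method.
    descriptor = ValueStateDescriptor("running-total", Types.LONG())
    descriptor.enable_time_to_live(ttl)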

How to use Postgres UUID types in Flink SQL?

2023-02-13 Thread Frank Lyaruu
Hi Flink community, I'm trying to use a JDBC dataset from Postgres. That
dataset contains a column of the Postgres type UUID.
I don't know how to 'type' this column in SQL.

If I try:
CREATE TABLE my_table (id STRING) WITH ()

I get this error:
java.lang.ClassCastException: class java.util.UUID cannot be cast to class
java.lang.String (java.util.UUID)

So it isn't STRING, but it does seem to have recognized the UUID type and
created a java.util.UUID instance.
I see a merged-and-reverted ticket about this here:
https://issues.apache.org/jira/browse/FLINK-19869

It mentions using BinaryRawValueData for now, but can someone help me
figure out what that would look like in Flink SQL?
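
One workaround sometimes suggested (an assumption on my part, not a verified
mapping): cast the uuid column to text on the Postgres side, e.g. with a view
such as the hypothetical CREATE VIEW my_table_view AS SELECT id::text AS id
FROM my_table, and then declare the column as STRING in Flink:

    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    t_env.execute_sql("""
        CREATE TABLE my_table (
            id STRING
        ) WITH (
            'connector' = 'jdbc',
            'url' = 'jdbc:postgresql://localhost:5432/mydb',
            'table-name' = 'my_table_view'
            -- username/password options omitted
        )
    """)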

regards Frank