Dear list,
there have been some discussions and activities in the last months about
a Scala-free runtime, which should make it possible to use newer Scala
versions (>= 2.13 / 3.x) on the application side.
Stephan Ewen announced that the implementation is on the way [1] and Martijn
Visser mentioned
Hi Till,
Thanks for your feedback.
>>> How will our tests be affected by these changes? Will Flink require
more resources and, thus, will it risk destabilizing our testing
infrastructure?
There are some tests that need to be adjusted, for example,
BlockingShuffleITCase. For other tests,
Hi Joern,
Many thanks for sharing the cases! Could you also share a bit more about the
detailed scenarios?
Best,
Yun
--Original Mail--
Sender: Joern Kottmann
Send Date: Fri Dec 3 16:43:38 2021
Recipients: Yun Gao
CC: vtygoss, Alexander Preuß,
With regards to the Java APIs, you will definitely be able to use the
Java DataSet/DataStream APIs from Scala without any restrictions imposed
by Flink. This is already working with the current SNAPSHOT version.
As we speak we are also working to achieve the same for the Table API;
we expect
Perfect, thanks. That is what I imagined.
On Mon, Dec 6, 2021 at 2:04 AM Hang Ruan wrote:
> Hi,
>
> 1. Yes, the Kafka source will use the Kafka committed offsets for the group
> id to start the job.
>
> 2. No, the auto.offset.reset
>
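For illustration, a minimal sketch of this behavior with the new KafkaSource builder (class and method names assumed from the Flink 1.14 Kafka connector docs; broker address and topic are placeholders):

```java
// Sketch: start from the consumer group's committed offsets, falling back
// to EARLIEST where no committed offset exists (the fallback plays the
// role of auto.offset.reset for the new source).
KafkaSource<String> source = KafkaSource.<String>builder()
    .setBootstrapServers("broker:9092")   // placeholder address
    .setTopics("input-topic")             // placeholder topic
    .setGroupId("my-group")
    .setStartingOffsets(
        OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
    .setValueOnlyDeserializer(new SimpleStringSchema())
    .build();
```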
Event ordering in Flink is only maintained between pairs of events that
take exactly the same path through the execution graph. So if you
have multiple instances of A (let's call them A1 and A2), each broadcasting
a partition of the total rule space, then one instance of B (B1) might
receive rule1
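A toy, Flink-free illustration of this guarantee (plain Java; nothing here is Flink API): events from two upstream instances A1 and A2 may interleave arbitrarily at B1, but each path's internal order survives any interleaving.

```java
import java.util.ArrayList;
import java.util.List;

public class PathOrdering {
    /**
     * Merge the event streams of two upstream instances (A1, A2) into the
     * arrival order seen by one downstream instance (B1). "picks" models the
     * nondeterministic network: true = next event from A1, false = from A2.
     * Within each source list, relative order is always preserved.
     */
    public static List<String> arriveAtB1(List<String> fromA1, List<String> fromA2,
                                          List<Boolean> picks) {
        List<String> seen = new ArrayList<>();
        int i = 0, j = 0;
        for (boolean fromFirst : picks) {
            if (fromFirst && i < fromA1.size()) {
                seen.add(fromA1.get(i++));
            } else if (j < fromA2.size()) {
                seen.add(fromA2.get(j++));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // A1 broadcasts rule1 then rule3; A2 broadcasts rule2.
        List<String> order = arriveAtB1(List.of("rule1", "rule3"),
                                        List.of("rule2"),
                                        List.of(false, true, true));
        // One possible arrival order at B1: rule2, rule1, rule3 —
        // rule1 still precedes rule3 because they share a path.
        System.out.println(order);
    }
}
```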
Hi Everyone,
as outlined in FLIP-194 discussion [1], for the future directions of Flink
HA services, I'd like to verify my thoughts around guarantees of the
distributed filesystems used with Flink.
Currently some of the services (*JobGraphStore*, *CompletedCheckpointStore*)
are implemented using
ping @users; any input on how this would affect you is highly appreciated.
On 25/11/2021 22:39, Chesnay Schepler wrote:
I included the user ML in the thread.
@users Are you still using Zookeeper 3.4? If so, were you planning to
upgrade Zookeeper in the near future?
I'm not sure about ZK
Hi community,
I'm currently converting a DataStream of Avro SpecificRecord type into
Table using the following method:
public static Table toTable(StreamTableEnvironment tEnv,
                            DataStream dataStream,
Could someone please help me understand the implications of the upgrade?
As far as I understood, this upgrade would only affect users that have a
ZooKeeper shared across multiple services, some of which require ZK 3.4 or lower? A
workaround for those users would be to run two ZKs with different versions,
Hi Yunfeng,
it seems this is a deeper issue with the fromValues implementation.
Under the hood, it still uses the deprecated InputFormat stack. And as
far as I can see, there we don't emit a final MAX_WATERMARK. I will
definitely forward this.
But toDataStream forwards watermarks correctly.
Hello again,
We recently upgraded from Flink 1.12.3 to 1.14.0 and we were hoping it would
solve our issue with checkpointing with finished data sources. We need the
checkpointing to work to trigger Flink's GenericWriteAheadSink class.
Firstly, the constant mentioned in FLIP-147 that enables
Hi,
Sorry to hear it's hard to find the option. It is part of the 1.14
release[1]. It is also documented how to enable it[2]. Happy to hear how
we can improve the situation here.
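For reference, the option being discussed can be enabled with a single setting (a flink-conf.yaml sketch; key name assumed from the Flink 1.14 checkpointing docs):

```yaml
# Enable checkpointing for jobs whose tasks/sources have already finished (FLIP-147).
execution.checkpointing.checkpoints-after-tasks-finish.enabled: true
```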
As for the exception. Are you seeing this exception occur repeatedly for
the same task? I can imagine a situation
Current users of ZK 3.4 and below would need to upgrade their Zookeeper
installation that is used by Flink to 3.5+.
Whether K8s users are affected depends on whether they use ZK or not. If
they do, see above, otherwise they are not affected at all.
On 06/12/2021 18:49, Arvid Heise wrote:
Oh my bad, it must be Statefun then. I remember I needed to play around
with that for _some_ build.
On Wed, Dec 1, 2021 at 7:48 PM Chesnay Schepler wrote:
> Flink can be built with Java 11 since 1.10. If I recall correctly we
> solved the tools.jar issue, which Hadoop depends on, by excluding
Hi,
In my project, we are trying to configure incremental checkpointing
with RocksDB as the backend.
We are using Flink 1.11 and RocksDB with an AWS S3 backend.
Issue:
--
In my pipeline, my window size is 5 mins and the incremental checkpointing
is happening for every 2 mins.
I
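The setup described above roughly corresponds to a flink-conf.yaml fragment like this (a sketch; the bucket path is a placeholder, and the key names are assumed from the Flink 1.11 configuration docs):

```yaml
state.backend: rocksdb
state.backend.incremental: true                    # enable incremental checkpoints
state.checkpoints.dir: s3://my-bucket/checkpoints  # placeholder bucket
execution.checkpointing.interval: 2min             # checkpoint every 2 minutes
```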
FYI:
We (Alibaba) are widely using ZooKeeper 3.5.5 for all of our YARN-based and
some of our K8s-based highly available Flink applications.
Best,
Yang
Chesnay Schepler wrote on Tue, Dec 7, 2021 at 2:22 AM:
> Current users of ZK 3.4 and below would need to upgrade their Zookeeper
> installation that is used by Flink to 3.5+.
>
>
Hi!
the checkpointing size is not going beyond 300 MB
Is 300MB the total size of checkpoint or the incremental size of
checkpoint? If it is the latter one, Flink will only store necessary
information (for example the keys and the fields that are selected) in
checkpoint and it is compressed, so
When should I prepare for upgrading ZK to 3.5 or newer?
We're operating a Hadoop cluster w/ ZK 3.4.6 for running only Flink jobs.
Just hope that the rolling update is not that painful - any advice on this?
Best,
Dongwon
On Tue, Dec 7, 2021 at 3:22 AM Chesnay Schepler wrote:
> Current users of
Hi Timo,
Thanks for this information. Since it is confirmed that toDataStream is
functioning correctly and that I can avoid this problem by not using
fromValues in my implementation, I think I have got enough information for
my current work and don't need to rediscuss fromDataStream's behavior.
Thank you David
From: David Anderson
Sent: Monday, December 6, 2021 1:36:20 AM
To: Alexey Trenikhun
Cc: Flink User Mail List
Subject: Re: Order of events in Broadcast State
Event ordering in Flink is only maintained between pairs of events that take
exactly
Hey Timo and Flink community,
I wonder if there is a fix for this issue. Last time, I rolled back to
Flink 1.12 and downgraded Ververica.
I am really keen to leverage the new features in the latest versions of
Ververica (2.5+); I have tried a myriad of suggested tricks (example:
Hi Ilan,
I think so, using CLI instead of REST API should solve this, as the user
code execution would be pulled out to a separate JVM. If you're going to
try that, it would be great to hear back whether it has solved your issue.
As for 1.13.4, there is currently no ongoing effort / concrete
Hi Kevin,
Flink comes with two schedulers for streaming:
- Default
- Adaptive (opt-in)
Adaptive is still in an experimental phase and doesn't support local recovery.
You're most likely using the first one, so you should be OK.
Can you elaborate on this a bit? We aren't changing the parallelism when
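For completeness, the opt-in is a single config switch (a sketch; key and value assumed from the Flink elastic-scaling docs):

```yaml
# The default scheduler is used unless this is set explicitly.
jobmanager.scheduler: adaptive
```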
I think you can try shipping a local Hadoop conf and then setting the HADOOP_CONF_DIR environment variable:
-yt /path/of/my-hadoop-conf
-yD containerized.master.env.HADOOP_CONF_DIR='$PWD/my-hadoop-conf'
-yD containerized.taskmanager.env.HADOOP_CONF_DIR='$PWD/my-hadoop-conf'
Best,
Yang
chenqizhu wrote on Tue, Nov 30, 2021 at 10:00 AM:
> Hello all,
>
>
Does Flink CDC support synchronizing an entire MySQL database into a Hudi lake? If so, I'd appreciate an example; it also needs to support schema changes. Thanks!
If there are two OSS or S3 buckets (each bucket with its own accessKey/secret pair), how should this be configured? For example, writing data to bucketA while checkpointing to bucketB.
On 2021-12-06 18:59:46, "Yang Wang" wrote:
>I think you can try shipping a local Hadoop conf and then setting the HADOOP_CONF_DIR environment variable:
>
>-yt /path/of/my-hadoop-conf
>-yD
Dear all:
I'm currently looking at the Table API's custom async join method AsyncTableFunction#eval, and I see the interface provides:
public void eval(CompletableFuture<Collection<RowData>> future, Object... keys) {...}
I currently have two questions:
1. Passing data directly with future.complete(rowData) only makes the emission asynchronous; the asynchronous join itself has to be implemented by me. Is that understanding correct?
2. For joining HBase
It is supported; see the Hudi website for an example.
chengyanan1...@foxmail.com
From: casel.chen
Sent: 2021-12-06 23:55
To: user-zh@flink.apache.org
Subject: Does Flink CDC support synchronizing an entire MySQL database into a Hudi lake?
Does Flink CDC support synchronizing an entire MySQL database into a Hudi lake? If so, I'd appreciate an example; it also needs to support schema changes. Thanks!
Hi!
1. Passing data directly with future.complete(rowData) only makes the emission asynchronous; the asynchronous join itself has to be implemented by me. Is that understanding correct?
Correct. For example, HBaseRowDataAsyncLookupFunction calls HBase's asynchronous table#get method to implement the async lookup.
2. Joining HBase implements the async join via a thread pool, so order cannot be guaranteed, and I don't see any order-preserving logic. Is there other logic that guarantees ordering?
The async operator was introduced precisely to improve the throughput of operations that don't need ordering (for example, many ETL dimension-table lookups don't care about order); if you have a strong ordering requirement you cannot use
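The point about completion order can be seen with plain CompletableFuture, independent of Flink (a toy sketch; no Flink API involved): results are emitted in the order the futures complete, not the order the lookups were issued.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class AsyncLookupOrder {
    /** Issue two "lookups", complete them in reverse order, observe emission order. */
    public static List<String> completionOrder() {
        List<String> emitted = new ArrayList<>();
        CompletableFuture<String> f1 = new CompletableFuture<>();
        CompletableFuture<String> f2 = new CompletableFuture<>();
        f1.thenAccept(emitted::add);
        f2.thenAccept(emitted::add);
        // The "async join" completes the second lookup first,
        // e.g. because its backend call returned earlier.
        f2.complete("row-for-key2");
        f1.complete("row-for-key1");
        return emitted;
    }

    public static void main(String[] args) {
        // Emission follows completion order, not request order.
        System.out.println(completionOrder());
    }
}
```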
OK, thanks. I'll try guaranteeing order with the async approach on my side; we have some scenarios
On 2021-12-07 14:17:51, "Caizhi Weng" wrote:
The MySQL CDC connector supports parallel reading, and the read process takes no locks. 6 million rows is a small data volume; sharded databases and tables at the tens-of-billions scale have tested fine both in our tests and in community users' tests, so you can try it yourself.
Best,
Leonard
> On Dec 6, 2021, at 3:54 PM, 张阳 <705503...@qq.com.INVALID> wrote:
>
> Because the data volume is 6 million rows, I'm worried that initialization will take too long, or about performance problems
>
> --Original Mail--
> Sender: