Re: [ANNOUNCE] Welcome to join the Apache Flink community on Slack

2022-06-02 by Jing Ge
Thanks everyone for your effort!

Best regards,
Jing

On Thu, Jun 2, 2022 at 4:17 PM Martijn Visser wrote:

> Thanks everyone for joining! It's good to see so many have joined in such
> a short time already. I've just refreshed the link, which you can always
> find on the project website [1].
>
> Best regards, Martijn
>
> [1] https://flink.apache.org/community.html#slack
>
> On Thu, Jun 2, 2022 at 11:42, Jingsong Li wrote:
>
>> Thanks Xintong, Jark, Martijn and Robert for making this possible!
>>
>> Best,
>> Jingsong
>>
>>
>> On Thu, Jun 2, 2022 at 5:32 PM Jark Wu  wrote:
>>
>>> Thanks Xintong for making this possible!
>>>
>>> Cheers,
>>> Jark Wu
>>>
>>> On Thu, 2 Jun 2022 at 15:31, Xintong Song  wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > I'm very happy to announce that the Apache Flink community has created
>>> a
>>> > dedicated Slack workspace [1]. Welcome to join us on Slack.
>>> >
>>> > ## Join the Slack workspace
>>> >
>>> > You can join the Slack workspace in either of the following two ways:
>>> > 1. Click the invitation link posted on the project website [2].
>>> > 2. Ask anyone who already joined the Slack workspace to invite you.
>>> >
>>> > We recommend 2), if available. Due to Slack limitations, the invitation
>>> > link in 1) expires and needs manual updates after every 100 invites.
>>> If it
>>> > is expired, please reach out to the dev / user mailing lists.
>>> >
>>> > ## Community rules
>>> >
>>> > When using the community Slack workspace, please follow these community
>>> > rules:
>>> > * *Be respectful* - This is the most important rule!
>>> > * All important decisions and conclusions *must be reflected back to
>>> the
>>> > mailing lists*. "If it didn’t happen on a mailing list, it didn’t
>>> happen."
>>> > - The Apache Mottos [3]
>>> > * Use *Slack threads* to keep parallel conversations from overwhelming
>>> a
>>> > channel.
>>> > * Please *do not direct message* people for troubleshooting, Jira
>>> assigning
>>> > and PR review. These should be picked up voluntarily.
>>> >
>>> >
>>> > ## Maintenance
>>> >
>>> >
>>> > Committers can refer to this wiki page [4] for information needed for
>>> > maintaining the Slack workspace.
>>> >
>>> >
>>> > Thanks Jark, Martijn and Robert for helping set up the Slack
>>> workspace.
>>> >
>>> >
>>> > Best,
>>> >
>>> > Xintong
>>> >
>>> >
>>> > [1] https://apache-flink.slack.com/
>>> >
>>> > [2] https://flink.apache.org/community.html#slack
>>> >
>>> > [3] http://theapacheway.com/on-list/
>>> >
>>> > [4] https://cwiki.apache.org/confluence/display/FLINK/Slack+Management
>>> >
>>>
>>


Re: [ANNOUNCE] Welcome to join the Apache Flink community on Slack

2022-06-02 by Martijn Visser
Thanks everyone for joining! It's good to see so many have joined in such a
short time already. I've just refreshed the link, which you can always find
on the project website [1].

Best regards, Martijn

[1] https://flink.apache.org/community.html#slack

On Thu, Jun 2, 2022 at 11:42, Jingsong Li wrote:

> Thanks Xintong, Jark, Martijn and Robert for making this possible!
>
> Best,
> Jingsong
>
>
> On Thu, Jun 2, 2022 at 5:32 PM Jark Wu  wrote:
>
>> Thanks Xintong for making this possible!
>>
>> Cheers,
>> Jark Wu
>>
>> On Thu, 2 Jun 2022 at 15:31, Xintong Song  wrote:
>>
>> > Hi everyone,
>> >
>> > I'm very happy to announce that the Apache Flink community has created a
>> > dedicated Slack workspace [1]. Welcome to join us on Slack.
>> >
>> > ## Join the Slack workspace
>> >
>> > You can join the Slack workspace in either of the following two ways:
>> > 1. Click the invitation link posted on the project website [2].
>> > 2. Ask anyone who already joined the Slack workspace to invite you.
>> >
>> > We recommend 2), if available. Due to Slack limitations, the invitation
>> > link in 1) expires and needs manual updates after every 100 invites. If
>> it
>> > is expired, please reach out to the dev / user mailing lists.
>> >
>> > ## Community rules
>> >
>> > When using the community Slack workspace, please follow these community
>> > rules:
>> > * *Be respectful* - This is the most important rule!
>> > * All important decisions and conclusions *must be reflected back to the
>> > mailing lists*. "If it didn’t happen on a mailing list, it didn’t
>> happen."
>> > - The Apache Mottos [3]
>> > * Use *Slack threads* to keep parallel conversations from overwhelming a
>> > channel.
>> > * Please *do not direct message* people for troubleshooting, Jira
>> assigning
>> > and PR review. These should be picked up voluntarily.
>> >
>> >
>> > ## Maintenance
>> >
>> >
>> > Committers can refer to this wiki page [4] for information needed for
>> > maintaining the Slack workspace.
>> >
>> >
>> > Thanks Jark, Martijn and Robert for helping set up the Slack
>> workspace.
>> >
>> >
>> > Best,
>> >
>> > Xintong
>> >
>> >
>> > [1] https://apache-flink.slack.com/
>> >
>> > [2] https://flink.apache.org/community.html#slack
>> >
>> > [3] http://theapacheway.com/on-list/
>> >
>> > [4] https://cwiki.apache.org/confluence/display/FLINK/Slack+Management
>> >
>>
>


Re:Re: Re: [Internet]Re: Re: Some question with Flink state

2022-06-02 by Xuyang
Hi, theoretically speaking, isn't there a problem with this statement?


> "It's because if you use value-state, one task will hold multiple keys, and the contents of the different keys will replace one another."


Since ValueState is also a kind of keyed state, each key maintains its own ValueState, and different keys are isolated from one another.
In general, storing a Map inside a ValueState is not much different from using MapState directly; they only differ slightly in how the various state backends store them and in the state TTL strategy, so storing a Map inside a ValueState is not really recommended.
So in the end it depends on the concrete business scenario: if all you need is an accumulated value, a ValueState is enough.
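To make that concrete, here is a minimal sketch of the two variants inside a KeyedProcessFunction (class, field and element names are illustrative assumptions, not code from this thread; the stream is assumed to be keyed upstream, e.g. by a user id, with elements of the form (pageName, value)):

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Sketch only: each key of the keyed stream gets its own, isolated ValueState /
// MapState instance, so values of different keys never overwrite each other.
public class StateSketch extends KeyedProcessFunction<String, Tuple2<String, Long>, Long> {

    private transient ValueState<Long> sumState;          // one accumulated value per key
    private transient MapState<String, Long> detailState; // user-defined sub-keys per key

    @Override
    public void open(Configuration parameters) {
        sumState = getRuntimeContext().getState(
                new ValueStateDescriptor<>("sum", Long.class));
        detailState = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("details", String.class, Long.class));
    }

    @Override
    public void processElement(Tuple2<String, Long> in, Context ctx, Collector<Long> out) throws Exception {
        // A plain accumulated value: ValueState is enough.
        Long sum = sumState.value();
        sumState.update(sum == null ? in.f1 : sum + in.f1);

        // Per-sub-key values (here: per pageName): MapState stores them under
        // user keys instead of stuffing a whole java.util.Map into one ValueState.
        detailState.put(in.f0, in.f1);

        out.collect(sumState.value());
    }
}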




--

Best!
Xuyang





On 2022-05-25 13:38:52, "lxk7...@163.com" wrote:
>
>I just read up on how key groups work, and I can roughly follow the content above. Regarding the following sentence,
>"with map-state, the keys that fall into certain fixed key groups can each be stored separately via the map-state's user-key",
>my understanding is that
>this is because with value-state one task will hold multiple keys and the contents of the different keys will replace one another, whereas with a map, even if the same task holds multiple keys, each one can still be matched by the user-defined key.
>In that case, most scenarios are actually a good fit for map-state.
>
>
>lxk7...@163.com
> 
>From: jurluo(罗凯)
>Date: 2022-05-25 11:05
>To: user-zh@flink.apache.org
>Subject: Re: [Internet]Re: Re: Some question with Flink state
>Hey, this looks fine. The same key is always assigned to the same task, and it is normal for one task to hold several kinds of keys. Keys are split into multiple key
>groups according to the maximum parallelism, and these fixed key
>groups are then assigned to the individual tasks. This only guarantees that the same key goes to the same task, not that one task holds only one key. For your requirement map-state is the better fit: with map-state,
> the keys that fall into certain fixed key groups can each be stored separately via the map-state's user-key.
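> For illustration, a minimal sketch of that key-to-subtask mapping, using Flink's internal KeyGroupRangeAssignment utility (the numbers are assumptions for illustration only: the default max parallelism of 128 and the parallelism of 10 from the test job):
>
> import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
>
> // Sketch only: a key is hashed into one of maxParallelism key groups, and each
> // subtask owns a fixed range of key groups, so the same key always lands on the
> // same subtask while one subtask serves many keys.
> public class KeyGroupDemo {
>     public static void main(String[] args) {
>         int maxParallelism = 128; // Flink's default when parallelism <= 128
>         int parallelism = 10;     // parallelism of the test job above
>
>         for (int id = 1; id <= 3; id++) {
>             int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(id, maxParallelism);
>             int subtask = KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup(
>                     maxParallelism, parallelism, keyGroup);
>             System.out.println("key=" + id + " -> keyGroup=" + keyGroup + " -> subtask=" + subtask);
>         }
>     }
> }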
> 
>> On May 25, 2022, at 10:45 AM, lxk7...@163.com wrote:
>> 
>> It seems the image failed to load again, so let me resend.
>> Hello, I ran a test on my side and found a problem: when I use a String type for keyBy I do not get the correct result, while with an int type the result is correct. The test also shows that with two keyBy calls it is indeed the second keyBy that takes effect.
>> 
>> 
>> 
>> Below are my code and the test results.
>> 
>> 
>> 
>> 1. Using the int type
>> 
>> 
>> 
>> import java.util.Arrays;
>>
>> import org.apache.flink.api.common.functions.RichMapFunction;
>> import org.apache.flink.api.java.functions.KeySelector;
>> import org.apache.flink.streaming.api.datastream.DataStreamSource;
>> import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
>> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
>>
>> public class KeyByTest {
>>
>>     public static void main(String[] args) throws Exception {
>>
>>         StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>>         env.setParallelism(10);
>>
>>         DataStreamSource<data> dataDataStreamSource = env.fromCollection(Arrays.asList(
>>                 new data(1, "123", "首页"),
>>                 new data(1, "123", "分类页"),
>>                 new data(2, "r-123", "搜索结果页"),
>>                 new data(1, "r-123", "我的页"),
>>                 new data(3, "r-4567", "搜索结果页")));
>>
>>         // keyBy on the int id, then print which subtask each record ends up on
>>         SingleOutputStreamOperator<String> map = dataDataStreamSource.keyBy(new MyKeySelector())
>>                 .map(new RichMapFunction<data, String>() {
>>
>>                     @Override
>>                     public String map(data data) throws Exception {
>>                         System.out.println(data.toString() + "的subtask为:" + getRuntimeContext().getIndexOfThisSubtask());
>>                         return data.toString();
>>                     }
>>                 });
>>
>>         env.execute("test");
>>     }
>> }
>>
>> // simple POJO used as the test record
>> class data {
>>
>>     private int id;
>>     private String goods;
>>     private String pageName;
>>
>>     public data(int id, String goods, String pageName) {
>>         this.id = id;
>>         this.goods = goods;
>>         this.pageName = pageName;
>>     }
>>
>>     public data() {
>>     }
>>
>>     public int getId() {
>>         return id;
>>     }
>>
>>     public void setId(int id) {
>>         this.id = id;
>>     }
>>
>>     public String getGoods() {
>>         return goods;
>>     }
>>
>>     public void setGoods(String goods) {
>>         this.goods = goods;
>>     }
>>
>>     public String getPageName() {
>>         return pageName;
>>     }
>>
>>     public void setPageName(String pageName) {
>>         this.pageName = pageName;
>>     }
>>
>>     @Override
>>     public String toString() {
>>         return "data{" +
>>                 "id='" + id + '\'' +
>>                 ", goods='" + goods + '\'' +
>>                 ", pageName='" + pageName + '\'' +
>>                 '}';
>>     }
>> }
>>
>> // key selector: records are keyed by the int id field
>> class MyKeySelector implements KeySelector<data, Integer> {
>>
>>     @Override
>>     public Integer getKey(data data) throws Exception {
>>         return data.getId();
>>     }
>> }
>> 
>> 
>> 
>> The console output is as follows:
>> https://s2.loli.net/2022/05/25/mxtZu9YAPN2FD1a.png
>> 
>> 
>> 
>> You can see that the data is grouped by id and distributed to different subtasks.
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 2. Using the String type; the code is as follows:
>> 
>> 
>> 
>> public class KeyByTest {
>>
>>     public static void main(String[] args) throws Exception {
>>
>>         StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>>         env.setParallelism(10);
>>
>>         DataStreamSource<data> dataDataStreamSource = env.fromCollection(Arrays.asList(new data("1", "123", "首页"),
>>

Re: [ANNOUNCE] Welcome to join the Apache Flink community on Slack

2022-06-02 by Jingsong Li
Thanks Xintong, Jark, Martijn and Robert for making this possible!

Best,
Jingsong


On Thu, Jun 2, 2022 at 5:32 PM Jark Wu  wrote:

> Thanks Xintong for making this possible!
>
> Cheers,
> Jark Wu
>
> On Thu, 2 Jun 2022 at 15:31, Xintong Song  wrote:
>
> > Hi everyone,
> >
> > I'm very happy to announce that the Apache Flink community has created a
> > dedicated Slack workspace [1]. Welcome to join us on Slack.
> >
> > ## Join the Slack workspace
> >
> > You can join the Slack workspace in either of the following two ways:
> > 1. Click the invitation link posted on the project website [2].
> > 2. Ask anyone who already joined the Slack workspace to invite you.
> >
> > We recommend 2), if available. Due to Slack limitations, the invitation
> > link in 1) expires and needs manual updates after every 100 invites. If
> it
> > is expired, please reach out to the dev / user mailing lists.
> >
> > ## Community rules
> >
> > When using the community Slack workspace, please follow these community
> > rules:
> > * *Be respectful* - This is the most important rule!
> > * All important decisions and conclusions *must be reflected back to the
> > mailing lists*. "If it didn’t happen on a mailing list, it didn’t
> happen."
> > - The Apache Mottos [3]
> > * Use *Slack threads* to keep parallel conversations from overwhelming a
> > channel.
> > * Please *do not direct message* people for troubleshooting, Jira
> assigning
> > and PR review. These should be picked up voluntarily.
> >
> >
> > ## Maintenance
> >
> >
> > Committers can refer to this wiki page [4] for information needed for
> > maintaining the Slack workspace.
> >
> >
> > Thanks Jark, Martijn and Robert for helping set up the Slack
> workspace.
> >
> >
> > Best,
> >
> > Xintong
> >
> >
> > [1] https://apache-flink.slack.com/
> >
> > [2] https://flink.apache.org/community.html#slack
> >
> > [3] http://theapacheway.com/on-list/
> >
> > [4] https://cwiki.apache.org/confluence/display/FLINK/Slack+Management
> >
>


Re: [ANNOUNCE] Welcome to join the Apache Flink community on Slack

2022-06-02 by Jark Wu
Thanks Xintong for making this possible!

Cheers,
Jark Wu

On Thu, 2 Jun 2022 at 15:31, Xintong Song  wrote:

> Hi everyone,
>
> I'm very happy to announce that the Apache Flink community has created a
> dedicated Slack workspace [1]. Welcome to join us on Slack.
>
> ## Join the Slack workspace
>
> You can join the Slack workspace in either of the following two ways:
> 1. Click the invitation link posted on the project website [2].
> 2. Ask anyone who already joined the Slack workspace to invite you.
>
> We recommend 2), if available. Due to Slack limitations, the invitation
> link in 1) expires and needs manual updates after every 100 invites. If it
> is expired, please reach out to the dev / user mailing lists.
>
> ## Community rules
>
> When using the community Slack workspace, please follow these community
> rules:
> * *Be respectful* - This is the most important rule!
> * All important decisions and conclusions *must be reflected back to the
> mailing lists*. "If it didn’t happen on a mailing list, it didn’t happen."
> - The Apache Mottos [3]
> * Use *Slack threads* to keep parallel conversations from overwhelming a
> channel.
> * Please *do not direct message* people for troubleshooting, Jira assigning
> and PR review. These should be picked up voluntarily.
>
>
> ## Maintenance
>
>
> Committers can refer to this wiki page [4] for information needed for
> maintaining the Slack workspace.
>
>
> Thanks Jark, Martijn and Robert for helping set up the Slack workspace.
>
>
> Best,
>
> Xintong
>
>
> [1] https://apache-flink.slack.com/
>
> [2] https://flink.apache.org/community.html#slack
>
> [3] http://theapacheway.com/on-list/
>
> [4] https://cwiki.apache.org/confluence/display/FLINK/Slack+Management
>


[ANNOUNCE] Welcome to join the Apache Flink community on Slack

2022-06-02 by Xintong Song
Hi everyone,

I'm very happy to announce that the Apache Flink community has created a
dedicated Slack workspace [1]. Welcome to join us on Slack.

## Join the Slack workspace

You can join the Slack workspace in either of the following two ways:
1. Click the invitation link posted on the project website [2].
2. Ask anyone who already joined the Slack workspace to invite you.

We recommend 2), if available. Due to Slack limitations, the invitation
link in 1) expires and needs manual updates after every 100 invites. If it
is expired, please reach out to the dev / user mailing lists.

## Community rules

When using the community Slack workspace, please follow these community
rules:
* *Be respectful* - This is the most important rule!
* All important decisions and conclusions *must be reflected back to the
mailing lists*. "If it didn’t happen on a mailing list, it didn’t happen."
- The Apache Mottos [3]
* Use *Slack threads* to keep parallel conversations from overwhelming a
channel.
* Please *do not direct message* people for troubleshooting, Jira assigning
and PR review. These should be picked up voluntarily.


## Maintenance


Committers can refer to this wiki page [4] for information needed for
maintaining the Slack workspace.


Thanks Jark, Martijn and Robert for helping set up the Slack workspace.


Best,

Xintong


[1] https://apache-flink.slack.com/

[2] https://flink.apache.org/community.html#slack

[3] http://theapacheway.com/on-list/

[4] https://cwiki.apache.org/confluence/display/FLINK/Slack+Management


Re: Re: Data loss when writing from Flink to CK

2022-06-02 by lxk7...@163.com

Our job as a whole is currently running normally, no errors have occurred, and checkpointing is enabled.
Today I looked into some related material and found an existing Flink issue that looks quite similar to mine: [FLINK-23721] Flink SQL state TTL has no
effect when using non-incremental RocksDBStateBackend - ASF JIRA (apache.org).
The main cause is that I used GROUP BY in SQL and set a TTL, but the TTL does not take effect on the RocksDB state backend, so the managed memory usage filled up.
My current solution is to switch to the FsStateBackend; observing it now, the managed memory shows no problems at all, and I will keep watching the difference in overall data volume.
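For reference, a minimal sketch of the settings involved, assuming Flink 1.13+ (where HashMapStateBackend is the heap-based backend behind the older FsStateBackend API); the interval, path and TTL below are placeholders rather than the exact values from our job:

import java.time.Duration;

import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class StateBackendTtlSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // placeholder interval

        // Heap-based keyed state: state lives on the JVM heap and checkpoints go
        // to a file system, so RocksDB managed memory is no longer involved.
        env.setStateBackend(new HashMapStateBackend());
        env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints"); // placeholder path

        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
        // Idle state retention ("state TTL") for the unbounded GROUP BY aggregation.
        tableEnv.getConfig().setIdleStateRetention(Duration.ofSeconds(10));
    }
}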


lxk7...@163.com
 
From: yue ma
Sent: 2022-06-02 15:05
To: user-zh
Subject: Re: Data loss when writing from Flink to CK
Hi, you could first check whether your job has checkpointing enabled, and whether any failover occurred while the job was running.
 
lxk wrote on Thu, Jun 2, 2022 at 11:38:
 
> Hi everyone, I have a question.
> We are currently writing data from Flink to CK with the DataStream
> API and the RocksDB state backend. The job opens two windows, both at the 10-second level. We also use SQL to do a GROUP BY
> sum; that aggregation has no window, and the StreamTableEnvironment state retention time is set to 10s.
> When comparing with the offline side, the data for part of the day does not differ much from the offline numbers, from 00:00 to 17:00 (event time of the data), but after 18:00 (event time) the difference on the real-time side is very large.
> In the web UI we can also see that managed memory usage is completely full; I am not sure whether this is related.
>
> One more observation: comparing today's data against the real-time table in CK (which is correct), the overall volume is still much smaller. But when I re-consume from midnight, today's data so far matches. I wonder whether it is because, after the job has been running for a while, the managed memory fills up completely and the data that was originally buffered gets lost.
> The corresponding operator chain and the overall TM memory status are shown below. The backpressure appears because we restarted consumption from midnight today.
>
>
>
>


Re: Data loss when writing from Flink to CK

2022-06-02 by yue ma
Hi, you could first check whether your job has checkpointing enabled, and whether any failover occurred while the job was running.
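As a minimal sketch of those two knobs (the interval and restart-strategy values are placeholders, not recommendations for a specific job):

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointAndRestartSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Without checkpointing, a failover restarts the job from scratch and all
        // operator state (including data still buffered in it) is lost.
        env.enableCheckpointing(60_000); // placeholder interval in ms

        // An explicit restart strategy bounds how many failovers are tolerated
        // before the job is marked as failed, which also makes them easier to spot.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.seconds(10)));
    }
}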

lxk wrote on Thu, Jun 2, 2022 at 11:38:

> Hi everyone, I have a question.
> We are currently writing data from Flink to CK with the DataStream
> API and the RocksDB state backend. The job opens two windows, both at the 10-second level. We also use SQL to do a GROUP BY
> sum; that aggregation has no window, and the StreamTableEnvironment state retention time is set to 10s.
> When comparing with the offline side, the data for part of the day does not differ much from the offline numbers, from 00:00 to 17:00 (event time of the data), but after 18:00 (event time) the difference on the real-time side is very large.
> In the web UI we can also see that managed memory usage is completely full; I am not sure whether this is related.
>
> One more observation: comparing today's data against the real-time table in CK (which is correct), the overall volume is still much smaller. But when I re-consume from midnight, today's data so far matches. I wonder whether it is because, after the job has been running for a while, the managed memory fills up completely and the data that was originally buffered gets lost.
> The corresponding operator chain and the overall TM memory status are shown below. The backpressure appears because we restarted consumption from midnight today.
>
>
>
>