Re: Null result cannot be used for atomic types

2020-01-09 Thread godfrey he
hi sunfulin,

Which Flink version are you using?

best,
godfrey

sunfulin  于2020年1月10日周五 下午1:50写道:

> Hi, I am running a Flink app that reads Kafka records in JSON format.
> The connect code is as follows:
>
>
> tableEnv.connect(
>         new Kafka()
>             .version(kafkaInstance.getVersion())
>             .topic(chooseKafkaTopic(initPack.clusterMode))
>             .property("bootstrap.servers", kafkaInstance.getBrokerList())
>             .property("group.id", initPack.jobName)
>             .startFromEarliest()
>     ).withSchema(
>         new Schema()
>             // EVENT_TIME
>             .field("rowtime", Types.SQL_TIMESTAMP).rowtime(
>                 new Rowtime()
>                     .timestampsFromField("time")
>                     .watermarksPeriodicBounded(1000)
>             )
>             .field("type", Types.STRING)
>             .field("event", Types.STRING)
>             .field("user_id", Types.STRING)
>             .field("distinct_id", Types.STRING)
>             .field("project", Types.STRING)
>             .field("recv_time", Types.SQL_TIMESTAMP)
>             .field("properties", Types.ROW_NAMED(
>                 new String[] { "BROWSER_VERSION", "pathname", "search",
>                     "eventType", "message", "stack", "componentStack" },
>                 Types.STRING, Types.STRING, Types.STRING, Types.STRING,
>                 Types.STRING, Types.STRING, Types.STRING)
>             )
>     ).withFormat(
>         new Json().failOnMissingField(false)
>             .deriveSchema()
>     )
>     .inAppendMode()
>     .registerTableSource(getTableName());
>
>
>
> However, the application throws the following exception, which really
> confuses me. From the code above, the field types are only Types.STRING
> or Types.SQL_TIMESTAMP.
>
> Not sure which data field could lead to this. I'd appreciate some help
> from the community.
>
>
> Caused by: java.lang.NullPointerException: Null result cannot be used for
> atomic types.
>
>  at DataStreamSinkConversion$5.map(Unknown Source)
>
>  at
> org.apache.flink.table.runtime.CRowMapRunner.map(CRowMapRunner.scala:55)
>
>  at
> org.apache.flink.table.runtime.CRowMapRunner.map(CRowMapRunner.scala:34)
>
>  at
> org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
>
>  at
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
>
>  at
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
>
>  at
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
>
>  at
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
>
>  at
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
>
>  at
> org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)
>
>  at
> org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)
>
>  at
> org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)
>
>  at DataStreamSourceConversion$2.processElement(Unknown Source)
>
>  at
> org.apache.flink.table.runtime.CRowOutputProcessRunner.processElement(CRowOutputProcessRunner.scala:70)
>
>  at
> org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)
>
>  at
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
>
>  at
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
>
>  at
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
>
>  at
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
>
>  at org.apache.flink.streaming.
>
>
>
>
>


Null result cannot be used for atomic types

2020-01-09 Thread sunfulin
Hi, I am running a Flink app that reads Kafka records in JSON format. The
connect code is as follows:




tableEnv.connect(
        new Kafka()
            .version(kafkaInstance.getVersion())
            .topic(chooseKafkaTopic(initPack.clusterMode))
            .property("bootstrap.servers", kafkaInstance.getBrokerList())
            .property("group.id", initPack.jobName)
            .startFromEarliest()
    ).withSchema(
        new Schema()
            // EVENT_TIME
            .field("rowtime", Types.SQL_TIMESTAMP).rowtime(
                new Rowtime()
                    .timestampsFromField("time")
                    .watermarksPeriodicBounded(1000)
            )
            .field("type", Types.STRING)
            .field("event", Types.STRING)
            .field("user_id", Types.STRING)
            .field("distinct_id", Types.STRING)
            .field("project", Types.STRING)
            .field("recv_time", Types.SQL_TIMESTAMP)
            .field("properties", Types.ROW_NAMED(
                new String[] { "BROWSER_VERSION", "pathname", "search",
                    "eventType", "message", "stack", "componentStack" },
                Types.STRING, Types.STRING, Types.STRING, Types.STRING,
                Types.STRING, Types.STRING, Types.STRING)
            )
    ).withFormat(
        new Json().failOnMissingField(false)
            .deriveSchema()
    )
    .inAppendMode()
    .registerTableSource(getTableName());







However, the application throws the following exception, which really confuses
me. From the code above, the field types are only Types.STRING or
Types.SQL_TIMESTAMP.

Not sure which data field could lead to this. I'd appreciate some help from the community.




Caused by: java.lang.NullPointerException: Null result cannot be used for 
atomic types.

 at DataStreamSinkConversion$5.map(Unknown Source)

 at org.apache.flink.table.runtime.CRowMapRunner.map(CRowMapRunner.scala:55)

 at org.apache.flink.table.runtime.CRowMapRunner.map(CRowMapRunner.scala:34)

 at 
org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)

 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)

 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)

 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)

 at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)

 at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)

 at 
org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)

 at 
org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)

 at 
org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)

 at DataStreamSourceConversion$2.processElement(Unknown Source)

 at 
org.apache.flink.table.runtime.CRowOutputProcessRunner.processElement(CRowOutputProcessRunner.scala:70)

 at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)

 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)

 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)

 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)

 at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)

 at org.apache.flink.streaming.
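

This error typically surfaces when a null field value is converted to an atomic
(non-Row) type, which can happen here because failOnMissingField(false) yields
null for a missing JSON field. A minimal sketch of a defensive conversion,
assuming a registered table as above (the selected field and target type are
illustrative, not from the thread):

```
// Filter out rows whose field is null before converting the result to an
// atomic type; a null in this position is what triggers
// "Null result cannot be used for atomic types" during conversion.
Table result = tableEnv.sqlQuery(
    "SELECT user_id FROM " + getTableName() + " WHERE user_id IS NOT NULL");
// Converting to String (an atomic type) is safe once nulls are filtered out.
DataStream<String> userIds = tableEnv.toAppendStream(result, String.class);
```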




Re: Reply: Flink consumes no data from Kafka

2020-01-09 Thread sunfulin
Thanks for the reply. After investigating, it was indeed a hostname configuration issue.


The job then ran into another problem. Below is the Kafka connector configuration used for reading; a JSON schema is used for parsing. At runtime, however, it throws the following exception. Does anyone know the cause?

Caused by: java.lang.NullPointerException: Null result cannot be used for 
atomic types.

 at DataStreamSinkConversion$5.map(Unknown Source)

 at org.apache.flink.table.runtime.CRowMapRunner.map(CRowMapRunner.scala:55)

 at org.apache.flink.table.runtime.CRowMapRunner.map(CRowMapRunner.scala:34)

 at 
org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)

 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)

 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)

 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)

 at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)

 at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)

 at 
org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:51)

 at 
org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:37)

 at 
org.apache.flink.table.runtime.CRowWrappingCollector.collect(CRowWrappingCollector.scala:28)

 at DataStreamSourceConversion$2.processElement(Unknown Source)

 at 
org.apache.flink.table.runtime.CRowOutputProcessRunner.processElement(CRowOutputProcessRunner.scala:70)

 at 
org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:66)

 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)

 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)

 at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)

 at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)

 at org.apache.flink.streaming.






tableEnv.connect(
        new Kafka()
            .version(kafkaInstance.getVersion())
            .topic(chooseKafkaTopic(initPack.clusterMode))
            .property("bootstrap.servers", kafkaInstance.getBrokerList())
            .property("group.id", initPack.jobName)
            .startFromEarliest()   // for testing; can be removed in production
    ).withSchema(
        new Schema()
            // timestamp field
            .field("rowtime", Types.SQL_TIMESTAMP).rowtime(
                new Rowtime()
                    .timestampsFromField("time")
                    .watermarksPeriodicBounded(1000)
            )
            .field("type", Types.STRING)
            .field("event", Types.STRING)
            .field("user_id", Types.STRING)
            .field("distinct_id", Types.STRING)
            .field("project", Types.STRING)
            .field("recv_time", Types.SQL_TIMESTAMP)
            .field("properties", Types.ROW_NAMED(
                new String[] { "BROWSER_VERSION", "pathname", "search",
                    "eventType", "message", "stack", "componentStack" },
                Types.STRING, Types.STRING, Types.STRING, Types.STRING,
                Types.STRING, Types.STRING, Types.STRING)
            )
    ).withFormat(
        new Json().failOnMissingField(false)
            .deriveSchema()
    )
    .inAppendMode()
    .registerTableSource(getTableName());












On 2020-01-10 09:53:52, "Evan" wrote:
>First, check the advertised.host.name configuration of the Sensors Analytics Kafka.
>
>
>
>
>-- Original message --
>From: "sunfulin"; Sent: Friday, Jan 10, 2020, 9:51 AM
>To: "user-zh@flink.apache.org"
>Subject: Flink consumes no data from Kafka
>
>
>
>I have a job that uses Flink to consume Kafka message data from Sensors Analytics (see https://manual.sensorsdata.cn/sa/latest/page-1573828.html). But after the job starts, the Flink task consumes no Kafka messages, while I can see a continuous stream of data through the Sensors Analytics Kafka console consumer.
>The Flink job itself reports no errors. How should I troubleshoot this situation?


Custom File Sink using EventTime and defined custom file name for parquet file

2020-01-09 Thread David Magalhães
Hi, I'm working with Flink for the first time, and I'm trying to create a
solution that will store events from Kafka into Parquet files in S3. This
should also support re-injection of events from Parquet files into a Kafka
topic.

Here is the code with a simple usage of StreamingFileSink with BulkEncode that
will get the events and store them in parquet files. The files will be
partitioned by account_id and year and month (MM). The issue with this approach
is that when running the backfill from a certain point in time, it will be hard
to avoid generating duplicated events, since we will not override the same
files, as the filename is generated by "part--".

To add predictability, I've used a tumbling window to aggregate multiple
GenericRecord, in order to write the parquet file with a list of them. For
that I've created a custom file sink, but I'm not sure which properties I
am going to lose compared to the Streaming File Sink. Here is
the code. Still, there is something missing in this solution: closing a
window with a given timeout, so that it can write the last events into the
sink if no more events are sent.

Another workaround would be to create a StreamingFileSink with a RowEncoder,
receive a List of GenericRecord, and create a custom Encoder with
AvroParquetWriter to write to a File. This way I have access to a custom
rolling policy. But this looks truly inefficient. Here is
the code.

Am I overthinking this solution? I know there are some issues (recently
closed) for the StreamingFileSink to support more custom rolling policies
in BulkEncode, like https://issues.apache.org/jira/browse/FLINK-13027, but
I just noticed that now.
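
For reference, a minimal sketch of the BulkEncode variant described above,
assuming an Avro schema is available (the S3 path and schema handling are
illustrative; with a bulk format the sink rolls on every checkpoint, which is
exactly the restriction FLINK-13027 is about):

```
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

class ParquetSinkSketch {
    // Builds a Parquet bulk-writing sink; the account_id/yyyyMM bucketing
    // mentioned above would be supplied via withBucketAssigner(...).
    static StreamingFileSink<GenericRecord> buildSink(Schema schema) {
        return StreamingFileSink
            .forBulkFormat(new Path("s3://bucket/events"),
                           ParquetAvroWriters.forGenericRecord(schema))
            .build();
    }
}
```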



Re: Elasticsink sometimes gives NoClassDefFoundError

2020-01-09 Thread Jayant Ameta
Also, the ES version I'm using is 5.6.7

Jayant


On Thu, Jan 9, 2020 at 10:39 AM Jayant Ameta  wrote:

> Hi,
> The elastic connector is packaged in the uber jar that is submitted. There
> is only 1 version of the connector:
> flink-connector-elasticsearch5_2.11:1.7.1
> I'm using Flink 1.7.1
>
> I couldn't figure out whether this error causes the job to fail, or
> whether I see this error when the job is restarting after some other
> failure.
> But the occurrences of this error and the job restarts are correlated.
>
>
> Jayant Ameta
>
>
> On Wed, Jan 8, 2020 at 6:47 PM Arvid Heise  wrote:
>
>> Hi Jayant,
>>
>> if you only see it sometimes, that indicates that you have the class in two
>> different versions of the connector, where class-loading order is
>> non-deterministic. Could you post the classpath?
>>
>> Btw, it's always good to add which Flink version you use.
>>
>> Best,
>>
>> Arvid
>>
>> On Wed, Jan 8, 2020 at 12:20 PM Jayant Ameta 
>> wrote:
>>
>>> Hi,
>>> I see the following error sometimes on my flink job, even though the
>>> class is present in my uber jar.
>>>
>>> java.lang.NoClassDefFoundError:
>>> org/apache/flink/streaming/connectors/elasticsearch5/shaded/org/jboss/netty/channel/socket/nio/NioClientSocketPipelineSink$1
>>> at
>>> org.apache.flink.streaming.connectors.elasticsearch5.shaded.org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:111)
>>> ... 17 common frames omitted Wrapped by:
>>> org.elasticsearch.ElasticsearchException: java.lang.NoClassDefFoundError:
>>> org/apache/flink/streaming/connectors/elasticsearch5/shaded/org/jboss/netty/channel/socket/nio/NioClientSocketPipelineSink$1
>>> at
>>> org.elasticsearch.transport.netty3.Netty3Transport.exceptionCaught(Netty3Transport.java:325)
>>> at
>>> org.elasticsearch.transport.netty3.Netty3MessageChannelHandler.exceptionCaught(Netty3MessageChannelHandler.java:83)
>>> at
>>> org.apache.flink.streaming.connectors.elasticsearch5.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
>>> at
>>> org.apache.flink.streaming.connectors.elasticsearch5.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>>> at
>>> org.apache.flink.streaming.connectors.elasticsearch5.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>>> ... 23 frames truncated
>>>
>>>
>>> Jayant
>>>
>>


Re: Question about calling Dubbo services from a Flink cluster

2020-01-09 Thread Leonard Xu
Hi, 依辰

I'm not very familiar with Dubbo. The image in your mail didn't come through; you can upload it to an image-hosting service and send the link.

Best,
Leonard

> On Jan 10, 2020, at 11:12, 依辰 <431737...@qq.com> wrote:
> 
> Hi All,
> I currently need to consume MQ messages with a Flink cluster and then call a Dubbo service that sends pushes, to implement push distribution.
> I haven't found much material online about integrating Flink with the Spring framework, let alone about calling Dubbo services (my way of searching may also be limited).
> 
> I hope someone with experience can give me some pointers. I've actually already built a test demo, but I'd still like more reference material, mainly because I'm worried about problems with resource consumption, performance, and security, since I've only just started with Flink and my understanding is limited.
> If convenient, please share some sample code or links.
>   
> Thanks to everyone in the Flink community
> 
> PS: the picture below shows the current implementation; it feels too crude, and it's still missing the release of Spring resources on close()
> 
> 



Re: flink savepoint checkpoint

2020-01-09 Thread zhisheng
Hi, as I understand it, this option controls whether earlier checkpoints are
cleaned up when the job is cancelled, but such a checkpoint is not necessarily
the job's latest state. Whereas if you trigger a savepoint as part of the cancel
command, that state is the latest and most complete.

Best!
zhisheng

Px New <15701181132mr@gmail.com> wrote on Fri, Jan 10, 2020 at 10:58 AM:

> Hello, for this question there is a configuration for it in Flink: when the program stops, the checkpoint is additionally retained
> -->
>
>
> env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
>
>
> lucas.wu wrote on Wed, Dec 11, 2019 at 11:56 AM:
>
> > Hi all:
> >
> >
> I'd like to discuss a question: why is Flink's savepoint designed to be manual? If no savepoint is taken when stopping the program, the previously saved state cannot be used on restart. Why not follow Spark's approach: take checkpoints periodically, and on startup specify the checkpoint path to continue from where the last run left off?
>
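
A minimal sketch of the setup discussed above (the interval is an assumption):
periodic checkpoints plus externalized-checkpoint retention, so a restart can
resume from the last retained checkpoint path even without a manual savepoint.

```
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// take a checkpoint every 60 seconds
env.enableCheckpointing(60_000);
// keep the latest checkpoint around when the job is cancelled
env.getCheckpointConfig().enableExternalizedCheckpoints(
    CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
```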


Re: When will the Flink Plan Visualizer be updated to the 1.9 style?

2020-01-09 Thread tison
cc Yadong

I've copied in the manager of the WebUI rework for you.

Best,
tison.


 wrote on Fri, Jan 10, 2020 at 11:26 AM:

> Wasn't the frontend UI in 1.9 reworked compared to 1.8? The visualizer page on the website still has the old 1.8 style.
>
> -- Original message --
> From: tison 
> Sent: Jan 8, 2020, 13:14
> To: user-zh 
> Subject: Re: When will the Flink Plan Visualizer be updated to the 1.9 style?
>
> What exactly do you mean by the 1.9 style? I remember there were recent discussions related to the visualizer, but there is no specific issue for this; you can file an issue directly on JIRA.
>
> Best,
> tison.
>
>
>  wrote on Wed, Jan 8, 2020 at 12:56 PM:
>
> > Can any expert answer this?
> >
> > -- Original message --
> > From: slle...@aliyun.com.INVALID 
> > Sent: Jan 6, 2020, 11:15
> > To: user-zh@flink.apache.org
> > Subject: When will the Flink Plan Visualizer be updated to the 1.9 style?
> >
> > Link: https://flink.apache.org/visualizer/index.html
> >
> >
>


Reply: When will the Flink Plan Visualizer be updated to the 1.9 style?

2020-01-09 Thread sllence
Wasn't the frontend UI in 1.9 reworked compared to 1.8? The visualizer page on the website still has the old 1.8 style.

-- Original message --
From: tison  
Sent: Jan 8, 2020, 13:14
To: user-zh 
Subject: Re: When will the Flink Plan Visualizer be updated to the 1.9 style?

What exactly do you mean by the 1.9 style? I remember there were recent discussions related to the visualizer, but there is no specific issue for this; you can file an issue directly on JIRA.

Best,
tison.


 wrote on Wed, Jan 8, 2020 at 12:56 PM:

> Can any expert answer this?
>
> -- Original message --
> From: slle...@aliyun.com.INVALID 
> Sent: Jan 6, 2020, 11:15
> To: user-zh@flink.apache.org
> Subject: When will the Flink Plan Visualizer be updated to the 1.9 style?
>
> Link: https://flink.apache.org/visualizer/index.html
>
>


Question about calling Dubbo services from a Flink cluster

2020-01-09 Thread 依辰
Hi All,
  
I currently need to consume MQ messages with a Flink cluster and then call a Dubbo service that sends pushes, to implement push distribution.
  
I haven't found much material online about integrating Flink with the Spring framework, let alone about calling Dubbo services (my way of searching may also be limited).
  
I hope someone with experience can give me some pointers. I've actually already built a test demo, but I'd still like more reference material, mainly because I'm worried about problems with resource consumption, performance, and security, since I've only just started with Flink and my understanding is limited. If convenient, please share some sample code or links.

Thanks to everyone in the Flink community


PS: the picture below shows the current implementation; it feels too crude, and it's still missing the release of Spring resources on close()
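

A minimal sketch of the pattern described above, including the close() handling
the author says is still missing. The service interface (PushService), the
Spring XML file name, and the push() call are assumptions for illustration, not
the author's actual code:

```
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.springframework.context.support.ClassPathXmlApplicationContext;

// hypothetical Dubbo service interface, assumed to be provided elsewhere
interface PushService {
    void push(String message);
}

public class DubboPushSink extends RichSinkFunction<String> {

    private transient ClassPathXmlApplicationContext ctx;
    private transient PushService pushService;

    @Override
    public void open(Configuration parameters) {
        // build the Spring/Dubbo context once per parallel task, not per record
        ctx = new ClassPathXmlApplicationContext("dubbo-consumer.xml");
        pushService = ctx.getBean(PushService.class);
    }

    @Override
    public void invoke(String message, Context context) {
        pushService.push(message); // hypothetical push call
    }

    @Override
    public void close() {
        // release the Spring resources when the task shuts down
        if (ctx != null) {
            ctx.close();
        }
    }
}
```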

Re: How to assign a UID to a KeyedStream?

2020-01-09 Thread Zhu Zhu
Hi Ken,

This is actually a bug: a Partition should not require a UID. It is
fixed in 1.9.2 and 1.10; see FLINK-14910
(https://issues.apache.org/jira/browse/FLINK-14910).

Thanks,
Zhu Zhu

Ken Krugler wrote on Fri, Jan 10, 2020 at 7:51 AM:

> Hi all,
>
> [Of course, right after hitting send I realized I could just do
> rides.getTransformation().setUid(“blah”), ditto for the fares stream. Might
> be something to add to the docs, or provide a .uid() method on KeyedStreams
> for syntactic sugar]
>
> Just for grins, I disabled auto-generated UIDs for the taxi rides/fares
> state example in the online tutorial.
>
> env.getConfig().disableAutoGeneratedUIDs();
>
> I then added UIDs for all operators, sources & sinks. But I still get the
> following when calling env.getExecutionPlan() or env.execute():
>
> java.lang.IllegalStateException: Auto generated UIDs have been disabled
> but no UID or hash has been assigned to operator Partition
> at
> org.apache.flink.streaming.api.graph.StreamGraphGenerator.transform(StreamGraphGenerator.java:297)
> at
> org.apache.flink.streaming.api.graph.StreamGraphGenerator.transformTwoInputTransform(StreamGraphGenerator.java:682)
> at
> org.apache.flink.streaming.api.graph.StreamGraphGenerator.transform(StreamGraphGenerator.java:252)
> at
> org.apache.flink.streaming.api.graph.StreamGraphGenerator.generate(StreamGraphGenerator.java:209)
> at
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getStreamGraph(StreamExecutionEnvironment.java:1529)
> at
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getExecutionPlan(StreamExecutionEnvironment.java:1564)
> at com.citi.flink.RidesAndFaresTool.main(RidesAndFaresTool.java:63)
>
> The simple workflow is:
>
> DataStream<TaxiRide> rides = env
> .addSource(new CheckpointedTaxiRideSource(ridesFile,
> servingSpeedFactor))
> .uid("source: taxi rides")
> .name("taxi rides")
> .filter((TaxiRide ride) -> ride.isStart)
> .uid("filter: only start rides")
> .name("only start rides")
> .keyBy((TaxiRide ride) -> ride.rideId);
>
> DataStream<TaxiFare> fares = env
> .addSource(new CheckpointedTaxiFareSource(faresFile,
> servingSpeedFactor))
> .uid("source: taxi fares")
> .name("taxi fares")
> .keyBy((TaxiFare fare) -> fare.rideId);
>
> DataStreamSink<Tuple2<TaxiRide, TaxiFare>> enriched = rides
> .connect(fares)
> .flatMap(new EnrichmentFunction())
> .uid("function: enrich rides with fares")
> .name("enrich rides with fares")
> .addSink(sink)
> .uid("sink: enriched taxi rides")
> .name("enriched taxi rides");
>
> Internally the exception is thrown when the EnrichmentFunction (a
> RichCoFlatMapFunction) is being transformed by
> StreamGraphGenerator.transformTwoInputTransform().
>
> This calls StreamGraphGenerator.transform() with the two inputs, but the
> Transformation for each input is a PartitionTransformation.
>
> I don’t see a way to set the UID following the keyBy(), as a KeyedStream
> creates the PartitionTransformation without a UID.
>
> Any insight into setting the UID properly here? Or should
> StreamGraphGenerator.transform() skip the no-uid check for
> PartitionTransformation, since that’s not an operator with state?
>
> Thanks,
>
> — Ken
>
> --
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
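
A sketch of the workaround Ken mentions at the top of his message (the UID
strings are illustrative): set the UID on the keyed stream's underlying
transformation, since KeyedStream exposes no uid() method before the
FLINK-14910 fix.

```
// keyBy() produces a PartitionTransformation, which is reachable
// through getTransformation() on the resulting KeyedStream
rides.getTransformation().setUid("partition: rides by rideId");
fares.getTransformation().setUid("partition: fares by rideId");
```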


Re: flink savepoint checkpoint

2020-01-09 Thread Px New
Hello, for this question there is a configuration for it in Flink: when the program stops, the checkpoint is additionally retained
-->

env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);


lucas.wu wrote on Wed, Dec 11, 2019 at 11:56 AM:

> Hi all:
>
> I'd like to discuss a question: why is Flink's savepoint designed to be manual? If no savepoint is taken when stopping the program, the previously saved state cannot be used on restart. Why not follow Spark's approach: take checkpoints periodically, and on startup specify the checkpoint path to continue from where the last run left off?


Re: Flink Job cluster scalability

2020-01-09 Thread Zhu Zhu
Hi KristoffSC,

Did you increase the parallelism of the vertex that has the largest
parallelism?
Or did you explicitly put tasks in different slot sharing groups?
With the default slot sharing, the number of slots required/used equals
the max parallelism of a JobVertex, which is 6 in your case.

KristoffSC wrote on Thu, Jan 9, 2020 at 9:26 PM:

> Thank you David and Zhu Zhu,
> this helps a lot.
>
> I have follow-up questions though.
>
> Given this:
> /"Instead the Job must be stopped via a savepoint and restarted with a new
> parallelism"/
>
> and the slot sharing [1] feature, I got the impression that if I started my
> cluster with more than 6 task slots, Flink would try to deploy tasks across
> all resources, trying to use all available resources during job submission.
>
> I ran two tests with my original job.
> 1. I started a Job Cluster with 7 task slots (7 task managers, since in this
> case each task manager has one task slot).
> 2. I started a Session cluster with 28 task slots in total. In this case I
> had 7 task managers with 4 task slots each.
>
> For case 1, I used the "FLINK_JOB" variable as stated in [2]. For case 2, I
> submitted my job from the UI after Flink became operational.
>
>
> In both cases the job used only 6 task slots, so it was still sharing task
> slots. I had the impression that it would try to use as many available
> resources as it can.
>
> What do you think about this?
>
>
> [1]
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/concepts/runtime.html#task-slots-and-resources
> [2]
>
> https://github.com/apache/flink/blob/release-1.9/flink-container/docker/README.md
>
>
>
>
>
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
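
A minimal sketch of the explicit slot sharing group Zhu Zhu refers to (the
stream, operator, and group names are made up): operators taken out of the
default group get their own slots, so the job occupies more than
max-parallelism slots.

```
DataStream<String> processed = events
    .map(new HeavyMapper())   // hypothetical MapFunction<String, String>
    .name("heavy mapper")
    // subtasks of this operator no longer share slots with the default group
    .slotSharingGroup("heavy");
```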


[Question] Failed to submit flink job to secure yarn cluster

2020-01-09 Thread Ethan Li
Hello

I was following
https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/yarn_setup.html#run-a-flink-job-on-yarn
and trying to submit a Flink job on YARN.

I downloaded flink-1.9.1 and pre-bundled Hadoop 2.8.3 from
https://flink.apache.org/downloads.html#apache-flink-191. I used default
configs except:

security.kerberos.login.keytab: userA.keytab
security.kerberos.login.principal: userA@REALM


I have a secure YARN cluster set up already. Then I ran “./bin/flink run -m
yarn-cluster -p 1 -yjm 1024m -ytm 1024m ./examples/streaming/WordCount.jar” and
got the following errors:


org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy 
Yarn session cluster
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:385)
at 
org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:251)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:205)
at 
org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1010)
at 
org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1083)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1083)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit 
application_1578605412668_0005 to YARN : Failed to renew token: Kind: kms-dt, 
Service: host3.com:3456, Ident: (owner=userA, renewer=adminB, realUser=, 
issueDate=1578606224956, maxDate=1579211024956, sequenceNumber=32, 
masterKeyId=52)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:275)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:1004)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:507)
at 
org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:378)
... 9 more


Full client log: https://gist.github.com/Ethanlm/221284bcaa272270a799957dc05b94fd

Resource manager log: https://gist.github.com/Ethanlm/ecd0a3eb25582ad6b1552927fc0e5c47

Hostnames, IP addresses, usernames, etc. are anonymized.


Not sure how to proceed further. Wondering if anyone in the community has 
encountered this before. Thank you very much for your time!

Best,
Ethan



Please suggest helpful tools

2020-01-09 Thread Eva Eva
Hi,

I'm running a Flink job on version 1.9 with the Blink planner.

My checkpoints are timing out intermittently, and as the state grows they are
timing out more and more often, eventually killing the job.

The state is large, with Minimum=10.2MB and Maximum=49GB (the maximum
accumulated due to prior failed checkpoints), Average=8.44GB.

Although the state is huge, I have enough space on the EC2 instance on which
I'm running the job. I'm using RocksDB for checkpointing.

The logs do not have any useful information for understanding why the
checkpoints are expiring/failing. Can someone please point me to tools that can
be used to investigate and understand why checkpoints are failing?

Also, any other related suggestions are welcome.


Thanks,
Reva.
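

One common mitigation, sketched below on the assumption that full snapshots of
a ~50GB RocksDB state are what exceed the checkpoint timeout (the thread does
not confirm this): enable incremental checkpoints so only changed SST files are
uploaded, and give checkpoints more time before they expire.

```
import java.io.IOException;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

class CheckpointTuningSketch {
    static void configure(StreamExecutionEnvironment env) throws IOException {
        // 'true' enables incremental checkpoints; the URI is a placeholder
        env.setStateBackend(new RocksDBStateBackend("s3://bucket/checkpoints", true));
        // allow slow checkpoints more time before they are declared expired
        env.getCheckpointConfig().setCheckpointTimeout(30 * 60 * 1000L);
    }
}
```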


Re: checkpoint, state

2020-01-09 Thread Px New
Yes, a checkpoint is a packaged piece of meta information, and that packaged meta information is composed of the state of all operators.

hahaha sc wrote on Fri, Nov 29, 2019 at 4:12 PM:

>
> Since Flink checkpoints every record into a globally consistent distributed snapshot, why is local state still needed? Can it be understood that local state is simply a part of the consistent snapshot?
>   I watched the community livestream replay yesterday, and from the PMC member's introduction they don't seem to be the same thing.
>


Reply: Flink consumes no data from Kafka

2020-01-09 Thread ZhangChangjun
[The body of this message is garbled in the archive; it refers to the source Kafka configuration.]



-- Original message --
From: "sunfulin"
> I have a job that uses Flink to consume Kafka message data from Sensors Analytics (see https://manual.sensorsdata.cn/sa/latest/page-1573828.html). But after the job starts, the Flink task consumes no Kafka messages, while I can see a continuous stream of data through the Sensors Analytics Kafka console consumer.
> The Flink job itself reports no errors. How should I troubleshoot this situation?

Flink consumes no data from Kafka

2020-01-09 Thread sunfulin
I have a job that uses Flink to consume Kafka message data from Sensors Analytics (see https://manual.sensorsdata.cn/sa/latest/page-1573828.html). But after the job starts, the Flink task consumes no Kafka messages, while I can see a continuous stream of data through the Sensors Analytics Kafka console consumer.
The Flink job itself reports no errors. How should I troubleshoot this situation?

Re: Flink hits an NPE on the valueState itself

2020-01-09 Thread Kevin Liao
Thanks, I tracked it down to the same place.

Changed it to:

```

Boolean existed = uniqMark.value();
// if it has already appeared, filter it out
if (null != existed && existed) {
  return null;
}
uniqMark.update(true);

```

Problem solved.

Yun Tang wrote on Fri, Jan 10, 2020 at 3:05 AM:

> Hi Kevin
>
> I looked at your code and located the problem.
> The likely cause is that you configured the state visibility as
> ReturnExpiredIfNotCleanedUp: Flink first detects that the entry has expired, then triggers the clear operation, and only afterwards returns the expired value
> [1]. I suggest changing the condition on line 39 so that the value is checked for null on every read.
>
>
> [1]
> https://github.com/apache/flink/blob/dba5b9e0138b667c3ecd32f7b16645d531477720/flink-runtime/src/main/java/org/apache/flink/runtime/state/ttl/AbstractTtlDecorator.java#L96
>
> Best,
> Yun Tang
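
For context, a minimal sketch of the state and TTL setup implied by this thread
(the TTL duration and descriptor wiring are assumptions): with
ReturnExpiredIfNotCleanedUp, a read may return an expired value once and null
afterwards, which is why the fix above calls value() exactly once and
null-checks the result.

```
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.common.typeinfo.Types;

StateTtlConfig ttlConfig = StateTtlConfig.newBuilder(Time.hours(1)) // assumed TTL
    .setStateVisibility(StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp)
    .build();
ValueStateDescriptor<Boolean> uniqMarkDesc =
    new ValueStateDescriptor<>("uniqMark", Types.BOOLEAN);
uniqMarkDesc.enableTimeToLive(ttlConfig);
// in open(): uniqMark = getRuntimeContext().getState(uniqMarkDesc);
```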

How to assign a UID to a KeyedStream?

2020-01-09 Thread Ken Krugler
Hi all,

[Of course, right after hitting send I realized I could just do 
rides.getTransformation().setUid(“blah”), ditto for the fares stream. Might be 
something to add to the docs, or provide a .uid() method on KeyedStreams for 
syntactic sugar]

Just for grins, I disabled auto-generated UIDs for the taxi rides/fares state 
example in the online tutorial. 

env.getConfig().disableAutoGeneratedUIDs();

I then added UIDs for all operators, sources & sinks. But I still get the 
following when calling env.getExecutionPlan() or env.execute():

java.lang.IllegalStateException: Auto generated UIDs have been disabled but no 
UID or hash has been assigned to operator Partition
at 
org.apache.flink.streaming.api.graph.StreamGraphGenerator.transform(StreamGraphGenerator.java:297)
at 
org.apache.flink.streaming.api.graph.StreamGraphGenerator.transformTwoInputTransform(StreamGraphGenerator.java:682)
at 
org.apache.flink.streaming.api.graph.StreamGraphGenerator.transform(StreamGraphGenerator.java:252)
at 
org.apache.flink.streaming.api.graph.StreamGraphGenerator.generate(StreamGraphGenerator.java:209)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getStreamGraph(StreamExecutionEnvironment.java:1529)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getExecutionPlan(StreamExecutionEnvironment.java:1564)
at com.citi.flink.RidesAndFaresTool.main(RidesAndFaresTool.java:63)

The simple workflow is:

DataStream<TaxiRide> rides = env
.addSource(new CheckpointedTaxiRideSource(ridesFile, 
servingSpeedFactor))
.uid("source: taxi rides")
.name("taxi rides")
.filter((TaxiRide ride) -> ride.isStart)
.uid("filter: only start rides")
.name("only start rides")
.keyBy((TaxiRide ride) -> ride.rideId);

DataStream<TaxiFare> fares = env
.addSource(new CheckpointedTaxiFareSource(faresFile, 
servingSpeedFactor))
.uid("source: taxi fares")
.name("taxi fares")
.keyBy((TaxiFare fare) -> fare.rideId);

DataStreamSink<Tuple2<TaxiRide, TaxiFare>> enriched = rides
.connect(fares)
.flatMap(new EnrichmentFunction())
.uid("function: enrich rides with fares")
.name("enrich rides with fares")
.addSink(sink)
.uid("sink: enriched taxi rides")
.name("enriched taxi rides");

Internally the exception is thrown when the EnrichmentFunction (a
RichCoFlatMapFunction) is being transformed by 
StreamGraphGenerator.transformTwoInputTransform().

This calls StreamGraphGenerator.transform() with the two inputs, but the 
Transformation for each input is a PartitionTransformation.

I don’t see a way to set the UID following the keyBy(), as a KeyedStream 
creates the PartitionTransformation without a UID.

Any insight into setting the UID properly here? Or should 
StreamGraphGenerator.transform() skip the no-uid check for 
PartitionTransformation, since that’s not an operator with state?

Thanks,

— Ken

--
Ken Krugler
http://www.scaleunlimited.com 
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr



How to assign a UID to a KeyedStream?

2020-01-09 Thread Ken Krugler
Hi all,

Just for grins, I disabled auto-generated UIDs for the taxi rides/fares state 
example in the online tutorial. 

env.getConfig().disableAutoGeneratedUIDs();

I then added UIDs for all operators, sources & sinks. But I still get the 
following when calling env.getExecutionPlan() or env.execute():

java.lang.IllegalStateException: Auto generated UIDs have been disabled but no 
UID or hash has been assigned to operator Partition
at 
org.apache.flink.streaming.api.graph.StreamGraphGenerator.transform(StreamGraphGenerator.java:297)
at 
org.apache.flink.streaming.api.graph.StreamGraphGenerator.transformTwoInputTransform(StreamGraphGenerator.java:682)
at 
org.apache.flink.streaming.api.graph.StreamGraphGenerator.transform(StreamGraphGenerator.java:252)
at 
org.apache.flink.streaming.api.graph.StreamGraphGenerator.generate(StreamGraphGenerator.java:209)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getStreamGraph(StreamExecutionEnvironment.java:1529)
at 
org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getExecutionPlan(StreamExecutionEnvironment.java:1564)
at com.citi.flink.RidesAndFaresTool.main(RidesAndFaresTool.java:63)

The simple workflow is:

DataStream<TaxiRide> rides = env
.addSource(new CheckpointedTaxiRideSource(ridesFile, 
servingSpeedFactor))
.uid("source: taxi rides")
.name("taxi rides")
.filter((TaxiRide ride) -> ride.isStart)
.uid("filter: only start rides")
.name("only start rides")
.keyBy((TaxiRide ride) -> ride.rideId);

DataStream<TaxiFare> fares = env
.addSource(new CheckpointedTaxiFareSource(faresFile, 
servingSpeedFactor))
.uid("source: taxi fares")
.name("taxi fares")
.keyBy((TaxiFare fare) -> fare.rideId);

DataStreamSink<Tuple2<TaxiRide, TaxiFare>> enriched = rides
.connect(fares)
.flatMap(new EnrichmentFunction())
.uid("function: enrich rides with fares")
.name("enrich rides with fares")
.addSink(sink)
.uid("sink: enriched taxi rides")
.name("enriched taxi rides");

Internally the exception is thrown when the EnrichmentFunction (a
RichCoFlatMapFunction) is being transformed by 
StreamGraphGenerator.transformTwoInputTransform().

This calls StreamGraphGenerator.transform() with the two inputs, but the 
Transformation for each input is a PartitionTransformation.

I don’t see a way to set the UID following the keyBy(), as a KeyedStream 
creates the PartitionTransformation without a UID.

Any insight into setting the UID properly here? Or should 
StreamGraphGenerator.transform() skip the no-uid check for 
PartitionTransformation, since that’s not an operator with state?

Thanks,

— Ken

--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr



Re: Flink hits an NPE on the valueState itself

2020-01-09 Thread Yun Tang
Hi Kevin

I looked at your code and located the problem.
The likely cause is that you configured the state visibility as
ReturnExpiredIfNotCleanedUp: Flink first detects that the entry has expired, then triggers the clear operation,
and only afterwards returns the expired value [1]. I suggest changing the condition on line 39 so that the value is checked for null on every read.


[1]
https://github.com/apache/flink/blob/dba5b9e0138b667c3ecd32f7b16645d531477720/flink-runtime/src/main/java/org/apache/flink/runtime/state/ttl/AbstractTtlDecorator.java#L96

Best,
Yun Tang



Re: How can I find out which key group belongs to which subtask

2020-01-09 Thread 杨东晓
Thanks Congxian!
 My purpose is not only to make data go to one same subtask, but to the
specific subtask that lives on the same taskmanager as the upstream record.
The key idea is to avoid shuffling between taskmanagers.
I think KeyGroupRangeAssignment.java
explains a lot about how to get the key group and the subtask context needed
to make that happen.
Do you know if serialization still happens while data is
transferred between operators in the same taskmanager?
Thanks.

Congxian Qiu wrote on Thu, Jan 9, 2020 at 1:55 AM:

> Hi
>
> If you just want to make sure some key goes into the same subtask, does
> custom key selector[1] help?
>
> For the keygroup and subtask information, you can ref to
> KeyGroupRangeAssignment[2] for more info, and the max parallelism logic you
> can ref to doc[3]
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/api_concepts.html#define-keys-using-key-selector-functions
> [2]
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/KeyGroupRangeAssignment.java
> [3]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html#setting-the-maximum-parallelism
>
> Best,
> Congxian
>
>
> 杨东晓 wrote on Thu, Jan 9, 2020 at 7:47 AM:
>
>> Hi, I'm trying to do some optimization around Flink's 'keyBy' process function.
>> Is there any way I can find out which key-group a key belongs to,
>> and, by extension, which subtask a key-group belongs to?
>> The motivation is that we want to force data records
>> from upstream to go to a downstream subtask on the same taskmanager. That
>> means that even if we use a keyed-stream function, we still want no cross-JVM
>> communication to happen at run time.
>> And if we can achieve that, can we also avoid the expensive cost of
>> record serialization, because data is only transferred within the same
>> taskmanager JVM instance?
>>
>> Thanks.
>>
>
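
A small sketch of the lookup discussed above (the key and the parallelism
values are placeholders and must match the job's real settings):

```
import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

int maxParallelism = 128;   // the job's max parallelism
int parallelism = 6;        // the operator's actual parallelism
Object key = "someKey";     // an example key

// which key group the key hashes into
int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(key, maxParallelism);
// which subtask index owns that key group
int subtaskIndex = KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup(
    maxParallelism, parallelism, keyGroup);
```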


Running Flink on java 11

2020-01-09 Thread KristoffSC
Hi guys,
we have a requirement in our project to use Java 11, although we would
really like to use Flink because it seems to match our needs perfectly.

We were testing it on Java 1.8 and all looked fine.
We tried to run it on Java 11 and it also looks fine, at least for now.

We were also running this as a Job Cluster, and since those images [1] are
based on openjdk:8-jre-alpine, we switched to java 13-jdk-alpine. The cluster
started and the job was submitted. All seemed fine.

The job and the 3rd-party library that this job is using were compiled with
Java 11.
I was looking for any posts related to Java 11 issues and I've found this
[2] one.
We are also aware of the ongoing FLINK-10725 [3], but this is assigned to the
Flink 1.10 release.

Having all of this, I would like to ask a few questions:

1. Is there any release date planned for 1.10?
2. Are you aware of any issues with running Flink on Java 11?
3. If my job code does not use any language features from Java 11, would Flink
handle running on Java 11? Or are there some internal functionalities
that would not work on Java 11 (things that use Unsafe or
reflection)?

Thanks,
Krzysztof


[1]
https://github.com/apache/flink/blob/release-1.9/flink-container/docker/README.md
[2]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/UnsupportedOperationException-from-org-apache-flink-shaded-asm6-org-objectweb-asm-ClassVisitor-visit1-td28571.html
[3] https://issues.apache.org/jira/browse/FLINK-10725



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Flink hits an NPE on the valueState itself

2020-01-09 Thread Kevin Liao
Thanks for the answer. First, the code I pasted is indeed the program that runs.

Also, I just confirmed via a printed log that uniqMark == null is false.

My current suspicion is this spot:

if (null != uniqMark.value() && uniqMark.value()) {

The first uniqMark.value() returns a result (not null), but at the same time the TTL policy triggers
clear, so the second uniqMark.value() returns null
(apparently the compiler does not optimize this pattern; value() really is executed twice). Following the code seems to support this guess, so I plan to verify it tomorrow.

I'll follow up tomorrow, thanks.


Re: Flink hits an NPE on the valueState itself

2020-01-09 Thread Yun Tang
Hi Kevin

State TTL cleans up the data entries inside the state, not the state object itself in the map function. So in any case, uniqMark, being the value state object, will never become null because of TTL.

I noticed that your job hits this NPE immediately after recovering from a failover. I suspect that line 39 of the code you actually run is not the code you pasted; it very likely corresponds to line 34 of your code, i.e. the RDLog input of the map method being null. That would also match the scenario where the job restores successfully and then immediately fails over again, i.e. it keeps processing an invalid "dirty" record, causing the job to fail over continuously. I suggest you verify and rule this out along these lines.

Best,
Yun Tang

From: Kevin Liao 
Sent: Thursday, January 9, 2020 23:17
To: user-zh@flink.apache.org 
Subject: Re: Flink hits an NPE on the valueState itself

Thanks for the reply.
One more question, though:

After the valueState's TTL fires and it is reclaimed, the reference here should also be GC'ed and become null, right? Or is the thread with which the operator originally processed this key reclaimed along with it, so that the next time this key arrives a newly created thread serves it (which would certainly require calling open again)? I've read the code but haven't figured it out; I'd appreciate clarification, thanks.

Benchao Li wrote on Thu, Jan 9, 2020 at 8:59 PM:

> I don't see how `uniqMark` could reasonably become `null` here,
> unless a `StreamMap` was deserialized somewhere without `StreamMap.open()` being called.
> But `StreamTask` appears to guarantee that `open` is called before the operator's processing methods. I can't see a problem here either.
>
> Kevin Liao wrote on Thu, Jan 9, 2020 at 8:15 PM:
>
> > https://tva4.sinaimg.cn/large/63137227ly1gaqkn1nlykj20mm0wvgq8.jpg
> >
> > Sorry, try this one
> >
> > Benchao Li wrote on Thu, Jan 9, 2020 at 8:13 PM:
> >
> > > It opens as 403 Forbidden on my side
> > >
> > > Kevin Liao wrote on Thu, Jan 9, 2020 at 8:09 PM:
> > >
> > > >
> > > >
> > >
> >
> https://gm1.ggpht.com/FZGtbLggyPPZ_BoU0gt2SQTv7fyhNOKu03ZjsKq7G6DtqWJ5DY0NmL-2s64P-LUzedbTm8DE_FeggNtPAb4VEmypAoPfW8VFSFxOWxMGBvMi5G6xHoZ3THKPYHnAj8KydQ02OjvV-R3IEwBvwIDnZRmwCv3ohyjPF76gbvOOYrKzgaLb_pykWsQDpvROHr3lgU2rezH33Jt3xJEOjXGjHsUFUxiil0PYkQFdA0BP77lypYQLw4RL8BxMz3HfaCiNAGb_q5w8JNmckHLU3g9EuPgtqj6WP3XDv07PBuCXMvmfNcFbAciMeJuOOeE8VBqDCacjuiDtJzVrK1boxcBnzFvT_QazOwaJ27SSuJ_u5KCerTURen2vLBF1RN-x9eOVz9wg6w1oXyMAF7LMjGsYsVzUu3It5AyzLkm-_znosNtAJp2AW_qGmGo-k02fcrMjUoELiGvqn6W1kScnFI4gNWi_dpZe0Uoq1zF2m1crww1oNGOeRjFlCK_-iC19CPfsTVCtwN3tdKnaKdLe2TbfVdFA0DnBUz8NrhV-mvmZlEwi9-ngK-WOy8yjA4fin1zaE2SJCf2zfBSZwGR2eY_E_WZQiFRmSBI2A7vpoyFvTV3E99MIi0MC5PUAeRiu4v4JIVDkV_yUGIUvoa7pxdf7LpZN_DbikQVk7yES8kxxL5qG2Eae8vftWJuBVi5mWTxvElBgInyUntobXHdxfb2YR4JdBgVPN5QionJiIc9g5i0ClGECZbyHPbsQy4pEVw=s0-l75-ft-l75-ft
> > > >
> > > > Thanks, see if you can view it
> > > >
> > > > Benchao Li wrote on Thu, Jan 9, 2020 at 8:07 PM:
> > > >
> > > > > hi Kevin,
> > > > >
> > > > > Images can't be embedded in mail here; to share an image you need a third-party image host.
> > > > > Or you could just paste the text directly?
> > > > >
> > > > > Kevin Liao wrote on Thu, Jan 9, 2020 at 7:10 PM:
> > > > >
> > > > > > [image: B40C260D-DCC3-4B7D-A024-3839803C2234.png]
> > > > > >
> > > > > > Benchao Li wrote on Thu, Jan 9, 2020 at 6:42 PM:
> > > > > >
> > > > > >> hi Kevin,
> > > > > >>
> > > > > >> Could you paste the code at MyMapFunction2.java:39? From the log above I can't tell that the valueState is null.
> > > > > >>
> > > > > >> Kevin Liao wrote on Thu, Jan 9, 2020 at 5:57 PM:
> > > > > >>
> > > > > >> > This morning I found the job misbehaving; the task kept restarting. Checking the jm log, the earliest error was this:
> > > > > >> >
> > > > > >> > ```
> > > > > >> > 2020-01-09 05:14:04.087
> [flink-akka.actor.default-dispatcher-28]
> > > > INFO
> > > > > >> >  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Map
> > ->
> > > > > >> Filter ->
> > > > > >> > Sink: Unnamed (3/6) (d0e6c4a05d0274c18a4a3df41ab5ff1b)
> switched
> > > from
> > > > > >> > RUNNING to FAILED.
> > > > > >> > java.lang.NullPointerException: null
> > > > > >> > at com.sogou.qidian.MyMapFunction2.map(MyMapFunction2.java:39)
> > > > > >> > at com.sogou.qidian.MyMapFunction2.map(MyMapFunction2.java:25)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
> > > > > >> > at
> > > > > >> > org.apache.flink.streaming.runtime.io
> > > > > >> >
> > > > > >>
> > > > >
> > >
> .StreamOneInputProcessor.processElement(StreamOneInputProcessor.java:164)
> > > > > >> > at
> > > > > >> > org.apache.flink.streaming.runtime.io
> > > > > >> >
> > > > >
> > .StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:143)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:279)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:301)
> > > > > >> > at
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:406)
> > > > > >> > at
> > org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
> > > > > >> > at
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
> > > > > >> > at java.lang.Thread.run(Thread.java:748)
> > > > > >> > 2020-01-09 05:14:04.088
> [flink-akka.actor.default-dispatcher-28]
> > > > INFO
> > > > > >> >  o.a.f.r.e.failover.flip1.RestartPipelinedRegionStrategy  -
> > > > > Calculating
> > > > > >> > tasks to restart to recover the failed task
> > > > > >> > 90bea66de1c231edf33913ecd54406c1_2.
> > > > > >> > 2020-01-09 05:14:04.088
> [flink-akka.actor.default-dispatcher-28]
> > > > INFO
> > > > > >> >  o.a.f.r.e.failover.flip1.RestartPipelinedRegionStrategy  - 12
> > > tasks
> > > > > >> should
> > > > > >> > be restarted to recover the failed task
> > > > > >> 

Re: Flink logging issue with logback

2020-01-09 Thread Maximilian Michels

FYI, there is also a PR: https://github.com/apache/flink/pull/10811

On 09.01.20 01:53, Bajaj, Abhinav wrote:

Thanks Dawid, Max and Yang for confirming the issue and providing a potential
workaround.

On 1/8/20, 3:24 AM, "Maximilian Michels" wrote:

 Interesting that we came across this problem at the same time. We have
 observed this with Lyft's K8s operator which uses the Rest API for job
 submission, much like the Flink dashboard.
 
 Note that you can restore the original stdout/stderr in your program:
 
private static void restoreStdOutAndStdErr() {
  System.setOut(new PrintStream(
      new FileOutputStream(FileDescriptor.out)));
  System.setErr(new PrintStream(
      new FileOutputStream(FileDescriptor.err)));
}
 
 Just call restoreStdOutAndStdErr() before you start building the Flink
 job. Of course, this is just meant to be a workaround.
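
 A minimal usage sketch, assuming the helper above sits in the same job
 class (the pipeline itself is just a placeholder):

```java
// Assumed to live in the same class as restoreStdOutAndStdErr() above.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public static void main(String[] args) throws Exception {
    restoreStdOutAndStdErr(); // undo the capture before building the job

    StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
    env.fromElements(1, 2, 3)
       .map(i -> i * 2)
       .print(); // now actually reaches the original stdout
    env.execute("stdout-restore-example");
}
```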
 
 I think an acceptable solution is to always print upon execution. For
 the plan preview we may keep the existing behavior.
 
 Cheers,

 Max
 
 On 07.01.20 17:39, Dawid Wysakowicz wrote:

 > A quick update. The suppression of stdout/stderr actually might soon be
 > dropped, see: https://issues.apache.org/jira/browse/FLINK-15504
 >
 > Best,
 >
 > Dawid
 >
 > On 07/01/2020 07:17, Yang Wang wrote:
 >> Hi Bajaj,
 >>
 >> I have tested just as you said, and found that logs from the user
 >> class do not show up when
 >> using ConsoleAppender. With FileAppender instead, everything works
 >> fine.
 >>
 >> It is quite weird and I have no idea how to debug it.
 >> Best,
 >> Yang
 >>
 >> Bajaj, Abhinav wrote on Tue, Jan 7, 2020 at 4:28 AM:
 >>
 >> Hi,
 >>
 >> Thanks much for the responses.
 >>
 >> Let me add some more details and clarify my question.
 >>
 >> _Setup_
 >>
 >>   * I used the WikipediaAnalysis example and added a log in main
 >> method.
 >>
 >> ……
 >>
 >> public static void main(String[] args) throws Exception {
 >>   StreamExecutionEnvironment see =
 >> StreamExecutionEnvironment.getExecutionEnvironment();
 >> LOG.info("Info log for test");
 >>
 >> DataStream<WikipediaEditEvent> edits = see.addSource(new
 >> WikipediaEditsSource());
 >>
 >> ……
 >>
 >>   * I am using the Flink 1.7.1 distribution and starting
 >> jobmanager and taskmanager locally using the below commands –
 >>   o ./bin/jobmanager.sh start-foreground
 >>   o ./bin/taskmanager.sh start-foreground
 >>   o Both jobmanager and taskmanager log in the console now
 >>   o JVM options are correctly set and verified from jobmanager
 >> & taskmanager logs
 >>
 >>   * I submit the WikipediaAnalysis job from Flink dashboard and
 >> checked the jobmanager logs
 >>
 >> _Run 1_: Flink is using the default log4j logging
 >>
 >>   * Jobmanager logs the added info log from the job
 >>   o 2020-01-06 11:55:37,422 INFO wikiedits.WikipediaAnalysis -
 >> Info log for test
 >>
 >> _Run 2_: Flink is set up to use logback as suggested in the Flink
 >> documentation here
 >> 

 >>
 >>   * Jobmanger does not log the added info log from the job
 >>
 >> So, it seems there is a logging behavior difference between using
 >> log4j & logback in Flink.
 >>
 >> Is this expected or a known difference?
 >>
 >> Thanks again,
 >>
 >> Abhinav Bajaj
 >>
 >> _PS_: Ahh. I see how my email was confusing the first time.
 >> Hopefully this one is better :P
 >>
 >> *From: *Dawid Wysakowicz
 >> *Date: *Monday, January 6, 2020 at 5:13 AM
 >> *Cc: *"Bajaj, Abhinav", "user@flink.apache.org"
 >> *Subject: *Re: Flink logging issue with logback
 >>
 >> Hi Bajaj,
 >>
 >> I am not entirely sure what is the actual issue you are seeking
 >> 

Re: Completed job wasn't saved to archive

2020-01-09 Thread Rong Rong
Hi Pavel,

Sorry for bringing this thread up so late. I was digging into the usage of
the Flink history server and I found one situation where there would be no
logs and no failure/success message from the cluster:
In a very rare case in our Flink-YARN session cluster, if an application
master (the container running the JobManager) fails and is restarted as a
YARN 2nd attempt (we haven't enabled HA), then no archiving logs show up
whatsoever. However, in this case a completely new AM container is brought
up running the JM again (i.e. with new log files).

I am not exactly sure whether this suits your scenario. Could you
describe a bit more how your cluster was configured?

Thanks,
Rong

On Mon, Nov 25, 2019 at 10:49 AM Chesnay Schepler wrote:

> I'm afraid I can't think of a solution. I don't see how this
> operation could succeed or fail without anything being logged.
>
> Is the cluster behaving normally afterwards? Could you check whether the
> numRunningJobs metric ticks down properly after the job was canceled?
>
>
> On 22/11/2019 13:27, Pavel Potseluev wrote:
>
> Hi Chesnay,
>
> We archive jobs to an S3 file system. We don't configure any throttling for
> write operations; afaik that isn't possible yet and will be implemented in
> FLINK-13251 <https://issues.apache.org/jira/browse/FLINK-13251>. And
> other write operations (like checkpoint saving) work fine. But I don't see
> the archived job or any message about an archiving failure at all. It looks
> like Flink just didn't try to save the job to the archive.
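
For reference, these are the settings that drive job archiving and the
history server; a flink-conf.yaml sketch, with the bucket path made up:

```
# The JobManager writes the archive of each completed job here
jobmanager.archive.fs.dir: s3://my-bucket/flink/completed-jobs
# The history server polls the same directory
historyserver.archive.fs.dir: s3://my-bucket/flink/completed-jobs
historyserver.archive.fs.refresh-interval: 10000
```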
>
> 21.11.2019, 17:17, "Chesnay Schepler" wrote:
>
> If the archiving fails there should be some log message, like "Failed to
> archive job" or "Could not archive completed job..." .
> If nothing of the sort is logged my first instinct would be that the
> operation is being slowed down, _a lot_.
>
> Where are you archiving them to? Could it be the write operation is being
> throttled heavily?
>
> On 21/11/2019 13:48, Pavel Potseluev wrote:
>
> Hi Vino,
>
> Usually Flink archives jobs correctly and the problem is rarely
> reproduced. So I think it isn't a problem with configuration.
>
> Job Manager log when job 5ec264a20bb5005cdbd8e23a5e59f136 was canceled:
>
> 771a4992-d694-d2a4-b49a-d4eb382086e5 2019-11-18 18:52:13.294 [Checkpoint
> Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  -
> Triggering checkpoint 1872 @ 1574092333218 for job
> 5ec264a20bb5005cdbd8e23a5e59f136.
>
> 771a4992-d694-d2a4-b49a-d4eb382086e5 2019-11-18 18:52:37.260
> [flink-akka.actor.default-dispatcher-30] INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Completed
> checkpoint 1872 for job 5ec264a20bb5005cdbd8e23a5e59f136 (568048140 bytes
> in 23541 ms).
>
> 771a4992-d694-d2a4-b49a-d4eb382086e5 2019-11-18 18:53:13.314 [Checkpoint
> Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  -
> Triggering checkpoint 1873 @ 1574092393218 for job
> 5ec264a20bb5005cdbd8e23a5e59f136.
>
> 771a4992-d694-d2a4-b49a-d4eb382086e5 2019-11-18 18:53:19.279
> [flink-akka.actor.default-dispatcher-40] INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph  - Job
> bureau-user-offers-statistics-AUTORU-USERS_AUTORU
> (5ec264a20bb5005cdbd8e23a5e59f136) switched from state RUNNING to
> CANCELLING.
>
> 771a4992-d694-d2a4-b49a-d4eb382086e5 2019-11-18 18:53:19.279
> [flink-akka.actor.default-dispatcher-40] INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
> File Source (1/1) (934d89cf3d7999b40225dd8009b5493c) switched from RUNNING
> to CANCELING.
>
> 771a4992-d694-d2a4-b49a-d4eb382086e5 2019-11-18 18:53:19.280
> [flink-akka.actor.default-dispatcher-40] INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source:
> kafka-source-moderation-update-journal-autoru -> Filter -> Flat Map (1/2)
> (47656a3c4fc70e19622acca31267e41f) switched from RUNNING to CANCELING.
>
> 771a4992-d694-d2a4-b49a-d4eb382086e5 2019-11-18 18:53:19.280
> [flink-akka.actor.default-dispatcher-40] INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source:
> kafka-source-moderation-update-journal-autoru -> Filter -> Flat Map (2/2)
> (be3c4562e65d3d6bdfda4f1632017c6c) switched from RUNNING to CANCELING.
>
> 771a4992-d694-d2a4-b49a-d4eb382086e5 2019-11-18 18:53:19.280
> [flink-akka.actor.default-dispatcher-40] INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph  -
> user-offers-statistics-init-from-file -> Map (1/2)
> (4a45ed43b05e4d444e190a44b33514ac) switched from RUNNING to CANCELING.
>
> 771a4992-d694-d2a4-b49a-d4eb382086e5 2019-11-18 18:53:19.280
> [flink-akka.actor.default-dispatcher-40] INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph  -
> user-offers-statistics-init-from-file -> Map (2/2)
> (bb3be311c5e53abaedb06b4d0148c23f) switched from RUNNING to CANCELING.
>
> 771a4992-d694-d2a4-b49a-d4eb382086e5 2019-11-18 18:53:19.280
> [flink-akka.actor.default-dispatcher-40] INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph  - 

Re: flink hits an NPE on the valueState itself

2020-01-09 Thread Kevin Liao
Thanks for the reply, but I have one more question:

After the valueState's TTL fires and the state is reclaimed, shouldn't the
reference here also be GC'd and become null? Or is it that the thread the
operator originally used for this key gets reclaimed along with it, so the
next time this key arrives it is actually served by a newly created thread
(which would certainly require calling open() again)? I've read the code but
still haven't figured it out. Hoping for some clarification, thanks.

Benchao Li wrote on Thu, Jan 9, 2020 at 8:59 PM:

> I don't see how `uniqMark` could become `null` here, unless a `StreamMap`
> was deserialized somewhere without `StreamMap.open()` being called.
> But `StreamTask` seems to guarantee that `open` is called before the
> operator's processing methods, so I can't see anything wrong here either.
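
For readers following the thread, a minimal sketch of the pattern under
discussion (a hypothetical class, not the actual MyMapFunction2): the
ValueState handle created in open() stays non-null for the operator's
lifetime, and a state TTL only makes value() return null for an expired
key; it never nulls the handle, and no new thread or extra open() call is
involved.

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Must run on a keyed stream, e.g. stream.keyBy(...).flatMap(new UniqMarkFunction())
public class UniqMarkFunction extends RichFlatMapFunction<String, String> {

    private transient ValueState<Boolean> uniqMark; // the handle, not the value

    @Override
    public void open(Configuration parameters) {
        ValueStateDescriptor<Boolean> desc =
                new ValueStateDescriptor<>("uniqMark", Boolean.class);
        // TTL reclaims the stored value per key after one day.
        desc.enableTimeToLive(StateTtlConfig.newBuilder(Time.days(1)).build());
        uniqMark = getRuntimeContext().getState(desc); // never null after open()
    }

    @Override
    public void flatMap(String value, Collector<String> out) throws Exception {
        // After TTL expiry, uniqMark.value() is null again; guard the value,
        // not the handle.
        if (uniqMark.value() == null) {
            uniqMark.update(Boolean.TRUE);
            out.collect(value); // first occurrence (or expired) of this key
        }
    }
}
```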

Re: UnsupportedOperationException from org.apache.flink.shaded.asm6.org.objectweb.asm.ClassVisitor.visitNestHostExperimental using Java 11

2020-01-09 Thread KristoffSC
Hi,
are there any plans to support Java 11?

Thanks,
Krzysztof



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Job Cluster vs Session Cluster deploying and configuration

2020-01-09 Thread KristoffSC
Hi all,
I'm researching docker/k8s deployment possibilities for Flink 1.9.1.

I'm after reading/watching [1][2][3][4].

Currently we think we will go with the Job Cluster approach, although we'd
like to know what the community trend is here. We'd rather not deploy more
than one job per Flink cluster.

Anyways, I was wondering about few things:

1. How can I change the number of task slots per task manager for Job and
Session Cluster? In my case I'm running docker on VirtualBox where I have 4
CPUs assigned to this machine. However, for the Job Cluster each task manager
is spawned with only one task slot. With the Session Cluster, on the same
machine, each task manager is spawned with 4 task slots.

In both cases Flink's UI shows that each Task manager has 4 CPUs.
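
For reference, the slot count is not derived from the CPUs the UI shows; it
comes from configuration. A minimal flink-conf.yaml sketch, value assumed:

```
# Each TaskManager offers this many slots, regardless of visible CPUs
taskmanager.numberOfTaskSlots: 4
```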


2. How can I resubmit a job if I'm using a Job Cluster? I'm referring to the
use case in [5]. Say I have to start the job again but with different
arguments. What is the procedure for this? I'm using checkpoints, btw.

Should I kill all task manager containers and rerun them with different
parameters?

3. How can I resubmit a job using a Session Cluster?

4. How can I provide a log config for a Job/Session cluster?
I have a case where I changed the log level and log format in log4j.properties,
and this works fine in the local (IDE) environment. However, when I build the
fat jar and run a Job Cluster based on this jar, it seems my log4j properties
are not passed to the cluster. I see the original format and the original
(INFO) level.
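
For what it's worth, the cluster processes in the stock images read their
logging configuration from the container's conf/ directory (e.g.
conf/log4j-console.properties), not from files packaged inside the job jar.
A sketch in the log4j 1.x style used there; level and pattern are assumed:

```
log4j.rootLogger=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```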

Thanks,


[1] https://youtu.be/w721NI-mtAA
[2] https://youtu.be/WeHuTRwicSw
[3]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/docker.html
[4]
https://github.com/apache/flink/blob/release-1.9/flink-container/docker/README.md
[5]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Job-claster-scalability-td32027.html



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Flink Job cluster scalability

2020-01-09 Thread KristoffSC
Thank you David and Zhu Zhu,
this helps a lot.

I have follow up questions though.

Having this
"Instead the Job must be stopped via a savepoint and restarted with a new
parallelism"

and the slot sharing feature [1], I got the impression that if I started my
cluster with more than 6 task slots, Flink would try to deploy tasks across
all available resources during job submission.

I did two tests with my original task.
1. I started a Job Cluster with 7 task slots (7 task managers, since in this
case 1 task manager has one task slot).
2. I started a Session Cluster with 28 task slots in total. In this case I
had 7 task managers, with 4 task slots each.

For case 1, I used the "FLINK_JOB" variable as stated in [2]. For case 2, I
submitted my job from the UI once Flink was up.


In both cases the job used only 6 task slots, so it was still reusing task
slots. I had gotten the impression that it would try to use as many of the
available resources as it could.

What do you think about this?


[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/concepts/runtime.html#task-slots-and-resources
[2]
https://github.com/apache/flink/blob/release-1.9/flink-container/docker/README.md








--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: flink hits an NPE on the valueState itself

2020-01-09 Thread Benchao Li
I don't see how `uniqMark` could become `null` here, unless a `StreamMap`
was deserialized somewhere without `StreamMap.open()` being called.
But `StreamTask` seems to guarantee that `open` is called before the
operator's processing methods, so I can't see anything wrong here either.

Kevin Liao wrote on Thu, Jan 9, 2020 at 8:15 PM:

> https://tva4.sinaimg.cn/large/63137227ly1gaqkn1nlykj20mm0wvgq8.jpg
>
> Sorry, try this one.
>

Re: flink hits an NPE on the valueState itself

2020-01-09 Thread Kevin Liao
https://tva4.sinaimg.cn/large/63137227ly1gaqkn1nlykj20mm0wvgq8.jpg

Sorry, try this one.

Benchao Li wrote on Thu, Jan 9, 2020 at 8:13 PM:

> It opens as 403 Forbidden for me
>

Re: flink hits an NPE on the valueState itself

2020-01-09 Thread Benchao Li
It opens as 403 Forbidden for me

Kevin Liao wrote on Thu, Jan 9, 2020 at 8:09 PM:

>
> https://gm1.ggpht.com/FZGtbLggyPPZ_BoU0gt2SQTv7fyhNOKu03ZjsKq7G6DtqWJ5DY0NmL-2s64P-LUzedbTm8DE_FeggNtPAb4VEmypAoPfW8VFSFxOWxMGBvMi5G6xHoZ3THKPYHnAj8KydQ02OjvV-R3IEwBvwIDnZRmwCv3ohyjPF76gbvOOYrKzgaLb_pykWsQDpvROHr3lgU2rezH33Jt3xJEOjXGjHsUFUxiil0PYkQFdA0BP77lypYQLw4RL8BxMz3HfaCiNAGb_q5w8JNmckHLU3g9EuPgtqj6WP3XDv07PBuCXMvmfNcFbAciMeJuOOeE8VBqDCacjuiDtJzVrK1boxcBnzFvT_QazOwaJ27SSuJ_u5KCerTURen2vLBF1RN-x9eOVz9wg6w1oXyMAF7LMjGsYsVzUu3It5AyzLkm-_znosNtAJp2AW_qGmGo-k02fcrMjUoELiGvqn6W1kScnFI4gNWi_dpZe0Uoq1zF2m1crww1oNGOeRjFlCK_-iC19CPfsTVCtwN3tdKnaKdLe2TbfVdFA0DnBUz8NrhV-mvmZlEwi9-ngK-WOy8yjA4fin1zaE2SJCf2zfBSZwGR2eY_E_WZQiFRmSBI2A7vpoyFvTV3E99MIi0MC5PUAeRiu4v4JIVDkV_yUGIUvoa7pxdf7LpZN_DbikQVk7yES8kxxL5qG2Eae8vftWJuBVi5mWTxvElBgInyUntobXHdxfb2YR4JdBgVPN5QionJiIc9g5i0ClGECZbyHPbsQy4pEVw=s0-l75-ft-l75-ft
>
> Thanks, see if you can view this one
>

Re: flink hits an NPE on the valueState itself

2020-01-09 Thread Kevin Liao
https://gm1.ggpht.com/FZGtbLggyPPZ_BoU0gt2SQTv7fyhNOKu03ZjsKq7G6DtqWJ5DY0NmL-2s64P-LUzedbTm8DE_FeggNtPAb4VEmypAoPfW8VFSFxOWxMGBvMi5G6xHoZ3THKPYHnAj8KydQ02OjvV-R3IEwBvwIDnZRmwCv3ohyjPF76gbvOOYrKzgaLb_pykWsQDpvROHr3lgU2rezH33Jt3xJEOjXGjHsUFUxiil0PYkQFdA0BP77lypYQLw4RL8BxMz3HfaCiNAGb_q5w8JNmckHLU3g9EuPgtqj6WP3XDv07PBuCXMvmfNcFbAciMeJuOOeE8VBqDCacjuiDtJzVrK1boxcBnzFvT_QazOwaJ27SSuJ_u5KCerTURen2vLBF1RN-x9eOVz9wg6w1oXyMAF7LMjGsYsVzUu3It5AyzLkm-_znosNtAJp2AW_qGmGo-k02fcrMjUoELiGvqn6W1kScnFI4gNWi_dpZe0Uoq1zF2m1crww1oNGOeRjFlCK_-iC19CPfsTVCtwN3tdKnaKdLe2TbfVdFA0DnBUz8NrhV-mvmZlEwi9-ngK-WOy8yjA4fin1zaE2SJCf2zfBSZwGR2eY_E_WZQiFRmSBI2A7vpoyFvTV3E99MIi0MC5PUAeRiu4v4JIVDkV_yUGIUvoa7pxdf7LpZN_DbikQVk7yES8kxxL5qG2Eae8vftWJuBVi5mWTxvElBgInyUntobXHdxfb2YR4JdBgVPN5QionJiIc9g5i0ClGECZbyHPbsQy4pEVw=s0-l75-ft-l75-ft

Thanks, see if you can view this one.

Benchao Li wrote on Thu, Jan 9, 2020 at 8:07 PM:

> hi Kevin,
>
> Images don't come through on the mailing list; to share one you'd need a
> third-party image hosting service.
> Or you could just paste the text?
>

Re: flink hits an NPE on the valueState itself

2020-01-09 Thread Benchao Li
hi Kevin,

Images don't come through on the mailing list; to share one you'd need a
third-party image hosting service.
Or you could just paste the text?

Kevin Liao wrote on Thu, Jan 9, 2020 at 7:10 PM:

> [image: B40C260D-DCC3-4B7D-A024-3839803C2234.png]
>

Re: Flink Job cluster scalability

2020-01-09 Thread David Maddison
Hi KristoffSC,

As Zhu Zhu explained, Flink does not currently auto-scale a Job as new
resources become available. Instead the Job must be stopped via a savepoint
and restarted with a new parallelism (the old rescale CLI experiment use to
perform this).

Making Flink reactive to new resources and auto scaling jobs is something
I'm currently very interested in. An approach on how to change Flink to
support this has been previously outlined/discussed in FLINK-10407 (
https://issues.apache.org/jira/browse/FLINK-10407)

/David/

On Thu, Jan 9, 2020 at 7:38 AM Zhu Zhu  wrote:

> Hi KristoffSC,
>
> Each task needs a slot to run. However, Flink enables slot sharing[1] by
> default so that one slot can host one parallel instance of each task in a
> job. That's why your job can start with 6 slots.
> However, different parallel instances of the same task cannot share a
> slot. That's why you need at least 6 slots to run your job.
>
> You can set tasks to be in different slot sharing group via
> '.slotSharingGroup(xxx)' to force certain tasks to not share slots. This
> allows the tasks to not burden each other. However, in this way the job
> will need more slots to start.
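
A minimal sketch of the API mentioned above, with hypothetical
source/mapper/sink classes; everything from .slotSharingGroup("heavy")
downstream lands in its own group and no longer shares slots with the
source:

```java
StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();

env.addSource(new MySource())     // stays in the "default" sharing group
   .map(new HeavyMapper())
   .slotSharingGroup("heavy")     // this operator and everything downstream
   .addSink(new MySink());        // inherit the "heavy" group

env.execute("slot-sharing-example");
```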
>
> So for your questions:
> #1 yes
> #2 ATM, you will need to resubmit your job with the adjusted parallelism.
> The rescale cli was experimental and was temporarily removed [2]
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/concepts/runtime.html#task-slots-and-resources
> [2]
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DISCUSS-Temporarily-remove-support-for-job-rescaling-via-CLI-action-quot-modify-quot-td27447.html
>
> Thanks,
> Zhu Zhu
>
> KristoffSC wrote on Thu, Jan 9, 2020 at 1:05 AM:
>
>> Hi all,
>> I must say I'm very impressed by Flink and what it can do.
>>
>> I was trying to play around with Flink operator parallelism and
>> scalability
>> and I have few questions regarding this subject.
>>
>> My setup is:
>> 1. Flink 1.9.1
>> 2. Docker Job Cluster, where each Task manager has only one task slot. I'm
>> following [1]
>> 3. env setup:
>> env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1000, 1000));
>> env.setParallelism(1);
>> env.setMaxParallelism(128);
>> env.enableCheckpointing(10 * 60 * 1000);
>>
>> Please mind that I am using operator chaining here.
>>
>> My pipeline setup:
>> <
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t2311/Capture2.png>
>>
>>
>>
>> As you can see I have 7 operators (a few of them were actually chained,
>> which is ok), with different parallelism levels. This all gives me 23
>> tasks total.
>>
>>
>> I've noticed that with the "one task manager = one task slot" approach I
>> have to have 6 task slots/task managers to be able to start this pipeline.
>>
>> <
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t2311/Capture1.png>
>>
>>
>> If the number of task slots is lower than 6, the job is scheduled but not
>> started.
>>
>> With 6 task slots everything works fine, and I must say I'm very
>> impressed with the way Flink balanced data between task slots. Data was
>> distributed very evenly between operator instances/tasks.
>>
>> In this setup (7 operators, 23 tasks and 6 task slots), some task slots
>> have to be reused by more than one operator. While inspecting the UI I
>> found examples of such operators. This is what I was expecting, though.
>>
>> However, I was a little surprised after I added one additional task
>> manager (hence one new task slot).
>>
>> <
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t2311/Capture3.png>
>>
>>
>> After adding new resources, Flink did not rebalance/redistribute the
>> graph. So this host was sitting there doing nothing. Even after putting
>> some load on the cluster, this node was still not used.
>>
>>
>> *After doing this exercise I have few questions:*
>>
>> 1. It seems that the number of task slots must be equal to or greater
>> than the maximum parallelism used in the pipeline. In my case it was 6.
>> When I changed the parallelism of one of the operators to 7, I had to
>> have 7 task slots (task managers in my setup) to be able to even start
>> the job. Is this the case?
>>
>> 2. What can I do to use the extra node that was spawned while the job
>> was running?
>> In other words, if I see that one of my nodes is under too much load,
>> what can I do? Please mind that I'm using a keyBy/hashing function in my
>> pipeline, and in my tests I had around 5000 unique keys.
>>
>> I've try to use REST API to call "rescale" but I got this response:
>> /302{"errors":["Rescaling is temporarily disabled. See FLINK-12312."]}/
>>
>> Thanks.
>>
>> [1]
>>
>> https://github.com/apache/flink/blob/release-1.9/flink-container/docker/README.md
>>
>>
>>
>> --
>> Sent from:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>>
>


Re: flink hits an NPE on the valueState itself

2020-01-09 Thread Kevin Liao
[image: B40C260D-DCC3-4B7D-A024-3839803C2234.png]

Benchao Li wrote on Thu, Jan 9, 2020 at 6:42 PM:

> hi Kevin,
>
> Could you paste the code at MyMapFunction2.java:39? From the log above, I
> can't tell that it's the valueState that is null.
>

Re: How to catch up on earlier data after a streaming job fails

2020-01-09 Thread Px New
After a rollback, the TaskManager fetches the persisted snapshot, and the
Source replays from the point at which the checkpoint was taken,
regardless of which notion of time you use.


Dian Fu wrote on Thu, Nov 14, 2019 at 1:14 PM:

> If you're using event time, watermarks are computed from the events
> themselves and have nothing to do with system time, so restoring from the
> last checkpoint is enough. Why do you think there would be a problem?
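
A minimal sketch of event-time assignment with the 1.9-era API, assuming a
hypothetical Event type; timestamps and watermarks are derived from the
replayed records themselves, so they are simply recomputed after a
checkpoint restore:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

DataStream<Event> withTimestamps = events.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(5)) {
            @Override
            public long extractTimestamp(Event e) {
                return e.getEventTime(); // event time, not system time
            }
        });
```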
>
> > On Nov 13, 2019, at 8:29 PM, 柯桂强 wrote:
> >
> >
> 我现在有一个流处理任务失败了,并且保留了checkpoint或者savepoint,我希望从最后一次checkpoint恢复,但是任务使用的是事件时间,超过窗口的数据就会被丢弃,我想到一个方法是,重启之前的数据通过批处理完成然后跑流处理,想问问大家这个方案是否可行,但是感觉如何限定批处理的范围并且和之后的流处理完美拼接是一个比较难的问题
>
>


Re: flink hits an NPE on the valueState itself

2020-01-09 Thread Benchao Li
hi Kevin,

Could you paste the code at MyMapFunction2.java:39? From the log above, I
can't tell that it's the valueState that is null.

Kevin Liao wrote on Thu, Jan 9, 2020 at 5:57 PM:

> This morning I noticed the job misbehaving: tasks kept restarting. I
> checked the JM logs; the earliest error was this:
>
> ```
> 2020-01-09 05:14:04.087 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Map -> Filter ->
> Sink: Unnamed (3/6) (d0e6c4a05d0274c18a4a3df41ab5ff1b) switched from
> RUNNING to FAILED.
> java.lang.NullPointerException: null
> at com.sogou.qidian.MyMapFunction2.map(MyMapFunction2.java:39)
> at com.sogou.qidian.MyMapFunction2.map(MyMapFunction2.java:25)
> at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
> at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processElement(StreamOneInputProcessor.java:164)
> at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:143)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:279)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:301)
> at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:406)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
> at java.lang.Thread.run(Thread.java:748)
> 2020-01-09 05:14:04.088 [flink-akka.actor.default-dispatcher-28] INFO
>  o.a.f.r.e.failover.flip1.RestartPipelinedRegionStrategy  - Calculating
> tasks to restart to recover the failed task
> 90bea66de1c231edf33913ecd54406c1_2.
> 2020-01-09 05:14:04.088 [flink-akka.actor.default-dispatcher-28] INFO
>  o.a.f.r.e.failover.flip1.RestartPipelinedRegionStrategy  - 12 tasks should
> be restarted to recover the failed task 90bea66de1c231edf33913ecd54406c1_2.
> 2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
> Source -> Filter (6/6) (ac52050e60236cd1efcd262c8240cd25) switched from
> RUNNING to CANCELING.
> 2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
> Source -> Filter (5/6) (cf4ff0c189315b27e7e2178d8c60e49f) switched from
> RUNNING to CANCELING.
> 2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
> Source -> Filter (4/6) (8c8b07cb39a3f682f41f102e614765e6) switched from
> RUNNING to CANCELING.
> 2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
> Source -> Filter (3/6) (34aadddbffe9f61b1916bcd1427ced96) switched from
> RUNNING to CANCELING.
> 2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Map -> Filter ->
> Sink: Unnamed (6/6) (de624cf5c9d4dec6fe68d4800c701457) switched from
> RUNNING to CANCELING.
> 2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Map -> Filter ->
> Sink: Unnamed (4/6) (e52c1e70884a6599205f9e0f5b092bc0) switched from
> RUNNING to CANCELING.
> 2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Map -> Filter ->
> Sink: Unnamed (5/6) (60496dddb4bc885ee37a6025662080ad) switched from
> RUNNING to CANCELING.
> 2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Map -> Filter ->
> Sink: Unnamed (2/6) (bf8515b4f9e852182a5519102fe4fdf3) switched from
> RUNNING to CANCELING.
> 2020-01-09 05:14:04.090 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
> Source -> Filter (2/6) (bb14d5776c53babcc57edd65bf7159b0) switched from
> RUNNING to CANCELING.
> 2020-01-09 05:14:04.090 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
> Source -> Filter (1/6) (4c7cd6eaf5c3ca9c2b0db73e7d230a9e) switched from
> RUNNING to CANCELING.
> 2020-01-09 05:14:04.090 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Map -> Filter ->
> Sink: Unnamed (1/6) (4a157d98db2f2efad72035af279433ff) switched from
> RUNNING to CANCELING.
> 2020-01-09 05:14:04.096 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
> Source -> Filter (1/6) (4c7cd6eaf5c3ca9c2b0db73e7d230a9e) switched from
> CANCELING to CANCELED.
> 2020-01-09 05:14:04.101 [flink-akka.actor.default-dispatcher-28] INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
> Source -> Filter (2/6) (bb14d5776c53babcc57edd65bf7159b0) switched from
> CANCELING to 

Flink hits an NPE from valueState itself

2020-01-09 Thread Kevin Liao
This morning I noticed the job misbehaving: the tasks kept restarting. I checked the JM log, and the earliest error was the following:

```
2020-01-09 05:14:04.087 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Map -> Filter ->
Sink: Unnamed (3/6) (d0e6c4a05d0274c18a4a3df41ab5ff1b) switched from
RUNNING to FAILED.
java.lang.NullPointerException: null
at com.sogou.qidian.MyMapFunction2.map(MyMapFunction2.java:39)
at com.sogou.qidian.MyMapFunction2.map(MyMapFunction2.java:25)
at
org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
at
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processElement(StreamOneInputProcessor.java:164)
at
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:143)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:279)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:301)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:406)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
2020-01-09 05:14:04.088 [flink-akka.actor.default-dispatcher-28] INFO
 o.a.f.r.e.failover.flip1.RestartPipelinedRegionStrategy  - Calculating
tasks to restart to recover the failed task
90bea66de1c231edf33913ecd54406c1_2.
2020-01-09 05:14:04.088 [flink-akka.actor.default-dispatcher-28] INFO
 o.a.f.r.e.failover.flip1.RestartPipelinedRegionStrategy  - 12 tasks should
be restarted to recover the failed task 90bea66de1c231edf33913ecd54406c1_2.
2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
Source -> Filter (6/6) (ac52050e60236cd1efcd262c8240cd25) switched from
RUNNING to CANCELING.
2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
Source -> Filter (5/6) (cf4ff0c189315b27e7e2178d8c60e49f) switched from
RUNNING to CANCELING.
2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
Source -> Filter (4/6) (8c8b07cb39a3f682f41f102e614765e6) switched from
RUNNING to CANCELING.
2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
Source -> Filter (3/6) (34aadddbffe9f61b1916bcd1427ced96) switched from
RUNNING to CANCELING.
2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Map -> Filter ->
Sink: Unnamed (6/6) (de624cf5c9d4dec6fe68d4800c701457) switched from
RUNNING to CANCELING.
2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Map -> Filter ->
Sink: Unnamed (4/6) (e52c1e70884a6599205f9e0f5b092bc0) switched from
RUNNING to CANCELING.
2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Map -> Filter ->
Sink: Unnamed (5/6) (60496dddb4bc885ee37a6025662080ad) switched from
RUNNING to CANCELING.
2020-01-09 05:14:04.089 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Map -> Filter ->
Sink: Unnamed (2/6) (bf8515b4f9e852182a5519102fe4fdf3) switched from
RUNNING to CANCELING.
2020-01-09 05:14:04.090 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
Source -> Filter (2/6) (bb14d5776c53babcc57edd65bf7159b0) switched from
RUNNING to CANCELING.
2020-01-09 05:14:04.090 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
Source -> Filter (1/6) (4c7cd6eaf5c3ca9c2b0db73e7d230a9e) switched from
RUNNING to CANCELING.
2020-01-09 05:14:04.090 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Map -> Filter ->
Sink: Unnamed (1/6) (4a157d98db2f2efad72035af279433ff) switched from
RUNNING to CANCELING.
2020-01-09 05:14:04.096 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
Source -> Filter (1/6) (4c7cd6eaf5c3ca9c2b0db73e7d230a9e) switched from
CANCELING to CANCELED.
2020-01-09 05:14:04.101 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
Source -> Filter (2/6) (bb14d5776c53babcc57edd65bf7159b0) switched from
CANCELING to CANCELED.
2020-01-09 05:14:04.103 [flink-akka.actor.default-dispatcher-28] INFO
 org.apache.flink.runtime.executiongraph.ExecutionGraph  - Source: Custom
Source -> Filter (5/6) (cf4ff0c189315b27e7e2178d8c60e49f) switched from
CANCELING to CANCELED.
2020-01-09 05:14:04.115 

Re: How can I find out which key group belongs to which subtask

2020-01-09 Thread Congxian Qiu
Hi

If you just want to make sure certain keys go to the same subtask, would a
custom key selector[1] help?

For the key-group and subtask mapping, you can refer to
KeyGroupRangeAssignment[2] for more info, and for the max-parallelism logic
you can refer to the docs[3]; a small sketch follows the links below.

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/api_concepts.html#define-keys-using-key-selector-functions
[2]
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/state/KeyGroupRangeAssignment.java
[3]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/parallel.html#setting-the-maximum-parallelism
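
As a concrete illustration of [2], the sketch below (with made-up values for maxParallelism and parallelism) computes which key group a key falls into and which subtask owns that key group, using the same assignment logic Flink applies at runtime:

```java
import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class KeyGroupLookup {
    public static void main(String[] args) {
        int maxParallelism = 128;   // example value; see [3] for how it is set
        int parallelism = 6;        // example operator parallelism
        Object key = "user-42";     // whatever your KeySelector returns

        // Key -> key group: murmur hash of key.hashCode() modulo maxParallelism.
        int keyGroup = KeyGroupRangeAssignment.assignToKeyGroup(key, maxParallelism);

        // Key group -> subtask index: keyGroup * parallelism / maxParallelism.
        int subtask = KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup(
                maxParallelism, parallelism, keyGroup);

        System.out.println("key " + key + " -> key group " + keyGroup
                + " -> subtask " + subtask);
    }
}
```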

Best,
Congxian


杨东晓 wrote on Thu, Jan 9, 2020 at 7:47 AM:

> Hi, I'm trying to optimize a Flink 'keyby' ProcessFunction.
> Is there any way I can find out which key group a key belongs to,
> and, in turn, which subtask a key group belongs to?
> The motivation is that we want to force data records from upstream to go
> to the same TaskManager's downstream subtask, meaning that even when we
> use a keyed stream we still want no cross-JVM communication at run time.
> And if we can achieve that, can we also avoid the expensive cost of
> record serialization, since data would only be transferred within the
> same TaskManager JVM instance?
>
> Thanks.
>


When I use Flink 1.9.1 and produce data to Kafka 1.1.1, the StreamTask checkpoint fails.

2020-01-09 Thread ouywl

Hi all:

When I use Flink 1.9.1 and produce data to Kafka 1.1.1, the error in Log-1 below occurs. The code is:

input.addSink(new FlinkKafkaProducer(
        parameterTool.getRequired("bootstrap.servers"),
        parameterTool.getRequired("output-topic"),
        new KafkaEventDeSchema()));

Log-1:

2020-01-09 09:13:44,476 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 1 @ 1578561224466 for job d8827b3f4165b6ba27c8b59c7aa1a400.
2020-01-09 09:15:33,069 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Decline checkpoint 1 by task f643244ff791dbd3fbfb88bfafdf1872 of job d8827b3f4165b6ba27c8b59c7aa1a400 at ee8e6d8e92f9a59f578b1de2edd73537 @ producedata-taskmanager-d59d5cb7c-pv27j (dataPort=33361).
2020-01-09 09:15:33,070 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Discarding checkpoint 1 of job d8827b3f4165b6ba27c8b59c7aa1a400.
org.apache.flink.runtime.checkpoint.CheckpointException: Could not complete snapshot 1 for operator Sink: Unnamed (1/2). Failure reason: Checkpoint was declined.
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:431)
at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1282)
at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1216)
at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:872)
at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:777)
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:708)
at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:88)
at org.apache.flink.streaming.runtime.io.CheckpointBarrierAligner.processBarrier(CheckpointBarrierAligner.java:113)
at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:155)
at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:102)
at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.pollNextNullable(StreamTaskNetworkInput.java:47)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:135)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:279)
at org.apache.flink.streaming.runtime.tasks.StreamTask.run(StreamTask.java:301)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:406)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.streaming.connectors.kafka.FlinkKafkaException: Failed to send data to Kafka: Expiring 58 record(s) for k8s-test-data-0:120018 ms has passed since batch creation
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.checkErroneous(FlinkKafkaProducer.java:1196)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.flush(FlinkKafkaProducer.java:968)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.preCommit(FlinkKafkaProducer.java:892)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.preCommit(FlinkKafkaProducer.java:98)
at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.snapshotState(TwoPhaseCommitSinkFunction.java:311)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.snapshotState(FlinkKafkaProducer.java:973)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:90)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:399)
... 17 more
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 58 record(s) for k8s-test-data-0:120018 ms has passed since batch creation
2020-01-09 09:15:33,074 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph - Job producer data frequece (d8827b3f4165b6ba27c8b59c7aa1a400) switched from state RUNNING to FAILING.
org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold.
at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleTaskLevelCheckpointException(CheckpointFailureManager.java:87)
at
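
The root cause above is the producer-side TimeoutException: batches expired before they could be shipped to the broker, and the checkpoint's pre-commit flush surfaced the failure. As a hedged sketch (broker address, topic, and timeout values below are placeholders, and SimpleStringSchema stands in for the poster's KafkaEventDeSchema), one might give the producer more time before expiring batches:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class KafkaSinkSketch {
    public static void attachSink(DataStream<String> input) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker1:9092"); // placeholder
        // The "120018 ms has passed since batch creation" in the log is close
        // to the 120000 ms default of delivery.timeout.ms in newer Kafka
        // clients; raising it (and request.timeout.ms) gives slow or briefly
        // unreachable brokers more time before in-flight batches expire
        // mid-checkpoint.
        props.setProperty("delivery.timeout.ms", "300000");
        props.setProperty("request.timeout.ms", "60000");

        input.addSink(new FlinkKafkaProducer<>(
                "output-topic", new SimpleStringSchema(), props));
    }
}
```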

Data overflow in SpillingResettableMutableObjectIterator

2020-01-09 Thread Jian Cao
Hi all:
We are using Flink's iterations and found that
SpillingResettableMutableObjectIterator has a data overflow problem when
the number of elements in a single input exceeds Integer.MAX_VALUE.

The reason is that SpillingResettableMutableObjectIterator internally
tracks the total number of elements and the number of elements read so
far in two int fields (elementCount and currentElementNum); once the
element count exceeds Integer.MAX_VALUE, both counters overflow.

After an overflow, when the input is reset for the next iteration, the
data is either not read at all or only partially read.

Therefore, I suggest changing the type of these two fields of
SpillingResettableMutableObjectIterator
from int to long.
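
A minimal, self-contained sketch of the failure mode (plain Java, not the Flink source; the variable name just mirrors the field in the report):

```java
public class CounterOverflowSketch {
    public static void main(String[] args) {
        // An int counter, as in the current implementation:
        int elementCount = Integer.MAX_VALUE;
        elementCount++;                        // wraps around silently
        System.out.println(elementCount);      // -2147483648: any read loop
                                               // bounded by this now misbehaves

        // A long counter, as proposed:
        long fixedElementCount = (long) Integer.MAX_VALUE + 1;
        System.out.println(fixedElementCount); // 2147483648: still correct
    }
}
```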

Best regards.