Re: Effect of yn and ys on Flink 1.9 batch jobs

2019-12-25 Thread Xintong Song
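How much memory a slot needs depends on the specific job and can vary considerably between jobs. A slot's resource requirement is the sum of the resource requirements of all its operators, so if you know which operators your job uses, you can derive it from the operators' requirements. For the operators' default resource requirements, see [1]: it lists five "table.exec.resource.*" options, which you can also adjust to change how much memory the operators use. If you are not familiar with the operators your job uses, the simpler approach is to submit the job and search the logs for "Request slot with profile", which shows the slot's resource requirement. Thank you~ Xintong Song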
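
A minimal sketch of the log search suggested above. Only the marker string "Request slot with profile" comes from the thread; the function name and the sample lines are hypothetical placeholders and do not reproduce real Flink log output.

```python
MARKER = "Request slot with profile"  # marker string quoted in the reply above


def find_slot_requests(lines):
    """Return the lines (without trailing newline) that contain the marker."""
    return [line.rstrip("\n") for line in lines if MARKER in line]


# Placeholder lines for illustration only; NOT real Flink log output.
sample_log = [
    "placeholder line without the marker\n",
    "placeholder prefix Request slot with profile <details elided>\n",
]

matches = find_slot_requests(sample_log)
for line in matches:
    print(line)
```

In practice the same function could be fed an open log file, since iterating over a file object yields its lines.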

Re: question: jvm heap size per task?

2019-12-25 Thread Xintong Song
] and FLIP-56 [2] for more details. Another related effort is the pluggable slot manager [3], which allows pluggable resource scheduling strategies, such as launching task managers with customized resources according to the tasks' requirements. Thank you~ Xintong Song [1] https://cwiki.apache.org

Re: Effect of yn and ys on Flink 1.9 batch jobs

2019-12-24 Thread Xintong Song
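Hi faaron, In Flink 1.9 the -yn parameter should have no effect, and it has been removed in later versions. With your settings, each TM's memory stays at 30G while the number of slots per TM (-ys) goes from 5 to 10, which means the average memory per slot is halved. The SQL batch operators in Flink 1.9 have fixed requirements for Flink managed memory, so this change most likely left a single slot's managed memory unable to satisfy the operators' resource requirements. Thank you~ Xintong Song On Wed, Dec 25, 2019 at 11:09 AM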
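
The arithmetic behind "halved" can be sketched as follows, assuming for illustration that a TM's memory is split evenly across its slots. The function name is hypothetical, and the real per-slot managed memory is only a fraction of the TM total; the ratio is what matters here.

```python
def memory_per_slot_gb(tm_memory_gb, slots_per_tm):
    """Average memory per slot, assuming an even split across slots."""
    if slots_per_tm <= 0:
        raise ValueError("slots_per_tm must be positive")
    return tm_memory_gb / slots_per_tm


before = memory_per_slot_gb(30, 5)   # -ys 5 with a 30G TM
after = memory_per_slot_gb(30, 10)   # -ys 10 with the same 30G TM
print(before, after)
```

Whatever fraction of the TM is managed memory, doubling -ys at a fixed TM size leaves each slot with half of what it had.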

Re: The assigned slot bae00218c818157649eb9e3c533b86af_11 was removed

2019-12-24 Thread Xintong Song
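This is probably not the root cause. "slot was removed" is usually caused by the TM going down; you need to find the corresponding TM log to see why it died. Thank you~ Xintong Song On Tue, Dec 24, 2019 at 10:06 PM hiliuxg <736742...@qq.com> wrote: > Occasionally an allocated slot is suddenly removed, which makes the job restart, and I cannot tell what causes it. Neither CPU nor full GC shows anything abnormal. The exception is as follows: > > org.apache.flink.util.FlinkException: Th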

Re: Flink On K8s, build docker image very slowly, is there some way to make it faster?

2019-12-22 Thread Xintong Song
, change the line "FROM openjdk:8-jre-alpine" to point to a domestic or local image source. Thank you~ Xintong Song On Mon, Dec 23, 2019 at 2:46 PM LakeShen wrote: > Hi community, when I run the flink task on k8s, the first thing is that > to build the flink task jar

Re: Re: FLINK 1.9 + YARN + SessionWindows + large data volume + OOM after running for a while

2019-12-18 Thread Xintong Song
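, which applies to Flink 1.9 and earlier. The not-yet-released Flink 1.10 makes fairly large changes to resource configuration; if you are interested, you can check the relevant documentation once it is released. Thank you~ Xintong Song On Wed, Dec 18, 2019 at 4:49 PM USERNAME wrote: > @tonysong...@gmail.com Thanks for the reply > I looked at what the parameters mean, > taskmanager.memory.off-heap: > If set to true, the memory the TaskManager allocates for sorting, hash tables, and caching intermediate results is placed, relative to the JVM heap,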

Re: FLINK 1.9 + YARN + SessionWindows + large data volume + OOM after running for a while

2019-12-17 Thread Xintong Song
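This is not an OOM; the container exceeded its memory limit and was killed by YARN. JVM memory cannot be overused, because that would raise an OOM instead. So the overuse is more likely caused by increased RocksDB memory usage. Suggestions: 1. Add the following configuration: taskmanager.memory.off-heap: true taskmanager.memory.preallocate: false 2. If you already use the configuration above, or the problem persists after the change, try increasing the following option, whose default is 0.25 when unset: containerized.heap-cutoff-ratio Thank you~ Xintong Song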
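
A rough sketch of why raising containerized.heap-cutoff-ratio leaves more room for non-JVM memory such as RocksDB. The 0.25 default is from the reply above. The formula itself is an assumption based on my understanding of Flink 1.9 and earlier: the cutoff is the larger of ratio times container size and a 600 MB minimum (containerized.heap-cutoff-min). The function names are hypothetical.

```python
def container_cutoff_mb(container_mb, cutoff_ratio=0.25, cutoff_min_mb=600):
    """Memory held back from the JVM (assumed formula, see lead-in)."""
    return max(cutoff_min_mb, int(container_mb * cutoff_ratio))


def jvm_budget_mb(container_mb, cutoff_ratio=0.25, cutoff_min_mb=600):
    """Memory left for the JVM after the cutoff is subtracted."""
    return container_mb - container_cutoff_mb(
        container_mb, cutoff_ratio, cutoff_min_mb
    )


default_budget = jvm_budget_mb(4096)                   # ratio 0.25
raised_budget = jvm_budget_mb(4096, cutoff_ratio=0.5)  # larger cutoff
print(default_budget, raised_budget)
```

A larger ratio shrinks the JVM's share of the container, so native allocations have more headroom before YARN's limit is hit.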

Re: Questions about taskmanager.memory.off-heap and taskmanager.memory.preallocate

2019-12-16 Thread Xintong Song
, but will have to wait for the GC to be truly released. Thank you~ Xintong Song On Tue, Dec 17, 2019 at 12:30 PM Ethan Li wrote: > Thank you very much Xintong! It’s much clearer to me now. > > I am still on a standalone cluster setup. Before I was using 350GB on-heap > memory on a 378G

Re: Documentation tasks for release-1.10

2019-12-16 Thread Xintong Song
Thank you Kostas. Big +1 for keeping all the documentation related issues at one place. I've added the documentation task for resource management. Thank you~ Xintong Song On Mon, Dec 16, 2019 at 5:29 PM Kostas Kloudas wrote: > Hi all, > > With the feature-freeze for the rel

Re: [DISCUSS] Make Managed Memory always off-heap (Adjustment to FLIP-49)

2019-12-02 Thread Xintong Song
Sorry, I just realized that I sent my feedback to Jingsong's email address instead of the dev / user mailing list. Please find my comments below. Thank you~ Xintong Song On Wed, Nov 27, 2019 at 4:32 PM Xintong Song wrote: > As a participant of the discussion yesterday, I'm

Re: Error when submitting a Flink jar for deployment on YARN

2019-10-20 Thread Xintong Song
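From the error, the TM went down; finding the specific cause requires analyzing the TM log. It may be the same problem as in the reply above, or it may be caused by something else. Thank you~ Xintong Song On Mon, Oct 21, 2019 at 11:36 AM hery...@163.com wrote: > Reference: > http://mail-archives.apache.org/mod_mbox/flink-user-zh/201905.mbox/%3c2019052911134683852...@wsmtec.com%3E > > > > hery.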

Re: How to cap resource usage in Blink (per-job mode)

2019-10-20 Thread Xintong Song
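Hello, Blink's per-job mode requests resources on demand according to the job's resource requirements, so you cannot cap the resources of the whole job. The parameters you listed only control the resource cap of a single TM; if you lower that cap, the job's overall resource requirement does not change, and it simply requests more TMs. Thank you~ Xintong Song On Sat, Oct 19, 2019 at 3:56 PM 蒋涛涛 wrote: > Hi all, > > When I submit a job with Blink (per-job mode), how can I cap the job's resource usage? One job is using a very large number of YARN vcores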
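
The point about per-TM caps can be sketched as simple arithmetic, assuming a hypothetical job whose total demand is measured in vcores and TMs that are all the same size. The function name and the numbers are illustrative only.

```python
def task_managers_needed(total_vcores, vcores_per_tm):
    """Number of equally sized TMs needed to cover the job's total demand."""
    if vcores_per_tm <= 0:
        raise ValueError("vcores_per_tm must be positive")
    # Ceiling division without floating point.
    return -(-total_vcores // vcores_per_tm)


job_demand = 40  # hypothetical total vcores the job asks for
tms_large = task_managers_needed(job_demand, 8)  # larger TMs
tms_small = task_managers_needed(job_demand, 4)  # per-TM cap halved
print(tms_large, tms_small)
```

Halving the per-TM cap doubles the TM count, so the total requested from YARN stays the same.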

Re: Batch Job in a Flink 1.9 Standalone Cluster

2019-10-10 Thread Xintong Song
and optimizing performance. Thank you~ Xintong Song On Thu, Oct 10, 2019 at 7:55 PM Timothy Victor wrote: > After a batch job finishes in a flink standalone cluster, I notice that > the memory isn't freed up. I understand Flink uses its own memory > manager and just allocate

Re: How to control TaskManager resources in Flink 1.8

2019-10-08 Thread Xintong Song
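=' . Thank you~ Xintong Song On Tue, Oct 8, 2019 at 1:59 PM LakeShen wrote: > A Flink job cannot isolate CPU by itself. Thinking about it, for memory you can compute in advance how much the job will use from the parameters the user supplies, and likewise for vcores. > We are also preparing to limit the resources users can request. > > 龙逸尘 wrote on Mon, Oct 7, 2019 at 4:50 PM: > > > Dear community, > > I built a real-time computing platform. For legacy reasons, the Flink version currently in use is the community edition 1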

Re: How to run the test cases in the Flink code

2019-09-29 Thread Xintong Song
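First go into the directory of the module that contains the test, which in your example is flink-connectors\flink-connector-kafka-base\ and then run mvn -Dtest=KafkaProducerTestBase#testExactlyOnceCustomOperator test. -Dtest= can be followed by <class name>#<method name> to run a single test case, or by <class name> to run all test cases in that class. Thank you~ Xintong Song On Sun, Sep 29, 2019 at 4:32 PM gaofei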

Re: Job memory growth

2019-08-27 Thread Xintong Song
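This mailing list does not show image attachments. Text can be pasted directly; images need to be hosted externally and linked. Thank you~ Xintong Song On Tue, Aug 27, 2019 at 5:17 PM 张坤 wrote: > > Thanks for your reply. Checkpoints use RocksDB. From the GC log I now see the following: heap memory usage is normal and the thread count is around 500; threads are reclaimed, but the memory they occupied does not seem to be reclaimed. > > On 2019/8/27 at 5:02 PM, "Xintong Song" wrote: > > Are you using the heap state backe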

Re: Job memory growth

2019-08-27 Thread Xintong Song
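Are you using the heap state backend? Check whether the checkpoint size keeps growing; if so, the cause is most likely growing state. As the job runs and processes more and more data, the number of state keys grows and the state size grows with it. The fix is either to switch to RocksDB or to give the TM more memory to leave headroom for state growth. Also, if the checkpoint size keeps growing without leveling off, the state may be used incorrectly. If you find that state is not the problem, you may need to dump the TM's memory to see whether there is a memory leak somewhere. Thank you~ Xintong Song On Mon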

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-22 Thread Xintong Song
Congratulations! Thanks Gordon and Kurt for being the release managers, and thanks to all the contributors. Thank you~ Xintong Song On Thu, Aug 22, 2019 at 2:39 PM Yun Gao wrote: > Congratulations! > > Many thanks to Gordon and Kurt for managing the release and very

Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-14 Thread Xintong Song
Congratulations Andrey~! Thank you~ Xintong Song On Wed, Aug 14, 2019 at 3:31 PM Oytun Tez wrote: > Congratulations Andrey! > > I am glad the Flink committer team is growing at such a pace! > > --- > Oytun Tez > > *M O T A W O R D* > The World's Fastest Human

Re: Flink program,Full GC (System.gc())

2019-08-13 Thread Xintong Song
omatically, as long as there are continuous activities of creating / destroying objects in heap, e.g., due to heartbeats. Please refer to the Java garbage collection documentation [1] for more details. Thank you~ Xintong Song [1] https://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/

Re: some slots are not be available,when job is not running

2019-08-12 Thread Xintong Song
Hi, It would be good if you could provide the job manager and task manager log files, so that others can analyze the problem. Thank you~ Xintong Song On Mon, Aug 12, 2019 at 10:12 AM pengcheng...@bonc.com.cn < pengcheng...@bonc.com.cn> wrote: > Hi all, > some slots are not be av

Re: Re: Some TaskManager slots are unavailable even though no job is running

2019-08-12 Thread Xintong Song
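Your problem description is rather vague. It would be best to provide detailed information and logs so that others can help you. For example: which Flink version you use, which mode you run in (per-job / session), which environment you run in (standalone / YARN / Mesos / K8s), and how you determined that the slots were not released. Thank you~ Xintong Song On Mon, Aug 12, 2019 at 3:57 AM pengcheng...@bonc.com.cn < pengcheng...@bonc.com.cn> wrote: > Hello, thanks. The image indeed does not display correctly, not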

Re: Some TaskManager slots are unavailable even though no job is running

2019-08-09 Thread Xintong Song
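Hi, the image in your email does not display. Image attachments do not work well on the Flink mailing lists; for screenshots it is best to upload them elsewhere and paste the link. Thank you~ Xintong Song On Fri, Aug 9, 2019 at 10:06 AM pengcheng...@bonc.com.cn < pengcheng...@bonc.com.cn> wrote: > Hi everyone: > > Is anyone familiar with this situation? After a job finishes, some slots are not released. > > > As shown in the picture: > > > > > ---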

Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-07 Thread Xintong Song
Congratulations~! Thank you~ Xintong Song On Wed, Aug 7, 2019 at 4:00 PM vino yang wrote: > Congratulations! > > highfei2...@126.com wrote on Wed, Aug 7, 2019 at 7:09 PM: > > > Congrats Hequn! > > > > Best, > > Jeff Yang > > > > > > Origi

Re: Dynamically allocating right-sized task resources

2019-08-05 Thread Xintong Song
/projects/flink/flink-docs-release-1.8/concepts/runtime.html#task-slots-and-resources Thank you~ Xintong Song On Sun, Aug 4, 2019 at 9:40 PM Chad Dombrova wrote: > Hi all, > First time poster, so go easy on me :) > > What is Flink's story for accommodating task workloads with vastly

Re: Support priority of the Flink YARN application in Flink 1.9

2019-07-31 Thread Xintong Song
Thanks for bringing this up, Boxiu. The problem makes sense to me. My concern is whether we should limit the priorities to 1-9 or not. I think it would be good to open a jira issue and have the discussion there. Thank you~ Xintong Song On Wed, Jul 31, 2019 at 12:22 PM tian boxiu wrote

Re: Memory constrains running Flink on Kubernetes

2019-07-23 Thread Xintong Song
container OOM, then it should be non-Java memory used by RocksDB. [1] https://docs.oracle.com/javase/8/docs/api/java/lang/management/MemoryMXBean.html Thank you~ Xintong Song On Tue, Jul 23, 2019 at 8:42 PM wvl wrote: > Hi, > > We're running a relatively simple Flink applicatio

Re: Question in the tutorial

2019-07-12 Thread Xintong Song
all the new generated files under the log dir here? Thank you~ Xintong Song On Fri, Jul 12, 2019 at 2:25 PM Karthik Guru wrote: > Hello Xintong > > Thanks for your reply. > > 1)I have attached screenshots of the logs directory (Screenshot 1,2) > > 2) In conf/slaves, all

Re: Question in the tutorial

2019-07-11 Thread Xintong Song
by the cluster. - You can find the logs in 'log/flink-*-standalonesession-*.log'. - If you cannot find the problem from the logs, you can also post them in this ML for help. Thank you~ Xintong Song On Fri, Jul 12, 2019 at 1:57 AM Karthik Guru wrote: > Hey Flink team, > > Novic

Re: Apache Flink - Gauge implementation

2019-07-10 Thread Xintong Song
Hi Singh, Could your problem be solved by simply recording the previous value and subtracting it from the new value? Thank you~ Xintong Song On Wed, Jul 10, 2019 at 3:33 PM M Singh wrote: > Hi: > > I am working on a Flink application and need to collect the number of > events of
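
The suggestion above, remembering the previous value and reporting the difference, can be sketched in plain Python. This is not Flink's Gauge interface; the class name, the starting value of 0, and the counter used to drive it are all hypothetical, and the quoted question is truncated, so the exact requirement is assumed.

```python
class DeltaGauge:
    """Reports the change in a running total since the last read."""

    def __init__(self, read_total):
        self._read_total = read_total  # callable returning the running total
        self._previous = 0             # assumed starting point

    def get_value(self):
        current = self._read_total()
        delta = current - self._previous
        self._previous = current       # remember for the next read
        return delta


events = {"total": 0}  # stand-in for an ever-increasing event counter
gauge = DeltaGauge(lambda: events["total"])

events["total"] = 5
first = gauge.get_value()
events["total"] = 12
second = gauge.get_value()
third = gauge.get_value()  # nothing new since the last read
print(first, second, third)
```

One design consequence: the delta depends on how often the gauge is read, so a single reader should own it.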

Re: YarnResourceManager unresponsive under heavy containers allocations

2019-07-10 Thread Xintong Song
Thanks for the kind offer, Qi. I think this work should not take much time, so I can take care of it. It's just that the community is currently under feature freeze for release 1.9, so we need to wait until the release branch is cut. Thank you~ Xintong Song On Wed, Jul 10, 2019 at 1:55

Re: Unable to start task manager in debug mode

2019-07-08 Thread Xintong Song
`. Thank you~ Xintong Song On Tue, Jul 9, 2019 at 4:41 AM Vishwas Siravara wrote: > Hi guys, > I am not able to start a stand alone session with one task manager and one > job manager on the same node by adding debug option in flink-conf.yaml as > env.java.opts: > -agentlib

Re: Hot update of registered cached files

2019-07-04 Thread Xintong Song
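Hello, this should not be possible. Thank you~ Xintong Song On Thu, Jul 4, 2019 at 4:29 PM 戴嘉诚 wrote: > Hi all: > > I saw that in Flink you can register a distributed cache file with StreamExecutionEnvironment.registerCachedFile(), which is then distributed to every TM for operators to use. My question is: can it detect that the file has been updated and distribute it again? The IPs may change every day, so the IP database needs to be updated daily. > >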

Re: Flink tasks are unevenly distributed across TaskManagers

2019-07-01 Thread Xintong Song
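Hello, the community has already identified the problem you ran into and will fix it in upcoming releases; the current plan is to fix it in 1.7.3, 1.8.2, and 1.9.0. For details, see: https://issues.apache.org/jira/browse/FLINK-12122 Thank you~ Xintong Song On Tue, Jul 2, 2019 at 11:27 AM Ever <439674...@qq.com> wrote: > Our test Flink cluster (Flink 1.8) has 3 TaskManagers with 10 slots each. > > Then I have a jo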

Re: How do Flink slots evenly divide TM memory?

2019-07-01 Thread Xintong Song
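you~ Xintong Song On Mon, Jul 1, 2019 at 8:59 PM 徐涛 wrote: > Hi All, > The official documentation says that TM memory can be divided evenly among multiple slots. > > However, I could not find the code in the Flink source that evenly divides TM memory among slots. I also do not understand how dividing memory evenly among slots within the same JVM is implemented. > Many thanks! > > > Thanks > 徐涛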

Re: Apache Flink - Running application with different flink configurations (flink-conf.yml) on EMR

2019-06-28 Thread Xintong Song
hen running jobs IDE. Thank you~ Xintong Song On Fri, Jun 28, 2019 at 8:45 PM M Singh wrote: > Hi Xintong: > > I passed the -Dparallelism.default=2 in the run configuration VM > arguments for IntelliJ. > > So what I am looking for is a way to overwrite the config parameters which

Re: Apache Flink - Running application with different flink configurations (flink-conf.yml) on EMR

2019-06-26 Thread Xintong Song
Hi Singh, You can use the environment variable "FLINK_CONF_DIR" to specify path to the directory of config files. You can also override config options with command line arguments prefixed -D (for yarn-session.sh) or -yD (for 'flink run' command). Thank you~ Xintong Song On Wed, Ju

Re: Flink job server with HA

2019-06-03 Thread Xintong Song
"NONE". 2. It "ClassPathJobGraphRetriever#retrieveJobGraph" actually invoked, and is there any exceptions thrown from it. This is to verify whether the correct code path for job cluster is invoked. Thank you~ Xintong Song On Tue, Jun 4, 2019 at 10:48 AM Boris Lublinsky < boris.lublin...@ligh

Re: Flink job server with HA

2019-06-03 Thread Xintong Song
So here are my questions: 1. What environment do you run Flink in? Is it locally, on Yarn or Mesos? 2. How do you trigger "restart a Job Master"? Thank you~ Xintong Song On Tue, Jun 4, 2019 at 10:35 AM Boris Lublinsky < boris.lublin...@lightbend.com> wrote: > Thanks, >

Re: Flink job server with HA

2019-06-03 Thread Xintong Song
, can you explain the detailed steps you perform when you say "trying to restart a Job Master"? Thank you~ Xintong Song On Mon, Jun 3, 2019 at 10:05 PM Boris Lublinsky < boris.lublin...@lightbend.com> wrote: > I am trying to experiment with Flink Job s

Re: How many task managers to launch for a job?

2019-05-26 Thread Xintong Song
be a simple choice because it avoids tuning these two relevant factors at the same time. Thank you~ Xintong Song On Sat, May 25, 2019 at 4:54 AM black chase wrote: > Hi Song, > You said "In that way, the total slots (or number of TaskManagers if you > config on slot for each Tas

Re: How many task managers to launch for a job?

2019-05-24 Thread Xintong Song
As far as I know, Flink does not have any requirements on how the TaskManagers are distributed across physical machines. So I think it really depends on the scheduling policy of the Mesos cluster. I'm not an expert on Mesos, so correct me if I am wrong. Thank you~ Xintong Song On Fri, May 24

Re: How many task managers can a cluster reasonably handle?

2019-05-23 Thread Xintong Song
performance bottleneck, usually it can be solved by increasing the JobManager's resources with a '-jm' argument. Thank you~ Xintong Song On Fri, May 24, 2019 at 2:33 AM Antonio Verardi wrote: > Hello Flink users, > > How many task managers one can expect a Flink cluster to be able to >

Re: How many task managers to launch for a job?

2019-05-23 Thread Xintong Song
1.8/concepts/runtime.html#task-slots-and-resources> . Thank you~ Xintong Song On Thu, May 23, 2019 at 11:49 PM black chase wrote: > > Hi, > > I am redesigning the scheduler of the JobManager to place tasks of a job > across TaskManagers according to a scheduling policy. >

Re: Flink cluster log organization

2019-05-20 Thread Xintong Song
that do not belong to any particular job. For the latter, there are two ways that I can think of to specify the log file path for a Flink cluster. - Set env.log.dir in flink-conf.yaml - Modify log4j.appender.file.file in log4j.properties Hope that helps. Thank you~ Xintong Song On Tue

Re: questions about yarn container(taskmanager)memory allocation

2019-05-20 Thread Xintong Song
about resource management of Flink. AFAIK, the native memory is on the table, but I'm not sure whether the metaspace memory is considered. I think we should create a jira issue on this. Thank you~ Xintong Song On Mon, May 20, 2019 at 4:47 PM XiangWei Huang wrote: > Hi all, > Currentl

Re: Question about the Flink metrics reporter

2019-05-15 Thread Xintong Song
Taking the first part of the hostname is for consistency with how HDFS does it. See the original issue, where the author specifically explains why. https://issues.apache.org/jira/browse/FLINK-1170?focusedCommentId=14175285=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14175285 Thank you~ Xintong Song On Wed, May 15, 2019 at 9:11 PM Yun Tang wrote

Re: taskmanager faild

2019-04-25 Thread Xintong Song
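hi naisili, I did not see any attachment, screenshot, or textual description of the error in your email; please check again. Thank you~ Xintong Song On Fri, Apr 26, 2019 at 10:46 AM naisili Yuan wrote: > It is still the cluster stability problem. I found this error and would like to ask whether it is caused by my cluster HA configuration, and whether it would be more stable without depending on ZooKeeper. > Hoping for a reply, thanks! > > naisili Yuan wrote on Mon, Apr 22, 2019 at 2:23 PM: > >> Sorry, I forgot to attach the image. >>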

Re: taskmanager faild

2019-04-21 Thread Xintong Song
Hi naisili, This is the user-zh mailing list, so if you speak Chinese you can ask questions in Chinese. If you prefer using English, you can send emails to u...@flink.apache.org. Hope that helps you. BTW, I think you forgot to attach the screenshot. Thank you~ Xintong Song On Mon, Apr 22
