Re: Question about processing a 3-level List data type in parquet

2020-11-06 Thread Peter Huang
Hi Naehee, Thanks for reporting the issue. Yes, it is a bug in the ParquetInputFormat. Would you please create a jira ticket and assign to me. I will try to fix it by the end of this weekend. My Jira account name Zhenqiu Huang. Thanks Best Regards Peter Huang On Wed, Nov 4, 2020 at 11:57 PM

回复:请教一下目前flink submit能不能指定额外的依赖jar

2020-11-06 Thread 18868816710
不能在 Submit 模式指定的原因是 Flink 提交模式和 Spark 不同。Flink 是在客户端需要编译 JobGgraph,这样导致在客户端运行时候就需要你所加载的 Jar, 而 Submit 模式的话,只是说把 Jar 包加载到运行的 Classpath 下;但是 Spark 程序的编译是在 服务端做的。 PS:目前 Flink-1.11 的话支持 Application 模式,有点类似 Spark 的提交模式,在这个模式下可以尝试看下是否 可以在 Submit 模式下指定。 | | | | | 在2020年11月06日 11:12,silence 写道:

Re:Re: Re:Re: Flink StreamingFileSink滚动策略

2020-11-06 Thread hailongwang
Hi bradyMk, Bulk-encoded Formats 只能在 Checkpoint 时滚动,详见文档一[1]. Best, Hailong Wang [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/streamfile_sink.html#bulk-encoded-formats 在 2020-11-06 10:47:33,"bradyMk" 写道: >Hi,guoliang_wang1335

Re:请教大神们关于flink-sql中数据赋值问题

2020-11-06 Thread hailongwang
Hi si_tianqiang, 自定义 UDF 可以解决你的问题吗? 比如 接收 kakfa 的数据字段定义成 hbaseQuery,然后自定义 UDF 去根据 query 查询数据。 Best, Hailong Wang 在 2020-11-06 10:41:53,"site" 写道: >看了官网的示例,发现sql中传入的值都是固定的,我有一个场景是从kafka消息队列接收查询条件,然后通过flink-sql映射hbase表进行查询并写入结果表。我使用了将消息队列映射表再join数据表的方式,回想一下这种方式很不妥,有什么好的方法实现sql入参的动态查询呢?

Re: union stream vs multiple operators

2020-11-06 Thread Alexey Trenikhun
Ok, thank you. From: Chesnay Schepler Sent: Thursday, November 5, 2020 3:15:28 PM To: Alexey Trenikhun ; Flink User Mail List Subject: Re: union stream vs multiple operators I don't think the first option has any benefit. On 11/5/2020 1:19 AM, Alexey

Join Bottleneck

2020-11-06 Thread Rex Fenley
Hello, I have a Job that's a series of Joins, GroupBys, and Aggs and it's bottlenecked in one of the joins. The join's cardinality is ~300 million rows on the left and ~200 million rows on the right all with unique keys. I'm seeing this in the plan for that bottlenecked Join.

Re: Rules of Thumb for Setting Parallelism

2020-11-06 Thread Rex Fenley
Great, thanks! So just to confirm, configure # of task slots to # of core nodes x # of vCPUs? I'm not sure what you mean by "distribute them across both jobs (so that the total adds up to 32)". Is it configurable how many task slots a job can receive, so in this case I'd provide ~30/36 * 32 task

Re: Rules of Thumb for Setting Parallelism

2020-11-06 Thread Till Rohrmann
Hi Rex, as a rule of thumb I recommend configuring your TMs with as many slots as they have cores. So in your case your cluster would have 32 slots. Then depending on the workload of your jobs you should distribute them across both jobs (so that the total adds up to 32). A high number of

Re: How to use properly the function: withTimestampAssigner((event, timestamp) ->..

2020-11-06 Thread Till Rohrmann
Hi Simone, The problem is that the Java 1.8 compiler cannot do type inference when chaining methods [1]. The solution would be WatermarkStrategy wmStrategy = WatermarkStrategy .forMonotonousTimestamps()

Re: cannot pull statefun docker image

2020-11-06 Thread Tzu-Li (Gordon) Tai
Hi, The Dockerfiles in the examples in the flink-statefun repo currently work against images built from snapshot development branches. Ververica has been hosting StateFun base images for released versions: https://hub.docker.com/r/ververica/flink-statefun You can change `FROM flink-statefun:*`

cannot pull statefun docker image

2020-11-06 Thread Lian Jiang
Hi, I tried to build statefun-greeter-example docker image with "docker build ." but cannot pull the base statefun docker image due to access denied. Any idea? Thanks. $ docker login Authenticating with existing credentials... Login Succeeded

How to use properly the function: withTimestampAssigner((event, timestamp) ->..

2020-11-06 Thread Simone Cavallarin
Hi, I'm taking the timestamp from the event payload that I'm receiving from Kafka. I'm struggling to get the time and I'm confused on how I should use the function ".withTimestampAssigner()". I'm receiving an error on event.getTime() that is telling me: "cannot resolve method "Get Time" in

Re: Native kubernetes setup

2020-11-06 Thread Yang Wang
Actually, in our document, we have provided a command[1] to create the service account. It is similar to your yaml file. $ kubectl create clusterrolebinding flink-role-binding-default --clusterrole=edit --serviceaccount=default:default Unfortunately, we could not support mounting a PVC. We plan

Re: flink 1.11 on k8s native session cluster模式报找不到configmap

2020-11-06 Thread Yang Wang
失败的根本原因应该不是ConfigMap找不到,warning的那个信息是因为创建JobManager deployment的时候 ConfigMap还没创建出来,不会导致失败的。 你可以参考这个地方[1]把JobManager的log的打到console里面,然后用kubectl logs 来查看,这样 可以排查JobManager一直crash backoff的原因 [1].

Re: Re: 关于cluster.evenly-spread-out-slots参数的底层原理

2020-11-06 Thread Evan
发件人: Shawn Huang 发送时间: 2020-11-06 16:56 收件人: user-zh 主题: Re: 关于cluster.evenly-spread-out-slots参数的底层原理 我说一下我看源码(1.11.2)之后的理解吧,不一定准确,仅供参考。 cluster.evenly-spread-out-slots 这个参数设置后会作用在两个地方: 1. JobMaster 的 Scheduler 组件 2. ResourceManager 的 SlotManager 组件 对于 JobMaster 中的 Scheduler, 它在给

Re: [Discuss] Add JobListener (hook) in flink job lifecycle

2020-11-06 Thread Flavio Pompermaier
I think it's ok.. I suggest also to add JobStatus to onJobExecuted() so you can immediately know if the job finished successfully or if it is was failed or canceled. Thanks for the help, Flavio On Fri, Nov 6, 2020 at 10:41 AM Kostas Kloudas wrote: > Hi Flavio, > > Coould this

Re: [Discuss] Add JobListener (hook) in flink job lifecycle

2020-11-06 Thread Kostas Kloudas
Hi Flavio, Coould this https://issues.apache.org/jira/browse/FLINK-20020 help? Cheers, Kostas On Thu, Nov 5, 2020 at 9:39 PM Flavio Pompermaier wrote: > > Hi everybody, > I was trying to use the JobListener in my job but onJobExecuted() on Flink > 1.11.0 but I can't understand if the job

flink cdc时间问题

2020-11-06 Thread 赵帅
关于cdc有个问题,求大佬能否解释下,是解析bin log的bug还是自己代码bug; mysql数据库中表,创建时间和修改时间设置为current_timestamp 场景一:插入数据 插入数据时忽略创建时间和修改时间字段 cdc接入后,转存到hbase中,转为字符串时间,时间少8个小时 确认了,程序运行的服务器时间、mysql服务器的时间,和hbase服务器的时间,均为UTC+0800时区 场景二:重启服务,重新读取数据 此时cdc接入数据,会将最后的数据拿出来写入hbase,此时按照同样的执行,数据库时间也是放的正确时间,hbase时间也能吻合

Re: Failure to execute streaming SQL query

2020-11-06 Thread Danny Chan
Hi, Satyam ~ What version of Flink release did you use? I tested your first SQL statements in local and they both works great. Your second SQL statement fails because currently we does not support stream-stream join on time attributes because the join would breaks the semantic of time attribute

Re: Flink TLS in K8s

2020-11-06 Thread Patrick Eifler
Hi Chesney, Thanks for the hint. I have mounted my certs in both job and taskmanager volume mounts. When the containers bootup I get the log that the ssl store is successfully loaded. Note: I use the same keystore setup to connect to secured Kafka Cluster and this works. How would you suggest

Re: 关于cluster.evenly-spread-out-slots参数的底层原理

2020-11-06 Thread Shawn Huang
我说一下我看源码(1.11.2)之后的理解吧,不一定准确,仅供参考。 cluster.evenly-spread-out-slots 这个参数设置后会作用在两个地方: 1. JobMaster 的 Scheduler 组件 2. ResourceManager 的 SlotManager 组件 对于 JobMaster 中的 Scheduler, 它在给 execution vertex 分配 slot 是按拓扑排序的顺序依次进行的。 Scheduler 策略是会倾向于把 execution vertex 分配到它的上游节点所分配到的slot上, 因此在给某个具体 execution

KeyBy如何映射到物理分区

2020-11-06 Thread zxyoung
Hi,请教下各位: 我的场景是现在有个Keyby操作,但是我需要指定某一个key落地在某一个具体物理分区中。 我注意到keyby中得KeySelector仅仅是逻辑的分区,其实还是通过hash的方式来物理分区,没有办法指定哪一个key到哪一个分区去做。 我尝试使用partitionCustom中带有partitioner和keySelector的参数函数,但是发现没有办法直接使用类似Sum一类的聚合函数,实际测试发现Sum会将同一物理分区、但是不同Key的值都累加起来。