Re: Create a lookup table in a StreamExecutionEnvironment

2021-09-10 Thread JING ZHANG
Hi Robert, First of all, the built-in Kafka connector source is not a `LookupTableSource`. If we use Kafka as a lookup table, we need to implement a user-defined source [1]. Secondly, about how to define a user-defined lookup table source for Kafka, I'm not an expert in Kafka, please correct me

Re: How to mount PVC volumes using Flink Native Kubernetes ?

2021-09-10 Thread tao xiao
Thanks David for the tips. We have been running Flink with no performance degradation observed in EMR (which is EBS attached) for more than 1 year therefore we believe the same performance can be applied in Kubernetes. On Sat, Sep 11, 2021 at 3:13 AM David Morávek wrote: > OT: Beware that even

Re: [External] : Re: Use FlinkKafkaConsumer to synchronize multiple Kafka topics

2021-09-10 Thread Yan Wang
Hi Arvid, Thanks for your reply. Yes, the warning is throwed by Kafka-clients. Here is the warning log after I deleted the topic that Kafka consumer is listening to. 18:46:27,297 WARN org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-2, groupId=osstest] Error while fetching

Re: CEP library support in Python

2021-09-10 Thread Seth Wiesman
Hi Pedro, The DataStream CEP library is not available in Python but you can use `MATCH_RECOGNIZE` in the table API which is implemented on-top of the CEP library from Python. https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/match_recognize/ Seth On Fri,

Re: How to mount PVC volumes using Flink Native Kubernetes ?

2021-09-10 Thread David Morávek
OT: Beware that even if you manage to solve this, EBS is replicated network storage, therefore rocksdb performance will be affected significantly. Best, D. On Fri 10. 9. 2021 at 16:19, tao xiao wrote: > The use case we have is to store the RocksDB sst files in EBS. The EC2 > instance type (m5)

TaskManagers OOM'ing for Flink App with very large state only when restoring from checkpoint

2021-09-10 Thread Kevin Lam
Hi all, We've seen scenarios where TaskManagers will begin to OOM, shortly after a job restore from checkpoint. Our flink app has a very large state (100s of GB) and we use RocksDB as a backend. Our repro is something like this: run the job for an hour and let it accumulate state, kill a task

Create a lookup table in a StreamExecutionEnvironment

2021-09-10 Thread Robert Cullen
I have a developer that wants to create a lookup table in Kafka with data that will be used later when sinking with S3. The lookup table will have folder information that will be used as a Bucket Assigner in the StreamingFileSink. I thought using the Table API to generate the lookup table and

CEP library support in Python

2021-09-10 Thread Pedro Silva
Hello, Is Flink's CEP library available in python? From the documentation I see no references so I'm guessing the answer is no but wanted some confirmation from the community or developers. Are there plans to support

Re: DataStreamAPI and Stateful functions

2021-09-10 Thread Barry Higgins
Thanks Igal, I appreciate you coming back to me. I have quickly tried the fat jar solution as you've gone through it and am running into an exception: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Failed to execute job 'StatefulFunctions'.

Re: How to mount PVC volumes using Flink Native Kubernetes ?

2021-09-10 Thread tao xiao
The use case we have is to store the RocksDB sst files in EBS. The EC2 instance type (m5) we use doesn't provide local disk storage therefore EBS is the only option to store the local sst file. On Fri, Sep 10, 2021 at 7:10 PM Yang Wang wrote: > I am afraid Flink could not support creating

Re: Usecase for flink

2021-09-10 Thread Timo Walther
If your graphs fit in memory (at least after an initial partitioning), you could use any external library for graph processing within a single node in a Flink ProcessFunction. Flink is a general data processor that allows to have arbitrary logic where user code is allowed. Regards, Timo On

Re: Usecase for flink

2021-09-10 Thread Dipanjan Mazumder
Good point what is the better option for graph processing with flink.. any suggestions On Friday, September 10, 2021, 04:52:30 PM GMT+5:30, Martijn Visser wrote: Hi, Please keep in mind that Gelly is approaching end-of-life [1]  Regards, Martijn [1] 

Re: DataStreamAPI and Stateful functions

2021-09-10 Thread Igal Shilman
Hello Barry, I assume that by "we don't need another installation of Flink to manage the stateful functions." You mean that you already have a running Flink cluster and you would like to submit an additional Flink Job that executes a Stateful functions application? Then perhaps just try to

Re: Streaming Patterns and Best Practices - featuring Apache Flink

2021-09-10 Thread Timo Walther
Thanks for sharing this with us Devin. If you haven't considered it already, maybe this could also be something for next Flink Forward? Regards, Timo On 02.09.21 21:02, Devin Bost wrote: I just released a new video that features Apache Flink in several design patterns: Streaming Patterns

Re: Job crashing with RowSerializer EOF exception

2021-09-10 Thread Timo Walther
I assume you are still using toAppendStream or toRetractStream? Otherwise I'm wondering where the RowSerializer is actually coming from. The new planner doesn't use a row serializer. Debugging serializer issue is difficult. We need more information about the pipeline. Regards, Timo On

Re: Questions regarding broadcast join in Flink

2021-09-10 Thread Timo Walther
Hi Gerald, actually, this is a typical issue when performing a streaming join. An ideal solution would be to block the main stream until the broadcast stream is ready. However, this is currently not supported in the API. In any case, a user needs to handle this in a use case specific way to

Re: Flink Stream + StreamTableEnvironment 结合使用时checkpoint异常问题

2021-09-10 Thread chang li
没有开启checkpoint execEnv.enableCheckpointing(checkpointInterval); On 2021/09/10 07:41:10, "xia_...@163.com" wrote: > Hi: > 有个问题想请教一下大佬们:正在研究流上join操作,使用FlinkKafkaConsume > 消费kafka数据作为数据源,随后关联hbase维度表数据,可以成功关联,但是KafkaSource缺始终没有进行checkpoint,代码中是有设置checkpint的,我想请问一下是需要做其他什么配置吗?代码如下 > >

Re: Flink Stream + StreamTableEnvironment 结合使用时checkpoint异常问题

2021-09-10 Thread chang li
没有开启checkpoint execEnv.enableCheckpointing(checkpointInterval); On 2021/09/10 07:41:10, "xia_...@163.com" wrote: > Hi: > 有个问题想请教一下大佬们:正在研究流上join操作,使用FlinkKafkaConsume > 消费kafka数据作为数据源,随后关联hbase维度表数据,可以成功关联,但是KafkaSource缺始终没有进行checkpoint,代码中是有设置checkpint的,我想请问一下是需要做其他什么配置吗?代码如下 > >

Re: Flink Stream + StreamTableEnvironment 结合使用时checkpoint异常问题

2021-09-10 Thread chang li
没有开启checkpoint execEnv.enableCheckpointing(checkpointInterval); On 2021/09/10 07:41:10, "xia_...@163.com" wrote: > Hi: > 有个问题想请教一下大佬们:正在研究流上join操作,使用FlinkKafkaConsume > 消费kafka数据作为数据源,随后关联hbase维度表数据,可以成功关联,但是KafkaSource缺始终没有进行checkpoint,代码中是有设置checkpint的,我想请问一下是需要做其他什么配置吗?代码如下 > >

Re: Flink Stream + StreamTableEnvironment 结合使用时checkpoint异常问题

2021-09-10 Thread chang li
没有开启checkpoint execEnv.enableCheckpointing(checkpointInterval); On 2021/09/10 07:41:10, "xia_...@163.com" wrote: > Hi: > 有个问题想请教一下大佬们:正在研究流上join操作,使用FlinkKafkaConsume > 消费kafka数据作为数据源,随后关联hbase维度表数据,可以成功关联,但是KafkaSource缺始终没有进行checkpoint,代码中是有设置checkpint的,我想请问一下是需要做其他什么配置吗?代码如下 > >

Re: Flink Stream + StreamTableEnvironment 结合使用时checkpoint异常问题

2021-09-10 Thread chang li
没有开启Checkpoint execEnv.enableCheckpointing(checkpointInterval); On 2021/09/10 07:41:10, "xia_...@163.com" wrote: > Hi: > 有个问题想请教一下大佬们:正在研究流上join操作,使用FlinkKafkaConsume > 消费kafka数据作为数据源,随后关联hbase维度表数据,可以成功关联,但是KafkaSource缺始终没有进行checkpoint,代码中是有设置checkpint的,我想请问一下是需要做其他什么配置吗?代码如下 > >

Re: Usecase for flink

2021-09-10 Thread Martijn Visser
Hi, Please keep in mind that Gelly is approaching end-of-life [1] Regards, Martijn [1] https://flink.apache.org/roadmap.html On Fri, 10 Sept 2021 at 09:09, Dipanjan Mazumder wrote: > Hi Jing, > > Thanks for the input another question i had was can Gelly be used for > processing the

Re: Usecase for flink

2021-09-10 Thread Timo Walther
Hi Dipanjan, Gelly is built on top of the DataSet API which is a batch-only API that is slowly phasing out. It is not possible to connect a DataStream API program with a DataSet API program unless you go through a connector such as CSV in between. Regards, Timo On 10.09.21 09:09,

Re: Issue while creating Hive table from Kafka topic

2021-09-10 Thread Timo Walther
It seems that your Kafka clients dependency is not in your JAR file. ByteArrayDeserializer is a symptom that seems to occur often. At least, I can find a similar question on Stackoverflow:

Re: How to mount PVC volumes using Flink Native Kubernetes ?

2021-09-10 Thread Yang Wang
I am afraid Flink could not support creating dedicated PVC for each TaskManager pod now. But I think it might be a reasonable requirement. Could you please share why you need to mount a persistent volume claim per TaskManager? AFAIK, the TaskManager will be deleted once it fails. You expect the

Re: Allocation-preserving scheduling and task-local recovery

2021-09-10 Thread Xiang Zhang
Robert, thank you for your reply! I tried to remove "cluster.evenly-spread-out-slots", and then tested two scenarios: 1) restart the leader job manager; 2) restart a single task manager. These tests are done in a testing environment where I have six task managers and only four tasks to

Re: De/Serialization API to tear-down user code

2021-09-10 Thread Arvid Heise
I created FLINK-24250 [1]. [1] https://issues.apache.org/jira/browse/FLINK-24250 On Fri, Sep 10, 2021 at 10:23 AM Sergio Morales wrote: > Please create the feature request ticket, I have no idea how to do it. > > > > Regards, > > Sergio. > > > > *From: *Arvid Heise > *Date: *Monday, 6

Re: Issue while creating Hive table from Kafka topic

2021-09-10 Thread Harshvardhan Shinde
I'm unable to figure out which dependency to add in order for the ByteArrayDeserializer class to get included in the jar. I have added all the dependencies as per the documentation still unable to figure out the cause. On Fri, Sep 10, 2021 at 12:17 AM Robert Metzger wrote: > Does the jar file

Re: De/Serialization API to tear-down user code

2021-09-10 Thread Sergio Morales
Please create the feature request ticket, I have no idea how to do it. Regards, Sergio. From: Arvid Heise Date: Monday, 6 September 2021 at 18:21 To: Dawid Wysakowicz Cc: Sergio Morales , user Subject: Re: De/Serialization API to tear-down user code I think it's a valid request to have a

Re: State processor API very slow reading a keyed state with RocksDB

2021-09-10 Thread David Causse
Thank you all for the great insights and suggestions! I understand that the underlying components used by the state processor api are sufficiently different that it may explain this slowness and this behavior is not something caused by the way we use this API. David. On Fri, Sep 10, 2021 at

Flink Stream + StreamTableEnvironment 结合使用时checkpoint异常问题

2021-09-10 Thread xia_...@163.com
Hi: 有个问题想请教一下大佬们:正在研究流上join操作,使用FlinkKafkaConsume 消费kafka数据作为数据源,随后关联hbase维度表数据,可以成功关联,但是KafkaSource缺始终没有进行checkpoint,代码中是有设置checkpint的,我想请问一下是需要做其他什么配置吗?代码如下 DataStream kafkaSource = env.addSource(source); Map> sideOutStreamMap = new HashMap<>(); for (RowToColumnBean bean : lists) {

Re: Documentation for deep diving into flink (data-streaming) job restart process

2021-09-10 Thread Puneet Duggal
Hi Robert, Thanks for taking out time to go through the logs. Problem: So reason for restarting all the task managers was to incorporate increased jvm metaspace size for each existing task manager. Currently each taskmanager has 32 slots. But JVM metaspace size was 256 MB which used to get

Re: Usecase for flink

2021-09-10 Thread Dipanjan Mazumder
Hi Jing,     Thanks for the input another question i had was can Gelly be used for processing the graph that flink receives through kafka and then using Gelly i decompose the graph into its nodes and edges and then process them individually through substreams and then write the final output of

Questions regarding broadcast join in Flink

2021-09-10 Thread Gerald.Sula
Hello, I am trying to implement a broadcast join of two streams in flink using the broadcast functionality. In my usecase I have a large stream that will be enriched with a much smaller stream. In order to first test my approach, I have adapted the Taxi ride exercise in the official training

How to mount PVC volumes using Flink Native Kubernetes ?

2021-09-10 Thread Xiaolong Wang
Hi, I'm facing a tough question. I want to start a Flink Native Kubernetes job with each of the task manager pod mounted with an aws-ebs PVC. The first thought is to use the pod-template file to do this, but it soon went to a dead end. Since the pod-template on each of the task manager pod

回复: flink oss ha

2021-09-10 Thread wang xiguang
抱歉图片又被吞了,附件是图片 从 Windows 版邮件发送 发件人: wang xiguang 发送时间: 2021年9月10日 14:09 收件人: user-zh@flink.apache.org 主题: 回复: flink oss ha 您好:

回复: flink oss ha

2021-09-10 Thread wang xiguang
您好: 确认endpoint,ak,sk都正确,又尝试使用oss做状态后端,也是一样的报错。[cid:image003.png@01D7A64D.72A24A10] 看报错是com.aliyun.oss.OSSClient.listObjects方法报的。于是我用普通springboot程序使用oss客户端调用该方法,结果调用正常: [cid:image005.png@01D7A64D.72A24A10]

Re: Documentation for deep diving into flink (data-streaming) job restart process

2021-09-10 Thread Robert Metzger
Thanks for the log. >From the partial log that you shared with me, my assumption is that some external resource manager is shutting down your cluster. Multiple TaskManagers are disconnecting, and finally the job is switching into failed state. It seems that you are not stopping only one

Re: Job crashing with RowSerializer EOF exception

2021-09-10 Thread Yuval Itzchakov
Hi Robert, There's no custom Kryo serializer. It's a RowSerializer that is generating the output of a Table -> DataStream conversion. On Thu, Sep 9, 2021, 21:42 Robert Metzger wrote: > Hi Yuval, > > EOF exceptions during serialization are usually an indication that some > serializers in the