Splitting in Stream Formats for File Source

2023-08-16 Thread Chirag Dewan via user
Hi,I am trying to collect files from HDFS in my DataStream job. I need to collect two types of files - CSV and Parquet.  I understand that Flink supports both formats, but in Streaming mode, Flink doesnt support splitting these formats. Splitting is only supported in Table API. I wanted to

Re: 背压分析

2023-08-16 Thread yidan zhao
1 可控范围即可。 2 分析阶段可以分开,实际运行阶段看情况,怎样性能高就如何搞。 3 看监控,flink web ui有根据每个节点的反压情况按照不同颜色展示。 星海 <2278179...@qq.com.invalid> 于2023年8月16日周三 22:03写道: > > hello。大家好,请教几个问题: > 1、flink中背压存在是合理的吗?还是在可控范围内就行?还是尽可能没有呢? > 2、如果出现背压,如果多个operator chain 在一起不好分析,需要先将其拆开分析吗? >

RE: Flink AVRO to Parquet writer - Row group size/Page size

2023-08-16 Thread Kamal Mittal via user
Hello Community, Please share views for below. Rgds, Kamal From: Kamal Mittal via user Sent: 16 August 2023 04:35 PM To: user@flink.apache.org Subject: Flink AVRO to Parquet writer - Row group size/Page size Hello, For Parquet, default row group size is 128 MB and Page size is 1MB but Flink

Re: Flink k8s operator - managde from java microservice

2023-08-16 Thread Yaroslav Tkachenko
Hi Krzysztof, You may want to check flink-kubernetes-operator-api ( https://mvnrepository.com/artifact/org.apache.flink/flink-kubernetes-operator-api), here's an example for reading FlinkDeployments

Flink k8s operator - managde from java microservice

2023-08-16 Thread Krzysztof Chmielewski
Hi, I have a use case where I would like to run Flink jobs using Apache Flink k8s operator. Where actions like job submission (new and from save point), Job cancel with save point, cluster creations will be triggered from Java based micro service. Is there any recommended/Dedicated Java API for

Flink AVRO to Parquet writer - Row group size/Page size

2023-08-16 Thread Kamal Mittal via user
Hello, For Parquet, default row group size is 128 MB and Page size is 1MB but Flink Bulk writer using file sink create the files based on checkpointing interval only. So is there any significance of configured row group size and page size for Flink parquet bulk writer? How Flink uses these

Re: [Question] Good way to monitor data skewness

2023-08-16 Thread Hang Ruan
Hi, Dennis. As Ron said, we could judge this situation by the metrics. We are usually reporting the metrics to the external system like Prometheus by the metric reporter[1]. And these metrics could be shown by some other tools like grafana[2]. Best, Hang [1]

Re: Flink throws exception when submitting a job through Jenkins and Spinnaker

2023-08-16 Thread elakiya udhayanan
Hi Shammon, Thanks for your response. If it is a network issue as you have mentioned, how does it read the contents of the jar file, we can see that the code is read and it throws an error only when executing the SQL. Also can you let us know exactly what address could be wrong here, so that we

Re: [Question] Good way to monitor data skewness

2023-08-16 Thread liu ron
Hi, Dennis, Although all operators are chained together, each operator metrics is there, you can view the metrcis related to the corresponding operator's input and output records through the UI, as following: [image: image.png] Best, Ron Dennis Jung 于2023年8月16日周三 14:13写道: > Hello people, >

[Question] Good way to monitor data skewness

2023-08-16 Thread Dennis Jung
Hello people, I'm trying to monitor data skewness with 'web-ui', between TaskManagers. Currently all operators has been chained, so I cannot find how data has been skewed to TaskManagers (or subtasks). But if I disable chaining, AFAIK, it can degrade performance.

Re: [Issue] Repeatedly receiving same message from Kafka

2023-08-16 Thread Dennis Jung
Hello, Thank you. I think it is a problem caused by Kafka configuration, not Flink. I'll take a look and let you know if there's an issue in Flink. BR, Jung 2023년 8월 15일 (화) 오후 9:40, Hector Rios 님이 작성: > Hi there > > It would be helpful if you could include the code for your pipeline. One >