Re: how to expose the current in-flight async i/o requests as metrics?

2021-11-08 Thread Dongwon Kim
Hi David, There are currently no metrics for the async work-queue size (you should be > able to see the queue stats with debug logs enabled though [1]). Thanks for the input but scraping DEBUG messages into, for example, ElasticSearch for monitoring on Grafana is not possible in my current

flink 1.13.2 ?? Java/Scala ?????????? Python UDF??????????yarn-application??????yarn????????????????????????pyflink?

2021-11-08 Thread Asahi Lee
HI! ??flink 1.13.2??java table apipython udf??yarn-applicationyarn??pyflink?

Dependency injection for TypeSerializer?

2021-11-08 Thread Thomas Weise
Hi, I was looking into a problem that requires a configurable type serializer for communication with a schema registry. The service endpoint can change, so I would not want to make it part of the serializer snapshot but rather resolve it at graph construction time (similar to how a Kafka

Re: Getting mini-cluster logs when debugging pyflink from IDE

2021-11-08 Thread Роман VVvKamper
Thanks for help, I’ll look into it :) Two more questions: Is there a way to configure this path? For example to write logs to a file in the working dir? And is there a way to redirect logst from file to stdout? On 9 Nov 2021, at 09:00, Dian Fu mailto:dian0511...@gmail.com>> wrote: Hi, The

Re: FlinkSQL 使用 streamingSink 写入 hive orc数据,如何控制文件数量。

2021-11-08 Thread yidan zhao
关于FlinkSQL写hive,orc格式,性能和稳定性方面有什么建议吗。 比如并行度设置多少合理,目前compact-coordinator并行度定死为1,不可更改应该,compact-operator是60,日常来看compact-operator经常是红色,busy100%。目前问题是偶尔会发现检查点失败,延迟等,导致实际现象是文件没合并,进而inode不足。(我们的inode的quota不足实际是)。

Question about using Calcite for fetching SQL column lineage

2021-11-08 Thread kangqi
Hi community, I’m trying to extract column lineage from Flink SQL jobs (all of them are single INSERT statements). Here’s what I have done: 1. From `SqlToOperationConverter#convertSqlInsert()`, get the `PlannerQueryOperation` generated by the INSERT statement. 2. Get the corresponding

Re: Getting mini-cluster logs when debugging pyflink from IDE

2021-11-08 Thread Dian Fu
Hi, The logs should appear in the log file of the TaskManger and you could find it under directory $PYTHON_INSTALLATION_DIR/site-packages/pyflink/log/ Regards, Dian On Mon, Nov 8, 2021 at 10:53 PM Роман VVvKamper wrote: > Hello, > > I'm trying to debug flink and pyflink job from IDE using

Flink SQL Join ????????minBatch ????????

2021-11-08 Thread ????
Hi : ??Flink 1.12 SQL ??Join ??Kafka DB ?? FOR SYSTEM_TIME AS Temporal Joins ?? ?? ?? In ?? QPS ??

Flink SQL Join 如何使用minBatch 方式查询

2021-11-08 Thread WuKong
Hi : 我现在有一个场景,基于Flink 1.12 SQL 来实现, 想查询下游数据, 大概逻辑就是多张表Join 其中一张是Kafka 表,其他的是DB 表,我基于处理时间 FOR SYSTEM_TIME AS 做了Temporal Joins , 但是下游数据库查询压力比较大, 现在想通过延迟 批量 In 的方式 减小QPS ,请问如何配置 可以调这个时间长度,我理解默认就是来一条就查询一次 --- Best, WuKong

回复: Re: 提交flink作业抛 java.lang.LinkageError

2021-11-08 Thread WuKong
Hi : 看报错日志,还是类加载问题 提示的报错信息 是说已经由不同类加载器已经加装了改依赖。如果生产环境上已经由了相关依赖包,建议将依赖设置为provided Caused by: java.lang.LinkageError: loader constraint violation: loader >> (instance of org/apache/flink/util/ChildFirstClassLoader) previously >> initiated loading for a different type with name >>

????: ????????

2021-11-08 Thread WuKong
Hi : ?? user-zh-unsubscr...@flink.apache.org ?? ??/ ?? https://flink.apache.org/zh/community.html --- Best, WuKong ?? ?? 2021-11-08 14:42 user-zh ??

回复: Re: 提交flink作业抛 java.lang.LinkageError

2021-11-08 Thread WuKong
Hi : 看报错日志,还是类加载问题 提示的报错信息 是说已经由不同类加载器已经加装了改依赖。如果生产环境上已经由了相关依赖包,建议将依赖设置为provided Caused by: java.lang.LinkageError: loader constraint violation: loader >> (instance of org/apache/flink/util/ChildFirstClassLoader) previously >> initiated loading for a different type with name >>

Re: flink1.12.4 写入hdfs报错 java.lang.OutOfMemoryError: Direct buffer memory

2021-11-08 Thread Caizhi Weng
Hi! 可以通过配置 taskmanager.memory.task.off-heap.size 指定 direct memory 和 native memory 的大小,详见 [1]。 [1]

Re:checkpoint??????????

2021-11-08 Thread sunzili
UnknownHostException: mycluster ?? | | | | | On 11/8/2021 16:04<2572805...@qq.com.INVALID> wrote?? : flink on yarn ??flink hdfs,ark1??hdfs??active??standby ark2standbyactive

Beginner: guidance on long term event stream persistence and replaying

2021-11-08 Thread Simon Paradis
Hi, We have an event processing pipeline that populates various reports from different Kafka topics and would like to centralize processing in Flink. My team is new to Flink but we did some prototyping using Kinesis. To enable new reporting based on past events, we'd like the ability to replay

Re: Running a performance benchmark load test of Flink on new CPUs

2021-11-08 Thread Vijay Balakrishnan
Austin -the flink benchmark is for testing Flink on single machines and not a cluster. I did see this https://oceanrep.geomar.de/50729/1/bsc_nico_biernat_thesis.pdf but it is more for testing the Scaling of Flink instead of testing throughput and latency. On Mon, Nov 8, 2021 at 10:54 AM Vijay

Re: Running a performance benchmark load test of Flink on new CPUs

2021-11-08 Thread Vijay Balakrishnan
Thx, Austin. I was hoping there might be a newer benchmark run similar to done by dataArtisans on Flink in 2016(old). https://www.ververica.com/blog/extending-the-yahoo-streaming-benchmark Looks like Yahoo Streaming benchmark was an initial standard in 2016. Hoping to see something updated for

Re: Restarting a job with drain flag set to true

2021-11-08 Thread David Morávek
It's not a recommended approach, but if you're able to handle this "side-effect" downstream (eg. ignore the "wrong / incomplete" results), then you should be OK On Mon, Nov 8, 2021 at 4:36 PM Pedro Facal wrote: > Hi David, > > Thanks for the quick response. Say our app writes to disk every 10

Re: Restarting a job with drain flag set to true

2021-11-08 Thread Pedro Facal
Hi David, Thanks for the quick response. Say our app writes to disk every 10 minutes: the difference is, in one case a single window is emitted and thus a single file is written, while if we drain and restart the pipeline we will end up with two files (because we will have two windows emitted

Re: Restarting a job with drain flag set to true

2021-11-08 Thread David Morávek
Hi Pedro, draining basically means that all of the sources will finish and progress their watermark to end of the global window, which will fire all of the triggers as a result. In other words, it will trigger the _ON_TIME_ results from all of the unfinished windows, even though they might not

Getting mini-cluster logs when debugging pyflink from IDE

2021-11-08 Thread Роман VVvKamper
Hello, I'm trying to debug flink and pyflink job from IDE using mini cluster (local mode). When i doing it in the java flink, everything works like a charm - i can see flink mini-cluster logs in the console. But when i run pyflink job in local mode (through the IDE of by simply calling

flink1.12.4 写入hdfs报错 java.lang.OutOfMemoryError: Direct buffer memory

2021-11-08 Thread xiao cai
通过flink 1.12.4 streaming file sink 写入hdfs,运行过程中抛出以下异常: 2021-11-08 20:39:05 java.io.IOException: java.lang.OutOfMemoryError: Direct buffer memory at org.apache.hadoop.hdfs.DataStreamer$LastExceptionInStreamer.set(DataStreamer.java:299) at

Restarting a job with drain flag set to true

2021-11-08 Thread Pedro Facal
Hello, We have an apache beam streaming application, running under flink native kubernetes. It consolidates aws kinesis records into parquet files every few minutes. To manage the lifecycle of this app, we use the rest api to stop the job with a savepoint and then restart the cluster/job

Re: How to express the datatype of sparksql collect_list(named_struct(...)) in flinksql?

2021-11-08 Thread JING ZHANG
Hi Vtygoss, You could try the following SQL: ``` select COLLECT(ROW(id, name)) as info from table group by ...; ``` In the above sql, the result type of `COLLECT(ROW(id, name))` is MULTISET. `CollectAggFunction` would store the data in a MapState. key is element type, represent the row

Re: to join or not to join, that is the question...

2021-11-08 Thread Seth Wiesman
There is no such restriction on connected streams; either input may modify the keyed state. Regarding performance, the difference between the two should be negligible and I would go with the option with the cleanest semantics. If both streams are the same type *and* you do not care which input an

Re: Elasticsearch6 connector in flink stand alone

2021-11-08 Thread David Morávek
Hi Ravi, I'm moving this thread to the user@flink mailing list, which is designed for these type of questions. For your issue, I don't think it's related to the elasticsearch integration. It seems like there is something wrong with your log4j setup. Either you have a conflicting log4j jars on

Re: how to expose the current in-flight async i/o requests as metrics?

2021-11-08 Thread David Morávek
Hi Dongwon, There are currently no metrics for the async work-queue size (you should be able to see the queue stats with debug logs enabled though [1]). As far as I can tell from looking at the code, the async operator is able to checkpoint even if the work-queue is exhausted. Arvid can you

Re: Fetch data from Rest API and sink to Kafka topic

2021-11-08 Thread Shuiqiang Chen
Hi Sharma, >From your description, it seem that you need to implement a custom source to fetch data from an Http server. Please refer to data sources [1] to learn how to develop a data source. And FYI, there is a

Re: unsubscribe

2021-11-08 Thread David Morávek
Hi Peter, to unsubscribe, please send an email to user-unsubscr...@flink.apache.org Best, D. On Fri, Nov 5, 2021 at 9:28 AM Peter Schrott wrote: > unsubscribe >

Fetch data from Rest API and sink to Kafka topic

2021-11-08 Thread Manjusha Sharma
Hi I am new to Flink and just getting started. I've watched quite a few Flink Forward videos and excited to get started. I have a need where I need to pull data from a RESTFul API endpoint that is authenticated using username and password and send this data to a Kafka Topic. I would like to pull

How to express the datatype of sparksql collect_list(named_struct(...)) in flinksql?

2021-11-08 Thread vtygoss
Hi, flink community! I am working on migrating data production pipeline from SparkSQL to FlinkSQL(1.12.0). And i meet a problem about MULTISET>. ``` Spark SQL select COLLECT_LIST(named_struct('id', id, 'name', name)) as info from table group by ...; ``` - 1. how to express and store

flinksql insert????

2021-11-08 Thread ??????
sql : String sql1="CREATE TABLE detal (\n" + " id INT,\n" + " produceId VARCHAR,\n"+ " color VARCHAR,\n"+ " size VARCHAR,\n"+ " PRIMARY KEY (id) NOT ENFORCED\n"+ ") WITH (\n" + " 'connector' = 'jdbc',\n" + " 'url' =

Flink SQL Join 如何使用minBatch 方式查询

2021-11-08 Thread WuKong
Hi : 我现在有一个场景,基于Flink 1.12 SQL 来实现, 想查询下游数据, 大概逻辑就是多张表Join 其中一张是Kafka 表,其他的是DB 表,我基于处理时间 FOR SYSTEM_TIME AS 做了Temporal Joins , 但是下游数据库查询压力比较大, 现在想通过延迟 批量 In 的方式 减小QPS ,请问如何配置 可以调这个时间长度,我理解默认就是来一条就查询一次 --- Best, WuKong

Re: flink的一个场景问题

2021-11-08 Thread Caizhi Weng
Hi! 一般将结果写到外部系统是通过 sink 节点。如果 Flink 没有内置你需要的 connector,可以考虑继承并实现 SinkFunction(很基本的 sink)或 RichSinkFunction(带 checkpoint 等功能)等自定义 sink,然后通过 DataStream#addSink 方法把这个 sink 加在 datastream 的末尾。 陈卓宇 <2572805...@qq.com.invalid> 于2021年11月8日周一 下午2:40写道: >

????: ????????

2021-11-08 Thread WuKong
Hi : ?? user-zh-unsubscr...@flink.apache.org ?? ??/ ?? https://flink.apache.org/zh/community.html --- Best, WuKong ?? ?? 2021-11-08 14:42 user-zh ??

回复: Re: 提交flink作业抛 java.lang.LinkageError

2021-11-08 Thread WuKong
Hi : 看报错日志,还是类加载问题 提示的报错信息 是说已经由不同类加载器已经加装了改依赖。如果生产环境上已经由了相关依赖包,建议将依赖设置为provided Caused by: java.lang.LinkageError: loader constraint violation: loader >> (instance of org/apache/flink/util/ChildFirstClassLoader) previously >> initiated loading for a different type with name >>

回复:Flink1.12 Streaming 消费kafka

2021-11-08 Thread JasonLee
hi 可以使用 setPartitions 方法 具体参考官网: https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/#topic-partition-subscription Best JasonLee 在2021年11月8日 17:06,guanyq 写道: 请大佬指导下: flink streaming可以指定partition消费kafka么 如有100个partition,但是我只想消费15partiton。

Re:取消订阅

2021-11-08 Thread Yuepeng Pan
Hi, 退订请发送任意内容到 user-zh-unsubscr...@flink.apache.org Best, Roc 在 2021-11-08 14:42:33,"张伟明" <821596...@qq.com.INVALID> 写道: >取消订阅

Re:取消订阅

2021-11-08 Thread Yuepeng Pan
Hi, 退订请发送任意内容到 user-zh-unsubscr...@flink.apache.org Best, Roc 在 2021-11-08 13:58:37,"tanggen...@163.com" 写道: >取消订阅 > > >tanggen...@163.com

Re: New blog post published - Sort-Based Blocking Shuffle Implementation in Flink

2021-11-08 Thread Zhilong Hong
Thank you for writing this blog post, Daisy and Kevin! It helps me to understand what sort-based shuffle is and how to use it. Looking forward to your future improvements! On Wed, Nov 3, 2021 at 6:32 PM Yuxin Tan wrote: > Thanks Daisy and Kevin! The IO scheduling idea of the sequential reading

Flink1.12 Streaming 消费kafka

2021-11-08 Thread guanyq
请大佬指导下: flink streaming可以指定partition消费kafka么 如有100个partition,但是我只想消费15partiton。

checkpoint??????????

2021-11-08 Thread ??????
: flink on yarn ??flink hdfs,ark1??hdfs??active??standby ark2standbyactive :??flink??checkpoint??hdfs??url??hdfs:ark:8082 ,standby??, hdfs??mycluster