Re: Latency Tracking in Apache Flink

2021-10-20 Thread Puneet Duggal
Hi, Yes, it is a simple ETL job and I thought of using the start_time/end_time concept… but I just wanted to know if Flink or any other 3rd-party monitoring tools like Datadog etc. provide out-of-the-box functionality to report latency. Thanks and regards, Puneet Duggal > On 21-Oct-2021, at 8:01
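A minimal sketch of the start_time/end_time idea mentioned above, in plain Python (not Flink API; all names are illustrative):

```python
import time

def timed_handler(handler):
    """Wrap a per-event handler so it also reports processing latency in ms."""
    def wrapped(event):
        start = time.monotonic()          # start_time, stamped at ingress
        result = handler(event)
        latency_ms = (time.monotonic() - start) * 1000.0  # end_time - start_time
        return result, latency_ms
    return wrapped
```

A metrics client such as Datadog's statsd library could then receive `latency_ms` as a histogram metric.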

Mesos deploy starts Mesos framework but does not start job managers

2021-10-20 Thread Javier Vegas
I am trying to deploy a Flink cluster via Mesos following https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/resource-providers/mesos/ (I know Mesos support has been deprecated, and I am planning to migrate my deployment tools to Kubernetes, but for now I am stuck using

Unsubscribe

2021-10-20 Thread aegean0...@163.com
| | aegean0933 Email: aegean0...@163.com | Unsubscribe

Re: Using the flink CLI option --pyRequirements

2021-10-20 Thread Dian Fu
Hi Francis Conroy, Do you want to debug the PyFlink job submitted via `flink run`? There is documentation [1] on how to debug it via `PyCharm`. PS: It supports the loopback mode in PyFlink which is enabled in local deployment. That's when you execute the PyFlink jobs locally, e.g. when executing

Re: Latency Tracking in Apache Flink

2021-10-20 Thread JING ZHANG
Hi Puneet, > Read about latency markers but not much useful as it just skips time taken > by each operator. Yes, latency tracking indeed has the problem you describe. > Is there any way to track latency / time taken for each event processing? I'm afraid there is no built-in way to track latency /

Flink cannot restore from checkpoint after changing the sink parallelism

2021-10-20 Thread kong
Hi, I ran into a problem where Flink cannot restore from a checkpoint after I change the sink parallelism. Flink version: 1.13.1, Flink on YARN, a Java job written with the DataStream API. Experiment 1: without changing any code, after cancelling the job it can recover from the specified checkpoint. dataStream.addSink(new Sink(config)).name("").uid(""); Experiment 2: after changing only the sink parallelism, the job cannot start and stays in the Initiating state. dataStream.addSink(new

Re: pyflink 1.14.0 UDF execution error, with code written following the official docs

2021-10-20 Thread Dian Fu
The screenshot didn't come through; the mailing list cannot carry images directly. Could you send more detailed log information? On Tue, Oct 19, 2021 at 6:34 PM xuzh wrote: > Error log > Exception in thread Thread-14: > Traceback (most recent call last): > File "D:\Anaconda3\envs\py37\lib\threading.py", line 926, in > _bootstrap_inner > self.run() > File >

Re: Apache Flink - Using upsert JDBC sink for DataStream

2021-10-20 Thread M Singh
Thanks Jing for your references.  I will check them.  Mans On Monday, October 18, 2021, 11:24:13 PM EDT, JING ZHANG wrote: Hi Mans, > Is there a DataStream api for using the upsert functionality ? You could try the `JdbcSink#sink` method, passing an upsert query as the first parameter value.
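As a hedged sketch of what such an upsert query could look like for a PostgreSQL target (table and column names are hypothetical; other databases use a different upsert syntax):

```sql
-- Passed as the first argument to JdbcSink#sink; executed once per stream element.
INSERT INTO user_scores (user_id, score)
VALUES (?, ?)
ON CONFLICT (user_id)
DO UPDATE SET score = EXCLUDED.score;
```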

High availability data clean up

2021-10-20 Thread Weiqing Yang
Hi, Per the doc, `kubernetes.jobmanager.owner.reference` can be used to set up the owners of the job manager Deployment. If the owner is deleted, then the job manager and its
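For reference, the option takes owner references as comma-separated key:value pairs; a sketch with placeholder values (the owner name and uid below are hypothetical and must match the actual owner object in the cluster):

```yaml
# Owner(s) of the JobManager Deployment; deleting the owner cascades the delete.
kubernetes.jobmanager.owner.reference: apiVersion:apps/v1,kind:Deployment,name:my-owner-deployment,uid:1234-abcd
```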

Re: Problem with Flink job and Kafka.

2021-10-20 Thread Marco Villalobos
Hi Qingsheng, Thank you. I am running Flink in EMR 6.3.0, which uses Flink version 1.12.1. We use AWS MSK for Kafka, and we are currently using Kafka 2.2.1. The stack trace seems empty; it only states this error: org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write a

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Thomas Weise
Hi, I see the stable core Flink API as a prerequisite for modularity. And for connectors it is not just the source and sink API (source being stable as of 1.14), but everything that is required to build and maintain a connector downstream, such as the test utilities and infrastructure. Without

Latency Tracking in Apache Flink

2021-10-20 Thread Puneet Duggal
Hi, Is there any way to track latency / time taken for each event processing. Read about latency markers but not much useful as it just skips time taken by each operator. Thanks, Puneet
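The latency markers referred to here are controlled via `flink-conf.yaml`; a minimal sketch (the interval value is illustrative):

```yaml
# Emit a latency marker from each source every 1000 ms; 0 (the default) disables it.
metrics.latency.interval: 1000
# Granularity of the latency histograms: single, operator (default), or subtask.
metrics.latency.granularity: operator
```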

Flink on kubernetes HA ::Renew deadline reached

2021-10-20 Thread marco
Hello Flink community: I am deploying a Flink application cluster in standalone mode on Kubernetes, but I am facing some problems. The job starts normally and continues to run, but at some point in time it crashes and gets restarted. Is anyone facing the same problem, or does anyone know how to resolve

RE: Huge backpressure when using AggregateFunction with Session Window

2021-10-20 Thread Schwalbe Matthias
Hi Ori, Just a couple of comments (some code is missing for a concise explanation): * SimpleAggregator is not used in the job setup below (assuming another job setup) * SimpleAggregator is called for each event that goes into a specific session window, however * The scala

Re: Fwd: How to deserialize Avro enum type in Flink SQL?

2021-10-20 Thread Timo Walther
Hi Peter, as a temporary workaround I would simply implement a UDF like: public class EverythingToString extends ScalarFunction { public String eval(@DataTypeHint(inputGroup = ANY) Object o) { return o.toString(); } } For the Utf8 issue, you can instruct Avro to generate Java

Re: Huge backpressure when using AggregateFunction with Session Window

2021-10-20 Thread Timo Walther
Hi Ori, this sounds indeed strange. Can you also reproduce this behavior locally with a faker source? We should definitely add a profiler and see where the bottleneck lies. Which Flink version and state backend are you using? Regards, Timo On 20.10.21 16:17, Ori Popowski wrote: I have a

(no subject)

2021-10-20 Thread TROY
Unsubscribe

Re: Fwd: How to deserialize Avro enum type in Flink SQL?

2021-10-20 Thread Peter Schrott
Hi Timo, thanks a lot for your suggestion. I also considered this workaround, but when going from DataStream API to Table API (using the POJO generated by the maven avro plugin) types are not mapped correctly, esp. Utf8 (Avro's implementation of CharSequence) and also enums. In the table I have then

Re: Fwd: How to deserialize Avro enum type in Flink SQL?

2021-10-20 Thread Timo Walther
A current workaround is to use DataStream API to read the data and provide your custom Avro schema to configure the format. Then switch to Table API. StreamTableEnvironment.fromDataStream(...) accepts all data types. Enum classes will be represented as RAW types but you can forward them as

Re: Metric Scopes purpose

2021-10-20 Thread Chesnay Schepler
The metric scope options are used by reporters that do not rely on tags, to generate a fully-qualified metric name. The prometheus reporter identifies metrics in a different way (a generic scope like jobmanager.job.myMetric + a bunch of tags to select a specific instance) instead, and ignores
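As an illustration of how a scope format turns into a fully-qualified metric name, here is a toy expansion in Python (not Flink code; the format string is illustrative, in the style of Flink's `metrics.scope.*` options):

```python
def expand_scope(scope_format, variables):
    """Toy model: substitute <variable> placeholders in a metric scope format."""
    name = scope_format
    for key, value in variables.items():
        name = name.replace("<" + key + ">", value)
    return name

# Reporters without tag support prepend the expanded scope to the metric name.
full_name = expand_scope("<host>.jobmanager.<job_name>",
                         {"host": "node1", "job_name": "etl-job"}) + ".myMetric"
```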

Re: Metric Scopes purpose

2021-10-20 Thread JP MB
Hi, Sorry for highlighting this one again. Can someone provide me some light on the purpose of metric scope configurations and where can I see the immediate results of changing these properties? Regards, José Brandão Em ter., 19 de out. de 2021 às 18:19, JP MB escreveu: > Hello, > I have been

Huge backpressure when using AggregateFunction with Session Window

2021-10-20 Thread Ori Popowski
I have a simple Flink application with a simple keyBy, a SessionWindow, and I use an AggregateFunction to incrementally aggregate a result, and write to a Sink. Some of the requirements involve accumulating lists of fields from the events (for example, all URLs), so not all the values in the end
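To make the cost concrete, here is a plain-Python model of the incremental-aggregation contract described above (illustrative only, not Flink's actual API): `add` runs once per event, so an accumulator that keeps growing lists makes every event touch increasingly large state.

```python
class UrlAggregate:
    """Toy model of an AggregateFunction that accumulates a list of URLs."""
    def create_accumulator(self):
        return {"count": 0, "urls": []}

    def add(self, event, acc):
        # Called once per event; appending means state grows with session size.
        acc["count"] += 1
        acc["urls"].append(event["url"])
        return acc

    def merge(self, a, b):
        a["count"] += b["count"]
        a["urls"].extend(b["urls"])
        return a

    def get_result(self, acc):
        return acc
```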

Looking back at the Apache Flink 1.14 development cycle / getting better for 1.15

2021-10-20 Thread Johannes Moser
Dear Flink community, In preparation for the 1.15 development cycle of Apache Flink (it already started) and preparing the release management we are collecting feedback from the community. If you didn’t have a chance to look at the release announcement you might want to do that now [1] Also

Re: Fwd: How to deserialize Avro enum type in Flink SQL?

2021-10-20 Thread Peter Schrott
Hi people! I was digging deeper these days and found the "root cause" of the issue and the difference between Avro reading from files and Avro reading from Kafka & SR. Please see:

Re: Stopping a Flink job

2021-10-20 Thread lei-tian
I am not using SQL, but I have the same problem. | | totorobabyfans | | Email: totorobabyf...@163.com | Signature customized by NetEase Mail Master On 2021-10-20 16:21, Kenyore Woo wrote: I ran into exactly the same problem. If you are also using Flink SQL, you can add table.dml_sync=true to the configuration. That setting worked for me. See TableEnvironment.executeInternal for details. On Oct 20, 2021 at 09:06:54, lei-tian wrote: > Hi

Re: Reset of transient variables in state to default values.

2021-10-20 Thread Yun Tang
Hi, For the RocksDB state backend, it will pick the registered Kryo serializer for normal read/write use and checkpoint/restore. Moreover, since key-values are serialized for storage in RocksDB, it actually deep copies them to avoid later unexpected modification. For the FileSystem/HashMap state backend,
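The deep-copy-by-serialization behavior described for RocksDB can be illustrated with plain Python and `pickle` (an analogy only, not Flink code):

```python
import pickle

value = {"urls": ["/a", "/b"]}
stored = pickle.dumps(value)      # write: the value is serialized into the backend
value["urls"].append("/c")        # later in-place modification of the live object
restored = pickle.loads(stored)   # read: deserialization yields a fresh copy
# The stored state is isolated from the later mutation.
```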

Re: Weird rows in SQL Query Result (Changelog)

2021-10-20 Thread JING ZHANG
Hi thekingofcity, > But when I change the result-mode from TABLE to CHANGELOG, > there are 4 rows for each update like: The results are expected when the result-mode is `CHANGELOG`. > SELECT sector, avg(`value`) as `index` FROM stock INNER JOIN metadata ON > stock.id=metadata.id GROUP BY
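A toy Python model of why CHANGELOG mode shows extra rows: a grouped aggregate over a stream retracts its previous result (`-U`) and emits the updated one (`+U`) whenever a group's input changes (illustrative only; the real operator chain with the join can multiply these rows further):

```python
def avg_changelog(events):
    """Toy retract-stream model of a grouped AVG over (key, value) events."""
    out, sums, counts = [], {}, {}
    for key, value in events:
        if key in sums:
            # Retract the previous aggregate for this group before updating it.
            out.append(("-U", key, sums[key] / counts[key]))
            tag = "+U"
        else:
            tag = "+I"
        sums[key] = sums.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
        out.append((tag, key, sums[key] / counts[key]))
    return out
```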

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Jark Wu
Hi Konstantin, > the connectors need to be adopted and require at least one release per Flink minor release. However, this will make the releases of connectors slower, e.g. maintain features for multiple branches and release multiple branches. I think the main purpose of having an external

Weird rows in SQL Query Result (Changelog)

2021-10-20 Thread thekingofcity
Hi, I'm working on a simple SQL query that contains aggregation. The results look fine in the `SQL Query Result (Table)` mode but look weird when I change the result-mode to CHANGELOG. ``` CREATE TABLE stock ( id VARCHAR(10) NOT NULL PRIMARY KEY, `value` DOUBLE NOT NULL ) WITH ( 'connector' = 'pravega',

Re: Re: Stopping a Flink job

2021-10-20 Thread Kenyore Woo
I ran into exactly the same problem. If you are also using Flink SQL, you can add table.dml_sync=true to the configuration. That setting worked for me. See TableEnvironment.executeInternal for details. On Oct 20, 2021 at 09:06:54, lei-tian wrote: > Hi , yuepeng-pan: >
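For reference, the canonical key is spelled `table.dml-sync` in Flink's configuration; a sketch of the setting (it makes DML statements submitted via `executeSql` block until the job finishes):

```yaml
# Block executeSql() DML calls until the submitted job terminates.
table.dml-sync: true
```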

Re: [External] metric collision using datadog and standalone Kubernetes HA mode

2021-10-20 Thread Chesnay Schepler
What version are you using, and if you are using 1.13+, are you using the adaptive scheduler or reactive mode? On 20/10/2021 07:39, Clemens Valiente wrote: Hi Chesnay, thanks a lot for the clarification. We managed to resolve the collision, and isolated a problem to the metrics themselves.

Re: Programmatically configuring S3 settings

2021-10-20 Thread Arvid Heise
We had a similar thread on this ML where a user is executing through the IDE. It seems that FileSystems are not automatically initialized in the LocalExecutor, and as a workaround you should do it manually [1] in your main before accessing the FileSystems. [1]

Re: Unable to create connection to Azure Data Lake Gen2 with abfs: "Configuration property {storage_account}.dfs.core.windows.net not found"

2021-10-20 Thread Arvid Heise
Hi Preston, Before FileSystems are accessed, they are first initialized with the configuration coming from the flink-conf. [1] It seems that your local execution in IDE is bypassing that, so I'd call that manually. Which entry point are you using? We should probably fix it, such that it's also

RE: Troubleshooting checkpoint timeout

2021-10-20 Thread Alexis Sarda-Espinosa
Currently the windows are 10 minutes in size with a 1-minute slide time. The approximate 500 event/minute throughput is already rather high for my use case, so I don’t expect it to be higher, but I would imagine that’s still pretty low. I did have some issues with storage space, and I wouldn’t

Re: Troubleshooting checkpoint timeout

2021-10-20 Thread Parag Somani
I had a similar problem, where I had two concurrent checkpoints configured. Also, I used to save them in S3 (using minio) on a k8s 1.18 env. The Flink service was getting restarted and timeouts were happening. It got resolved: 1. As minio ran out of disk space, causing failure of checkpoints (this was
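The checkpointing knobs touched on above live in `flink-conf.yaml`; a sketch with illustrative values:

```yaml
# Fail a checkpoint that takes longer than this.
execution.checkpointing.timeout: 10min
# Avoid overlapping checkpoints competing for I/O.
execution.checkpointing.max-concurrent-checkpoints: 1
# Give the job breathing room between consecutive checkpoints.
execution.checkpointing.min-pause: 30s
```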

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Konstantin Knauf
Hi everyone, regarding the stability of the APIs. I think everyone agrees that connector APIs which are stable across minor versions (1.13->1.14) are the mid-term goal. But: a) These APIs are still quite young, and we shouldn't make them @Public prematurely either. b) Isn't this *mostly*

Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Jark Wu
Hi, I think Thomas raised very good questions and would like to know your opinions if we want to move connectors out of flink in this version. (1) is the connector API already stable? > Separate releases would only make sense if the core Flink surface is > fairly stable though. As evident from