Re: Questions about the client synchronously obtaining task execution results

2023-11-19 Thread 刘峻池
Sorry I forgot to add the version information, the version is 1.17 刘峻池 于2023年11月20日周一 13:59写道: > Hi Flink Community > > When I run this command `flink run-application -t yarn-application -sae > mainClass somejar` to submit some batch-task on YARN with Application > Mode, my shell client always

Questions about the client synchronously obtaining task execution results

2023-11-19 Thread 刘峻池
Hi Flink Community When I run this command `flink run-application -t yarn-application -sae mainClass somejar` to submit some batch-task on YARN with Application Mode, my shell client always terminates after task submission success, then the dispatcher cannot receive the client heartbeat for a

Scalability Questions Concerning Apache Flink Operator 1.17

2023-10-24 Thread Vince Castello via user
I have been working with the 1.17 version of the Apache Flink operator and have below questions. As part of upgrading the application, is the application suspended, i.e. it is checkpointed, prior to doing an upgrade? Is there a concept of a hot update where I can update the application while

Default Flink S3 FileSource Questions

2023-09-25 Thread Varun Narayanan Chakravarthy via user
Hello Flink Community, Flink Version: 1.16.1, Zookeeper for HA. My Flink Applications reads raw parquet files hosted in S3, applies transformations and re-writes them to S3, under a different location. Below is my code to read from parquets from S3: ``` final Configuration configuration = new

Re: Questions related to Autoscaler

2023-08-11 Thread liu ron
his > threashold, autoscaler will prevent any downscaling behavior. > > > Best, > Zhanghao Chen > -- > *发件人:* Hou, Lijuan via user > *发送时间:* 2023年8月9日 3:04 > *收件人:* user@flink.apache.org > *主题:* Questions related to Autoscaler > >

回复: Questions related to Autoscaler

2023-08-10 Thread Chen Zhanghao
8月9日 3:04 收件人: user@flink.apache.org 主题: Questions related to Autoscaler Hi Flink team, This is Lijuan. I am working on our flink job to realize autoscaling. We are currently using flink version of 1.16.1, and using flink operator version of 1.5.0. I have some questions need to confirm wi

Questions related to Autoscaler

2023-08-10 Thread Hou, Lijuan via user
Hi Ron, Thanks for the reply! > 1 - It seems for flink job using flink operator to realize autoscaling, the > only option to realize autoscaling is to enable the Autoscaler feature, and > KEDA won’t work, right? What is KEDA mean? -> KEDA is a Kubernetes based Event Driven Autoscaler. I

Re: Questions related to Autoscaler

2023-08-08 Thread liu ron
1.16.1, and using flink operator > version of 1.5.0. I have some questions need to confirm with you. > > > > 1 - It seems for flink job using flink operator to realize autoscaling, > the only option to realize autoscaling is to enable the Autoscaler feature, > and KEDA won’t work, r

Questions related to Autoscaler

2023-08-08 Thread Hou, Lijuan via user
Hi Flink team, This is Lijuan. I am working on our flink job to realize autoscaling. We are currently using flink version of 1.16.1, and using flink operator version of 1.5.0. I have some questions need to confirm with you. 1 - It seems for flink job using flink operator to realize autoscaling

RE: Questions about java enum when convert DataStream to Table

2023-08-02 Thread Jiabao Sun
Hi haishui, The enum type cannot be mapped as flink table type directly. I think the easiest way is to convert enum to string type first: DataStreamSource> source = env.fromElements( new Tuple2<>("1", TestEnum.A.name()), new Tuple2<>("2", TestEnum.B.name()) ); Or add a map

Questions about java enum when convert DataStream to Table

2023-08-02 Thread haishui
I want to convert dataStream to Table. The type of dataSream is a POJO, which contains a enum field. 1. The enum field is RAW('classname', '...') in table. When I execute `SELECT * FROM t_test` and print the result, It throws EOFException. 2. If I assign the field is STRING in schema, It

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-21 Thread Gyula Fóra
rg/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/job-management/#stateful-and-stateless-application-upgrades >>>>>> >>>>>> The operator is made especially to handle stateful application >>>>>> upgrades robustly.

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-21 Thread Tony Chen
, as >>>>> long as you have last-state or savepoint you will always get the latest >>>>> state. >>>>> >>>>> This is somewhat orthogonal to the savepoint trigger / >>>>> initialSavepointPath mechanisms. The initialSavepointPath

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Gyula Fóra
gt;> operator is not aware of the latest state. After that all upgrades always >>>> use the latest state unless the upgradeMode is stateless in which case no >>>> state is used. Savepoint triggering can help you keep backups for failure >>>> recovery but they s

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Tony Chen
operator is not aware of the latest state. After that all upgrades always >>> use the latest state unless the upgradeMode is stateless in which case no >>> state is used. Savepoint triggering can help you keep backups for failure >>> recovery but they should not

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Gyula Fóra
an help you keep backups for failure >> recovery but they should not be executed as part of your upgrade flow >> because the operator already does this for you. >> >> Cheers, >> Gyula >> >> On Wed, Jul 19, 2023 at 8:20 PM Tony Chen >> wrote: >>

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Tony Chen
Community, >> >> My name is Tony Chen, and I am a software engineer at Robinhood. I have >> some questions on restarting a Flink Application from a savepoint or >> checkpoint. >> >> We currently store our checkpoints and savepoints in S3, and we would >> like

Re: Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Gyula Fóra
Tony Chen, and I am a software engineer at Robinhood. I have > some questions on restarting a Flink Application from a savepoint or > checkpoint. > > We currently store our checkpoints and savepoints in S3, and we would like > to use the Apache Flink Kubernetes Operator to manage our Flin

Questions on Restarting a Flink Application from a savepoint or checkpoint

2023-07-19 Thread Tony Chen
Hi Flink Community, My name is Tony Chen, and I am a software engineer at Robinhood. I have some questions on restarting a Flink Application from a savepoint or checkpoint. We currently store our checkpoints and savepoints in S3, and we would like to use the Apache Flink Kubernetes Operator

Re: 回复: Questions regarding adaptive scheduler with YARN and application mode

2023-06-28 Thread Leon Xu
/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management> > [3] Autoscaler | Apache Flink Kubernetes Operator > <https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.5/docs/custom-resource/autoscaler/> > > Best, > Zhanghao Chen >

Re: 回复: Questions regarding adaptive scheduler with YARN and application mode

2023-06-28 Thread Madan D via user
| Apache Flink Kubernetes Operator Best,Zhanghao Chen发件人: Leon Xu 发送时间: 2023年6月27日 13:41 收件人: user 主题: Questions regarding adaptive scheduler with YARN and application mode Hi Flink users, I am trying to use Adaptive Scheduler to auto scale our Flink streaming jobs (NOT batch job). Our jobs

回复: Questions regarding adaptive scheduler with YARN and application mode

2023-06-28 Thread Chen Zhanghao
est, Zhanghao Chen 发件人: Leon Xu 发送时间: 2023年6月27日 13:41 收件人: user 主题: Questions regarding adaptive scheduler with YARN and application mode Hi Flink users, I am trying to use Adaptive Scheduler to auto scale our Flink streaming jobs (NOT batch job). Our jobs are running on YARN wit

Questions regarding adaptive scheduler with YARN and application mode

2023-06-26 Thread Leon Xu
Hi Flink users, I am trying to use Adaptive Scheduler to auto scale our Flink streaming jobs (NOT batch job). Our jobs are running on YARN with application mode. There isn't much doc around how adaptive scheduler works. So I have some questions: 1. How does Adaptive Scheduler work with YARN

Re: Questions on S3 File Sink Behavior

2023-03-29 Thread Mate Czagany
Hi, 1. In case of S3 FileSystem, Flink uses the multipart upload process [1] for better performance. It might not be obvious at first by looking at the docs, but it's noted at the bottom of the FileSystem page [2] For more information you can also check FLINK-9751 and FLINK-9752 2. In case of

Questions on S3 File Sink Behavior

2023-03-29 Thread Chirag Dewan via user
Hi,   We are tying to use Flink's File sink to distribute files to AWS S3 storage. We are using Flink provided Hadoop s3a connector as plugin. We have some observations that we needed to clarify: 1. When using file sink for local filesystem distribution, we can see that the sink creates 3

Re: I want to subscribe users' questions

2023-02-07 Thread yuxia
ser-zh" 发送时间: 星期五, 2023年 2 月 03日 下午 7:48:55 主题: I want to subscribe users' questions Hi, My name is Guanyuan Chen.I am a big data development engineer, tencent wechat department, china. I have 4 years experience in flink developing, and want to subscribe flink's development news and hel

Re: I want to subscribe users' questions

2023-02-07 Thread yuxia
ser-zh" 发送时间: 星期五, 2023年 2 月 03日 下午 7:48:55 主题: I want to subscribe users' questions Hi, My name is Guanyuan Chen.I am a big data development engineer, tencent wechat department, china. I have 4 years experience in flink developing, and want to subscribe flink's development news and hel

Re: I want to subscribe users' questions

2023-02-07 Thread Hang Ruan
Hi, guanyuan, This document( https://flink.apache.org/community.html#how-to-subscribe-to-a-mailing-list) will be helpful. welcome~ Best, Hang guanyuan chen 于2023年2月7日周二 21:37写道: > Hi, > My name is Guanyuan Chen.I am a big data development engineer, tencent > wechat department, china. I

Re: I want to subscribe users' questions

2023-02-07 Thread Hang Ruan
Hi, guanyuan, This document( https://flink.apache.org/community.html#how-to-subscribe-to-a-mailing-list) will be helpful. welcome~ Best, Hang guanyuan chen 于2023年2月7日周二 21:37写道: > Hi, > My name is Guanyuan Chen.I am a big data development engineer, tencent > wechat department, china. I

I want to subscribe users' questions

2023-02-07 Thread guanyuan chen
Hi, My name is Guanyuan Chen.I am a big data development engineer, tencent wechat department, china. I have 4 years experience in flink developing, and want to subscribe flink's development news and help someone developing flink job willingly. Thanks a lot.

questions about FLINK-27341

2022-09-03 Thread yidan zhao
Hi, I want to know is there some way to avoid this problem now? I can not guarantee jobmanager and taskmanager do not run in the same machine.

Questions regarding JobManagerWatermarkTracker on AWS Kinesis

2022-07-25 Thread Peter Schrott
Hi there! I have a Flink Job (v 1.13.2, AWS managed) which reads from Kinesis (AWS manger, 4 shards). For reasons the shards are not partitioned properly (at the moment). So I wanted to make use of Watermarks (BoundedOutOfOrdernessTimestampExtractor) and the JobManagerWatermarkTracker to

Re: Questions regarding classpath loading order in YarnClusterDescriptor

2022-06-05 Thread Geng Biao
o Geng From: Leon Xu Date: Sunday, June 5, 2022 at 4:04 PM To: Biao Geng Cc: user Subject: Re: Questions regarding classpath loading order in YarnClusterDescriptor Hi Biao, I really appreciate your thorough answers. And yes for now I took the workaround by manipulating the directory names. To

Re: Questions regarding classpath loading order in YarnClusterDescriptor

2022-06-05 Thread Leon Xu
e >> yarn.provided.lib.dirs >> *property under the yarn configuration. >> >> By playing with the YarnClusterDescriptor code I have two questions that >> I hope to get some answers: >> 1. YarnClusterDescriptor seems to force the classpath loading in &

Re: Questions regarding classpath loading order in YarnClusterDescriptor

2022-06-05 Thread Biao Geng
ion from Java code to YARN cluster, in the > application mode. We are setting the classpath as the value of *the > yarn.provided.lib.dirs > *property under the yarn configuration. > > By playing with the YarnClusterDescriptor code I have two questions that I > hope to get some

Questions regarding classpath loading order in YarnClusterDescriptor

2022-06-04 Thread Leon Xu
. By playing with the YarnClusterDescriptor code I have two questions that I hope to get some answers: 1. YarnClusterDescriptor seems to force the classpath loading in alphabetical order. See code here <https://github.com/apache/flink/blob/master/flink-yarn/src/main/java/org/apache/flink/y

Questions about Flink Stateful Functions Current Capabilities

2022-05-28 Thread Ryan van Huuksloot
and DataStream interop). Below are my questions. I want to make sure I understand StateFuns and the core functionality - and potentially some of the future roadmap. 1. KeyBy: From the looks of it, ingress / egress can be scaled, but how does the scaling of the functions work? If those functions have state

Re: Savepoint and cancel questions

2022-04-23 Thread Dan Hill
Hi Hangxiang. Thanks! 1. Ah, okay. It makes more sense considering FAILED. 2. Oh cool. I'm migrating to v1.14.4 now. 3. Yes, this is great! On Fri, Apr 22, 2022 at 8:05 PM Hangxiang Yu wrote: > Hi, Dan > 1. Do you mean put the option into savepoint command? If so, I think it > will not work

Re: Savepoint and cancel questions

2022-04-22 Thread Hangxiang Yu
Hi, Dan 1. Do you mean put the option into savepoint command? If so, I think it will not work well. This option describe that how checkpoints will be cleaned up in different job status. e.g. FAILED/CANCELED. It cannot be covered in savepoint command. 2. Which flink version you use? I work on 1.14

Savepoint and cancel questions

2022-04-22 Thread Dan Hill
Hi. 1. Why isn’t –externalizedCheckpointCleanup an option on savepoint (instead of being needed at the start of a job run)? 2. Can we get a confirmation dialog when someone clicks "cancel job" in the UI? Just in case people click on accident. 3. Can we get a way to have Flink clean up the

Apache StateFun - A few questions about of module.yaml

2022-04-12 Thread M Singh
: io.statefun.playground.v1/ingressspec:  port: 8090---kind: io.statefun.playground.v1/egressspec:  port: 8091  topics:    - greetings Questions: 1. How does the ingress component (kind:io.statefun.playground.v1/ingress) get initialized to listen to port 8090 and same for egress (kind

Re: Questions regarding connecting local FlinkSQL client to remote JobManager in K8s

2022-03-03 Thread yu'an huang
Hi Elkhan, I confirm that the FlinkSQL Client is communicating with JM via Rest endpoint. After I changed the “rest.port”, the sql client thrown exception: "[ERROR] Could not execute SQL statement. Reason: java.net.ConnectException: Connection refused”. So for your case, since Flink will

Re: Questions regarding connecting local FlinkSQL client to remote JobManager in K8s

2022-03-03 Thread yu'an huang
Hi Elkhan, Except for JM have an external IP address, I think the port 6123 also need to be opened. You may need to set a host port for 6123 in JM pod or expose this port by Kubernetes service. But I am not sure whether the sql-client communicate with JM via Rest endpoint or RPC port. Hopes

Questions regarding connecting local FlinkSQL client to remote JobManager in K8s

2022-03-02 Thread Elkhan Dadashov
Hi Flink users, Wanted to check if any of you tried to run the local FlinkSQL client against JobManager running in the Kubernetes environment. For local FlinkSQL Client and local Flink cluster we set these params: jobmanager.rpc.address: localhost jobmanager.rpc.port: 6123 To make it work, Is

RE: Basic questions about resuming stateful Flink jobs

2022-02-17 Thread Schwalbe Matthias
method look like this (for Flink 1.13.0, and should be similar for other releases): [1] Feel free to get back with additional questions  Thias [1] remodeled execute(…) (scala): def execute(jobName: String): JobExecutionResult = { if (fromSavepoint != null

Re: Basic questions about resuming stateful Flink jobs

2022-02-17 Thread Piotr Nowojski
ilure_recovery/ > [3] > https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/ops/state/savepoints/#configuration > > > On Wed, Feb 16, 2022 at 12:28 PM Sandys-Lumsdaine, James < > james.sandys-lumsda...@systematica.com> wrote: > >> Thanks for

Re: Basic questions about resuming stateful Flink jobs

2022-02-16 Thread Cristian Constantinescu
y, Piotr. > > > > Some follow on questions: > > >". Nevertheless you might consider enabling them as this allows you to > manually cancel the job if it enters an endless recovery/failure loop, fix > the underlying issue, and restart the job from the externalised checkpoint. >

RE: Basic questions about resuming stateful Flink jobs

2022-02-16 Thread Sandys-Lumsdaine, James
Thanks for your reply, Piotr. Some follow on questions: >". Nevertheless you might consider enabling them as this allows you to >manually cancel the job if it enters an endless recovery/failure loop, fix the >underlying issue, and restart the job from the externalised ch

Re: Basic questions about resuming stateful Flink jobs

2022-02-16 Thread Piotr Nowojski
Hi James, Sure! The basic idea of checkpoints is that they are fully owned by the running job and used for failure recovery. Thus by default if you stopped the job, checkpoints are being removed. If you want to stop a job and then later resume working from the same point that it has previously

Basic questions about resuming stateful Flink jobs

2022-02-16 Thread James Sandys-Lumsdaine
Hi all, I have a 1.14 Flink streaming workflow with many stateful functions that has a FsStateBackend and checkpointed enabled, although I haven't set a location for the checkpointed state. I've really struggled to understand how I can stop my Flink job and restart it and ensure it carries

Re: Questions about Kryo setRegistrationRequired(false)

2022-02-08 Thread Shane Bishop
type is registered with Kryo or not. [1] https://issues.apache.org/jira/browse/FLINK-25993 Best regards, Shane From: Chesnay Schepler Sent: February 7, 2022 3:08 AM To: Shane Bishop ; user@flink.apache.org Subject: Re: Questions about Kryo

Re: Questions about Kryo setRegistrationRequired(false)

2022-02-07 Thread Shane Bishop
Sent: February 7, 2022 3:08 AM To: Shane Bishop ; user@flink.apache.org Subject: Re: Questions about Kryo setRegistrationRequired(false) There isn't any setting to control setRegistrationRequired(). You can however turn Kryo off via ExecutionConfig#disableGenericTypes, although this may

Re: Questions about Kryo setRegistrationRequired(false)

2022-02-07 Thread Chesnay Schepler
There isn't any setting to control setRegistrationRequired(). You can however turn Kryo off via ExecutionConfig#disableGenericTypes, although this may require changes to your data types. I'd recommend to file a ticket. On 04/02/2022 20:12, Shane Bishop wrote: Hi all, TL;DR: I am concerned

Re: Questions about checkpoint retention

2022-02-05 Thread 陳昌倬
On Fri, Jan 28, 2022 at 02:43:11PM +0800, Caizhi Weng wrote: > Chen-Che Huang 于2022年1月27日周四 11:10写道: > > We have two questions for checkpoint retention. > > > >1. When our cron job creates a savepoint called SP, it seems those > >checkpoints created earlier SP

Questions about Kryo setRegistrationRequired(false)

2022-02-04 Thread Shane Bishop
Hi all, TL;DR: I am concerned that kryo.setRegistrationRequired(false) in Apache Flink might introduce serialization/deserialization vulnerabilities, and I want to better understand the security implications of its use in Flink. There is an issue on the Kryo GitHub repo

Re: Questions about checkpoint retention

2022-01-27 Thread Caizhi Weng
ew checkpoints regularly and keeps only the latest 10 > > checkpoints. Besides, for app upgrade and better reliability, we have a > cron job which creates savepoints at regular intervals. > > > > We have two questions for checkpoint retention. > >1. When our cro

Questions about checkpoint retention

2022-01-26 Thread Chen-Che Huang
upgrade and better reliability, we have a cron job which creates savepoints at regular intervals. We have two questions for checkpoint retention. 1. When our cron job creates a savepoint called SP, it seems those checkpoints created earlier SP still cannot be deleted. We thought the new

Re: [Statefun] Questions on recovery

2021-11-18 Thread Hady Januar Willi
Hi Igal, Thank you for your response, understood the strategies. Best, Hady On Wed, Nov 3, 2021 at 9:06 PM Igal Shilman wrote: > Hello Hady, > Glad to see that you are testing StateFun! > > Regarding that exception, I think that this is not the root cause. The > root cause is as you wrote

Re: Flink SQL build-in function questions.

2021-11-13 Thread Yuval Itzchakov
I recall looking for these once in the SQL standard spec, AFAIR they are not part of it. On Fri, Nov 12, 2021, 11:48 Francesco Guardiani wrote: > Yep I agree with waiting for calcite to support it. As a temporary > workaround you can define your own udfs with that functionality. > > I also

Re: Flink SQL build-in function questions.

2021-11-12 Thread Francesco Guardiani
Yep I agree with waiting for calcite to support it. As a temporary workaround you can define your own udfs with that functionality. I also wonder, are the bitwise operators defined in the ansi sql specification? Or should we just follow the common sense behavior of databases supporting it? On

Re: Flink SQL build-in function questions.

2021-11-12 Thread JIN FENG
Sure, I can take a try. Before starting the work, we should discuss the api of bit operation function. There are two alternatives 1. add some built in functions include bitAnd,bitNot,bitOr,bitXor 2. support &, |, ^, ~ operators in calcite first. Currently, there is a relative jira

Re: Flink SQL build-in function questions.

2021-11-11 Thread Martijn Visser
Hi, I don't think there's currently anyone in the community who is working on the bit operation functions. Would you be interested and able to make a contribution on that? Best regards, Martijn On Thu, 11 Nov 2021 at 03:54, JIN FENG wrote: > hi all, > I met two problems when I use FlinkSQL.

Flink SQL build-in function questions.

2021-11-10 Thread JIN FENG
hi all, I met two problems when I use FlinkSQL. 1. Is there any plan to support bit operation functions ? Currently there is some jira mentioned about this, https://issues.apache.org/jira/browse/FLINK-14990 , https://issues.apache.org/jira/browse/FLINK-12451 But It seems that it hasn't been

Re: [Statefun] Questions on recovery

2021-11-03 Thread Igal Shilman
Hello Hady, Glad to see that you are testing StateFun! Regarding that exception, I think that this is not the root cause. The root cause is as you wrote that the StateFun job failed because it wasn't able to deliver a message to a remote function in the given time frame. If you look at the logs

[Statefun] Questions on recovery

2021-11-03 Thread Hady Januar Willi
Hi everyone, When testing Flink statefun, the job eventually throws the following exception after failing to reach the endpoint or if the endpoint fails after the exponentially increasing delay. java.util.concurrent.RejectedExecutionException:

Re: Questions about keyed streams

2021-09-28 Thread Dan Hill
Hi! I'm just getting back to this. Questions: 1. Across operators, does the same key group ids get mapped to the same task managers? E.g. if an item is in key group 1 of operator A and that runs on taskmanager-0, will key group 1 of operator B also run on taskmanager-0? 2. Are there any

Re: Questions regarding broadcast join in Flink

2021-09-10 Thread Timo Walther
Hi Gerald, actually, this is a typical issue when performing a streaming join. An ideal solution would be to block the main stream until the broadcast stream is ready. However, this is currently not supported in the API. In any case, a user needs to handle this in a use case specific way to

Questions regarding broadcast join in Flink

2021-09-10 Thread Gerald.Sula
Hello, I am trying to implement a broadcast join of two streams in flink using the broadcast functionality. In my usecase I have a large stream that will be enriched with a much smaller stream. In order to first test my approach, I have adapted the Taxi ride exercise in the official training

RE: 1.9 to 1.11 Managed Memory Migration Questions

2021-08-27 Thread Hailu, Andreas [Engineering]
Thanks Caizhi, this was very helpful. // ah From: Caizhi Weng Sent: Thursday, August 26, 2021 10:41 PM To: Hailu, Andreas [Engineering] Cc: user@flink.apache.org Subject: Re: 1.9 to 1.11 Managed Memory Migration Questions Hi! I've read the first mail again and discover that the direct memory

Re: 1.9 to 1.11 Managed Memory Migration Questions

2021-08-26 Thread Caizhi Weng
/ops/config.html#taskmanager-memory-task-off-heap-size > > [2] > https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_setup.html#configure-off-heap-memory-direct-or-native > > > > *// *ah > > > > *From:* Caizhi Weng > *Sent:* Wednesday, Au

RE: 1.9 to 1.11 Managed Memory Migration Questions

2021-08-26 Thread Hailu, Andreas [Engineering]
-direct-or-native // ah From: Caizhi Weng Sent: Wednesday, August 25, 2021 10:47 PM To: Hailu, Andreas [Engineering] Cc: user@flink.apache.org Subject: Re: 1.9 to 1.11 Managed Memory Migration Questions Hi! Why does this ~30% memory reduction happen? I don't know how memory is calculated in Flink

Re: 1.9 to 1.11 Managed Memory Migration Questions

2021-08-25 Thread Caizhi Weng
ide from > ‘taskmanager.network.bounded-blocking-subpartition-type: file’ which I see > is now deprecated and replaced with a new option defaulted to ‘file’ (which > works for us!) SO nearly everything else is as default. > > > > We haven’t made any configuration changes ye

1.9 to 1.11 Managed Memory Migration Questions

2021-08-25 Thread Hailu, Andreas [Engineering]
some questions around what I observed. 1. I observed that an application ran with "-ytm 12288" on 1.9 receives 8.47GB JVM Heap space and 5.95 Flink Managed Memory space (as reported by the ApplicationMaster), where on 1.11 it receives 5.79 JVM Heap space and 4.30 Flink Managed Me

Re: Questions on usage of SQL hints

2021-08-12 Thread JING ZHANG
Hi Paul, I'm very happy to hear that., Paul Lam 于2021年8月12日周四 下午3:17写道: > Hi JING, > > Thanks for your inputs! It helps a lot. > > Best, > Paul Lam > > 2021年8月12日 13:13,JING ZHANG 写道: > > Hi Paul, > There are Table hints and Query hints. > Query hints are on the way, there is a JIRA to track

Re: Questions on usage of SQL hints

2021-08-12 Thread Paul Lam
Hi JING, Thanks for your inputs! It helps a lot. Best, Paul Lam > 2021年8月12日 13:13,JING ZHANG 写道: > > Hi Paul, > There are Table hints and Query hints. > Query hints are on the way, there is a JIRA to track this issue [3]. AFAIK, > the issue is almost close to submit a pull request now. >

Re: Questions on usage of SQL hints

2021-08-11 Thread JING ZHANG
Hi Paul, There are Table hints and Query hints. Query hints are on the way, there is a JIRA to track this issue [3]. AFAIK, the issue is almost close to submit a pull request now. Table hints[1][2] are already supported since Flink 1.11. You could find more detail information in [1][2]. For table

Questions on usage of SQL hints

2021-08-11 Thread Paul Lam
Hi community, I’m trying out SQL hints on DML, but there’s not very much about the supported SQL hints on the docs. Are the SQL hints limited to source/sink tables only at the moment? And where can I find the full list of supported SQL hints? Thanks in advance! Best, Paul Lam

Re: Questions on reading JDBC data with Flink Streaming API

2021-08-10 Thread Fuyao Li
Sandys-Lumsdaine Date: Tuesday, August 10, 2021 at 07:58 To: user@flink.apache.org Subject: [External] : Questions on reading JDBC data with Flink Streaming API Hello, I'm starting a new Flink application to allow my company to perform lots of reporting. We have an existing legacy system

Questions on reading JDBC data with Flink Streaming API

2021-08-10 Thread James Sandys-Lumsdaine
deployed Kafka streams. I've spent a lot of time reading the Flink book and web pages but I have some simple questions and assumptions I hope you can help with so I can progress. Firstly, I am wanting to use the DataStream API so we can both consume historic data and also realtime data. I

Re: Questions about keyed streams

2021-07-29 Thread Arvid Heise
Afaik you can express the partition key in Table API now which will be used for co-location and optimization. So I'd probably give that a try first and convert the Table to DataStream where needed. On Sat, Jul 24, 2021 at 9:22 PM Dan Hill wrote: > Thanks Fabian and Senhong! > > Here's an

Re: Questions about keyed streams

2021-07-24 Thread Dan Hill
Thanks Fabian and Senhong! Here's an example diagram of the join that I want to do. There are more layers of joins. https://docs.google.com/presentation/d/17vYTBUIgrdxuYyEYXrSHypFhwwS7NdbyhVgioYMxPWc/edit#slide=id.p 1) Thanks! I'll look into these. 2) I'm using the same key across multiple

Re: Questions about keyed streams

2021-07-23 Thread Senhong Liu
Hi Dan, 1) If the key doesn’t change in the downstream operators and you want to avoid shuffling, maybe the DataStreamUtils#reinterpretAsKeyedStream would be helpful. 2) I am not sure that if you are saying that the data are already partitioned in the Kafka and you want to avoid shuffling in

Re: Questions about keyed streams

2021-07-22 Thread Fabian Paul
Hi Dan, 1) In general, there is no guarantee that your downstream operator is on the same TM although working on the same key group. Nevertheless, you can try force this kind of behaviour to prevent the network transfer by either chaining the two operators (if no shuffle is in between) or

Questions about keyed streams

2021-07-21 Thread Dan Hill
Hi. 1) If I use the same key in downstream operators (my key is a user id), will the rows stay on the same TaskManager machine? I join in more info based on the user id as the key. I'd like for these to stay on the same machine rather than shuffle a bunch of user-specific info to multiple task

Re: Questions about UPDATE_BEFORE row kind in ElasticSearch sink

2021-06-29 Thread Kai Fu
Thank you for the reply, Jark. In our case, we found that there are no UPDATE_BEFORE records generated since the join is using -D/+I row kinds. *> Note: "+I" represents "INSERT", "-D" represents "DELETE", "+U" represents "UPDATE_AFTER",* * "-U" represents "UPDATE_BEFORE". We forward input

Re: Questions about UPDATE_BEFORE row kind in ElasticSearch sink

2021-06-28 Thread Jark Wu
UPDATE_BEFORE is required in cases such as Aggregation with Filter. For example: SELECT * FROM ( SELECT word, count(*) as cnt FROM T GROUP BY word ) WHERE cnt < 3; There is more discussion in this issue: https://issues.apache.org/jira/browse/FLINK-9528 Best, Jark On Mon, 28 Jun 2021 at

Questions about UPDATE_BEFORE row kind in ElasticSearch sink

2021-06-27 Thread Kai Fu
Hi team, We notice the UPDATE_BEFORE row kind is not dropped and treated as DELETE as in code

Re: Questions about implementing a flink source

2021-06-09 Thread Arvid Heise
restarted (could be in an inconsistent state). On Tue, Jun 8, 2021 at 10:51 PM Evan Palmer wrote: > Hello again, > > Thank you for all of your help so far, I have a few more questions if you > have the time :) > > 1. Deserialization Schema > > There's been some debate within

Re: Questions about implementing a flink source

2021-06-08 Thread Evan Palmer
Hello again, Thank you for all of your help so far, I have a few more questions if you have the time :) 1. Deserialization Schema There's been some debate within my team about whether we should offer a DeserializationSchema and SerializationSchema in our source and sink. If we include don't

Re: Re: Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-19 Thread Yun Gao
bos Send Date:Thu May 20 01:16:39 2021 Recipients:Yun Gao CC:user Subject:Re: Questions Flink DataStream in BATCH execution mode scalability advice > On May 19, 2021, at 7:26 AM, Yun Gao wrote: > > Hi Marco, > > For the remaining issues, > > 1. For the aggrega

Re: two questions about flink stream processing: kafka sources and TimerService

2021-05-19 Thread Jin Yi
s partitions to remain in order you may > need to use parallelism 1. I'll attach some links here which might be > useful: > > > https://stackoverflow.com/questions/50340107/order-of-events-with-chained-keyby-calls-on-same-key > > https://stackoverflow.com/questions/44156774/orde

Re: Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-19 Thread Marco Villalobos
gt; the timeout. > > Best, > Yun > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#heartbeat-timeout > > > --Original Mail -- > Sender:Marco Villalobos > Send Date:Wed May 19 14:03:48 2021 > Rec

Re: Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-19 Thread Yun Gao
14:03:48 2021 Recipients:user Subject:Questions Flink DataStream in BATCH execution mode scalability advice Questions Flink DataStream in BATCH execution mode scalability advice. Here is the problem that I am trying to solve. Input is an S3 bucket directory with about 500 GB of data across many

Questions Flink DataStream in BATCH execution mode scalability advice

2021-05-19 Thread Marco Villalobos
Questions Flink DataStream in BATCH execution mode scalability advice. Here is the problem that I am trying to solve. Input is an S3 bucket directory with about 500 GB of data across many files. The instance that I am running on only has 50GB of EBS storage. The nature of this data is time

Re: two questions about flink stream processing: kafka sources and TimerService

2021-05-18 Thread Ingo Bürk
Hi Jin, 1) As far as I know the order is only guaranteed for events from the same partition. If you want events across partitions to remain in order you may need to use parallelism 1. I'll attach some links here which might be useful: https://stackoverflow.com/questions/50340107/order-of-events

Re: two questions about flink stream processing: kafka sources and TimerService

2021-05-17 Thread Jin Yi
ping? On Tue, May 11, 2021 at 11:31 PM Jin Yi wrote: > hello. thanks ahead of time for anyone who answers. > > 1. verifying my understanding: for a kafka source that's partitioned on > the same piece of data that is later used in a keyBy, if we are relying on > the kafka timestamp as the

two questions about flink stream processing: kafka sources and TimerService

2021-05-12 Thread Jin Yi
hello. thanks ahead of time for anyone who answers. 1. verifying my understanding: for a kafka source that's partitioned on the same piece of data that is later used in a keyBy, if we are relying on the kafka timestamp as the event timestamp, is it guaranteed that the event stream of the source

Re: Questions about implementing a flink source

2021-05-10 Thread Arvid Heise
Hi Evan, A few replies / questions inline. Somewhat relatedly, I'm also wondering > where this connector should live. I saw that there's already a pubsub > connector in > https://github.com/apache/flink/tree/bf40fa5b8cd5b3a8876f37ab4aa7bf06e8b9f666/flink-connectors/flink-connector-gcp-p

Re: some questions about data skew

2021-05-10 Thread Dawid Wysakowicz
Hi, What you could do to improve processing of a skewed data is to introduce an artificial preaggregation. You could add some artificial uniformly distributed secondary key and calculate your aggregates on (original key, secondary uniform key) and then do the final aggregation in an additional

Re: Questions about implementing a flink source

2021-05-07 Thread Evan Palmer
Hi Arvid, thank you so much for the detailed reply! A few replies / questions inline. Somewhat relatedly, I'm also wondering where this connector should live. I saw that there's already a pubsub connector in https://github.com/apache/flink/tree/bf40fa5b8cd5b3a8876f37ab4aa7bf06e8b9f666/flink

  1   2   3   4   >