Re: SIGSEGV error

2019-09-13 Thread Till Rohrmann
Hi Marek, could you share the logs statements which happened before the SIGSEGV with us? They might be helpful to understand what happened before. Moreover, it would be helpful to get access to your custom serializer implementations. I'm also pulling in Gordon who worked on the

Re: Problem starting taskexecutor daemons in 3 node cluster

2019-09-13 Thread Till Rohrmann
Hi Komal, could you check that every node can reach the other nodes? It looks a little bit as if the TaskManager cannot talk to the JobManager running on 150.82.218.218:6123. Cheers, Till On Thu, Sep 12, 2019 at 9:30 AM Komal Mariam wrote: > I managed to fix it however ran into another

Re: Flink web ui authentication using nginx

2019-09-13 Thread Till Rohrmann
Hi Harshith, I'm not an expert of how to setup nginx with authentication for Flink but I could shed some light on the redirection problem. I assume that Flink's redirection response might not be properly understood by nginx. The good news is that with Flink 1.8, we no longer rely on client side

[ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread Till Rohrmann
Hi everyone, I'm very happy to announce that Zili Chen (some of you might also know him as Tison Kun) accepted the offer of the Flink PMC to become a committer of the Flink project. Zili Chen has been an active community member for almost 16 months now. He helped pushing the Flip-6 effort over

Re: suggestion of FLINK-10868

2019-09-09 Thread Till Rohrmann
Hi Anyang, I think we cannot take your proposal because this means that whenever we want to call notifyAllocationFailure when there is a connection problem between the RM and the JM, then we fail the whole cluster. This is something a robust and resilient system should not do because connection

Re: suggestion of FLINK-10868

2019-09-06 Thread Till Rohrmann
Hi Anyang, thanks for your suggestions. 1) I guess one needs to make this interval configurable. A session cluster could theoretically execute batch as well as streaming tasks and, hence, I doubt that there is an optimal value. Maybe the default could be a bit longer than 1 min, though. 2)

Re: [SURVEY] Is the default restart delay of 0s causing problems?

2019-09-03 Thread Till Rohrmann
The FLIP-62 discuss thread can be found here [1]. [1] https://lists.apache.org/thread.html/9602b342602a0181fcb618581f3b12e692ed2fad98c59fd6c1caeabd@%3Cdev.flink.apache.org%3E Cheers, Till On Tue, Sep 3, 2019 at 11:13 AM Till Rohrmann wrote: > Thanks everyone for the input again. I

Re: [SURVEY] Is the default restart delay of 0s causing problems?

2019-09-03 Thread Till Rohrmann
>> worth to be documented. >> >> Thanks, >> Zhu Zhu >> >> Steven Wu 于2019年9月3日周二 上午4:42写道: >> >>> 1s sounds a good tradeoff to me. >>> >>> On Mon, Sep 2, 2019 at 1:30 PM Till Rohrmann >>> wrote: >>> >>&g

Re: [SURVEY] Is the default restart delay of 0s causing problems?

2019-09-02 Thread Till Rohrmann
d tasks. >>> As a safer though not optimized option, a default delay larger than 0 s >>> is better in my opinion. >>> >>> >>> 未来阳光 <2217232...@qq.com> 于2019年8月30日周五 下午10:23写道: >>> >>>> Hi, >>>> >>>> >>>

[SURVEY] Is the default restart delay of 0s causing problems?

2019-08-30 Thread Till Rohrmann
Hi everyone, I wanted to reach out to you and ask whether decreasing the default delay to `0 s` for the fixed delay restart strategy [1] is causing trouble. A user reported that he would like to increase the default value because it can cause restart storms in case of systematic faults [2]. The

Re: Using Avro SpecficRecord serialization instead of slower ReflectDatumWriter/GenericDatumWriter

2019-08-30 Thread Till Rohrmann
Hi Roshan, these kind of questions should be posted to Flink's user mailing list. I've cross posted it now. If you are using Flink's latest version and your type extends `SpecificRecord`, then Flink's AvroSerializer should use the `SpecificDatumWriter`. If this is not the case, then this sounds

Re: How to handle Flink Job with 400MB+ Uberjar with 800+ containers ?

2019-08-30 Thread Till Rohrmann
For point 2. there exists already a JIRA issue [1] and a PR. I hope that we can merge it during this release cycle. [1] https://issues.apache.org/jira/browse/FLINK-13184 Cheers, Till On Fri, Aug 30, 2019 at 4:06 AM SHI Xiaogang wrote: > Hi Datashov, > > We faced similar problems in our

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-26 Thread Till Rohrmann
The missing support for the Scala shell with Scala 2.12 was documented in the 1.7 release notes [1]. @Oytun, the docker image should be updated in a bit. Sorry for the inconveniences. Thanks for the pointer that flink-queryable-state-client-java_2.11 hasn't been published. We'll upload this in a

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-23 Thread Till Rohrmann
Hi Gavin, if I'm not mistaken, then the community excluded the Scala FlinkShell since a couple of versions for Scala 2.12. The problem seems to be that some of the tests failed. See here [1] for more information. [1] https://issues.apache.org/jira/browse/FLINK-10911 Cheers, Till On Fri, Aug

[ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-14 Thread Till Rohrmann
Hi everyone, I'm very happy to announce that Andrey Zagrebin accepted the offer of the Flink PMC to become a committer of the Flink project. Andrey has been an active community member for more than 15 months. He has helped shaping numerous features such as State TTL, FRocksDB release, Shuffle

Re: Testing with ProcessingTime and FromElementsFunction with TestEnvironment

2019-08-13 Thread Till Rohrmann
of a method, how to test or design Source Function > prevent watermark advancing, so that the timers are not fired (which > is desired behavior - to be guarded by unit tests). > Thanks. > Michal > > On Thu, May 9, 2019 at 9:02 AM Till Rohrmann wrote: > > > > Hi Steve,

Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-07 Thread Till Rohrmann
Congrats Hequn and welcome onboard as a committer :-) Cheers, Till On Wed, Aug 7, 2019 at 11:30 AM Becket Qin wrote: > Congrats, Hequn! Well deserved! > > On Wed, Aug 7, 2019 at 11:16 AM Zili Chen wrote: > >> Congrats Hequn! >> >> Best, >> tison. >> >> >> Jeff Zhang 于2019年8月7日周三 下午5:14写道: >>

Re: Why is the size of each checkpoint increasing?

2019-07-29 Thread Till Rohrmann
I think the size of the checkpoint strongly depends on the data you are feeding into this function. Depending on the actual values, it might be that you never fire the window. Please verify what carData actually returns. Cheers, Till On Mon, Jul 29, 2019 at 11:09 AM 陈Darling wrote: > > Flink

Re: Flink on Mesos

2019-07-23 Thread Till Rohrmann
ve a look on our PR: > *https://github.com/apache/flink/pull/8652 > <https://github.com/apache/flink/pull/8652>* > > There are for sure some details to clarify. > > > > Cheers > > Oleksandr > > > > *From: *Till Rohrmann > *Date: *Friday 5 Apri

Re: Flink Zookeeper HA: FileNotFoundException blob - Jobmanager not starting up

2019-07-23 Thread Till Rohrmann
Hi Richard, it looks as if the zNode of a completed job has not been properly removed. Without the logs of the respective JobMaster, it is hard to debug any further. However, I suspect that this is an instance of FLINK-11665. I am currently working on a fix for it. [1]

Re: Fw:Re:Re: About "Flink 1.7.0 HA based on zookeepers "

2019-07-02 Thread Till Rohrmann
Hi, how did you start the job masters? Could you maybe share the logs of all components? It looks as if the leader election is not working properly. One thing to make sure is that you specify for every new HA cluster a different cluster ID via `high-availability.cluster-id: cluster_xy`. That way

Re: Batch mode with Flink 1.8 unstable?

2019-07-02 Thread Till Rohrmann
Thanks for the update Ken. The input splits seem to be org.apache.hadoop.mapred.FileSplit. Nothing too fancy pops into my eye. Internally they use org.apache.hadoop.mapreduce.lib.input.FileSplit which stores a Path, two long pointers and two string arrays with hosts and host infos. I would assume

Re: Maybe a flink bug. Job keeps in FAILING state

2019-07-02 Thread Till Rohrmann
> From:Joshua Fan > Send Time:2019年6月25日(星期二) 11:10 > To:zhijiang > Cc:Chesnay Schepler ; user ; > Till Rohrmann > Subject:Re: Maybe a flink bug. Job keeps in FAILING state > > Hi Zhijiang > > Thank you for your analysis. I agree with it. The solution may be to let

Re: [FLINK-10868] job cannot be exited immediately if job manager is timed out for some reason

2019-07-01 Thread Till Rohrmann
Hi Anyang, as far as I can tell, FLINK-10868 has not been merged into Flink yet. Thus, I cannot tell much about how well it works. The case you are describing should be properly handled in a version which get's merged though. I guess what needs to happen is that once the JM reconnects to the RM

Re: [FLINK-10868] the job cannot be exited immediately if job manager is timed out

2019-07-01 Thread Till Rohrmann
Hi Young, as far as I can tell, FLINK-10868 has not been merged into Flink yet. Thus, I cannot tell much about how well it works. The case you are describing should be properly handled in a version which get's merged though. I guess what needs to happen is that once the JM reconnects to the RM it

Re: Batch mode with Flink 1.8 unstable?

2019-07-01 Thread Till Rohrmann
why you don't see this exception. It could be helpful to better understand the input splits your program is generating. Is it simply a `FileInputSplit` or did you implement a custom InputSplitAssigner with custom InputSplits? Cheers, Till On Mon, Jul 1, 2019 at 11:57 AM Till Rohrmann wrote: >

Re: Batch mode with Flink 1.8 unstable?

2019-07-01 Thread Till Rohrmann
Hi Ken, in order to further debug your problems it would be helpful if you could share the log files on DEBUG level with us. For problem (2), I suspect that it has been caused by Flink releasing TMs too early. This should be fixed with FLINK-10941 which is part of Flink 1.8.1. The 1.8.1 release

Re: dynamic metric

2019-06-21 Thread Till Rohrmann
t; > Le ven. 21 juin 2019 à 01:27, Till Rohrmann a > écrit : > >> Hi David, >> >> I think it is not strictly required that you register the metric in the >> open method. It is just convenient because otherwise you have to make sure >> that you regist

Re: dynamic metric

2019-06-20 Thread Till Rohrmann
Hi David, I think it is not strictly required that you register the metric in the open method. It is just convenient because otherwise you have to make sure that you register the metric only once (e.g. when doing it in the map function). What you need in order to register a metric is the runtime

Re: How to restart/recover on reboot?

2019-06-18 Thread Till Rohrmann
e I said I don't want to have some process in place to remind admins > they need to manually start a node every time they patch or a host goes > down for what ever reason. > > On Tue, 18 Jun 2019 at 04:31, Till Rohrmann wrote: > >> When a single machine fails you should rathe

Re: Need for user class path accessibility on all nodes

2019-06-18 Thread Till Rohrmann
Hi Abdul, as Biao said the `--classpath` option should only be used if you want to make dependencies available which are not included in the submitted user code jar. E.g. if you have installed a large library which is too costly to ship every time you submit a job. Usually, you would not need to

Re: How to restart/recover on reboot?

2019-06-18 Thread Till Rohrmann
mething missing... > > > > On Mon, 17 Jun 2019 at 09:41, Till Rohrmann wrote: > >> Hi John, >> >> I have not much experience wrt setting Flink up via systemd services. Why >> do you want to do it like that? >> >> 1. In standalone mode, Flink won'

Re: How to restart/recover on reboot?

2019-06-17 Thread Till Rohrmann
Hi John, I have not much experience wrt setting Flink up via systemd services. Why do you want to do it like that? 1. In standalone mode, Flink won't automatically restart TaskManagers. This only works on Yarn and Mesos atm. 2. In case of a lost TaskManager, you should run `taskmanager.sh

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-12 Thread Till Rohrmann
+1 for deprecation. Cheers, Till On Wed, Jun 12, 2019 at 4:31 AM Hequn Cheng wrote: > +1 on the proposal! > Maintaining only one Python API is helpful for users and contributors. > > Best, Hequn > > On Wed, Jun 12, 2019 at 9:41 AM Jark Wu wrote: > >> +1 and looking forward to the new Python

Re: What does flink session mean ?

2019-06-04 Thread Till Rohrmann
more general, I think. Cheers, Till On Tue, Jun 4, 2019 at 11:13 AM Jeff Zhang wrote: > Thanks for the reply, @Till Rohrmann . Regarding > reuse computed results. I think JM keep all the metadata of intermediate > data, and interactive programming is also trying to reuse compute

Re: What does flink session mean ?

2019-06-04 Thread Till Rohrmann
Hi Jeff, the session functionality which you find in Flink's client are the remnants of an uncompleted feature which was abandoned. The idea was that one could submit multiple parts of a job to the same cluster where these parts are added to the same ExecutionGraph. That way we wanted to allow to

Re: Distributed cache fault

2019-05-28 Thread Till Rohrmann
Hi Vasyl, please post these kind of question to Flink's user ML since the dev ML is used for development discussions. For the failure on Windows could you share the complete stack trace to see where exactly it fails? It looks as if on Windows the scheme part of the URI makes problems. Looking

Re: [SURVEY] Usage of flink-ml and [DISCUSS] Delete flink-ml

2019-05-28 Thread Till Rohrmann
+1 for removing it. I think it is effectively dead by now. Cheers, Till On Mon, May 27, 2019 at 4:00 PM Hequn Cheng wrote: > Hi Shaoxuan, > > Thanks a lot for driving this. +1 to remove the module. > > The git log of this module shows that it has been inactive for a long > time. I think it's

Re: FW: Constant backpressure on flink job

2019-05-20 Thread Till Rohrmann
Hi Monika and Georgi, it is quite hard to debug this problem remotely because one would need access to the logs with DEBUG log level. Additionally, it would be great to have a better understanding of what the individual operators do. In general, 20 MB of state should be super easy to handle for

Re: Testing with ProcessingTime and FromElementsFunction with TestEnvironment

2019-05-09 Thread Till Rohrmann
-Steve > > Sent from my iPhone > > On May 8, 2019, at 9:33 AM, Till Rohrmann wrote: > > Hi Steve, > > I think the reason for the different behaviour is due to the way event > time and processing time are implemented. > > When you are using event time, watermarks n

Re: Flink on YARN: TaskManager heap auto-sizing?

2019-05-08 Thread Till Rohrmann
Hi Dylan, the container's memory will be calculated here [1]. In the case of Yarn, the user specifies the container memory size and based on this Flink calculates with how much heap memory the JVM is started (container memory size - off heap memory - cut off memory). [1]

Re: flink 1.7 HA production setup going down completely

2019-05-08 Thread Till Rohrmann
> > > > On Wed, May 8, 2019 at 7:56 PM Till Rohrmann wrote: > >> Hi Manju, >> >> could you share the full logs or at least the full stack trace of the >> exception with us? >> >> I suspect that after a failover Flink tries to restore the Jo

Re: Class loader premature closure - NoClassDefFoundError: org/apache/kafka/clients/NetworkClient

2019-05-08 Thread Till Rohrmann
Thanks for reporting this issue Chris. It looks indeed as if FLINK-10455 has not been fully fixed. I've reopened it and linked this mailing list thread. If you want, then you could write to the JIRA thread as well. What would be super helpful is if you manage to create a reproducing example for

Re: Migration from flink 1.7.2 to 1.8.0

2019-05-08 Thread Till Rohrmann
Hi Farouk, from the stack trace alone I cannot say much. Would it be possible to share a minimal example which reproduces the problem? My suspicion is that OperatorChain.java:294 produces a null value. Differently, said that somehow there is no StreamConfig registered for the given outputId.

Re: I want to use MapState on an unkeyed stream

2019-05-08 Thread Till Rohrmann
Hi, if you want to increase the parallelism you could also pick a key randomly from a set of keys. The price you would pay is a shuffle operation (network I/O) which would not be needed if you were using the unkeyed stream and used the operator list state. However, with keyed state you could

Re: Getting async function call terminated with an exception

2019-05-08 Thread Till Rohrmann
Hi Avi, you need to complete the given resultFuture and not return a future. You can do this via resultFuture.complete(r). Cheers, Till On Tue, May 7, 2019 at 8:30 PM Avi Levi wrote: > Hi, > We are using flink 1.8.0 (but the flowing also happens in 1.7.2) I tried > very simple unordered async

Re: Testing with ProcessingTime and FromElementsFunction with TestEnvironment

2019-05-08 Thread Till Rohrmann
Hi Steve, I think the reason for the different behaviour is due to the way event time and processing time are implemented. When you are using event time, watermarks need to travel through the topology denoting the current event time. When you source terminates, the system will send a watermark

Re: flink 1.7 HA production setup going down completely

2019-05-08 Thread Till Rohrmann
Hi Manju, could you share the full logs or at least the full stack trace of the exception with us? I suspect that after a failover Flink tries to restore the JobGraph from persistent storage (the directory which you have configured via `high-availability.storageDir`) but is not able to do so.

Re: taskmanager.tmp.dirs vs io.tmp.dirs

2019-05-08 Thread Till Rohrmann
Hi Flavio, taskmanager.tmp.dirs is the deprecated configuration key which has been superseded by the io.tmp.dirs configuration option. In the future, you should use io.tmp.dirs. Cheers, Till On Wed, May 8, 2019 at 3:32 PM Flavio Pompermaier wrote: > Hi to all, > looking at >

Re: HA lock nodes, Checkpoints, and JobGraphs after failure

2019-05-02 Thread Till Rohrmann
Thanks for the update Dyana. I'm also not an expert in running one's own ZooKeeper cluster. It might be related to setting the ZooKeeper cluster properly up. Maybe someone else from the community has experience with this. Therefore, I'm cross posting this thread to the user ML again to have a

Re: Watermark for each key?

2019-04-25 Thread Till Rohrmann
nly keeping state for the late > data. > > Make sense? And would it work? > > Med venlig hilsen / Best regards > Lasse Nedergaard > > > Den 24. apr. 2019 kl. 19.00 skrev Till Rohrmann : > > Hi Lasse, > > at the moment this is not supported out of the box

Re: Watermark for each key?

2019-04-24 Thread Till Rohrmann
Hi Lasse, at the moment this is not supported out of the box by Flink. The community thought about this feature but so far did not implement it. Unfortunately, I'm also not aware of an easy workaround one could do in the user code space. Cheers, Till On Wed, Apr 24, 2019 at 3:26 PM Lasse

Re: No zero ( 2 ) exit code on k8s StandaloneJobClusterEntryPoint when save point with cancel...

2019-04-24 Thread Till Rohrmann
> wrote: > >> This makes total sense and actually is smart ( defensive ). Will test and >> report. I think though that this needs to be documented :) >> >> On Wed, Apr 24, 2019 at 6:03 AM Till Rohrmann >> wrote: >> >>> Hi Vishal, >>> &

Re: [EXTERNAL] Re: Looking for help in configuring Swift as State Backend

2019-04-24 Thread Till Rohrmann
> org.apache.flink.runtime.state.StateBackendLoader.loadStateBackendFromConfig(StateBackendLoader.java:121) > >at > org.apache.flink.runtime.state.StateBackendLoader.fromApplicationOrConfigOrDefault(StateBackendLoader.java:222) > >at > org.apache.flink.runtime.execu

Re: Flink Control Stream

2019-04-24 Thread Till Rohrmann
Hi Dominik, I think it is not possible to use Flink's AsyncFunction together with a ConnectedStream or when you use BroadcastState. Therefore, it would be necessary that you inject the control messages into your normal stream and then filter them out in the AsyncFunction#asyncInvoke call.

Re: HA lock nodes, Checkpoints, and JobGraphs not being removed after failure

2019-04-24 Thread Till Rohrmann
Cross linking the dev ML thread [1]. Let us continue the discussion there. [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/HA-lock-nodes-Checkpoints-and-JobGraphs-after-failure-td28432.html Cheers, Till On Tue, Apr 23, 2019 at 9:52 AM dyana.rose wrote: > originally posted

Re: May be useful: our reference document for "Understanding State in Flink"

2019-04-24 Thread Till Rohrmann
Thanks for sharing this resource with the community Oytun. It looks really helpful. I'm pulling in David and Fabian who work a lot on documentation. Maybe it's interesting for them to take a look at. The community had once the idea to set up a cook book with common Flink recipes but we never

Re: Fast restart of a job with a large state

2019-04-24 Thread Till Rohrmann
Hi Sergey, at the moment neither local nor incremental savepoints are supported in Flink afaik. There were some ideas wrt incremental savepoints floating around in the community but nothing concrete yet. Cheers, Till On Tue, Apr 23, 2019 at 6:58 PM Sergey Zhemzhitsky wrote: > Hi Stefan, Paul,

Re: Looking for help in configuring Swift as State Backend

2019-04-24 Thread Till Rohrmann
Hi Shakir, have you checked out Flink's documentation for Filesystems [1]? What is the problem you are observing? [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems.html Cheers, Till On Tue, Apr 23, 2019 at 9:30 PM PoolakkalMukkath, Shakir <

Re: No zero ( 2 ) exit code on k8s StandaloneJobClusterEntryPoint when save point with cancel...

2019-04-24 Thread Till Rohrmann
Hi Vishal, it seems that the following is happening: You triggered the cancel with savepoint command from via the REST call. This command is an asynchronous operation which produces a result (the savepoint path). In order to deliver asynchronous results to the caller, Flink waits before shutting

Re: Error restoring from checkpoint on Flink 1.8

2019-04-24 Thread Till Rohrmann
For future reference here is a cross link to the referred ML thread discussion [1]. [1] http://mail-archives.apache.org/mod_mbox/flink-user/201904.mbox/%3cm2ef5tpfwy.wl-nings...@gmail.com%3E Cheers, Till On Wed, Apr 24, 2019 at 4:00 AM Ning Shi wrote: > Hi Congxian, > > I think I have figured

Re: Missing state in RocksDB checkpoints

2019-04-24 Thread Till Rohrmann
Thanks for reporting this issue Ning. I think this is actually a blocker for the next release and should be fixed right away. For future reference here is the issue [1]. I've also pulled in Stefan who knows these components very well. [1] https://issues.apache.org/jira/browse/FLINK-12296

Re: [DISCUSS] Temporarily remove support for job rescaling via CLI action "modify"

2019-04-24 Thread Till Rohrmann
+1 for temporarily removing support for the modify command. Eventually, we have to add it again in order to support auto scaling. The next time we add it, we should address the known limitations. Cheers, Till On Wed, Apr 24, 2019 at 9:06 AM Paul Lam wrote: > Hi Gary, > > + 1 to remove it for

Re: [Discuss] Add JobListener (hook) in flink job lifecycle

2019-04-23 Thread Till Rohrmann
ink client component, > but I don't think it would affect the JobListener interface. What do you > think ? > > > > > Till Rohrmann 于2019年4月18日周四 下午8:57写道: > >> Thanks for starting this discussion Jeff. I can see the need for >> additional hooks for third part

Re: [Discuss] Add JobListener (hook) in flink job lifecycle

2019-04-18 Thread Till Rohrmann
Thanks for starting this discussion Jeff. I can see the need for additional hooks for third party integrations. The thing I'm wondering is whether we really need/want to expose a JobListener via the ExecutionEnvironment. The ExecutionEnvironment is usually used by the user who writes the code and

Re: Service discovery on YARN - find out which port was dynamically assigned to the JobManager Web Interface

2019-04-18 Thread Till Rohrmann
Hi Olivier, since version 1.8 you can set the rest bind port via `rest.bind-port` to a single port or a range. This will now be respected by Yarn deployments. With the next bug fix release of 1.7 you can do the same with `rest.port` but this option only accepts a single port (might lead to port

Re: Scalaj vs akka as http client for Asyncio Flink

2019-04-17 Thread Till Rohrmann
nt, and the method ended up timeout. > Honestly I dont know how to debug with this case. > > I’m just curious how people ended up using any async http client without > hassle? > > Thanks, > > Andy, > > On Apr 12, 2019, at 10:23 PM, Till Rohrmann wrote: > > Hi Andy, &g

Re: Scalaj vs akka as http client for Asyncio Flink

2019-04-12 Thread Till Rohrmann
ry it in > production first? > Thank a lot, again > > Andy, > > > On Apr 12, 2019, at 2:44 PM, Till Rohrmann wrote: > > Hi Andy, > > there is also a Scala version of the `RichAsyncFunction`. > > In Scala you have to specify a value for class members.

Re: Scalaj vs akka as http client for Asyncio Flink

2019-04-12 Thread Till Rohrmann
; ResultFuture[Either[Throwable, ResponseEntity]]): Unit = { > resultFuture.complete(Iterable(Left(new TimeoutException("Async function > call has timed out.")))) > } > } > > ``` > And its run ok. The log was print only one. > > I still asking about this

Re: Scalaj vs akka as http client for Asyncio Flink

2019-04-11 Thread Till Rohrmann
Hi Andy, without being an expert of Akka's http client, I think you should not create a new ActorSystem for every call to `AsyncFunction#asyncInvoke`. What I would recommend you instead is to implement a `RichAsyncFunction` with a transient field for `ActorMaterializer` which you initialize in

Re: What is the best way to handle data skew processing in Data Stream applications?

2019-04-11 Thread Till Rohrmann
Just a small addition: If two hot keys fall into two key groups which are being processed by the same TM, then it could help to change the parallelism, because then the key group mapping might be different. If two hot keys fall into the same key group, you can adjust the max parallelism which

Re: Query on job restoration using relocated savepoint

2019-04-11 Thread Till Rohrmann
Hi Parth, I've pulled Stefan into the conversation who might be able to help you with your problem. Cheers, Till On Wed, Apr 10, 2019 at 7:17 PM Parth Sarathy wrote: > Hi All, >We are trying to restore a job using relocated savepoint > files. As pointed out in the FAQs of

Re: [ANNOUNCE] Apache Flink 1.8.0 released

2019-04-10 Thread Till Rohrmann
Thanks a lot to Aljoscha for being our release manager and to the community making this release possible! Cheers, Till On Wed, Apr 10, 2019 at 12:09 PM Hequn Cheng wrote: > Thanks a lot for the great release Aljoscha! > Also thanks for the work by the whole community. :-) > > Best, Hequn > >

Re: Flink 1.7.2 extremely unstable and losing jobs in prod

2019-04-08 Thread Till Rohrmann
is now gated for [50] ms. Reason: [Association failed with > [akka.tcp://fl...@ip-10-10-56-193.eu-west-1.compute.internal:43041]] > Caused by: [Connection refused: ip-10-10-56-193.eu-west-1.compute.internal/ > 10.10.56.193:43041] > > over and over again... > > Thanks for

Re: Flink 1.7.2 extremely unstable and losing jobs in prod

2019-04-08 Thread Till Rohrmann
a couple of things. Not sure this covers everything you would > like to see. > > Thanks! > > Bruno > > On Thu, 21 Mar 2019 at 15:24, Till Rohrmann wrote: > >> Hi Bruno, >> >> could you upload the logs to https://transfer.sh/ or >> https://gist.github.com/ a

Re: Flink on Mesos

2019-04-05 Thread Till Rohrmann
Hi Juan, thanks for reporting this issue. If you could open an issue and also provide a fix for it, then this would be awesome. Please let me know the ticket number so that I can monitor it and give your PR a review. Cheers, Till On Fri, Apr 5, 2019 at 5:34 AM Juan Gentile wrote: > Hello! > >

Re: long lived standalone job session cluster in kubernetes

2019-04-02 Thread Till Rohrmann
t > us work together to get kubernetes integrated natively with flink. Thanks. > > On Fri, Feb 15, 2019 at 12:19 AM Till Rohrmann > wrote: > >> Alright, I'll get back to you once the PRs are open. Thanks a lot for >> your help :-) >> >> Cheers, >> Till &g

Re: How to run a job with job cluster mode on top of mesos?

2019-04-02 Thread Till Rohrmann
By the way, did the Mesos job mode work for you in the end? On Tue, Apr 2, 2019 at 7:47 AM Till Rohrmann wrote: > Sure I will help with the review. Thanks for opening the PR Jacky. > > Cheers, > Till > > On Tue, Apr 2, 2019 at 2:17 AM Jacky Yin 殷传旺 wrote: > >>

Re: How to run a job with job cluster mode on top of mesos?

2019-04-02 Thread Till Rohrmann
Sure I will help with the review. Thanks for opening the PR Jacky. Cheers, Till On Tue, Apr 2, 2019 at 2:17 AM Jacky Yin 殷传旺 wrote: > Hello Till, > > > > I submitted a PR(#8084) for this issue. Could you help review it? > > > > Many thanks! > > > > *Jacky

Re: How to run a job with job cluster mode on top of mesos?

2019-03-29 Thread Till Rohrmann
ady assigned it to me).  > > > > > > *Jacky Yin * > > *发件人**: *Till Rohrmann > *日期**: *2019年3月26日 星期二 下午6:31 > *收件人**: *Jacky Yin 殷传旺 > *抄送**: *"user@flink.apache.org" > *主题**: *Re: How to run a job with job cluster mode on top of mesos? > > >

Re: How to run a job with job cluster mode on top of mesos?

2019-03-26 Thread Till Rohrmann
Hi Jacky, you're right that we are currently lacking documentation for the `mesos-appmaster-job.sh` script. I've added a JIRA issue to cover this [1]. In order to use this script you first need to store a serialized version of the `JobGraph` you want to run somewhere where the Mesos appmaster

Re: Async Function Not Generating Backpressure

2019-03-25 Thread Till Rohrmann
I think Seed is correct that we don't properly report backpressure from an AsyncWaitOperator. The problem is that not the Task's main execution thread but the Emitter thread will emit the elements and, thus, be stuck in the `requestBufferBuilderBlocking` method. This, however, does not mean that

Re: Flink 1.7.2 extremely unstable and losing jobs in prod

2019-03-21 Thread Till Rohrmann
Hi Bruno, could you upload the logs to https://transfer.sh/ or https://gist.github.com/ and then post a link. For further debugging this will be crucial. It would be really good if you could set the log level to DEBUG. Concerning the number of registered TMs, the new mode (not the legacy mode),

Re: Backoff strategies for async IO functions?

2019-03-19 Thread Till Rohrmann
Sorry for joining the discussion so late. I agree that we could add some more syntactic sugar for handling failure cases. Looking at the existing interfaces, I think it should be fairly easy to create an abstract class AsyncFunctionWithRetry which extends AsyncFunction and encapsulates the retry

Re: Task slot sharing: force reallocation

2019-03-19 Thread Till Rohrmann
arate streams from different source. > Does Flink also support slot splitting among subtasks inside one stream as > well? > > Thanks, > > Le > > On Thu, Mar 7, 2019 at 3:37 AM Till Rohrmann wrote: > >> The community no longer actively supports Flink < 1.6. T

Re: Task slot sharing: force reallocation

2019-03-07 Thread Till Rohrmann
lly be beneficial because it will do what you are looking for. Cheers, Till On Wed, Mar 6, 2019 at 8:13 PM Le Xu wrote: > 1.3.2 -- should I update to the latest version? > > Thanks, > > Le > > On Wed, Mar 6, 2019 at 4:24 AM Till Rohrmann wrote: > >> Which version of Flink a

Re: [1.7.1] job stuck in suspended state

2019-03-06 Thread Till Rohrmann
://issues.apache.org/jira/browse/FLINK-11843 Cheers, Till On Wed, Mar 6, 2019 at 1:21 PM Till Rohrmann wrote: > Hi Steven, > > a quick update from my side after looking through the logs. The problem > seems to be that the Dispatcher does not start recovering the jobs after > regaining the l

Re: [1.7.1] job stuck in suspended state

2019-03-06 Thread Till Rohrmann
send you the complete log offline. We don't know how to reliably > reproduce the problem. but it did happen quite frequently, like once every > a couple of days. Let me see if I can cherry pick the fix/commit to 1.7 > branch. > > Thanks, > Steven > > > On Mon, Mar 4,

Re: EXT :Re: Flink 1.7.1 Inaccessible

2019-03-06 Thread Till Rohrmann
r/201709.mbox/< > 533686a2-71ee-4356-8961-68cf3f858...@expedia.com> > ) > > > On Mon, Mar 4, 2019, 12:38 PM Martin, Nick wrote: > >> Seye, are you running Flink and Zookeeper in Docker? I’ve had problems >> with Jobmanagers not resolving the hostnames for Zookeep

Re: Task slot sharing: force reallocation

2019-03-06 Thread Till Rohrmann
job mode as Piotr > suggests. > > Thanks, > > Le > > On Tue, Mar 5, 2019 at 9:24 AM Till Rohrmann wrote: > >> Hard to tell whether this is related to FLINK-11815. >> >> To me the setup is not fully clear. Let me try to sum it up: According to >> Le Xu's des

Re: Task slot sharing: force reallocation

2019-03-05 Thread Till Rohrmann
Hard to tell whether this is related to FLINK-11815. To me the setup is not fully clear. Let me try to sum it up: According to Le Xu's description there are n jobs running on a session cluster. I assume that every TaskManager has n slots. The observed behaviour is that every job allocates the

Re: Flink 1.7.1 Inaccessible

2019-03-04 Thread Till Rohrmann
Hi Seye, usually, Flink's web UI should be accessible after a successful leader election. Could you share with us the cluster logs to see what's going on? Without this information it is hard to tell what's going wrong. What you could also do is to check the ZooKeeper znode which represents the

Re: [1.7.1] job stuck in suspended state

2019-03-04 Thread Till Rohrmann
Hi Steven, is this the tail of the logs or are there other statements following? I think your problem could indeed be related to FLINK-11537. Is it possible to somehow reliably reproduce this problem? If yes, then you could try out the RC for Flink 1.8.0 which should be published in the next

Re: Re: Re: What are blobstore files and why do they keep filling up /tmp directory?

2019-02-28 Thread Till Rohrmann
run on the TaskManager? > Would it be safe to assume that it is the latest directory created? > > > > Regards, > > Harshith > > > > *From: *Till Rohrmann > *Date: *Thursday, 28 February 2019 at 3:28 PM > *To: *Harshith Kumar Bolar > *Cc: *user > *Subject: *[Exter

Re: Re: What are blobstore files and why do they keep filling up /tmp directory?

2019-02-28 Thread Till Rohrmann
cur when a task manager crashes and restarts – A new > blob-store directory gets created and the old one remains as is, and this > piles up over time. Should these **old** blob-stores be manually cleared > every time a task manager crashes and restarts? > > > > Regards, > &

Re: StreamingFileSink on EMR

2019-02-26 Thread Till Rohrmann
Hmm good question, I've pulled in Kostas who worked on the StreamingFileSink. He might be able to tell you more in case that there is some special behaviour wrt the Hadoop file systems. Cheers, Till On Tue, Feb 26, 2019 at 3:29 PM kb wrote: > Hi Till, > > The only potential issue in the path I

Re: Share broadcast state between multiple operators

2019-02-26 Thread Till Rohrmann
gt; Richard > > On Tue, Feb 26, 2019 at 11:57 AM Till Rohrmann > wrote: > >> Hi Richard, >> >> Flink does not support to share state between multiple operators. >> Technically also the broadcast state is not shared but replicated between >> subtasks

Re: Collapsing watermarks after keyby

2019-02-26 Thread Till Rohrmann
As its input streams update their event times, so does the operator. >> >> >> This implies to me that the keyBy splits the watermark? >> >> On Tue, 26 Feb 2019 at 6:40 PM, Till Rohrmann >> wrote: >> >>> Hi Padarn, >>> >>> Fl

Re: Share broadcast state between multiple operators

2019-02-26 Thread Till Rohrmann
Hi Richard, Flink does not support to share state between multiple operators. Technically also the broadcast state is not shared but replicated between subtasks belonging to the same operator. So what you can do is to send the broadcast input to different operators, but they will all keep their

Re: Flink 1.6.4 signing key file in docker-flink repo?

2019-02-26 Thread Till Rohrmann
AD0D034911D5A: public key "Fabian Hueske (CODE SIGNING KEY) < > fhue...@apache.org>" imported > gpg: key 6D072D73B065B356: public key "Tzu-Li Tai (CODE SIGNING KEY) < > tzuli...@apache.org>" imported > gpg: key A8F4FD97121D7293: public key "Aljosch

<    1   2   3   4   5   6   7   8   9   10   >