Re: Using Google Cloud Storage for checkpointing

2018-06-30 Thread Till Rohrmann
ar from the gcs-connector source code, which > worked locally and on K8s. > > Thanks for all the help. > > Rohil > > On Fri, Jun 29, 2018 at 7:02 PM Till Rohrmann > wrote: > >> Hi Rohil, >> >> this sounds a little bit strange. If the GoogleHadoopFileSystem ja

Re: Execute multiple jobs in parallel (threading): java.io.OptionalDataException

2017-10-27 Thread Till Rohrmann
Hi David, I cannot exactly tell how you ended up seeing an OptionalDataException without seeing your code. Flink supports to run multiple jobs on the same cluster. That’s what we call the session mode. You should not reuse the ExecutionEnvironment because then, you will create a single job

Re: Capacity Planning For Large State in YARN Cluster

2017-10-27 Thread Till Rohrmann
Hi Ashish, what you are describing should be a good use case for Flink and it should be able to run your program. When you are seeing a GC overhead limit exceeded error, then it means that Flink or your program are creating too many/too large objects filling up the memory in a short time. I

Re: Checkpoint was declined (tasks not ready)

2017-10-27 Thread Till Rohrmann
Yes please open the PR against Flink's master branch. You can also ping me once you've opened the PR. Then we can hopefully quickly merge it :-) Cheers, Till On Thu, Oct 26, 2017 at 12:44 PM, bartektartanus wrote: > I think we could try with option number one, as it

Re: Queryable State in Flink 1.4

2018-01-05 Thread Till Rohrmann
htbend.com/ > > On Jan 5, 2018, at 6:33 AM, Till Rohrmann <trohrm...@apache.org> wrote: > > Hi Boris, > > if you start 2 TaskManagers on the same host, then you have to define a > port range for the KvState server and the proxy. Otherwise the Flink > cluster should not

Re: "keyed" aggregation

2018-01-05 Thread Till Rohrmann
Hi Christophe, if you don't have a way to recompute the key from the aggregation result, then you have to write an aggregation function which explicitly keeps it (e.g. a tuple value where the first entry is the key and the second the aggregate value). Cheers, Till On Fri, Jan 5, 2018 at 5:51

Re: hadoop-free hdfs config

2018-01-11 Thread Till Rohrmann
core.fs.UnsupportedSchemeFactory.create( > UnsupportedSchemeFactory.java:64) > at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem( > FileSystem.java:401) > ... 13 more > > > I was using for tests clean "Without bundled Hadood" flink binaries and > didn'

Re: Flink and Rest API

2018-01-05 Thread Till Rohrmann
Hi Alberto, currently, the queryable state is not exposed via a REST interface. You have to use the QueryableStateClient for that [1]. If it's not possible to directly connect to the machines running your Flink cluster, then you can also expose the values via the metric system as Sendoh proposed.

Re: akka.remote.ShutDownAssociation: Shut down address: akka.tcp://flink@fps-flink-jobmanager:45652

2018-01-10 Thread Till Rohrmann
Hi, I'm actually not sure what's going on there. I suspect that it must have something to do with the local setup. Not sure from where you submit your job. If you submitted the job from fps-flink-jobmanager then this could the client actor system which shuts down after submitting the job because

Re: hadoop-free hdfs config

2018-01-10 Thread Till Rohrmann
Hi Sasha, you're right that if you want to access HDFS from the user code only it should be possible to use the Hadoop free Flink version and bundle the Hadoop dependencies with your user code. However, if you want to use Flink's file system state backend as you did, then you have to start the

Re: Failed to serialize remote message [class org.apache.flink.runtime.messages.JobManagerMessages$JobFound

2018-01-16 Thread Till Rohrmann
Hi, this indeed indicates that a REST handler is requesting the ExecutionGraph from a JobManager which does not run in the same ActorSystem. Could you please tell us the exact HA setup. Are your running Flink on Yarn with HA or do you use standalone HA with standby JobManagers? It would be

Re: Failed to serialize remote message [class org.apache.flink.runtime.messages.JobManagerMessages$JobFound

2018-01-17 Thread Till Rohrmann
t is intended that this never happens. (because if > will call remote actor systems) so this class not being serializable is not > a bug > > > > > > On 16 January 2018 at 14:51, Till Rohrmann <trohrm...@apache.org> wrote: > >> Hi, >> >> this i

Re: Task Manager detached under load

2018-02-05 Thread Till Rohrmann
Hi, this sounds like a serious regression wrt Flink 1.3.2 and we should definitely find out what's causing this problem. Given from what I see in the logs, the following happens: For some time the JobManager seems to no longer receive heartbeats from the TaskManager. This could be, for example,

Re: AKA and quarantine

2018-01-29 Thread Till Rohrmann
Hi Vishal, Akka usually quarantines remote ActorSystems in case of a system message delivery failure or if the death watch was triggered. This can, for example, happen if your machine is under heavy load or has a high GC pressure and does not find enough time to respond to the heartbeats. - If

Re: AKA and quarantine

2018-01-30 Thread Till Rohrmann
e of how much state is better kept in the memtable > to prevent a SSTable enquiry. > > On Mon, Jan 29, 2018 at 12:13 PM, Till Rohrmann <trohrm...@apache.org> > wrote: > >> Hi Vishal, >> >> Akka usually quarantines remote ActorSystems in case of a system message >&

Re: Developing and running Flink applications in Linux through Windows editors or IDE's ?

2018-02-09 Thread Till Rohrmann
Hi Esa, welcome to the community :-). For the development of Flink it does not really matter how you code. In general, contributors pick what suits their needs best and so should you. Here is a link for general remarks for setting up IntelliJ and Eclipse [1]. [1]

Re: dataset sort

2018-02-09 Thread Till Rohrmann
Hi David, Flink only supports sorting within partitions. Thus, if you want to write out a globally sorted dataset you should set the parallelism to 1 which effectively results in a single partition. Decreasing the parallelism of an operator will cause the individual partitions to lose its sort

Re: Batch Cascade application

2018-02-09 Thread Till Rohrmann
Hi Puneet, without more information about the job you're running (ideally code), it's hard to help you. Cheers, Till On Fri, Feb 9, 2018 at 10:43 AM, Puneet Kinra < puneet.ki...@customercentria.com> wrote: > Hi > > I am working on batch application i which once the data is get loaded into >

Re: Two issues when deploying Flink on DC/OS

2018-02-09 Thread Till Rohrmann
Hi, "java.io.IOException: Connection reset by peer" is usually thrown if the remote peer terminates the connection. So the interesting bit would be who's requesting static files from Flink. So far we serve the web frontend and the log and stdout files via the StaticFileServerHandler. Maybe it's

Re: Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

2018-02-21 Thread Till Rohrmann
@Bart, I think there is no Flip yet for the proper stop with savepoint implementation. My gut feeling is that the community will soon address this problem since it's a heavily requested feature. Cheers, Till On Wed, Feb 21, 2018 at 10:26 AM, Till Rohrmann <trohrm...@apache.org> wrote:

Re: Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

2018-02-21 Thread Till Rohrmann
ly. > > Did I misunderstand this thread? > > If not this sounds like pretty annoying? Do people have some sort of > workaround for that? > > Thanks, > -- > Christophe > > > > On Mon, Feb 19, 2018 at 5:50 PM, Till Rohrmann <trohrm...@apache.org> > wr

Re: Correlation between number of operators and Job manager memory requirements

2018-02-20 Thread Till Rohrmann
JM? Is it OK to assume > that the amount of memory required by JM grows linearly with the total > number of operators deployed? > > Thanks, > Shailesh > > > On Mon, Feb 19, 2018 at 10:18 PM, Till Rohrmann <trohrm...@apache.org> > wrote: > >> Hi Shailesh, >&g

Re: Managed State Custom Serializer with Avro

2018-02-20 Thread Till Rohrmann
gt; * Read (key1, value1) with schema2 and read with (key2, value2) with > schema1->schema2. > > Thanks for any feedback > > Arvid > > On Mon, Feb 19, 2018 at 7:17 PM, Niels Denissen <nielsdenis...@gmail.com> > wrote: > > Hi Till, > > Thanks for the quick

Re: Manipulating Processing elements of Network Buffers

2018-02-19 Thread Till Rohrmann
Hi Max, the network buffer order is quite important at the moment, because the network stream does not only transport data but also control events such as the checkpoint barriers. In order to guarantee that you don't lose data in case of a failure it is (at the moment) strictly necessary that

Re: Question about the need of consumer groups from kafka

2018-02-19 Thread Till Rohrmann
Hi Ricardo, could you please give a bit more details what you mean with "not using its own mechanism"? Flink's Kafka connector uses the Kafka consumer and producer (to some extent) API to talk to Kafka. The consumer groups are a central concept of Kafka and as such, the Flink Kafka connector has

Re: Iterating over state entries

2018-02-19 Thread Till Rohrmann
Hi Ken, just for my clarification, the `RocksDBMapState#entries` method does not satisfy your requirements? This method does not allow you to iterate across different keys of your keyed stream of course. But it should allow you to iterate over the different entries for a given key of your keyed

Re: Concurrent modification Exception when submitting multiple jobs

2018-02-19 Thread Till Rohrmann
Hi Vinay, could you try to create a dedicated RemoteEnvironment for each parallel thread. I think that the StreamExecutionEnvironment is not thread safe and should, thus, not be shared across multiple threads if that's the case. Getting a glimpse at your code would also help to further

Re: Regarding Task Slots allocation

2018-02-19 Thread Till Rohrmann
Hi Vinay, try to set the parallelism to 2 for the job you are executing via the RemoteExecutionEnvironment. Where have you specified the number of TaskManager slots? In the flink-conf.yaml file which you used to deploy the remote Flink cluster? Cheers, Till On Fri, Feb 16, 2018 at 7:14 PM,

Re: A "per operator instance" window all ?

2018-02-19 Thread Till Rohrmann
Hi Julien, at the moment Flink only supports parallel windows which are keyed. What you would need is something like a per-partition window which is currently not supported. The problem with that is that it is not clear how to rescale a per-partition window because it effectively means that you

Re: Stopping a kafka consumer gracefully (no losing of inflight events, StoppableFunction)

2018-02-19 Thread Till Rohrmann
Hi Bart, you're right that Flink currently does not support a graceful stop mechanism for the Kafka source. The community has already a good idea how to solve it in the general case and will hopefully soon add it to Flink. Concerning the StoppableFunction: This interface was introduced quite

Re: Regarding BucketingSink

2018-02-19 Thread Till Rohrmann
Hi Vishal, what pending files should indeed get eventually finalized. This happens on a checkpoint complete notification. Thus, what you report seems not right. Maybe Aljoscha can shed a bit more light into the problem. In order to further debug the problem, it would be really helpful to get

Re: Managed State Custom Serializer with Avro

2018-02-19 Thread Till Rohrmann
Hi Niels, which version of Flink are you using? Currently, Flink does not support to upgrade the TypeSerializer itself, if I'm not mistaken. As you've described, it will try to use the old serializer stored in the checkpoint stream to restore state. I've pulled Gordon into the conversation who

Re: Correlation between number of operators and Job manager memory requirements

2018-02-19 Thread Till Rohrmann
Hi Shailesh, my question would be where do you see the OOM happening? Does it happen on the JM or the TM. The memory requirements for each operator strongly depend on the operator and it is hard to give a general formula for that. It mostly depends on the user function. Flink itself should not

Re: Breakage in Flink CLI in 1.5.0

2018-06-21 Thread Till Rohrmann
t.address and the rest.address is valid and working fine but my question > is why am I supposed to give jobmanager.rpc.address for flink client to > connect to flink cluster if flink client depends only on rest.address? > > On Thu, Jun 21, 2018 at 12:41 PM, Till Rohrmann > wrote: > >&

Re: Flink application does not scale as expected, please help!

2018-06-19 Thread Till Rohrmann
Hi, as Fabian explained, if you exceed the number of slots on a single TM, then Flink needs to deploy tasks on other TMs which causes a network transfer between the sources and the mappers. This will have a negative impact if you compare it to a setup where everything runs locally. The

Re: Flink 1.5 Yarn Connection unexpectedly closed

2018-06-22 Thread Till Rohrmann
100% used slots to only 1-2 used slots >> as I hit that timeout. I'll do that today and let you know, if it works I >> can share the code in here as an example. >> >> On Thu, Jun 21, 2018 at 5:01 AM Till Rohrmann >> wrote: >> >>> Hi Garrett, >>

Re: Could not retrieve the redirect address - No REST endpoint has been started

2018-08-03 Thread Till Rohrmann
Hi Pedro, it looks as if Flink could not start the WebRuntimeMonitor. Could you maybe share the jobmanager.log of the newly started JobManager which has no WebRuntimeMonitor started with us? Maybe there is a port conflict. Cheers, Till On Thu, Aug 2, 2018 at 11:35 AM vino yang wrote: > Hi

Re: Delay in REST/UI readiness during JM recovery

2018-08-03 Thread Till Rohrmann
Hi Joey, your analysis is correct. Currently, the Dispatcher will first try to recover all jobs before it confirms the leadership. 1) The Dispatcher provides much of the relevant information you see in the web-ui. Without a leading Dispatcher, the web-ui cannot show much information. But this

[ANNOUNCE] Weekly community update #31

2018-08-03 Thread Till Rohrmann
Dear community, this is the weekly community update thread #31. Please post any news and updates you want to share with the community to this thread. # Flink 1.5.2 has been released The community released the second bug fix release Flink 1.5.2 [1]. The release contains more than 20 improvements

Re: Old job resurrected during HA failover

2018-08-03 Thread Till Rohrmann
Hi Elias and Vino, sorry for the late reply. I think your analysis is pretty much to the point. The current implementation does not properly respect the situation with multiple standby JobManagers. In the single JobManager case, a loss of leadership either means that the JobManager has died and,

Re: Questions on Unbounded number of keys

2018-07-31 Thread Till Rohrmann
of devices and > uniqueness of key is simply a time identifier prefixed to device > identifier. Even though there could be a little delayed data, the chances > of number of unique keys growing constantly for days is probably none as > device list is constant. > > Thanks, Ashish >

Re: [ANNOUNCE] Apache Flink 1.5.2 released

2018-08-01 Thread Till Rohrmann
Great work! Thanks to everyone who helped with the release. Cheers, Till On Tue, Jul 31, 2018 at 7:43 PM Bowen Li wrote: > Congratulations, community! > > On Tue, Jul 31, 2018 at 1:44 AM Chesnay Schepler > wrote: > >> The Apache Flink community is very happy to announce the release of >>

Re: 答复: Best way to find the current alive jobmanager with HA mode zookeeper

2018-07-31 Thread Till Rohrmann
I think that the web ui automatically redirects to the current leader. So if you should access the JobManager which is not leader, then you should get an HTTP redirect to the current leader. Due to that it should not be strictly necessary to know which of the JobManagers is the leader. The

Re: Questions on Unbounded number of keys

2018-07-31 Thread Till Rohrmann
t;> >> So to reply to your answers, no matter using different State backends or >> regularly cleaning up the States (which is exactly what I am doing), it >> does not concern the number of keyed operator instances. >> >> I would like to know: >> >>- Will t

Re: Logs are not easy to read through webUI

2018-07-31 Thread Till Rohrmann
Hi Xinyu, thanks for starting this discussion. I think you should open a JIRA issue for this feature. I can see the benefit of such a feature if the DailyRollingAppender is activated. Cheers, Till On Mon, Jul 30, 2018 at 1:47 PM vino yang wrote: > Hi Xinyu, > > Thanks for your suggestion. I

[ANNOUNCE] Apache Flink 1.6.0 released

2018-08-09 Thread Till Rohrmann
The Apache Flink community is very happy to announce the release of Apache Flink 1.6.0. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. The release is available for download at:

[ANNOUNCE] Weekly community update #33

2018-08-15 Thread Till Rohrmann
Dear community, this is the weekly community update thread #33. Please post any news and updates you want to share with the community to this thread. # Flink 1.6.0 has been released The community released Flink 1.6.0 [1]. Thanks to everyone who made this release possible. # Improving tutorials

[ANNOUNCE] Weekly community update #32

2018-08-07 Thread Till Rohrmann
Dear community, this is the weekly community update thread #32. Please post any news and updates you want to share with the community to this thread. # Flink 1.6.0 voting period about to end The voting period for the 4th RC of Flink 1.6.0 will end Wednesday 6:30pm CET [1]. So far there seem to

Re: flink on kubernetes

2018-08-13 Thread Till Rohrmann
Hi Mingliang, I'm currently writing the updated documentation for Flink's job cluster container entrypoint. It should be ready later today. In the meantime, you can checkout the `flink-container` module and its subdirectories. They already contain some information in the README.md on how to

Re: Job Manager killed by Kubernetes during recovery

2018-08-21 Thread Till Rohrmann
Hi Bruno, in order to debug this problem we would need a bit more information. In particular, the logs of the cluster entrypoint and your K8s deployment specification would be helpful. If you have some memory limits specified these would also be interesting to know. Cheers, Till On Sun, Aug 19,

Re: 答复: Best way to find the current alive jobmanager with HA mode zookeeper

2018-08-21 Thread Till Rohrmann
would be more user friendly and would eliminate the > extra step of figuring out the Job Manager address. > > Thanks, > M > > > > On Tue, Jul 31, 2018 at 3:54 PM, Till Rohrmann > wrote: > >> I think that the web ui automatically redirects to the current leader. S

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-21 Thread Till Rohrmann
Just a small addition. Concurrent cancel call will interfere with the cancel-with-savepoint command and directly cancel the job. So it is better to use the cancel-with-savepoint call in order to take savepoint and then cancel the job automatically. Cheers, Till On Thu, Aug 9, 2018 at 9:53 AM

Re: [ANNOUNCE] Apache Flink 1.5.3 released

2018-08-21 Thread Till Rohrmann
Great news. Thanks a lot for managing the release Chesnay and to all who have contributed to this release. Cheers, Till On Tue, Aug 21, 2018 at 2:12 PM Chesnay Schepler wrote: > |The Apache Flink community is very happy to announce the release of > Apache Flink 1.5.3, which is the third bugfix

[ANNOUNCE] Weekly community update #34

2018-08-22 Thread Till Rohrmann
Dear community, this is the weekly community update thread #33. Please post any news and updates you want to share with the community to this thread. # Flink 1.5.3 has been released The community released Flink 1.5.3 [1]. It contains more than 20 fixes and improvements. The community recommends

Re: [DISCUSS] Remove the slides under "Community & Project Info"

2018-08-28 Thread Till Rohrmann
+1 for removing these slides. On Mon, Aug 27, 2018 at 10:03 AM Fabian Hueske wrote: > I agree to remove the slides section. > A lot of the content is out-dated and hence not only useless but might > sometimes even cause confusion. > > Best, > Fabian > > > > Am Mo., 27. Aug. 2018 um 08:29 Uhr

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Till Rohrmann
08-28 11:54:10,789 INFO >>>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - >>>> Recovered SubmittedJobGraph(9e0a109b57511930c95d3b54574a66e3, null). >>>> >>>> Note that this log appears on the standby jobmanager immedia

Re: anybody can start flink with job mode?

2018-08-29 Thread Till Rohrmann
Great to hear :-) On Tue, Aug 28, 2018, 14:59 Hao Sun wrote: > Thanks Till for the follow up, I can run my job now. > > On Tue, Aug 28, 2018, 00:57 Till Rohrmann wrote: > >> Hi Hao, >> >> Vino is right, you need to specify the -j/--job-classname option which >

Re: JobGraphs not cleaned up in HA mode

2018-08-28 Thread Till Rohrmann
Hi Encho, thanks a lot for reporting this issue. The problem arises whenever the old leader maintains the connection to ZooKeeper. If this is the case, then ephemeral nodes which we create to protect against faulty delete operations are not removed and consequently the new leader is not able to

Re: Implement Joins with Lookup Data

2018-08-28 Thread Till Rohrmann
; >>>>> BTW, >>>>> >>>>> We got around bootstrap problem for similar use case using a “nohup” >>>>> topic as input stream. Our CICD pipeline currently passes an initialize >>>>> option to app IF there is a need to boo

Re: anybody can start flink with job mode?

2018-08-28 Thread Till Rohrmann
Hi Hao, Vino is right, you need to specify the -j/--job-classname option which specifies the job name you want to execute. Please make sure that the jar containing this class is on the class path. I recently pushed some fixes which generate a better error message than the one you've received. If

Re: Could not cancel job (with savepoint) "Ask timed out"

2018-08-22 Thread Till Rohrmann
t another cancel-with-savepoint call is > just ignored. > > On Tue, Aug 21, 2018 at 1:18 PM Till Rohrmann > wrote: > >> Just a small addition. Concurrent cancel call will interfere with the >> cancel-with-savepoint command and directly cancel the job. So it is better >

Re: Job Manager killed by Kubernetes during recovery

2018-08-22 Thread Till Rohrmann
y indicated as an event. > > Thanks Vino and Till for your time! > > Bruno > > On Tue, 21 Aug 2018 at 10:21 Till Rohrmann wrote: > >> Hi Bruno, >> >> in order to debug this problem we would need a bit more information. In >> particular, the logs of the

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-08-29 Thread Till Rohrmann
tml > https://issues.apache.org/jira/browse/FLINK-10011 > > so I guess I'll have to wait until there is a fix as I have not seens any > workaround (other than removing the jobgraph from Zookeeper after > cancelling the task) > > Thanks, > > Gerard > > > On Tue, Jul 2

Re: JobGraphs not cleaned up in HA mode

2018-08-29 Thread Till Rohrmann
gets:* > *-FlinkBatchPipelineTranslator pipeline logs- (we use Apache Beam)* > > 2018-08-29 11:47:06,006 INFO > org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Submitting > job d69b67e4d28a2d244b06d3f6d661bca1 > (sicassandrawriterbeam-flink-0829114703-7d95fabd). >

Re: Flink CLI properties with HA

2018-07-17 Thread Till Rohrmann
Hi Sampath, technically the client does not need to know the `high-availability.storageDir` to submit a job. However, due to how we construct the ZooKeeperHaServices it is still needed. The reason behind this is that we use the same services for the server and the client. Thus, the implementation

Re: FlinkCEP and scientific papers ?

2018-07-18 Thread Till Rohrmann
You are right Vino, the initial implementation was based on the above mentioned paper. Cheers, Till On Tue, Jul 17, 2018 at 5:34 PM vino yang wrote: > Hi Esa, > > AFAIK, the earlier Flink CEP refers to the Paper 《Efficient Pattern > Matching over Event Streams》[1]. Flink absorbed two major

Re: Flink on Mesos: containers question

2018-07-19 Thread Till Rohrmann
erized in Mesos? > > > > Thanks! > > Alex > > > > *From:* Fabian Hueske [mailto:fhue...@gmail.com] > *Sent:* Monday, July 16, 2018 7:57 AM > *To:* NEKRASSOV, ALEXEI > *Cc:* user@flink.apache.org; Till Rohrmann > *Subject:* Re: Flink on Mesos: c

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-07-19 Thread Till Rohrmann
Hi Gerard, the logging statement `Removed job graph ... from ZooKeeper` is actually not 100% accurate. The actual deletion is executed as an asynchronous background task and the log statement is not printed in the callback (which it should). Therefore, the deletion could still have failed. In

Re: Flink 1.5 batch job fails to start

2018-07-25 Thread Till Rohrmann
-07-24T12:09:38.853+ >>>> [org.apache.flink.runtime.entrypoint.ClusterEntrypoint] INFO cluster >>>> 2018-07-24T12:09:38.853+ >>>> [org.apache.flink.runtime.entrypoint.ClusterEntrypoint] INFO --host >>>> 2018-07-24T12:09:38.853+ >&

Re: LoggingFactory: Access to static method across operators

2018-07-24 Thread Till Rohrmann
Hi Jayant, I think you should be able to implement your own StaticLoggerBinder which returns your own LoggerFactory. That is quite similar to how the different logging backends (log4j, logback) integrate with slf4j. Cheers, Till On Tue, Jul 24, 2018 at 5:41 PM Jayant Ameta wrote: > I am using

Re: Implement Joins with Lookup Data

2018-07-24 Thread Till Rohrmann
Hi Harshvardhan, I agree with Ankit that this problem could actually be solved quite elegantly with Flink's state. If you can ingest the product/account information changes as a stream, you can keep the latest version of it in Flink state by using a co-map function [1, 2]. One input of the co-map

[ANNOUNCE] Weekly community update #30

2018-07-24 Thread Till Rohrmann
Dear community, this is the weekly community update thread #30. Please post any news and updates you want to share with the community to this thread. # First RC for Flink 1.6.0 The community is published the first release candidate for Flink 1.6.0 [1]. Please help the community by trying the RC

Re: When a jobmanager fails, it doesn't restart because it tries to restart non existing tasks

2018-07-24 Thread Till Rohrmann
eeperLeaderRetrievalService /leader/rest_server_lock. >> 11:29:17.812 [main] INFO >> o.a.f.r.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting >> ZooKeeperLeaderRetrievalService /leader/dispatcher_lock. >> 11:29:18.007 [main] INFO org.apache.flink.runtime.rest.RestClient

Re: Flink 1.5 batch job fails to start

2018-07-24 Thread Till Rohrmann
Hi Alex, I'm not entirely sure what causes this problem because it is the first time I see it. First question would be if the problem also arises if using a different Hadoop version. Are you using the same Java versions on the client as well as on the server? Could you provide us with the

Re: Questions on Unbounded number of keys

2018-07-24 Thread Till Rohrmann
Hi Chang Liu, if you are dealing with an unlimited number of keys and keep state around for every key, then your state size will keep growing with the number of keys. If you are using the FileStateBackend which keeps state in memory, you will eventually run into an OutOfMemoryException. One way

Re: Memory Logging

2018-07-24 Thread Till Rohrmann
Hi Oliver, which Flink image are you using? If you are using the docker image from docker hub [1], then the memory logging will go to stdout and not to a log file. The reason for this behavior is that the docker image configures the logger to print to stdout such that one can easily access the

Re: NoClassDefFoundError when running Twitter Example

2018-07-24 Thread Till Rohrmann
Hi Syed, could you check whether this class is actually contained in the twitter example jar? If not, then you have to build an uber jar containing all required dependencies. Cheers, Till On Tue, Jul 24, 2018 at 5:11 AM syed wrote: > I am facing the *java.lang.NoClassDefFoundError: >

Re: SingleOutputStreamOperator vs DataStream?

2018-07-24 Thread Till Rohrmann
Hi Chris, a `DataStream` represents a stream of events which have the same type. A `SingleOutputStreamOperator` is a subclass of `DataStream` and represents a user defined transformation applied to an input `DataStream` and producing an output `DataStream` (represented by itself). Since you can

Re: Implement Joins with Lookup Data

2018-07-24 Thread Till Rohrmann
itial hydration of the Product and Account data > since it’s currently in a relational DB? Do we have to copy over data to > Kafka and then use them? > > Regards, > Harsh > > On Tue, Jul 24, 2018 at 09:22 Till Rohrmann wrote: > >> Hi Harshvardhan, >> >> I ag

Re: Why is flink master bump version to 1.7?

2018-07-17 Thread Till Rohrmann
Yes, pulling from https://git-wip-us.apache.org/repos/asf/flink.git should show you the release-1.6 branch. Cheers, Till On Tue, Jul 17, 2018 at 10:37 AM Chesnay Schepler wrote: > The release-1.6 branch exists ( >

Re: StateMigrationException when switching from TypeInformation.of to createTypeInformation

2018-07-17 Thread Till Rohrmann
t 8:44 AM Till Rohrmann wrote: > >> could you check whether the `TypeInformation` returned by >> `TypeInformation.of(new TypeHint[ConfigState]() {}))` and >> `createTypeInformation[ConfigState]` return the same `TypeInformation` >> subtype? The problem is that the former

[ANNOUNCE] Weekly community update #29

2018-07-17 Thread Till Rohrmann
Dear community, this is the weekly community update thread #29. Please post any news and updates you want to share with the community to this thread. # Feature freeze Flink 1.6 The Flink community has cut off the release branch for Flink 1.6 [1]. From now on, the community will concentrate on

[ANNOUNCE] Weekly community update #35

2018-08-29 Thread Till Rohrmann
Dear community, this is the weekly community update thread #35. Please post any news and updates you want to share with the community to this thread. # Flink 1.7 community roadmap The Flink community started discussing which features will be included in the next upcoming major release. Join the

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-09-06 Thread Till Rohrmann
remotewatcher >> detects it unreachable and the TaskManager unregisters it, I do not see any >> other activity in the JobManager with regards to reallocation etc. >> - Does the quarantining of the TaskManager not happen until the >> exit-on-fatal-akka-error is turned on

[ANNOUNCE] Weekly community update #37

2018-09-10 Thread Till Rohrmann
Dear community, this is the weekly community update thread #37. Please post any news and updates you want to share with the community to this thread. # Flink Forward Berlin 2018 over Last week the community met in Berlin for the 4th edition of Flink Forward 2018 [1]. It was great to see people

Re: Configuring Ports for Job/Task Manager Metrics

2018-08-31 Thread Till Rohrmann
Hi Deirdre, Flink does not support to control where Yarn containers are placed. This is the responsibility of Yarn as the cluster manager. In Yarn 3.1.0 it is possible to specify placement constraints for containers but also this won't fully solve your problem. Imagine that you have a single Yarn

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-08-31 Thread Till Rohrmann
ation_1535135887442_0906. > > 2018-08-29 23:15:31,370 INFO org.apache.flink.yarn.YarnJobManager > - Stopping JobManager akka.tcp://fl...@blahxyz.sfdc.net > <http://blahabc.sfdc.net/>:1235/user/jobmanager. > > 2018-08-29 23:15:41,225 ERROR org.apache.flink.yarn.YarnJobMa

Re: Problem with querying state on Flink 1.6.

2018-08-30 Thread Till Rohrmann
Hi Joe, it looks as if the queryable state server binds to the local loopback address. This looks like a bug to me. Could you maybe share the complete cluster entrypoint and the task manager logs with me? In the meantime you could try to do the following: Change AbstractServerBase.java:227 into

Re: checkpoint timeout

2018-08-30 Thread Till Rohrmann
Hi John, which version of Flink are you using. I just tried it out with the current snapshot version and I could configure the checkpoint timeout via CheckpointConfig checkpointConfig = env.getCheckpointConfig(); checkpointConfig.setCheckpointTimeout(1337L); Could you provide us the logs and

Re: withFormat(Csv) is undefined for the type BatchTableEnvironment

2018-08-30 Thread Till Rohrmann
Hi François, as Vino said, the BatchTableEnvironment does not provide a `withFormat` method. Admittedly, the documentation does not state it too explicitly but you can only call the `withFormat` method on a table connector as indicated here [1]. If you think that you need to get the data from

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-08-30 Thread Till Rohrmann
Hi Subramanya, in order to help you I need a little bit of context. Which version of Flink are you running? The configuration yarn.reallocate-failed is deprecated since version Flink 1.1 and does not have an effect anymore. What would be helpful is to get the full jobmanager log from you. If the

Re: History Server in Kubernetes

2018-08-30 Thread Till Rohrmann
Hi Encho, currently, the existing image does not support to start a HistoryServer. The reason is simply that it has not been exposed because the image contains everything needed. In order to do this, you would need to extend the docker-entrypoint.sh script with an additional history-server

Re: WriteTimeoutException in Cassandra sink kill the job

2018-08-30 Thread Till Rohrmann
Hi Jayant, afaik it is currently not possible to control how failures are handled in the Cassandra Sink. What would be the desired behaviour? The best thing is to open a JIRA issue to discuss potential improvements. Cheers, Till On Thu, Aug 30, 2018 at 12:15 PM Jayant Ameta wrote: > Hi, >

Re: Exception when run flink-storm-example

2018-09-11 Thread Till Rohrmann
Hi Hanjing, I think the problem is that the Storm compatibility layer only works with legacy mode at the moment. Please set `mode: legacy` in your flink-conf.yaml. I hope this will resolve the problems. Cheers, Till On Tue, Sep 11, 2018 at 7:10 AM jing wrote: > Hi vino, > Thank you very much.

Re: Exception when run flink-storm-example

2018-09-11 Thread Till Rohrmann
ttp%3A%2F%2Fmail-online.nosdn.127.net%2Fsmda6015df3a52ec22402a83bdad785acb.jpg=%5B%22%22%2C%22%22%2C%22%22%2C%22%22%2C%22%22%5D> > 签名由 网易邮箱大师 <https://mail.163.com/dashi/dlpro.html?from=mail81> 定制 > On 9/11/2018 14:43,Till Rohrmann > wrote: > > Hi Hanjing, > > I thi

Re: 1.5 some thing weird

2018-07-10 Thread Till Rohrmann
l 10, 2018 at 3:16 AM, Till Rohrmann > wrote: > >> Hi Vishal, >> >> it looks as if the flushing of the checkpoint data to HDFS failed due to >> some expired lease on the checkpoint file. Therefore, Flink aborted the >> checkpoint `chk-125` and removed it. This

Re: REST API time out ( flink 1.5 ) on SP

2018-07-10 Thread Till Rohrmann
4 and I > could not figure out from > https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/config.html > which setting to work with. > > Thanks much. > > > > On Tue, Jul 10, 2018 at 3:18 AM, Till Rohrmann > wrote: > >> Hi Vishal, >> >> you nee

Re: high availability with automated disaster recovery using zookeeper

2018-07-10 Thread Till Rohrmann
Hi Tovi, that is an interesting use case you are describing here. I think, however, it depends mainly on the capabilities of ZooKeeper to produce the intended behavior. Flink itself relies on ZooKeeper for leader election in HA mode but does not expose any means to influence the leader election

Re: REST API time out ( flink 1.5 ) on SP

2018-07-10 Thread Till Rohrmann
Hi Vishal, you need to give us a little bit more context in order to understand your question. Cheers, Till On Mon, Jul 9, 2018 at 10:36 PM Vishal Santoshi wrote: > java.util.concurrent.CompletionException: > akka.pattern.AskTimeoutException: Ask timed out on >

Re: 1.5 some thing weird

2018-07-10 Thread Till Rohrmann
Hi Vishal, it looks as if the flushing of the checkpoint data to HDFS failed due to some expired lease on the checkpoint file. Therefore, Flink aborted the checkpoint `chk-125` and removed it. This is the normal behaviour if Flink cannot complete a checkpoint. As you can see, afterwards, the

<    1   2   3   4   5   6   7   8   9   10   >