Using Kafka 0.10.x timestamps as a record value in Flink Streaming

2017-05-11 Thread Jia Teoh
Hi, Is there a way to retrieve the timestamps that Kafka associates with each key-value pair within Flink? I would like to be able to use these as values within my application flow, and defining them before or after Kafka is not acceptable for the use case due to the latency involved in sending

Re: ConnectedStream keyby issues

2017-05-11 Thread Aljoscha Krettek
Yes, that looks right. > On 10. May 2017, at 14:56, yunfan123 wrote: > > In upstairs example, it seems I should clear the state in onTimer function in > order to free resource like follows: > public void onTimer(long l, OnTimerContext onTimerContext, >

Storage options for RocksDBStateBackend

2017-05-11 Thread ayush
Hello, I had a few questions regarding checkpoint storage options using RocksDBStateBackend. In the flink 1.2 documentation, it is the recommended state backend due to it's ability to store large states and asynchronous snapshotting. For high availabilty it seems HDFS is the recommended store for

Re: Re: Re: ElasticsearchSink on DataSet

2017-05-11 Thread Flavio Pompermaier
Great! I was just thinking that, in principle, a streaming sink is an extension of a batch one. Am I wrong? This would avoid a lot of code duplication and would improve the overall maintainability.. On Thu, May 11, 2017 at 4:35 AM, wyphao.2007 wrote: > Hi Flavio, I made a

Re: issue running flink in docker

2017-05-11 Thread Stephan Ewen
Glad to hear it! Overlay networks (as most container infras use) are tricky and we need to add some code to make diagnostics of issues easier in those cases... Stephan On Wed, May 10, 2017 at 9:30 PM, David Brelloch wrote: > Stephan, > > Thanks for pointing us in the

Re: Gelly - generics with custom vertex value

2017-05-11 Thread Kaepke, Marc
Thanks for the hint. I focused on it and get a strange behavior. If I change from EdgeDirection.ALL (what I need) to EdgeDirection.OUT (or .IN), everything seems okey. The sublist operation was still active. Then I replaced the sublist with the entire list and there was no exception

Storage options for RocksDBStateBackend

2017-05-11 Thread Ayush Goyal
Hello, I had a few questions regarding checkpoint storage options using RocksDBStateBackend. In the flink 1.2 documentation, it is the recommended state backend due to it's ability to store large states and asynchronous snapshotting. For high availabilty it seems HDFS is the recommended store for

Re: High Availability on Yarn

2017-05-11 Thread Jain, Ankit
Following up further on this. 1) We are using a long running EMR cluster to submit jobs right now and as you know EMR hasn’t made Yarn ResourceManager HA. Is there any way we can use the information put in Zookeeper by Flink Job Manager to bring the jobs back up on a new EMR cluster if

Job submission: Fail using command line. Success using web (flink1.2.0)

2017-05-11 Thread Rami Al-Isawi
Hi, The same exact jar on the same machine is being deployed just fine in couple of seconds using the web interface. On the other hand, if I used the command line, I get: org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Couldn't retrieve the

Re: UnknownHostException during start

2017-05-11 Thread Till Rohrmann
Hi Dominique, I’m not exactly sure but this looks more like a Hadoop or a Hadoop configuration problem to me. Could it be that the Hadoop version you’re running does not support the specification of multiple KMS servers via kms://ht...@lfrarxxx1.srv.company;lfrarXXX2.srv.company:16000/kms?

Re: UnknownHostException during start

2017-05-11 Thread Ted Yu
Dominique: Which hadoop release are you using ? Please pastebin the classpath. Cheers On Thu, May 11, 2017 at 7:27 AM, Till Rohrmann wrote: > Hi Dominique, > > I’m not exactly sure but this looks more like a Hadoop or a Hadoop > configuration problem to me. Could it be

Streaming use case: Row enrichment

2017-05-11 Thread Flavio Pompermaier
Hi to all, we have a particular use case where we have a tabular dataset on HDFS (e.g. a CSV) that we want to enrich filling some cells with the content returned by a query to a reverse index (e.g. solr/elasticsearch). Since we want to be able to make this process resilient and scalable we thought

Re: Deactive a job like storm

2017-05-11 Thread Till Rohrmann
To properly implement stop we have to change some internal orchestration structure. This is not a trivial task and so far nobody had found time to work on it. Moreover, the individual sources have to be adapted as well. Cheers, Till On Thu, May 11, 2017 at 4:54 AM, yunfan123

UnknownHostException during start

2017-05-11 Thread Dominique Rondé
Dear all, i got some trouble during the start of Flink in a Yarn-Container based on Cloudera. I have a start script like that: sla:/applvg/home/flink/mvp $ cat run.sh export FLINK_HOME_DIR=/applvg/home/flink/mvp/flink-1.2.0/ export FLINK_JAR_DIR=/applvg/home/flink/mvp/cache export

Re: Storage options for RocksDBStateBackend

2017-05-11 Thread Till Rohrmann
Hi Ayush, you’re right that RocksDB is the recommend state backend because of the above-mentioned reasons. In order to make the recovery properly work, you have to configure a shared directory for the checkpoint data via state.backend.fs.checkpointdir. You can basically configure any file system

Re: JobManager Web UI

2017-05-11 Thread Shannon Carey
Since YARN must support people running multiple Flink clusters, the JobManager web UI binds to an ephemeral port by default (to prevent port usage conflicts). Also, the AM (and web UI) may be placed on any of the Yarn nodes. Therefore, if you wanted to access it directly instead of through the

JobManager Web UI

2017-05-11 Thread Shravan R
I am running flink-1.1.4 on Cloudera distributed Hadoop (Yarn). I am not able to get through JobManager webUI through http://:8081. I am able to get to it through Yarn Running applications ---> application master. My flink-conf.yaml has jobmanager.web.port: 8081. Amy I missing something here? -

Queryable State Client with 1.3.0-rc0

2017-05-11 Thread Fahey, Claudio
I've been using QueryableStateClient in Flink 1.2 successfully. I have now upgraded to release-1.3.0-rc0 and QueryableStateClient now requires a HighAvailabilityServices parameter. The documentation hasn't been updated on using HighAvailabilityServices so I'm a bit lost on what exactly I should

Re: Storage options for RocksDBStateBackend

2017-05-11 Thread Stephan Ewen
Small addition to Till's comment: In the case where file:// points to a mounted distributed file system (NFS, MapRFs, ...), then it actually works. The important thing is that the filesystem where the checkpoints go is replicated (fault tolerant) and accessible from all nodes. On Thu, May 11,

Re: High Availability on Yarn

2017-05-11 Thread Jain, Ankit
Got the answer on #2, looks like that will work, still looking for suggestions on #1. Thanks Ankit From: "Jain, Ankit" Date: Thursday, May 11, 2017 at 8:26 AM To: Stephan Ewen , "user@flink.apache.org" Subject: Re: High