akka timeout with metric fetcher

2017-08-17 Thread Steven Wu
We have set akka.ask.timeout to 60 s in yaml file. I also confirmed the setting in Flink UI. But I saw akka timeout of 10 s for metric query service. two questions 1) why doesn't metric query use the 60 s value configured in yaml file? does it always use default 10 s value? 2) could this cause

Re: [EXTERNAL] Re: Flink application failing with kerberos issue after running successfully without any issues for a few days

2017-08-17 Thread Tzu-Li (Gordon) Tai
Hi Raja, Can you please confirm if I have to use the below settings to ensure I use keytabs?   security.kerberos.login.use-ticket-cache: Indicates whether to read from your Kerberos ticket cache (default: true).   security.kerberos.login.keytab: Absolute path to a Kerberos keytab file that

Re: Efficient grouping and parallelism on skewed data

2017-08-17 Thread Tzu-Li (Gordon) Tai
Hi John, Do you need to do any sort of grouping on the keys and aggregation? Or are you simply using Flink to route the Kafka messages to different Elasticsearch indices? For the following I’m assuming the latter: If there’s no need for aggregate computation per key, what you can do is simply

Re:Re:Re:Re:Re:How to verify the data to Elasticsearch whether correct or not ?

2017-08-17 Thread Tzu-Li (Gordon) Tai
Hi, I see what you were asking about now. Yes, it doesn’t make sense to sink an object to Elasticsearch. You either need to transform the object to a JSON using libraries like Protobuf / Jackson / etc., or disintegrate it yourself into a Map. One thing I noticed is: json.put("json",

Efficient grouping and parallelism on skewed data

2017-08-17 Thread Jakes John
Can some one help me in figuring out how to implement in flink. I have to create a pipeline Kafka->flink->elasticsearch. I have high throughput data coming into Kafka. All messages in Kafka have a key called 'id' and value is a integer that ranges 1 to N. N is dynamic with max value as 100. The

Re: Flink CEP questions

2017-08-17 Thread Basanth Gowda
Hi Kostas, For 3 -> I was able to do the following and it worked perfectly fine. Is there a way we could reset? Looks like the following code behaves more like a sliding count. What I want to do is reset the count once the alert has matched, and start over the count. May be I will have to have

Re: Question about Global Windows.

2017-08-17 Thread Steve Jerman
 https://issues.apache.org/jira/browse/FLINK-7473 From: Steve Jerman Sent: Thursday, August 17, 2017 11:34:09 AM To: Nico Kruber; user@flink.apache.org Subject: Re: Question about Global Windows. Thank you Nico. I *think* I should have

Re: [EXTERNAL] Re: Flink application failing with kerberos issue after running successfully without any issues for a few days

2017-08-17 Thread Prabhu V
+1 on the 7 day expiry explanation, This is most likely the cause. I faced the 7 day expiry issue with a previous version of flink that dint support keytabs, I am currently running flink-1.3 with keytabs (it has been going okay for 2 days now), I will update after the 7 day mark. Thanks, Prabhu

Re: [EXTERNAL] Re: Flink application failing with kerberos issue after running successfully without any issues for a few days

2017-08-17 Thread Raja . Aravapalli
Thanks a lot Eron… If I am understanding you correct, you suggest using keytabs to launch streaming applications! Can you please confirm if I have to use the below settings to ensure I use keytabs? * security.kerberos.login.use-ticket-cache: Indicates whether to read from your Kerberos

Re: How to run a flink wordcount program

2017-08-17 Thread Eron Wright
Hello, You can run the WordCount program directly within your IDE. An embedded Flink environment is automatically created. -Eron On Thu, Aug 17, 2017 at 8:03 AM, P. Ramanjaneya Reddy wrote: > Thanks. > But I want to run in debug mode..how to run without jar and using the

Re: [EXTERNAL] Re: Flink application failing with kerberos issue after running successfully without any issues for a few days

2017-08-17 Thread Eron Wright
Raja, According to those configuration values, the delegation token would be automatically renewed every 24 hours, then expire entirely after 7 days. You say that the job ran without issue for 'a few days'. Can we conclude that the job hit the 7-day DT expiration? Flink supports the use of

Re: Question about Global Windows.

2017-08-17 Thread Steve Jerman
Thank you Nico. I *think* I should have one stream per key... the stream I get is pretty fast and there may be some corner cases I'm not aware of. However, I really need to process as a single window per key. I am worried about the cardinality of the key ... I wanted to use a timeout to

Re: [EXTERNAL] Re: Flink application failing with kerberos issue after running successfully without any issues for a few days

2017-08-17 Thread Raja . Aravapalli
I don’t have access to the site.xml files, it is controlled by a support team. Does flink has any configuration settings or api’s thru which we can control this ? Regards, Raja. From: Ted Yu Date: Thursday, August 17, 2017 at 11:07 AM To: Raja Aravapalli

Re: How to run a flink wordcount program

2017-08-17 Thread Chao Wang
The following quickstart offers an end-to-end instruction I think: https://ci.apache.org/projects/flink/flink-docs-release-1.3/quickstart/setup_quickstart.html Chao On 08/17/2017 08:25 AM, P. Ramanjaneya Reddy wrote: On Thu, Aug 17, 2017 at 6:42 PM, P. Ramanjaneya Reddy

Avro serialization issue with Kafka09AvroTableSource

2017-08-17 Thread Morrigan Jones
I recently started working on a PoC with Flink 1.3 that connects to our Kafka cluster and pulls Avro data. Here's the code: ConsoleAppendStreamTableSink is just a simple TableSink I created while looking at the different table sinks types. It just calls print() on the DataStream. Ping is an

Flink workers OOM Stream2Batch application

2017-08-17 Thread Javier Lopez
Hi all, One of our use cases implies to do some Stream2Batch processing. We are using Flink to read from a streaming source and deliver files to S3, after applying some transformation to the stream. These Flink jobs are not running 24/7, they are running on demand and consume a finite number of

Flink CEP questions

2017-08-17 Thread Basanth Gowda
All, New to Flink and more so with Flink CEP. I want to write a sample program that does the following : Lets suppose data cpu usage of a given server. 1. Want to Alert when CPU usage is above or below certain value 2. Want to Alert when CPU usage falls in a range 3. Want to Alert

How to run a flink wordcount program

2017-08-17 Thread P. Ramanjaneya Reddy
On Thu, Aug 17, 2017 at 6:42 PM, P. Ramanjaneya Reddy wrote: > Hi ALL, > > I'm new to flink and understanding the flink wordcount program. > > so downloaded the git hub > https://github.com/apache/flink.git > > Can somebody help how to run wordcount example? > > Thanks >

Re: Change state backend.

2017-08-17 Thread Ted Yu
bq. we add the key-group to the heap format (1-2 bytes extra per key). This seems to be better choice among the two. bq. change the heap backends to write in the same way as RocksDB +1 on above. These two combined would give users flexibility in state backend migration. On Thu, Aug 17, 2017

Re: Why ListState of flink don't support update?

2017-08-17 Thread Stefan Richter
Hi, this is because the list state is intended to be append only. The underlying reason is that this allows certain optimizations in the underlying datastructures. For example, a list state for the RocksDB backend can make use of RocksDB’s merge operator and does not require a full rewrite to

Re: Change state backend.

2017-08-17 Thread Stefan Richter
This is not possible out of the box. Historically, the checkpoint/savepoint formats have been different between heap based and RocksDB based backends. We have already eliminated most differences in 1.3. However, there are two problems remaining. The first problem is just how the number of

RE: Great number of jobs and numberOfBuffers

2017-08-17 Thread Gwenhael Pasquiers
Hello, This bug was met in flink 1.0.1 over yarn (maybe the yarn behavior is different ?). We've been having this issue for a long time and we were careful not to schedule too many jobs. I'm currently upgrading the application towards flink 1.2.1 and I'd like to try to solve this issue. I'm

Re: Great number of jobs and numberOfBuffers

2017-08-17 Thread Ufuk Celebi
PS: Also pulling in Nico (CC'd) who is working on the network stack. On Thu, Aug 17, 2017 at 11:23 AM, Ufuk Celebi wrote: > Hey Gwenhael, > > the network buffers are recycled automatically after a job terminates. > If this does not happen, it would be quite a major bug. > > To

Re: Great number of jobs and numberOfBuffers

2017-08-17 Thread Ufuk Celebi
Hey Gwenhael, the network buffers are recycled automatically after a job terminates. If this does not happen, it would be quite a major bug. To help debug this: - Which version of Flink are you using? - Does the job fail immediately after submission or later during execution? - Is the following

Query Rpc address/port remotely?

2017-08-17 Thread Francisco Gonzalez Barea
Hey guys! I´ve got a new question. Having a Flink v1.3.0 running on Mesos, is there any remotely way (rest, or another) to query which is the rpc address and port to connect that Flink via akka? We´ve been taking a look at the rest API but the config endpoint doesn’t seem to provide this

Re: Change state backend.

2017-08-17 Thread Biplob Biswas
I am not really sure you can do that out of the box, if not, indeed that should be possible in the near future. https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html#handling-serializer-upgrades-and-compatibility There are already plans for state migration (with

Great number of jobs and numberOfBuffers

2017-08-17 Thread Gwenhael Pasquiers
Hello, We're meeting a limit with the numberOfBuffers. In a quite complex job we do a lot of operations, with a lot of operators, on a lot of folders (datehours). In order to split the job into smaller "batches" (to limit the necessary "numberOfBuffers") I've done a loop over the batches

Unexpected behaviour of a periodic trigger.

2017-08-17 Thread Tomasz Dobrzycki
Hi, I'm working on a custom trigger that is supposed to trigger periodically and at the end of session window. These are the main methods from my trigger: public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) throws Exception { long currentTime

Re: Reload DistributedCache file?

2017-08-17 Thread Conrad Crampton
Whilst this looks great and thanks for pointing this out, the problem isn’t so much as with identifying changes in the HDFS file as the overhead in re-loading it isn’t significant (10s of lines long) but the problem is more to do with using the Flink mechanisms i.e. DistributedCache and