Couldn't figure out - How to do this in Flink? - Please assist with suggestions

2019-02-07 Thread Titus Rakkesh
Dears, I have a data stream continuously coming in, DataStream<...> splitZTuple; e.g. (775168263,113182,0.0). I have to store this somewhere (a window or other state) with a 24-hour expiry, to check against another stream. The second stream is DataStream<...> splittedVomsTuple, which also continuously
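A minimal sketch of one way to do this: key both streams by the shared id, hold the first stream's element in keyed state, probe it from the second stream, and clear it with a 24-hour processing-time timer. The element types, field names, and join semantics below are assumptions for illustration, not taken from the thread.

    import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
    import org.apache.flink.streaming.api.functions.co.CoProcessFunction
    import org.apache.flink.util.Collector

    // Hypothetical element types standing in for the two tuple streams.
    case class ZTuple(id: Long, code: Long, amount: Double)
    case class VomsTuple(id: Long, value: Double)

    // Keeps each ZTuple for 24 hours and matches VomsTuples with the same key against it.
    class TtlJoin extends CoProcessFunction[ZTuple, VomsTuple, (ZTuple, VomsTuple)] {

      private lazy val stored: ValueState[ZTuple] =
        getRuntimeContext.getState(new ValueStateDescriptor("stored", classOf[ZTuple]))

      override def processElement1(z: ZTuple,
          ctx: CoProcessFunction[ZTuple, VomsTuple, (ZTuple, VomsTuple)]#Context,
          out: Collector[(ZTuple, VomsTuple)]): Unit = {
        stored.update(z)
        // drop the stored element 24 hours after it arrived
        ctx.timerService.registerProcessingTimeTimer(
          ctx.timerService.currentProcessingTime + 24 * 60 * 60 * 1000L)
      }

      override def processElement2(v: VomsTuple,
          ctx: CoProcessFunction[ZTuple, VomsTuple, (ZTuple, VomsTuple)]#Context,
          out: Collector[(ZTuple, VomsTuple)]): Unit = {
        val z = stored.value()
        if (z != null) out.collect((z, v))   // emit a match while the entry is still "alive"
      }

      override def onTimer(timestamp: Long,
          ctx: CoProcessFunction[ZTuple, VomsTuple, (ZTuple, VomsTuple)]#OnTimerContext,
          out: Collector[(ZTuple, VomsTuple)]): Unit = {
        stored.clear()
      }
    }

    // Usage sketch:
    // splitZTuple.keyBy(_.id).connect(splittedVomsTuple.keyBy(_.id)).process(new TtlJoin)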

Re: MapState - TypeSerializer

2019-02-07 Thread Alexey Trenikhun
But then it will be two TypeSerializerConfigSnapshots; otherwise it is unclear how TypeSerializer2 will be able to check compatibility? Thanks, Alexey From: Congxian Qiu Sent: Thursday, February 7, 2019 8:14 PM To: Alexey Trenikhun Cc: user@flink.apache.org Subject: Re:

Re: MapState - TypeSerializer

2019-02-07 Thread Congxian Qiu
Hi Alexey, In your case, only TypeSerializer2 will be stored in the meta information, and TypeSerializer2 and TypeSerializer1 have to be compatible. Best, Congxian Alexey Trenikhun wrote on Friday, February 8, 2019 at 10:39 AM: > What if I’m using RocksDB, and MapState had single entry and > TypeSerializer1, then we

Re: MapState - TypeSerializer

2019-02-07 Thread Alexey Trenikhun
What if I'm using RocksDB and the MapState had a single entry written with TypeSerializer1; we then take a savepoint, upgrade the job (TypeSerializer2), and put a new entry. At that point we have two entries written by different serializers, so shouldn't both TypeSerializers be stored in the meta information? Thanks, Alexey

Flink Job and Watermarking

2019-02-07 Thread Kaustubh Rudrawar
Hi, I'm writing a job that wants to make an HTTP request once a watermark has reached all tasks of an operator. It would be great if this could be determined from outside the Flink job, but I don't think it's possible to access watermark information for the job as a whole. Below is a workaround
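One shape such a workaround can take inside the job (not necessarily the poster's approach): read the current watermark from the timer service and fire a one-off action once it passes a threshold. The threshold and the HTTP call are placeholders.

    import org.apache.flink.streaming.api.functions.ProcessFunction
    import org.apache.flink.util.Collector

    // Fires once, when the watermark seen by this task passes thresholdTs.
    class WatermarkTrigger(thresholdTs: Long) extends ProcessFunction[String, String] {
      private var fired = false

      override def processElement(value: String,
          ctx: ProcessFunction[String, String]#Context,
          out: Collector[String]): Unit = {
        if (!fired && ctx.timerService.currentWatermark() >= thresholdTs) {
          fired = true
          // place the HTTP request (or emit a marker to a side output) here
        }
        out.collect(value)
      }
    }

Note that this only observes the watermark of each parallel task, not whether it has reached all tasks of the operator, which is exactly the difficulty raised above.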

stream of large objects

2019-02-07 Thread Aggarwal, Ajay
In my use case my source stream contains small messages, but as part of the Flink processing I will be aggregating them into large messages, and further processing will happen on these large messages. The structure of this large message will be something like this: Class LargeMessage {
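One possible shape for the aggregation step, using an AggregateFunction in a keyed window; the message classes here are placeholders, not the poster's LargeMessage:

    import org.apache.flink.api.common.functions.AggregateFunction

    // Hypothetical shapes of the small input messages and the aggregated output.
    case class SmallMessage(key: String, payload: String)
    case class LargeMessage(key: String, parts: List[String])

    // Folds the small messages of one key/window into a single large message.
    class ToLargeMessage extends AggregateFunction[SmallMessage, LargeMessage, LargeMessage] {
      override def createAccumulator(): LargeMessage = LargeMessage("", Nil)
      override def add(m: SmallMessage, acc: LargeMessage): LargeMessage =
        LargeMessage(m.key, acc.parts :+ m.payload)
      override def merge(a: LargeMessage, b: LargeMessage): LargeMessage =
        LargeMessage(if (a.key.nonEmpty) a.key else b.key, a.parts ++ b.parts)
      override def getResult(acc: LargeMessage): LargeMessage = acc
    }

    // Usage sketch (with org.apache.flink.api.scala._ imported):
    // source.keyBy(_.key).timeWindow(Time.minutes(1)).aggregate(new ToLargeMessage)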

Re: [External] Re: Flink and S3 AWS keys rotation

2019-02-07 Thread Antonio Verardi
Hi Bruno, The problem with such a solution would be that those permissions will apply to any application running on the Kubernetes cluster, not only to Flink. Sharing resources with other applications is one of the cool things about Kubernetes and it would be ideal not to lose such a property.

Re: Flink and S3 AWS keys rotation

2019-02-07 Thread Bruno Aranda
Hi, You can give specific IAM instance roles to the instances running Flink. This way you never expose access keys anywhere. As the docs say, that is the recommended way (and not just for Flink, but for any service you run: never set it up with AWS credentials in config). IAM will

Re: Exactly Once Guarantees with StreamingFileSink to S3

2019-02-07 Thread Kostas Kloudas
No problem! On Wed, Feb 6, 2019 at 6:38 PM Kaustubh Rudrawar wrote: > Hi Kostas, > > Thanks for the response! Yes - I see the commitAfterRecovery being called > when a Bucket is restored. I confused myself in thinking that > 'onSuccessfulCompletionOfCheckpoint' is called on restore as well,
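For context, a minimal row-format StreamingFileSink to S3 looks roughly like the following; the bucket path is illustrative, and checkpointing must be enabled for the exactly-once behaviour discussed in this thread:

    import org.apache.flink.api.common.serialization.SimpleStringEncoder
    import org.apache.flink.core.fs.Path
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink

    // Part files are finalized on checkpoint completion; on restore,
    // commitAfterRecovery finishes parts that were pending at the checkpoint.
    val sink: StreamingFileSink[String] = StreamingFileSink
      .forRowFormat(new Path("s3://my-bucket/output"), new SimpleStringEncoder[String]("UTF-8"))
      .build()

    // stringStream.addSink(sink)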

Re: Avro serialization and deserialization to Kafka in Scala

2019-02-07 Thread Kostas Kloudas
Hi Wouter, I think Gordon or Igal are the best to answer this question. Cheers, Kostas On Thu, Feb 7, 2019 at 11:04 AM Wouter Zorgdrager wrote: > Hello all, > > > I saw the recent updates in Flink related to supporting Avro schema > evolution in state. I'm curious how Flink handles this

Re: Flink and S3 AWS keys rotation

2019-02-07 Thread Kostas Kloudas
Hi Antonio, I am cc'ing Till who may have something to say on this. Cheers, Kostas On Thu, Feb 7, 2019 at 1:32 PM Antonio Verardi wrote: > Hi there, > > I'm trying out to run Flink on Kubernetes and I run into a problem with > the way Flink sets up AWS credentials to talk with S3 and the way

Flink and S3 AWS keys rotation

2019-02-07 Thread Antonio Verardi
Hi there, I'm trying to run Flink on Kubernetes and I ran into a problem with the way Flink sets up AWS credentials to talk to S3 and the way we manage AWS secrets in my company. To give permissions to Flink I am using AWS keys embedded in flink.conf, as per
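The "keys embedded in flink.conf" setup being described is roughly the following for the shaded flink-s3-fs filesystems (values are placeholders); the advice later in the thread is to drop these entries and rely on IAM roles instead:

    # flink-conf.yaml
    s3.access-key: <access-key-id>
    s3.secret-key: <secret-access-key>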

Re: How to use ExceptionUtils.findThrowable() in Scala

2019-02-07 Thread Averell
P.S.: This is the full stack trace: 2019-02-07 01:53:12.790 [I/O dispatcher 16] ERROR o.a.f.s.connectors.elasticsearch.ElasticsearchSinkBase - Failed Elasticsearch item request: [...][[...][1]] ElasticsearchException[Elasticsearch exception [type=version_conflict_engine_exception,

Re: No resource available error while testing HA

2019-02-07 Thread Averell
Hi Gary, I am trying to reproduce that problem. By the way, is it possible to change the log level (I'm using logback) for a running job? Thanks and regards, Averell -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
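Not an answer given in the thread, but one generally applicable logback mechanism: logback can re-scan its own configuration file at runtime, so editing the level in logback.xml on the TaskManager hosts takes effect without a restart, assuming scanning is enabled:

    <configuration scan="true" scanPeriod="30 seconds">
      <logger name="org.apache.flink" level="DEBUG"/>
      <!-- existing appenders and root logger unchanged -->
    </configuration>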

How to use ExceptionUtils.findThrowable() in Scala

2019-02-07 Thread Averell
Hello, I am trying to implement error handling in the Elasticsearch sink (following the seemingly outdated Flink documentation [1]): override def onFailure(actionRequest: ActionRequest, failure: Throwable, restStatusCode: Int, indexer: RequestIndexer): Unit = { if
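A minimal sketch of such a failure handler in Scala: ExceptionUtils.findThrowable returns a java.util.Optional, so the check is isPresent rather than a Scala pattern match. The retry-on-rejection policy mirrors the documentation example and is an assumption, not the poster's final code.

    import org.apache.flink.streaming.connectors.elasticsearch.{ActionRequestFailureHandler, RequestIndexer}
    import org.apache.flink.util.ExceptionUtils
    import org.elasticsearch.action.ActionRequest
    import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException

    class RetryOnRejectionHandler extends ActionRequestFailureHandler {
      override def onFailure(actionRequest: ActionRequest,
                             failure: Throwable,
                             restStatusCode: Int,
                             indexer: RequestIndexer): Unit = {
        if (ExceptionUtils.findThrowable(failure, classOf[EsRejectedExecutionException]).isPresent) {
          indexer.add(actionRequest)   // queue was full: re-queue the request
        } else {
          throw failure                // anything else fails the sink
        }
      }
    }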

Avro serialization and deserialization to Kafka in Scala

2019-02-07 Thread Wouter Zorgdrager
Hello all, I saw the recent updates in Flink related to supporting Avro schema evolution in state. I'm curious how Flink handles this internally for Scala case classes. I'm working on custom (de)serialization schemas to write to and read from Kafka. However, I'm currently stuck because of the
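One way such a custom schema can be structured, decoding binary Avro into a GenericRecord and mapping it onto a case class by hand; the case class, field names, and schema handling are assumptions for illustration, and this is separate from Flink's internal schema-evolution support for state:

    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
    import org.apache.avro.io.DecoderFactory
    import org.apache.flink.api.common.serialization.DeserializationSchema
    import org.apache.flink.api.common.typeinfo.TypeInformation
    import org.apache.flink.api.scala._

    // Hypothetical case class and a fixed writer schema passed in as JSON.
    case class Measurement(id: Long, value: Double)

    class MeasurementDeserializer(schemaJson: String) extends DeserializationSchema[Measurement] {
      @transient private lazy val schema = new Schema.Parser().parse(schemaJson)
      @transient private lazy val reader = new GenericDatumReader[GenericRecord](schema)

      override def deserialize(bytes: Array[Byte]): Measurement = {
        val record = reader.read(null, DecoderFactory.get().binaryDecoder(bytes, null))
        Measurement(record.get("id").asInstanceOf[Long], record.get("value").asInstanceOf[Double])
      }

      override def isEndOfStream(next: Measurement): Boolean = false

      override def getProducedType: TypeInformation[Measurement] = createTypeInformation[Measurement]
    }

    // Usage sketch: new FlinkKafkaConsumer[Measurement]("topic", new MeasurementDeserializer(json), props)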

Re: graceful shutdown of taskmanager

2019-02-07 Thread Till Rohrmann
Hi Bernd, at the moment this is not supported out of the box by Flink. What you can do is the following: First cancel a job with savepoint. After the job has been terminated, terminate TaskManagers and then resume the job from the savepoint you've just taken. This assumes that you have a single
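The sequence described maps onto the CLI roughly as follows (paths, job id, and jar name are placeholders):

    # 1. cancel the job, taking a savepoint first
    bin/flink cancel -s s3://my-bucket/savepoints <jobId>

    # 2. terminate or replace the TaskManagers

    # 3. resubmit the job from the savepoint just taken
    bin/flink run -s s3://my-bucket/savepoints/savepoint-<id> my-job.jar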

Re: Running JobManager as Deployment instead of Job

2019-02-07 Thread Till Rohrmann
Hi Sergey, the rationale why we are using a K8s job instead of a deployment is that a Flink job cluster should terminate after it has successfully executed the Flink job. This is unlike a session cluster which should run forever and for which a K8s deployment would be better suited. If in your