Re: Question about flink checkpoint

2018-02-07 Thread Chengzhi Zhao
Thanks, Fabian. I opened a JIRA ticket and I'd like to work on it if people think this would be an improvement: https://issues.apache.org/jira/browse/FLINK-8599 Best, Chengzhi On Wed, Feb 7, 2018 at 4:17 AM, Fabian Hueske wrote: > Hi Chengzhi Zhao, > > I think this is

RE: S3 for state backend in Flink 1.4.0

2018-02-07 Thread Marchant, Hayden
We actually got it working. Essentially, it's an implementation of HadoopFileSystem, and was written with the idea that it can be used with Spark (since it has broader adoption than Flink as of now). We managed to get it configured, and found the latency to be much lower than by using the s3

RE: Latest version of Kafka

2018-02-07 Thread Marchant, Hayden
Thanks for the info! -Original Message- From: Piotr Nowojski [mailto:pi...@data-artisans.com] Sent: Friday, February 02, 2018 4:37 PM To: Marchant, Hayden [ICG-IT] Cc: user@flink.apache.org Subject: Re: Latest version of Kafka Hi, Flink as for now provides

Re: How to handle multiple yarn sessions and choose at runtime the one to submit a ha streaming job ?

2018-02-07 Thread Chesnay Schepler
For future reference, the created JIRA: https://issues.apache.org/jira/browse/FLINK-8580 On 07.02.2018 10:48, LINZ, Arnaud wrote: Hi, Without any other solution, I made a shell script that copies the original content of FLINK_CONF_DIR into a temporary directory, modifies flink-conf.yaml to set

Re: Flink CEP with files and no streams?

2018-02-07 Thread Fabian Hueske
Hi, I'm not aware of a good example but I can give you some pointers. - Implement the SourceFunction interface. This function will not be executed in parallel, so you don't have to worry about parallelism. - Since you said you want to run it as a batch job, you might not need to implement

RE: kafka as recovery only source

2018-02-07 Thread Sofer, Tovi
Hi Fabian, Thank you for the suggestion. We will consider it. We would be glad to hear other ideas on how to handle such a requirement. Thanks again, Tovi From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: Wednesday, February 07, 2018 11:47 To: Sofer, Tovi [ICG-IT] Cc:

Re: Classloader leak with Kafka Mbeans / JMX Reporter

2018-02-07 Thread Chesnay Schepler
This could be a bug in Kafka's JmxReporter class: https://issues.apache.org/jira/browse/KAFKA-6307 On 07.02.2018 13:37, Edward wrote: We are using FlinkKafkaConsumer011 and FlinkKafkaProducer011, but we also experienced the same behavior with FlinkKafkaConsumer010 and FlinkKafkaProducer010.

RE: Flink CEP with files and no streams?

2018-02-07 Thread Esa Heikkinen
Hi, Thanks for the reply. Because I am a newbie with Flink, do you have any good Scala code examples for this? Esa From: Fabian Hueske [mailto:fhue...@gmail.com] Sent: Wednesday, February 7, 2018 11:21 AM To: Esa Heikkinen Cc: user@flink.apache.org Subject:

Re: Classloader leak with Kafka Mbeans / JMX Reporter

2018-02-07 Thread Edward
We are using FlinkKafkaConsumer011 and FlinkKafkaProducer011, but we also experienced the same behavior with FlinkKafkaConsumer010 and FlinkKafkaProducer010.

Re: Rebalance to subtasks in same TaskManager instance

2018-02-07 Thread johannes.barn...@clarivate.com
Hi Piotrek, Yes, I've compared rebalance with rescale. I adjusted the parallelism of the source and target operators so that rescale would behave more or less like the "local or shuffle grouping" option. I was able to show that for my use case a "local or shuffle grouping" option would yield at

Re: Kafka and parallelism

2018-02-07 Thread Christophe Jolif
Ok thanks! I should have seen this. Sorry. -- Christophe On Wed, Feb 7, 2018 at 10:27 AM, Tzu-Li (Gordon) Tai wrote: > Hi Christophe, > > Yes, you can achieve writing to different topics per-message using the > `KeyedSerializationSchema` provided to the Kafka producer. >

RE: How to handle multiple yarn sessions and choose at runtime the one to submit a ha streaming job ?

2018-02-07 Thread LINZ, Arnaud
Hi, Without any other solution, I made a shell script that copies the original content of FLINK_CONF_DIR into a temporary directory, modifies flink-conf.yaml to set yarn.properties-file.location, and changes FLINK_CONF_DIR to that temporary directory before executing flink. I am now able to select the container I

Re: kafka as recovery only source

2018-02-07 Thread Fabian Hueske
Hi Tovi, I've been thinking about this idea. It might be possible, but I think you have to implement a custom source for this. I don't think it would work to have the JMSConsumer, KafkaSink, and RecoverySource in separate operators because otherwise it would not be possible to share the Kafka

Re: Triggering a Savepoint

2018-02-07 Thread Fabian Hueske
Hi Gregory, IMO, that would be a viable approach. You have to ensure that all operators (except the sources) have the same UIDs and state types but I guess you don't want to change the application logic and just replace the sources. What might be tricky is to perform the savepoint at the right

Re: Kafka and parallelism

2018-02-07 Thread Tzu-Li (Gordon) Tai
Hi Christophe, Yes, you can achieve writing to different topics per-message using the `KeyedSerializationSchema` provided to the Kafka producer. The schema interface has a `getTargetTopic` method which allows you to override the default target topic for a given record. I agree that the method
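A minimal sketch of the per-record routing described above, not taken from the thread. It assumes a hypothetical record type `RoutedRecord` that carries the topic it was read from, and a hypothetical naming convention in which `origin-X` maps to `target-X`. The interface `KeyedSchemaSketch` is a local stand-in that mirrors the three methods of Flink's `KeyedSerializationSchema` (`serializeKey`, `serializeValue`, `getTargetTopic`) so the example compiles without Flink on the classpath; in a real job the class would implement the Flink interface instead.

```java
import java.nio.charset.StandardCharsets;

// Local stand-in for Flink's KeyedSerializationSchema (same three methods),
// declared here only so the sketch compiles without Flink dependencies.
interface KeyedSchemaSketch<T> {
    byte[] serializeKey(T element);
    byte[] serializeValue(T element);
    String getTargetTopic(T element);
}

// Hypothetical record type: the payload plus the topic it was consumed from.
final class RoutedRecord {
    final String originTopic;
    final String payload;

    RoutedRecord(String originTopic, String payload) {
        this.originTopic = originTopic;
        this.payload = payload;
    }
}

// Chooses the target topic per record, based on where the record came from.
final class OriginRoutingSchema implements KeyedSchemaSketch<RoutedRecord> {
    private static final String ORIGIN_PREFIX = "origin-"; // assumed convention
    private static final String TARGET_PREFIX = "target-"; // assumed convention

    @Override
    public byte[] serializeKey(RoutedRecord record) {
        return null; // this sketch writes records without a key
    }

    @Override
    public byte[] serializeValue(RoutedRecord record) {
        return record.payload.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String getTargetTopic(RoutedRecord record) {
        String origin = record.originTopic;
        if (origin != null && origin.startsWith(ORIGIN_PREFIX)) {
            return TARGET_PREFIX + origin.substring(ORIGIN_PREFIX.length());
        }
        // null means "no override": the producer's default topic is used.
        return null;
    }
}
```

With this schema, a record read from `origin-topic-1` is routed to `target-topic-1`, and any record whose origin does not match the assumed prefix falls back to the default topic configured on the producer.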

Re: Spurious warning in logs about flink-queryable-state-runtime

2018-02-07 Thread Chesnay Schepler
This is expected behavior; we try to load the queryable state classes via reflection as it is an optional feature. I'll open a jira to make it less verbose if the classes cannot be found, in which case the stacktrace isn't particularly interesting anyway. On 05.02.2018 10:18, Fabian Hueske

Re: Flink CEP with files and no streams?

2018-02-07 Thread Fabian Hueske
Hi Esa, you can also read files as a stream. However, you have to be careful in which order you read the files and how you generate watermarks. The easiest approach is to implement a non-parallel source function that reads the files in the right order and generates watermarks. Things become more
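The ordering and watermark logic of such a non-parallel source can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: the line format `<epoch-millis> <message>` is hypothetical, the files are represented as in-memory lists of lines that are already in the right order, and `EventSink` is a local stand-in for the two relevant calls on Flink's `SourceFunction.SourceContext` (`collectWithTimestamp` and `emitWatermark`), with the watermark simplified to a plain `long`.

```java
import java.util.List;

// Local stand-in for the parts of SourceFunction.SourceContext used here;
// declared only so the sketch compiles without Flink dependencies.
interface EventSink {
    void collectWithTimestamp(String line, long timestampMillis);
    void emitWatermark(long timestampMillis);
}

final class OrderedLogReplay {

    // Hypothetical log format: "<epoch-millis> <message>".
    static long parseTimestamp(String line) {
        int space = line.indexOf(' ');
        return Long.parseLong(space < 0 ? line : line.substring(0, space));
    }

    // Replays the files one after another, in the order given by the caller.
    // A single sequential loop is what "non-parallel" amounts to here.
    static void replay(List<List<String>> filesInOrder, EventSink sink) {
        long maxSeen = Long.MIN_VALUE;
        for (List<String> file : filesInOrder) {
            for (String line : file) {
                long ts = parseTimestamp(line);
                sink.collectWithTimestamp(line, ts);
                if (ts > maxSeen) {
                    maxSeen = ts;
                    // Watermark is one below the highest timestamp seen, so
                    // further lines with the same timestamp are not late.
                    sink.emitWatermark(maxSeen - 1);
                }
            }
        }
        // Signals that no further events follow.
        sink.emitWatermark(Long.MAX_VALUE);
    }
}
```

In an actual Flink job the loop body would live in the `run` method of a `SourceFunction`, and the lines would be read from the log files rather than from lists.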

Re: Question about flink checkpoint

2018-02-07 Thread Fabian Hueske
Hi Chengzhi Zhao, I think this is rather an issue with the ContinuousFileReaderOperator than with the checkpointing algorithm in general. A source can decide which information to store as state and also how to handle failures such as file paths that have been put into state but have been removed

Re: Kafka and parallelism

2018-02-07 Thread Christophe Jolif
Hi Gordon, or anyone else reading this, Still on the idea of consuming a Kafka topic pattern: I then want to sink the result of the processing into a set of topics depending on where the original message came from (i.e. if this comes from origin-topic-1 I will serialize the result in

Flink CEP with files and no streams?

2018-02-07 Thread Esa Heikkinen
Hello, I am trying to use Flink CEP on log files (as a batch job) rather than on streams (in real time). Is that possible? If yes, do you know of any Scala code examples for that? Or should I convert the log files (with timestamps) into streams? But how are timestamps handled in Flink? If I can