Re: Logging Kafka during exceptions

2018-11-21 Thread Scott Sue
Yeah, I think that would work for incorrect data consumed, but not if deserialization passes correctly and one of my custom functions post-deserialization generates an error? Regards, Scott SCOTT SUE CHIEF TECHNOLOGY OFFICER Support Line : +44(0) 2031 371 603 Mobile : +852 9611 3969

Re: Metric on JobManager

2018-11-21 Thread bastien dine
Hi Jamie, thanks for your response. Erm, this will not be easy. Any idea on how to deal with the end time? I can have some runtime exception in my topology, so I would like to do it like: try { // Start time here env.execute() } catch (e: Exception) { } finally { // End time here }
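The try/finally pattern sketched above can be made concrete in plain Java. This is a minimal sketch with a placeholder runJob() standing in for env.execute(); it only shows that the finally block records the end time even when the job throws.

```java
public class JobTiming {
    static long elapsedMs = -1;

    // Placeholder for env.execute(); the real call can throw at runtime.
    static void runJob(boolean fail) throws Exception {
        Thread.sleep(5);
        if (fail) throw new RuntimeException("job failed");
    }

    static void timedRun(boolean fail) {
        long start = System.currentTimeMillis();           // start time here
        try {
            runJob(fail);
        } catch (Exception e) {
            // the job failed; the finally block still records the end time
        } finally {
            elapsedMs = System.currentTimeMillis() - start; // end time here
        }
    }

    public static void main(String[] args) {
        timedRun(true);
        System.out.println("elapsed ms even on failure: " + elapsedMs);
    }
}
```

Reporting the timestamps to an external system from here is straightforward, since this code runs on the client, not inside an operator.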

Re: Logging Kafka during exceptions

2018-11-21 Thread miki haiat
If so, then you can implement your own deserializer [1] with custom logic and error handling. 1. https://ci.apache.org/projects/flink/flink-docs-stable/api/java/org/apache/flink/streaming/util/serialization/KeyedDeserializationSchemaWrapper.html On Thu, Nov 22, 2018 at 8:57 AM Scott Sue
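A tolerant deserializer can log bad payloads instead of failing the job. The sketch below uses plain Java types rather than Flink's actual DeserializationSchema interface; in the Kafka connector, returning null from deserialize() makes the consumer skip the record.

```java
import java.nio.charset.StandardCharsets;

public class TolerantDeserializer {
    // Simplified stand-in for a deserialize() method: parse the raw bytes,
    // log and return null on failure instead of throwing.
    static Integer deserialize(byte[] message) {
        String raw = new String(message, StandardCharsets.UTF_8);
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            // log the bad payload here instead of failing the job
            System.err.println("skipping malformed record: " + raw);
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(deserialize("42".getBytes(StandardCharsets.UTF_8)));
        System.out.println(deserialize("oops".getBytes(StandardCharsets.UTF_8)));
    }
}
```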

Re: Logging Kafka during exceptions

2018-11-21 Thread Scott Sue
Hi Paul, Yes correct, Flink shouldn't make any assumptions about what is inside the user function. That is true, the exception may not be a direct result of unexpected data, but the incoming data coupled with the state of the job is causing unexpected behaviour. From my perspective, I

Re: Logging Kafka during exceptions

2018-11-21 Thread Paul Lam
Hi Scott, IMHO, the exception is caused by the user codes so it should be handled by the user function, and Flink runtime shouldn’t make any assumption about what’s happening in the user function. The exception may or may not be caused by unexpected data, so logging the current processing

Flink restart strategy on specific exception

2018-11-21 Thread Ali, Kasif
Hello, Looking at the existing restart strategies, they are kind of generic. We have a requirement to restart the job only in case of specific exceptions/issues. What would be the best way to have a restart strategy which is based on a few rules, like looking at a particular type of exception or some
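One way to approximate an exception-specific restart policy is to inspect the failure's cause chain before deciding. This is a hypothetical policy helper, not a Flink API; IOException is just an example of a "recoverable" type.

```java
public class RestartPolicy {
    // Decide whether to restart by walking the cause chain, looking for
    // an exception type we consider transient.
    static boolean shouldRestart(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof java.io.IOException) {
                return true;   // transient I/O problems: restart
            }
        }
        return false;          // anything else: fail the job permanently
    }

    public static void main(String[] args) {
        Exception wrapped = new RuntimeException(new java.io.IOException("broken pipe"));
        System.out.println(shouldRestart(wrapped));
        System.out.println(shouldRestart(new IllegalStateException()));
    }
}
```

Walking the cause chain matters because Flink typically wraps user-code failures in runtime exceptions before they surface.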

OutOfMemoryError while doing join operation in flink

2018-11-21 Thread Akshay Mendole
Hi, We are converting one of our Pig pipelines to Flink using Apache Beam. The Pig pipeline reads two different data sets (R1 & R2) from HDFS, enriches them, joins them and dumps back to HDFS. The data set R1 is skewed. In a sense, it has a few keys with a lot of records. When we converted the
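A common mitigation for this kind of key skew is salting: spread each hot key over N subkeys on the skewed side, and replicate each record on the other side to all N subkeys so the join still matches. A minimal sketch in plain Java (not Beam/Flink API; FANOUT is an assumed constant):

```java
import java.util.concurrent.ThreadLocalRandom;

public class KeySalting {
    static final int FANOUT = 4;

    // Skewed side (R1): spread one hot key over FANOUT subkeys.
    static String saltKey(String key) {
        return key + "#" + ThreadLocalRandom.current().nextInt(FANOUT);
    }

    // Other side (R2): replicate each record to every subkey
    // so every salted R1 record finds its join partner.
    static String[] replicateKey(String key) {
        String[] out = new String[FANOUT];
        for (int i = 0; i < FANOUT; i++) out[i] = key + "#" + i;
        return out;
    }

    public static void main(String[] args) {
        System.out.println(saltKey("hotKey"));
        System.out.println(String.join(",", replicateKey("hotKey")));
    }
}
```

The trade-off is FANOUT-times replication of the smaller side against a FANOUT-way split of the hot key's records.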

Re: Logging Kafka during exceptions

2018-11-21 Thread Scott Sue
Hi Paul, Thanks for the quick reply. OK, does that mean that, as a general practice, I should be catching all exceptions for the purpose of logging in any of my operators? This seems like something that could be handled by Flink itself as they are unexpected exceptions. Otherwise every single

Re: Logging Kafka during exceptions

2018-11-21 Thread Paul Lam
Hi Scott, I think you can do it by catching the exception in the user function and logging the current message that the operator is processing before re-throwing (or not) the exception to the Flink runtime. Best, Paul Lam > On 2018-11-22 at 12:59, Scott Sue wrote: > > Hi all, > > When I'm running my
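The catch-log-rethrow pattern described here looks roughly like this. The sketch uses a plain static method rather than Flink's MapFunction so it stays self-contained; the point is only that the offending record is logged before the original exception propagates.

```java
public class LoggingMap {
    // Simplified user function: parse a record, logging the exact input
    // on failure before rethrowing so the runtime's failure handling
    // still sees the original exception.
    static int parse(String record) {
        try {
            return Integer.parseInt(record.trim());
        } catch (RuntimeException e) {
            System.err.println("failed on record: " + record); // capture the offending message
            throw e; // rethrow so the job still fails as before
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(" 7 "));
        try {
            parse("bad-data");
        } catch (NumberFormatException expected) {
            System.out.println("rethrown as expected");
        }
    }
}
```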

Logging Kafka during exceptions

2018-11-21 Thread Scott Sue
Hi all, When I'm running my jobs I am consuming data from Kafka to process in my job. Unfortunately my job receives unexpected data from time to time which I'm trying to find the root cause of the issue. Ideally, I want to be able to have a way to know when the job has failed due to an

Re: your advice please regarding state

2018-11-21 Thread Avi Levi
Thanks a lot! got it :) On Wed, Nov 21, 2018 at 11:40 PM Jamie Grier wrote: > Hi Avi, > > The typical approach would be as you've described in #1. #2 is not > necessary -- #1 is already doing basically exactly that. > > -Jamie > > > On Wed, Nov 21, 2018 at 3:36 AM Avi Levi wrote: > >> Hi ,

Re: TaskManager & task slots

2018-11-21 Thread yinhua.dai
OK, thanks. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink stream with RabbitMQ source: Set max "request" message amount

2018-11-21 Thread vino yang
Hi Marke, AFAIK, you can set *basic.qos* to limit the consumption rate, please read this answer.[1] I am not sure if Flink RabbitMQ connector lets you set this property. You can check it. Thanks, vino. [1]:

Re: Reset kafka offets to latest on restart

2018-11-21 Thread Rong Rong
Hi Vishal, You can probably try using similar offset configuration as a service consumer. Maybe this will be useful to look at [1] Thanks, Rong [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-consumers-start-position-configuration On Wed, Nov 21, 2018

Re: backpressure metrics

2018-11-21 Thread Steven Wu
Nagarjun, thanks a lot for the reply, which makes sense to me. Yes, we are running with AT_LEAST_ONCE mode. On Wed, Nov 21, 2018 at 3:19 PM Nagarjun Guraja wrote: > Hi Steven, > > The metric 'Buffered During Alignment' you are talking about will always > be zero when the job is run in

Re: backpressure metrics

2018-11-21 Thread Nagarjun Guraja
Hi Steven, The metric 'Buffered During Alignment' you are talking about will always be zero when the job is run in ATLEAST_ONCE mode. Is that the case with your job? My understanding is, backpressure can only be monitored by sampling thread stacktraces and interpreting the situation based on the

backpressure metrics

2018-11-21 Thread Steven Wu
Flink has two backpressure-related metrics: “lastCheckpointAlignmentBuffered” and “checkpointAlignmentTime”. But they seem to always report zero. Similar thing in the web UI: “Buffered During Alignment” always shows zero, even though backpressure testing shows high backpressure for some operators. Has

Re: Metric on JobManager

2018-11-21 Thread Jamie Grier
What you're describing is not possible. There is no runtime context or metrics you can use at that point. The best you can probably do (at least for start time) is just keep a flag in your function and log a metric once and only once when it first starts executing. On Wed, Nov 21, 2018 at 5:18

Re: your advice please regarding state

2018-11-21 Thread Jamie Grier
Hi Avi, The typical approach would be as you've described in #1. #2 is not necessary -- #1 is already doing basically exactly that. -Jamie On Wed, Nov 21, 2018 at 3:36 AM Avi Levi wrote: > Hi , > I am very new to flink so please be gentle :) > > *The challenge:* > I have a road sensor that

Re: Assign IDs to Operators

2018-11-21 Thread Jamie Grier
Hi Chang, The partitioning steps, like keyBy() are not operators. In general you can let Flink's fluent-style API tell you the answer. If you can call .uid() in the API and it compiles then the thing just before that is an operator ;) -Jamie On Wed, Nov 21, 2018 at 5:59 AM Chang Liu wrote:

Re: Reset kafka offets to latest on restart

2018-11-21 Thread Jamie Grier
Hi Vishal, No, there is no way to do this currently. On Wed, Nov 21, 2018 at 10:22 AM Vishal Santoshi wrote: > Any one ? > > On Tue, Nov 20, 2018 at 12:48 PM Vishal Santoshi < > vishal.santo...@gmail.com> wrote: > >> Is it possible to have checkpointing but reset the kafka offsets to >>

Re: Reset kafka offets to latest on restart

2018-11-21 Thread Vishal Santoshi
Any one ? On Tue, Nov 20, 2018 at 12:48 PM Vishal Santoshi wrote: > Is it possible to have checkpointing but reset the kafka offsets to > latest on restart on failure ? >

Re: IntervalJoin is stuck in RocksDB's seek for too long in flink-1.6.2

2018-11-21 Thread Xingcan Cui
Hi Jiangang, The IntervalJoin is actually the DataStream-level implementation of the SQL time-windowed join[1]. To ensure the completeness of the join results, we have to cache all the records (from both sides) in the most recent time interval. That may lead to state backend problems when

Assign IDs to Operators

2018-11-21 Thread Chang Liu
Dear All, As stated here (https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html ), it is highly recommended to assign IDs to Operators, especially for the stateful ones. My question

Metric on JobManager

2018-11-21 Thread bastien dine
Hello all, I am using metrics to count some stuff in my topology; this is pretty easy with the metric API via getRuntimeContext in a Rich function. However I would like to use this metric API to log the start date & end date of my processing, but in the source code executed on the job manager (i.e. not

Re: Can JDBCSinkFunction support exactly once?

2018-11-21 Thread Jocean shi
Hi, Thanks for the help. I find that many sinks don't support streaming exactly-once. I need an exactly-once HBase sink and MySQL sink in my work. I'll try to contribute to the HBase sink first. Best Regards, Jocean On Wed, 21 Nov 2018 at 18:34, Fabian Hueske wrote: > Hi, > > JDBCSinkFunction is a simple wrapper

your advice please regarding state

2018-11-21 Thread Avi Levi
Hi, I am very new to Flink so please be gentle :) *The challenge:* I have a road sensor that should scan billions of cars per day. For starters I want to recognise if each car that passes by is new or not. New cars (never been seen before by that sensor) will be placed on a different topic on
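The new-vs-seen decision reduces to a set-membership check. In Flink this state would live in fault-tolerant keyed state (e.g. ValueState scoped by car id); the plain HashSet below only illustrates the logic, not the state backend.

```java
import java.util.HashSet;
import java.util.Set;

public class NewCarDetector {
    // In a real job this would be keyed state, partitioned by car id
    // and checkpointed; a local set stands in for it here.
    private final Set<String> seen = new HashSet<>();

    // Returns true the first time a car id is observed, false afterwards.
    boolean isNew(String carId) {
        return seen.add(carId);
    }

    public static void main(String[] args) {
        NewCarDetector d = new NewCarDetector();
        System.out.println(d.isNew("AB-123")); // first sighting: "new" topic
        System.out.println(d.isNew("AB-123")); // repeat sighting: "seen" topic
    }
}
```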

Re: Flink streaming automatic scaling (1.6.1)

2018-11-21 Thread Fabian Hueske
Hi, Flink 1.6 does not support automatic scaling. However, there is a REST call to trigger the rescaling of a job. You need to call it manually though. Have a look at the */jobs/:jobid/rescaling *call in the REST docs [1]. Best, Fabian [1]
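Assuming a job manager at localhost:8081 and a made-up job id, the rescaling call can be issued roughly like this (the PATCH verb and parallelism query parameter follow the 1.6 REST API; verify against the docs linked above):

```shell
# Hypothetical job manager address and job id.
JM="http://localhost:8081"
JOB_ID="ca94a64bcf7e38e20b289d25a1b36ea4"
NEW_PARALLELISM=8

# The rescaling trigger is asynchronous; it returns a trigger id to poll.
URL="${JM}/jobs/${JOB_ID}/rescaling?parallelism=${NEW_PARALLELISM}"
echo "PATCH ${URL}"
# curl -X PATCH "${URL}"
```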

Re: Store Predicate or any lambda in MapState

2018-11-21 Thread Jayant Ameta
Here are the error logs. First error log was encountered when getting the values from the MapState. java.lang.ClassNotFoundException: com.test.MatcherFactory$$Lambda$879.1452224137 at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at

Re: Can JDBCSinkFunction support exactly once?

2018-11-21 Thread Fabian Hueske
Hi, JDBCSinkFunction is a simple wrapper around the JDBCOutputFormat (the DataSet / Batch API output interface). Dominik is right, that JDBCSinkFunction does not support exactly-once output. It is not strictly required that an exactly-once sink implements TwoPhaseCommitFunction. TPCF is a

Re: About the issue caused by flink's dependency jar package submission method

2018-11-21 Thread Oscar Westra van Holthe - Kind
On Wed, 21 Nov 2018 at 05:26, clay wrote: > hi > > I have checked all the dependences, and don't find the jar with different > version, so ,I double the way to submit jar has some issue? my commend is > like this: > Did you also check the runtime dependencies where the code is run? Because

Re: Can JDBCSinkFunction support exactly once?

2018-11-21 Thread Dominik Wosiński
Hey, As far as I know, the function needs to implement the *TwoPhaseCommitFunction* and not the *CheckpointListener*. *JDBCSinkFunction* does not implement the two-phase commit, so currently it does not support exactly once. Best Regards, Dom. On Wed, 21 Nov 2018 at 11:07, Jocean shi wrote: >

Re: BucketingSink vs StreamingFileSink

2018-11-21 Thread Edward Alexander Rojas Clavijo
Thank you very much for the information, Andrey. I'll try on my side to do the migration of what we have now and try to add the sink with Parquet, and I'll be back to you if I have more questions :) Edward On Fri, 16 Nov 2018 at 19:54, Andrey Zagrebin (< and...@data-artisans.com>) wrote:

Flink streaming automatic scaling (1.6.1)

2018-11-21 Thread Marke Builder
Hi, I tried to find something about "Flink automatic scaling": is it available and how does it work? Is there any documentation or other resources? And especially, how does it work with YARN? We are using Flink 1.6.1 with YARN; are the parameters (-yn, -ys) still relevant? Thanks for your

Can JDBCSinkFunction support exactly once?

2018-11-21 Thread Jocean shi
Hi, Can JDBCSinkFunction support exactly once? Does the fact that JDBCSinkFunction doesn't implement CheckpointListener mean that it doesn't support exactly once? cheers Jocean

Re: TaskManager & task slots

2018-11-21 Thread Fabian Hueske
Yes, this hasn't changed. Best, Fabian On Wed, 21 Nov 2018 at 08:18, yinhua.dai < yinhua.2...@outlook.com> wrote: > Hi Fabian, > > Is the below description still the same in Flink 1.6? > > Slots do not guard CPU time, IO, or JVM memory. At the moment they only > isolate managed memory

Re: Store Predicate or any lambda in MapState

2018-11-21 Thread Dominik Wosiński
Hey Jayant, I don't really think that the sole fact of using Predicate should cause the *ClassNotFoundException* that you are talking about. The exception may come from the fact that some libraries are missing from your cluster environment. Have you tried running the job locally to verify that

Re: Exception occurred while processing valve output watermark & NullPointerException

2018-11-21 Thread Dawid Wysakowicz
Hi, I think vino is right. It seems that the NullPointerException comes from your condition. Please add handling of the situation when the string that you are comparing is null. Best, Dawid On 21/11/2018 04:32, vino yang wrote: > Hi Steve, > > It seems the NPE caused by the property of the
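The null handling suggested here can be written with Objects.equals, or by calling equals on the constant side, so a null input yields false instead of a NullPointerException:

```java
import java.util.Objects;

public class NullSafeCompare {
    // value.equals("expected") throws NPE when value is null;
    // Objects.equals (or "expected".equals(value)) handles null safely.
    static boolean matches(String value) {
        return Objects.equals(value, "expected");
    }

    public static void main(String[] args) {
        System.out.println(matches("expected")); // true
        System.out.println(matches(null));       // false, no NPE
    }
}
```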

Flink stream with RabbitMQ source: Set max "request" message amount

2018-11-21 Thread Marke Builder
Hi, we are using a RabbitMQ queue as the streaming source. Sometimes (if the queue contains a lot of messages) we get the following error: ERROR org.apache.hadoop.yarn.client.api.impl.NMClientImpl - Failed to stop Container container_1541828054499_0284_01_04 when stopping NMClientImpl and

Re: About the issue caused by flink's dependency jar package submission method

2018-11-21 Thread clay4444
hi yinhua, I considered that way, but I don't think it is suitable, because I want each Flink job to have its own business logic and dependency jar, separate from other jobs. That's what I want to do. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Kafka: Unable to retrieve any partitions with KafkaTopicsDescriptor

2018-11-21 Thread Abhijeet Kumar
Hello Team, I have written a streaming job that takes data from Kafka. I've noticed that for some Kafka topics things are working fine but, with a few topics, when I'm trying to get the data it throws an error: Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: