Guide on writing Flink plugins

2020-10-05 Thread Kien Truong
Hi all, We want to write a Flink plugins to integrate Flink jobs with our in-house monitoring system. Are there any guide or tutorial that we can follow to write a Flink plugins ? The official documents are a bit bare bone. Regards, Kien

Flink on YARN: delegation token expired prevent job restart

2020-11-16 Thread Kien Truong
Hi all, We are having an issue where Flink Application Master is unable to automatically restart Flink job after its delegation token has expired. We are using Flink 1.11 with YARN 3.1.1 in single job per yarn-cluster mode. We have also add valid keytab configuration and taskmanagers are able to

Re: Flink on YARN: delegation token expired prevent job restart

2020-11-17 Thread Kien Truong
nfig the "security.kerberos.login.principal" and the "security.kerberos.login.keytab" together? If you only set the keytab, it will not take effect. Best, Yangze Guo On Tue, Nov 17, 2020 at 3:03 PM Kien Truong wrote: > > Hi all, > > We are having an issue where Flink Application Master is unab

Re: Flink on YARN: delegation token expired prevent job restart

2020-11-17 Thread Kien Truong
> AFAIK, Flink does exclude the HDFS_DELEGATION_TOKEN in the > HadoopModule when user provides the keytab and principal. I'll try to > do a deeper investigation to figure out is there any HDFS access > before the HadoopModule installed. > > Best, > Yangze Guo > > &

Re: How to get past "bad" Kafka message, restart, keep state

2018-06-20 Thread Kien Truong
Hi, You can use FlatMap instead of Map, and only collect valid elements. Regards, Kien On 6/20/2018 7:57 AM, chrisr123 wrote: First time I'm trying to get this to work so bear with me. I'm trying to learn checkpointing with Kafka and handling "bad" messages, restarting without losing state.

Re: How to clear keyed states periodically?

2018-09-14 Thread Kien Truong
Hi Paul, We have actually done something like this. Clearing a state with rocksdb state backend can actually be a very expensive operation, and block the operators for minutes with large states. To mitigate that, there are 2 approaches that we are using 1. Keeping the state small by increasing t

Flink does not checkpoint if operator is in Finished state

2018-10-15 Thread Kien Truong
Hi, As mentioned in the title, my Flink job will not check point if there are any finished source operator. Is this a bug or is it working as intended ? Regards, Kien

Re: Error with custom `SourceFunction` wrapping `InputFormatSourceFunction` (Flink 1.6)

2018-10-24 Thread Kien Truong
Hi, Since InputFormatSourceFunction is a subclass of RichParallelSourceFunction, your wrapper should also extend this class. In addition, remember to overwrite the methods defined in the AbstractRichFunction interface and proxy the call to the underlying InputFormatSourceFunction, in order

Re: Checkpoint acknowledge takes too long

2018-10-24 Thread Kien Truong
Hi, In my experience, this is most likely due to one sub-task is blocked doing some long-running operation. Try to run the task manager with some profiler (like VisualVM) and check for hot spot. Regards, Kien On 10/24/2018 4:02 PM, 徐涛 wrote: Hi I am running a flink application w

Re: Size of Checkpoints increasing with time

2018-10-24 Thread Kien Truong
Hi, Do you use incremental checkpoint ? RocksDB is an append-only DB, so you will experience the steady increase in state size until a compaction occurs and old values of keys are garbage-collected. However, the average state size should stabilize after a while, if the load doesn't change.

Re: Flink Task Allocation on Nodes

2018-10-24 Thread Kien Truong
Hi, How are your task managers deploy ? If you cluster only have one task manager with one slot in each node, then the job should be spread evenly. Regards, Kien On 10/24/2018 4:35 PM, Sayat Satybaldiyev wrote: Is there any way to indicate flink not to allocate all parallel tasks on one no

Re: Flink Task Allocation on Nodes

2018-10-24 Thread Kien Truong
case it won't have hardware resource for other jobs. On Wed, Oct 24, 2018 at 2:20 PM Kien Truong wrote: > Hi, > > How are your task managers deploy ? > > If you cluster only have one task manager with one slot in each node, > then the job should be spread evenly. > >

Re: Flink Task Allocation on Nodes

2018-10-26 Thread Kien Truong
o allocate for each job a separate cluster? On Wed, Oct 24, 2018 at 3:23 PM Kien Truong <mailto:duckientru...@gmail.com>> wrote: Hi, You can have multiple Flink clusters on the same set of physical machines. In our experience, it's best to deploy a separate Flink

Re: When can the savepoint directory be deleted?

2019-01-23 Thread Kien Truong
Hi, As of Flink 1.7, the savepoint should not be deleted until after the first checkpoint has been successfully taken. https://ci.apache.org/projects/flink/flink-docs-release-1.7/release-notes/flink-1.7.html#savepoints-being-used-for-recovery Regards, Kien On 1/23/2019 6:57 PM, Ben Yan w

Re: How to trigger a Global Window with a different Message from the window message

2019-01-23 Thread Kien Truong
Hi Oliver, Try replacing Global Window with a KeyedProcessFunction. Store all the item received between CalcStart and CalcEnd inside a ListState the process them when CalcEnd is received. Regards, Kien On 1/17/2019 1:06 AM, Oliver Buckley-Salmon wrote: Hi, I have a Flink job where I rec

Re: [Flink 1.6] How to get current total number of processed events

2019-01-23 Thread Kien Truong
Hi Nhan, Logically, the total number of processed events before an event cannot be accurately calculated unless events processing are synchronized. This is not scalable, so naturally I don't think Flink supports it. Although, I suppose you can get an approximate count by using a non-keyed Tu

Re: [Flink 1.6] How to get current total number of processed events

2019-01-24 Thread Kien Truong
, Thanh-Nhan Vo wrote: Hi Kien Truong, Thank you for your answer. I have another question, please ! If I count the number of messages processed for a given key j (denoted c_j), is there a way to retrieve max{c_j}, min{c_j}? Thanks *De :*Kien Truong [mailto:duckientru...@gmail.com] *Envoyé

Re: [Flink 1.6] How to get current total number of processed events

2019-01-25 Thread Kien Truong
7, v7), I expect to obtain: oMaximum number of processed messages: 3 (corresponding to key k1) oMinimum number of processed messages: 1 (corresponding to keys 4 and 7) Do you have any idea to obtain this, please? Thank you so much ! Nhan *De :* Kien Truong [mailto:duckientru...@gmail.com] *E

Watermark on connected stream

2017-10-19 Thread Kien Truong
Hi, If I connect two stream with different watermark, how are the watermark of the resulting stream determined ? Best regards, Kien

Re: Local combiner on each mapper in Flink

2017-10-26 Thread Kien Truong
Hi, For batch API, you can use GroupReduceFunction, which give you the same benefit as a MapReduce combiner. https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/batch/dataset_transformations.html#combinable-groupreducefunctions Regards, Kien On 10/26/2017 7:37 PM, Le Xu wrote:

Re: Local combiner on each mapper in Flink

2017-10-26 Thread Kien Truong
Hi, For Streaming API, use a ProcessFunction as Fabian's suggestion. You can pretty much do anything with a ProcessFunction :) Best regards, Kien On 10/26/2017 8:01 PM, Le Xu wrote: Hi Kien: Is there a similar API for DataStream as well? Thanks! Le On Oct 26, 2017, at 7:58 AM,

Re: Flink memory usage

2017-11-04 Thread Kien Truong
Hi, How did you measure the memory usage ? JVM processes tend to occupy the maximum memory allocated to them, regardless of whether those memory are actively in used or not. To correctly measure the memory usage, you should use Flink's metric system[1] Regards, Kien [1] https://ci.apache.

Re: Apache Flink - Question about thread safety for stateful collections (MapState)

2017-11-11 Thread Kien Truong
Hi Mans, They're not executed in the same thread, but the methods that called them are synchronized[1] and therefore thread-safe. Best regards, Kien [1] https://github.com/apache/flink/blob/1cd3ba3f2af454bc33f2c880163c014d1738/flink-streaming-java/src/main/java/org/apache/flink/streamin

Re: Apache Flink - Question about thread safety for stateful collections (MapState)

2017-11-12 Thread Kien Truong
any other recommendation. Thanks again. On Sunday, November 12, 2017 1:16 AM, Jörn Franke wrote: Be careful though with racing conditions . On 12. Nov 2017, at 02:47, Kien Truong <mailto:duckientru...@gmail.com>> wrote: Hi Mans, They're not executed in the same thread, but

Re: Flink drops messages?

2017-11-13 Thread Kien Truong
Getting late elements from side-output is already available with Flink 1.3 :) Regards, Kien On 11/13/2017 5:00 PM, Fabian Hueske wrote: Hi Andrea, you are right. Flink's window operators can drop messages which are too late, i.e., have a timestamp smaller than the last watermark. This is e

Re: [Flink] merge-sort for a DataStream

2017-11-16 Thread Kien Truong
Hi Jiewen, Since a DataStream can have infinite number of elements, you can't globally sorted all the elements. If the number of element is finite, you can use the DataSet API, which will look smth like this DataSet> a; DataSet aFlatten = a.flatMap(..); DataSet aSorted = aFla

Re: Kafka consumer to sync topics by event time?

2017-11-22 Thread Kien Truong
Hi, When you join multiple stream with different watermarks, the resulting stream's watermark will be the smallest of the input watermark, as long as you don't explicitly assign a new watermarks generator. In your example, if small_topic has watermark at time t1, big_topic has watermark at

Bad entry in block exception with RocksDB

2017-11-22 Thread Kien Truong
Hi, We are seeing this exception in one of our job, whenever a check point or save point is performed. java.lang.RuntimeException: Error while adding data to RocksDB at org.apache.flink.contrib.streaming.state.RocksDBListState.add(RocksDBListState.java:119) at org.apache.flink.runtime.state.Us

Non-intrusive way to detect which type is using kryo ?

2017-11-27 Thread Kien Truong
Hi, Are there any way to only log when Kryo serializer is used? It's a pain to disable generic type then try to solve the exception one by one. Best regards, Kien

Re: Non-intrusive way to detect which type is using kryo ?

2017-12-01 Thread Kien Truong
stem that will make it easier to tell if a type is a POJO or not. I have some utility in mind like `ensurePojo(MyType.class)` that would throw an exception with a reason why this type must be treated as a generic type. Would this help in your case? Regards, Timo Am 11/28/17 um 2:40 AM schrieb K

Re: [Docs] Can't add metrics to RichFilterFunction

2017-12-14 Thread Kien Truong
That syntax is incorrect, should be. @transient private var counter:Counter = _ Regards, Kien On 12/14/2017 8:03 PM, Julio Biason wrote: @transient private var counter:Counter

Service discovery for flink-metrics-prometheus

2017-12-14 Thread Kien Truong
Hi, Does anyone have recommendations about integrating flink-metrics-prometheus with some SD mechanism so that Prometheus can pick up the Task Manager's location dynamically ? Best regards, Kien

NullPointerException with Avro Serializer

2017-12-19 Thread Kien Truong
Hi, After upgrading to Flink 1.4, we encounter this exception Caused by: java.lang.NullPointerException: in com.viettel.big4g.avro.LteSession in long null of long in field tmsi of com.viettel.big4g.avro.LteSession at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:161) a

Re: NullPointerException with Avro Serializer

2017-12-20 Thread Kien Truong
y the default doesn't work for us. Best regards, Kien ⁣Sent from TypeApp ​ On Dec 20, 2017, 14:09, at 14:09, Kien Truong wrote: >Hi, > >After upgrading to Flink 1.4, we encounter this exception > >Caused by: java.lang.NullPointerException: in >com.viettel.big4g.avro.LteSes

Re: NullPointerException with Avro Serializer

2017-12-20 Thread Kien Truong
It turn out that our flink branch is out-of-date. Sorry for all the noise. :) Regards, Kien ⁣Sent from TypeApp ​ On Dec 20, 2017, 16:42, at 16:42, Kien Truong wrote: >Upon further investigation, we found out that the reason: > >* The cluster was started on YARN with the hadoop classpa

Re: Service discovery for flink-metrics-prometheus

2018-01-05 Thread Kien Truong
gt;wrote: > >> I'm not aware of how this is typically done but maybe Chesnay (cc'ed) >has >> an idea. >> >> > On 14. Dec 2017, at 16:55, Kien Truong >wrote: >> > >> > Hi, >> > >> > Does anyone have recommendations ab

Re: Flink on YARN

2018-01-20 Thread Kien Truong
Hi, You only need to install Flink on the node where you want to perform job submission. Regards, Kien On 1/20/2018 3:23 PM, Soheil Pourbafrani wrote: Hi, I have a YARN cluster(containing no Flink installation) that I want to run Flink application on that. I was wondering if it is needed

Re: Task Manager detached under load

2018-01-20 Thread Kien Truong
Hi, You should enable and check your garbage collection log. We've encountered case where Task Manager disassociated due to long GC pause. Regards, Kien On 1/20/2018 1:27 AM, ashish pok wrote: Hi All, We have hit some load related issues and was wondering if any one has some suggestions

Re: How does BucketingSink generate a SUCCESS file when a directory is finished

2018-02-01 Thread Kien Truong
Hi, I did not actually test this, but I think with Flink 1.4 you can extend BucketingSink and overwrite the invoke method to access the watermark Pseudo code: invoke(IN value, SinkFunction.Context context) {    long currentWatermark = context.watermark()    long taskIndex = getRuntimeContex

Reduce parallelism without network transfer.

2018-02-02 Thread Kien Truong
Hi, Assuming that I have a streaming job, using 30 task managers with 4 slot each. I want to change the parallelism of 1 operator from 120 to 30. Are there anyway so that each subtask of this operator get data from 4 upstream subtasks running in the same task manager, thus avoiding network comp

Re: RocksDB / checkpoint questions

2018-02-02 Thread Kien Truong
⁣Sent from TypeApp ​ On Feb 3, 2018, 10:48, at 10:48, Kien Truong wrote: >Hi, >Speaking from my experience, if the distributed disk fail, the >checkpoint will fail as well, but the job will continue running. The >checkpoint scheduler will keep running, so the first scheduled >c

Re: Reduce parallelism without network transfer.

2018-02-05 Thread Kien Truong
gt;Piotrek > >> On 3 Feb 2018, at 04:39, Kien Truong wrote: >> >> Hi, >> >> Assuming that I have a streaming job, using 30 task managers with 4 >slot each. I want to change the parallelism of 1 operator from 120 to >30. Are there anyway so that each subtask o

Re: Java heap size in YARN

2018-02-15 Thread Kien Truong
Hi, The relevant settings is: |containerized.heap-cutoff-ratio|: (Default 0.25) Percentage of heap space to remove from containers started by YARN. When a user requests a certain amount of memory for each TaskManager container (for example 4 GB), we can not pass this amount as the maximum hea

Re: Java heap size in YARN

2018-02-15 Thread Kien Truong
: >Thanks Kien. I will at least play with the setting :) We use hadoop >(s3) as >a chekpoint store. In our case off heap memory is around 300MB as >reported >on task manager statistic page. > >15 lut 2018 17:24 "Kien Truong" napisał(a): > >> Hi, >> >>

Cannot used managed keyed state in sink

2018-02-23 Thread Kien Truong
Hi, It seems that I can't used managed keyed state inside sink functions. Is this unsupported with Flink 1.4 or am I doing something wrong ? Regards, Kien ⁣Sent from TypeApp ​

Re: Strange behavior on filter, group and reduce DataSets

2018-03-16 Thread Kien Truong
Hi, Just a guest, but string compare in Java should be using equals method, not == operator. Regards, Kien On 3/16/2018 9:47 PM, simone wrote: /subject.getField("field1") == "";// /

Re: Error running on Hadoop 2.7

2018-03-22 Thread Kien Truong
Hi Ashish, Yeah, we also had this problem before. It can be solved by recompiling Flink with HDP version of Hadoop according to instruction here: https://ci.apache.org/projects/flink/flink-docs-release-1.4/start/building.html#vendor-specific-versions Regards, Kien On 3/22/2018 12:25 AM,

Re: Consumer offsets not visible in Kafka

2018-04-19 Thread Kien Truong
Hi, That tool only shows active consumer-groups that make use of the automatic partitions assignment API. Flink use the manual partitions assignment API, so it will now show up there. The best way to monitor kafka offset with Flink is using Flink's own metrics system. Otherwise, you can

Re: debug for Flink

2018-04-19 Thread Kien Truong
Hi, Our most useful tool when debugging Flink is actually the simple log files, because debugger just slow things down too much for us. However, having to re-deploy the entire cluster to change the logging level is a pain (we use YARN), so we would really like an easier method to change the

Re: Substasks - Uneven allocation

2018-04-19 Thread Kien Truong
Hi Pedro, You can try to call either .rebalance() or|.shuffle()| || |before the Async operator. Shuffle might give a better result if you have fewer tasks than parallelism. Best regards, Kien | On 4/18/2018 11:10 PM, PedroMrChaves wrote: Hello, I have a job that has one async operational

Re: debug for Flink

2018-04-20 Thread Kien Truong
totally logged during the system running? And then they can further be analyzed by category them by level? Best, Stephen On Apr 19, 2018, at 7:19 AM, Kien Truong wrote: Hi, Our most useful tool when debugging Flink is actually the simple log files, because debugger just slow things down

Re: managin order to use epoll (tasker.network.netty.transport: epoll), is it required that linux version is 4.0.16 or newer or not

2018-04-21 Thread Kien Truong
Hi, The number 4.0.16 refer to netty version, not the Linux kernel version Regards, Kien ⁣Sent from TypeApp ​ On Apr 20, 2018, 21:45, at 21:45, Ted Yu wrote: >I think you should upgrade Linux to said version or newer. > >Cheers > >On Fri, Apr 20, 2018 at 6:35 AM, makeyang >wrote: > >> my flin

Re: Submitting Flink application on YARN parameter

2018-04-28 Thread Kien Truong
Hi, You have to enable CPU scheduling in YARN, otherwise it always shows that only 1 CPU is allocated for each container, regardless of how many Flink try to allocated. TaskManager memory is 1400MB, but Flink reserves some amount for for off-heap memory, so the actual heap size is smaller.

Re: Raspberry Pi Memory Configuration

2018-04-28 Thread Kien Truong
Hi Nicholas, Try reducing containerized.heap-cutoff-min from the default 600MB to a lower number. Still, I think RPi is way too under-powered to run any serious Flink apps. Regards, Kien On 4/28/2018 4:38 PM, Nicholas Walton wrote: Hi, Hope this is the right place to ask, but

Re: Multiple hdfs

2018-05-22 Thread Kien Truong
Hi, If your cluster are not high-availability clusters then just use the full path to the cluster. For example, to refer to directory /checkpoint on cluster1, use hdfs://namenode1_ip:port/checkpoint Like wise, /data on cluster2 will be hdfs://namenode2_ip:port/data If your cluster is a HA

Re: Multiple hdfs

2018-05-22 Thread Kien Truong
<mailto:raul.valdoleiros.olive...@gmail.com>> wrote: Hi Kien, Thanks for you reply. Your goal is to store the checkpoints in one hdfs cluster and the data in other hdfs cluster. So the flink should be able to connect to two different hdfs clusters. Thanks 2018-05-22 15:00 GMT+0

Re: Multiple Task Slots support in Flink 1.5

2018-05-31 Thread Kien Truong
Hi, We're using multiple slots per TaskManager with legacy mode, and everything works fine. For the new default mode, it also seems to works for us, so I'm not sure what is not supported. May be someone from Flink team could clarify. Best regards, Kien On 5/31/2018 4:26 AM, Abdul Qadeer

Re: flink kafka consumer lag

2017-07-09 Thread Kien Truong
Hi, You should setup a metric reporter to collect Flink's metrics. https://ci.apache.org/projects/flink/flink-docs-release-1.2/monitoring/metrics.html There's a lot of useful information in the metrics, including the consumer lags. I'm using the Graphite reporter with InfluxDB for storage +

Re: data loss after implementing checkpoint

2017-07-10 Thread Kien Truong
Hi, I think you need to create a savepoint and restore from there. https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/savepoints.html Checkpoint are for automatic recovery within the lifetime of a job, they're deleted when you stop the job manually. Regards, Kien On 7/10/1

High back-pressure after recovering from a save point

2017-07-13 Thread Kien Truong
Hi all, I have one job where back-pressure is significantly higher after resuming from a save point. Because that job makes heavy use of stateful functions with RocksDBStateBackend , I'm suspecting that this is the cause of performance degradation. Does anyone encounter simillar issues or

Re: High back-pressure after recovering from a save point

2017-07-13 Thread Kien Truong
, the backpressure should disappear. > >Best, Fabian > >2017-07-13 15:48 GMT+02:00 Kien Truong : > >> Hi all, >> >> I have one job where back-pressure is significantly higher after >resuming >> from a save point. >> >> Because that job makes

Re: High back-pressure after recovering from a save point

2017-07-14 Thread Kien Truong
ssure situations? > >Thanks, >Stephan > > >On Fri, Jul 14, 2017 at 1:15 AM, Kien Truong >wrote: > >> Hi Fabian, >> This happens to me even when the restore is immediate, so there's not >much >> data in Kafka to catch up (5 minutes max) >> >

Re: High back-pressure after recovering from a save point

2017-07-16 Thread Kien Truong
()` would be it. >> >> On 15 July 2017 at 12:33:53 AM, Stephan Ewen (se...@apache.org) >wrote: >> >> Can you try starting from the savepoint, but telling Kafka to start >from >> the latest offset? >> >> (@gordon: Is that possible in Flink 1.3.1 or only

Re: How can i merge more than one flink stream

2017-07-19 Thread Kien Truong
Hi, To expand on Fabian's answer, there's a few API for join. * connect - you have to provide a CoprocessFunction. * window join/cogroup - you provide  key selector functions, a time window and a join/cogroup function. With the first method, you have to write more code, in exchange for much mo

Accept Avro object in ListCheckpointed interface

2017-07-21 Thread Kien Truong
Hi, ListCheckpointed only accept Serializable object at the moment, which make it cumbersome to checkpoint avro objects (have to convert them to byte arrays first). Is there any plan to support avro object directly? Best regards, Kien

Re: FlinkKafkaConsumer010 - Memory Issue

2017-07-21 Thread Kien Truong
Hi, From the log, it doesn't seem that the task manager use a lot of memory. Can you post the output of top. Regards, Kien On 7/20/2017 1:23 AM, PedroMrChaves wrote: Hello, Whenever I submit a job to Flink that retrieves data from Kafka the memory consumption continuously increases. I've c

Re: Split Streams not working

2017-07-24 Thread Kien Truong
Hi, I think you're hitting this bug https://issues.apache.org/jira/browse/FLINK-5031 Try the workaround mentioned in a bug: add a map function between map and select Regards, Kien On 7/25/2017 3:14 AM, smandrell wrote: Basically, we are not splitting the streams correctly because when we t

Re: Split Streams not working

2017-07-24 Thread Kien Truong
Hi, I meant adding a select function between the two consecutive select. Or if you use Flink 1.3, you can use the new side output functionality. Regards, Kien On 7/25/2017 7:54 AM, Kien Truong wrote: Hi, I think you're hitting this bug https://issues.apache.org/jira/browse/FLINK

Re: Purging Late stream data

2017-07-25 Thread Kien Truong
Hi, One method you can use is using a ProcessFunction. In the process function, you get the timer service through the function context, which can then be used to schedule a task to clean up late data. Check out the docs for ProcessFunction https://ci.apache.org/projects/flink/flink-docs-rel

Re: FlinkKafkaConsumer010 - Memory Issue

2017-07-25 Thread Kien Truong
Hi Pedro, As long as there's no OutOfMemoryError/long garbage collection pause, there's nothing to worry about keeping memory allocated. The memory should be garbage-collected by the JVM when necessary. Regards, Kien On 7/25/2017 10:53 PM, PedroMrChaves wrote: Hello, Thank you for the rep

Re: Memory Leak - Flink / RocksDB ?

2017-07-25 Thread Kien Truong
Hi, What're your task manager memory configuration ? Can you post the TaskManager's log ? Regards, Kien On 7/25/2017 8:41 PM, Shashwat Rastogi wrote: Hi, We have several Flink jobs, all of which reads data from Kafka do some aggregations (over sliding windows of (1d, 1h)) and writes data

Re: Memory Leak - Flink / RocksDB ?

2017-07-27 Thread Kien Truong
l Thank you in advance. Regards Shashwat On 26-Jul-2017, at 12:10 PM, Kien Truong <mailto:duckientru...@gmail.com>> wrote: Hi, What're your task manager memory configuration ? Can you post the TaskManager's log ? Regards, Kien On 7/25/2017 8:41 PM, Shashwat Rastogi

Constant write stall warning with RocksDB state backend

2017-08-02 Thread Kien Truong
but I want to ask if there are any downside to it? Best regards, Kien Truong

Re: Using latency markers

2017-08-10 Thread Kien Truong
Hi, I just want to say we're having the same issues. There's no metric for latency when we attempted to export the metrics through graphite either. Regards, Kien On 8/10/2017 7:36 PM, Aljoscha Krettek wrote: Hi, I must admit that I've never used this but I'll try and look into it. Best

Re: Using latency markers

2017-08-11 Thread Kien Truong
Did you enable that? Best, Aljoscha On 10. Aug 2017, at 18:36, Kien Truong mailto:duckientru...@gmail.com>> wrote: Hi, I just want to say we're having the same issues. There's no metric for latency when we attempted to export the metric

Re: Error during Kafka connection

2017-08-11 Thread Kien Truong
Hi, You mentioned that your kafka broker is behind a proxy. This could be a problem, because when the client try to get the cluster's topology, it will get the brokers ' private addresses , which is not reachable. Regards, Kien On Aug 11, 2017, 18:18, at 18:18, "Tzu-Li (Gordon) Tai" wrote:

Re: Distribute crawling of a URL list using Flink

2017-08-14 Thread Kien Truong
Hi, While this task is quite trivial to do with Flink Dataset API, using readTextFile to read the input and a flatMap function to perform the downloading, it might not be a good idea. The download process is I/O bound, and will block the synchronous flatMap function, so the throughput will

Re: Distribute crawling of a URL list using Flink

2017-08-14 Thread Kien Truong
ack your URL download into the asynchronous part and >collect >the resulting string for further processing in your pipeline. > > > >Nico > > >[1] >https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/ >asyncio.html > >On Monday, 14 August 2017 17:50

Exception for Scala anonymous class when restoring from state

2017-08-16 Thread Kien Truong
Hi, After some refactoring: moving some operator to separate functions/file, I'm encountering a lot of exceptions like these. The logic of the application did not change, and all the refactored operators are stateless, e.g simple map/flatmap/filter. Does anyone know how to fix/avoid/work aroun

Re: Exception for Scala anonymous class when restoring from state

2017-08-17 Thread Kien Truong
ed as keys? >Maybe what you could try doing, as a means to avoid that for now, is to >make sure that the key classes are untouched. > >Please keep us updated on how this works out for you, I’ll continue to >look into it. > >Thanks, >Gordon > >On 17 August 2017 at

Re: Process Function

2017-09-05 Thread Kien Truong
Hi, You can register a processing time timer inside the onTimer and the open function to have a timer that run periodically. Pseudo-code example: |ValueState lastRuntime; void open() { ctx.timerService().registerProcessingTimeTimer(current.timestamp + 6); } void onTimer() { // Run the p

Re: Does RocksDB need a dedicated CPU?

2017-09-05 Thread Kien Truong
Hi, In my experience, RocksDB uses very little CPU, and doesn't need a dedicated CPU. However, it's quite disk intensive. You'd need fast, ideally dedicated SSDs to achieve the best performance. Regards, Kien On 9/5/2017 1:15 PM, Bowen Li wrote: Hi guys, Does RocksDB need a dedicated C

Task Manager segfault randomly with RocksDB

2017-09-12 Thread Kien Truong
Hi all, Our task managers are segfaulting randomly when using RocksDB state backend, any tips regarding how to debug this situation are much appreciated. We are using Flink 1.3.2  with Yarn on Centos 7 Regards, Kiên

RocksDB segfault inside timer when accessing/clearing state

2017-10-06 Thread Kien Truong
Hi, We are using processing timer to implement some state clean up logic. After switching from FsStateBackend to RocksDB, we encounter a lot of segfault from the Time Trigger threads when accessing/clearing state value. We currently uses the latest 1.3-SNAPSHOT, with the patch upgrading RocksDB

Re: RocksDB segfault inside timer when accessing/clearing state

2017-10-09 Thread Kien Truong
mehow reworking how the timers events are executed or interact with normal processing. Best, Stefan Am 07.10.2017 um 05:44 schrieb Kien Truong : Hi, We are using processing timer to implement some state clean up logic. After switching from FsStateBackend to RocksDB, we encounter a lot of seg

Delete save point when using incremental checkpoint

2017-10-11 Thread Kien Truong
Hi, When using increment checkpoint mode, can I delete the save point that the job recovered from after sometime ? Or do I have to keep that checkpoint forever because it's a part of the snapshot chain ? Best regards, Kien

Re: Delete save point when using incremental checkpoint

2017-10-11 Thread Kien Truong
ote: > >> Hi, >> >> There is an important distinction between checkpoints (triggered by >Flink, >> may be incremental) and savepoints (manually triggered, always >> self-contained). >> >> Your question is unfortunately mixing both terms, please exp

Timer coalescing necessary?

2017-10-12 Thread Kien Truong
Hi, We are having a streaming job where we use timers to implement key timeout for stateful functions. Should we implement coalescing logic to reduce the number of timer trigger, or it is not necessary with Flink? Best regards, Kien

Re: Timer coalescing necessary?

2017-10-13 Thread Kien Truong
ing them across different keys would not be possible right now. Best, Aljoscha On 13. Oct 2017, at 06:37, Kien Truong wrote: Hi, We are having a streaming job where we use timers to implement key timeout for stateful functions. Should we implement coalescing logic to reduce the number of timer tr

Re: Timer coalescing necessary?

2017-10-13 Thread Kien Truong
key (and a namespace). Best, Aljoscha On 13. Oct 2017, at 13:56, Kien Truong wrote: Hi Aljoscha, Could you clarify how the timer system works right now ? For example, let's say I have a function F, with 3 keys that are registered to execute at processing time T. Would Flink maintain a sing

Re: Garbage collection concerns with Task Manager memory

2017-10-18 Thread Kien Truong
Hi, Yes, GC is still a major concern. Even G1 has a hard time dealing with >64GB heap in our experience. To mitigate, we run multiple TMs with smaller heap per machine, and use RocksDBStateBackend. Best regards, Kien On 10/18/2017 4:40 PM, Marchant, Hayden wrote: I read in the Flink docu

Re: Off heap memory issue

2017-10-18 Thread Kien Truong
Hi, We saw a similar issue in one of our job due to ByteBuffer memory leak[1]. We fixed it using the solution in the article, setting -Djdk.nio.maxCachedBufferSize This variable is available for Java > 8u102 Best regards, Kien [1]http://www.evanjones.ca/java-bytebuffer-leak.html On 10/18

Re: Different CoGroup behavior inside DeltaIteration

2015-11-16 Thread Duc Kien Truong
bulk iteration. Best, Kien Truong Sent using CloudMagic Email [https://cloudmagic.com/k/d/mailapp?ct=ta&cv=8.0.55&pv=5.1.1&source=email_footer_2] On Mon, Nov 16, 2015 at 11:02 PM, Stephan Ewen < se...@apache.org [se...@apache.org] > wrote: It is actually very important tha