Re: Restore from savepoint with Iterations

2020-05-05 Thread ashish pok
Let me see if I can do artificial throttle somewhere. Volume of data is really high and hence trying to avoid rounds in Kafka too. Looks like options are “not so elegant” until FLIP-15. Thanks for pointers again!!! On Monday, May 4, 2020, 11:06 PM, Ken Krugler wrote: Hi Ashish, The

Re: Kafka consumer behavior with same group Id, 2 instances of apps in separate cluster?

2019-09-04 Thread ashish pok
at this point. It would be good to know the reason you want to have such HA and see if Flink meets you requirement in another way. Thanks, Jiangjie (Becket) Qin On Thu, Aug 29, 2019 at 9:19 PM ashish pok wrote: Looks like Flink is using “assign” partitions instead of “subscribe” which will not allow

Re: Kafka consumer behavior with same group Id, 2 instances of apps in separate cluster?

2019-08-29 Thread ashish pok
, ashish pok wrote: All, I was wondering what the expected default behavior is when same app is deployed in 2 separate clusters but with same group Id. In theory idea was to create active-active across separate clusters but it seems like both apps are getting all the data from Kafka.  Anyone

Kafka consumer behavior with same group Id, 2 instances of apps in separate cluster?

2019-08-28 Thread ashish pok
All, I was wondering what the expected default behavior is when same app is deployed in 2 separate clusters but with same group Id. In theory idea was to create active-active across separate clusters but it seems like both apps are getting all the data from Kafka.  Anyone else has tried

Re: JSON to CEP coversion

2019-01-22 Thread ashish pok
of the logic. Best, Dom. [1] https://github.com/apache/bahir[2] https://github.com/wso2/siddhi wt., 22 sty 2019 o 20:20 ashish pok napisał(a): All, Wondering if anyone in community has started something along the line - idea being CEP logic is abstracted out to metadata instead. That can

JSON to CEP coversion

2019-01-22 Thread ashish pok
All, Wondering if anyone in community has started something along the line - idea being CEP logic is abstracted out to metadata instead. That can then further be exposed out to users from a REST API/UI etc. Obviously, it would really need some additional information like data catalog etc for it

Re: Unit / Integration Test Timer

2018-09-17 Thread ashish pok
e know if you recommend using latest test utils with 1.4.2 core as a test.  Thanks, Ashish On Monday, September 17, 2018, 9:33:56 AM EDT, ashish pok wrote: Hi Till, I am still in 1.4.2 version and will need some time before we can get later version certified in our Prod env. Timers a

Re: Unit / Integration Test Timer

2018-09-17 Thread ashish pok
:57 PM ashish pok wrote: Hi Till, To answer your first question, I currently don't (and honestly now sure how other than of course in IDE I can use breakpoint, or if something like MockIto can do it). So did I interpret it correctly that it sounds like execution env started using flink-test

Re: Unit / Integration Test Timer

2018-09-14 Thread ashish pok
a fraction of the 2 seconds? For this it would be better to use event time which allows you to control how time passes. If you want to test a specific operator you could try out the One/TwoInputStreamOperatorTestHarness. Cheers,Till On Fri, Sep 14, 2018 at 3:36 PM ashish pok wrote: All, Hopeful

Unit / Integration Test Timer

2018-09-14 Thread ashish pok
All, Hopefully a quick one. I feel like I have seen this answered before a few times before but can't find an appropriate example. I am trying to run few tests where registered timeouts are invoked (snippet below). Simple example as show in documentation for integration test (using

Re: Questions on Unbounded number of keys

2018-07-31 Thread ashish pok
with active windows, windows which have not been purged yet. Maybe Aljoscha knows more about why the window state is growing (I would not rule out a bug). Cheers,Till On Tue, Jul 31, 2018 at 1:45 PM ashish pok wrote: Hi Till, Keys are unbounded (a group of events have same key but that key

Re: Questions on Unbounded number of keys

2018-07-31 Thread ashish pok
Hi Till, Keys are unbounded (a group of events have same key but that key doesnt repeat after it is fired other than some odd delayed events). So basically there 1 key that will be aligned to a window. When you say key space of active windows, does that include keys for windows that have

Re: Implement Joins with Lookup Data

2018-07-25 Thread ashish pok
be.  Thanks, - Ashish On Wednesday, July 25, 2018, 4:07 AM, Michael Gendelman wrote: Hi Ashish, We are planning for a similar use case and I was wondering if you can share the amount of resources you have allocated for this flow? Thanks,Michael On Tue, Jul 24, 2018, 18:57 ashish pok wrote: BTW

Re: Implement Joins with Lookup Data

2018-07-24 Thread ashish pok
is dead is not recoverable. How do you recover from that situation? Will the pipeline die and you go over the entire bootstrap process? On Tue, Jul 24, 2018 at 11:56 ashish pok wrote: BTW,  We got around bootstrap problem for similar use case using a “nohup” topic as input stream. Our CICD

Re: Implement Joins with Lookup Data

2018-07-24 Thread ashish pok
BTW,  We got around bootstrap problem for similar use case using a “nohup” topic as input stream. Our CICD pipeline currently passes an initialize option to app IF there is a need to bootstrap and waits for X minutes before taking a savepoint and restart app normally listening to right

Re: Permissions to delete Checkpoint on cancel

2018-07-23 Thread ashish pok
r/ops/state/large_state_tuning.html#task-local-recovery Am 23.07.2018 um 14:18 schrieb ashish pok : Sorry, Just a follow-up. In absence of NAS then the best option to go with here is checkpoint and savepoints both on HDFS and StateBackend using local SSDs then? We were trying to not even hit

Re: Permissions to delete Checkpoint on cancel

2018-07-23 Thread ashish pok
Sorry, Just a follow-up. In absence of NAS then the best option to go with here is checkpoint and savepoints both on HDFS and StateBackend using local SSDs then? We were trying to not even hit HDFS other than for savepoints. - Ashish On Monday, July 23, 2018, 7:45 AM, ashish pok wrote

Re: Permissions to delete Checkpoint on cancel

2018-07-23 Thread ashish pok
Stefan, I did have first point at the back of my mind. I was under the impression though for checkpoints, cleanup would be done by TMs as they are being taken by TMs. So for a standalone cluster with its own zookeeper for JM high availability, a NAS is a must have? We were going to go with

Re: Multi-tenancy environment with mutual auth

2018-07-16 Thread ashish pok
. Nico On 14/07/18 14:02, ashish pok wrote: > All, > > We are running into a blocking production deployment issue. It looks > like Flink inter-communications doesnt support SSL mutual auth. Any > plans/ways to support it? We are going to have to create DMZ for each &

Multi-tenancy environment with mutual auth

2018-07-14 Thread ashish pok
All, We are running into a blocking production deployment issue. It looks like Flink inter-communications doesnt support SSL mutual auth. Any plans/ways to support it? We are going to have to create DMZ for each tenant without that, not preferable of course. - Ashish

Re: Flink Kafka TimeoutException

2018-07-05 Thread ashish pok
Our experience on this has been that if Kafka cluster is healthy, JVM resource contentions on our Flink app caused by high heap utilization and there by lost CPU cycles on GC also did result in this issue. Getting basic JVM metrics like CPU load, GC times and Heap Util from your app (we use

Re: Memory Leak in ProcessingTimeSessionWindow

2018-07-02 Thread ashish pok
2018, 9:00:14 AM EDT, ashish pok wrote: Stefan, All,  If there are no further thoughts on this I am going to switch my app to low level Process API. I still think there is an easier solution here which I am missing but I will revisit that after I fix Production issue. Thanks, Ashish O

Re: How to partition within same physical node in Flink

2018-07-02 Thread ashish pok
led                                DataStream cameraWithCubeDataStream = cameraWithCubeDataStreamAsync. keyBy((cameraWithCube) -> cameraWithCube.getTs()); On Thu, Jun 28, 2018 at 9:22 AM ashish pok wrote: Fabian, All, Along this same line, we have a datasource where we have parent key and child key. We need to first keyBy

Re: How to partition within same physical node in Flink

2018-06-28 Thread ashish pok
Fabian, All, Along this same line, we have a datasource where we have parent key and child key. We need to first keyBy parent and then by child. If we want to have physical partitioning in a way where physical partiotioning happens first by parent key and localize grouping by child key, is

Re: Memory Leak in ProcessingTimeSessionWindow

2018-06-22 Thread ashish pok
, June 21, 2018, 7:28 AM, ashish pok wrote: Hi Stefan,  Thanks for outlining the steps and are similar to what we have been doing for OOM issues. However, I was looking for something more high level on whether state / key handling needs some sort of cleanup specifically that is not done by default. I

Re: Memory Leak in ProcessingTimeSessionWindow

2018-06-18 Thread ashish pok
that runs into the problem and share it with us? That would make finding the cause a lot easier. Best,Stefan Am 15.06.2018 um 23:01 schrieb ashish pok : All, I have another slow Memory Leak situation using basic TimeSession Window (earlier it was GlobalWindow related that Fabian helped clarify).  I

Memory Leak in ProcessingTimeSessionWindow

2018-06-15 Thread ashish pok
All, I have another slow Memory Leak situation using basic TimeSession Window (earlier it was GlobalWindow related that Fabian helped clarify).  I have a very simple data pipeline: DataStream processedData = rawTuples

IoT Use Case, Problem and Thoughts

2018-06-05 Thread ashish pok
Fabian, Stephan, All, I started a discussion a while back around having a form of event-based checkpointing policy that will help us in some of our high volume data pipelines. Here is an effort to put this in front of community and understand what capabilities can support these type of use

Re: Best way to clean-up states in memory

2018-05-14 Thread ashish pok
y) rely on timers to clean up per-window state. Best, Fabian 2018-05-14 9:34 GMT+02:00 Kostas Kloudas <k.klou...@data-artisans.com>: Hi Ashish, It would be helpful to share the code of your custom trigger for the first case.Without that, we cannot tell what state you create and how/when you u

Re: Best way to clean-up states in memory

2018-05-13 Thread ashish pok
Mon, Apr 30, 2018 at 3:54 PM, ashish pok <ashish...@yahoo.com> wrote: All, I am using noticing heap utilization creeping up slowly in couple of apps which eventually lead to OOM issue. Apps only have 1 process function that cache state. I did make sure I have a clear method invoked when ev

Best way to clean-up states in memory

2018-04-30 Thread ashish pok
All, I am using noticing heap utilization creeping up slowly in couple of apps which eventually lead to OOM issue. Apps only have 1 process function that cache state. I did make sure I have a clear method invoked when events are collected normally, on exception and on timeout. Are any other

Re: Scaling down Graphite metrics

2018-04-16 Thread ashish pok
that you aren't interested in. On 13.04.2018 18:52, ashish pok wrote: All, We love Flinks OOTB metrics but boy there is a ton :) Any way to scale them down (frequency and metric itself)? Flink apps are becoming huge source of data right now. Thanks, -- Ashish

Scaling down Graphite metrics

2018-04-13 Thread ashish pok
All, We love Flinks OOTB metrics but boy there is a ton :) Any way to scale them down (frequency and metric itself)? Flink apps are becoming huge source of data right now. Thanks, -- Ashish

Re: Fw: 1.4.2 - Unable to start YARN containers with more than 1 vCores per Task Manager

2018-04-03 Thread ashish pok
Thanks Shashank, I will check ot out. -- Ashish On Tue, Apr 3, 2018 at 10:11 AM, shashank734 wrote: CHeck in your Yarn configuration, Are you using DeafaultResourceCalculater which only uses memory while allocating resources. So you have to change it to

Fw: 1.4.2 - Unable to start YARN containers with more than 1 vCores per Task Manager

2018-04-03 Thread ashish pok
Hi All,   I had been using the following command in a Lab environment successfully in 1.3 Flink version.   yarn-session.sh -n 4 -s 4 -jm 2048 -tm 2048 -Dyarn.containers.vcores=2 -nm infra.test3   As expected, I see 4 TMs with 16 slots and taking 8 vCores from YARN. In a new

Re: Error running on Hadoop 2.7

2018-03-26 Thread ashish pok
on its class path? Maybe you could also share the logs with us. Please also check whether HADOOP_CLASSPATH is set to something suspicious. Thanks a lot! Cheers,Till On Wed, Mar 21, 2018 at 6:25 PM, ashish pok <ashish...@yahoo.com> wrote: Hi Piotrek, At this point we are simply

Re: "dynamic" bucketing sink

2018-03-26 Thread ashish pok
Hi Christophe, Have you looked at Kite SDK? We do something like this but using Gobblin and Kite SDK, which is a parallel pipeline to Flink. It feels like if you partition by something logical like topic name, you should be able to sink using Kite SDK. Kite allows you good ways to handle

Re: Error running on Hadoop 2.7

2018-03-21 Thread ashish pok
s version conflict? Piotrek On 21 Mar 2018, at 16:52, ashish pok <ashish...@yahoo.com> wrote: Hi Piotrek, Yes, this is a brand new Prod environment. 2.6 was in our lab. Thanks, -- Ashish On Wed, Mar 21, 2018 at 11:39 AM, Piotr Nowojski<pi...@data-artisans.com> wrote: Hi,

Re: Error running on Hadoop 2.7

2018-03-21 Thread ashish pok
u sure that something hasn't mix in the process? Does some simple word count example works on the cluster after the upgrade? Piotrek On 21 Mar 2018, at 16:11, ashish pok <ashish...@yahoo.com> wrote: Hi All, We ran into a roadblock in our new Hadoop environment, migrating from 2.6 to 2.7. It

Error running on Hadoop 2.7

2018-03-21 Thread ashish pok
Hi All, We ran into a roadblock in our new Hadoop environment, migrating from 2.6 to 2.7. It was supposed to be an easy lift to get a YARN session but doesnt seem like :) We definitely are using 2.7 binaries but it looks like there is a call here to a private methos which screams runtime

Restart hook and checkpoint

2018-03-02 Thread ashish pok
All, It looks like Flink's default behavior is to restart all operators on a single operator error - in my case it is a Kafka Producer timing out. When this happens, I see logs that all operators are restarted. This essentially leads to data loss. In my case the volume of data is so high that

Re: Task Manager detached under load

2018-02-24 Thread ashish pok
collection log. We've encountered case where Task Manager disassociated due to long GC pause. Regards, Kien On 1/20/2018 1:27 AM, ashish pok wrote: Hi All, We have hit some load related issues and was wondering if any one has some suggestions. We are noticing task managers an

Re: Task manager not able to rejoin job manager after network hicup

2018-02-24 Thread ashish pok
We see the same in 1.4. I dont think we could see this in 1.3. I had started a thread a while back on this. Till asked for more details. I havent had a chance to get back to him on this. If you can repro this easily perhaps you can get to it faster. I will find the thread and resend. Thanks,

Re: Strata San Jose

2018-02-09 Thread ashish pok
Awesome, I will send a note from my work email :)  -- Ashish On Fri, Feb 9, 2018 at 5:12 AM, Fabian Hueske<fhue...@gmail.com> wrote: Hi Ashish, I'll be at Strata San Jose and give two talks. Just ping me and we can meet there :-) Cheers, Fabian 2018-02-09 0:53 GMT+01:00 ashi

Strata San Jose

2018-02-08 Thread ashish pok
Wondering if any of the core Flink team members are planning to be at the conference? It would be great to meet in peson. Thanks, -- Ashish

Kafka Producer timeout causing data loss

2018-01-19 Thread ashish pok
Team, One more question to the community regarding hardening Flink Apps. Let me start off by saying we do have known Kafka bottlenecks which we are in the midst of resolving. So during certain times of day, a lot of our Flink Apps are seeing Kafka Producer timeout issues. Most of the logs are

Understanding Restart Strategy

2018-01-19 Thread ashish pok
Team, Hopefully, this is a quick one.  We have setup restart strategy as follows in pretty much all of our apps: env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, Time.of(30, TimeUnit.SECONDS))); This seems pretty straight-forward. App should retry starting 10 times every 30

Task Manager detached under load

2018-01-19 Thread ashish pok
Hi All, We have hit some load related issues and was wondering if any one has some suggestions. We are noticing task managers and job managers being detached from each other under load and never really sync up again. As a result, Flink session shows 0 slots available for processing. Even

Re: Kafka Consumer fetch-size/rate and Producer queue timeout

2017-11-07 Thread ashish pok
Thanks Fabian. I am seeing thia consistently and can definitely use some help. I have plenty of graphana views I can share if that helps :) Sent from Yahoo Mail on Android On Tue, Nov 7, 2017 at 3:54 AM, Fabian Hueske wrote: Hi Ashish, Gordon (in CC) might be able to

Re: Capacity Planning For Large State in YARN Cluster

2017-10-30 Thread ashish pok
easing the workload. Cheers, Till ​ On Mon, Oct 30, 2017 at 1:34 AM, ashish pok <ashish...@yahoo.com> wrote: Jorn, correct and I suppose that's where we are at this point. RocksDB based backend is definitely looking promising for our use case. Since I haven't gotten a definite no-no

Re: Capacity Planning For Large State in YARN Cluster

2017-10-29 Thread ashish pok
Jorn, correct and I suppose that's where we are at this point. RocksDB based backend is definitely looking promising for our use case. Since I haven't gotten a definite no-no on using 30% for YARN cut-off ratio (about 1.8GB from 6GB memory) and off-heap flag turned on, we will continue on that