No checkpoints are active. I will try that state backend. Yes, we are using a JSONObject subclass for most of the intermediate state, with JSON strings in and out of Kafka. I will look at the configuration page for how to enable object reuse.
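For my own notes, here is roughly what I understand those two changes to look like when setting up the job (a minimal sketch; the checkpoint path is a placeholder for our setup, and object reuse is only safe if our functions do not mutate or hold on to input records):

    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class JobSetup {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Swap RocksDB for the heap-based FsStateBackend: state stays on
            // the JVM heap; only checkpoints (if enabled) go to this path.
            env.setStateBackend(new FsStateBackend("file:///tmp/flink-checkpoints"));

            // Skip defensive copies between chained operators. Safe only when
            // user functions neither mutate nor retain the records they receive.
            env.getConfig().enableObjectReuse();

            // ... build the pipeline and call env.execute() as usual.
        }
    }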
Thank you,
Michael

> On Apr 17, 2018, at 12:51 PM, Stephan Ewen <se...@apache.org> wrote:
>
> A few ideas for how to start debugging this:
>
> - Try deactivating checkpoints. Without them, no work goes into persisting
>   RocksDB data to the checkpoint store.
> - Try to swap RocksDB for the FsStateBackend - that reduces serialization
>   cost for moving data between heap and off-heap (RocksDB).
> - Do you have some expensive types (JSON, etc.)? Try activating object
>   reuse (which avoids some extra defensive copies).
>
> On Tue, Apr 17, 2018 at 5:50 PM, TechnoMage <mla...@technomage.com> wrote:
>
> Memory use is steady throughout the job, but the CPU utilization drops off
> a cliff. I assume this is because it becomes I/O bound shuffling managed
> state.
>
> Are there any metrics on managed state that can help in evaluating what to
> do next?
>
> Michael
>
>> On Apr 17, 2018, at 7:11 AM, Michael Latta <mla...@technomage.com> wrote:
>>
>> Thanks for the suggestion. The task manager is configured for 8 GB of
>> heap and gets to about 8.3 GB total; other Java processes (the job
>> manager and Kafka) add a few more. I will check it again, but the
>> instances have 16 GB, the same as my laptop, which completes the test in
>> under 90 minutes.
>>
>> Michael
>>
>> Sent from my iPad
>>
>> On Apr 16, 2018, at 10:53 PM, Niclas Hedhman <nic...@hedhman.org> wrote:
>>
>>> Have you checked memory usage? It could be as simple as having memory
>>> leaks, or aggregating more than you think (it is sometimes not obvious
>>> how much is kept in memory for longer than one first expects). If
>>> possible, connect FlightRecorder or a similar tool and keep an eye on
>>> memory. Additionally, I don't have AWS experience to speak of, but IF
>>> AWS swaps RAM to disk like regular Linux, that might be triggered if
>>> your JVM heap is bigger than the available RAM can handle.
>>>
>>> On Tue, Apr 17, 2018 at 9:26 AM, TechnoMage <mla...@technomage.com> wrote:
>>>
>>> I am doing a short proof of concept for using Flink and Kafka in our
>>> product. On my laptop I can process 10M inputs in about 90 min. On two
>>> different EC2 instances (m4.xlarge and m5.xlarge, both 4 cores, 16 GB
>>> RAM, and SSD storage) I see the process hit a wall around 50 min into
>>> the test, short of 7M events processed. In all cases this is running
>>> ZooKeeper, the Kafka broker, and Flink on the same server. My goal is
>>> to measure single-node vs. multi-node performance and test horizontal
>>> scalability, but I would like to figure out why it hits a wall first. I
>>> have the task manager configured with 6 slots and the job has a
>>> parallelism of 5. The laptop has 8 threads, and the EC2 instances have
>>> 4 threads. On smaller data sets, and at the beginning of each test, the
>>> EC2 instances outpace the laptop. I will try again with an m5.2xlarge,
>>> which has 8 threads and 32 GB RAM, to see if that works better for this
>>> workload. Any pointers or ways to get metrics that would help diagnose
>>> this would be appreciated.
>>>
>>> Michael
>>>
>>> --
>>> Niclas Hedhman, Software Developer
>>> http://polygene.apache.org - New Energy for Java
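PS: On the managed-state metrics question above, one thing I may try is
registering a custom gauge so each parallel subtask reports how many
entries it has written into its keyed state. A rough sketch (class, state,
and metric names are made up for illustration; the counter is per subtask,
counts puts rather than distinct keys, and resets on restart):

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.MapState;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Gauge;
    import org.apache.flink.util.Collector;

    // Must run on a keyed stream, since it uses keyed MapState.
    public class TrackedAggregator extends RichFlatMapFunction<String, String> {
        private transient MapState<String, String> pending;
        private transient long entryCount; // approximate, per subtask

        @Override
        public void open(Configuration parameters) throws Exception {
            pending = getRuntimeContext().getMapState(
                    new MapStateDescriptor<>("pending", String.class, String.class));
            // Expose the counter through Flink's metrics system so it shows
            // up in the web UI / reporters alongside the built-in metrics.
            getRuntimeContext().getMetricGroup()
                    .gauge("pendingEntries", (Gauge<Long>) () -> entryCount);
        }

        @Override
        public void flatMap(String value, Collector<String> out) throws Exception {
            pending.put(value, value); // illustrative; real logic goes here
            entryCount++;
        }
    }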