In running tests of flink jobs we are seeing some that yield really good 
performance (2.5M records in minutes) and others that are struggleing to get 
past 200k records processed.  In the later case there are a large number of 
keys, and each key gets state in the form of 3 value states.  One holds a 
string and the others hold a Map of Lists of events (JSONObject object 
subclasses with custom java serialization).  There is also a MapState for each 
key that will hold one entry for each event matching that key (string->string).

The program starts out processing 1000-1500 records/sec (on my 4 year old 
laptop), and progressively gets slower and slower.  it is about 400/sec when 
processing the 500,000th event.

When using JProfiler on the test (local environment running under Eclipse) it 
indicates 70-80% of the execution time is spent in Kryo serialization methods.

When using the MemoryStateBackend the above is true, the RocksDBStateBackend is 
about 1/2 to 2/3 the speed.

Any suggestions on how to reduce or identify the source of the serialization 
performance issue is welcome.

Michael

Reply via email to