gaborgsomogyi commented on pull request #30151:
URL: https://github.com/apache/spark/pull/30151#issuecomment-717816270


   Let me create a stream-stream join app to test with, and then we can discuss 
the details of what/how/where to aggregate.
   Some preliminary opinions:
   
   > see the overall memory usage end users have to accumulate these values by 
themselves
   
   I agree, it would be good to show a summary, but independent graphs are also 
needed to see which one is problematic.
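
   A rough sketch of the "summary plus independent values" idea, assuming a progress dict shaped like the JSON form of `StreamingQueryProgress` (a `stateOperators` list whose entries carry `memoryUsedBytes`). The helper name and the sample numbers are made up for illustration and are not part of this PR:

```python
from typing import Dict, List, Tuple


def summarize_state_memory(progress: Dict) -> Tuple[int, List[int]]:
    """Return (total bytes, per-operator bytes) for one progress update.

    Hypothetical helper: it only reads the "stateOperators" list and the
    "memoryUsedBytes" field of each entry, defaulting to 0 when absent.
    """
    operators = progress.get("stateOperators", [])
    per_operator = [int(op.get("memoryUsedBytes", 0)) for op in operators]
    return sum(per_operator), per_operator


# Illustrative sample, not real measurements.
sample_progress = {
    "stateOperators": [
        {"memoryUsedBytes": 1200},
        {"memoryUsedBytes": 48000},
    ]
}

total, per_operator = summarize_state_memory(sample_progress)
print(total, per_operator)
```

   The total would feed the summary graph, while the per-operator list is what makes a single problematic operator visible.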
   
   > Having graphs per state store may be helpful on stream-stream join when 
there's a skew between left side and right side (either volume of the inputs or 
difference on evict condition), but probably can be hidden by default and shown 
on demand of "details". (separate page?)
   
   Yeah, having 3-4 operators would make the UI a horror. I'll start experimenting 
with a separate-page-per-operator approach.
   
   > Btw I guess loadedMapCacheHitCount graph can be dropped unless on demand, 
as if things are working without crash or Spark's bug it will always increment 
properly.
   
   `loadedMapCacheHitCount` comes from the custom metrics, which were taken over 
as-is: 
https://github.com/apache/spark/pull/30151/files#diff-e2de3487a935d91466e94189dc6d74dfe545a80a2a24a6da73cffbc55e32f6eaR261
   If we want to show such values selectively, maybe we can create a blacklist 
config for it (in a separate JIRA, of course).
   Just a quick idea: `spark.sql.streaming.ui.disabledCustomMetrics=foo,bar`. 
WDYT?
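
   To make the idea concrete, here is a minimal sketch of how such a comma-separated value could be parsed and applied. It assumes the config above, which is only a proposal here and not an existing setting; both function names and the metric values are hypothetical:

```python
from typing import Dict, Set


def parse_disabled_metrics(conf_value: str) -> Set[str]:
    """Split a comma-separated config value into a set of metric names.

    Surrounding whitespace is trimmed and empty entries are ignored.
    """
    return {name.strip() for name in conf_value.split(",") if name.strip()}


def visible_custom_metrics(
    custom_metrics: Dict[str, int], disabled: Set[str]
) -> Dict[str, int]:
    """Keep only the custom metrics that are not on the disabled list."""
    return {
        name: value
        for name, value in custom_metrics.items()
        if name not in disabled
    }


# Illustrative values only; the UI would read these from the progress events.
disabled = parse_disabled_metrics("loadedMapCacheHitCount, foo")
metrics = {
    "loadedMapCacheHitCount": 10,
    "loadedMapCacheMissCount": 0,
    "stateOnCurrentVersionSizeBytes": 4096,
}
shown = visible_custom_metrics(metrics, disabled)
print(sorted(shown))
```

   Names on the list that match no metric (like `foo` here) are simply ignored, so a stale config entry would not break the page.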
   

