Hi,

I am tracking state in my Spark streaming application with MapGroupsWithStateFunction, described here: https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/streaming/GroupState.html

What are the limiting factors on the number of states a job can track at the same time? Is it memory? Could it be a bounded data structure in the internal implementation? Anything else?

Any input would be valuable while I am trying to set up and test this.
Thanks, Arnold