uncleGen commented on issue #24323: [SPARK-27413][SS] keep the same epoch pace between driver and executor.
URL: https://github.com/apache/spark/pull/24323#issuecomment-481946789

@jose-torres Thanks for your reply.

> I'm not sure I understand the motivation here. It's true that setting the poll interval larger than the generation interval will generate a bunch of empty epochs,

You have understood the motivation. As mentioned above, the main goal of this PR is to avoid producing empty epochs.

> but why does that imply that we shouldn't allow it to be configured at all?

We can indeed allow it to be configured. But first, IMO, it is not a good idea to expose this config to users and make them tune it carefully. Second, the `epoch pulling interval` is an internal config. We could use a better approach to address this issue rather than adding a config.

> And even if we shouldn't, why is "equal to the epoch duration" the right value?

Hmm... After thinking about it more deeply, "equal to the epoch duration" does not really fix the issue. In some corner cases, the executor will pull the epoch from the driver later than the "epoch duration". So, as @gaborgsomogyi mentioned, "push notifications to executors" may be a better approach.

To reiterate, the main goal of this PR is to avoid producing both empty epochs and late epochs. Do you have any doubts about this motivation? Any ideas are appreciated.
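To make the empty-epoch and late-epoch effects concrete, here is a small, simplified simulation of pull-based epoch tracking. It is not Spark's code: `simulate_epoch_polling` and its parameters are hypothetical, and it assumes the driver advances its epoch exactly every `epoch_ms`, the executor polls exactly every `poll_ms` starting at `poll_offset_ms`, and data arrives continuously. When one poll reveals that the driver has advanced several epochs, the executor closes them all at once, so every epoch after the first in that batch is empty. Lateness is measured as the time between an epoch ending on the driver and the executor noticing it.

```python
def simulate_epoch_polling(epoch_ms, poll_ms, total_ms, poll_offset_ms=0):
    """Hypothetical model of an executor pulling the current epoch from the driver.

    Assumptions: the driver's epoch is t // epoch_ms at time t, polls happen at
    poll_offset_ms + k * poll_ms, and records arrive continuously.
    Returns (empty_epochs, max_lateness_ms).
    """
    local_epoch = 0      # epoch the executor is currently writing into
    empty_epochs = 0
    max_lateness = 0
    t = poll_offset_ms
    while t <= total_ms:
        driver_epoch = t // epoch_ms
        advanced = driver_epoch - local_epoch
        if advanced > 0:
            # All records read since the last poll belong to the first closed
            # epoch; each additional epoch closed by this poll is empty.
            empty_epochs += advanced - 1
            # The oldest epoch closed here ended on the driver at
            # (local_epoch + 1) * epoch_ms but is only noticed at time t.
            lateness = t - (local_epoch + 1) * epoch_ms
            max_lateness = max(max_lateness, lateness)
            local_epoch = driver_epoch
        t += poll_ms
    return empty_epochs, max_lateness


# Poll much faster than the epoch duration: no empty epochs, no lateness.
print(simulate_epoch_polling(1000, 100, 10000))                        # (0, 0)
# Poll slower than the epoch duration: many empty epochs, each batch late.
print(simulate_epoch_polling(1000, 3000, 9000))                        # (6, 2000)
# Poll interval equal to the epoch duration but out of phase: late epochs.
print(simulate_epoch_polling(1000, 1000, 5000, poll_offset_ms=999))    # (0, 999)
# Poll interval 1 ms longer than the epoch duration: drift yields an empty epoch.
print(simulate_epoch_polling(1000, 1001, 5000, poll_offset_ms=999))    # (1, 1000)
```

Under these assumptions, the last two cases illustrate why "equal to the epoch duration" alone does not remove the problem: phase offset makes epochs late, and any small drift or jitter occasionally lets two epoch boundaries fall between polls. A push-based design would instead notify executors as each epoch advances, so lateness would be bounded by notification delay rather than by the poll interval.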
