Github user tdas commented on the issue:
https://github.com/apache/spark/pull/20646
Actually, I am having second thoughts about this. This fundamentally
changes how the tests work, especially the stress tests. The stress tests
deliberately exercise these corner cases (by randomly issuing successive
AddData actions): what happens if data is added while the previously added
data is being picked up? With this change, we would inadvertently stop
testing those race-condition-prone cases.
Second, we are taking multiple locks here in multiple sources, and the
StreamExecution is likely to take the same locks. I am really afraid that we
are introducing deadlocks by doing this.
I am still thinking about what the right approach is here. I think it should be:
- Explicit synchronized adding of data to multiple sources.
- Not holding locks in multiple sources.
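
To make the two points above concrete, here is a minimal sketch, not Spark code. It assumes a hypothetical `MemorySource` class and a hypothetical `add_to_sources` helper: the test serializes multi-source adds with one test-side lock, and each source's own lock is held only for a single append, so no two source locks are ever held at the same time.

```python
import threading


class MemorySource:
    """Hypothetical stand-in for a test source; not Spark's MemoryStream."""

    def __init__(self):
        self._lock = threading.Lock()
        self._batches = []

    def add_data(self, batch):
        # The source's own lock is held only for this one append.
        with self._lock:
            self._batches.append(list(batch))
            return len(self._batches) - 1  # offset of the batch just added

    def snapshot(self):
        with self._lock:
            return [list(b) for b in self._batches]


# One test-side lock that serializes multi-source adds.
# It is not any source's internal lock, so a reader that only takes
# source locks can never form a lock cycle with it.
_multi_add_lock = threading.Lock()


def add_to_sources(pairs):
    """Add one batch to each source as a single step; return the offsets."""
    with _multi_add_lock:
        # Source locks are taken one at a time, never nested in each other.
        return [source.add_data(batch) for source, batch in pairs]


def demo():
    a, b = MemorySource(), MemorySource()

    def worker(i):
        add_to_sources([(a, [i]), (b, [i])])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.snapshot(), b.snapshot()
```

In this sketch both sources see the multi-source adds in the same order, while a single-source add can still race with a reader, which is the kind of interleaving the stress tests are meant to cover.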