HeartSaVioR commented on a change in pull request #32796:
URL: https://github.com/apache/spark/pull/32796#discussion_r647219439
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -269,18 +269,14 @@ class SymmetricHashJoinStateManager(
     // The backing store is arraylike - we as the caller are responsible for filling back in
     // any hole. So we swap the last element into the hole and decrement numValues to shorten.
     // clean
-    if (numValues > 1) {
+    if (index != numValues - 1) {
       val valuePairAtMaxIndex = keyWithIndexToValue.get(currentKey, numValues - 1)
       if (valuePairAtMaxIndex != null) {
Review comment:
Yeah, my preference is to fail the query (e.g. raise an internal error) in the case of `valuePairAtMaxIndex == null`, so that we surface the bad case up front instead of only encountering it after the state has been corrupted.
If a null value here can only ever arise as an error case, it still makes sense to fail the query even for state created before the fix; but to be safe, we could log a warning message here instead. I'm just afraid I'm missing some "normal" case where a null value is legitimate.
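To make the fail-fast suggestion concrete, here is a minimal standalone sketch of the swap-delete pattern under discussion. This is not Spark's actual state manager: `SwapDeleteSketch`, `buffer`, and `removeAt` are hypothetical stand-ins for `keyWithIndexToValue` and the removal logic, and the `IllegalStateException` illustrates the "raise internal error on null" option.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of the swap-delete pattern: the backing store is
// array-like, so the caller must fill any hole left by a removal.
object SwapDeleteSketch {
  val buffer = ArrayBuffer[String]("a", "b", "c", "d")

  def removeAt(index: Int): Unit = {
    val numValues = buffer.length
    if (index != numValues - 1) {
      val valueAtMaxIndex = buffer(numValues - 1)
      if (valueAtMaxIndex == null) {
        // Fail fast: a null at the max index means the state is already
        // inconsistent, so stop before corrupting it further.
        throw new IllegalStateException(
          s"Unexpected null value at index ${numValues - 1}; state may be corrupted")
      }
      // Swap the last element into the hole...
      buffer(index) = valueAtMaxIndex
    }
    // ...and shorten by one (decrement numValues).
    buffer.remove(numValues - 1)
  }

  def main(args: Array[String]): Unit = {
    removeAt(1)
    println(buffer.mkString(","))  // prints "a,d,c"
  }
}
```

The null check is the point of the sketch: without it, a silent swap of a null into the hole would hide the corruption until a later read.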
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]