HeartSaVioR commented on a change in pull request #32828:
URL: https://github.com/apache/spark/pull/32828#discussion_r647936191
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
##########
@@ -274,6 +274,9 @@ class SymmetricHashJoinStateManager(
if (valuePairAtMaxIndex != null) {
keyWithIndexToValue.put(currentKey, index,
valuePairAtMaxIndex.value,
valuePairAtMaxIndex.matched)
+ } else {
+ logWarning(s"`keyWithIndexToValue` returns a null value for index ${numValues - 1} " +
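For readers following the hunk: the removal path fills the removed slot by moving the value stored at the max index (`numValues - 1`) into it. The new `else` branch fires when that max-index value is unexpectedly missing. The sketch below models that "swap with last, then drop last" pattern under loud assumptions. `KeyWithIndexStoreSketch` is a hypothetical in-memory stand-in using plain Scala maps and `Option` instead of `null`. It is not Spark's `StateStore` API and does not reproduce the real manager's behavior beyond this one step.

```scala
import scala.collection.mutable

// Hypothetical, simplified in-memory stand-in for the two stores used by
// SymmetricHashJoinStateManager (key -> count, (key, index) -> value).
// This is NOT Spark's StateStore API.
final class KeyWithIndexStoreSketch {
  val keyToNumValues = mutable.Map.empty[String, Long]
  val keyWithIndexToValue = mutable.Map.empty[(String, Long), String]
  val warnings = mutable.ArrayBuffer.empty[String]

  // Appends a value at the next free index for `key`.
  def append(key: String, value: String): Unit = {
    val n = keyToNumValues.getOrElse(key, 0L)
    keyWithIndexToValue((key, n)) = value
    keyToNumValues(key) = n + 1
  }

  // Removes the value at `index` by moving the value at the max index into its
  // slot and then dropping the max index, so indices stay contiguous.
  def removeAt(key: String, index: Long): Unit = {
    val numValues = keyToNumValues.getOrElse(key, 0L)
    require(index >= 0 && index < numValues, s"index $index out of range")
    if (index != numValues - 1) {
      keyWithIndexToValue.get((key, numValues - 1)) match {
        case Some(last) => keyWithIndexToValue((key, index)) = last
        case None =>
          // Counterpart of the new `else` branch: the count claims a value at the
          // max index, but none is stored. The sketch only records the inconsistency.
          warnings += s"null value for index ${numValues - 1}"
      }
    }
    keyWithIndexToValue.remove((key, numValues - 1))
    keyToNumValues(key) = numValues - 1
  }
}
```

Note that the recorded warning carries only the index, much like the proposed log line. That is exactly the gap raised in the review below: the message alone is not enough to locate the offending key.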
Review comment:
Sigh, that's the problem... We won't be able to reproduce a key row from its string output.
Probably we need to think more about this. I don't like swallowing a problem we could catch earlier, but I also don't like logging a cryptic message to the end user and then, when they come to us with that log message, telling them "Sorry, we don't have enough information from the log." A cryptic log message might be acceptable, but we should at least provide a way to investigate further.
Another possible approach might be... a state store reader that scans the whole state store to find null values? If we are OK with that, then it might still make sense to keep this log message.
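A rough sketch of the "scan the whole state store for nulls" idea, under the assumption that the store contents have already been read into two plain maps. How to obtain them (i.e. the state store reader itself) is deliberately out of scope. `NullValueScanSketch` and `findMissingValues` are hypothetical names. The check reports every `(key, index)` that `keyToNumValues` says should exist but whose value is absent or `null`.

```scala
// Hypothetical offline consistency check; the two maps stand in for state store
// contents loaded by some reader, which this sketch does not model.
object NullValueScanSketch {
  // Returns every (key, index) below the recorded count whose value is missing or null,
  // ordered by key and then by index.
  def findMissingValues(
      keyToNumValues: Map[String, Long],
      keyWithIndexToValue: Map[(String, Long), String]): Seq[(String, Long)] =
    for {
      (key, numValues) <- keyToNumValues.toSeq.sortBy(_._1)
      index <- 0L until numValues
      if keyWithIndexToValue.get((key, index)).forall(_ == null)
    } yield (key, index)
}
```

With counts `Map("a" -> 2L, "b" -> 1L)` and values that only hold `("a", 0L) -> "x"` plus a `null` at `("b", 0L)`, the scan flags `("a", 1L)` (missing) and `("b", 0L)` (null).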
A less cryptic message, or one that guides users to report the issue to the Spark community, might be better though.
Do we have a way to log an internal error so that users can report it to the community?
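One possible shape for a less cryptic message, sketched as a plain string builder. `InconsistentStateMessageSketch`, `message`, and the `storeName` parameter are hypothetical and not an existing Spark helper or error API. The idea is to state what is inconsistent, include the context that is available (store name, index, recorded count), and tell the user what to do next.

```scala
// Hypothetical message builder; not an existing Spark API.
object InconsistentStateMessageSketch {
  def message(storeName: String, index: Long, numValues: Long): String =
    s"Inconsistent join state in $storeName: keyWithIndexToValue returned null for " +
      s"index $index although keyToNumValues reports $numValues values. " +
      "This is likely an internal error; please report it to the Apache Spark " +
      "community and include this message."
}
```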
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.