HeartSaVioR commented on issue #24890: [SPARK-28074][DOC][SS] Document caveats 
on using multiple stateful operations in single query
URL: https://github.com/apache/spark/pull/24890#issuecomment-504914060
 
 
   Now I'm trying to narrow down the issue...
   
   As we've figured out, the global watermark is the root cause (it can discard 
intermediate rows or evict rows from state incorrectly). The condition which 
**might** bring a correctness issue is therefore: 1) the query has multiple 
stateful operators, and 2) more than one of those stateful operators uses the 
watermark to discard input rows or evict state.
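
To make the condition concrete, here is a toy simulation in plain Python (not Spark code). It chains two windowed counts that share one global watermark. The names `simulate`, `inner` and `outer` are hypothetical, and the lateness rule (window end <= watermark) and the end-of-batch watermark update are simplifying assumptions, not a description of Spark internals.

```python
def simulate(batches, delay=0, inner=10, outer=30):
    """Toy model: two chained windowed counts sharing one global watermark."""
    watermark = 0      # global watermark, shared by both operators
    max_seen = 0       # largest event time observed so far
    inner_state = {}   # inner window start -> row count
    outer_state = {}   # outer window start -> sum of inner counts
    dropped = []       # inner results the second operator treated as late

    for batch in batches:
        # Operator 1: count raw events per inner window.
        for ts in batch:
            start = ts - ts % inner
            if start + inner <= watermark:
                continue  # raw input is already behind the watermark
            inner_state[start] = inner_state.get(start, 0) + 1

        # Append mode: an inner window is emitted only after the watermark
        # has passed its end.
        for start in sorted(s for s in inner_state if s + inner <= watermark):
            count = inner_state.pop(start)
            # Operator 2: re-aggregate by outer window, using the inner window
            # start as event time and the SAME watermark to judge lateness.
            outer_start = start - start % outer
            if outer_start + outer <= watermark:
                dropped.append((start, count))  # intermediate row discarded
            else:
                outer_state[outer_start] = outer_state.get(outer_start, 0) + count

        # Assumption: the watermark advances only at the end of each batch.
        max_seen = max([max_seen, *batch])
        watermark = max(watermark, max_seen - delay)

    return outer_state, dropped
```

With `simulate([[1, 5], [12], [45], [50]])` the watermark jumps from 12 to 45, so the inner window starting at 10 is emitted after its outer window has already fallen behind the watermark, and it ends up in `dropped`. With steadily advancing input such as `simulate([[1, 5], [12], [15], [22]])` nothing is dropped, which matches the point that the condition only *might* cause incorrect output.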
   
   Note that this condition applies to "Append mode". For "Update mode" and 
"Complete mode", the behavior of multiple stateful operators is not properly 
defined: in most cases it needs retraction to correct the outputs for a given 
key, which is not supported. Hence for those modes we could just define the 
condition as having multiple stateful operators.
   
   Please also note that not all such cases would produce incorrect outputs. 
That makes it hard to say what the best approach is from the Spark side to 
avoid the correctness issue. Maybe we have some options here:
   
   1) Define the case as "unsupported" and throw an error on any query which 
meets the condition
   1-A) Add a config to unlock the case if end users are 100% sure their query 
is safe.
   2) Allow the case, but log a warning message to let end users know that 
their query might not be safe.
   3) Just document the caveats and don't restrict or notify at runtime.
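
The condition and the options above could be sketched roughly as follows. This is only an illustration on a made-up plan model: `Operator`, `is_risky`, `check_query`, `UnsupportedQueryError`, the `policy` values and the `allow_unsafe` flag are all hypothetical names, not existing Spark APIs or configs.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("stateful_check")  # hypothetical logger name


@dataclass
class Operator:
    """Hypothetical, simplified description of one operator in a query."""
    name: str
    stateful: bool
    uses_watermark: bool = False  # discards late input or evicts state by watermark


class UnsupportedQueryError(Exception):
    """Raised under option 1 when the query meets the risky condition."""


def is_risky(operators, output_mode):
    """Return True if the query meets the condition described in this comment."""
    stateful = [op for op in operators if op.stateful]
    if len(stateful) < 2:
        return False
    if output_mode == "append":
        # Append mode: more than one stateful operator relies on the watermark.
        return sum(op.uses_watermark for op in stateful) > 1
    # Update / Complete mode: multiple stateful operators are enough.
    return True


def check_query(operators, output_mode, policy="error", allow_unsafe=False):
    """Apply one of the options: 'error' (1, 1-A), 'warn' (2), 'document' (3)."""
    if policy not in ("error", "warn", "document"):
        raise ValueError(f"unknown policy: {policy}")
    if policy == "document" or not is_risky(operators, output_mode):
        return "allowed"  # option 3 performs no runtime check at all
    if policy == "error":
        if allow_unsafe:
            return "allowed"  # option 1-A: the user explicitly unlocked the case
        raise UnsupportedQueryError(
            "multiple stateful operators may produce incorrect results"
        )
    # Option 2: allow the query but make the risk visible.
    logger.warning("query has multiple stateful operators; results may be incorrect")
    return "warned"
```

For example, two watermark-based operators in append mode raise under `policy="error"`, pass silently when `allow_unsafe=True`, and return `"warned"` under `policy="warn"`.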
   
   Would like to hear your opinions about these options.

