GitHub user tdas opened a pull request:
https://github.com/apache/spark/pull/21739
[SPARK-22187][SS] Update unsaferow format for saved state such that we can
set timeouts when state is null
## What changes were proposed in this pull request?
Currently, the group state of a user-defined type is encoded as top-level
columns in the UnsafeRows stored in the state store. The timeout timestamp is
also saved (when needed) as the last top-level column. Since the group state
is serialized to top-level columns, you cannot save "null" as a value of state
(setting null in all the top-level columns is not equivalent). So we don't let
the user set the timeout without initializing the state for a key. Based on
user experience, this leads to confusion.
This PR changes the row format such that the state is saved as nested
columns. This allows the state to be set to null and avoids these
confusing corner cases. However, queries recovering from an existing checkpoint
will use the previous format to maintain compatibility with existing production
queries.
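
To illustrate why a flat layout cannot represent a null state, here is a
minimal sketch in plain Python. It is not the Spark implementation: rows are
modeled as tuples, and the `State` shape and the `encode_flat` and
`encode_nested` helpers are hypothetical names made up for this example.

```python
from typing import Optional, Tuple

# Hypothetical user state with two fields: (name, count).
State = Tuple[Optional[str], Optional[int]]


def encode_flat(state: Optional[State], timeout_ms: int) -> tuple:
    """Previous layout: state fields are top-level columns, timeout is last."""
    if state is None:
        # The only option is to null out every state column.
        return (None, None, timeout_ms)
    name, count = state
    return (name, count, timeout_ms)


def encode_nested(state: Optional[State], timeout_ms: int) -> tuple:
    """New layout: the whole state is one nested column, which may be null."""
    return (state, timeout_ms)


no_state = None                 # the key has no state yet
all_null_fields = (None, None)  # the key has state, but every field is null

# Flat layout: the two cases collapse into the same row.
flat_ambiguous = encode_flat(no_state, 1000) == encode_flat(all_null_fields, 1000)

# Nested layout: the two cases stay distinct.
nested_distinct = encode_nested(no_state, 1000) != encode_nested(all_null_fields, 1000)
```

In this model the nested layout can carry a timeout next to a null state
without being confused with a state object whose fields happen to be null.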
## How was this patch tested?
Refactored existing end-to-end tests and added new tests for explicitly
testing obj-to-row conversion for both state formats.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tdas/spark SPARK-22187-1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21739.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21739
----
commit ef509c8986dbcc9b37387b0bde56c3d71abb7602
Author: Tathagata Das <tathagata.das1565@...>
Date: 2017-10-05T02:25:22Z
Partial implementation
commit 976a7ea3d5d528e6f1091c696c7f6e865027ee23
Author: Tathagata Das <tathagata.das1565@...>
Date: 2018-07-09T11:05:10Z
Fixed and added tests
commit cfc3f68aabeb4e83bfe8131e93e5f0133fba4869
Author: Tathagata Das <tathagata.das1565@...>
Date: 2018-07-09T11:19:01Z
Refactored
commit 9525484a444ce231ff366bc556fe5a1d46ac4d4f
Author: Tathagata Das <tathagata.das1565@...>
Date: 2018-07-09T17:38:43Z
Minor refactoring
----