GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/spark/pull/21733
[SPARK-24763][SS] Remove redundant key data from value in streaming
aggregation
* Add an option to enable the new feature: remove redundant key data
from value
* Modify code to respect the new option (turning the feature on/off)
* Modify tests to run with the option both on and off
* Add a guard in OffsetSeqMetadata to prevent modifying the option after
the query has been executed
## What changes were proposed in this pull request?
This patch proposes a new option for stateful aggregation: remove
redundant key data from the value.
With the option enabled, a query behaves the same as it does today but uses
less memory for state; the saving depends on the key and value fields of the
state operator.
Please refer to the link below for the detailed performance test results:
https://issues.apache.org/jira/browse/SPARK-24763?focusedCommentId=16536539&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16536539
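The idea of dropping key fields from the stored value can be sketched outside Spark. The snippet below is only an illustration, not the code in this patch: plain tuples stand in for state rows, the helper names `split_value` and `restore_row` are hypothetical, and it assumes the key fields are the leading fields of each row.

```python
from typing import Dict, List, Tuple

Row = Tuple  # hypothetical stand-in for a state row


def split_value(full_row: Row, num_key_fields: int) -> Tuple[Row, Row]:
    """Split a full aggregation row into (key, value without key fields).

    Assumption for this sketch: key fields come first in the row.
    """
    return full_row[:num_key_fields], full_row[num_key_fields:]


def restore_row(key: Row, value: Row) -> Row:
    """Rebuild the full row by prepending the key fields to the stored value."""
    return key + value


# Layout with the option disabled: the value repeats the key fields.
legacy_state: Dict[Row, Row] = {}
# Layout with the option enabled: the value holds only non-key fields.
compact_state: Dict[Row, Row] = {}

rows: List[Row] = [
    ("user-1", "2018-07-08", 3, 12.5),
    ("user-2", "2018-07-08", 1, 4.0),
]
for row in rows:
    key, value = split_value(row, num_key_fields=2)
    legacy_state[key] = row
    compact_state[key] = value

# Reading back from the compact layout still yields the full rows.
restored = [restore_row(k, v) for k, v in compact_state.items()]

# Count stored fields as a rough proxy for state size.
legacy_fields = sum(len(k) + len(v) for k, v in legacy_state.items())
compact_fields = sum(len(k) + len(v) for k, v in compact_state.items())
```

In this toy layout the saving per entry equals the number of key fields, which is why the benefit varies with the shape of the key and value.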
Because state written with the option enabled is not compatible with state
written with it disabled, the option is disabled by default (to ensure
backward compatibility), and OffsetSeqMetadata prevents the option from being
modified after the query has been executed.
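The guard described above can be pictured as "the value recorded with the query wins over the current session value". The following is a minimal sketch under stated assumptions, not the OffsetSeqMetadata implementation: the option name `OPTION`, the function `resolve_option`, and the fallback to the disabled setting for checkpoints that never recorded the option are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical option name, used only in this sketch.
OPTION = "example.streaming.aggregation.removeRedundantKeyData"
# Disabled by default so that existing state stays readable.
DEFAULT = "false"


def resolve_option(
    session_conf: Dict[str, str],
    checkpoint_conf: Optional[Dict[str, str]],
) -> str:
    """Return the option value a starting or restarting query should use.

    checkpoint_conf is None when the query has never run before.
    """
    if checkpoint_conf is None:
        # First run: take the session value and record it with the query.
        return session_conf.get(OPTION, DEFAULT)
    # Restart: ignore the session value, because the existing state was
    # written under whatever setting was recorded earlier. A checkpoint
    # that never recorded the option is assumed to use the old layout.
    return checkpoint_conf.get(OPTION, DEFAULT)
```

For example, restarting with the session value set to `"true"` against a checkpoint that recorded `"false"` would still resolve to `"false"` in this sketch, so the state layout never changes under a running query.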
## How was this patch tested?
Modified unit tests to cover the option both disabled and enabled.
Also ran manual tests to check whether the proposed patch improves state
memory usage.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HeartSaVioR/spark SPARK-24763
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21733.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21733
----
commit 2a9cc496bb7f832b75b0090ef9a612f4fbc0f206
Author: Jungtaek Lim <kabhwan@...>
Date: 2018-07-08T09:37:12Z
[SPARK-24763][SS] Remove redundant key data from value in streaming
aggregation
* Add an option to enable the new feature: remove redundant key data
from value
* Modify code to respect the new option (turning the feature on/off)
* Modify tests to run with the option both on and off
* Add a guard in OffsetSeqMetadata to prevent modifying the option after
the query has been executed
----