[jira] [Commented] (FLINK-9681) Make sure minRetentionTime not equal to maxRetentionTime
[ https://issues.apache.org/jira/browse/FLINK-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534073#comment-16534073 ] ASF GitHub Bot commented on FLINK-9681: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/6255 > Make sure minRetentionTime not equal to maxRetentionTime > > > Key: FLINK-9681 > URL: https://issues.apache.org/jira/browse/FLINK-9681 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Hequn Cheng >Assignee: Hequn Cheng >Priority: Major > Labels: pull-request-available > > Currently, for a group by(or other operators), if minRetentionTime equals to > maxRetentionTime, the group by operator will register a timer for each record > coming at different time which cause performance problem. The reasoning for > having two parameters is that we can avoid to register many timers if we have > more freedom when to discard state. As min equals to max cause performance > problem it is better to make sure these two parameters are not same. > Any suggestions are welcome. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9681) Make sure minRetentionTime not equal to maxRetentionTime
[ https://issues.apache.org/jira/browse/FLINK-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533846#comment-16533846 ] ASF GitHub Bot commented on FLINK-9681: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/6255 Thanks for the update @hequn8128. I'll merge this > Make sure minRetentionTime not equal to maxRetentionTime > > > Key: FLINK-9681 > URL: https://issues.apache.org/jira/browse/FLINK-9681 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Hequn Cheng >Assignee: Hequn Cheng >Priority: Major > Labels: pull-request-available > > Currently, for a group by(or other operators), if minRetentionTime equals to > maxRetentionTime, the group by operator will register a timer for each record > coming at different time which cause performance problem. The reasoning for > having two parameters is that we can avoid to register many timers if we have > more freedom when to discard state. As min equals to max cause performance > problem it is better to make sure these two parameters are not same. > Any suggestions are welcome. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9681) Make sure minRetentionTime not equal to maxRetentionTime
[ https://issues.apache.org/jira/browse/FLINK-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533794#comment-16533794 ] ASF GitHub Bot commented on FLINK-9681: --- Github user hequn8128 commented on the issue: https://github.com/apache/flink/pull/6255 @fhueske Comments have been addressed. Thanks for your review. > Make sure minRetentionTime not equal to maxRetentionTime > > > Key: FLINK-9681 > URL: https://issues.apache.org/jira/browse/FLINK-9681 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Hequn Cheng >Assignee: Hequn Cheng >Priority: Major > Labels: pull-request-available > > Currently, for a group by(or other operators), if minRetentionTime equals to > maxRetentionTime, the group by operator will register a timer for each record > coming at different time which cause performance problem. The reasoning for > having two parameters is that we can avoid to register many timers if we have > more freedom when to discard state. As min equals to max cause performance > problem it is better to make sure these two parameters are not same. > Any suggestions are welcome. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9681) Make sure minRetentionTime not equal to maxRetentionTime
[ https://issues.apache.org/jira/browse/FLINK-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533618#comment-16533618 ] ASF GitHub Bot commented on FLINK-9681: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/6255#discussion_r200330683 --- Diff: docs/dev/table/streaming.md --- @@ -591,16 +589,14 @@ qConfig.withIdleStateRetentionTime(Time.hours(12); val qConfig: StreamQueryConfig = ??? -// set idle state retention time: min = 12 hour, max = 16 hours -qConfig.withIdleStateRetentionTime(Time.hours(12), Time.hours(16)) -// set idle state retention time. min = max = 12 hours -qConfig.withIdleStateRetentionTime(Time.hours(12) +// set idle state retention time: min = 12 hour, max = 24 hours +qConfig.withIdleStateRetentionTime(Time.hours(12), Time.hours(24)) {% endhighlight %} -Configuring different minimum and maximum idle state retention times is more efficient because it reduces the internal book-keeping of a query for when to remove state. +Configuring different minimum and maximum idle state retention times is more efficient because it reduces the internal book-keeping of a query for when to remove state. Difference between minTime and maxTime shoud be at least 5 minutes. --- End diff -- The "... more efficient ..." does not apply anymore. Maybe rephrase to > Cleaning up state requires additional bookkeeping which becomes less expensive for larger differences of `minTime` and `maxTime`. The difference between `minTime` and `maxTime` must be at least 5 minutes. > Make sure minRetentionTime not equal to maxRetentionTime > > > Key: FLINK-9681 > URL: https://issues.apache.org/jira/browse/FLINK-9681 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Hequn Cheng >Assignee: Hequn Cheng >Priority: Major > Labels: pull-request-available > > Currently, for a group by(or other operators), if minRetentionTime equals to > maxRetentionTime, the group by operator will register a timer for each record > coming at different time which cause performance problem. The reasoning for > having two parameters is that we can avoid to register many timers if we have > more freedom when to discard state. As min equals to max cause performance > problem it is better to make sure these two parameters are not same. > Any suggestions are welcome. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9681) Make sure minRetentionTime not equal to maxRetentionTime
[ https://issues.apache.org/jira/browse/FLINK-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533619#comment-16533619 ] ASF GitHub Bot commented on FLINK-9681: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/6255#discussion_r200333630 --- Diff: flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/harness/HarnessTestBase.scala --- @@ -384,4 +386,12 @@ object HarnessTestBase { value.row.getField(selectorField).asInstanceOf[T] } } + + /** +* Test class used to test min and max retention time. +*/ + class StreamQueryConfigTest(min: Time, max: Time) extends StreamQueryConfig { --- End diff -- I would rename the class to `TestStreamQueryConfig` because the `Test` at the end suggests that this class is testing something instead of being a util for a test. > Make sure minRetentionTime not equal to maxRetentionTime > > > Key: FLINK-9681 > URL: https://issues.apache.org/jira/browse/FLINK-9681 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Hequn Cheng >Assignee: Hequn Cheng >Priority: Major > Labels: pull-request-available > > Currently, for a group by(or other operators), if minRetentionTime equals to > maxRetentionTime, the group by operator will register a timer for each record > coming at different time which cause performance problem. The reasoning for > having two parameters is that we can avoid to register many timers if we have > more freedom when to discard state. As min equals to max cause performance > problem it is better to make sure these two parameters are not same. > Any suggestions are welcome. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9681) Make sure minRetentionTime not equal to maxRetentionTime
[ https://issues.apache.org/jira/browse/FLINK-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532876#comment-16532876 ] ASF GitHub Bot commented on FLINK-9681: --- GitHub user hequn8128 opened a pull request: https://github.com/apache/flink/pull/6255 [FLINK-9681] [table] Make sure difference between minRetentionTime and maxRetentionTime at least 5 minutes ## What is the purpose of the change This PR aims to make sure difference between minRetentionTime and maxRetentionTime at least 5 minutes. Currently, for a group by(or other operators), if minRetentionTime equals to maxRetentionTime, the group by operator will register a timer for each record coming at different time which cause performance problem. The reasoning for having two parameters is that we can avoid to register many timers if we have more freedom when to discard state. As min equals to max cause performance problem it is better to make sure these two parameters are not same. ## Brief change log - Throw exception when difference between minRetentionTime and maxRetentionTime smaller than 5 minutes - Add a `QueryConfigTest` class extends from StreamQueryConfig. `QueryConfigTest` don't have the 5min limitation which makes test more convenient. - Adapt tests. ## Verifying this change This change added tests and can be verified as follows: - Added test that validates that min and max retention time are set correctly. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) You can merge this pull request into a Git repository by running: $ git pull https://github.com/hequn8128/flink retentionTime Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/6255.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6255 commit b41df60dde422bce9317479c8d77304d6ee4857b Author: hequn8128 Date: 2018-07-04T14:11:38Z [FLINK-9681] [table] Make sure difference between minRetentionTime and maxRetentionTime at least 5 minutes > Make sure minRetentionTime not equal to maxRetentionTime > > > Key: FLINK-9681 > URL: https://issues.apache.org/jira/browse/FLINK-9681 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Hequn Cheng >Assignee: Hequn Cheng >Priority: Major > Labels: pull-request-available > > Currently, for a group by(or other operators), if minRetentionTime equals to > maxRetentionTime, the group by operator will register a timer for each record > coming at different time which cause performance problem. The reasoning for > having two parameters is that we can avoid to register many timers if we have > more freedom when to discard state. As min equals to max cause performance > problem it is better to make sure these two parameters are not same. > Any suggestions are welcome. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9681) Make sure minRetentionTime not equal to maxRetentionTime
[ https://issues.apache.org/jira/browse/FLINK-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16531322#comment-16531322 ] Hequn Cheng commented on FLINK-9681: Hi [~fhueske], thanks for your reply and suggestions. I will create a pull request laster. > Make sure minRetentionTime not equal to maxRetentionTime > > > Key: FLINK-9681 > URL: https://issues.apache.org/jira/browse/FLINK-9681 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Hequn Cheng >Assignee: Hequn Cheng >Priority: Major > > Currently, for a group by(or other operators), if minRetentionTime equals to > maxRetentionTime, the group by operator will register a timer for each record > coming at different time which cause performance problem. The reasoning for > having two parameters is that we can avoid to register many timers if we have > more freedom when to discard state. As min equals to max cause performance > problem it is better to make sure these two parameters are not same. > Any suggestions are welcome. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9681) Make sure minRetentionTime not equal to maxRetentionTime
[ https://issues.apache.org/jira/browse/FLINK-9681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529850#comment-16529850 ] Fabian Hueske commented on FLINK-9681: -- Hi [~hequn8128], You are right. Setting both parameters to the same value results in very bad performance. I think we should throw an exception when setting the values in the query configuration and point out that there should be a significant difference between min and max (at least 1 minute?) In the long term we can migrate state cleanup to the currently developed state TTL and might get away from min and max. > Make sure minRetentionTime not equal to maxRetentionTime > > > Key: FLINK-9681 > URL: https://issues.apache.org/jira/browse/FLINK-9681 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Hequn Cheng >Assignee: Hequn Cheng >Priority: Major > > Currently, for a group by(or other operators), if minRetentionTime equals to > maxRetentionTime, the group by operator will register a timer for each record > coming at different time which cause performance problem. The reasoning for > having two parameters is that we can avoid to register many timers if we have > more freedom when to discard state. As min equals to max cause performance > problem it is better to make sure these two parameters are not same. > Any suggestions are welcome. -- This message was sent by Atlassian JIRA (v7.6.3#76005)