[GitHub] flink pull request #6255: [FLINK-9681] [table] Make sure difference between ...

2018-07-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/6255


---


[GitHub] flink pull request #6255: [FLINK-9681] [table] Make sure difference between ...

2018-07-05 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/6255#discussion_r200333630
  
--- Diff: 
flink-libraries/flink-table/src/test/scala/org/apache/flink/table/runtime/harness/HarnessTestBase.scala
 ---
@@ -384,4 +386,12 @@ object HarnessTestBase {
   value.row.getField(selectorField).asInstanceOf[T]
 }
   }
+
+  /**
+* Test class used to test min and max retention time.
+*/
+  class StreamQueryConfigTest(min: Time, max: Time) extends 
StreamQueryConfig {
--- End diff --

I would rename the class to `TestStreamQueryConfig` because the `Test` at 
the end suggests that this class is testing something instead of being a util 
for a test.


---


[GitHub] flink pull request #6255: [FLINK-9681] [table] Make sure difference between ...

2018-07-05 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/6255#discussion_r200330683
  
--- Diff: docs/dev/table/streaming.md ---
@@ -591,16 +589,14 @@ qConfig.withIdleStateRetentionTime(Time.hours(12);
 
 val qConfig: StreamQueryConfig = ???
 
-// set idle state retention time: min = 12 hour, max = 16 hours
-qConfig.withIdleStateRetentionTime(Time.hours(12), Time.hours(16))
-// set idle state retention time. min = max = 12 hours
-qConfig.withIdleStateRetentionTime(Time.hours(12)
+// set idle state retention time: min = 12 hour, max = 24 hours
+qConfig.withIdleStateRetentionTime(Time.hours(12), Time.hours(24))
 
 {% endhighlight %}
 
 
 
-Configuring different minimum and maximum idle state retention times is 
more efficient because it reduces the internal book-keeping of a query for when 
to remove state.
+Configuring different minimum and maximum idle state retention times is 
more efficient because it reduces the internal book-keeping of a query for when 
to remove state. Difference between minTime and maxTime shoud be at least 5 
minutes.
--- End diff --

The "... more efficient ..." does not apply anymore. Maybe rephrase to 

> Cleaning up state requires additional bookkeeping which becomes less 
expensive for larger differences of `minTime` and `maxTime`. The difference 
between `minTime` and `maxTime` must be at least 5 minutes.




---


[GitHub] flink pull request #6255: [FLINK-9681] [table] Make sure difference between ...

2018-07-04 Thread hequn8128
GitHub user hequn8128 opened a pull request:

https://github.com/apache/flink/pull/6255

[FLINK-9681] [table] Make sure difference between minRetentionTime and 
maxRetentionTime at least 5 minutes


## What is the purpose of the change

This PR aims to make sure difference between minRetentionTime and 
maxRetentionTime at least 5 minutes. Currently, for a group by(or other 
operators), if minRetentionTime equals to maxRetentionTime, the group by 
operator will register a timer for each record coming at different time which 
cause performance problem. The reasoning for having two parameters is that we 
can avoid to register many timers if we have more freedom when to discard 
state. As min equals to max cause performance problem it is better to make sure 
these two parameters are not same.


## Brief change log

  - Throw exception when difference between minRetentionTime and 
maxRetentionTime smaller than  5 minutes
  - Add a `QueryConfigTest` class extends from StreamQueryConfig. 
`QueryConfigTest` don't have the 5min limitation which makes test more 
convenient.
  - Adapt tests.


## Verifying this change

This change added tests and can be verified as follows:
  - Added test that validates that min and max retention time are set 
correctly.

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): (no)
  - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
  - The serializers: (no)
  - The runtime per-record code paths (performance sensitive): (no)
  - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  - The S3 file system connector: (no)

## Documentation

  - Does this pull request introduce a new feature? (no)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hequn8128/flink retentionTime

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/6255.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #6255


commit b41df60dde422bce9317479c8d77304d6ee4857b
Author: hequn8128 
Date:   2018-07-04T14:11:38Z

[FLINK-9681] [table] Make sure difference between minRetentionTime and 
maxRetentionTime at least 5 minutes




---