[GitHub] [flink] lirui-apache commented on a change in pull request #12609: [FLINK-17836][hive][doc] Add document for Hive dim join
lirui-apache commented on a change in pull request #12609:
URL: https://github.com/apache/flink/pull/12609#discussion_r439246374

## File path: docs/dev/table/hive/hive_streaming.md ##

@@ -163,4 +163,33 @@ SELECT * FROM hive_table /*+ OPTIONS('streaming-source.enable'='true', 'streamin
 ## Hive Table As Temporal Tables
 
-TODO
+Starting from Flink 1.11.0, you can use a Hive table as temporal table and join streaming data with it. Please follow
+the [example]({{ site.baseurl }}/dev/table/streaming/temporal_tables.html#temporal-table) to find out how to join a
+temporal table. When performing the join, the Hive table will be cached in TM memory and each record from the stream
+is looked up in the Hive table to decide whether a match is found. You don't need any extra settings to use a Hive table
+as temporal table. But optionally, you can configure the TTL of the Hive table cache with the following
+property. After the cache expires, the Hive table will be scanned again to load the latest data.
+
+| Key | Default | Type | Description |
+|---|---|---|---|
+| lookup.join.cache.ttl | 60 min | Duration | The cache TTL (e.g. 10min) for the build table in lookup join. By default the TTL is 60 minutes. |
+
+**Note**:
+1. You need to make sure the Hive table can fit into TM memory since the whole table will be cached.
+2. You should set a relatively large value for `lookup.join.cache.ttl`. You'll probably have performance issue if
+your Hive table needs to be updated and reloaded too frequently.

Review comment:
I have added some comments about this anyway, but w/o too many details.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [flink] lirui-apache commented on a change in pull request #12609: [FLINK-17836][hive][doc] Add document for Hive dim join
lirui-apache commented on a change in pull request #12609:
URL: https://github.com/apache/flink/pull/12609#discussion_r439183606

## File path: docs/dev/table/hive/hive_streaming.md ##

@@ -163,4 +163,33 @@ SELECT * FROM hive_table /*+ OPTIONS('streaming-source.enable'='true', 'streamin
 ## Hive Table As Temporal Tables
 
-TODO
+Starting from Flink 1.11.0, you can use a Hive table as temporal table and join streaming data with it. Please follow
+the [example]({{ site.baseurl }}/dev/table/streaming/temporal_tables.html#temporal-table) to find out how to join a
+temporal table. When performing the join, the Hive table will be cached in TM memory and each record from the stream
+is looked up in the Hive table to decide whether a match is found. You don't need any extra settings to use a Hive table
+as temporal table. But optionally, you can configure the TTL of the Hive table cache with the following
+property. After the cache expires, the Hive table will be scanned again to load the latest data.
+
+| Key | Default | Type | Description |
+|---|---|---|---|
+| lookup.join.cache.ttl | 60 min | Duration | The cache TTL (e.g. 10min) for the build table in lookup join. By default the TTL is 60 minutes. |
+
+**Note**:
+1. You need to make sure the Hive table can fit into TM memory since the whole table will be cached.
+2. You should set a relatively large value for `lookup.join.cache.ttl`. You'll probably have performance issue if
+your Hive table needs to be updated and reloaded too frequently.

Review comment:
I have already mentioned that the whole table will be cached, and the temporal table can be either partitioned or non-partitioned. It seems to me that talking about new/old partitions here might cause more confusion.
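
For readers following the thread, the caching behaviour described in the quoted doc text (the whole table is held in memory, each stream record is looked up against it, and the table is rescanned once the TTL expires) can be sketched roughly as below. This is only an illustration under stated assumptions, not Flink's actual implementation: the class `FullTableLookupCache` and the parameters `scan_table`, `key_of`, `ttl_seconds` and `clock` are hypothetical names made up for this sketch.

```python
import time
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

Row = Tuple  # hypothetical row representation for this sketch


class FullTableLookupCache:
    """Hypothetical sketch: cache a whole table in memory, reload it after a TTL."""

    def __init__(
        self,
        scan_table: Callable[[], Iterable[Row]],  # assumed: returns every row of the table
        key_of: Callable[[Row], Hashable],        # assumed: extracts the join key from a row
        ttl_seconds: float,                       # plays the role of lookup.join.cache.ttl
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._scan_table = scan_table
        self._key_of = key_of
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows_by_key: Dict[Hashable, List[Row]] = {}
        self._loaded_at: Optional[float] = None

    def _reload_if_expired(self) -> None:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            # Full scan: the entire table must fit in memory.
            rows_by_key: Dict[Hashable, List[Row]] = {}
            for row in self._scan_table():
                rows_by_key.setdefault(self._key_of(row), []).append(row)
            self._rows_by_key = rows_by_key
            self._loaded_at = now

    def lookup(self, key: Hashable) -> List[Row]:
        """Return all cached rows matching the key (empty list if no match)."""
        self._reload_if_expired()
        return self._rows_by_key.get(key, [])
```

The sketch also shows why the two notes in the doc text matter: every reload is a full scan, so a short TTL means frequent rescans, and rows added to the table stay invisible to lookups until the next reload.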