
[GitHub] [flink] lirui-apache commented on a change in pull request #12609: [FLINK-17836][hive][doc] Add document for Hive dim join

2020-06-12 Thread GitBox



lirui-apache commented on a change in pull request #12609:
URL: https://github.com/apache/flink/pull/12609#discussion_r439246374



##
File path: docs/dev/table/hive/hive_streaming.md
##
@@ -163,4 +163,33 @@ SELECT * FROM hive_table /*+ 
OPTIONS('streaming-source.enable'='true', 'streamin
 
 ## Hive Table As Temporal Tables
 
-TODO
+Starting from Flink 1.11.0, you can use a Hive table as temporal table and join streaming data with it. Please follow
+the [example]({{ site.baseurl }}/dev/table/streaming/temporal_tables.html#temporal-table) to find out how to join a
+temporal table. When performing the join, the Hive table will be cached in TM memory and each record from the stream
+is looked up in the Hive table to decide whether a match is found. You don't need any extra settings to use a Hive table
+as temporal table. But optionally, you can configure the TTL of the Hive table cache with the following
+property. After the cache expires, the Hive table will be scanned again to load the latest data.
+
+| Key | Default | Type | Description |
+| --- | --- | --- | --- |
+| lookup.join.cache.ttl | 60 min | Duration | The cache TTL (e.g. 10min) for the build table in lookup join. By default the TTL is 60 minutes. |
+
+**Note**:
+1. You need to make sure the Hive table can fit into TM memory since the whole table will be cached.
+2. You should set a relatively large value for `lookup.join.cache.ttl`. You'll probably have performance issue if
+your Hive table needs to be updated and reloaded too frequently.

Review comment:
   I have added some comments about this anyway, but without going into much detail.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] lirui-apache commented on a change in pull request #12609: [FLINK-17836][hive][doc] Add document for Hive dim join

2020-06-11 Thread GitBox



lirui-apache commented on a change in pull request #12609:
URL: https://github.com/apache/flink/pull/12609#discussion_r439183606



##
File path: docs/dev/table/hive/hive_streaming.md
##
@@ -163,4 +163,33 @@ SELECT * FROM hive_table /*+ 
OPTIONS('streaming-source.enable'='true', 'streamin
 
 ## Hive Table As Temporal Tables
 
-TODO
+Starting from Flink 1.11.0, you can use a Hive table as temporal table and join streaming data with it. Please follow
+the [example]({{ site.baseurl }}/dev/table/streaming/temporal_tables.html#temporal-table) to find out how to join a
+temporal table. When performing the join, the Hive table will be cached in TM memory and each record from the stream
+is looked up in the Hive table to decide whether a match is found. You don't need any extra settings to use a Hive table
+as temporal table. But optionally, you can configure the TTL of the Hive table cache with the following
+property. After the cache expires, the Hive table will be scanned again to load the latest data.
+
+| Key | Default | Type | Description |
+| --- | --- | --- | --- |
+| lookup.join.cache.ttl | 60 min | Duration | The cache TTL (e.g. 10min) for the build table in lookup join. By default the TTL is 60 minutes. |
+
+**Note**:
+1. You need to make sure the Hive table can fit into TM memory since the whole table will be cached.
+2. You should set a relatively large value for `lookup.join.cache.ttl`. You'll probably have performance issue if
+your Hive table needs to be updated and reloaded too frequently.

Review comment:
   I have mentioned that the whole table will be cached, and that the temporal table can be either partitioned or non-partitioned. It seems to me that talking about new/old partitions here might cause more confusion.
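
   For readers following the thread, the caching behaviour described in the quoted doc (the whole table is held in memory, each stream record is looked up against it, and the table is rescanned once the TTL has expired) can be sketched roughly as below. This is only an illustrative Python model, not Flink's implementation: `WholeTableCache`, `scan_table`, and the injectable `clock` are hypothetical names invented for this sketch.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

Row = Tuple[str, str]  # (join key, value); a stand-in for a table row


class WholeTableCache:
    """Hypothetical model of a whole-table lookup cache with a TTL."""

    def __init__(
        self,
        scan_table: Callable[[], Iterable[Row]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._scan_table = scan_table  # full scan of the dimension table
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None

    def _reload_if_expired(self) -> None:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            # The whole table is reloaded, regardless of partitioning.
            self._rows = dict(self._scan_table())
            self._loaded_at = now

    def lookup(self, key: str) -> Optional[str]:
        """Look up one stream record's key; None means no match."""
        self._reload_if_expired()
        return self._rows.get(key)
```

   In this model every reload is a full scan, so a short TTL on a large table means frequent, expensive rescans, and rows added to the table stay invisible to lookups until the next reload.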




