[GitHub] [incubator-hudi] garyli1019 edited a comment on issue #661: Tracking ticket for reporting Hudi usages from the community

GitBox Thu, 21 May 2020 11:11:14 -0700


garyli1019 edited a comment on issue #661:
URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632255322



   We have been using Hudi to manage a data lake with 500+ TB of manufacturing 
data for almost a year now. In the IoT world, late-arriving data and updates 
are very common, and Hudi handles both perfectly for us.
   We use Impala to query the data. Hudi's small-file handling and easy 
partitioning let us build an efficient layout that supports querying the 
data on the fly.
   In addition, incremental pulling, together with custom merging of 
historical data and change data, makes expensive batch jobs such as 
aggregating data for BI dashboards and maintaining a large graph database 
much more efficient.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
