garyli1019 edited a comment on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632255322
We have been using HUDI to manage a data lake with 500+ TB of manufacturing data for almost a year now. In the IoT world, late-arriving data and updates are very common, and HUDI handles them perfectly for us. We use Impala to query the data. HUDI's small-file handling and easy partitioning let us build an efficient layout for querying the data on the fly. In addition, incremental pulling, combined with custom merging of historical data and change data, makes expensive batch jobs such as aggregating BI dashboards and maintaining a large graph database much more efficient.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org