Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/18191 )
Change subject: IMPALA-9433: Improved caching of HdfsFileHandles ...................................................................... Patch Set 21: > > (8 comments) > > > > Thank you for taking this on. There is a lot of history here. > > Originally, the file handle cache used a generic structure like > > this: > > https://github.com/apache/impala/blob/branch-2.10.0/be/src/util/lru-cache.h > > > > In my rewrite, I switched it to remove the generic structure. > This > > heads back in the other direction. > > > > I like that you have backend tests for the generic data > structure, > > which is definitely one advantage of that approach. One question > I > > have about moving back to a generic structure is whether we would > > be able to add new customization to the file handle cache case. I > > had been thinking about adding a file structure that could > contain > > additional per-file data and/or stats. Is that possible with the > > new generic structure? > > "had been thinking about adding a file structure that could contain > additional per-file data and/or stat" > > I was thinking about similar things (e.g. caching processed > Parquet/ORC headers), but this seems a somewhat different feature > to me - while we want to cache more than one file handle per file > and apply LRU logic per handle, we want to cache data for a file > only once and apply LRU logic per file. Yeah, it's unclear whether we would ever want to extend the file handle cache to deal with other things. Separate data structures may be cleaner even if it means duplicating filename strings or other things. The file handle cache is pretty unusual in structure and historically we haven't extended it. I don't have any strong objection to a generic structure. I just wanted to think through whether there are any extensions that would end up getting more complicated. For more ordinary caches that don't need duplication, we should be using the cache implementations in be/src/util/cache, because that also gets us different cache eviction policies. -- To view, visit http://gerrit.cloudera.org:8080/18191 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a Gerrit-Change-Number: 18191 Gerrit-PatchSet: 21 Gerrit-Owner: Gergely Fürnstáhl <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 22 Feb 2022 18:46:25 +0000 Gerrit-HasComments: No
