Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18191 )

Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
......................................................................


Patch Set 21:

> > (8 comments)
 > >
 > > Thank you for taking this on. There is a lot of history here.
 > > Originally, the file handle cache used a generic structure like
 > > this:
 > > https://github.com/apache/impala/blob/branch-2.10.0/be/src/util/lru-cache.h
 > >
 > > In my rewrite, I switched it to remove the generic structure.
 > This
 > > heads back in the other direction.
 > >
 > > I like that you have backend tests for the generic data
 > structure,
 > > which is definitely one advantage of that approach. One question
 > I
 > > have about moving back to a generic structure is whether we would
 > > be able to add new customization to the file handle cache case. I
 > > had been thinking about adding a file structure that could
 > contain
 > > additional per-file data and/or stats. Is that possible with the
 > > new generic structure?
 >
 > "had been thinking about adding a file structure that could contain
 > additional per-file data and/or stat"
 >
 > I was thinking about similar things (e.g. caching processed
 > Parquet/ORC headers), but this seems a somewhat different feature
 > to me - while we want to cache more than one file handle per file
 > and apply LRU logic per handle, we want to cache data for a file
 > only once and apply LRU logic per file.

Yeah, it's unclear whether we would ever want to extend the file handle cache
to deal with other things. Separate data structures may be cleaner even
if it means duplicating filename strings or other things. The file handle
cache is pretty unusual in structure and historically we haven't extended
it. I don't have any strong objection to a generic structure. I just wanted
to think through whether there are any extensions that would end up
getting more complicated.

For more ordinary caches that don't need duplication, we should be using the
cache implementations in be/src/util/cache, because that also gets us different
cache eviction policies.


--
To view, visit http://gerrit.cloudera.org:8080/18191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
Gerrit-Change-Number: 18191
Gerrit-PatchSet: 21
Gerrit-Owner: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 22 Feb 2022 18:46:25 +0000
Gerrit-HasComments: No

Reply via email to