Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/18191 )
Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
......................................................................


Patch Set 21:

(12 comments)

> > > (8 comments)
> > >
> > > Thank you for taking this on. There is a lot of history here.
> > > Originally, the file handle cache used a generic structure like this:
> > > https://github.com/apache/impala/blob/branch-2.10.0/be/src/util/lru-cache.h
> > >
> > > In my rewrite, I switched it to remove the generic structure. This
> > > heads back in the other direction.
> > >
> > > I like that you have backend tests for the generic data structure,
> > > which is definitely one advantage of that approach. One question I
> > > have about moving back to a generic structure is whether we would
> > > be able to add new customization to the file handle cache case. I
> > > had been thinking about adding a file structure that could contain
> > > additional per-file data and/or stats. Is that possible with the
> > > new generic structure?
> >
> > "had been thinking about adding a file structure that could contain
> > additional per-file data and/or stat"
> >
> > I was thinking about similar things (e.g. caching processed
> > Parquet/ORC headers), but this seems a somewhat different feature
> > to me - while we want to cache more than one file handle per file
> > and apply LRU logic per handle, we want to cache data for a file
> > only once and apply LRU logic per file.
>
> Yeah, it's unclear whether we would ever want to extend the file handle cache
> to deal with other things. Separate data structures may be cleaner even
> if it means duplicating filename strings or other things. The file handle
> cache is pretty unusual in structure and historically we haven't extended
> it. I don't have any strong objection to a generic structure. I just wanted
> to think through whether there are any extensions that would end up
> getting more complicated.
>
> For more ordinary caches that don't need duplication, we should be using the
> cache implementations in be/src/util/cache, because that also gets us
> different cache eviction policies.

Moving the caching to a separate class definitely helps with testing. I think
making it generic helps with encapsulation too, as the caching algorithm (even
if it is somewhat specialized to a use case) has nothing to do with the stored
data. Moreover, making it generic helped with unit testing as well, since I
didn't have to juggle file handles around to test the caching behavior.

As I understand it, the requirements were simple enough to fit into a generic
structure. This would also help in the future to decide - if we ever want to
use something similar to this - whether it's easier to extend it with some
configurability or not. We can specialize this more towards file handle
caching, although the arguments for a templated key/value still apply in that
case; maybe we can change the name and location of the code to something less
generic.

Regarding per-file data/stats, we could squeeze it in (e.g.
unordered_map<string, pair<Data, list<Handler>>>), but I don't think that is a
good direction, as it breaks the encapsulation of the cache.
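To make the layout concrete, here is a minimal, hypothetical sketch of the map-of-lists idea discussed above - one hash-table entry per key owning a list of cached values, so a long file name is stored once no matter how many handles are cached for it (unlike a multimap, where every entry carries its own copy of the key). The names `MultiValueCache`, `Emplace`, and `Get` are illustrative, not the patch's actual `LruMultiCache` API, and LRU bookkeeping, capacity limits, and locking are deliberately left out:

```cpp
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical sketch of a map-of-lists cache: each key is stored exactly
// once in the unordered_map, and the mapped std::list owns all values
// cached under that key.
template <typename Key, typename Value>
class MultiValueCache {
 public:
  // Caches another value under 'key'. The key is only stored the first
  // time it is seen; later insertions reuse the existing map entry.
  void Emplace(const Key& key, const Value& value) {
    map_[key].push_front(value);
  }

  // Returns a pointer to the most recently inserted value for 'key', or
  // nullptr if nothing is cached. The value stays in the cache.
  Value* Get(const Key& key) {
    auto it = map_.find(key);
    if (it == map_.end() || it->second.empty()) return nullptr;
    return &it->second.front();
  }

  // Number of distinct keys (each key string stored exactly once).
  size_t NumKeys() const { return map_.size(); }

  // Total number of cached values across all keys.
  size_t NumValues() const {
    size_t n = 0;
    for (const auto& kv : map_) n += kv.second.size();
    return n;
  }

 private:
  // std::list gives pointer/iterator stability, which is what lets an
  // intrusive (non-owning) LRU list point at elements for O(1) unlink.
  std::unordered_map<Key, std::list<Value>> map_;
};
```

Because `std::list` elements never move in memory, an intrusive LRU list (as in the patch) can hook into the stored values without a second owning container.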
I would rather put a container in the FileHandleCache class next to the cache
and do handle-based operations/metrics during creation (the GetFileHandle()
new-entry branch) and access-based operations/metrics in
FileHandleCache::Accessor.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.h
File be/src/runtime/io/handle-cache.h:

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.h@118
PS21, Line 118: return file_handle.mtime() == mtime;
> The semantics of mtime is unclear to me - why do we handle it separately a
Very good point, I think I got stuck on the earlier design :)

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.inline.h
File be/src/runtime/io/handle-cache.inline.h:

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.inline.h@167
PS21, Line 167: // Opening a file handle requires talking to the NameNode so it can take some time.
             : RETURN_IF_ERROR(accessor_tmp.Get()->Init(hdfs_monitor_));
> Let me double-check my understanding of the threading here:
Yes, it will be in_use, so it does not lock the cache and is not available to
other threads.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.inline.h@167
PS21, Line 167: // Opening a file handle requires talking to the NameNode so it can take some time.
             : RETURN_IF_ERROR(accessor_tmp.Get()->Init(hdfs_monitor_));
> Another question: How does this work if this call fails?
> Does the entry get
Very good point; now it will be released back to the cache. I will add a
destroy.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.inline.h@170
PS21, Line 170: it/out
> Nit: in/out
Done

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.h
File be/src/util/lru-multi-cache.h:

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.h@78
PS21, Line 78: /// this is a intrusive (as known as not owning) list, used for eviction
> nit: an
Done

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.h@125
PS21, Line 125: resf ot
> Nit: rest of
Done

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.h@170
PS21, Line 170: Container_internal& container;
             : typename Container_internal::iterator it;
> Did we consider using an intrusive list for this?
Something has to own the objects. The point of unordered_map<string, list> was
to spare the duplication of stored keys compared to a multimap.

Another option would be to have a contiguous container (e.g. vector) own them
and add an intrusive list to the map too. This could improve cache locality
(which in my opinion is not that useful while we already use unordered_map; we
could switch to a better hash table, but that seems a bit overkill for this
task), but then eviction would be more complicated: we would need to add some
kind of buffer to the size and a garbage collection mechanism, as removing
elements one by one would be even worse than using an owning list.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.inline.h
File be/src/util/lru-multi-cache.inline.h:

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.inline.h@147
PS21, Line 147: // Move the object to the back of the owning list as it is no longer available
             : container.splice(container.end(), container, container_it);
> Why move it to the back?
> It could stay where it is and Get() would skip ove
No other reason, just what you mentioned. With the mtime moved into the key, I
could even remove the loop and only check the first element, making Get() truly
O(1), but I agree it's not a significant speedup; I can keep the list order as
it is.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.inline.h@197
PS21, Line 197: p_value_internal->timestamp_seconds = MonotonicSeconds()
> The old behavior does reset the timestamp each time we use it. The code is
We still have that behavior in EvictOlderThan(). I just realized I reversed the
order of the LRU list (i.e. now it walks from the back); that might be the
cause of some confusion.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.inline.h@197
PS21, Line 197: p_value_internal->timestamp_seconds = MonotonicSeconds()
> I think that we didn't refresh timestamp_second in the old version, so soon
Yes, it did in emplace (but we don't pass it as a constructor parameter).

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.inline.h@201
PS21, Line 201: // Move the object to the front, keep LRU relation in owning list too to
             : // be able to age out unused objects
             : container.splice(container.begin(), container, p_value_internal->it);
> Do we need to be moving entries? I'm not sure this gets us anything.
I agree that moving during Get() is not necessary, but I think it is useful
during release and helps with aging out. E.g. we have 3 objects in use, and we
release one every second in the order of the list. Without the change, the ages
of the objects will be reversed (i.e. #1 will be the oldest and #3 the most
recently released), but Get() will still pick up #1.
If we fix the order in the owning list too, a new call can pick up #3 and let
#1 age out earlier.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.inline.h@222
PS21, Line 222: if (container.size() == 1) {
             :   // Last element, owning list can be removed to prevent aging
             :   hash_table_.erase(p_value_internal->key);
             : } else {
             :   // Remove from owning list
             :   container.erase(p_value_internal->it);
             : }
> When/how does the element get removed from the LRU list? (I'm guessing it h
At this point the element is in use, so it is not in the LRU list; it got
unlinked at line 151 in Get(). auto_unlink allows that interface, and this is
also why I use a doubly linked list, for O(1) unlink.

--
To view, visit http://gerrit.cloudera.org:8080/18191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
Gerrit-Change-Number: 18191
Gerrit-PatchSet: 21
Gerrit-Owner: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 23 Feb 2022 09:29:47 +0000
Gerrit-HasComments: Yes
