Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/18191 )
Change subject: IMPALA-9433: Improved caching of HdfsFileHandles
......................................................................


Patch Set 21:

(12 comments)

> > > (8 comments)
> > >
> > > Thank you for taking this on. There is a lot of history here.
> > > Originally, the file handle cache used a generic structure like this:
> > > https://github.com/apache/impala/blob/branch-2.10.0/be/src/util/lru-cache.h
> > >
> > > In my rewrite, I switched it to remove the generic structure. This
> > > heads back in the other direction.
> > >
> > > I like that you have backend tests for the generic data structure,
> > > which is definitely one advantage of that approach. One question I
> > > have about moving back to a generic structure is whether we would
> > > be able to add new customization to the file handle cache case. I
> > > had been thinking about adding a file structure that could contain
> > > additional per-file data and/or stats. Is that possible with the
> > > new generic structure?
> >
> > "had been thinking about adding a file structure that could contain
> > additional per-file data and/or stat"
> >
> > I was thinking about similar things (e.g. caching processed
> > Parquet/ORC headers), but this seems a somewhat different feature
> > to me - while we want to cache more than one file handle per file
> > and apply LRU logic per handle, we want to cache data for a file
> > only once and apply LRU logic per file.
>
> Yeah, it's unclear whether we would ever want to extend the file handle cache
> to deal with other things. Separate data structures may be cleaner even
> if it means duplicating filename strings or other things. The file handle
> cache is pretty unusual in structure and historically we haven't extended
> it. I don't have any strong objection to a generic structure. I just wanted
> to think through whether there are any extensions that would end up
> getting more complicated.
>
> For more ordinary caches that don't need duplication, we should be using the
> cache implementations in be/src/util/cache, because that also gets us
> different cache eviction policies.

Moving the caching to a separate class definitely helps with testing. I think
making it generic helps with encapsulation too, as the caching algorithm (even
if it is somewhat specialized to a use case) has nothing to do with the stored
data. Moreover, making it generic helped with unit testing as well, since I
didn't have to juggle file handles around to test the caching behavior.

As I understand it, the requirements were simple enough to fit into a generic
structure. This would also help in the future to decide - if we ever want to
use something similar to this - whether it's easier to extend it with some
configurability or not. We can specialize this more towards file handle
caching, although the arguments for a templated key/value still apply in that
case; maybe we can change the name and location of the code to something less
generic.

Regarding per-file data/stats, we could squeeze it in (e.g.
unordered_map<string, pair<Data, list<Handler>>>), but I don't think that is a
good direction, as it breaks the encapsulation of the cache.
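To make the layout concrete, here is a minimal, hypothetical sketch of the map-of-lists idea discussed above - one hash-table entry per key owning a list of cached values, so a long file name is stored once no matter how many handles are cached for it (unlike a multimap, where every entry carries its own copy of the key). The names `MultiValueCache`, `Emplace`, and `Get` are illustrative, not the patch's actual `LruMultiCache` API, and LRU bookkeeping, capacity limits, and locking are deliberately left out:

```cpp
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical sketch of a map-of-lists cache: each key is stored exactly
// once in the unordered_map, and the mapped std::list owns all values
// cached under that key.
template <typename Key, typename Value>
class MultiValueCache {
 public:
  // Caches another value under 'key'. The key is only stored the first
  // time it is seen; later insertions reuse the existing map entry.
  void Emplace(const Key& key, const Value& value) {
    map_[key].push_front(value);
  }

  // Returns a pointer to the most recently inserted value for 'key', or
  // nullptr if nothing is cached. The value stays in the cache.
  Value* Get(const Key& key) {
    auto it = map_.find(key);
    if (it == map_.end() || it->second.empty()) return nullptr;
    return &it->second.front();
  }

  // Number of distinct keys (each key string stored exactly once).
  size_t NumKeys() const { return map_.size(); }

  // Total number of cached values across all keys.
  size_t NumValues() const {
    size_t n = 0;
    for (const auto& kv : map_) n += kv.second.size();
    return n;
  }

 private:
  // std::list gives pointer/iterator stability, which is what lets an
  // intrusive (non-owning) LRU list point at elements for O(1) unlink.
  std::unordered_map<Key, std::list<Value>> map_;
};
```

Because `std::list` elements never move in memory, an intrusive LRU list (as in the patch) can hook into the stored values without a second owning container.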
I would rather put a container in the FileHandleCache class next to the cache
and do handle-based operations/metrics during creation (the GetFileHandle()
new-entry branch) and access-based operations/metrics in
FileHandleCache::Accessor.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.h
File be/src/runtime/io/handle-cache.h:

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.h@118
PS21, Line 118: return file_handle.mtime() == mtime;
> The semantics of mtime is unclear to me - why do we handle it separately a
Very good point, I think I got stuck on the earlier design :)

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.inline.h
File be/src/runtime/io/handle-cache.inline.h:

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.inline.h@167
PS21, Line 167: // Opening a file handle requires talking to the NameNode so it can take some time.
             : RETURN_IF_ERROR(accessor_tmp.Get()->Init(hdfs_monitor_));
> Let me double-check my understanding of the threading here:
Yes, it will be in_use, so it does not lock the cache and is not available to
other threads.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.inline.h@167
PS21, Line 167: // Opening a file handle requires talking to the NameNode so it can take some time.
             : RETURN_IF_ERROR(accessor_tmp.Get()->Init(hdfs_monitor_));
> Another question: How does this work if this call fails?
> Does the entry get
Very good point; now it will be released back to the cache. I will add a
destroy.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/runtime/io/handle-cache.inline.h@170
PS21, Line 170: it/out
> Nit: in/out
Done

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.h
File be/src/util/lru-multi-cache.h:

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.h@78
PS21, Line 78: /// this is a intrusive (as known as not owning) list, used for eviction
> nit: an
Done

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.h@125
PS21, Line 125: resf ot
> Nit: rest of
Done

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.h@170
PS21, Line 170: Container_internal& container;
             : typename Container_internal::iterator it;
> Did we consider using an intrusive list for this?
Something has to own the objects. The point of unordered_map<string, list> was
to spare the duplication of stored keys compared to a multimap.

Another option would be to have a contiguous container (e.g. vector) own them
and add an intrusive list to the map too. This could improve cache locality
(which in my opinion is not that useful while we already use unordered_map; we
could switch to a better hash table, but that seems a bit overkill for this
task), but then eviction would be more complicated: we would need to add some
kind of buffer to the size and a garbage collection mechanism, as removing
elements one by one would be even worse than using an owning list.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.inline.h
File be/src/util/lru-multi-cache.inline.h:

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.inline.h@147
PS21, Line 147: // Move the object to the back of the owning list as it is no longer available
             : container.splice(container.end(), container, container_it);
> Why move it to the back?
> It could stay where it is and Get() would skip ove
No other reason, just what you mentioned. With the mtime moved into the key, I
could even remove the loop and only check the first element, making Get() truly
O(1), but I agree it's not a significant speedup; I can keep the list order as
it is.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.inline.h@197
PS21, Line 197: p_value_internal->timestamp_seconds = MonotonicSeconds()
> The old behavior does reset the timestamp each time we use it. The code is
We still have that behavior in EvictOlderThan(). I just realized I reversed the
order of the LRU list (i.e. now it walks from the back); that might be the
cause of some confusion.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.inline.h@197
PS21, Line 197: p_value_internal->timestamp_seconds = MonotonicSeconds()
> I think that we didn't refresh timestamp_second in the old version, so soon
Yes, it did in emplace (but we don't pass it as a constructor parameter).

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.inline.h@201
PS21, Line 201: // Move the object to the front, keep LRU relation in owning list too to
             : // be able to age out unused objects
             : container.splice(container.begin(), container, p_value_internal->it);
> Do we need to be moving entries? I'm not sure this gets us anything.
I agree that moving during Get() is not necessary, but I think it is useful
during release and helps with aging out. E.g. we have 3 objects in use, and we
release one every second in the order of the list. Without the change, the ages
of the objects will be reversed (i.e. #1 will be the oldest and #3 the most
recently released), but Get() will still pick up #1.
If we fix the order in the owning list too, a new call can pick up #3 and let
#1 age out earlier.

http://gerrit.cloudera.org:8080/#/c/18191/21/be/src/util/lru-multi-cache.inline.h@222
PS21, Line 222: if (container.size() == 1) {
             :   // Last element, owning list can be removed to prevent aging
             :   hash_table_.erase(p_value_internal->key);
             : } else {
             :   // Remove from owning list
             :   container.erase(p_value_internal->it);
             : }
> When/how does the element get removed from the LRU list? (I'm guessing it h
At this point the element is in use, so it is not in the LRU list; it got
unlinked at line 151 in Get(). auto_unlink allows that interface, and this is
also why I use a doubly linked list, for O(1) unlink.

--
To view, visit http://gerrit.cloudera.org:8080/18191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b5c5e9e2b5db2847ab88c41f667c9ca1b03d51a
Gerrit-Change-Number: 18191
Gerrit-PatchSet: 21
Gerrit-Owner: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 23 Feb 2022 09:29:47 +0000
Gerrit-HasComments: Yes
