Re: [DISCUSS] Move from Iterable to List in Cleaners
Thanks for the suggestion Duo. I've tested it and the improvement was similar when only SnapshotFileCache was changed from Iterable to List. I've updated the PR accordingly.

Thanks,
Peter

On Sat, Jan 28, 2023 at 5:07 AM 张铎 (Duo Zhang) wrote:
> After looking at the code, I think the direct reason is what Wei-Chiu
> Chuang has already pointed out on the PR: Iterables.filter is lazy
> evaluation, so when we iterate over the files in
> SnapshotFileCache.getUnreferencedFiles, we actually trigger all the
> filter code while under the snapshot write lock.
> [...]
Re: [DISCUSS] Move from Iterable to List in Cleaners
After looking at the code, I think the direct reason is what Wei-Chiu Chuang has already pointed out on the PR: Iterables.filter is lazy evaluation, so when we iterate over the files in SnapshotFileCache.getUnreferencedFiles, we actually trigger all the filter code while we are under the snapshot write lock.

I think the root problem here is the write lock; as the XXX comment says, it is inefficient. What we want to prevent is taking snapshots at the same time, but we should allow accessing the cache at the same time. Of course it is not easy, otherwise we would have it in place already.

And on the fix here, I do not think we need to change all the Iterables to Lists in our code; a simple one-line fix is enough. In SnapshotHFileCleaner.getDeletableFiles, convert files to a List and then use that List to call SnapshotFileCache.getUnreferencedFiles. I think this could fix the problem here.

Thanks.

On Fri, Jan 27, 2023 at 01:55, Tak Lon (Stephen) Wu wrote:
> +1 for this change
> [...]
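Duo's point about lazy evaluation can be sketched with java.util.stream, which, like Guava's Iterables.filter, defers the predicate until the result is actually iterated. The file names and predicate below are made up for illustration; this is not the HBase code, just the laziness behavior it relies on:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class LazyFilterDemo {
    // Returns {predicate calls after creating the lazy view,
    //          predicate calls after iterating it,
    //          size of the eagerly materialized list}
    static int[] demo() {
        List<String> files = Arrays.asList("a.hfile", "b.hfile", "c.hfile");
        AtomicInteger evaluations = new AtomicInteger();

        // Lazy view (same behavior as Guava's Iterables.filter): building it
        // does not run the predicate at all.
        Iterable<String> lazy = files.stream()
            .filter(f -> { evaluations.incrementAndGet(); return !f.startsWith("b"); })
            ::iterator;
        int afterCreation = evaluations.get(); // 0: nothing evaluated yet

        // Iterating triggers every predicate call; in the HBase case this
        // iteration happened inside the snapshot write lock.
        for (String f : lazy) { }
        int afterIteration = evaluations.get(); // 3: one call per file

        // The fix in spirit: materialize eagerly, so later consumers of the
        // result never re-enter the filter code.
        List<String> eager = files.stream()
            .filter(f -> !f.startsWith("b"))
            .collect(Collectors.toList());

        return new int[] { afterCreation, afterIteration, eager.size() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [0, 3, 2]
    }
}
```

The one-line fix Duo describes amounts to moving from the first form to the second before the lock-protected call.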
Re: [DISCUSS] Move from Iterable to List in Cleaners
+1 for this change.

I tested this part on S3; it is as fast as Peter mentioned and cuts down a lot of chore time if a user is actively using a snapshot and the archive is growing very fast.

-Stephen

On Thu, Jan 26, 2023 at 8:36 AM Peter Somogyi wrote:
> Hi,
> [...]
[DISCUSS] Move from Iterable to List in Cleaners
Hi,

I was doing performance testing of the CleanerChores using an S3 root directory with HBase 2.4. When the archive directory was large, the HFile cleaner threads were blocked on acquiring the SnapshotWriteLock. The flame graph [1] showed that inside SnapshotFileCache.getUnreferencedFiles there were a lot of listing operations to S3. The lock was held for 45 seconds with a directory containing 1000 files. I've also seen one case when a snapshot creation failed (timed out); I assume that can also be related to the long locking.

The locking time can be drastically reduced by changing the Iterable parameter to List for FileCleanerDelegate.getDeletableFiles. With this change, the lock was released in approximately 100 ms; however, the overall time for file listing, evaluation, and deletion stayed the same (45 s). Since the different cleaner threads are no longer blocked on the snapshot lock, the overall deletion speed can be increased significantly. CleanerChore.checkAndDeleteFiles already receives a List parameter and later converts it to an Iterable, so I don't expect a drastic change in the Cleaners' memory usage.

I've done some testing with HDFS as well.
Cleanup of 100k files took 63518 ms with the Iterable implementation; the lock was held for ~130 ms for every 1000 files.
Cleanup of 100k files took 64411 ms with the List implementation; the lock was held for ~2 ms for every 1000 files.

Do you have any concerns about making this change?

Thanks,
Peter

[1] https://issues.apache.org/jira/browse/HBASE-27590
[2] https://github.com/apache/hbase/pull/4995
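The lock-scope improvement described above can be sketched roughly as follows. This is a simplified stand-in, not the actual HBase code: the class, the `referenced` data, and the method body are hypothetical, and only the shape matters — the candidates arrive as an already-materialized List, so the lock guards a quick in-memory check rather than lazy listing work against S3:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of the proposed pattern: callers materialize the candidate
// files into a List before the lock-protected call, so the snapshot lock
// covers only an in-memory membership check.
public class EagerCleanerSketch {
    private final ReentrantLock snapshotLock = new ReentrantLock();
    // Hypothetical set of files still referenced by a snapshot.
    private final List<String> referenced = List.of("ref1.hfile");

    List<String> getUnreferencedFiles(List<String> candidates) {
        snapshotLock.lock();
        try {
            List<String> unreferenced = new ArrayList<>();
            for (String f : candidates) {
                if (!referenced.contains(f)) {
                    unreferenced.add(f);
                }
            }
            // The lock is released here, before any slow deletion happens,
            // so other cleaner threads and snapshot creation are not blocked.
            return unreferenced;
        } finally {
            snapshotLock.unlock();
        }
    }

    public static void main(String[] args) {
        EagerCleanerSketch cache = new EagerCleanerSketch();
        // "old.hfile" is not referenced by any snapshot, so it is deletable.
        System.out.println(
            cache.getUnreferencedFiles(List.of("ref1.hfile", "old.hfile")));
    }
}
```

With a lazy Iterable argument, the loop above would instead pull each candidate (and any backing S3 listing) while still holding the lock, which matches the 45-second hold times measured.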
