Re: [DISCUSS] Move from Iterable to List in Cleaners

2023-02-07 Thread Peter Somogyi
Thanks for the suggestion, Duo.
I've tested it, and the improvement was similar when only
SnapshotFileCache was changed from Iterable to List.

I've updated the PR accordingly.

Thanks,
Peter



Re: [DISCUSS] Move from Iterable to List in Cleaners

2023-01-27 Thread Duo Zhang
After looking at the code, I think the direct reason is what Wei-Chiu
Chuang has already pointed out on the PR: Iterables.filter is lazily
evaluated, so when we iterate over the files in
SnapshotFileCache.getUnreferencedFiles, we actually trigger all of the
filter code while we are under the protection of the snapshot write
lock.
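
To make that laziness concrete, here is a minimal, self-contained sketch
(plain Guava, not HBase code, so the file names and printouts are purely
illustrative):

  import com.google.common.collect.Iterables;
  import java.util.Arrays;

  public class LazyFilterDemo {
    public static void main(String[] args) {
      // Iterables.filter returns a lazy view; the predicate does not
      // run here, no matter how expensive it is.
      Iterable<String> deletable = Iterables.filter(
        Arrays.asList("a.hfile", "b.hfile", "c.hfile"),
        f -> { System.out.println("expensive check: " + f); return true; });
      System.out.println("filter() returned, no checks have run yet");
      // The checks fire only now, during iteration. If this loop sat
      // inside a lock-protected section, every check would also run
      // under that lock.
      for (String f : deletable) {
        System.out.println("would delete: " + f);
      }
    }
  }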

I think the root problem here is the write lock; as the XXX comment in
the code says, it is inefficient. What we want to prevent is taking
snapshots at the same time, but we should still allow concurrent access
to the cache. Of course that is not easy, otherwise we would have it in
place already.

And on the fix here, I do not think we need to change every Iterable
to List in our code; a simple one-line fix is enough. In
SnapshotHFileCleaner.getDeletableFiles, convert files to a List and then
use that List to call SnapshotFileCache.getUnreferencedFiles. I think
this could fix the problem here.
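
A hedged sketch of what that one-line change could look like (the method
and the cache call mirror the 2.4-era SnapshotHFileCleaner, but treat the
exact signatures and fields here as assumptions, not the actual patch):

  // In SnapshotHFileCleaner: materialize the incoming Iterable before
  // the lock-protected cache lookup, so any pending Iterables.filter
  // predicates from upstream delegates run here, outside the snapshot
  // write lock.
  @Override
  public Iterable<FileStatus> getDeletableFiles(Iterable<FileStatus> files) {
    try {
      // Lists.newArrayList walks 'files' eagerly, forcing any lazy
      // filter chain to evaluate now.
      List<FileStatus> materialized = Lists.newArrayList(files);
      // Only this in-memory list is consumed while the lock is held.
      return cache.getUnreferencedFiles(materialized, master.getSnapshotManager());
    } catch (IOException e) {
      // On error, report nothing deletable rather than risk deleting
      // files a snapshot still references.
      return Collections.emptyList();
    }
  }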

Thanks.



Re: [DISCUSS] Move from Iterable to List in Cleaners

2023-01-26 Thread Tak Lon (Stephen) Wu
+1 for this change

I tested this part on S3; it is as fast as Peter mentioned, and it cuts
down a lot of chore time when a user is actively using snapshots and
the archive is growing very fast.

-Stephen



[DISCUSS] Move from Iterable to List in Cleaners

2023-01-26 Thread Peter Somogyi
Hi,

I was doing performance testing of the CleanerChores using an S3 root
directory with HBase 2.4. When the archive directory was large, the
HFile cleaner threads were blocked on acquiring the SnapshotWriteLock.
The flame graph [1] showed that inside
SnapshotFileCache.getUnreferencedFiles there were a lot of listing
operations to S3. The lock was held for 45 seconds with a directory
containing 1000 files.
I've also seen one case where a snapshot creation failed (timed out);
I assume that can also be related to the long lock hold time.

The locking time can be drastically reduced by changing the
Iterable<FileStatus> parameter to List<FileStatus> in
FileCleanerDelegate.getDeletableFiles [2]. With this change, the lock
was released in approximately 100 ms; however, the overall time for
file listing, evaluation, and deletion stayed the same (45 seconds).
Since the different cleaner threads are no longer blocked on the
snapshot lock, the overall deletion speed can be increased
significantly.
CleanerChore.checkAndDeleteFiles already receives a List<FileStatus>
parameter and later converts it to Iterable<FileStatus>, so I don't
expect a drastic change in the cleaners' memory usage.
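
For reference, a hedged sketch of the proposed signature change (the
real patch is in the PR [2]; everything except the one method under
discussion is trimmed away here):

  import java.util.List;
  import org.apache.hadoop.fs.FileStatus;

  public interface FileCleanerDelegate {
    // Before: Iterable<FileStatus> getDeletableFiles(Iterable<FileStatus> files);
    // After: callers hand over an already-materialized list, so no lazy
    // predicate chain can fire inside a delegate's lock-protected code.
    Iterable<FileStatus> getDeletableFiles(List<FileStatus> files);
  }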

I've done some testing with HDFS as well.
Cleanup of 100k files took 63518 ms with the Iterable implementation;
the lock was held for ~130 ms per 1000 files.
Cleanup of 100k files took 64411 ms with the List implementation; the
lock was held for ~2 ms per 1000 files.

Do you have any concerns about making this change?

Thanks,
Peter

[1] https://issues.apache.org/jira/browse/HBASE-27590
[2] https://github.com/apache/hbase/pull/4995