Lavinia-Stefania Sirbu created HBASE-21286:
----------------------------------------------

Summary: Parallelize computeHDFSBlocksDistribution when getting splits of a HBaseSnapshot
Key: HBASE-21286
URL: https://issues.apache.org/jira/browse/HBASE-21286
Project: HBase
Issue Type: Improvement
Components: snapshots
Affects Versions: 1.4.0
Reporter: Lavinia-Stefania Sirbu

Although this step is named computeHDFSBlocksDistribution, it is executed regardless of the file system that holds the snapshot.

For example, we observed a significant slowdown with a snapshot stored in S3 (~26k regions, 5 column families, 2 files per column family): getSplits takes ~40 min because of the S3 calls that list the files in order to find the best locations.

Parallelizing this operation can reduce the overall setup time. The thread pool size should be configurable, and a good choice for the setting could be "hbase.snapshot.thread.pool.max", which is also used in RestoreSnapshotHelper.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
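
A minimal sketch of the idea, not the actual HBase patch: each region's location lookup is submitted to a fixed-size thread pool and the results are collected in the original region order. The class name ParallelLocalitySketch, the method computeAll, and the computeLocations function are hypothetical stand-ins for the per-region computeHDFSBlocksDistribution call. The poolSize argument is assumed to be read by the caller from a setting such as "hbase.snapshot.thread.pool.max".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper for illustration only; it is not part of HBase.
final class ParallelLocalitySketch {

  /**
   * Runs computeLocations for every region on a fixed-size thread pool
   * and returns the results in the same order as the input list.
   *
   * poolSize is assumed to come from configuration (for example the value
   * of "hbase.snapshot.thread.pool.max"); this sketch takes it as a parameter.
   */
  static <R, L> List<L> computeAll(List<R> regions,
                                   Function<R, L> computeLocations,
                                   int poolSize)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      // Submit one task per region so slow remote listings overlap.
      List<Future<L>> futures = new ArrayList<>(regions.size());
      for (R region : regions) {
        Callable<L> task = () -> computeLocations.apply(region);
        futures.add(pool.submit(task));
      }
      // Collect in submission order; get() rethrows a task failure
      // wrapped in an ExecutionException.
      List<L> results = new ArrayList<>(regions.size());
      for (Future<L> future : futures) {
        results.add(future.get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in workload: the "regions" are integers and the "lookup"
    // multiplies by ten instead of listing files on a remote store.
    List<Integer> regions = Arrays.asList(1, 2, 3);
    List<Integer> locations = computeAll(regions, r -> r * 10, 2);
    System.out.println(locations);
  }
}
```

Collecting the futures in submission order keeps the split order deterministic, and a bounded pool limits the number of concurrent listing requests sent to the object store.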