[ https://issues.apache.org/jira/browse/HBASE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsh J updated HBASE-6339:
---------------------------

    Description:
I noticed that right now, under a bulkLoadHFiles call to an RS, we grab the HRegion write lock as soon as we determine that it is a multi-family bulk load we'll be attempting. The file copy from the caller's source FS is done after holding the lock.

This doesn't seem right. For instance, we had a recent use-case where the bulk load running cluster is a separate HDFS instance/cluster than the one that runs HBase and the transfers between these FSes can get slower than an intra-cluster transfer. Hence I think we should begin to hold the write lock only after we've got a successful destinationFS copy of the requested file, and thereby allow more write throughput to pass.

Does this sound reasonable to do?

  was:
I noticed that right now, under a bulkLoadHFiles call to an RS, we grab the write lock as soon as we determine that it is a multi-family bulk load we'll be attempting. The file copy from the caller's source FS is done after holding the lock.

This doesn't seem right. For instance, we had a recent use-case where the bulk load running cluster is a separate HDFS instance/cluster than the one that runs HBase and the transfers between these FSes can get slower than an intra-cluster transfer. Hence I think we should begin to hold the write lock only after we've got a successful destinationFS copy of the requested file, and thereby allow more write throughput to pass.

Does this sound reasonable to do?
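The ordering proposed in the description (copy first, lock second) can be illustrated with a minimal sketch. This is not the HBase implementation: the class BulkLoadSketch, its fields, and its methods are hypothetical, and local java.nio file operations stand in for the cross-cluster HDFS copy. The only point it shows is that the slow transfer runs with no lock held, and the write lock covers just the cheap rename and bookkeeping.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a region; none of these names come from HBase.
class BulkLoadSketch {
    private final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
    private final List<Path> storeFiles = new ArrayList<>();
    private final Path regionDir;

    BulkLoadSketch(Path regionDir) {
        this.regionDir = regionDir;
    }

    // Copies every source file into a staging directory without holding the
    // lock, then takes the write lock only to move the files into place.
    List<Path> bulkLoad(List<Path> sources) throws IOException {
        // Phase 1, no lock held: the potentially slow cross-filesystem copy.
        // Staging lives under regionDir so the later move stays on one filesystem.
        Path staging = Files.createTempDirectory(regionDir, ".staging-");
        List<Path> staged = new ArrayList<>();
        for (Path src : sources) {
            Path dst = staging.resolve(src.getFileName());
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
            staged.add(dst);
        }

        // Phase 2, write lock held: only renames and bookkeeping, so other
        // writers are blocked for a short time and all files appear together.
        List<Path> loaded = new ArrayList<>();
        regionLock.writeLock().lock();
        try {
            for (Path p : staged) {
                Path finalPath = regionDir.resolve(p.getFileName());
                Files.move(p, finalPath, StandardCopyOption.ATOMIC_MOVE);
                storeFiles.add(finalPath);
                loaded.add(finalPath);
            }
        } finally {
            regionLock.writeLock().unlock();
        }

        // The staging directory is empty once every file has been moved.
        Files.delete(staging);
        return loaded;
    }

    int storeFileCount() {
        return storeFiles.size();
    }

    boolean isWriteLocked() {
        return regionLock.isWriteLocked();
    }
}
```

If a copy fails in phase 1, the exception propagates before the lock is ever taken, so writers are never blocked by a failed transfer. Cleanup of partially staged files is left out of the sketch.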
> Bulkload call to RS should begin holding write lock only after the file has been transferred
> --------------------------------------------------------------------------------------------
>
> Key: HBASE-6339
> URL: https://issues.apache.org/jira/browse/HBASE-6339
> Project: HBase
> Issue Type: Improvement
> Components: client, regionserver
> Affects Versions: 0.90.0
> Reporter: Harsh J
> Assignee: Harsh J
>
> I noticed that right now, under a bulkLoadHFiles call to an RS, we grab the
> HRegion write lock as soon as we determine that it is a multi-family bulk
> load we'll be attempting. The file copy from the caller's source FS is done
> after holding the lock.
> This doesn't seem right. For instance, we had a recent use-case where the
> bulk load running cluster is a separate HDFS instance/cluster than the one
> that runs HBase and the transfers between these FSes can get slower than an
> intra-cluster transfer. Hence I think we should begin to hold the write lock
> only after we've got a successful destinationFS copy of the requested file,
> and thereby allow more write throughput to pass.
> Does this sound reasonable to do?