[ https://issues.apache.org/jira/browse/HBASE-20430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439985#comment-16439985 ]
Andrew Purtell edited comment on HBASE-20430 at 4/16/18 8:42 PM:
-----------------------------------------------------------------
I suppose this could also be handled with an enhancement to Hadoop's S3A to multiplex open files among the internal resource pools it already maintains. [~steve_l]

was (Author: apurtell):
I suppose this could also be handled with an enhancement to Hadoop's S3A to multiplex open files among the internal resource pools it already maintains.

> Improve store file management for non-HDFS filesystems
> -------------------------------------------------------
>
>                 Key: HBASE-20430
>                 URL: https://issues.apache.org/jira/browse/HBASE-20430
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Priority: Major
>
> HBase keeps a file open for every active store file so that no additional round trips to the NameNode are needed after the initial open. HDFS internally multiplexes open files, but the Hadoop S3 filesystem implementations do not, or at least not as well. As the bulk of data under management increases, the required number of concurrently open connections rises with it, and we expect it will eventually exhaust a limit somewhere: the client, the OS file descriptor table or open file limits, or the S3 service.
>
> Initially we can simply introduce an option to close every store file after the reader has finished, and measure the performance impact. Use cases backed by non-HDFS filesystems will already have to cope with a different read performance profile. Based on experiments with the S3-backed Hadoop filesystems, notably S3A, even with aggressively tuned options simple reads can be very slow on blockcache misses; we have observed 15-20 seconds for a Get of a single small row, for example. We already expect extensive use of the BucketCache to mitigate this in such deployments. It could be backed by offheap storage, but more likely by a large number of cache files managed by the file engine on local SSD storage. If misses are already going to be so expensive, then the motivation to do more than simply open store files on demand is largely absent.
>
> Still, we could employ a predictive cache: where frequent access to a given store file (or, at least, its store) is predicted, keep a reference to the store file open. We can keep statistics on read frequency, write them out to HFiles during compaction, and consult them when opening the region, perhaps by reading all meta blocks of the region's HFiles at open time. Otherwise, close the file after reading and open it again on demand. We need to be careful not to use ARC or an equivalent as the cache replacement strategy, as it is encumbered. The size of the cache can be determined at startup after detecting the underlying filesystem, e.g. setCacheSize(VERY_LARGE_CONSTANT) if (fs instanceof DistributedFileSystem), so we don't lose much when still on HDFS.
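For illustration only, a rough Java sketch of the filesystem-aware sizing idea in the last paragraph of the description. The class name, constants, and size bounds below are hypothetical placeholders rather than existing HBase APIs; only the Hadoop FileSystem/DistributedFileSystem types and the Guava cache are real, and a real implementation would cache HFile readers rather than raw input streams.

{code:java}
// Hypothetical sketch -- StoreFileReaderCache is not an existing HBase class.
// Idea: size an open-file cache at startup based on the detected filesystem,
// so HDFS behavior is effectively unchanged while the number of concurrently
// open streams on S3-backed filesystems stays bounded.
import java.io.IOException;
import java.util.concurrent.ExecutionException;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.RemovalListener;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class StoreFileReaderCache {
  // "Very large" bound so HDFS deployments effectively never evict.
  private static final long VERY_LARGE_CONSTANT = 1_000_000L;
  // Placeholder bound for S3A and other non-HDFS filesystems; a config knob in practice.
  private static final long NON_HDFS_MAX_OPEN = 1_000L;

  private final LoadingCache<Path, FSDataInputStream> openFiles;

  public StoreFileReaderCache(final FileSystem fs) {
    long maxOpen = (fs instanceof DistributedFileSystem)
        ? VERY_LARGE_CONSTANT
        : NON_HDFS_MAX_OPEN;

    // Close the underlying stream whenever an entry is evicted or invalidated.
    RemovalListener<Path, FSDataInputStream> closeOnEvict = notification -> {
      FSDataInputStream in = notification.getValue();
      if (in != null) {
        try {
          in.close();
        } catch (IOException e) {
          // Best effort; nothing useful to do if close fails here.
        }
      }
    };

    this.openFiles = CacheBuilder.newBuilder()
        .maximumSize(maxOpen)
        .removalListener(closeOnEvict)
        .build(new CacheLoader<Path, FSDataInputStream>() {
          @Override
          public FSDataInputStream load(Path storeFile) throws IOException {
            // Re-open on demand after a miss or a previous eviction.
            return fs.open(storeFile);
          }
        });
  }

  public FSDataInputStream getReader(Path storeFile) throws ExecutionException {
    return openFiles.get(storeFile);
  }

  public void closeReader(Path storeFile) {
    openFiles.invalidate(storeFile);
  }
}
{code}

The removal listener gives the close-after-eviction behavior, so a small maximumSize on a non-HDFS filesystem approximates "open on demand, close when idle", while the very large bound on HDFS preserves the current always-open behavior.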