We are trying to perform bulk ingest at scale and wanted to get some quick thoughts on how to improve performance and stability. One problem is that we sometimes import thousands of small files, and I don't believe the current architecture offers a good way around this yet. We have already hit an RPC timeout because the import process takes longer than 5 minutes, and after a bulk import we end up with so many files that we have had to bump tserver.scan.files.open.max to 1K.
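The best idea we've come up with so far is to split each load into several smaller importDirectory calls, so a single FATE operation never has to process thousands of files at once. A minimal sketch, assuming the ingest job writes its RFiles into per-batch subdirectories under a staging path (the instance name, credentials, paths, and table name below are all placeholders):

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BatchedBulkImport {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181")
        .getConnector("ingest", new PasswordToken("secret"));

    FileSystem fs = FileSystem.get(new Configuration());

    // Assumes the ingest job wrote RFiles into
    // /bulk/staging/batch_000, /bulk/staging/batch_001, ...
    for (FileStatus batch : fs.listStatus(new Path("/bulk/staging"))) {
      if (!batch.isDirectory())
        continue;

      // importDirectory requires an existing, empty failure directory;
      // keep it outside the source dir so its contents aren't imported.
      Path failureDir = new Path("/bulk/failures", batch.getPath().getName());
      fs.mkdirs(failureDir);

      // Each call is its own FATE operation, so smaller batches keep each
      // RPC comfortably under the bulk timeout instead of one giant import.
      conn.tableOperations().importDirectory("mytable",
          batch.getPath().toString(), failureDir.toString(), false);
    }
  }
}

Keeping a per-batch failure directory also makes it easy to see which batch's files didn't get assigned.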
Here are some other configs that we have been toying with:

Accumulo:
- master.fate.threadpool.size: 20
- master.bulk.threadpool.size: 20
- master.bulk.timeout: 20m
- tserver.bulk.process.threads: 20
- tserver.bulk.assign.threads: 20
- tserver.bulk.timeout: 20m
- tserver.compaction.major.concurrent.max: 20
- tserver.scan.files.open.max: 1200
- tserver.server.threads.minimum: 64
- table.file.max: 64
- table.compaction.major.ratio: 20

HDFS:
- dfs.namenode.handler.count: 100
- dfs.datanode.handler.count: 50

Just want to get any quick ideas for performing bulk ingest at scale. Thanks, guys.

p.s. This is on Accumulo 1.6.5
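p.p.s. In case it's useful, here is roughly how we apply the properties from code rather than the shell (connection details and "mytable" are placeholders; some master/tserver properties are only read from accumulo-site.xml at startup, so we only set the zookeeper-mutable ones this way):

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;

public class ApplyBulkIngestTuning {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181")
        .getConnector("root", new PasswordToken("secret"));

    // System-wide settings, pushed through zookeeper so the mutable ones
    // take effect without a restart.
    conn.instanceOperations().setProperty("master.bulk.timeout", "20m");
    conn.instanceOperations().setProperty("tserver.bulk.timeout", "20m");
    conn.instanceOperations().setProperty("tserver.bulk.assign.threads", "20");

    // Per-table settings.
    conn.tableOperations().setProperty("mytable", "table.file.max", "64");
    conn.tableOperations().setProperty("mytable",
        "table.compaction.major.ratio", "20");
  }
}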
