We are trying to perform bulk ingest at scale and wanted to get some quick
thoughts on how to improve performance and stability. One of the problems
we have is that we sometimes import thousands of small files, and I don't
believe there is a good way around this in the architecture yet. I have
already run into an RPC timeout issue because the import process takes
longer than 5 minutes, and another issue where we end up with so many
files after a bulk import that we have had to bump
tserver.scan.files.open.max above 1K.
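
For context, the import is driven with something like this on the client
side (instance name, credentials, table name, and paths below are
placeholders, not our real ones); importDirectory is the call that blows
past the timeout when the directory holds thousands of small files:

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;

    public class BulkImportSketch {
      public static void main(String[] args) throws Exception {
        // Placeholder instance/credentials.
        ZooKeeperInstance inst =
            new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181");
        Connector conn = inst.getConnector("root", new PasswordToken("secret"));

        // Moves the RFiles under /bulk/files into "mytable"; files that
        // can't be assigned land in the pre-created, empty /bulk/failures
        // directory. setTime=false keeps the timestamps already in the files.
        conn.tableOperations().importDirectory(
            "mytable", "/bulk/files", "/bulk/failures", false);
      }
    }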

Here are some other configs we have been toying with (a sketch of setting
the ZooKeeper-mutable ones through the API follows the list):
- master.fate.threadpool.size: 20
- master.bulk.threadpool.size: 20
- master.bulk.timeout: 20m
- tserver.bulk.process.threads: 20
- tserver.bulk.assign.threads: 20
- tserver.bulk.timeout: 20m
- tserver.compaction.major.concurrent.max: 20
- tserver.scan.files.open.max: 1200
- tserver.server.threads.minimum: 64
- table.file.max: 64
- table.compaction.major.ratio: 20
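
For reference, the ZooKeeper-mutable properties can be set through the
client API as well as the shell; a minimal sketch, reusing the Connector
conn from the snippet above ("mytable" is again a placeholder). Some of
these only take effect after a process restart, and anything that is not
ZooKeeper-mutable has to go in accumulo-site.xml instead:

    // System-wide properties (same effect as "config -s ..." in the shell).
    conn.instanceOperations().setProperty("master.bulk.timeout", "20m");
    conn.instanceOperations().setProperty("tserver.bulk.timeout", "20m");
    conn.instanceOperations().setProperty("tserver.scan.files.open.max", "1200");

    // The table.* properties are scoped per table:
    conn.tableOperations().setProperty("mytable", "table.file.max", "64");
    conn.tableOperations().setProperty(
        "mytable", "table.compaction.major.ratio", "20");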

(HDFS; these go in hdfs-site.xml, see the snippet after the list)
- dfs.namenode.handler.count: 100
- dfs.datanode.handler.count: 50
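
In hdfs-site.xml form (the namenode handler count is picked up by the
NameNode, the datanode one by the DataNodes, each after a restart):

    <property>
      <name>dfs.namenode.handler.count</name>
      <value>100</value>
    </property>
    <property>
      <name>dfs.datanode.handler.count</name>
      <value>50</value>
    </property>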

Any other quick ideas for doing bulk ingest at this scale would be much
appreciated. Thanks guys

p.s. This is on Accumulo 1.6.5
