Can you create RFiles outside of Accumulo and then import those?

On Thu, Jun 16, 2016 at 10:24 PM, Josh Elser <[email protected]> wrote:
> There are two big things that are required to really scale up bulk
> loading. Sadly (I guess) they are both things you would need to
> implement on your own:
>
> 1) Avoid lots of small files. Target files as large as you can,
> relative to your ingest latency requirements and your max file size (set on
> your instance or table).
>
> 2) Avoid having to import one file to multiple tablets. Remember that the
> majority of the metadata update for Accumulo is updating the tablet row
> with the new file. When you have one file which spans many tablets, you are
> now creating N metadata updates instead of just one. When you create the
> files, take into account the split points of your table, and use them to
> try to target one file per tablet.
>
>
> Roshan Punnoose wrote:
>
>> We are trying to perform bulk ingest at scale and wanted to get some
>> quick thoughts on how to increase performance and stability. One of the
>> problems we have is that we sometimes import thousands of small files,
>> and I don't believe there is a good way around this in the architecture
>> as of yet. Already I have run into an rpc timeout issue because the
>> import process is taking longer than 5m. And another issue where we have
>> so many files after a bulk import that we have had to bump the
>> tserver.scan.files.open.max to 1K.
>>
>> Here are some other configs that we have been toying with:
>> - master.fate.threadpool.size: 20
>> - master.bulk.threadpool.size: 20
>> - master.bulk.timeout: 20m
>> - tserver.bulk.process.threads: 20
>> - tserver.bulk.assign.threads: 20
>> - tserver.bulk.timeout: 20m
>> - tserver.compaction.major.concurrent.max: 20
>> - tserver.scan.files.open.max: 1200
>> - tserver.server.threads.minimum: 64
>> - table.file.max: 64
>> - table.compaction.major.ratio: 20
>>
>> (HDFS)
>> - dfs.namenode.handler.count: 100
>> - dfs.datanode.handler.count: 50
>>
>> Just want to get any quick ideas for performing bulk ingest at scale.
>> Thanks guys
>>
>> p.s. This is on Accumulo 1.6.5
>>
>
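
For what it's worth, here is a rough sketch of the "one file per tablet" idea from point 2 above. It is plain Python with no Accumulo calls: the function names (`tablet_index`, `group_by_tablet`), the split points, and the row keys are all made up for illustration. It assumes you have already fetched the table's split points as a sorted list, and that a tablet covers the rows after the previous split point up to and including its own, so a row equal to a split point lands in the tablet that ends at that split. Each group would then be written out as its own sorted file before the import.

```python
from bisect import bisect_left
from collections import defaultdict


def tablet_index(row, splits):
    """Return the index of the tablet that would hold `row`.

    Assumes `splits` is sorted and that tablet i covers
    (splits[i-1], splits[i]], with the last tablet open-ended.
    """
    return bisect_left(splits, row)


def group_by_tablet(rows, splits):
    """Bucket row keys so that each bucket falls inside exactly one tablet."""
    groups = defaultdict(list)
    for row in rows:
        groups[tablet_index(row, splits)].append(row)
    # Keys must be sorted within a file, so sort each bucket.
    return {i: sorted(bucket) for i, bucket in sorted(groups.items())}


# Hypothetical split points and row keys, for illustration only.
splits = ["g", "n", "t"]
rows = ["apple", "g", "h", "zebra", "mango", "banana"]

files = group_by_tablet(rows, splits)
for idx, bucket in files.items():
    print(idx, bucket)
```

With these made-up inputs, nothing falls into the ("n", "t"] tablet, so no file is produced for it. If the table splits between the time you read the split points and the time you import, some files will span tablets again, so this only reduces the number of metadata updates rather than guaranteeing one per file.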
