Some comments inlined below:

On Wed, Nov 28, 2012 at 2:49 PM, Chris Burrell <[email protected]> wrote:
> Hi
>
> I am trialling Accumulo on a small (tiny) cluster and wondering what the
> best way to tune it would be. I have 1 master + 2 tservers. The master has
> 8GB of RAM and the tservers have 16GB each.
>
> I have set the walog size to 2GB with an external memory map of 9GB.
> The ratio is still at the default of 3. I've also upped the heap size of
> each tserver to 2GB.
>
> I'm trying to achieve high-speed ingest via batch writers held on several
> other servers. I'm loading two separate tables.
>
> Here are some questions I have:
> - Does the config above sound sensible? or overkill?

Looks good to me, assuming you aren't doing other things (like map/reduce)
on the machines.

> - Is it preferable to have more servers with lower specs?

Yes. Mostly to get more drives.

> - Is this the best way to maximise use of the memory?

It's not bad. You may want to have larger block caches and a smaller
in-memory map. But if you want to write-mostly, read-little, this is good.

> - Does the fact I have 3x2GB walogs mean that the remaining 3GB in the
> external memory map can be used while compactions occur?

Yes. You will want to increase the size or number of logs. With that many
servers, failures will hopefully be very rare. I would go with changing 3
to 8. Having lots of logs on a tablet is no big deal if you have disk
space, and don't expect many failures.

> - When minor compactions occur, does this halt ingest on that particular
> tablet? or tablet server?

Only if memory fills before the compactions finish. The monitor page will
indicate this by displaying "hold time." When this happens, the tserver
will self-tune and start minor compactions earlier during future ingest.

> - I have pre-split the tables six ways, but I'm not entirely sure that's
> preferable if I only have 2 servers while trying it out? Perhaps 2 ways
> might be better?

Not for that reason, but to be able to use more cores concurrently. Aim
for 50-100 tablets/node.
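
To make the 50-100 tablets/node target concrete, here is a minimal Python sketch that computes evenly spaced split points for a 2-tserver cluster. It assumes, hypothetically, that row IDs start with a uniformly distributed lowercase hex prefix; `split_points` and `key_width` are illustrative names and not part of Accumulo.

```python
def split_points(num_tablets, key_width=4):
    """Return num_tablets - 1 evenly spaced hex split points.

    Assumption (not from the thread): row IDs begin with a uniformly
    distributed lowercase hex prefix that is key_width characters long.
    """
    if num_tablets < 1:
        raise ValueError("num_tablets must be at least 1")
    keyspace = 16 ** key_width
    if num_tablets > keyspace:
        raise ValueError("key_width is too small for this many tablets")
    # N tablets need N - 1 boundaries; integer division keeps them in order.
    return [
        format(i * keyspace // num_tablets, "0{}x".format(key_width))
        for i in range(1, num_tablets)
    ]


nodes = 2              # tservers in the trial cluster
tablets_per_node = 50  # low end of the 50-100 tablets/node suggestion
splits = split_points(nodes * tablets_per_node)
```

The resulting list could be written one split per line to a file and handed to the shell's `addsplits` command. If the real row IDs are not uniformly distributed, the split points would need to come from a sample of the data instead.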
> - Does the batch upload through the shell client give significantly better
> performance stats?

Using map/reduce to create RFiles is more efficient. But it also increases
latency: you can only see the data once the whole file is loaded. When a
file is batch-loaded, its index is read, and the file is assigned to
matching tablets. With small indexes, you can batch-load terabytes in
minutes.

-Eric
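
The walog arithmetic from the answer above (3 x 2GB logs against a 9GB map, and the suggestion to go from 3 to 8) can be sketched in Python as follows. This is a rough model that assumes one byte written to the logs corresponds to about one byte in the in-memory map, which is only approximately true. The function names are hypothetical. I believe the settings involved are `tserver.walog.max.size`, `tserver.memory.maps.max` and `table.compaction.minor.logs.threshold`, but the thread does not name them, so treat that mapping as an assumption.

```python
def map_left_when_logs_trigger_gb(walog_size_gb, max_logs, map_size_gb):
    """GB of in-memory map still free when the log-count limit is reached.

    Rough model: data in the logs is assumed to occupy the same amount of
    space in the in-memory map. A result of 0 means the map fills before
    the log limit is ever reached.
    """
    logged_gb = walog_size_gb * max_logs
    return max(0, map_size_gb - logged_gb)


def limiting_factor(walog_size_gb, max_logs, map_size_gb):
    """Say which limit is hit first under the same rough model."""
    if walog_size_gb * max_logs < map_size_gb:
        return "walogs"
    return "memory map"


# Configuration from the question: 3 logs of 2GB each, 9GB map.
current = map_left_when_logs_trigger_gb(2, 3, 9)
# Suggested change: allow 8 logs instead of 3.
suggested = map_left_when_logs_trigger_gb(2, 8, 9)
```

With 3 logs, the log limit is reached at 6GB and leaves 3GB of the map for ingest while compactions run, matching the question. With 8 logs, the logs can hold 16GB, so under this model the map size becomes the limit and the whole 9GB can be used before anything is forced by the log count.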
