Eric,

Thanks for the quick reply. Another question: my cluster has over 80 CPUs available. Suppose I create something like 50 splits across the 7 servers and increase my map task count accordingly. What are your thoughts on this?
Thanks,
Ralph
__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory

From: Eric Newton <[email protected]>
Reply-To: [email protected]
To: [email protected]
Subject: Re: table splits

You need to estimate the size of the split. First, get the id of the table with "tables -l" in the accumulo shell. Then, find out the size of the table in HDFS:

$ hadoop fs -dus /accumulo/tables/<id>

Divide by 7, and use that as the split size:

shell> config -t mytable -s table.split.threshold=newsize

The table will automatically split out. Afterwards, you can raise the split size to keep the table from splitting again until it gets much bigger:

shell> config -t mytable -s table.split.threshold=1G

-Eric

On Mon, May 21, 2012 at 12:24 PM, Perko, Ralph J <[email protected]> wrote:

Hi,

I am looking for advice on how best to lay out my table splits. I have a 7-node cluster and my table contains ~10M records. I would like to split the table equally across all the servers, but I see no utility to do this. I understand I can create splits for some letter range, but I was hoping for a way to have Accumulo create "n" equal splits. Is this possible? Right now the best way I see to handle this is to write a utility that iterates over the table, keeps a count, and at some given interval (table size / split count) spits out the beginning and end rows, and then I create the splits manually (a sketch of this approach follows below).

Thanks,
Ralph
__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory
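
For reference, a minimal sketch of the utility Ralph describes, using the Accumulo client API: one scan over the table that records a split point every (total / n) entries, then hands the collected rows to TableOperations.addSplits(). The instance name, ZooKeeper host, credentials, table name, and the ~10M record estimate are placeholders, not working values. Note it counts entries rather than distinct rows, so a table with multiple columns per row would need a small adjustment.

import java.util.Map.Entry;
import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class EqualSplitter {
    public static void main(String[] args) throws Exception {
        String table = "mytable";       // placeholder table name
        long totalRecords = 10000000L;  // rough estimate (~10M from the thread)
        int numSplits = 50;             // desired number of tablets
        long interval = totalRecords / numSplits;

        Connector conn = new ZooKeeperInstance("myinstance", "zkhost:2181")
                .getConnector("user", "password".getBytes());

        // Single pass over the table, keeping a running count as described.
        Scanner scanner = conn.createScanner(table, new Authorizations());
        SortedSet<Text> splitPoints = new TreeSet<Text>();
        long count = 0;
        for (Entry<Key, Value> entry : scanner) {
            if (++count % interval == 0)
                splitPoints.add(entry.getKey().getRow());
        }

        // Ask Accumulo to split the table at the collected rows.
        conn.tableOperations().addSplits(table, splitPoints);
    }
}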

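Eric's sizing steps can also be scripted rather than run by hand. A sketch under the same assumptions as above (the table id comes from "tables -l", and conn is an Accumulo Connector); FileSystem.getContentSummary() is the programmatic equivalent of "hadoop fs -dus":

import org.apache.accumulo.core.client.Connector;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SplitBySize {
    // Set table.split.threshold to (table size in HDFS / number of servers),
    // mirroring the shell steps in Eric's reply.
    static void splitEqually(Connector conn, String table, String tableId,
            int numServers) throws Exception {
        long bytes = FileSystem.get(new Configuration())
                .getContentSummary(new Path("/accumulo/tables/" + tableId))
                .getLength();
        conn.tableOperations().setProperty(table, "table.split.threshold",
                Long.toString(bytes / numServers));
        // Once the table has split out (e.g. check "getsplits" in the shell),
        // raise the threshold so the table does not keep splitting:
        // conn.tableOperations().setProperty(table, "table.split.threshold", "1G");
    }
}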