UNCLASSIFIED

Thanks for all the replies on this.

Based on the feedback, and particularly given the large number of splits a 
server can maintain, I'll leave the splits in place.  I'm not keen on merging 
tablets because of its impact on query performance.

Thanks again.

Matt



________________________________
From: Eric Newton [mailto:[email protected]]
Sent: Tuesday, 9 April 2013 11:57
To: [email protected]
Subject: Re: Removing splits [SEC=UNCLASSIFIED]


Is there a maximum number of splits a table can have?

There are a few theoretical limits to the number of tablets you can have.

1) a row cannot be split over tablets: if you only have a billion rows, you can 
have at most a billion tablets
2) tablet servers keep some per-tablet overhead in memory: typically this is 
only 1-2 KB per tablet, so a 1 GB JVM heap could hold at most roughly 500K-1M 
tablets per server.
3) there's a limit to the number of files/directories that can be stored in 
your NameNode.  More tablets tend to create more files and directories.
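As a back-of-the-envelope check of limits 1 and 2, here is a minimal Python sketch. The helper names and the example cluster (10 servers, 1 GB heaps, a billion rows) are hypothetical; the 1-2 KB per-tablet figure is only the rough estimate given above, and the NameNode limit (3) is not modeled.

```python
# Back-of-the-envelope ceiling on tablet count (hypothetical helper names).
# Assumes 1-2 KB of tablet-server heap per tablet, the rough figure quoted
# above; actual overhead varies by version and workload.

GB = 10**9  # decimal units, so 1 GB / 1 KB = 1,000,000
KB = 10**3


def tablets_per_server(heap_bytes: int, overhead_bytes: int) -> int:
    """Tablets one server could track if per-tablet overhead were its only memory use."""
    return heap_bytes // overhead_bytes


def tablet_ceiling(num_rows: int, num_servers: int,
                   heap_bytes: int, overhead_bytes: int) -> int:
    """Upper bound from limits 1 and 2: a row never spans tablets, and each
    server's heap caps how many tablets it can hold. The NameNode
    file/directory limit is not modeled here."""
    memory_bound = tablets_per_server(heap_bytes, overhead_bytes) * num_servers
    return min(num_rows, memory_bound)


if __name__ == "__main__":
    for kb in (1, 2):
        n = tablets_per_server(1 * GB, kb * KB)
        print(f"{kb} KB per tablet -> {n:,} tablets per 1 GB heap")
    # Hypothetical cluster: 10 servers, 1 GB heaps, a billion rows.
    print(tablet_ceiling(10**9, 10, 1 * GB, 2 * KB))
```

The same heap is also used for other things, such as caches, so practical tablet counts sit well below this ceiling.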

Performance is likely to be poor at these limits, and it would not be helpful 
to approach them.

I have seen stable clusters with over 500K tablets.


  How can splits be removed once they are no longer required? I can't see any 
command in the API.

With version 1.4, you can merge tablets together.  In the shell, you can merge 
ranges, or have the shell merge ranges based on size.
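To illustrate what merging ranges "based on size" means, here is a hypothetical Python sketch of one greedy plan: walk the tablets in row order and group neighbours until their combined size would exceed a goal size. This is an assumption-labelled illustration of the idea only, not Accumulo's implementation (the shell does the real work server-side); the function name and sizes are made up.

```python
from typing import List, Tuple


def plan_merges(tablet_sizes: List[int], goal: int) -> List[Tuple[int, int]]:
    """Hypothetical greedy planner: group adjacent tablets (in row order)
    so each group's combined size stays <= goal where possible.

    Returns inclusive (first_index, last_index) pairs for groups of two or
    more tablets; a lone tablet needs no merge. A single tablet larger than
    goal simply forms its own group.
    """
    groups = []
    start = 0
    total = 0
    for i, size in enumerate(tablet_sizes):
        # Close the current group if adding this tablet would overshoot.
        if i > start and total + size > goal:
            groups.append((start, i - 1))
            start, total = i, 0
        total += size
    groups.append((start, len(tablet_sizes) - 1))
    # Keep only ranges that actually combine several tablets.
    return [(a, b) for a, b in groups if b > a]


if __name__ == "__main__":
    sizes_mb = [10, 20, 5, 100, 30, 30]  # hypothetical tablet sizes
    print(plan_merges(sizes_mb, goal=50))
```

Merging only contiguous tablets matters because a tablet covers a contiguous row range, so only neighbours can be combined into one tablet.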

With version 1.5, you will be able to merge METADATA tablets together.

-Eric


IMPORTANT: This email remains the property of the Department of Defence and is 
subject to the jurisdiction of section 70 of the Crimes Act 1914. If you have 
received this email in error, you are requested to contact the sender and 
delete the email.
