I'd raise the max region file size (hbase.hregion.max.filesize) to 20GB. That'd give you about 5000 regions for 100TB, or roughly 50 regions per server on your 100 machines.
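
For what it's worth, here is the back-of-the-envelope arithmetic behind that number as a small Python sketch. It assumes decimal units (1 TB = 1000 GB) and that every region is filled right up to the max file size, so treat the result as a rough lower bound rather than what the cluster will actually show. The function names are made up for illustration and are not part of HBase.

```python
# Rough region-count estimate for a given max region file size.
# Assumption: decimal units (1 TB = 1000 GB) and completely full regions.
GB_PER_TB = 1000


def estimate_regions(total_tb, max_region_gb):
    """Number of regions needed if each one holds max_region_gb of data."""
    total_gb = total_tb * GB_PER_TB
    return -(-total_gb // max_region_gb)  # ceiling division on integers


def regions_per_server(total_regions, servers):
    """Average regions per server, assuming an even spread across servers."""
    return total_regions / servers


if __name__ == "__main__":
    for size_gb in (10, 20):
        regions = estimate_regions(100, size_gb)
        per_server = regions_per_server(regions, 100)
        print(f"{size_gb}GB max file size: {regions} regions, "
              f"{per_server:.0f} per server on 100 servers")
```

With the 10GB figure mentioned below, the same arithmetic gives 10000 regions, which is where the "100 regions per machine" estimate comes from.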
________________________________
From: Jianshi Huang <[email protected]>
To: [email protected]
Sent: Monday, August 18, 2014 12:22 PM
Subject: Re: How are split files distributed across Region servers?

Hi JM,

By "make the range bigger", you mean to make it multiple regions/splits,
right?

I will probably have >100TB of data, and I think the default split file
size is 10GB. So can I assume each of my 100 machines will be assigned
100 *random* regions?

Where can I find the implementation details or settings for region
assignment?

Jianshi

On Mon, Aug 18, 2014 at 8:48 PM, Jean-Marc Spaggiari <[email protected]> wrote:

> Hi Jianshi,
>
> A region server can host more than one region. So if you pre-split your
> table correctly based on your access usage, at the end all the servers
> should be used evenly.
>
> If you have about 30% of your range which is not used, just make sure that
> this range is bigger so at the end it will have the same load as the
> others.
>
> JM
>
>
> 2014-08-18 2:08 GMT-04:00 Jianshi Huang <[email protected]>:
>
> > Hi JM,
> >
> > If the region boundaries will not change, does that mean,
> >
> > If my data access pattern has skews (say a certain part (30%) of my data
> > will almost never be used), then a proportion (30%) of my servers will
> > always be idle?
> >
> > A region server has to have a continuous rowkey range?
> >
> > Jianshi
> >
> >
> > On Sat, Aug 16, 2014 at 2:46 AM, Jean-Marc Spaggiari <[email protected]> wrote:
> >
> > > Hi Jianshi,
> > >
> > > Not sure I get your question.
> > >
> > > Can I rephrase it?
> > >
> > > So you have 10 regions, and each of those regions has 10 HFiles. Then
> > > you run a major compaction on the table. Correct?
> > >
> > > Then you will end up with:
> > >
> > > reg1:[files:1]
> > > reg2:[files:2]
> > > reg3:[files:3]
> > > ...
> > >
> > > Region boundaries will not change. But each region will now have a
> > > single underlying file.
> > >
> > > HTH,
> > >
> > > JM
> > >
> > >
> > > 2014-08-15 1:53 GMT-04:00 Jianshi Huang <[email protected]>:
> > >
> > > > Say I have 100 split files on 10 region servers, and I did a major
> > > > compact.
> > > >
> > > > Will these split files be distributed like this:
> > > > reg1: [splits 1,2,..,10]
> > > > reg2: [splits 11,12,...,20]
> > > > ...
> > > >
> > > > Or like this:
> > > > reg1: [splits: 1, 11, 21, ... , 91]
> > > > reg2: [splits: 2, 12, 22, ... , 92]
> > > > ...
> > > >
> > > > And if I want to specify the locality and the stride of split files,
> > > > how can I do it in HBase?
> > > >
> > > >
> > > > --
> > > > Jianshi Huang
> > > >
> > > > LinkedIn: jianshi
> > > > Twitter: @jshuang
> > > > Github & Blog: http://huangjs.github.com/
> > > >
> > >
> >
> >
> > --
> > Jianshi Huang
> >
> > LinkedIn: jianshi
> > Twitter: @jshuang
> > Github & Blog: http://huangjs.github.com/
> >
>

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
