We insert data using the Java HBase client (org.apache.hadoop.hbase.client.*). However, we do not provide any details in the configuration object except the ZooKeeper quorum and port number. Should we specify anything else explicitly at this stage?

On 13 May 2013 19:54, Anoop John <[email protected]> wrote:

> >now have 731 regions (each about ~350 MB !!). I checked the
> >configuration in CM, and the value for hbase.hregion.max.filesize is
> >1 GB too !!!
>
> Did you specify the splits at the time of table creation? How did you
> create the table?
>
> -Anoop-
>
> On Mon, May 13, 2013 at 5:18 PM, Praveen Bysani <[email protected]> wrote:
>
> > Hi,
> >
> > Thanks for the details. No, I haven't run any compaction, and I have no
> > idea if there is one going on in the background. I executed a
> > major_compact on that table and I now have 731 regions (each about
> > ~350 MB !!). I checked the configuration in CM, and the value for
> > hbase.hregion.max.filesize is 1 GB too !!!
> >
> > I am not trying to access HFiles in my MR job; in fact, I am just using
> > a Pig script which handles this. This number (731) is close to my number
> > of map tasks, which makes sense. But how can I decrease this? Shouldn't
> > the size of each region be 1 GB with that configuration value?
> >
> > On 13 May 2013 18:36, Ted Yu <[email protected]> wrote:
> >
> > > You can change the HFile size through the hbase.hregion.max.filesize
> > > parameter.
> > >
> > > On May 13, 2013, at 2:45 AM, Praveen Bysani <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I wanted to minimize the number of map reduce tasks generated while
> > > > processing a job, hence configured it to a larger value.
> > > >
> > > > I don't think I have configured the HFile size in the cluster. I use
> > > > Cloudera Manager to manage my cluster, and the only configuration I
> > > > can relate to is hfile.block.cache.size, which is set to 0.25. How
> > > > do I change the HFile size?
> > > >
> > > > On 13 May 2013 15:03, Amandeep Khurana <[email protected]> wrote:
> > > >
> > > >> On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani <[email protected]> wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> I have the dfs.block.size value set to 1 GB in my cluster
> > > >>> configuration.
> > > >>
> > > >> Just out of curiosity - why do you have it set at 1GB?
> > > >>
> > > >>> I have around 250 GB of data stored in HBase over this cluster.
> > > >>> But when I check the number of blocks, it doesn't correspond to
> > > >>> the block size value I set. From what I understand, I should only
> > > >>> have ~250 blocks. But instead, when I did a fsck on
> > > >>> /hbase/<table-name>, I got the following:
> > > >>>
> > > >>> Status: HEALTHY
> > > >>>  Total size: 265727504820 B
> > > >>>  Total dirs: 1682
> > > >>>  Total files: 1459
> > > >>>  Total blocks (validated): 1459 (avg. block size 182129886 B)
> > > >>>  Minimally replicated blocks: 1459 (100.0 %)
> > > >>>  Over-replicated blocks: 0 (0.0 %)
> > > >>>  Under-replicated blocks: 0 (0.0 %)
> > > >>>  Mis-replicated blocks: 0 (0.0 %)
> > > >>>  Default replication factor: 3
> > > >>>  Average block replication: 3.0
> > > >>>  Corrupt blocks: 0
> > > >>>  Missing replicas: 0 (0.0 %)
> > > >>>  Number of data-nodes: 5
> > > >>>  Number of racks: 1
> > > >>>
> > > >>> Are there any other configuration parameters that need to be set?
> > > >>
> > > >> What is your HFile size set to? The HFiles that get persisted would
> > > >> be bound by that number. Thereafter each HFile would be split into
> > > >> blocks, the size of which you configure using the dfs.block.size
> > > >> configuration parameter.
> > > >>
> > > >>> --
> > > >>> Regards,
> > > >>> Praveen Bysani
> > > >>> http://www.praveenbysani.com
> > > >
> > > > --
> > > > Regards,
> > > > Praveen Bysani
> > > > http://www.praveenbysani.com
> >
> > --
> > Regards,
> > Praveen Bysani
> > http://www.praveenbysani.com

--
Regards,
Praveen Bysani
http://www.praveenbysani.com
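
For reference, the arithmetic behind the fsck numbers quoted in this thread can be sketched as below. This is only an illustration: the class and method names are made up, and "1 GB" is assumed to mean 1024^3 bytes. The point is that HDFS blocks are counted per file (a block never spans two files), so 1459 files that are each smaller than dfs.block.size occupy one block each, no matter how large the block size is set.

```java
// Illustrative sketch only: BlockEstimate and its methods are hypothetical names.
// The byte counts come from the fsck output quoted in the thread.
class BlockEstimate {
    // Assumption: "1 GB" means 1024^3 bytes.
    static final long GB = 1024L * 1024L * 1024L;
    static final long MB = 1024L * 1024L;

    /** Blocks used by one file: ceil(fileBytes / blockSize); an empty file uses none. */
    static long blocksForFile(long fileBytes, long blockSize) {
        if (fileBytes <= 0) {
            return 0;
        }
        return (fileBytes + blockSize - 1) / blockSize;
    }

    /** Total blocks for a set of files; each file is rounded up on its own. */
    static long totalBlocks(long[] fileSizes, long blockSize) {
        long blocks = 0;
        for (long size : fileSizes) {
            blocks += blocksForFile(size, blockSize);
        }
        return blocks;
    }

    /** Lower bound on blocks: what you would get if all data sat in one file. */
    static long minBlocks(long totalBytes, long blockSize) {
        return blocksForFile(totalBytes, blockSize);
    }

    public static void main(String[] args) {
        long totalBytes = 265727504820L; // "Total size" from fsck
        long totalFiles = 1459L;         // "Total files" from fsck

        // The ~250 blocks the original post expected (single-file lower bound).
        System.out.println(minBlocks(totalBytes, GB));   // 248

        // Matches the "avg. block size" fsck reported for 1459 blocks.
        System.out.println(totalBytes / totalFiles);     // 182129886

        // A ~350 MB store file still takes one whole block entry.
        System.out.println(blocksForFile(350 * MB, GB)); // 1
    }
}
```

So the block count tracks the number of store files, not the total data volume, until individual files grow past dfs.block.size.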
