Thanks Rahul.

1 - I understand the idea of listing the disk usage for that table on each of the disks that HBase is running on. However, how do I map the nodes to regions? I looked at RegionLocator.getStartEndKeys(), but it only gives me the start and end key values, not the hostnames where each region is currently running. Is there a way to map a region to its node?
2 - Some of our row sizes vary quite a bit depending on the number of updates to the row. The disk usage will give us a rough idea of the size of each region, but not the number of rows. Is there a way to get both?

Apologies if I am bothering you too much.

Thanks,
Manish

On Fri, Aug 26, 2016 at 12:21 PM, rahul gidwani <[email protected]> wrote:

> If you want to see which regionservers are currently hot, then jmx would be
> the best way to get that data.
>
> If you want to see overall what is hot, you can do this without the use of
> a scan (it will be a pretty decent estimate)
>
> you can do:
>
> hdfs dfs -du /hbase/data/default/<table_you_care_about>/
>
> with that data you can create a Map<EncodedRegionName, SizeInBytes>
>
> Then you can use the RegionLocator to find which region resides on which
> machine.
>
> That will tell you the overall skew of your data in terms of raw bytes.
>
> Should be a pretty decent estimate and a lot faster than scanning your
> table provided your table / cluster is sufficiently large.
>
> hope that helps.
> rahul
>
> On Fri, Aug 26, 2016 at 12:11 PM, Ted Yu <[email protected]> wrote:
>
> > Have you looked at /jmx endpoint on the servers ?
> > Below is a sample w.r.t.
> > the metrics that would be of interest to you:
> >
> > "Namespace_default_table_x_region_6659ba3fe42b4a196daaba9306b50551_metric_appendCount" : 0,
> > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_num_ops" : 0,
> > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_min" : 0,
> > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_max" : 0,
> > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_mean" : 0.0,
> > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_median" : 0.0,
> > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_75th_percentile" : 0.0,
> > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_95th_percentile" : 0.0,
> > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_99th_percentile" : 0.0,
> >
> > "Namespace_default_table_x_region_823a39a250e81f45e5ef493740d936ab_metric_deleteCount" : 0,
> > "Namespace_default_table_x_region_30b82db17b64a83d4aeda9dbd40d6215_metric_deleteCount" : 0,
> > "Namespace_default_table_x_region_c6db2e650b3025aa82032b0e0aa8b715_metric_appendCount" : 0,
> > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_num_ops" : 0,
> > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_min" : 0,
> > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_max" : 0,
> > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_mean" : 0.0,
> > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_median" : 0.0,
> > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_75th_percentile" : 0.0,
> > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_95th_percentile" : 0.0,
> > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_99th_percentile" : 0.0,
> >
> > "Namespace_default_table_x_region_5a1fe60f6267c98055b334784e6d76d2_metric_mutateCount" : 0,
> > "Namespace_default_table_x_region_66bbec5f7e136b226a19b5fdf9f17cbe_metric_incrementCount" : 0,
> >
> > On Fri, Aug 26, 2016 at 11:59 AM, Manish Maheshwari <[email protected]> wrote:
> >
> > > Hi Ted,
> > >
> > > I understand the region crash/migration/splitting impact. Currently we
> > > have hotspotting on a few region servers. I am trying to collect the
> > > row stats at region server and region levels to see how bad the skew
> > > of the data is.
> > >
> > > Manish
> > >
> > > On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu <[email protected]> wrote:
> > >
> > > > Can you elaborate on your use case?
> > > >
> > > > Suppose row A is on server B. After you retrieve row A, the region
> > > > for row A gets moved to server C (load balancer or server crash).
> > > > Server B would no longer be relevant.
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I looked at the HBase Count functionality to count rows in a Table.
> > > > > Is there a way that we can count the number of rows in Regions &
> > > > > Region Servers? When we use an HBase scan, we don't get the Region
> > > > > ID or Region Server of the row. Is there a way to do this via Scans?
> > > > >
> > > > > Thanks,
> > > > > Manish

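For reference, the approach Rahul describes in the thread (du output, then a Map<EncodedRegionName, SizeInBytes>, then a join against region locations) could be sketched roughly as below. This is only a sketch under assumptions, not a tested tool: the class and method names (RegionSkew, parseDu, bytesPerHost) are made up for illustration, the sizes and hostnames in main are invented sample values, and the parser assumes the first column of the `hdfs dfs -du` output is the size in bytes and the last column is the path. The region-to-host map is hand-built here; on a real cluster it would presumably be filled from the RegionLocator, for example by iterating over getAllRegionLocations() and reading each location's hostname and encoded region name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RegionSkew {

    // Turns lines of `hdfs dfs -du <table dir>` into encodedRegionName -> bytes.
    // Assumption: first column is the size, last column is the path. Any
    // columns in between (some Hadoop versions print one) are ignored.
    public static Map<String, Long> parseDu(List<String> duLines) {
        Map<String, Long> sizes = new HashMap<>();
        for (String line : duLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2) {
                continue; // blank or malformed line
            }
            String path = cols[cols.length - 1];
            String name = path.substring(path.lastIndexOf('/') + 1);
            // Skip table-level entries such as .tabledesc and .tmp.
            if (name.isEmpty() || name.startsWith(".")) {
                continue;
            }
            try {
                sizes.put(name, Long.parseLong(cols[0]));
            } catch (NumberFormatException e) {
                // Size was not a plain byte count (e.g. du -h output); skip it.
            }
        }
        return sizes;
    }

    // Sums region sizes per host. Regions with no known host are grouped
    // under "unknown" so they are not silently dropped.
    public static Map<String, Long> bytesPerHost(Map<String, Long> regionBytes,
                                                 Map<String, String> regionToHost) {
        Map<String, Long> perHost = new TreeMap<>();
        for (Map.Entry<String, Long> e : regionBytes.entrySet()) {
            String host = regionToHost.getOrDefault(e.getKey(), "unknown");
            perHost.merge(host, e.getValue(), Long::sum);
        }
        return perHost;
    }

    public static void main(String[] args) {
        // Invented sample du output; region names reuse hashes from the thread.
        List<String> du = Arrays.asList(
            "1048576  /hbase/data/default/table_x/6659ba3fe42b4a196daaba9306b50551",
            "4194304  /hbase/data/default/table_x/f9965e20458e7dbf3d4d5b439ae576ad",
            "287  /hbase/data/default/table_x/.tabledesc");

        // Hypothetical region -> host assignments.
        Map<String, String> hosts = new HashMap<>();
        hosts.put("6659ba3fe42b4a196daaba9306b50551", "rs1.example.com");
        hosts.put("f9965e20458e7dbf3d4d5b439ae576ad", "rs2.example.com");

        System.out.println(bytesPerHost(parseDu(du), hosts));
    }
}
```

As noted in the thread, this only measures skew in raw bytes on disk, not row counts, and region assignments can change between taking the du snapshot and looking up the locations.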