If you want to see which regionservers are currently hot, JMX is the best
way to get that data.
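
As a rough sketch of how the per-region keys from /jmx could be grouped,
the snippet below splits a key of the form shown in Ted's sample (quoted
further down) into namespace, table, encoded region name and metric name.
The class name RegionMetricKey is made up for illustration, and the pattern
assumes the key layout in that sample; it may differ on other HBase
versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: splits a per-region JMX metric key.
class RegionMetricKey {
    // Assumed layout: Namespace_<ns>_table_<table>_region_<32 hex chars>_metric_<name>
    private static final Pattern KEY = Pattern.compile(
            "Namespace_(.+?)_table_(.+)_region_([0-9a-f]{32})_metric_(.+)");

    /** Returns {namespace, table, encodedRegionName, metric}, or null if the key does not match. */
    static String[] parse(String key) {
        Matcher m = KEY.matcher(key);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }
}
```

With the region name extracted, counters such as get_num_ops or
scanNext_num_ops can be summed per region on each regionserver.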

If you want to see what is hot overall, you can get a pretty decent
estimate without a scan by running:

hdfs dfs -du /hbase/data/default/<table_you_care_about>/

With that data you can create a Map<EncodedRegionName, SizeInBytes>; each
subdirectory of the table directory is named after a region's encoded name.
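
A minimal sketch of that step, assuming the du output was captured without
-h and that each line is "<size> ... <path>" (newer Hadoop versions print an
extra column between the size and the path). The class name RegionSizes is
made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns `hdfs dfs -du <table dir>` output into a size map.
class RegionSizes {
    /** Maps encoded region name -> size in bytes, one entry per region directory. */
    static Map<String, Long> parseDu(List<String> lines) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2) {
                continue; // blank or unexpected line
            }
            // First column is the size, last column is the path.
            String path = cols[cols.length - 1];
            String name = path.substring(path.lastIndexOf('/') + 1);
            // Region directories are 32 hex characters; skip .tabledesc, .tmp, etc.
            if (!name.matches("[0-9a-f]{32}")) {
                continue;
            }
            sizes.put(name, Long.parseLong(cols[0]));
        }
        return sizes;
    }
}
```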

Then you can use the RegionLocator to find which region resides on which
machine.
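
Once you have both maps, summing per server is straightforward. The sketch
below takes the region-to-server mapping as a plain Map so it runs without
the HBase client on the classpath; in practice that map would be filled
from RegionLocator.getAllRegionLocations() (encoded region name as key,
server name as value). The class name ServerSkew and the "(unassigned)"
bucket are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: rolls per-region byte counts up to per-server totals.
class ServerSkew {
    /**
     * regionSizes:    encoded region name -> bytes (from the du output)
     * regionToServer: encoded region name -> server name (assumed to come
     *                 from RegionLocator.getAllRegionLocations())
     */
    static Map<String, Long> bytesPerServer(Map<String, Long> regionSizes,
                                            Map<String, String> regionToServer) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> e : regionSizes.entrySet()) {
            // Regions with no known location are grouped under one label.
            String server = regionToServer.getOrDefault(e.getKey(), "(unassigned)");
            totals.merge(server, e.getValue(), Long::sum);
        }
        return totals;
    }
}
```

Keep in mind that regions can move between the two lookups (balancer,
splits, crashes), so the totals are a snapshot.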

That will tell you the overall skew of your data in terms of raw bytes on
disk. It is an estimate rather than a row count, but it is a lot faster
than scanning your table, provided your table / cluster is sufficiently
large.

Hope that helps.
rahul

On Fri, Aug 26, 2016 at 12:11 PM, Ted Yu <[email protected]> wrote:

> Have you looked at /jmx endpoint on the servers ?
> Below is a sample w.r.t. the metrics that would be of interest to you:
>
>
> "Namespace_default_table_x_region_6659ba3fe42b4a196daaba9306b50551_metric_appendCount" : 0,
>
> "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_num_ops" : 0,
> "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_min" : 0,
> "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_max" : 0,
> "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_mean" : 0.0,
> "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_median" : 0.0,
> "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_75th_percentile" : 0.0,
> "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_95th_percentile" : 0.0,
> "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_99th_percentile" : 0.0,
>
> "Namespace_default_table_x_region_823a39a250e81f45e5ef493740d936ab_metric_deleteCount" : 0,
> "Namespace_default_table_x_region_30b82db17b64a83d4aeda9dbd40d6215_metric_deleteCount" : 0,
> "Namespace_default_table_x_region_c6db2e650b3025aa82032b0e0aa8b715_metric_appendCount" : 0,
>
> "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_num_ops" : 0,
> "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_min" : 0,
> "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_max" : 0,
> "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_mean" : 0.0,
> "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_median" : 0.0,
> "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_75th_percentile" : 0.0,
> "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_95th_percentile" : 0.0,
> "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_99th_percentile" : 0.0,
>
> "Namespace_default_table_x_region_5a1fe60f6267c98055b334784e6d76d2_metric_mutateCount" : 0,
> "Namespace_default_table_x_region_66bbec5f7e136b226a19b5fdf9f17cbe_metric_incrementCount" : 0,
>
> On Fri, Aug 26, 2016 at 11:59 AM, Manish Maheshwari <[email protected]>
> wrote:
>
> > Hi Ted,
> >
> > I understand the region crash/migration/splitting impact. Currently we
> > have hotspotting on few region servers. I am trying to collect the row
> > stats at region server and region levels to see how bad the skew of the
> > data is.
> >
> > Manish
> >
> > On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu <[email protected]> wrote:
> >
> > > Can you elaborate on your use case ?
> > >
> > > Suppose row A is on server B, after you retrieve row A, the region for
> > > row A gets moved to server C (load balancer or server crash). Server B
> > > would no longer be relevant.
> > >
> > > Cheers
> > >
> > > On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I looked at the HBase Count functionality to count rows in a Table.
> > > > Is there a way that we can count the number of rows in Regions &
> > > > Region Servers? When we use a HBase scan, we dont get the Region ID
> > > > or Region Server of the row. Is there a way to do this via Scans?
> > > >
> > > > Thanks,
> > > > Manish
> > > >
> > >
> >
>
