Thanks Ted. Do you see a good approach for calculating the size of an HBase table and passing it to Spark, so that Spark can decide which type of join to perform? As far as I have understood, from HBaseAdmin we get the list of regions, and for each region we compute the HDFS blocks distribution of its HFiles and sum their weights [0]. I have sketched what I have in mind below.

[0]: https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/HDFSBlocksDistribution.java
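Something along these lines is what I have in mind (an untested sketch against the HBase 1.1 client API; HBaseTableSize and estimateTableSizeBytes are just placeholder names):

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.ConnectionFactory
    import org.apache.hadoop.hbase.regionserver.HRegion
    import scala.collection.JavaConverters._

    object HBaseTableSize {
      // Approximate the on-disk size of a table by summing the HDFS block
      // weights of every region's HFiles.
      def estimateTableSizeBytes(tableNameStr: String): Long = {
        val conf = HBaseConfiguration.create()
        val connection = ConnectionFactory.createConnection(conf)
        try {
          val tableName  = TableName.valueOf(tableNameStr)
          val admin      = connection.getAdmin
          val descriptor = admin.getTableDescriptor(tableName)
          val regions    = admin.getTableRegions(tableName).asScala

          // computeHDFSBlocksDistribution walks the region's HFiles; the
          // unique blocks total weight is the number of bytes they occupy.
          regions.map { regionInfo =>
            HRegion
              .computeHDFSBlocksDistribution(conf, descriptor, regionInfo)
              .getUniqueBlocksTotalWeight
          }.sum
        } finally {
          connection.close()
        }
      }
    }

Does summing getUniqueBlocksTotalWeight per region look like the right way to read HDFSBlocksDistribution for this purpose?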
On Fri, Jul 22, 2016 at 4:24 AM, Ted Yu <[email protected]> wrote:

> Please take a look at the following methods:
>
> From HBaseAdmin:
>
> public List<HRegionInfo> getTableRegions(final TableName tableName)
>
> From HRegion:
>
> public static HDFSBlocksDistribution computeHDFSBlocksDistribution(final Configuration conf,
>     final HTableDescriptor tableDescriptor, final HRegionInfo regionInfo) throws IOException {
>
> FYI
>
> On Wed, Jul 20, 2016 at 11:28 PM, Sachin Jain <[email protected]> wrote:
>
> > *Context*
> > I am using Spark (1.5.1) with HBase (1.1.2) to dump the output of Spark jobs into HBase, which is then available for lookups from the HBase table. Our BaseRelation extends HadoopFsRelation and is used to read from and write to HBase, via the Spark Data Source API.
> >
> > *Use Case*
> > Whenever I perform a join operation, Spark creates a logical plan and decides which type of join to execute. As per SparkStrategies [0], it checks the size of the HBase table: if it is less than some threshold (10 MB), it selects a broadcast hash join, otherwise a sort merge join.
> >
> > *Problem Statement*
> > I want to know if there is an API or some approach to calculate the size of an HBase table.
> >
> > [0]: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L118
> >
> > Thanks
> > -Sachin
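Coming back to the second half of my question (telling Spark about the size): my current thinking is to report the estimate through BaseRelation.sizeInBytes, which SparkStrategies compares against spark.sql.autoBroadcastJoinThreshold. A minimal sketch, assuming the size has already been computed as above; HBaseLookupRelation and estimatedSizeBytes are hypothetical names, and our real relation extends HadoopFsRelation with the scan traits rather than plain BaseRelation:

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.sources.BaseRelation
    import org.apache.spark.sql.types.StructType

    class HBaseLookupRelation(
        val sqlContext: SQLContext,
        val schema: StructType,
        estimatedSizeBytes: Long  // e.g. the value from the helper above
      ) extends BaseRelation {

      // Report the backing HBase table's size to the optimizer. If this is
      // below spark.sql.autoBroadcastJoinThreshold (10 MB by default), the
      // planner can choose a broadcast hash join for this side of the join.
      override def sizeInBytes: Long = estimatedSizeBytes
    }

Does overriding sizeInBytes this way sound like the right hook, or is there a better place to plug the estimate in?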
