*Context* I am using Spark (1.5.1) with HBase (1.1.2) to dump the output of Spark jobs into HBase, which is then available for lookups from the HBase table. A custom BaseRelation that extends HadoopFsRelation is used to read from and write to HBase, through Spark's Data Source API.
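For illustration, here is a minimal, hypothetical sketch (simplified to extend BaseRelation directly, unlike our real relation) of how a relation reports its size to Spark's planner by overriding sizeInBytes. HBaseLookupRelation and HBaseTableSize.estimate are assumed names; the estimator is sketched further below:

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.BaseRelation
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical, simplified relation (our real one extends HadoopFsRelation).
// Overriding sizeInBytes is how a relation tells Spark's planner how big it
// is; the planner compares this against spark.sql.autoBroadcastJoinThreshold.
class HBaseLookupRelation(override val sqlContext: SQLContext,
                          hbaseTableName: String) extends BaseRelation {

  // Placeholder schema; the real relation would derive this from its
  // HBase column mapping.
  override def schema: StructType =
    StructType(Seq(StructField("rowkey", StringType, nullable = false)))

  // HBaseTableSize.estimate is the assumed helper sketched below.
  override def sizeInBytes: Long = HBaseTableSize.estimate(hbaseTableName)
}
```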
*Use Case* Now, whenever I perform a join, Spark builds a logical plan and decides which type of join to execute. Per SparkStrategies [0], it checks the estimated size of the HBase table: if it is below the broadcast threshold (spark.sql.autoBroadcastJoinThreshold, 10 MB by default), Spark selects a broadcast hash join; otherwise it uses a sort merge join.

*Problem Statement* I want to know whether there is an API, or some other approach, to calculate the size of an HBase table.
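The kind of estimate I am after would look roughly like the sketch below, which sums the per-region store-file sizes via HBase's RegionSizeCalculator (assuming its (RegionLocator, Admin) constructor from HBase 1.1). This is not verified against our cluster:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.mapreduce.RegionSizeCalculator

import scala.collection.JavaConverters._

// Sketch: estimate a table's size by summing the store-file sizes that the
// RegionServers report for each of its regions. This is on-disk (possibly
// compressed) size, so it only approximates the in-memory size that the
// join planner really cares about.
object HBaseTableSize {
  def estimate(tableName: String): Long = {
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    try {
      val regionLocator = connection.getRegionLocator(TableName.valueOf(tableName))
      val admin = connection.getAdmin
      try {
        val calculator = new RegionSizeCalculator(regionLocator, admin)
        // getRegionSizeMap: region name -> size in bytes
        calculator.getRegionSizeMap.values.asScala.map(_.longValue).sum
      } finally {
        admin.close()
        regionLocator.close()
      }
    } finally {
      connection.close()
    }
  }
}
```

Note that RegionSizeCalculator derives its numbers from the region load metrics reported to the master, not from scanning the table, so the call is cheap but only megabyte-granular. Is something like this reliable, or is there a better-supported API?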
[0]: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L118

Thanks
-Sachin