Just to be sure: if I execute a Scan from inside Spark, the execution goes through the RegionServers and I get all the features of an HBase Scan (filters and so on), with the parallelization handled by the RegionServers (even though I'm running the program with Spark). Whereas if I use TableInputFormat, I read all the column families (even if I don't want to), with no prior filtering either: it just opens the files of an HBase table and processes them completely. All the parallelization is done by Spark, and HBase isn't used at all; it just reads from HDFS the files that HBase stored for that specific table.
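For reference, this is the kind of TableInputFormat read I mean (just a sketch; `my_table` is a placeholder table name, and I'm assuming the standard `newAPIHadoopRDD` wiring):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Point the input format at the table ("my_table" is a placeholder).
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

val sc = new SparkContext(new SparkConf().setAppName("hbase-read"))

// Each RDD element is a (row key, Result) pair for one HBase row.
val rdd = sc.newAPIHadoopRDD(
  hbaseConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])
```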
Am I missing something?