TableInputFormatBase is abstract. Most likely you would use TableInputFormat for the scan.
See javadoc of getSplits():

   * Calculates the splits that will serve as input for the map tasks. The
   * number of splits matches the number of regions in a table.

FYI

On Wed, May 11, 2016 at 6:05 PM, Yi Jiang <[email protected]> wrote:

> Hi, Guys
> Recently we are debating the usage for hbase as our destination for data
> pipeline job.
> Basically, we want to save our logs into hbase, and our pipeline can
> generate 2-4 terabytes data everyday, but our IT department think it is not
> good idea to scan so hbase, it will cause the performance and memory issue.
> And they ask our just keep 15 minutes data amount in the hbase for real
> time analysis.
> For now, I am using hive to external to hbase, but what I am thinking that
> for map reduce job, what kind of mapper it is using to scan the data from
> hbase? Is it TableInputFormatBase? and how many mapper it will use in hive
> to scan the hbase. Is it efficient or not? Will it cause the performance
> issue if we have couple T's or more larger data amount?
> I am also trying to index some columns that we might use to query. But I
> am not sure if it is good idea to keep so much history data in the hbase
> for query.
> Thank you
> Jacky
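
To make the one-split-per-region rule from the getSplits() javadoc concrete, here is a rough, self-contained sketch. It is not the HBase source: the class RegionSplitSketch, the Region type and the splitsFor() method are hypothetical names made up for illustration, and row keys are compared as plain Java strings rather than as byte arrays. It only models the idea that each region overlapping the scan's row range yields one split, and so one map task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of "one split per region".
// Not HBase code: names and string-based key comparison are assumptions.
class RegionSplitSketch {

    /** A region covers [startKey, endKey); "" means unbounded on that side. */
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /**
     * Returns one entry per region that overlaps the scan range
     * [startRow, stopRow). An empty startRow or stopRow means "unbounded".
     */
    static List<Region> splitsFor(List<Region> regions, String startRow, String stopRow) {
        List<Region> splits = new ArrayList<>();
        for (Region r : regions) {
            // The scan begins before this region ends.
            boolean startsBeforeRegionEnds = startRow.isEmpty()
                    || r.endKey.isEmpty()
                    || startRow.compareTo(r.endKey) < 0;
            // The scan stops after this region begins.
            boolean stopsAfterRegionStarts = stopRow.isEmpty()
                    || stopRow.compareTo(r.startKey) > 0;
            if (startsBeforeRegionEnds && stopsAfterRegionStarts) {
                splits.add(r);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // A made-up table with four regions.
        List<Region> regions = List.of(
                new Region("", "g"),
                new Region("g", "n"),
                new Region("n", "t"),
                new Region("t", ""));

        // Full-table scan: every region becomes a split.
        System.out.println(splitsFor(regions, "", "").size());   // prints 4
        // Scan restricted to rows [h, p): only two regions overlap.
        System.out.println(splitsFor(regions, "h", "p").size()); // prints 2
    }
}
```

Under this model, a full scan of a table gets about as many mappers as the table has regions, and narrowing the scan's start and stop rows is what reduces that number.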
