Re: HBase scanning for a couple of terabytes of data

Ted Yu Wed, 11 May 2016 21:02:24 -0700

TableInputFormatBase is abstract.

Most likely you would use TableInputFormat for the scan.


See javadoc of getSplits():

   * Calculates the splits that will serve as input for the map tasks. The
   * number of splits matches the number of regions in a table.
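
To make the "one split per region" statement concrete, here is a minimal plain-Java sketch. It does not call HBase; it only models the idea. The class and method names (RegionSplitSketch, splitsFor, estimateMappers), the sample start keys, and the 10 GB region size are assumptions for illustration. Real regions are rarely all full, so the actual mapper count can differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "number of splits == number of regions".
// This is NOT the HBase implementation, only an illustration of the idea.
public class RegionSplitSketch {

    /** A key range [startKey, endKey); an empty string means "unbounded". */
    static final class Split {
        final String startKey;
        final String endKey;

        Split(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    /** Builds one split per region; a region ends where the next one starts. */
    static List<Split> splitsFor(List<String> regionStartKeys) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            String end = (i + 1 < regionStartKeys.size())
                    ? regionStartKeys.get(i + 1)
                    : ""; // the last region is unbounded at the end
            splits.add(new Split(start, end));
        }
        return splits;
    }

    /** Rough mapper estimate: table size divided by region size, rounded up. */
    static long estimateMappers(long tableBytes, long regionBytes) {
        return (tableBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        // Assumed sample table with four regions.
        List<String> startKeys = Arrays.asList("", "g", "n", "t");
        for (Split s : splitsFor(startKeys)) {
            System.out.println("map task scans " + s);
        }

        long fourTb = 4096L << 30;  // 4 TB expressed in bytes
        long tenGb = 10L << 30;     // assumed region size of 10 GB
        System.out.println("estimated mappers: " + estimateMappers(fourTb, tenGb));
    }
}
```

Under these assumptions, the number of map tasks grows with the number of regions, so the region size setting indirectly controls how parallel a full-table scan is.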


FYI

On Wed, May 11, 2016 at 6:05 PM, Yi Jiang <[email protected]> wrote:

> Hi, guys,
> We have recently been debating whether to use HBase as the destination for
> our data pipeline job.
> Basically, we want to save our logs into HBase. Our pipeline can generate
> 2-4 terabytes of data every day, but our IT department thinks it is not a
> good idea to scan that much data in HBase because it will cause performance
> and memory issues. They have asked us to keep just 15 minutes' worth of
> data in HBase for real-time analysis.
> For now, I am using a Hive external table on top of HBase. What I am
> wondering is: for the MapReduce job, what kind of mapper is used to scan
> the data from HBase? Is it TableInputFormatBase? And how many mappers will
> Hive use to scan HBase? Is it efficient or not? Will it cause performance
> issues if we have a couple of terabytes of data or more?
> I am also trying to index some columns that we might use in queries, but I
> am not sure whether it is a good idea to keep so much historical data in
> HBase for querying.
> Thank you
> Jacky
>
>
