TableInputFormat doesn't read the filesystem directly; it essentially issues a scan over the whole table (or the specified range), so it'll read the same data you'd get if you'd done a scan from any client. There is also a TableSnapshotInputFormat that bypasses the HBase servers themselves, going directly to the files:
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.html
When using this, your job will read the entire table (as of the snapshot).
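
A minimal sketch of wiring up a job either way, if it helps. Names like "my_table", "my_snapshot", and the restore path are placeholders, not anything from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ScanVsSnapshotJob {

  // Trivial mapper: just counts rows. Each Result is a full row,
  // exactly as a client-side scan would return it.
  public static class RowCounter extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result row, Context ctx) {
      ctx.getCounter("hbase", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "scan-vs-snapshot");
    job.setJarByClass(ScanVsSnapshotJob.class);

    Scan scan = new Scan();       // whole table; set start/stop rows to restrict the range
    scan.setCaching(500);
    scan.setCacheBlocks(false);   // don't pollute the block cache from a full-table job

    // Path 1: TableInputFormat under the hood -- goes through the region
    // servers, so it also sees data still in the memstore (not yet flushed).
    TableMapReduceUtil.initTableMapperJob(
        "my_table", scan, RowCounter.class,
        NullWritable.class, NullWritable.class, job);

    // Path 2 (alternative): TableSnapshotInputFormat -- reads the snapshot's
    // files directly from HDFS, bypassing the region servers. Requires an
    // existing snapshot and a writable temp dir for restoring references.
    // TableMapReduceUtil.initTableSnapshotMapperJob(
    //     "my_snapshot", scan, RowCounter.class,
    //     NullWritable.class, NullWritable.class, job,
    //     true, new Path("/tmp/snapshot-restore"));

    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}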
On Wed, May 29, 2019 at 1:45 AM Guillermo Ortiz Fernández <
guillermo.ortiz.f...@gmail.com> wrote:

> Another little doubt: if I use the class TableInputFormat to read an
> HBase table, am I going to read the whole table, or is data that hasn't
> been flushed to storefiles not going to be read?
>
> On Wed, May 29, 2019 at 0:14, Guillermo Ortiz Fernández (<
> guillermo.ortiz.f...@gmail.com>) wrote:
>
> > It depends on the row; they only share about 5% of the qualifier names.
> > Each row could have about 500-3000 columns across 3 column families.
> > One of them has 80% of the columns.
> >
> > The table has around 75M rows.
> >
> > On Tue, May 28, 2019 at 17:33, <s...@comcast.net> wrote:
> >
> >> Guillermo
> >>
> >> How large is your table? How many columns?
> >>
> >> Sincerely,
> >>
> >> Sean
> >>
> >> > On May 28, 2019 at 10:11 AM Guillermo Ortiz <konstt2...@gmail.com>
> >> > wrote:
> >> >
> >> > I have a doubt. When you process an HBase table with MapReduce you
> >> > can use TableInputFormat. I understand that it goes directly to the
> >> > HDFS files (storeFiles in HDFS), so you could do some filtering in
> >> > the map phase, and it's not the same as going through the region
> >> > servers to do some massive queries. Is it possible to do the same
> >> > using TableInputFormat with Spark, and is it more efficient than
> >> > using a scan with filters and so on (again) when you want to do a
> >> > massive query over the whole table? Am I right?
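
For the Spark part of the question quoted above: reading via TableInputFormat from Spark is still a scan through the region servers, so pushing filters into the Scan (server-side) cuts network traffic, whereas filtering in Spark transformations does not. A minimal sketch, assuming a table named "my_table" (placeholder) and the HBase client jars on the Spark classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkTableScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.INPUT_TABLE, "my_table");  // placeholder table name

    // Optional: push the filtering to the servers by serializing a Scan
    // (with filters, column or range restrictions) into the job conf.
    Scan scan = new Scan();
    scan.setCaching(500);
    scan.setCacheBlocks(false);
    conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));

    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("hbase-scan"));

    // One Spark partition per table region; each record is a full row (Result).
    JavaPairRDD<ImmutableBytesWritable, Result> rows = sc.newAPIHadoopRDD(
        conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

    System.out.println("rows: " + rows.count());
    sc.close();
  }
}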