Another little doubt: if I use the class TableInputFormat to read an HBase table, am I going to read the whole table, or is data that hasn't been flushed to storefiles not going to be read?
On Wed, May 29, 2019 at 0:14, Guillermo Ortiz Fernández (<guillermo.ortiz.f...@gmail.com>) wrote:

> It depends on the row; they only share 5% of the qualifier names.
> Each row could have about 500-3000 columns in 3 column families. One of
> them has 80% of the columns.
>
> The table has around 75M rows.
>
> On Tue, May 28, 2019 at 17:33, <s...@comcast.net> wrote:
>
>> Guillermo
>>
>> How large is your table? How many columns?
>>
>> Sincerely,
>>
>> Sean
>>
>> > On May 28, 2019 at 10:11 AM Guillermo Ortiz <konstt2...@gmail.com> wrote:
>> >
>> > I have a doubt. When you process an HBase table with MapReduce you
>> > could use TableInputFormat. I understand that it goes directly to the
>> > HDFS files (StoreFiles in HDFS), so you could do some filtering in
>> > the map phase, and it's not the same as going through the region
>> > servers to do massive queries. Is it possible to do the same using
>> > TableInputFormat with Spark, and is it more efficient than using a
>> > scan with filters when you want to run a massive query over the
>> > whole table? Am I right?
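For context, the Spark usage being discussed can be sketched roughly like this (a minimal sketch, assuming a reachable HBase cluster on the classpath configuration; the table name and column family below are hypothetical, not from the thread). Note that TableInputFormat issues its scans through the region servers' client API rather than reading HFiles off HDFS directly, which is why one split is created per region.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Result, Scan}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{TableInputFormat, TableMapReduceUtil}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object HBaseScanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hbase-tableinputformat").getOrCreate()

    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "my_table") // hypothetical table name

    // Push filtering server-side: restrict the scan to one column family
    // so less data crosses the wire before Spark ever sees a row.
    val scan = new Scan()
    scan.addFamily(Bytes.toBytes("cf1")) // hypothetical column family
    conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan))

    // One input split (and thus one Spark partition) per HBase region.
    val rdd = spark.sparkContext.newAPIHadoopRDD(
      conf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    println(s"rows read: ${rdd.count()}")
    spark.stop()
  }
}
```

Any further filtering done afterwards in a `map`/`filter` over the RDD happens client-side in Spark, whereas filters attached to the `Scan` object are evaluated on the region servers.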