Another little doubt: if I use the class TableInputFormat to read an HBase table, am I going to read the whole table, or is data that hasn't been flushed to storefiles not going to be read?
On Wed, May 29, 2019 at 0:14, Guillermo Ortiz Fernández (<guillermo.ortiz.f...@gmail.com>) wrote:

> It depends on the row; they only share 5% of the qualifier names.
> Each row could have about 500-3000 columns in 3 column families. One of
> them has 80% of the columns.
>
> The table has around 75M rows.
>
> On Tue, May 28, 2019 at 17:33, <s...@comcast.net> wrote:
>
>> Guillermo
>>
>> How large is your table? How many columns?
>>
>> Sincerely,
>>
>> Sean
>>
>> > On May 28, 2019 at 10:11 AM Guillermo Ortiz <konstt2...@gmail.com> wrote:
>> >
>> > I have a doubt. When you process an HBase table with MapReduce you
>> > could use TableInputFormat. I understand that it goes directly to the
>> > HDFS files (StoreFiles in HDFS), so you could do some filtering in
>> > the map phase, and it's not the same as going through the region
>> > servers to do massive queries. Is it possible to do the same using
>> > TableInputFormat with Spark, and is it more efficient than using a
>> > scan with filters when you want to run a massive query over the
>> > whole table? Am I right?
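For context, the Spark usage being discussed can be sketched roughly like this (a minimal sketch, assuming a reachable HBase cluster on the classpath configuration; the table name and column family below are hypothetical, not from the thread). Note that TableInputFormat issues its scans through the region servers' client API rather than reading HFiles off HDFS directly, which is why one split is created per region.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Result, Scan}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{TableInputFormat, TableMapReduceUtil}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object HBaseScanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("hbase-tableinputformat").getOrCreate()

    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "my_table") // hypothetical table name

    // Push filtering server-side: restrict the scan to one column family
    // so less data crosses the wire before Spark ever sees a row.
    val scan = new Scan()
    scan.addFamily(Bytes.toBytes("cf1")) // hypothetical column family
    conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan))

    // One input split (and thus one Spark partition) per HBase region.
    val rdd = spark.sparkContext.newAPIHadoopRDD(
      conf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    println(s"rows read: ${rdd.count()}")
    spark.stop()
  }
}
```

Any further filtering done afterwards in a `map`/`filter` over the RDD happens client-side in Spark, whereas filters attached to the `Scan` object are evaluated on the region servers.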