Have a look at Phoenix (https://github.com/forcedotcom/phoenix), a SQL skin over HBase. It does parallel scans and has no map/reduce dependencies. Instead, it compiles your SQL into native HBase calls. Thanks, James @JamesPlusPlus http://phoenix-hbase.blogspot.com
On Aug 21, 2013, at 1:08 AM, yonghu <[email protected]> wrote:

> Thanks. So scanning the table with a plain Java client, without
> MapReduce, will heavily decrease performance.
>
> Yong
>
>
> On Tue, Aug 20, 2013 at 6:02 PM, Jeff Kolesky <[email protected]> wrote:
>
>> The scan will be broken up into multiple map tasks, each of which will run
>> over a single split of the table (look at TableInputFormat to see how it is
>> done). The map tasks will run in parallel.
>>
>> Jeff
>>
>>
>> On Tue, Aug 20, 2013 at 8:45 AM, yonghu <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I know that if I use the default scan API, HBase scans the table in a
>>> serial manner, as it needs to guarantee the order of the returned tuples.
>>> My question is: if I use MapReduce to read the HBase table and write the
>>> results directly to HDFS, rather than returning them to the client, is
>>> the HBase scan still serial, or can it run as a parallel scan in this
>>> situation?
>>>
>>> Thanks!
>>>
>>> Yong
>>>
>>
>>
>>
>> --
>> *Jeff Kolesky*
>> Chief Software Architect
>> *Opower*
>>
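The split-per-map-task behavior Jeff describes can be illustrated without a cluster. The sketch below is a conceptual model only, not the HBase API: a sorted in-memory map stands in for the table, and each "split" is a disjoint, ordered key range scanned by its own task, the way TableInputFormat hands one table split to each map task. Class and method names here are invented for the illustration.

```java
import java.util.*;
import java.util.concurrent.*;

// Conceptual sketch (plain Java, no HBase dependencies) of how a table
// scan parallelizes once the key space is divided into splits.
public class ParallelScanSketch {
    // Stand-in for the table: sorted row keys -> values.
    static final NavigableMap<Integer, String> TABLE = new TreeMap<>();
    static {
        for (int i = 0; i < 100; i++) TABLE.put(i, "val-" + i);
    }

    // Serial scan: one client walks the whole key range in order.
    static List<String> serialScan() {
        return new ArrayList<>(TABLE.values());
    }

    // Parallel scan: carve the key range into `splits` disjoint chunks
    // and scan each chunk in its own task, as map tasks would.
    static List<String> parallelScan(int splits) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(splits);
        List<Future<List<String>>> futures = new ArrayList<>();
        int min = TABLE.firstKey(), bound = TABLE.lastKey() + 1;
        int step = (bound - min + splits - 1) / splits;
        for (int s = 0; s < splits; s++) {
            final int lo = min + s * step;
            final int hi = Math.min(lo + step, bound);
            futures.add(pool.submit(() ->
                new ArrayList<>(TABLE.subMap(lo, true, hi, false).values())));
        }
        // Concatenating splits in key order reproduces the serial result,
        // because each split covers a disjoint, ordered key range.
        List<String> out = new ArrayList<>();
        for (Future<List<String>> f : futures) out.addAll(f.get());
        pool.shutdown();
        return out;
    }

    public static void main(String[] args) throws Exception {
        if (!serialScan().equals(parallelScan(4)))
            throw new AssertionError("parallel scan diverged from serial scan");
        System.out.println("ok");
    }
}
```

Ordering is the point Yong raised: a parallel scan can still yield globally ordered output when results are merged split-by-split in key order; if results go straight to HDFS instead of back to one client, even that merge step is unnecessary.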
