Maybe run more Thrift gateways? Perhaps one on each host running map tasks, and have the tasks talk to localhost. That way your job doesn't bottleneck on a single Thrift server.
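A minimal sketch of what a streaming mapper could look like under that setup, assuming a Thrift gateway runs on every task host at the default port 9090 and the third-party `happybase` client is available (the table name `solar_cells` is hypothetical):

```python
# Streaming mapper sketch: read RowKey\tColumn\tValue lines from stdin
# and write them through the Thrift gateway on localhost.
import sys

def parse_line(line):
    """Split a RowKey\tColumn\tValue line into put() arguments.
    Only the first two tabs delimit fields, so values may contain tabs."""
    row, column, value = line.rstrip("\n").split("\t", 2)
    return row.encode(), {column.encode(): value.encode()}

def main():
    import happybase  # third-party client: pip install happybase
    conn = happybase.Connection("localhost", 9090)  # per-host gateway
    table = conn.table("solar_cells")  # hypothetical table name
    # Batching puts cuts Thrift round trips per mapper.
    with table.batch(batch_size=1000) as batch:
        for line in sys.stdin:
            batch.put(*parse_line(line))
    conn.close()

if __name__ == "__main__":
    main()
```

Since every mapper talks only to its local gateway, adding task hosts scales the Thrift layer along with the job instead of funneling all puts through one server.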
> Solar Cell Data Management sounds cool.

+1 :)

On Fri, Oct 2, 2015 at 1:19 PM, Stack <[email protected]> wrote:

> On Fri, Oct 2, 2015 at 12:46 PM, Pei Zhao <[email protected]> wrote:
>
> > Hi Stack,
> >
> > In my case, I tried to use the HBase Thrift API, but the Thrift server
> > sometimes crashes during my MapReduce job due to running out of heap
> > memory.
> >
> > Do you have any suggestions on that, please?
>
> Give it more heap?
> What is your client?
>
> Solar Cell Data Management sounds cool.
>
> St.Ack
>
> > Thanks
> >
> > On Fri, Oct 2, 2015 at 3:10 PM, Stack <[email protected]> wrote:
> >
> > > You can't do Hadoop streaming into HBase. Maybe explore the HBase REST
> > > interface and see if you can format puts that HBase REST can digest.
> > > St.Ack
> > >
> > > On Fri, Oct 2, 2015 at 12:00 PM, Pei Zhao <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I am a graduate student doing research in solar cell data management.
> > > > My project uses Hadoop/HBase. Recently we switched our MapReduce jobs
> > > > to Python using Hadoop streaming.
> > > >
> > > > My question is: can I have Hadoop streaming write to stdout in a
> > > > specific format that HBase can pick up and put into tables?
> > > >
> > > > For instance, if I output lines of RowKey\tColumn\tValue, then HBase
> > > > knows how to put them into tables.
> > > >
> > > > Regards
> > > > Pei
>
> --
> Pei (Asher) Zhao
> Electrical Engineering and Computer Science
> Case Western Reserve University
> Cleveland, Ohio 44106
>
> --The man who has made up his mind to win will never say impossible.
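On the REST suggestion above: a minimal sketch of how the RowKey\tColumn\tValue lines could be turned into the JSON documents the HBase REST gateway (Stargate) accepts at PUT /<table>/<row>. In that schema, row keys, column names, and cell values are all base64-encoded. The table name `solar_cells` is an assumption for illustration:

```python
# Build HBase REST (Stargate) JSON bodies from streaming output lines.
import base64
import json

def b64(s):
    """Base64-encode a string the way the REST schema expects."""
    return base64.b64encode(s.encode()).decode()

def to_rest_row(row, column, value):
    """JSON document for one cell: {"Row": [{"key": ..., "Cell": [...]}]}."""
    return {"Row": [{"key": b64(row),
                     "Cell": [{"column": b64(column), "$": b64(value)}]}]}

def line_to_request(line, table="solar_cells"):  # hypothetical table name
    """Turn one RowKey\tColumn\tValue line into a (path, body) pair."""
    row, column, value = line.rstrip("\n").split("\t", 2)
    path = "/%s/%s" % (table, row)
    return path, json.dumps(to_rest_row(row, column, value))
```

A reducer could then send each body to the gateway (default port 8080) with any HTTP client; batching many rows into one document before issuing the PUT cuts round trips considerably.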
