Hi,

The reason we cannot close the ResultScanner right away (or issue a multi-get instead) is that we have wide rows with many columns, and we want to iterate over the columns rather than fetch them all at once.

There is a special but common case in which we need only the first column of each row. Is there a better way to do this than multiple scans + take(1)?

Jianshi

On Wed, Aug 27, 2014 at 12:44 PM, Dai, Kevin <[email protected]> wrote:

> Hi, Ted
>
> I think you are right. But we must hold the ResultScanner for a while. So
> is there any way to reduce the performance loss? Or is there any way to
> share the connection?
>
> Best regards,
> Kevin.
>
> -----Original Message-----
> From: Ted Yu [mailto:[email protected]]
> Sent: August 27, 2014 11:36
> To: [email protected]
> Subject: Re: ResultScanner performance
>
> Keeping many ResultScanners open at the same time is not good for
> performance.
>
> Please see:
> http://hbase.apache.org/book.html#perf.hbase.client.scannerclose
>
> After fetching results from ResultScanner, you should close it ASAP.
>
> Cheers
>
>
> On Tue, Aug 26, 2014 at 8:18 PM, Dai, Kevin <[email protected]> wrote:
>
> > Hi, Ted
> >
> > We have a cluster of 48 machines and at least 100T data(which is still
> > increasing).
> > The problem is that we have a lot of row keys (about tens of thousands
> > ) to query in the meantime and we don't fetch all the data at once,
> > instead we fetch them when needed, so we may hold tens of thousands
> > ResultScanner in the meantime.
> > I want to know whether it will hurt the performance and network
> > resources and if so, is there any way to solve it?
> >
> > Best regards,
> > Kevin.
> > -----Original Message-----
> > From: Ted Yu [mailto:[email protected]]
> > Sent: August 26, 2014 16:49
> > To: [email protected]
> > Cc: [email protected]; Huang, Jianshi
> > Subject: Re: ResultScanner performance
> >
> > Can you give a bit more detail ?
> > What size is the cluster / dataset ?
> > What problem are you solving ?
> > Would using coprocessor help reduce the usage of ResultScanner ?
> >
> > Cheers
> >
> > On Aug 26, 2014, at 12:13 AM, "Dai, Kevin" <[email protected]> wrote:
> >
> > > Hi, everyone
> > >
> > > My application will hold tens of thousands of ResultScanner to get Data.
> > > Will it hurt the performance and network resources?
> > > If so, is there any way to solve it?
> > > Thanks,
> > > Kevin.
> > >

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
