No, the endpoint executes with normal QoS, but it initiates a scan which, judging by the handlers, seems to execute at high QoS. I am not totally sure, though: maybe that region server was hosting the .META table and those were actually scan.next operations for the META table. So I will need to confirm this.

Varun

On Mon, Feb 11, 2013 at 4:50 AM, Anoop Sam John <[email protected]> wrote:

> You mean the end point is getting executed with high QoS? You checked
> with some logs?
>
> -Anoop-
> ________________________________________
> From: Varun Sharma [[email protected]]
> Sent: Monday, February 11, 2013 4:05 AM
> To: [email protected]; lars hofhansl
> Subject: Re: Get on a row with multiple columns
>
> Back to BulkDeleteEndpoint, i got it to work but why are the scanner.next()
> calls executing on the Priority handler queue ?
>
> Varun
>
> On Sat, Feb 9, 2013 at 8:46 AM, lars hofhansl <[email protected]> wrote:
>
> > The answer is "probably" :)
> > It's disabled in 0.96 by default. Check out HBASE-7008 (
> > https://issues.apache.org/jira/browse/HBASE-7008) and the discussion
> > there.
> >
> > Also check out the discussion in HBASE-5943 and HADOOP-8069 (
> > https://issues.apache.org/jira/browse/HADOOP-8069)
> >
> > -- Lars
> >
> > ________________________________
> > From: Jean-Marc Spaggiari <[email protected]>
> > To: [email protected]
> > Sent: Saturday, February 9, 2013 5:02 AM
> > Subject: Re: Get on a row with multiple columns
> >
> > Lars, should we always consider disabling Nagle? What's the down side?
> >
> > JM
> >
> > 2013/2/9, Varun Sharma <[email protected]>:
> > > Yeah, I meant true...
> > >
> > > On Sat, Feb 9, 2013 at 12:17 AM, lars hofhansl <[email protected]> wrote:
> > >
> > >> Should be set to true. If tcpnodelay is set to true, Nagle's is disabled.
> > >>
> > >> -- Lars
> > >>
> > >> ________________________________
> > >> From: Varun Sharma <[email protected]>
> > >> To: [email protected]; lars hofhansl <[email protected]>
> > >> Sent: Saturday, February 9, 2013 12:11 AM
> > >> Subject: Re: Get on a row with multiple columns
> > >>
> > >> Okay I did my research - these need to be set to false. I agree.
> > >>
> > >> On Sat, Feb 9, 2013 at 12:05 AM, Varun Sharma <[email protected]> wrote:
> > >>
> > >> I have ipc.client.tcpnodelay, ipc.server.tcpnodelay set to false and the
> > >> hbase one - [hbase].ipc.client.tcpnodelay set to true. Do these induce
> > >> network latency ?
> > >> >
> > >> >On Fri, Feb 8, 2013 at 11:57 PM, lars hofhansl <[email protected]> wrote:
> > >> >
> > >> >Sorry.. I meant set these two config parameters to true (not false as I
> > >> >state below).
> > >> >>
> > >> >>----- Original Message -----
> > >> >>From: lars hofhansl <[email protected]>
> > >> >>To: "[email protected]" <[email protected]>
> > >> >>Cc:
> > >> >>Sent: Friday, February 8, 2013 11:41 PM
> > >> >>Subject: Re: Get on a row with multiple columns
> > >> >>
> > >> >>Only somewhat related. Seeing the magic 40ms random read time there. Did
> > >> >>you disable Nagle's?
> > >> >>(set hbase.ipc.client.tcpnodelay and ipc.server.tcpnodelay to false in
> > >> >>hbase-site.xml).
> > >> >>
> > >> >>________________________________
> > >> >>From: Varun Sharma <[email protected]>
> > >> >>To: [email protected]; lars hofhansl <[email protected]>
> > >> >>Sent: Friday, February 8, 2013 10:45 PM
> > >> >>Subject: Re: Get on a row with multiple columns
> > >> >>
> > >> >>The use case is like your twitter feed. Tweets from people u follow. When
> > >> >>someone unfollows, you need to delete a bunch of his tweets from the
> > >> >>following feed. So, its frequent, and we are essentially running into some
> > >> >>extreme corner cases like the one above. We need high write throughput for
> > >> >>this, since when someone tweets, we need to fanout the tweet to all the
> > >> >>followers. We need the ability to do fast deletes (unfollow) and fast adds
> > >> >>(follow) and also be able to do fast random gets - when a real user loads
> > >> >>the feed.
> > >> >>I doubt we will be able to play much with the schema here since we
> > >> >>need to support a bunch of use cases.
> > >> >>
> > >> >>@lars: It does not take 30 seconds to place 300 delete markers. It takes 30
> > >> >>seconds to first find which of those 300 pins are in the set of columns
> > >> >>present - this invokes 300 gets and then place the appropriate delete
> > >> >>markers. Note that we can have tens of thousands of columns in a single row
> > >> >>so a single get is not cheap.
> > >> >>
> > >> >>If we were to just place delete markers, that is very fast. But when we
> > >> >>started doing that, our random read performance suffered because of too
> > >> >>many delete markers. The 90th percentile on random reads shot up from 40
> > >> >>milliseconds to 150 milliseconds, which is not acceptable for our usecase.
> > >> >>
> > >> >>Thanks
> > >> >>Varun
> > >> >>
> > >> >>On Fri, Feb 8, 2013 at 10:33 PM, lars hofhansl <[email protected]> wrote:
> > >> >>
> > >> >>> Can you organize your columns and then delete by column family?
> > >> >>>
> > >> >>> deleteColumn without specifying a TS is expensive, since HBase first has
> > >> >>> to figure out what the latest TS is.
> > >> >>>
> > >> >>> Should be better in 0.94.1 or later since deletes are batched like Puts
> > >> >>> (still need to retrieve the latest version, though).
> > >> >>>
> > >> >>> In 0.94.3 or later you can also use the BulkDeleteEndPoint, which basically
> > >> >>> lets you specify a scan condition and then place specific delete markers for
> > >> >>> all KVs encountered.
> > >> >>>
> > >> >>> If you wanted to get really fancy, you could hook up a coprocessor to the
> > >> >>> compaction process and simply filter all KVs you no longer want (without
> > >> >>> ever placing any delete markers).
> > >> >>>
> > >> >>> Are you saying it takes 15 seconds to place 300 version delete markers?!
> > >> >>>
> > >> >>> -- Lars
> > >> >>>
> > >> >>> ________________________________
> > >> >>> From: Varun Sharma <[email protected]>
> > >> >>> To: [email protected]
> > >> >>> Sent: Friday, February 8, 2013 10:05 PM
> > >> >>> Subject: Re: Get on a row with multiple columns
> > >> >>>
> > >> >>> We are given a set of 300 columns to delete. I tested two cases:
> > >> >>>
> > >> >>> 1) deleteColumns() - with the 's'
> > >> >>>
> > >> >>> This function simply adds delete markers for 300 columns, in our case,
> > >> >>> typically only a fraction of these columns are actually present - 10. After
> > >> >>> starting to use deleteColumns, we started seeing a drop in cluster wide
> > >> >>> random read performance - 90th percentile latency worsened, so did 99th
> > >> >>> probably because of having to traverse delete markers. I attribute this to
> > >> >>> profusion of delete markers in the cluster. Major compactions slowed down
> > >> >>> by almost 50 percent probably because of having to clean out significantly
> > >> >>> more delete markers.
> > >> >>>
> > >> >>> 2) deleteColumn()
> > >> >>>
> > >> >>> Ended up with intolerable 15 second calls, which clogged all the handlers.
> > >> >>> Making the cluster pretty much unresponsive.
> > >> >>>
> > >> >>> On Fri, Feb 8, 2013 at 9:55 PM, Ted Yu <[email protected]> wrote:
> > >> >>>
> > >> >>> > For the 300 column deletes, can you show us how the Delete(s) are
> > >> >>> > constructed ?
> > >> >>> >
> > >> >>> > Do you use this method ?
> > >> >>> >
> > >> >>> >   public Delete deleteColumns(byte [] family, byte [] qualifier) {
> > >> >>> > Thanks
> > >> >>> >
> > >> >>> > On Fri, Feb 8, 2013 at 9:44 PM, Varun Sharma <[email protected]> wrote:
> > >> >>> >
> > >> >>> > > So a Get call with multiple columns on a single row should be much faster
> > >> >>> > > than independent Get(s) on each of those columns for that row. I am
> > >> >>> > > basically seeing severely poor performance (~ 15 seconds) for certain
> > >> >>> > > deleteColumn() calls and I am seeing that there is a
> > >> >>> > > prepareDeleteTimestamps() function in HRegion.java which first tries to
> > >> >>> > > locate the column by doing individual gets on each column you want to
> > >> >>> > > delete (I am doing 300 column deletes). Now, I think this should ideally be
> > >> >>> > > 1 get call with the batch of 300 columns so that one scan can retrieve the
> > >> >>> > > columns and the columns that are found, are indeed deleted.
> > >> >>> > >
> > >> >>> > > Before I try this fix, I wanted to get an opinion if it will make a
> > >> >>> > > difference to batch the get() and it seems from your answer, it should.
> > >> >>> > >
> > >> >>> > > On Fri, Feb 8, 2013 at 9:34 PM, lars hofhansl <[email protected]> wrote:
> > >> >>> > >
> > >> >>> > > > Everything is stored as a KeyValue in HBase.
> > >> >>> > > > The Key part of a KeyValue contains the row key, column family, column
> > >> >>> > > > name, and timestamp in that order.
> > >> >>> > > > Each column family has its own store and store files.
> > >> >>> > > >
> > >> >>> > > > So in a nutshell a get is executed by starting a scan at the row key
> > >> >>> > > > (which is a prefix of the key) in each store (CF) and then scanning forward
> > >> >>> > > > in each store until the next row key is reached. (in reality it is a bit
> > >> >>> > > > more complicated due to multiple versions, skipping columns, etc)
> > >> >>> > > >
> > >> >>> > > > -- Lars
> > >> >>> > > > ________________________________
> > >> >>> > > > From: Varun Sharma <[email protected]>
> > >> >>> > > > To: [email protected]
> > >> >>> > > > Sent: Friday, February 8, 2013 9:22 PM
> > >> >>> > > > Subject: Re: Get on a row with multiple columns
> > >> >>> > > >
> > >> >>> > > > Sorry, I was a little unclear with my question.
> > >> >>> > > >
> > >> >>> > > > Let's say you have
> > >> >>> > > >
> > >> >>> > > > Get get = new Get(row);
> > >> >>> > > > get.addColumn("1");
> > >> >>> > > > get.addColumn("2");
> > >> >>> > > > .
> > >> >>> > > > .
> > >> >>> > > > .
> > >> >>> > > >
> > >> >>> > > > When internally hbase executes the batch get, it will seek to column "1",
> > >> >>> > > > now since data is lexicographically sorted, it does not need to seek from
> > >> >>> > > > the beginning to get to "2", it can continue seeking, henceforth since
> > >> >>> > > > column "2" will always be after column "1". I want to know whether this is
> > >> >>> > > > how a multicolumn get on a row works or not.
> > >> >>> > > >
> > >> >>> > > > Thanks
> > >> >>> > > > Varun
> > >> >>> > > >
> > >> >>> > > > On Fri, Feb 8, 2013 at 9:08 PM, Marcos Ortiz <[email protected]> wrote:
> > >> >>> > > >
> > >> >>> > > > > Like Ishan said, a get gives an instance of the Result class.
> > >> >>> > > > > All utility methods that you can use are:
> > >> >>> > > > >   byte[] getValue(byte[] family, byte[] qualifier)
> > >> >>> > > > >   byte[] value()
> > >> >>> > > > >   byte[] getRow()
> > >> >>> > > > >   int size()
> > >> >>> > > > >   boolean isEmpty()
> > >> >>> > > > >   KeyValue[] raw() # Like Ishan said, all data here is sorted
> > >> >>> > > > >   List<KeyValue> list()
> > >> >>> > > > >
> > >> >>> > > > > On 02/08/2013 11:29 PM, Ishan Chhabra wrote:
> > >> >>> > > > >
> > >> >>> > > > >> Based on what I read in Lars' book, a get will return a Result,
> > >> >>> > > > >> which is internally a KeyValue[]. This KeyValue[] is sorted by the key and
> > >> >>> > > > >> you access this array using raw or list methods on the Result object.
> > >> >>> > > > >>
> > >> >>> > > > >> On Fri, Feb 8, 2013 at 5:40 PM, Varun Sharma <[email protected]> wrote:
> > >> >>> > > > >>
> > >> >>> > > > >>> +user
> > >> >>> > > > >>>
> > >> >>> > > > >>> On Fri, Feb 8, 2013 at 5:38 PM, Varun Sharma <[email protected]> wrote:
> > >> >>> > > > >>>
> > >> >>> > > > >>>> Hi,
> > >> >>> > > > >>>>
> > >> >>> > > > >>>> When I do a Get on a row with multiple column qualifiers. Do we sort the
> > >> >>> > > > >>>> column qualifiers and make use of the sorted order when we get the
> > >> >>> > > > >>>> results ?
> > >> >>> > > > >>>>
> > >> >>> > > > >>>> Thanks
> > >> >>> > > > >>>> Varun
> > >> >>> > > > >>
> > >> >>> > > > > --
> > >> >>> > > > > Marcos Ortiz Valmaseda,
> > >> >>> > > > > Product Manager && Data Scientist at UCI
> > >> >>> > > > > Blog: http://marcosluis2186.posterous.com
> > >> >>> > > > > Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
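
A minimal sketch of the batching idea discussed in this thread, in plain Java with no HBase dependency. It models one row as a sorted map of qualifier to value and resolves a sorted set of wanted qualifiers in a single forward pass, which mirrors Lars's description of a get (start at the row key and scan forward) and is the reason one batched lookup of 300 columns should be cheaper than 300 independent ones. The names SortedRowScan and batchLookup are hypothetical and are not HBase APIs; real stores also deal with versions, timestamps and multiple store files, which this toy ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: a row is a sorted map of qualifier -> value.
// Nothing here is an HBase class or method.
class SortedRowScan {

    // Finds which wanted qualifiers exist in the row using one forward pass.
    // Both inputs must use natural String ordering, so the row cursor
    // only ever moves forward and never restarts from the beginning.
    static List<String> batchLookup(NavigableMap<String, String> row,
                                    SortedSet<String> wanted) {
        List<String> found = new ArrayList<>();
        Iterator<String> cols = row.navigableKeySet().iterator();
        String col = cols.hasNext() ? cols.next() : null;
        for (String q : wanted) {
            // Advance the single row cursor until it reaches or passes q.
            while (col != null && col.compareTo(q) < 0) {
                col = cols.hasNext() ? cols.next() : null;
            }
            if (col == null) {
                break; // row exhausted, no later qualifier can match
            }
            if (col.equals(q)) {
                found.add(q); // qualifier is present in the row
            }
        }
        return found;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("a", "1");
        row.put("c", "3");
        row.put("e", "5");
        SortedSet<String> wanted = new TreeSet<>(Arrays.asList("e", "b", "c"));
        System.out.println(batchLookup(row, wanted)); // prints [c, e]
    }
}
```

In the delete case from the thread, the qualifiers returned by such a pass would be the only ones needing a delete marker, so the 300 requested columns cost one pass over the row instead of 300 separate gets.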
