You mean the endpoint is getting executed with high QoS? Did you check with some logs?
-Anoop-
________________________________________
From: Varun Sharma [[email protected]]
Sent: Monday, February 11, 2013 4:05 AM
To: [email protected]; lars hofhansl
Subject: Re: Get on a row with multiple columns

Back to BulkDeleteEndpoint, i got it to work but why are the scanner.next()
calls executing on the Priority handler queue ?

Varun

On Sat, Feb 9, 2013 at 8:46 AM, lars hofhansl <[email protected]> wrote:

> The answer is "probably" :)
> It's disabled in 0.96 by default. Check out HBASE-7008
> (https://issues.apache.org/jira/browse/HBASE-7008) and the discussion
> there.
>
> Also check out the discussion in HBASE-5943 and HADOOP-8069
> (https://issues.apache.org/jira/browse/HADOOP-8069)
>
> -- Lars
>
> ________________________________
> From: Jean-Marc Spaggiari <[email protected]>
> To: [email protected]
> Sent: Saturday, February 9, 2013 5:02 AM
> Subject: Re: Get on a row with multiple columns
>
> Lars, should we always consider disabling Nagle? What's the down side?
>
> JM
>
> 2013/2/9, Varun Sharma <[email protected]>:
> > Yeah, I meant true...
> >
> > On Sat, Feb 9, 2013 at 12:17 AM, lars hofhansl <[email protected]> wrote:
> >
> >> Should be set to true. If tcpnodelay is set to true, Nagle's is
> >> disabled.
> >>
> >> -- Lars
> >>
> >> ________________________________
> >> From: Varun Sharma <[email protected]>
> >> To: [email protected]; lars hofhansl <[email protected]>
> >> Sent: Saturday, February 9, 2013 12:11 AM
> >> Subject: Re: Get on a row with multiple columns
> >>
> >> Okay I did my research - these need to be set to false. I agree.
> >>
> >> On Sat, Feb 9, 2013 at 12:05 AM, Varun Sharma <[email protected]> wrote:
> >>
> >> I have ipc.client.tcpnodelay, ipc.server.tcpnodelay set to false and the
> >> hbase one - [hbase].ipc.client.tcpnodelay set to true. Do these induce
> >> network latency ?
> >> >
> >> >On Fri, Feb 8, 2013 at 11:57 PM, lars hofhansl <[email protected]> wrote:
> >> >
> >> >Sorry.. I meant set these two config parameters to true (not false as I
> >> >state below).
> >> >>
> >> >>----- Original Message -----
> >> >>From: lars hofhansl <[email protected]>
> >> >>To: "[email protected]" <[email protected]>
> >> >>Cc:
> >> >>Sent: Friday, February 8, 2013 11:41 PM
> >> >>Subject: Re: Get on a row with multiple columns
> >> >>
> >> >>Only somewhat related. Seeing the magic 40ms random read time there.
> >> >>Did you disable Nagle's?
> >> >>(set hbase.ipc.client.tcpnodelay and ipc.server.tcpnodelay to false in
> >> >>hbase-site.xml).
> >> >>
> >> >>________________________________
> >> >>From: Varun Sharma <[email protected]>
> >> >>To: [email protected]; lars hofhansl <[email protected]>
> >> >>Sent: Friday, February 8, 2013 10:45 PM
> >> >>Subject: Re: Get on a row with multiple columns
> >> >>
> >> >>The use case is like your twitter feed. Tweets from people u follow. When
> >> >>someone unfollows, you need to delete a bunch of his tweets from the
> >> >>following feed. So, its frequent, and we are essentially running into some
> >> >>extreme corner cases like the one above. We need high write throughput for
> >> >>this, since when someone tweets, we need to fanout the tweet to all the
> >> >>followers. We need the ability to do fast deletes (unfollow) and fast adds
> >> >>(follow) and also be able to do fast random gets - when a real user loads
> >> >>the feed. I doubt we will able to play much with the schema here since we
> >> >>need to support a bunch of use cases.
> >> >>
> >> >>@lars: It does not take 30 seconds to place 300 delete markers. It takes 30
> >> >>seconds to first find which of those 300 pins are in the set of columns
> >> >>present - this invokes 300 gets and then place the appropriate delete
> >> >>markers. Note that we can have tens of thousands of columns in a single row
> >> >>so a single get is not cheap.
> >> >>
> >> >>If we were to just place delete markers, that is very fast. But when we
> >> >>started doing that, our random read performance suffered because of too
> >> >>many delete markers. The 90th percentile on random reads shot up from 40
> >> >>milliseconds to 150 milliseconds, which is not acceptable for our use case.
> >> >>
> >> >>Thanks
> >> >>Varun
> >> >>
> >> >>On Fri, Feb 8, 2013 at 10:33 PM, lars hofhansl <[email protected]> wrote:
> >> >>
> >> >>> Can you organize your columns and then delete by column family?
> >> >>>
> >> >>> deleteColumn without specifying a TS is expensive, since HBase first has
> >> >>> to figure out what the latest TS is.
> >> >>>
> >> >>> Should be better in 0.94.1 or later since deletes are batched like Puts
> >> >>> (still need to retrieve the latest version, though).
> >> >>>
> >> >>> In 0.94.3 or later you can also use the BulkDeleteEndpoint, which basically
> >> >>> lets you specify a scan condition and then place specific delete markers
> >> >>> for all KVs encountered.
> >> >>>
> >> >>> If you wanted to get really fancy, you could hook up a coprocessor to the
> >> >>> compaction process and simply filter all KVs you no longer want (without
> >> >>> ever placing any delete markers).
> >> >>>
> >> >>> Are you saying it takes 15 seconds to place 300 version delete markers?!
> >> >>>
> >> >>> -- Lars
> >> >>>
> >> >>> ________________________________
> >> >>> From: Varun Sharma <[email protected]>
> >> >>> To: [email protected]
> >> >>> Sent: Friday, February 8, 2013 10:05 PM
> >> >>> Subject: Re: Get on a row with multiple columns
> >> >>>
> >> >>> We are given a set of 300 columns to delete.
> >> >>> I tested two cases:
> >> >>>
> >> >>> 1) deleteColumns() - with the 's'
> >> >>>
> >> >>> This function simply adds delete markers for 300 columns. In our case,
> >> >>> typically only a fraction of these columns are actually present - 10. After
> >> >>> starting to use deleteColumns, we started seeing a drop in cluster-wide
> >> >>> random read performance - 90th percentile latency worsened, and so did the
> >> >>> 99th, probably because of having to traverse delete markers. I attribute
> >> >>> this to the profusion of delete markers in the cluster. Major compactions
> >> >>> slowed down by almost 50 percent, probably because of having to clean out
> >> >>> significantly more delete markers.
> >> >>>
> >> >>> 2) deleteColumn()
> >> >>>
> >> >>> Ended up with intolerable 15 second calls, which clogged all the handlers,
> >> >>> making the cluster pretty much unresponsive.
> >> >>>
> >> >>> On Fri, Feb 8, 2013 at 9:55 PM, Ted Yu <[email protected]> wrote:
> >> >>>
> >> >>> > For the 300 column deletes, can you show us how the Delete(s) are
> >> >>> > constructed ?
> >> >>> >
> >> >>> > Do you use this method ?
> >> >>> >
> >> >>> > public Delete deleteColumns(byte [] family, byte [] qualifier) {
> >> >>> >
> >> >>> > Thanks
> >> >>> >
> >> >>> > On Fri, Feb 8, 2013 at 9:44 PM, Varun Sharma <[email protected]> wrote:
> >> >>> >
> >> >>> > > So a Get call with multiple columns on a single row should be much faster
> >> >>> > > than independent Get(s) on each of those columns for that row. I am
> >> >>> > > basically seeing severely poor performance (~ 15 seconds) for certain
> >> >>> > > deleteColumn() calls and I am seeing that there is a
> >> >>> > > prepareDeleteTimestamps() function in HRegion.java which first tries to
> >> >>> > > locate the column by doing individual gets on each column you want to
> >> >>> > > delete (I am doing 300 column deletes).
> >> >>> > > Now, I think this should ideally be
> >> >>> > > 1 get call with the batch of 300 columns, so that one scan can retrieve
> >> >>> > > the columns, and only the columns that are found are deleted.
> >> >>> > >
> >> >>> > > Before I try this fix, I wanted to get an opinion if it will make a
> >> >>> > > difference to batch the get() and it seems from your answer, it should.
> >> >>> > >
> >> >>> > > On Fri, Feb 8, 2013 at 9:34 PM, lars hofhansl <[email protected]> wrote:
> >> >>> > >
> >> >>> > > > Everything is stored as a KeyValue in HBase.
> >> >>> > > > The Key part of a KeyValue contains the row key, column family, column
> >> >>> > > > name, and timestamp in that order.
> >> >>> > > > Each column family has its own store and store files.
> >> >>> > > >
> >> >>> > > > So in a nutshell a get is executed by starting a scan at the row key
> >> >>> > > > (which is a prefix of the key) in each store (CF) and then scanning
> >> >>> > > > forward in each store until the next row key is reached. (in reality it
> >> >>> > > > is a bit more complicated due to multiple versions, skipping columns, etc)
> >> >>> > > >
> >> >>> > > > -- Lars
> >> >>> > > > ________________________________
> >> >>> > > > From: Varun Sharma <[email protected]>
> >> >>> > > > To: [email protected]
> >> >>> > > > Sent: Friday, February 8, 2013 9:22 PM
> >> >>> > > > Subject: Re: Get on a row with multiple columns
> >> >>> > > >
> >> >>> > > > Sorry, I was a little unclear with my question.
> >> >>> > > >
> >> >>> > > > Let's say you have
> >> >>> > > >
> >> >>> > > > Get get = new Get(row);
> >> >>> > > > get.addColumn("1");
> >> >>> > > > get.addColumn("2");
> >> >>> > > > .
> >> >>> > > > .
> >> >>> > > > .
> >> >>> > > >
> >> >>> > > > When hbase internally executes the batch get, it will seek to column "1".
> >> >>> > > > Now, since data is lexicographically sorted, it does not need to seek from
> >> >>> > > > the beginning to get to "2"; it can continue seeking from there, since
> >> >>> > > > column "2" will always be after column "1". I want to know whether this is
> >> >>> > > > how a multicolumn get on a row works or not.
> >> >>> > > >
> >> >>> > > > Thanks
> >> >>> > > > Varun
> >> >>> > > >
> >> >>> > > > On Fri, Feb 8, 2013 at 9:08 PM, Marcos Ortiz <[email protected]> wrote:
> >> >>> > > >
> >> >>> > > > > Like Ishan said, a get gives an instance of the Result class.
> >> >>> > > > > All utility methods that you can use are:
> >> >>> > > > > byte[] getValue(byte[] family, byte[] qualifier)
> >> >>> > > > > byte[] value()
> >> >>> > > > > byte[] getRow()
> >> >>> > > > > int size()
> >> >>> > > > > boolean isEmpty()
> >> >>> > > > > KeyValue[] raw() # Like Ishan said, all data here is sorted
> >> >>> > > > > List<KeyValue> list()
> >> >>> > > > >
> >> >>> > > > > On 02/08/2013 11:29 PM, Ishan Chhabra wrote:
> >> >>> > > > >
> >> >>> > > > >> Based on what I read in Lars' book, a get will return a Result,
> >> >>> > > > >> which is internally a KeyValue[]. This KeyValue[] is sorted by the key,
> >> >>> > > > >> and you access this array using the raw or list methods on the Result
> >> >>> > > > >> object.
> >> >>> > > > >>
> >> >>> > > > >> On Fri, Feb 8, 2013 at 5:40 PM, Varun Sharma <[email protected]> wrote:
> >> >>> > > > >>
> >> >>> > > > >>> +user
> >> >>> > > > >>>
> >> >>> > > > >>> On Fri, Feb 8, 2013 at 5:38 PM, Varun Sharma <[email protected]> wrote:
> >> >>> > > > >>>
> >> >>> > > > >>>> Hi,
> >> >>> > > > >>>>
> >> >>> > > > >>>> When I do a Get on a row with multiple column qualifiers, do we sort the
> >> >>> > > > >>>> column qualifiers and make use of the sorted order when we get the
> >> >>> > > > >>>> results ?
> >> >>> > > > >>>>
> >> >>> > > > >>>> Thanks
> >> >>> > > > >>>> Varun
> >> >>> > > > >
> >> >>> > > > > --
> >> >>> > > > > Marcos Ortiz Valmaseda,
> >> >>> > > > > Product Manager && Data Scientist at UCI
> >> >>> > > > > Blog: http://marcosluis2186.posterous.com
> >> >>> > > > > Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
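
The point lars makes earlier in the thread, that a get is a forward scan over KeyValues sorted by row key, column family, column name, and timestamp, can be illustrated with a toy model in plain Java. This is only a sketch under stated assumptions: SortedRowSketch, getColumns, and cellsVisited are hypothetical names and not HBase classes or methods, and the model walks cells linearly, ignoring versions, delete markers, block indexes, and the column skipping lars mentions. It compares one batched lookup of several qualifiers with separate lookups that each restart at the beginning of the row.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of one row in one column family (hypothetical, not an HBase class).
final class SortedRowSketch {
    // Qualifier -> value, kept in lexicographic order like KeyValues in a store.
    private final TreeMap<String, String> cells = new TreeMap<>();
    // Rough cost measure: how many cells the lookups have walked over so far.
    private long cellsVisited = 0;

    void put(String qualifier, String value) {
        cells.put(qualifier, value);
    }

    long cellsVisited() {
        return cellsVisited;
    }

    // One forward pass over the row: the wanted qualifiers are sorted first,
    // so the walk never has to go back to the start of the row.
    Map<String, String> getColumns(Collection<String> wanted) {
        Map<String, String> found = new TreeMap<>();
        Iterator<String> want = new TreeSet<>(wanted).iterator();
        if (!want.hasNext()) {
            return found;
        }
        String target = want.next();
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            cellsVisited++;
            // Wanted qualifiers that sort before this cell are absent from the row.
            while (target.compareTo(cell.getKey()) < 0) {
                if (!want.hasNext()) {
                    return found;
                }
                target = want.next();
            }
            if (target.equals(cell.getKey())) {
                found.put(cell.getKey(), cell.getValue());
                if (!want.hasNext()) {
                    return found;
                }
                target = want.next();
            }
        }
        return found;
    }

    public static void main(String[] args) {
        SortedRowSketch batched = new SortedRowSketch();
        SortedRowSketch oneByOne = new SortedRowSketch();
        for (String q : Arrays.asList("a", "b", "c", "d")) {
            batched.put(q, "v-" + q);
            oneByOne.put(q, "v-" + q);
        }
        List<String> wanted = Arrays.asList("b", "d", "x");

        // Single lookup carrying all wanted qualifiers.
        System.out.println("batched result: " + batched.getColumns(wanted));
        // One lookup per qualifier, each restarting at the first cell of the row.
        for (String q : wanted) {
            oneByOne.getColumns(Arrays.asList(q));
        }
        System.out.println("cells visited, batched: " + batched.cellsVisited());
        System.out.println("cells visited, one get per column: " + oneByOne.cellsVisited());
    }
}
```

In this model the batched call touches each cell at most once, while the per-column calls together walk the leading cells of the row again and again. That is the shape of the cost Varun describes for 300 individual gets on a row with tens of thousands of columns, although real HBase seeks are cheaper than this linear walk suggests.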
