I should follow up by saying that I was asking why he was using an HTablePool, not saying that it was wrong.
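The pattern suggested further down in the thread — a single HTable per RegionObserver, created in start(), closed in stop(), and re-instantiated inside a try/catch if a write fails — can be sketched roughly as follows. Note this is only a sketch of the shape of the pattern: `IndexTable`, `TableFactory`, and `RetryingIndexWriter` are made-up stand-in types, not HBase API; in real code `IndexTable` would be an `HTableInterface` for the index table.

```java
// Stand-ins for HBase classes, so the pattern compiles on its own.
interface IndexTable {
    void increment(String row);   // stands in for HTableInterface.increment(Increment)
    void close();
}

interface TableFactory {
    IndexTable create();          // stands in for new HTable(conf, "doc_idx")
}

class RetryingIndexWriter {
    private final TableFactory factory;
    private IndexTable table;     // one long-lived handle, as created in start()

    RetryingIndexWriter(TableFactory factory) {
        this.factory = factory;
        this.table = factory.create();
    }

    // Surround the use in try/catch; on failure, re-instantiate the
    // connection and retry the write once.
    void increment(String row) {
        try {
            table.increment(row);
        } catch (RuntimeException e) {
            table.close();
            table = factory.create();
            table.increment(row);
        }
    }

    void stop() {                 // as done in the coprocessor's stop()
        table.close();
    }
}
```

The point of the pattern is that the table handle is created once and reused across calls, rather than checked out of a pool and closed on every postCheckAndPut().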
Still, I think in the pool the writes shouldn't have to go to the WAL.

On Feb 19, 2013, at 10:01 AM, Michael Segel <[email protected]> wrote:

> Good question..
>
> You create a class MyRO.
>
> How many instances of MyRO exist per RS?
>
> How many queries can access the instance MyRO at the same time?
>
> On Feb 19, 2013, at 9:15 AM, Wei Tan <[email protected]> wrote:
>
>> A side question: if HTablePool is not encouraged to be used... how do we
>> handle thread safety in using HTable? Is any replacement for HTablePool
>> planned?
>> Thanks,
>>
>> Best Regards,
>> Wei
>>
>> From: Michel Segel <[email protected]>
>> To: "[email protected]" <[email protected]>
>> Date: 02/18/2013 09:23 AM
>> Subject: Re: coprocessor enabled put very slow, help please~~~
>>
>> Why are you using an HTablePool?
>> Why are you closing the table after each iteration through?
>>
>> Try using one HTable object. Turn off the WAL.
>> Instantiate it in start().
>> Close it in stop().
>> Surround the use in a try/catch.
>> If an exception is caught, re-instantiate a new HTable connection.
>>
>> You may want to flush the connection after puts.
>>
>> Again, I am not sure why you are using checkAndPut on the base table. Your
>> count could be off.
>>
>> As an example, look at the poem/rhyme 'Mary had a little lamb'.
>> Then check your word count.
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Feb 18, 2013, at 7:21 AM, prakash kadel <[email protected]> wrote:
>>
>>> Thank you guys for your replies.
>>> Michael,
>>> I think I didn't make it clear. Here is my use case:
>>>
>>> I have text documents to insert into HBase (with possible duplicates).
>>> Suppose I have a document such as: "I am working. He is not working"
>>>
>>> I want to insert this document into a table in HBase, say table "doc".
>>>
>>> =doc table=
>>> -----------
>>> rowKey: doc_id
>>> cf: doc_content
>>> value: "I am working. He is not working"
>>>
>>> Now, I want to create another table that stores the word counts, say "doc_idx".
>>>
>>> =doc_idx table=
>>> ---------------
>>> rowKey: I,       cf: count, value: 1
>>> rowKey: am,      cf: count, value: 1
>>> rowKey: working, cf: count, value: 2
>>> rowKey: He,      cf: count, value: 1
>>> rowKey: is,      cf: count, value: 1
>>> rowKey: not,     cf: count, value: 1
>>>
>>> My MR job code:
>>> ===============
>>>
>>> if (doc.checkAndPut(rowKey, doc_content, "", null, putDoc)) {
>>>     for (String word : doc_content.split("\\s+")) {
>>>         Increment inc = new Increment(Bytes.toBytes(word));
>>>         inc.addColumn("count", "", 1);
>>>     }
>>> }
>>>
>>> Now, I wanted to do some experiments with coprocessors, so I modified
>>> the code as follows.
>>>
>>> My MR job code:
>>> ===============
>>>
>>> doc.checkAndPut(rowKey, doc_content, "", null, putDoc);
>>>
>>> Coprocessor code:
>>> =================
>>>
>>> public void start(CoprocessorEnvironment env) {
>>>     pool = new HTablePool(conf, 100);
>>> }
>>>
>>> public boolean postCheckAndPut(c, row, family, byte[] qualifier,
>>>         compareOp, comparator, put, result) {
>>>
>>>     if (!result) return true; // check if the put succeeded
>>>
>>>     HTableInterface table_idx = pool.getTable("doc_idx");
>>>
>>>     try {
>>>         for (KeyValue contentKV : put.get("doc_content", "")) {
>>>             for (String word :
>>>                     Bytes.toString(contentKV.getValue()).split("\\s+")) {
>>>                 Increment inc = new Increment(Bytes.toBytes(word));
>>>                 inc.addColumn("count", "", 1);
>>>                 table_idx.increment(inc);
>>>             }
>>>         }
>>>     } finally {
>>>         table_idx.close();
>>>     }
>>>     return true;
>>> }
>>>
>>> public void stop(env) {
>>>     pool.close();
>>> }
>>>
>>> I am a newbie to HBase. I am not sure this is the right way to do it.
>>> Given that, why is the coprocessor-enabled version much slower than
>>> the one without?
>>>
>>> Sincerely,
>>> Prakash Kadel
>>>
>>> On Mon, Feb 18, 2013 at 9:11 PM, Michael Segel
>>> <[email protected]> wrote:
>>>>
>>>> The issue I was talking about was the use of a check and put.
>>>> The OP wrote:
>>>>>>>> each map inserts to doc table. (checkAndPut)
>>>>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows to
>>>>>>>> an index table.
>>>>
>>>> My question is why does the OP use a checkAndPut, and the
>>>> RegionObserver's postCheckAndPut?
>>>>
>>>> Here's a good example...
>>>> http://stackoverflow.com/questions/13404447/is-hbase-checkandput-latency-higher-than-simple-put
>>>>
>>>> The OP doesn't really get into the use case, so we don't know why the
>>>> checkAndPut is in the M/R job.
>>>> He should just be using put() and then a postPut().
>>>>
>>>> Another issue... since he's writing to a different HTable... how? Does
>>>> he create an HTable instance in the start() method of his RO object and
>>>> then reference it later? Or does he create the instance of the HTable on
>>>> the fly in each postCheckAndPut()?
>>>> Without seeing his code, we don't know.
>>>>
>>>> Note that this is a synchronous set of writes. Your overall return from
>>>> the M/R call to put will wait until the second row is inserted.
>>>>
>>>> Interestingly enough, you may want to consider disabling the WAL on the
>>>> write to the index. You can always run an M/R job that rebuilds the index
>>>> should something occur to the system where you might lose the data.
>>>> Indexes *ARE* expendable. ;-)
>>>>
>>>> Does that explain it?
>>>>
>>>> -Mike
>>>>
>>>> On Feb 18, 2013, at 4:57 AM, yonghu <[email protected]> wrote:
>>>>
>>>>> Hi, Michael
>>>>>
>>>>> I don't quite understand what you mean by "round trip back to the
>>>>> client". In my understanding, as the RegionServer and TaskTracker can
>>>>> be the same node, MR doesn't have to pull data into the client and then
>>>>> process it. You also mention "unnecessary overhead"; can you
>>>>> explain a little bit which operations or data processing can be seen as
>>>>> "unnecessary overhead"?
>>>>>
>>>>> Thanks
>>>>>
>>>>> yong
>>>>>
>>>>> On Mon, Feb 18, 2013 at 10:35 AM, Michael Segel
>>>>> <[email protected]> wrote:
>>>>>> Why?
>>>>>>
>>>>>> This seems like an unnecessary overhead.
>>>>>>
>>>>>> You are writing code within the coprocessor on the server.
>>>>>> Pessimistic code really isn't recommended if you are worried about
>>>>>> performance.
>>>>>>
>>>>>> I have to ask... by the time you have executed the code in your
>>>>>> co-processor, what would cause the initial write to fail?
>>>>>>
>>>>>> On Feb 18, 2013, at 3:01 AM, Prakash Kadel <[email protected]> wrote:
>>>>>>
>>>>>>> It's a local read. I just check the last param of postCheckAndPut,
>>>>>>> indicating whether the Put succeeded. In case the put succeeds, I
>>>>>>> insert a row in another table.
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Prakash Kadel
>>>>>>>
>>>>>>> On Feb 18, 2013, at 2:52 PM, Wei Tan <[email protected]> wrote:
>>>>>>>
>>>>>>>> Is your checkAndPut involving a local or remote READ? Due to the
>>>>>>>> nature of LSM, a read is much slower compared to a write...
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Wei
>>>>>>>>
>>>>>>>> From: Prakash Kadel <[email protected]>
>>>>>>>> To: "[email protected]" <[email protected]>
>>>>>>>> Date: 02/17/2013 07:49 PM
>>>>>>>> Subject: coprocessor enabled put very slow, help please~~~
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I am trying to insert a few million documents into HBase with
>>>>>>>> MapReduce. To enable quick search of the docs I want to have some
>>>>>>>> indexes, so I tried to use coprocessors, but they are slowing down
>>>>>>>> my inserts. Aren't coprocessors supposed to not increase the latency?
>>>>>>>> My settings:
>>>>>>>> 3 region servers
>>>>>>>> 60 maps
>>>>>>>> Each map inserts to the doc table (checkAndPut).
>>>>>>>> A RegionObserver coprocessor does a postCheckAndPut and inserts some
>>>>>>>> rows to an index table.
>>>>>>>>
>>>>>>>> Sincerely,
>>>>>>>> Prakash
>>>>>>
>>>>>> Michael Segel | (m) 312.755.9623
>>>>>>
>>>>>> Segel and Associates
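Mike's "Mary had a little lamb / check your word count" warning is easy to reproduce with the document from the thread: a plain split("\\s+"), as used in both the MR job and the coprocessor above, keeps punctuation attached to tokens, so "working." and "working" become two different index row keys and the doc_idx table's expected count of 2 for "working" is never reached. A minimal, self-contained check (plain Java, no HBase):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Word counts as the code above would compute them: split on whitespace only.
// Punctuation stays attached, so "working." and "working" are distinct keys.
class WordCount {
    static Map<String, Integer> count(String doc) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : doc.split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {I=1, am=1, working.=1, He=1, is=1, not=1, working=1}
        System.out.println(count("I am working. He is not working"));
    }
}
```

Normalizing tokens (stripping punctuation, and possibly lowercasing) before building the Increment row keys would be needed to get the counts the doc_idx table expects.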
