OK, now I get your point. I hadn't noticed the "checkAndPut".

regards!

Yong

On Mon, Feb 18, 2013 at 1:11 PM, Michael Segel
<michael_se...@hotmail.com> wrote:
>
> The issue I was talking about was the use of a check and put.
> The OP wrote:
>>>>> each map inserts to doc table.(checkAndPut)
>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows to
>>>>> a index table.
>
> My question is why the OP uses a checkAndPut and the RegionObserver's
> postCheckAndPut.
>
>
> Here's a good example... 
> http://stackoverflow.com/questions/13404447/is-hbase-checkandput-latency-higher-than-simple-put
>
> The OP doesn't really go into the use case, so we don't know why the
> checkAndPut is used in the M/R job.
> He should just be using put() and then a postPut().
>
> Another issue... since he's writing to a different HTable... how? Does he
> create an HTable instance in the start() method of his RO object and then
> reference it later? Or does he create the instance of the HTable on the fly
> in each postCheckAndPut()?
> Without seeing his code, we don't know.
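
The two options in that question can be compared with a toy model. This is only a sketch: ToyHandle, ReusingObserver and PerCallObserver are hypothetical stand-ins, not HBase classes, and the sketch assumes that building a table handle is the costly part. It just counts how many handles each approach creates.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy stand-in for a table handle; construction is ASSUMED to be costly.
// This is not the HBase HTable class.
class ToyHandle {
    static final AtomicInteger CREATED = new AtomicInteger();

    ToyHandle() {
        CREATED.incrementAndGet();
    }

    void write(String row) {
        // No-op in this sketch.
    }
}

// Option A: create the handle once in start() and reuse it in the hook.
class ReusingObserver {
    private ToyHandle index;

    void start() {
        index = new ToyHandle();
    }

    void postPut(String row) {
        index.write(row);
    }
}

// Option B: create a new handle on every hook invocation.
class PerCallObserver {
    void postPut(String row) {
        new ToyHandle().write(row);
    }
}

class HandleDemo {
    public static void main(String[] args) {
        ReusingObserver a = new ReusingObserver();
        a.start();
        for (int i = 0; i < 1000; i++) {
            a.postPut("row" + i);
        }
        System.out.println("reusing:  " + ToyHandle.CREATED.getAndSet(0) + " handle(s)");

        PerCallObserver b = new PerCallObserver();
        for (int i = 0; i < 1000; i++) {
            b.postPut("row" + i);
        }
        System.out.println("per call: " + ToyHandle.CREATED.getAndSet(0) + " handle(s)");
    }
}
```

One caveat for the real thing: HTable is not thread-safe, so a single instance shared across concurrent hook calls needs care.
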
>
> Note that this is a synchronous set of writes. The overall return from the M/R
> call to put() will wait until the second row is inserted.
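
A toy illustration of that point, assuming nothing about the OP's actual code: ToyTable and its hook are hypothetical stand-ins (not the HBase API), and the 20 ms index write cost is made up. Because the hook runs inline, the caller's put() cannot return before the index write has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy stand-in for a table with a post-write hook. Not the HBase API.
class ToyTable {
    final List<String> rows = new ArrayList<>();
    private final Consumer<String> postPutHook;

    ToyTable(Consumer<String> postPutHook) {
        this.postPutHook = postPutHook;
    }

    // The hook runs inline, so put() returns only after the hook has finished.
    void put(String row) {
        rows.add(row);
        postPutHook.accept(row);
    }
}

class SyncHookDemo {
    // Hypothetical cost of one index write, in milliseconds.
    static final long INDEX_WRITE_MILLIS = 20;

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns the elapsed milliseconds for a single put into the doc table.
    static long timedPut(ToyTable doc, String row) {
        long start = System.nanoTime();
        doc.put(row);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        ToyTable index = new ToyTable(r -> { });
        ToyTable doc = new ToyTable(r -> {
            sleep(INDEX_WRITE_MILLIS);   // simulated index write latency
            index.rows.add("idx-" + r);
        });
        long elapsed = timedPut(doc, "doc1");
        System.out.println("put took ~" + elapsed + " ms; index rows: " + index.rows);
    }
}
```

With 60 mappers each waiting on both writes per document, that extra latency is paid on every insert.
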
>
> Interestingly enough, you may want to consider disabling the WAL on the write
> to the index.  You can always run an M/R job that rebuilds the index should
> something happen to the system that loses the data.  Indexes *ARE*
> expendable. ;-)
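
Why the index is expendable can be shown with a small sketch: everything in it is derivable from the doc table. The index layout here (term to row keys) is an assumption for illustration only, since the thread never says what the OP's index looks like, and plain in-memory maps stand in for the tables.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class IndexRebuild {
    // Rebuilds a term -> row-key index purely from the primary doc table.
    // The term-based layout is a hypothetical schema, not the OP's.
    static Map<String, Set<String>> rebuild(Map<String, String> docTable) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docTable.entrySet()) {
            for (String term : doc.getValue().toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;   // skip the empty token produced by leading whitespace
                }
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "hbase coprocessor");
        docs.put("doc2", "hbase index");
        System.out.println(rebuild(docs));
    }
}
```

If the index table is lost, rerunning the rebuild over the doc table restores it, which is what makes skipping the WAL on index writes a tolerable risk.
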
>
> Does that explain it?
>
> -Mike
>
> On Feb 18, 2013, at 4:57 AM, yonghu <yongyong...@gmail.com> wrote:
>
>> Hi, Michael
>>
>> I don't quite understand what you mean by "round trip back to the
>> client". In my understanding, as the RegionServer and TaskTracker can
>> be the same node, MR doesn't have to pull data into the client and then
>> process it.  You also mention "unnecessary overhead"; can you
>> explain a little which operations or data processing count as
>> "unnecessary overhead"?
>>
>> Thanks
>>
>> yong
>> On Mon, Feb 18, 2013 at 10:35 AM, Michael Segel
>> <michael_se...@hotmail.com> wrote:
>>> Why?
>>>
>>> This seems like an unnecessary overhead.
>>>
>>> You are writing code within the coprocessor on the server.  Pessimistic 
>>> code really isn't recommended if you are worried about performance.
>>>
>>> I have to ask... by the time you have executed the code in your 
>>> co-processor, what would cause the initial write to fail?
>>>
>>>
>>> On Feb 18, 2013, at 3:01 AM, Prakash Kadel <prakash.ka...@gmail.com> wrote:
>>>
>>>> It's a local read. I just check the last param of postCheckAndPut,
>>>> which indicates whether the Put succeeded. If the put succeeded, I insert
>>>> a row in another table.
>>>>
>>>> Sincerely,
>>>> Prakash Kadel
>>>>
>>>> On Feb 18, 2013, at 2:52 PM, Wei Tan <w...@us.ibm.com> wrote:
>>>>
>>>>> Does your checkAndPut involve a local or a remote READ? Due to the nature
>>>>> of LSM, a read is much slower than a write...
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Wei
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> From:   Prakash Kadel <prakash.ka...@gmail.com>
>>>>> To:     "user@hbase.apache.org" <user@hbase.apache.org>,
>>>>> Date:   02/17/2013 07:49 PM
>>>>> Subject:        coprocessor enabled put very slow, help please~~~
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>> I am trying to insert a few million documents into HBase with MapReduce.
>>>>> To enable quick search of docs I want to have some indexes, so I tried to
>>>>> use coprocessors, but they are slowing down my inserts. Aren't
>>>>> coprocessors supposed not to increase the latency?
>>>>> my settings:
>>>>>  3 region servers
>>>>> 60 maps
>>>>> each map inserts to doc table.(checkAndPut)
>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows to
>>>>> a index table.
>>>>>
>>>>>
>>>>> Sincerely,
>>>>> Prakash
>>>>>
>>>>
>>>
>>> Michael Segel  | (m) 312.755.9623
>>>
>>> Segel and Associates
>>>
>>>
>>
>
