You might try HTable.checkAndPut, which checks a cell's current value and
applies the Put atomically:
http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#checkAndPut(byte[], byte[], byte[], byte[], org.apache.hadoop.hbase.client.Put)
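
A minimal sketch of how a check-and-put could replace the lock in the
insert path, written in the same Python-like style as the pseudo-code quoted
below. FakeTable is a hypothetical in-memory stand-in, not the HBase client:
its check_and_put and increment methods only imitate the semantics of
checkAndPut and incrementColumnValue, and get_or_insert is an illustrative
name. It assumes that an expected value of None means "the cell does not
exist yet".

```python
import threading


class FakeTable:
    """Hypothetical in-memory stand-in for an HBase table (not the real client)."""

    def __init__(self):
        self._cells = {}
        # One lock models the atomicity the server provides for each call.
        self._lock = threading.Lock()

    def get(self, row):
        with self._lock:
            return self._cells.get(row)

    def increment(self, row, amount=1):
        # Imitates incrementColumnValue: atomic add, returns the new value.
        with self._lock:
            value = self._cells.get(row, 0) + amount
            self._cells[row] = value
            return value

    def check_and_put(self, row, expected, new_value):
        # Imitates checkAndPut: write only if the cell still holds `expected`.
        with self._lock:
            if self._cells.get(row) != expected:
                return False
            self._cells[row] = new_value
            return True


def get_or_insert(table, word):
    """Return the long mapped to `word`, creating the mapping if needed."""
    existing = table.get(word)
    if existing is not None:
        return existing
    # Reserve a fresh id, then write it only if nobody mapped the word first.
    candidate = table.increment("_counter")
    if table.check_and_put(word, None, candidate):
        return candidate
    # Lost the race: the reserved id is simply skipped; use the winner's value.
    return table.get(word)
```

With this shape no explicit lock is held across calls. The cost of a lost
race is one unused counter value, so the ids stay unique but may have gaps.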

St.Ack

On Mon, Nov 29, 2010 at 10:03 AM, Claudio Martella
<[email protected]> wrote:
> Hi Lars,
>
> Thanks for your answer. Yes, I read the Percolator paper, but I'd like to
> solve my problem with an existing software solution, and I like HBase.
> The ephemeral node is, I think, the last solution I proposed, the one I
> called ZKsafe_insert(). Or did you mean something different?
>
> On 11/29/10 6:35 PM, Lars George wrote:
>> Hi Claudio,
>>
>> Did you have a look at Google's Percolator paper? I think a mechanism like
>> this may work. Another option often used to implement distributed
>> transactions is ZooKeeper: you create an ephemeral node for the new word,
>> and the host that succeeds in creating it adds the word and then releases
>> the lock. Or some such.
>>
>> Lars
>>
>> On Nov 29, 2010, at 16:12, Claudio Martella <[email protected]> 
>> wrote:
>>
>>> Hello list,
>>>
>>> I'm kind of new to HBase, so I'll post this email as a request for
>>> comments.
>>> Very briefly, I do a lot of text processing with MapReduce, so it's very
>>> useful for me to convert strings to longs to make my computations
>>> faster.
>>>
>>> My corpus keeps growing and I want this String->Long mapping to be
>>> persistent and dynamic (I want to add new mappings when I find new words).
>>> At the moment I'm tackling the problem this way (pseudo-code):
>>>
>>> longvalue = convert(word) # gets from hbase
>>> if longvalue == -1:
>>>    longvalue = insert(word) # puts in hbase
>>>
>>> longvalue now contains the mapped value. This approach requires a
>>> global counter that stores the latest mapped long and is incremented on
>>> every insert. I can easily do this in two ways: with a special row in
>>> HBase, "_counter", that I increment through incrementColumnValue, or by
>>> creating a sequential non-ephemeral znode in ZooKeeper and using its
>>> version as my counter. The first one is of course faster. So the solution
>>> would be:
>>>
>>> insert(word):
>>>    longvalue = hbase.incrementColumnValue("_counter", "v")
>>>    hbase.put(word, longvalue)
>>>    return longvalue
>>>
>>> The problem is that between the time I realize there's no mapping for my
>>> word and the time I insert the new longvalue, somebody else might have
>>> inserted a mapping for the same word, so I end up with a corrupted
>>> dictionary.
>>>
>>> One possible solution would be to acquire a lock on the "_counter" row,
>>> recheck for the presence of the mapping and then insert my new value:
>>>
>>> safe_insert(word):
>>>    lock("_counter")
>>>    longvalue = convert(word)
>>>    if longvalue == -1: #nobody inserted the mapping in the meantime
>>>        longvalue = insert(word)
>>>    unlock("_counter")
>>>    return longvalue
>>>
>>> This way the counter row, with its lock, would behave as a global lock.
>>> This would solve my problem but would create a bottleneck (although
>>> over time my inserts tend to become very rare as the dictionary grows). A
>>> solution to this would be to have per-word locks in ZooKeeper:
>>>
>>> ZKsafe_insert(word):
>>>    ZKlock("/words/"+ word)
>>>    longvalue = convert(word)
>>>    if longvalue == -1: #nobody inserted the mapping in the meantime
>>>        longvalue = insert(word)
>>>    ZKunlock("/words/"+word)
>>>    return longvalue
>>>
>>> This of course would give me more fine-grained locks and better
>>> scalability, but I'd rely on a system with higher latency (ZK).
>>>
>>> Does anybody have a better solution with HBase? I guess using the
>>> HBase transactional package would also be a possibility, but again, what
>>> about speed and the current issues with the package (like recovering in
>>> the face of HRegion failure)?
>>>
>>>
>>> Thank you,
>>>
>>> Claudio
>>>
>>> --
>>> Claudio Martella
>>> Digital Technologies
>>> Unit Research & Development - Analyst
>>>
>>> TIS innovation park
>>> Via Siemens 19 | Siemensstr. 19
>>> 39100 Bolzano | 39100 Bozen
>>> Tel. +39 0471 068 123
>>> Fax  +39 0471 068 129
>>> [email protected] http://www.tis.bz.it
>>>
>>> Short information regarding use of personal data. According to Section 13 
>>> of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that 
>>> we process your personal data in order to fulfil contractual and fiscal 
>>> obligations and also to send you information regarding our services and 
>>> events. Your personal data are processed with and without electronic means 
>>> and by respecting data subjects' rights, fundamental freedoms and dignity, 
>>> particularly with regard to confidentiality, personal identity and the 
>>> right to personal data protection. At any time and without formalities you 
>>> can write an e-mail to [email protected] in order to object the processing 
>>> of your personal data for the purpose of sending advertising materials and 
>>> also to exercise the right to access personal data and other rights 
>>> referred to in Section 7 of Decree 196/2003. The data controller is TIS 
>>> Techno Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find 
>>> the complete information on the web site www.tis.bz.it.
>>>
>>>
