You might try http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/client/HTable.html#checkAndPut(byte[], byte[], byte[], byte[], org.apache.hadoop.hbase.client.Put)
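A rough sketch of how checkAndPut could stand in for the locks discussed in the quoted thread: take a candidate id from the counter, then write it only if the word still has no mapping. The code below does not use the real HBase client. `InMemoryTable` and its methods are hypothetical stand-ins that only mimic the atomic behaviour of incrementColumnValue and checkAndPut, and the assumption that an expected value of "absent" (None here) is accepted should be checked against the 0.20.6 javadoc linked above.

```python
import threading


class InMemoryTable:
    """Hypothetical stand-in for an HBase table; NOT the real client API."""

    def __init__(self):
        self._cells = {}                # (row, column) -> value
        self._lock = threading.Lock()   # models server-side atomicity

    def get(self, row, column):
        with self._lock:
            return self._cells.get((row, column))

    def increment_column_value(self, row, column, amount=1):
        # Atomic read-modify-write, in the spirit of incrementColumnValue.
        with self._lock:
            new_value = self._cells.get((row, column), 0) + amount
            self._cells[(row, column)] = new_value
            return new_value

    def check_and_put(self, row, column, expected, value):
        # Atomic compare-and-set, in the spirit of checkAndPut: write only if
        # the current cell equals `expected` (None is assumed to mean "absent").
        with self._lock:
            if self._cells.get((row, column)) != expected:
                return False
            self._cells[(row, column)] = value
            return True


def safe_insert(table, word):
    """Return the id mapped to `word`, allocating one if none exists yet."""
    existing = table.get(word, "v")
    if existing is not None:
        return existing
    candidate = table.increment_column_value("_counter", "v")
    if table.check_and_put(word, "v", None, candidate):
        return candidate
    # Lost the race: another writer mapped the word first. The candidate id
    # is left unused, so the id sequence may contain gaps.
    return table.get(word, "v")
```

The trade-off in this sketch is that a losing writer burns one counter value, so ids are unique but not dense. In exchange there is no global lock and no extra round trip to ZooKeeper.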
St.Ack

On Mon, Nov 29, 2010 at 10:03 AM, Claudio Martella <[email protected]> wrote:
> Hi Lars,
>
> thanks for your answer. Yes, I read the Percolator paper, but I'd like to
> get my problem solved with an existing software solution, and I like HBase.
> The ephemeral node is, I think, the last solution I proposed, the one I
> called ZKsafe_insert(). Or?
>
> On 11/29/10 6:35 PM, Lars George wrote:
>> Hi Claudio,
>>
>> Did you have a look at Google's Percolator paper? I think a mechanism like
>> this may work. Another option often used to implement distributed
>> transactions is using ZooKeeper, where you could create an ephemeral node on
>> the new word; the host that succeeds in doing so adds the word and then
>> releases the lock. Or some such.
>>
>> Lars
>>
>> On Nov 29, 2010, at 16:12, Claudio Martella <[email protected]>
>> wrote:
>>
>>> Hello list,
>>>
>>> I'm kind of new to HBase, so I'll post this email with a request for
>>> comment.
>>> Very briefly, I do a lot of text processing with MapReduce, so it's very
>>> useful for me to convert strings to longs, so I can make my computations
>>> faster.
>>>
>>> My corpus keeps on growing and I want this String->Long mapping to be
>>> persistent and dynamic (I want to add new mappings when I find new words).
>>> At the moment I'm tackling the problem this way (pseudo-code):
>>>
>>> longvalue = convert(word) # gets from hbase
>>> if longvalue == -1:
>>>     longvalue = insert(word) # puts in hbase
>>>
>>> longvalue now contains the new mapped value. This approach requires a
>>> global counter that saves the latest mapped long and increments at every
>>> insert. I can easily do this in two ways: a special row in HBase,
>>> "_counter", that I increment through incrementColumnValue, or a sequential
>>> non-ephemeral znode in ZooKeeper whose version I use as my counter. The
>>> first one is of course faster.
>>> So the solution would be:
>>>
>>> insert(word):
>>>     longvalue = hbase.incrementColumnValue("_counter", "v")
>>>     hbase.put(word, longvalue)
>>>     return longvalue
>>>
>>> The problem is that between the time I realize there's no mapping for my
>>> word and the time I insert the new longvalue, somebody else might have
>>> done the same, so I end up with a corrupted dictionary.
>>>
>>> One possible solution would be to acquire a lock on the "_counter" row,
>>> recheck for the presence of the mapping and then insert my new value:
>>>
>>> safe_insert(word):
>>>     lock("_counter")
>>>     longvalue = convert(word)
>>>     if longvalue == -1: # nobody inserted the mapping in the meantime
>>>         longvalue = insert(word)
>>>     unlock("_counter")
>>>     return longvalue
>>>
>>> This way the counter row, with its lock, would behave as a global lock.
>>> This would solve my problem but would create a bottleneck (although
>>> with time my inserts tend to get very rare as the dictionary grows). A
>>> solution to this problem would be to have per-word locks in ZooKeeper.
>>>
>>> ZKsafe_insert(word):
>>>     ZKlock("/words/" + word)
>>>     longvalue = convert(word)
>>>     if longvalue == -1: # nobody inserted the mapping in the meantime
>>>         longvalue = insert(word)
>>>     ZKunlock("/words/" + word)
>>>     return longvalue
>>>
>>> This of course would allow me to have more fine-grained locks and better
>>> scalability, but I'd rely on a system with higher latency (ZK).
>>>
>>> Does anybody have a better solution with HBase? I guess using
>>> hbase_transational would also be a possibility, but again, what about
>>> speed and the actual issues with the package (like recovering in the
>>> face of hregion failure)?
>>>
>>>
>>> Thank you,
>>>
>>> Claudio
>>>
>>> --
>>> Claudio Martella
>>> Digital Technologies
>>> Unit Research & Development - Analyst
>>>
>>> TIS innovation park
>>> Via Siemens 19 | Siemensstr. 19
>>> 39100 Bolzano | 39100 Bozen
>>> Tel. +39 0471 068 123
>>> Fax +39 0471 068 129
>>> [email protected] http://www.tis.bz.it
>>>
>>> Short information regarding use of personal data. According to Section 13
>>> of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that
>>> we process your personal data in order to fulfil contractual and fiscal
>>> obligations and also to send you information regarding our services and
>>> events. Your personal data are processed with and without electronic means
>>> and by respecting data subjects' rights, fundamental freedoms and dignity,
>>> particularly with regard to confidentiality, personal identity and the
>>> right to personal data protection. At any time and without formalities you
>>> can write an e-mail to [email protected] in order to object the processing
>>> of your personal data for the purpose of sending advertising materials and
>>> also to exercise the right to access personal data and other rights
>>> referred to in Section 7 of Decree 196/2003. The data controller is TIS
>>> Techno Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find
>>> the complete information on the web site www.tis.bz.it.
