Re: Is HBase Thread-Safety?
NNever, thanks so much for your answers!

On Fri, Apr 13, 2012 at 10:50 AM, NNever nnever...@gmail.com wrote:

1. A per-row lock is held during the update, so other clients will block while one client performs an update (see the annotation on HRegion.put); no exception is thrown. On the client side, while one process is updating it may not have reached the buffer size yet, so another process may read the original value, I think.

2. What kind of inconsistency? Different values on the same row's qualifier?

The inconsistency I mean is that, for the same retrieval, such as a scan, different threads get different values through their respective HTable instances. Is that possible? In my case a little inconsistency is not critical, so I will not worry about the thread-safety issue. It should be fine?

3. I don't know how it is actually realized in the code. There is caching, but every time you call a method like HTable.get it still needs to connect to the server to retrieve the data, so it is not as fast as reading from memory, is it?

I plan to design a read-only mechanism on top of HBase for my system, updated only periodically, to raise the performance. Locking must affect the performance. If caching is not fast enough in HBase, the design might not be good?

Thanks again! Best, Bing

Best regards, nn

2012/4/13 Bing Li lbl...@gmail.com

Dear Lars, thanks so much for your reply! In my case I need to overwrite or update an HTable. If a read happens during the process of updating or overwriting, will any exceptions be thrown by HBase? If multiple instances of an HTable are used by multiple threads, there must be inconsistency among them, right? I guess caching must be done in HBase, so retrieving from an HTable must be almost as fast as from memory?

Best regards, Bing

On Fri, Apr 13, 2012 at 6:17 AM, lars hofhansl lhofha...@yahoo.com wrote:

Hi Bing, Which part? The server certainly is thread safe. The client is not, at least not all the way through. The main consideration is HTable, which is not thread safe; you need to create one instance for each thread (HBASE-4805 makes that much cheaper), store the HTable in a ThreadLocal after creation, or use HTablePool. Please let me know if that answers your question. Thanks. -- Lars

----- Original Message ----- From: Bing Li lbl...@gmail.com To: hbase-u...@hadoop.apache.org; user user@hbase.apache.org Cc: Sent: Thursday, April 12, 2012 3:10 PM Subject: Is HBase Thread-Safety?

Dear all, is HBase thread-safe? Do I need to consider consistency issues when manipulating HBase? Thanks so much! Best regards, Bing
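A minimal sketch of the two client-side options Lars mentions above: a ThreadLocal so each thread lazily creates and reuses its own HTable, or an HTablePool shared by all threads. The table name "mytable" and the pool size are assumptions, not from the thread.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PerThreadTables {
  private static final Configuration CONF = HBaseConfiguration.create();

  // Option 1: one HTable per thread, created lazily and only ever used by that thread.
  private static final ThreadLocal<HTable> TABLE = new ThreadLocal<HTable>() {
    @Override protected HTable initialValue() {
      try {
        return new HTable(CONF, "mytable");
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }
  };

  // Option 2: a pool shared by all threads; borrow a table, use it, give it back.
  private static final HTablePool POOL = new HTablePool(CONF, 100);

  static Result readWithThreadLocal(byte[] row) throws IOException {
    return TABLE.get().get(new Get(row));
  }

  static Result readWithPool(byte[] row) throws IOException {
    HTableInterface t = POOL.getTable("mytable");
    try {
      return t.get(new Get(row));
    } finally {
      POOL.putTable(t); // return to the pool (newer releases also let you simply close() the pooled table)
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(readWithPool(Bytes.toBytes("row1")));
  }
}

Either way, no single HTable instance is ever touched by more than one thread at a time, which is the property Lars describes.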
Re: Documentation broken
Looks like /book got moved under another /book, so something is definitely wrong. You can try an unstyled version at: http://hbase.apache.org/book/book/book.html Cheers, Oliver

On 2012-04-13, at 9:59 AM, Nitin Pawar wrote:

Hello, Is there any maintenance going on with hbase.apache.org? All the links (ex: http://hbase.apache.org/book/architecture.html#arch.overview) are returning 404 NOT FOUND. Thanks, Nitin Pawar
Re: Documentation broken
Thanks for the quick reply, Oliver. ~nitin

On Fri, Apr 13, 2012 at 1:32 PM, Oliver Meyn (GBIF) om...@gbif.org wrote:

Looks like /book got moved under another /book, so something is definitely wrong. You can try an unstyled version at: http://hbase.apache.org/book/book/book.html Cheers, Oliver

On 2012-04-13, at 9:59 AM, Nitin Pawar wrote:

Hello, Is there any maintenance going on with hbase.apache.org? All the links (ex: http://hbase.apache.org/book/architecture.html#arch.overview) are returning 404 NOT FOUND. Thanks, Nitin Pawar -- Nitin Pawar
Re: Zookeeper available but no active master location found
Hi, Literally, it means that ZooKeeper is there but the HBase client can't find the HBase master address in it. By default, the znode used is /hbase/master, and it contains the hostname and port of the master. You can check its content in ZooKeeper by doing a "get /hbase/master" in bin/zkCli.sh (see http://zookeeper.apache.org/doc/r3.4.3/zookeeperStarted.html#sc_ConnectingToZooKeeper). There should be a root cause for this, so it is worth looking for other error messages in the logs (especially the master's). N.

On Fri, Apr 13, 2012 at 1:23 AM, Henri Pipe henri.p...@gmail.com wrote:

client.HConnectionManager$HConnectionImplementation: ZooKeeper available but no active master location found

Having a problem with master startup that I have not seen before. I am running the following packages:

hadoop-hbase-0.90.4+49.137-1
hadoop-0.20-secondarynamenode-0.20.2+923.197-1
hadoop-hbase-thrift-0.90.4+49.137-1
hadoop-zookeeper-3.3.4+19.3-1
hadoop-0.20-datanode-0.20.2+923.197-1
hadoop-0.20-namenode-0.20.2+923.197-1
hadoop-0.20-tasktracker-0.20.2+923.197-1
hadoop-hbase-regionserver-0.90.4+49.137-1
hadoop-zookeeper-server-3.3.4+19.3-1
hadoop-0.20-0.20.2+923.197-1
hadoop-0.20-jobtracker-0.20.2+923.197-1
hadoop-hbase-master-0.90.4+49.137-1

[root@ip-10-251-27-130 logs]# java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

I start the master and a region server on another node. The master is initialized, but as soon as I try to check the master status or do a zkdump via the web interface, it blows up with:

2012-04-12 19:16:10,453 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: ZooKeeper available but no active master location found
2012-04-12 19:16:10,453 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: getMaster attempt 10 of 10 failed; retrying after sleep of 16000

I am running three ZooKeepers:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/mnt/zookeeper
# The maximum number of zookeeper client connections
maxClientCnxns=2000
# the port at which the clients will connect
clientPort=2181
server.1=10.251.27.130:2888:3888
server.2=10.250.9.220:2888:3888
server.3=10.251.110.50:2888:3888

I can telnet to the ZooKeepers just fine. Here is my hbase-site.xml file:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>10.251.27.130,10.250.9.220,10.251.110.50</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/hadoop/zookeeper/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.maxClientCnxns</name>
    <value>2000</value>
    <final>true</final>
  </property>
</configuration>

Any thoughts? Any help is greatly appreciated. Thanks, Henri Pipe
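A small sketch (an assumption, not code from the thread) of the same check N Keywal describes, done from Java with the plain ZooKeeper client instead of zkCli.sh; it reads the /hbase/master znode that the HBase client looks up, using the quorum addresses from the hbase-site.xml above.

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class CheckMasterZnode {
  public static void main(String[] args) throws Exception {
    // Connect to the quorum with a no-op watcher; requests queue until the session is up.
    ZooKeeper zk = new ZooKeeper(
        "10.251.27.130:2181,10.250.9.220:2181,10.251.110.50:2181", 30000,
        new Watcher() { public void process(WatchedEvent e) { } });
    Stat stat = new Stat();
    byte[] data = zk.getData("/hbase/master", false, stat);
    // The znode holds the master's host:port; the raw bytes may include a small version prefix.
    System.out.println("/hbase/master = " + new String(data));
    zk.close();
  }
}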
Re: Is HBase Thread-Safety?
Hi there - Especially with respect to the caching: HBase has a block cache, so I think it would be a good idea to review the architecture chapter: http://hbase.apache.org/book.html#architecture
Re: Documentation broken
On Fri, Apr 13, 2012 at 5:05 AM, Doug Meil doug.m...@explorysmedical.com wrote: Stack was still working on the site as of the end of day yesterday... Sorry about that. Our site needed a bit of updating -- conflicting mvn plugin versions, etc., hbase-5772 -- and it took me a while. It should be good now. Let me know if not. St.Ack
RE: Hbase Map reduce and Index
You need to apply the filter (the one that filters out the Electronics items) to the table scan before you pass the table to the mapper.

-----Original Message----- From: Karthik Pandian [mailto:karthik...@gmail.com] Sent: Monday, January 02, 2012 2:09 AM To: hbase-u...@hadoop.apache.org Subject: Hbase Map reduce and Index

I am crawling data from different industries and storing it in a single HBase table. For example, I am crawling the Electronics and Computer industries and storing them in a table called 'industry_tbl'. Now I want to run a map reduce over the sets of data, namely for the Electronics and Computer industries, and produce reducer output for each of the different sets of data collected, but currently HBase is taking the entire data of both industries and giving me reduced results which I can't differentiate by industry. Any help or ideas on how to solve this?

-- View this message in context: http://old.nabble.com/Hbase-Map-reduce-and-Index-tp33064563p33064563.html Sent from the HBase User mailing list archive at Nabble.com.
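A sketch of the suggestion above (not code from the thread): put an industry filter on the Scan and hand that Scan to the mapper via TableMapReduceUtil. Only the table name 'industry_tbl' comes from the post; the column family "info", qualifier "industry", and the mapper are assumptions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ElectronicsOnlyJob {

  // Hypothetical mapper: it only ever sees rows that passed the scan filter.
  static class IndustryMapper extends TableMapper<Text, IntWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws java.io.IOException, InterruptedException {
      ctx.write(new Text("Electronics"), new IntWritable(1));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "electronics-only");
    job.setJarByClass(ElectronicsOnlyJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);
    scan.setCacheBlocks(false); // usually recommended for MapReduce scans
    // Only rows whose info:industry column equals "Electronics" reach the mapper.
    scan.setFilter(new SingleColumnValueFilter(
        Bytes.toBytes("info"), Bytes.toBytes("industry"),
        CompareOp.EQUAL, Bytes.toBytes("Electronics")));

    TableMapReduceUtil.initTableMapperJob(
        "industry_tbl", scan, IndustryMapper.class,
        Text.class, IntWritable.class, job);

    // Replace with a real reducer/output for your job; output is discarded here.
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Running one such job per industry (or using a different filter value per run) keeps the results separated by industry.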
Re: Zookeeper available but no active master location found
What do you see in the master log? St.Ack

On Fri, Apr 13, 2012 at 11:00 AM, Henri Pipe henri.p...@gmail.com wrote:

I had tried zkCli ("ls /hbase" and "get /hbase/master"), but it returns the correct value:

[zk: localhost:2181(CONNECTED) 2] get /hbase/master
ip-10-251-27-130:6
cZxid = 0xa0032
ctime = Thu Apr 12 20:03:23 EDT 2012
mZxid = 0xa0032
mtime = Thu Apr 12 20:03:23 EDT 2012
pZxid = 0xa0032

Also, I do have the namenode listed in my config. Here is my hbase-site.xml file:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:9000/hbase</value>
  </property>

Henri Pipe
Re: Documentation broken
Looks good. Thanks stack! On 4/13/12 2:47 PM, Stack st...@duboce.net wrote: On Fri, Apr 13, 2012 at 5:05 AM, Doug Meil doug.m...@explorysmedical.com wrote: Stack was still working on the site as of the end of day yesterday... Sorry about that. Our site needed a bit of updating -- conflicting mvn plugin versions, etc., hbase-5772 -- and it took me a while. It should be good now. Let me know if not. St.Ack
Re: Add client complexity or use a coprocessor?
I would look first at how concurrent your coprocessor is in operation. There has been quite a bit of effort to make upserts (increments), and the MemStore in general, efficient at high concurrency.

Is the table auto-flush option the same as manually batching all the updates?

I think the answer to your question is yes. Setting HTable.setAutoFlush(false) will buffer Puts (only) until the write buffer is full or until a call to HTable.flushCommits(). So that would be like manually batching a bunch of Puts.

Best regards, - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

----- Original Message ----- From: Tom Brown tombrow...@gmail.com To: user@hbase.apache.org; Andrew Purtell apurt...@apache.org Cc: Sent: Thursday, April 12, 2012 1:37 AM Subject: Re: Add client complexity or use a coprocessor?

Andy, Is the table auto-flush option the same as manually batching all the updates? --Tom

On Tue, Apr 10, 2012 at 5:53 PM, Andrew Purtell apurt...@apache.org wrote:

Even my implementation of an atomic increment (using a coprocessor) is two orders of magnitude slower than the provided implementation. Are there properties inherent to coprocessors or Incrementors that would force this kind of performance difference?

No. You may be seeing a performance difference if you are packing multiple Increments into one round trip but not doing a similar kind of batching when calling a custom endpoint. Each Endpoint invocation is a round trip unless you do something like:

List<Row> actions = new ArrayList<Row>();
actions.add(new Exec(conf, row, protocol, method, ...));
actions.add(new Exec(conf, row, protocol, method, ...));
actions.add(new Exec(conf, row, protocol, method, ...));
Object[] results = table.batch(actions);
...

I've not personally tried that particular API combination but don't see why it would not be possible. Beyond that, I'd suggest running a regionserver with your coprocessor installed under a profiler to see if you have monitor contention or a hotspot or similar. It could be something unexpected.

Can you think of an efficient way to implement an atomic bitfield (other than adding it as a separate feature like atomic increments)?

I think the idea of an atomic bitfield operation as part of the core API is intriguing. It has applicability to your estimator use case and I can think of a couple of things I could use it for. If there is more support for this idea, this may be something to consider.

Best regards, - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

----- Original Message ----- From: Tom Brown tombrow...@gmail.com To: user@hbase.apache.org; Andrew Purtell apurt...@apache.org Cc: Sent: Tuesday, April 10, 2012 3:53 PM Subject: Re: Add client complexity or use a coprocessor?

Andy, I have attempted to use coprocessors to achieve passable performance but have failed so far. Even my implementation of an atomic increment (using a coprocessor) is two orders of magnitude slower than the provided implementation. Are there properties inherent to coprocessors or Incrementors that would force this kind of performance difference? Can you think of an efficient way to implement an atomic bitfield (other than adding it as a separate feature like atomic increments)? Thanks! --Tom

On Tue, Apr 10, 2012 at 12:01 PM, Andrew Purtell apurt...@apache.org wrote:

Tom, I am a big fan of the Increment class. Unfortunately, I'm not doing simple increments for the viewer count. I will be receiving duplicate messages from a particular client for a specific cube cell, and don't want them to be counted twice.

Gotcha.

I created an RPC endpoint coprocessor to perform this function but performance suffered heavily under load (it appears that the endpoint performs all functions in serial).

Did you serialize access to your data structure(s)?

When I tried implementing it as a region observer, I was unsure of how to correctly replace the provided put with my own. When I issued a put from within prePut, the server blocked the new put (waiting for the prePut to finish). Should I be attempting to modify the WALEdit object?

You can add KVs to the WALEdit. Or, you can get a reference to the Put's familyMap:

Map<byte[], List<KeyValue>> familyMap = put.getFamilyMap();

and if you modify the map, you'll change what gets committed.

Is there a way to extend the functionality of Increment to provide arbitrary bitwise operations on the contents of a field?

As a matter of design, this should be a new operation. It does sound interesting and useful, some sort of atomic bitfield.

Best regards, - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
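To make the auto-flush point above concrete, a minimal sketch (not from the thread; the table, family, and qualifier names are assumptions): with auto-flush off, Puts pile up in the client's write buffer and go out together when the buffer fills or flushCommits() is called, which is effectively manual batching. As Andy notes, only Puts are buffered this way.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedPuts {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "cube");   // assumed table name
    table.setAutoFlush(false);                 // buffer Puts on the client side
    table.setWriteBufferSize(2 * 1024 * 1024); // optional: 2 MB buffer
    for (int i = 0; i < 10000; i++) {
      Put p = new Put(Bytes.toBytes("row-" + i));
      p.add(Bytes.toBytes("d"), Bytes.toBytes("count"), Bytes.toBytes(1L));
      table.put(p); // no RPC yet, unless the buffer has filled
    }
    table.flushCommits(); // ships everything still sitting in the buffer
    table.close();
  }
}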
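And a sketch of the familyMap route for the RegionObserver case, against the 0.92-era coprocessor API: instead of issuing a second put from inside prePut, rewrite the incoming Put's familyMap so the merged value is what gets committed. The family and qualifier names, the 8-byte long bitfield, and the bitwise-OR merge are assumptions, and the read-modify-write shown here is not atomic by itself, which is exactly the trade-off the thread is discussing.

import java.io.IOException;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class BitfieldObserver extends BaseRegionObserver {
  private static final byte[] FAM = Bytes.toBytes("d");        // assumed family
  private static final byte[] QUAL = Bytes.toBytes("viewers"); // assumed qualifier

  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> c,
      Put put, WALEdit edit, boolean writeToWAL) throws IOException {
    Map<byte[], List<KeyValue>> familyMap = put.getFamilyMap();
    List<KeyValue> kvs = familyMap.get(FAM);
    if (kvs == null || kvs.isEmpty()) return;

    // Read the currently stored bitfield for this row from the region.
    Get get = new Get(put.getRow());
    get.addColumn(FAM, QUAL);
    Result current = c.getEnvironment().getRegion().get(get, null);
    long stored = current.isEmpty() ? 0L : Bytes.toLong(current.getValue(FAM, QUAL));

    // OR the incoming bits into the stored value and swap the KeyValue, so the
    // merged bitfield is what actually gets committed by this Put.
    long incoming = Bytes.toLong(kvs.get(0).getValue());
    KeyValue merged = new KeyValue(put.getRow(), FAM, QUAL, Bytes.toBytes(stored | incoming));
    kvs.clear();
    kvs.add(merged);
  }
}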
Re: 0.92 and Read/writes not scaling
On Fri, Apr 13, 2012 at 8:02 PM, Todd Lipcon t...@cloudera.com wrote:

If you want to patch on the HBase side, you can edit HLog.java to remove the checks for the sync method, and have it only call hflush. It's only the compatibility path that caused the problem.

You mean change the order here, boss?

@Override
public void sync() throws IOException {
  if (this.syncFs != null) {
    try {
      this.syncFs.invoke(this.writer, HLog.NO_ARGS);
    } catch (Exception e) {
      throw new IOException("Reflection", e);
    }
  } else if (this.hflush != null) {
    try {
      this.hflush.invoke(getWriterFSDataOutputStream(), HLog.NO_ARGS);
    } catch (Exception e) {
      throw new IOException("Reflection", e);
    }
  }
}

Call hflush if it's available, ahead of syncFs? Seems like we should get this in all around. I can do it. Good stuff, St.Ack
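For reference, the reordering being discussed would look roughly like the following, a sketch rather than the patch that was actually committed: keep the reflection-based fields from the snippet above and simply prefer the hflush path whenever it is present, falling back to syncFs only otherwise.

@Override
public void sync() throws IOException {
  if (this.hflush != null) {
    // Prefer hflush when the underlying FSDataOutputStream supports it.
    try {
      this.hflush.invoke(getWriterFSDataOutputStream(), HLog.NO_ARGS);
    } catch (Exception e) {
      throw new IOException("Reflection", e);
    }
  } else if (this.syncFs != null) {
    // Compatibility path for older DFS clients that only expose syncFs.
    try {
      this.syncFs.invoke(this.writer, HLog.NO_ARGS);
    } catch (Exception e) {
      throw new IOException("Reflection", e);
    }
  }
}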