Partial replication
Hi, Is it possible to replicate a particular dataset to another cluster instead of replicating the whole data? -- With Regards, Jr.
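In HBase, replication is configured per column family through the REPLICATION_SCOPE attribute (the same attribute visible in the .META. dumps later in this digest), so a single table, or even a single family, can be shipped to the peer cluster while everything else stays local. Below is a hedged sketch of creating such a table; the table and family names ("mytable", "cf", "local") are placeholders, and it assumes HColumnDescriptor.setScope is available in your HBase version:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class CreateReplicatedTable {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);

      HTableDescriptor desc = new HTableDescriptor("mytable");   // hypothetical table name
      HColumnDescriptor replicated = new HColumnDescriptor("cf");
      replicated.setScope(1);   // scope 1: edits of this family are shipped to the peer cluster
      HColumnDescriptor localOnly = new HColumnDescriptor("local");
      localOnly.setScope(0);    // scope 0 (the default): never replicated
      desc.addFamily(replicated);
      desc.addFamily(localOnly);
      admin.createTable(desc);
    }
  }

Tables and families whose scope stays at 0 are simply never picked up by replication, which gives the per-dataset behaviour asked about here.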
Re: client-side caching
I've seen that. But that's about caching on regionserver-side through memcache. You still have the network roundtrip. I'd like to avoid at all contacting the regionserver, when possible. So I was guessing whether the hbase-client would have some caching embedded, otherwise I'll implement it through memcache. On 7/4/11 7:03 PM, Ted Yu wrote: See HBASE-4018 On Mon, Jul 4, 2011 at 7:33 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: Hello list, i'm using hbase 0.90.3 on a 5 nodes cluster. I'm using a table as a string-long map. As I'm using this map a lot, I was thinking about installing memcache on the client side, as to avoid flooding hbase for the same value over and over. What is the best practice in these situations? some client-side caching already in hbase? Best, Claudio
how to check whether zookeeper quorum is working fine or not ?
Hi, I have a 12-node hbase cluster setup with 3 nodes as a part of zookeeper quorum. I am able to run hbase shell, create tables.. and able to access tables in the shell. Now I am configuring pig to use hbase. While accessing records using pig, it is giving some Zookeeper exception and saying to check the zookeeper logs. I am sending the latest zookeeper logs of the 3 nodes along with hbase-site.xml Can anyone help me figuring out whether everything is okay with my zookeeper and if not, what is the issue ? Hbase-site.xml --- http://pastebin.com/8aJ7D54T Zookeeper log on ub11 --- http://pastebin.com/HMuL9aCJ Zookeeper log on ub12 --- http://pastebin.com/8XdmVmDW Zookeeper log on ub13 --- http://pastebin.com/2373Rrat Thanks, Praveenesh
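A quick way to sanity-check the quorum from any client machine is to connect with the plain ZooKeeper client API and list the znodes HBase registers. A minimal sketch, using the hostnames from this setup and assuming the default zookeeper.znode.parent of /hbase:

  import java.util.List;
  import org.apache.zookeeper.WatchedEvent;
  import org.apache.zookeeper.Watcher;
  import org.apache.zookeeper.ZooKeeper;

  public class ZkQuorumCheck {
    public static void main(String[] args) throws Exception {
      // Same quorum as configured in hbase-site.xml (hostnames taken from the poster's cluster).
      ZooKeeper zk = new ZooKeeper("ub11:2181,ub12:2181,ub13:2181", 30000,
          new Watcher() {
            public void process(WatchedEvent event) { /* no-op watcher */ }
          });
      // If the quorum is healthy and HBase has registered itself, the parent znode
      // should list children such as master, rs and root-region-server.
      List<String> children = zk.getChildren("/hbase", false);
      System.out.println("znodes under /hbase: " + children);
      zk.close();
    }
  }

If this connects and lists the expected children from the same machine Pig runs on, the quorum itself is fine and the problem is more likely the client-side configuration Pig picks up.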
Re: client-side caching
Caching sounds easy until you need to worry about invalidation. It's hard to build efficient and correct invalidation. On Jul 5, 2011 2:13 AM, Claudio Martella claudio.marte...@tis.bz.it wrote: I've seen that. But that's about caching on regionserver-side through memcache. You still have the network roundtrip. I'd like to avoid at all contacting the regionserver, when possible. So I was guessing whether the hbase-client would have some caching embedded, otherwise I'll implement it through memcache.
zookeeper connection issue - distributed mode
Hi, I have the following environment - hbase-0.90.1-cdh3u0 on ubuntu. I have the following code for the distributed mode and I am calling this Java code from a remote client: HBaseConfiguration config = new HBaseConfiguration(); config.clear(); config.set("hbase.zookeeper.quorum", "ults01"); config.set("hbase.zookeeper.property.clientPort", "2181"); and I get the following log: 11/07/05 03:33:03 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT 11/07/05 03:33:03 INFO zookeeper.ZooKeeper: Client environment:host.name=192.168.1.64 ... 11/07/05 03:21:32 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ults01:2181 sessionTimeout=18 watcher=hconnection 11/07/05 03:21:32 INFO zookeeper.ClientCnxn: Opening socket connection to server ults01/192.168.22.133:2181 11/07/05 03:21:32 INFO zookeeper.ClientCnxn: Socket connection established to ults01/192.168.22.133:2181, initiating session 11/07/05 03:21:32 INFO zookeeper.ClientCnxn: Session establishment complete on server ults01/192.168.22.133:2181, sessionid = 0x130f352453f0027, negotiated timeout = 4 11/07/05 03:21:32 INFO ipc.HbaseRPC: Server at localhost/127.0.0.1:60020 could not be reached after 1 tries, giving up. 11/07/05 03:21:33 INFO ipc.HbaseRPC: Server at localhost/127.0.0.1:60020 could not be reached after 1 tries, giving up. 11/07/05 03:21:34 INFO ipc.HbaseRPC: Server at localhost/127.0.0.1:60020 could not be reached after 1 tries, giving up. 11/07/05 03:21:35 INFO ipc.HbaseRPC: Server at localhost/127.0.0.1:60020 could not be reached after 1 tries, giving up. My conf/regionservers used to have the localhost entry, which I tried to change to the hostname ults01, but no luck. /etc/hosts: 127.0.0.1 localhost #127.0.0.1 ults01 192.168.22.133 ults01 The basic question is: where is it picking up localhost/127.0.0.1:60020 from? I think it should be ults01/192.168.22.133:60020. Even worse, it used to work, but due to the virtual server move I had to change the IP address of the ults01 machine. thanks for comments. regards, devush
Re: client-side caching
Totally understand. As a matter of fact I didn't mention my table is read-only or insert-only (no data is modified), so no real invalidation necessary here. I guess this means i should go for my own memcache on the client side. Thanks! On 7/5/11 11:39 AM, Ryan Rawson wrote: Caching sounds easy until you need to worry about invalidation. It's hard to build efficient and correct invalidation.
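For the read-only / insert-only case described above, a small in-process read-through cache in front of HTable is usually enough and avoids running a separate memcached. The sketch below shows one possible approach; it is not an HBase-provided facility, and the table, family and qualifier names ("string_long_map", "f", "v") are made-up placeholders:

  import java.io.IOException;
  import java.util.LinkedHashMap;
  import java.util.Map;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  /** Read-through LRU cache for a string -> long lookup table (names are hypothetical). */
  public class CachedLongLookup {
    private static final int MAX_ENTRIES = 100000;
    private final HTable table;
    private final byte[] family = Bytes.toBytes("f");
    private final byte[] qualifier = Bytes.toBytes("v");
    // Access-ordered LinkedHashMap: evicts the eldest entry once MAX_ENTRIES is exceeded.
    private final Map<String, Long> cache =
        new LinkedHashMap<String, Long>(16, 0.75f, true) {
          protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
            return size() > MAX_ENTRIES;
          }
        };

    public CachedLongLookup(Configuration conf) throws IOException {
      this.table = new HTable(conf, "string_long_map");
    }

    public synchronized Long lookup(String key) throws IOException {
      Long value = cache.get(key);
      if (value == null) {
        Result r = table.get(new Get(Bytes.toBytes(key)));
        if (!r.isEmpty()) {
          value = Bytes.toLong(r.getValue(family, qualifier));
          cache.put(key, value);  // safe to cache forever because rows are never modified
        }
      }
      return value;
    }
  }

Because rows are never modified, entries never need to be invalidated; the only tuning knob is the cache size.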
hbase host dns ip and route for multi network interface card
Hi, when I start my hbase cluster, there are some error logs in the master-log: the ip and hostname node3 192.168.1.15 192.168.1.13 are the same machine that have two NIC 2011-07-05 17:13:13,820 INFO org.apache.zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false 2011-07-05 17:13:13,840 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server node3/192.168.1.15:2181 2011-07-05 17:13:13,975 DEBUG org.apache.hadoop.hbase.master.HMaster: Checking cluster state... 2011-07-05 17:13:13,979 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 192.168.1.13:60020 2011-07-05 17:13:19,732 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Updated ZNode /hbase/rs/1309857199677 with data 192.168.1.15:60020 2011-07-05 17:22:01,041 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /192.168.1.13:60020 could not be reached after 1 tries, giving up. 2011-07-05 17:22:01,042 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan one META region: {server: 192.168.1.13:60020, regionname: .META.,,1, startKey: } org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /192.168.1.13:60020 after attempts=1 at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:429) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:918) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:934) at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:173) at org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73) at org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129) at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153) at org.apache.hadoop.hbase.Chore.run(Chore.java:68) Sometimes when the .META. region is not assigned to the server node3, which has two NIC:eth0:192.168.1.13 and eth1:192.168.1.15 and resolve the dns/host as:192.168.1.15 node3, I means, when the region .META. is assigned to the others server that has only one NIC, the hbase will work well. here is some of my hbase cluster infos: Hbase version:0.20.6 Hadoop version:0.20-append+4 Zookeeper version:3.3.0 the hbase-site.xml:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://node3:54310/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>hadoop5:6</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node3,hadoop5,hadoopoffice85,hadoopoffice88,hdofficelj001</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!--
  <property>
    <name>hbase.master.dns.interface</name>
    <value>eth1</value>
    <description>The name of the Network Interface from which a master should report its IP address.</description>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>eth1</value>
    <description>The name of the Network Interface from which a region server should report its IP address.</description>
  </property>
  <property>
    <name>hbase.zookeeper.dns.interface</name>
    <value>eth1</value>
    <description>The name of the Network Interface from which a ZooKeeper server should report its IP address.</description>
  </property>
  -->
  <!--
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value></value>
    <description>Property from ZooKeeper's config zoo.cfg. The port at which the clients will connect.</description>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/zookeeper/data</value>
    <description>Property from ZooKeeper's config zoo.cfg. The directory where the snapshot is stored.</description>
  </property>
  -->
</configuration>
cat /opt/hbase/conf/regionservers hadoop5 node3 hadoopoffice85 hadoopoffice88 hdofficelj001 --- And the below is the node3's info: 192.168.1.13's ifconfig info: [root@node3 ~]# ifconfig eth0 Link encap:Ethernet HWaddr 00:0C:29:23:2E:D3 inet addr:192.168.1.13 Bcast:192.168.1.255 Mask:255.255.255.0 inet6 addr: fe80::20c:29ff:fe23:2ed3/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:1424620 errors:0 dropped:0 overruns:0 frame:0 TX packets:17897973 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:150231810 (143.2 MiB) TX bytes:2834085782 (2.6 GiB) Base address:0x2000 Memory:d892-d894 eth1 Link encap:Ethernet HWaddr 00:0C:29:23:2E:DD inet addr:192.168.1.15 Bcast:192.168.1.255 Mask:255.255.255.0 inet6 addr: fe80::20c:29ff:fe23:2edd/64 Scope:Link UP BROADCAST
Re: Separating Application and Database Servers
Check your DNS. localhost in the log below can also mean that your hbase.zookeeper.quorum is carrying default value. Modify it to point to the real quorum. On Tue, Jul 5, 2011 at 1:27 AM, James Ram hbas...@gmail.com wrote: Hi, We are running hadoop and hbase in a 9 machine cluster. We tried to put our application server on a machine outside the HBase cluster, but we are getting the following error. Is there any way that we can do this? 11/07/05 11:26:41 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpe cted error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused: no further information at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) 11/07/05 11:26:43 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 11/07/05 11:26:44 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpe cted error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused: no further information at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) 11/07/05 11:26:45 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 -- With Regards, Jr.
Re: hbck -fix
On Mon, Jul 4, 2011 at 1:28 PM, Stack st...@duboce.net wrote: On Sun, Jul 3, 2011 at 12:39 AM, Andrew Purtell apurt...@apache.org wrote: I've done exercises in the past like delete META on disk and recreate it with the earlier set of utilities (add_table.rb). This always worked for me when I've tried it. We need to update add_table.rb at least. The onlining of regions was done by the metascan. It no longer exists in 0.90. Maybe a disable/enable after an add_table.rb would do but probably better to revamp and merge it with hbck? Results from torture tests that HBase was subjected to in the timeframe leading up to 0.90 also resulted in better handling of .META. table related errors. They are fortunately demonstrably now rare. Agreed. My concern here is getting repeatable results demonstrating HBCK weaknesses will be challenging. Yes. This is the tough one. I was hoping Wayne had a snapshot of .META. to help at least characterize the problem. (This does sound like something our Dan Harvey ran into recently on an hbase 0.20.x hbase. Let me go back to him. He might have some input that will help here.) St.Ack If the root of this issue is the master filling up it is not totally an hbase issue. If your search the hadoop mailing list you will find people who's NameNode disk fills up and had quite a catastrophic, hard to recover from, failure. Monitor the s#it out of your SPOFs. To throw something very anecdotal in here, I find not many data stores recover from full disk errors well.
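On the "monitor your SPOFs" advice: even a trivial free-space check on the NameNode or master data volume, run from cron or an existing monitoring system, catches the disk-full failure mode before it turns into a recovery exercise. A minimal sketch; the path and threshold are placeholders to adapt:

  import java.io.File;

  public class DiskSpaceCheck {
    public static void main(String[] args) {
      // Placeholder: point this at the NameNode/HMaster data volume in your layout.
      File volume = new File(args.length > 0 ? args[0] : "/data");
      long total = volume.getTotalSpace();
      if (total == 0) {
        System.err.println("no such volume: " + volume);
        System.exit(1);
      }
      long freePct = volume.getUsableSpace() * 100 / total;
      System.out.println(volume + ": " + freePct + "% free");
      // Non-zero exit lets a cron/monitoring wrapper raise an alert.
      if (freePct < 15) {
        System.exit(2);
      }
    }
  }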
Re: FW: Reg: The HLog that is created in region creation
When hbase.hstore.blockingStoreFiles is reached in one Store, updates are blocked for this HRegion until a compaction is completed. Regards On Tue, Jul 5, 2011 at 12:09 AM, Ramkrishna S Vasudevan ramakrish...@huawei.com wrote: Hi all, Sorry Ted for not sending to the dev@ list. Few more queries related to splitting 1. As per the flow when compaction is happening there may be few more store files created due to flushing. 2. Suppose initially 3 Store files have been selected for compaction at the end of compaction process i will get 1 storefile. Parallely due to flushing some 3 store files were created. So after step 2 my total store files in 4. 3. Now inorder to get the midkey for splitting we iterate through all the store files and find the midkey from the largest store file(Correct me if am wrong). 4. this largest store file may be the store file that was created as part of compaction. Now how does the midkey selected will encompass all the 4 store files keys? Why i have this doubt is while splitting all the store files are moved into the new regions created as part of split. So all the 4 store files are now moved to both the new regions created. Pls help me in understanding this flow. Thanks in advance. Regards Ram -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Tuesday, July 05, 2011 11:01 AM To: ramakrish...@huawei.com Cc: user@hbase.apache.org Subject: Re: FW: Reg: The HLog that is created in region creation Please include dev@ so that other people would be able to answer your question. and Daugter A2(Midkey to startkey). Should read (Midkey to endkey). For #1 below, you're right that splitting benefits retrieval more than writes. For #2, I guess you may have read CompactSplitThread code where compacting stores in a region happens before splitting. Cheers On Mon, Jul 4, 2011 at 9:26 PM, Ramkrishna S Vasudevan ramakrish...@huawei.com wrote: Dear Ted Thanks for your reply. I have few questions on Splitting of regions. I will tell my observations, may be they are not fully correct. Pls correct me if am wrong, Every time compaction results in a file greater than max file size region splits happen from the mid key. So Region A is now split into Daughter A1(Start to midkey) and Daugter A2(Midkey to startkey). 1. Why do we do a split ? I suppose it is because now a Region will now hold only a portion of the full data and hence retrieval will be easy. 2. Is there a possibility that during first compaction a region split happens, which results in 2 region creations. Now when new write request come to the parent region will it be accomodated in the new daughter regions that were created. If it is the case, now how will the splitkeys be handled ? Pls do corect me if am wrong anywhere? Regards Ram -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Tuesday, July 05, 2011 9:38 AM To: user@hbase.apache.org Cc: user@hbase.apache.org Subject: Re: Reg: The HLog that is created in region creation Not really used. See hbase-4010 On Jul 4, 2011, at 8:55 PM, Ramkrishna S Vasudevan ramakrish...@huawei.com wrote: Hello Can anybody tell me what is the use of the HLog created per region when a region is created? Regards Ram
Re: zookeeper connection issue - distributed mode
Hi, on 60020 is usually the region server; check hbase-default.xml or try to set 'hbase.regionserver' inside the HBaseConfiguration object, if you are going to use it directly from application code. Regards, Sanel On Tue, Jul 5, 2011 at 4:00 PM, Florin P florinp...@yahoo.com wrote: Hello! The property hbase.zookeeper.quorum is taken from hbase-site.xml on the HBase master machine. Taken from the HBase master hbase-site.xml: <property> <name>hbase.zookeeper.quorum</name> <value>your_server_name</value> </property> For me, it worked. Success, Florin
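To see which region server address was actually published (and therefore where localhost/127.0.0.1:60020 is coming from), you can read the same znode the client consults. A sketch against the poster's quorum; /hbase/root-region-server is the default location of the -ROOT- holder's address in 0.90-era layouts, and the assumption here is that its payload is a readable host:port string, which is worth verifying on your version:

  import org.apache.zookeeper.WatchedEvent;
  import org.apache.zookeeper.Watcher;
  import org.apache.zookeeper.ZooKeeper;

  public class ShowRootRegionServer {
    public static void main(String[] args) throws Exception {
      ZooKeeper zk = new ZooKeeper("ults01:2181", 30000, new Watcher() {
        public void process(WatchedEvent event) { /* no-op */ }
      });
      // The cluster writes the -ROOT- holder's address here; if this prints
      // localhost:60020, the region server registered itself under the wrong name
      // and the fix is on the server's /etc/hosts / hbase-site.xml, not the client.
      byte[] data = zk.getData("/hbase/root-region-server", false, null);
      System.out.println(new String(data));
      zk.close();
    }
  }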
Re: Errors after major compaction
Eran: I logged https://issues.apache.org/jira/browse/HBASE-4060 for you. On Mon, Jul 4, 2011 at 2:30 AM, Ted Yu yuzhih...@gmail.com wrote: Thanks for the understanding. Can you log a JIRA and put your ideas below in it ? On Jul 4, 2011, at 12:42 AM, Eran Kutner e...@gigya.com wrote: Thanks for the explanation Ted, I will try to apply HBASE-3789 and hope for the best but my understanding is that it doesn't really solve the problem, it only reduces the probability of it happening, at least in one particular scenario. I would hope for a more robust solution. My concern is that the region allocation process seems to rely too much on timing considerations and doesn't seem to take enough measures to guarantee conflicts do not occur. I understand that in a distributed environment, when you don't get a timely response from a remote machine you can't know for sure if it did or did not receive the request, however there are things that can be done to mitigate this and reduce the conflict time significantly. For example, when I run hbck it knows that some regions are multiply assigned, the master could do the same and try to resolve the conflict. Another approach would be to handle late responses, even if the response from the remote machine arrives after it was assumed to be dead the master should have enough information to know it had created a conflict by assigning the region to another server. An even better solution, I think, is for the RS to periodically test that it is indeed the rightful owner of every region it holds and relinquish control over the region if it's not. Obviously a state where two RSs hold the same region is pathological and can lead to data loss, as demonstrated in my case. The system should be able to actively protect itself against such a scenario. It probably doesn't need saying but there is really nothing worse for a data storage system than data loss. In my case the problem didn't happen in the initial phase but after disabling and enabling a table with about 12K regions. -eran On Sun, Jul 3, 2011 at 23:49, Ted Yu yuzhih...@gmail.com wrote: Let me try to answer some of your questions. The two paragraphs below were written along my reasoning which is in reverse order of the actual call sequence. For #4 below, the log indicates that the following was executed: private void assign(final RegionState state, final boolean setOfflineInZK, final boolean forceNewPlan) { for (int i = 0; i < this.maximumAssignmentAttempts; i++) { if (setOfflineInZK && !setOfflineInZooKeeper(state)) return; The above was due to the timeout which you noted in #2 which would have caused TimeoutMonitor.chore() to run this code (line 1787): for (Map.Entry<HRegionInfo, Boolean> e: assigns.entrySet()) { assign(e.getKey(), false, e.getValue()); } This means there is lack of coordination between assignmentManager.TimeoutMonitor and OpenedRegionHandler. The reason I mention HBASE-3789 is that it is marked as Incompatible change and is in TRUNK already. The application of HBASE-3789 to 0.90 branch would change the behavior (timing) of region assignment. I think it makes sense to evaluate the effect of HBASE-3789 in 0.90.4 BTW were the incorrect region assignments observed for a table with multiple initial regions ? If so, I have HBASE-4010 in TRUNK which speeds up initial region assignment by about 50%.
Cheers On Sun, Jul 3, 2011 at 12:02 PM, Eran Kutner e...@gigya.com wrote: Ted, So if I understand correctly the the theory is that because of the issue fixed in HBASE-3789 the master took too long to detect that the region was successfully opened by the first server so it forced closed it and transitioned to a second server, but there are a few things about this scenario I don't understand, probably because I don't know enough about the inner workings of the region transition process and would appreciate it if you can help me understand: 1. The RS opened the region at 16:37:49. 2. The master started handling the opened event at 16:39:54 - this delay can probably be explained by HBASE-3789 3. At 16:39:54 the master log says: Opened region gs_raw_events,. on hadoop1-s05.farm-ny.gigya.com 4. Then at 16:40:00 the master log says: master:6-0x13004a31d7804c4 Creating (or updating) unassigned node for 584dac5cc70d8682f71c4675a843c3 09 with OFFLINE state - why did it decide to take the region offline after learning it was successfully opened? 5. Then it tries to reopen the region on hadoop1-s05, which indicates in its log that the open request failed because the region was already open - why didn't the master use that information to learn that the region was already open? 6. At 16:43:57 the master decides the region transition timed out and starts forcing the
Re: Errors after major compaction
Appreciate it, sorry I didn't get to it sooner. Had some crazy days :) -eran
WrongRegionException and inconsistent table found
Hi, We're running a hbase cluster including 37 regionservers. Today, we found losts of WrongRegionException when putting object into it. hbase hbck -details reports that Chain of regions in table STable is broken; edges does not contain ztxrGmCwn-6BE32s3cX1TNeHU_I= ERROR: Found inconsistency in table STable echo scan '.META.'| hbase shell meta.txt grep -A1 STARTKEY = 'EStore_everbox_z meta.txt reports that Ck=,1308802977279.71ffb1 1ffb10b8b95fd47b3eff468d00ab4e9.', STARTKEY = 'ztn0ukLW 0b8b95fd47b3eff468d00ab4 d1NSU3fuXKkkWq5ZVCk=', ENDKEY = 'ztqdVD8fCMP-dDbXUAydan e9.kboD4=', ENCODED = 71ffb10b8b95fd47b3eff468d00ab4e9, TABLE = {{NAME = -- D4=,1305619724446.c45191 45191821053d03537596f4a2e759718.', STARTKEY = ztqdVD8f 821053d03537596f4a2e7597 CMP-dDbXUAydankboD4=', ENDKEY = ' ztxrGmCwn-6BE32s3cX1TN 18.eHU_I=', ENCODED = c45191821053d03537596f4a2e759718, TABLE = {{NAME = -- pA=,1309455605341.c5c5f55c5f578722ea3f8d1b099313bec8298.', STARTKEY = 'zu3zVaLc 78722ea3f8d1b099313bec82 GDnnpjKCbnboXgAFspA=', ENDKEY = 'zu7qkr5fH6MMJ3GxbCv_0d 98.6g8yI=', ENCODED = c5c5f578722ea3f8d1b099313bec8298, TABLE = {{NAME = It looks like the meta indeed has a hole.(We tried scan '.META.' several times, to confirm it's not a transient status.) We've tried hbase hbck -fix, does not help. We found a thread 'wrong region exception' about two months ago. Stack suggested a 'little surgery' like *So, make sure you actually have a hole. Dump out your meta table: echo scan '.META.'| ./bin/hbase shell /tmp/meta.txt Then look ensure that there is a hole between the above regions (compare start and end keys... the end key of one region needs to match the start key of the next). If indeed a hole, you need to do a little surgery inserting a new missing region (hbck should fix this but it doesn't have the smarts just yet). Basically, you create a new region with start and end keys to fill the hole then you insert it into .META. and then assign it. There are some scripts in our bin directory that do various parts of this. I'm pretty sure its beyond any but a few figuring this mess out so if you do the above foot work and provide a few more details, I'll hack up something for you (and hopefully something generalized to be use by others later, and later to be integrated into hbck).* Can anyone give a detailed example, step by step instruction would be greatly appreciated. My understand is we should 1.Since we already has the lost region, we now have start and end keys. 2.generate the row represents the missing region. But how can I generate the encoded name? It looks like I need column=info:server,column=info:serverstartcode and column=info:regioninfo for the missing region. And column=info:regioninfo includes so many information. How to generate them one by one? As for the name of row, it consists of tablename, startkey, encode, and one more long number, how to get this number? 
3.use assing command in the hbase shell We also tried check_meta.rb --fix, it reports 11/07/06 00:09:08 WARN check_meta: hole after REGION = {NAME = 'STable,ztqdVD8fCMP-dDbXUAydankboD4=,1305619724446.c45191821053d03537596f4a2e759718.', STARTKEY = 'ztqdVD8fCMP-dDbXUAydankboD4=', ENDKEY = 'ztxrGmCwn-6BE32s3cX1TNeHU_I=', ENCODED = c45191821053d03537596f4a2e759718, TABLE = {{NAME = 'STable', FAMILIES = [{NAME = 'file', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'NONE', VERSIONS = '3', TTL = '2147483647', BLOCKSIZE = '65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}, {NAME = 'filelength', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'NONE', VERSIONS = '3', TTL = '2147483647', BLOCKSIZE = '65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}, {NAME = 'userbucket', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'NONE', VERSIONS = '3', TTL = '2147483647', BLOCKSIZE = '65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}, {NAME = 'userpass', BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'NONE', VERSIONS = '3', TTL = '2147483647', BLOCKSIZE = '65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}]}} 11/07/06 00:28:40 WARN check_meta: Missing .regioninfo: hdfs:// hd0013.c.gj.com:9000/hbase/STable/3e6faca40a7ccad7ed8c0b5848c0f945/.regioninfo The problem is still there. BTW, what about the blue warning? Is this a serious issue? The situation is quite hard to us, it looks like even we can fill the hole in the meta, we would lost all the data in the hole region, right? Thanks and regards, Mao Xu-Feng
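On the question in step 2 above about generating the encoded name and the info:regioninfo value: HRegionInfo derives both for you, and the trailing long in the row name is the regionId, which the constructor fills with the current time, so nothing has to be invented by hand; server/startcode columns are normally filled in when the region gets assigned. Below is a rough, hedged sketch of the kind of surgery Stack describes, using the 0.90-era API and the hole boundaries read off the .META. dump above; back up .META. and HDFS first and expect to adapt it to your setup:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HConstants;
  import org.apache.hadoop.hbase.HRegionInfo;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.hbase.util.Writables;

  public class FillMetaHole {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("STable"));

      // Boundaries of the hole: end key of the last region before it and
      // start key of the first region after it, taken from the .META. scan.
      byte[] startKey = Bytes.toBytes("ztxrGmCwn-6BE32s3cX1TNeHU_I=");
      byte[] endKey   = Bytes.toBytes("zu3zVaLcGDnnpjKCbnboXgAFspA=");

      // regionId defaults to "now"; the encoded name is computed from the full region name.
      HRegionInfo hole = new HRegionInfo(desc, startKey, endKey);

      HTable meta = new HTable(conf, HConstants.META_TABLE_NAME);
      Put p = new Put(hole.getRegionName());
      p.add(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER,
          Writables.getBytes(hole));
      meta.put(p);
      // Then assign the new region, e.g. with the shell's assign command,
      // passing hole.getRegionNameAsString().
      System.out.println("inserted " + hole.getRegionNameAsString());
    }
  }

And yes, filling the hole in .META. only restores the key range; any data that lived in a region whose files are gone is not recovered by this step.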
Re: WrongRegionException and inconsistent table found
We also check the master log, nothing interesting found.
Re: zookeeper connection issue - distributed mode
Entry in the hbase-site.xml worked for me. thanks, devush
Re: WrongRegionException and inconsistent table found
I forgot the version, we are using cdh3u0. Mao Xu-Feng
Re: Errors after major compaction
Eran: You didn't run hbck during the enabling of gs_raw_events table, right ? I saw: 2011-06-29 16:43:50,395 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction (major) requested for gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. because User-triggered major compaction; priority=1, compaction queue size=1248 The above might be related to: 2011-06-29 16:43:57,880 INFO org.apache.hadoop.hbase. master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. Thanks On Tue, Jul 5, 2011 at 7:19 AM, Ted Yu yuzhih...@gmail.com wrote: Eran: I logged https://issues.apache.org/jira/browse/HBASE-4060 for you. On Mon, Jul 4, 2011 at 2:30 AM, Ted Yu yuzhih...@gmail.com wrote: Thanks for the understanding. Can you log a JIRA and put your ideas below in it ? On Jul 4, 2011, at 12:42 AM, Eran Kutner e...@gigya.com wrote: Thanks for the explanation Ted, I will try to apply HBASE-3789 and hope for the best but my understanding is that it doesn't really solve the problem, it only reduces the probability of it happening, at least in one particular scenario. I would hope for a more robust solution. My concern is that the region allocation process seems to rely too much on timing considerations and doesn't seem to take enough measures to guarantee conflicts do not occur. I understand that in a distributed environment, when you don't get a timely response from a remote machine you can't know for sure if it did or did not receive the request, however there are things that can be done to mitigate this and reduce the conflict time significantly. For example, when I run dbck it knows that some regions are multiply assigned, the master could do the same and try to resolve the conflict. Another approach would be to handle late responses, even if the response from the remote machine arrives after it was assumed to be dead the master should have enough information to know it had created a conflict by assigning the region to another server. An even better solution, I think, is for the RS to periodically test that it is indeed the rightful owner of every region it holds and relinquish control over the region if it's not. Obviously a state where two RSs hold the same region is pathological and can lead to data loss, as demonstrated in my case. The system should be able to actively protect itself against such a scenario. It probably doesn't need saying but there is really nothing worse for a data storage system than data loss. In my case the problem didn't happen in the initial phase but after disabling and enabling a table with about 12K regions. -eran On Sun, Jul 3, 2011 at 23:49, Ted Yu yuzhih...@gmail.com wrote: Let me try to answer some of your questions. The two paragraphs below were written along my reasoning which is in reverse order of the actual call sequence. 
For #4 below, the log indicates that the following was executed: private void assign(final RegionState state, final boolean setOfflineInZK, final boolean forceNewPlan) { for (int i = 0; i < this.maximumAssignmentAttempts; i++) { if (setOfflineInZK && !setOfflineInZooKeeper(state)) return; The above was due to the timeout which you noted in #2, which would have caused TimeoutMonitor.chore() to run this code (line 1787): for (Map.Entry<HRegionInfo, Boolean> e : assigns.entrySet()) { assign(e.getKey(), false, e.getValue()); } This means there is a lack of coordination between AssignmentManager.TimeoutMonitor and OpenedRegionHandler. The reason I mention HBASE-3789 is that it is marked as an Incompatible change and is in TRUNK already. The application of HBASE-3789 to the 0.90 branch would change the behavior (timing) of region assignment. I think it makes sense to evaluate the effect of HBASE-3789 in 0.90.4. BTW, were the incorrect region assignments observed for a table with multiple initial regions? If so, I have HBASE-4010 in TRUNK which speeds up initial region assignment by about 50%. Cheers On Sun, Jul 3, 2011 at 12:02 PM, Eran Kutner e...@gigya.com wrote: Ted, So if I understand correctly, the theory is that because of the issue fixed in HBASE-3789 the master took too long to detect that the region was successfully opened by the first server, so it force-closed it and transitioned it to a second server. But there are a few things about this scenario I don't understand, probably because I don't know enough about the inner workings of the region transition process, and I would appreciate it if you could help me understand: 1. The RS opened the region at 16:37:49. 2. The master started handling the opened event at 16:39:54 - this delay can probably be explained by HBASE-3789
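As a side note on the hbck mention above, a minimal sketch of how the multiple-assignment check is usually run, assuming HBASE_HOME points at the HBase install directory:
$ $HBASE_HOME/bin/hbase hbck
It compares .META. against the live region servers and reports the inconsistencies it finds, including regions deployed on more than one region server, followed by an overall status line.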
Re: hbase host dns ip and route for multi network interface card
Hi, I think the problem is this: the /etc/hosts file resolves node3 to 192.168.1.15 (eth1), but HBase internally sometimes uses 192.168.1.13 (eth0). When I run ifdown eth0 on node3 and then stop-hbase.sh, it shows this message: 2011-07-06 10:25:50,683 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop 2011-07-06 10:25:50,683 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified 2011-07-06 10:26:00,685 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop 2011-07-06 10:26:00,685 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop 2011-07-06 10:26:00,685 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified 2011-07-06 10:26:10,687 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop 2011-07-06 10:26:10,687 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop 2011-07-06 10:26:10,687 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified 2011-07-06 10:26:20,689 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop 2011-07-06 10:26:20,689 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop 2011-07-06 10:26:20,689 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified 2011-07-06 10:26:30,691 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop And when I run ifup eth0 on node3, it works well and HBase stops normally: 2011-07-06 10:28:47,139 INFO org.apache.hadoop.hbase.master.ServerManager: Region server node3,60020,1309860160318 quiesced 2011-07-06 10:28:47,139 INFO org.apache.hadoop.hbase.master.ServerManager: All user tables quiesced. Proceeding with shutdown 2011-07-06 10:28:47,139 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling root scanner to stop 2011-07-06 10:28:47,139 DEBUG org.apache.hadoop.hbase.master.RegionManager: telling meta scanner to stop 2011-07-06 10:28:47,139 DEBUG org.apache.hadoop.hbase.master.RegionManager: meta and root scanners notified 2011-07-06 10:28:47,338 INFO org.apache.hadoop.hbase.master.ServerManager: Removing server's info node3,60020,1309860160318 2011-07-06 10:28:47,338 INFO org.apache.hadoop.hbase.master.ServerManager: Region server node3,60020,1309860160318: MSG_REPORT_EXITING 2011-07-06 10:28:50,719 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer 2011/7/5 Jameson Li hovlj...@gmail.com Hi, when I start my hbase cluster there are some error logs in the master log. The hostname node3 and the IPs 192.168.1.15 and 192.168.1.13 all belong to the same machine, which has two NICs: 2011-07-05 17:13:13,820 INFO org.apache.zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false 2011-07-05 17:13:13,840 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server node3/192.168.1.15:2181 2011-07-05 17:13:13,975 DEBUG org.apache.hadoop.hbase.master.HMaster: Checking cluster state... 2011-07-05 17:13:13,979 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 192.168.1.13:60020 2011-07-05 17:13:19,732 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Updated ZNode /hbase/rs/1309857199677 with data 192.168.1.15:60020 2011-07-05 17:22:01,041 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /192.168.1.13:60020 could not be reached after 1 tries, giving up.
2011-07-05 17:22:01,042 WARN org.apache.hadoop.hbase.master.BaseScanner: Scan one META region: {server: 192.168.1.13:60020, regionname: .META.,,1, startKey: } org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy to /192.168.1.13:60020 after attempts=1 at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:429) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:918) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getHRegionConnection(HConnectionManager.java:934) at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:173) at org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73) at org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129) at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153) at org.apache.hadoop.hbase.Chore.run(Chore.java:68) Sometimes the .META. region is not assigned to node3, the server that has two NICs (eth0: 192.168.1.13 and eth1: 192.168.1.15) and whose /etc/hosts entry resolves as 192.168.1.15 node3; I mean, when the .META. region is assigned to one of the other servers, which have only one NIC, HBase works well. Here is some of my HBase cluster info: HBase version: 0.20.6, Hadoop version: 0.20-append+4, ZooKeeper version: 3.3.0. The hbase-site.xml:
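One possible mitigation for the dual-NIC ambiguity, not taken from the original post, is to pin the master and region servers to a specific interface for their hostname lookups. The property names below are the standard HBase DNS settings; the eth1 value is a placeholder for whichever NIC matches the node3 entry in /etc/hosts, and it is worth confirming that the 0.20.6 release honours these properties before relying on them:
<property>
  <!-- interface whose address the region server should use when reporting itself -->
  <name>hbase.regionserver.dns.interface</name>
  <value>eth1</value>
</property>
<property>
  <!-- interface the master should use for its own hostname lookup -->
  <name>hbase.master.dns.interface</name>
  <value>eth1</value>
</property>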
How to apply multiple row filters in an efficient way?
Hello, My table holds stock information where the keys are in the format date-stock symbol. In my mapreduce job I need to operate on a subset of that list, say 500-2000 stocks out of a total of about 7000. Sometimes I also need to consider only rows after a certain date. The question is: how can I do that efficiently? I don't know whether HBase allows me to set multiple filters on a single Scan object. I could do it with a regex (for example (GOOG|IBM|DELL|...)), but is that the right way? Thanks Sol
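For what it's worth, a minimal sketch of one way to express this with the Java client; the table name, symbols, start date and key format are made-up placeholders, not taken from the question. Because the keys are date-first, the date lower bound can go into the scan's start row, while the symbol subset has to be a filter, for example a RowFilter with a regex that anchors the symbol alternation at the end of the key:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class StockScanSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "stocks");           // placeholder table name

    // Keys are assumed to look like "<date>-<symbol>", so anchor the symbol alternation at the key's end.
    String regex = ".*-(GOOG|IBM|DELL)$";                // placeholder subset of symbols

    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("2011-01-01"));       // only rows from this (placeholder) date onwards
    scan.setCaching(500);                                 // fewer RPC round trips while scanning
    scan.setFilter(new RowFilter(CompareOp.EQUAL, new RegexStringComparator(regex)));

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow()));  // do something with each matching row
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}
In a MapReduce job the same Scan can be handed to TableMapReduceUtil.initTableMapperJob so only matching rows reach the mappers; note that a regex RowFilter is still evaluated against every row on the server side, so it saves network transfer and mapper work rather than disk reads.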
Re: Separating Application and Database Servers
Hi, Thanks for your reply. We solved it. We had not started ZooKeeper on the application server machine; once we started it, everything worked fine. Thanks a lot again. With Regards, Jr. On Tue, Jul 5, 2011 at 6:53 PM, Ted Yu yuzhih...@gmail.com wrote: Check your DNS. localhost in the log below can also mean that your hbase.zookeeper.quorum is carrying the default value. Modify it to point to the real quorum. On Tue, Jul 5, 2011 at 1:27 AM, James Ram hbas...@gmail.com wrote: Hi, We are running hadoop and hbase on a 9-machine cluster. We tried to put our application server on a machine outside the HBase cluster, but we are getting the following error. Is there any way that we can do this? 11/07/05 11:26:41 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused: no further information at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) 11/07/05 11:26:43 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 11/07/05 11:26:44 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused: no further information at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) 11/07/05 11:26:45 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 -- With Regards, Jr. -- With Regards, Jr.
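For anyone hitting the same symptom, the localhost/127.0.0.1:2181 lines usually mean the client never got a real quorum address. A minimal client-side sketch; the quorum hostnames, table name and row key are placeholders, and in practice these settings would normally come from an hbase-site.xml on the application server's classpath rather than from code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ExternalClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Point the client at the real ZooKeeper ensemble instead of the default (localhost).
    conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
    conf.set("hbase.zookeeper.property.clientPort", "2181");

    HTable table = new HTable(conf, "mytable");            // placeholder table name
    Result r = table.get(new Get(Bytes.toBytes("row1")));  // placeholder row key
    System.out.println(r.isEmpty() ? "no row found" : r.toString());
    table.close();
  }
}
The application server itself does not need to run any HBase or ZooKeeper daemon for this to work; it only needs network access to the quorum and the region servers, and a configuration that names the quorum explicitly.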