Re: RS crash upon replication
Actually, it seems like something else was wrong here - the servers came up just fine on trying again - so I could not really reproduce the issue. Amit: did you try patching 8207? Varun

On Wed, May 22, 2013 at 5:40 PM, Himanshu Vashishtha hv.cs...@gmail.com wrote: That sounds like a bug for sure. Could you create a jira with logs/znode dump/steps to reproduce it? Thanks, Himanshu

On Wed, May 22, 2013 at 5:01 PM, Varun Sharma va...@pinterest.com wrote: It seems I can reproduce this - I did a few rolling restarts and got screwed with NoNode exceptions - I am running 0.94.7, which has the fix, but my nodes don't contain hyphens - nodes are no longer coming back up... Thanks, Varun

On Wed, May 22, 2013 at 3:02 PM, Himanshu Vashishtha hv.cs...@gmail.com wrote: I'd suggest patching the code with 8207; cdh4.2.1 doesn't have it. With hyphens in the name, ReplicationSource gets confused and tries to set data in a znode which doesn't exist. Thanks, Himanshu

On Wed, May 22, 2013 at 2:42 PM, Amit Mor amit.mor.m...@gmail.com wrote: Yes, indeed - hyphens are part of the host name (annoying legacy stuff in my company). It's hbase-0.94.2-cdh4.2.1. I have no idea if 0.94.6 was backported by Cloudera into their flavor of 0.94.2, but the mysterious occurrence of the percent sign in zkcli (ls /hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401/1-va-p-hbase-02-e,60020,1369042377129-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895) might be a sign of such a problem. How deep should my rmr in zkcli be (an example would be most welcome :)? I have no serious problem running copyTable with a time period corresponding to the outage and then starting the sync back up again. One question though: how did it cause a crash?

On Thu, May 23, 2013 at 12:32 AM, Varun Sharma va...@pinterest.com wrote: I believe there were cascading failures which got these deep nodes containing still-to-be-replicated WAL(s) - I suspect there is either some parsing bug or something which is causing the replication source to not work. Also, which version are you using - does it have https://issues.apache.org/jira/browse/HBASE-8207 - since you use hyphens in your paths? One way to get back up is to delete these nodes, but then you lose data in these WAL(s)...

On Wed, May 22, 2013 at 2:22 PM, Amit Mor amit.mor.m...@gmail.com wrote: va-p-hbase-02-d,60020,1369249862401

On Thu, May 23, 2013 at 12:20 AM, Varun Sharma va...@pinterest.com wrote: Basically ls /hbase/rs and what do you see for va-p-02-d ?

On Wed, May 22, 2013 at 2:19 PM, Varun Sharma va...@pinterest.com wrote: Can you do ls /hbase/rs and see what you get for 02-d - instead of looking in /replication/, could you look in /hbase/replication/rs - I want to see if the timestamps are matching or not? Varun

On Wed, May 22, 2013 at 2:17 PM, Varun Sharma va...@pinterest.com wrote: I see - so it looks okay - there's just a lot of deep nesting in there. If you look into these nodes by doing ls, you should see a bunch of WAL(s) which still need to be replicated... Varun

On Wed, May 22, 2013 at 2:16 PM, Varun Sharma va...@pinterest.com wrote:

2013-05-22 15:31:25,929 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1-va-p-hbase-01-c,60020,1369042378287-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-01-c%2C60020%2C1369042378287.1369220050719

01-[01-02-02]-01 - looks like a bunch of cascading failures causing this deep nesting...

On Wed, May 22, 2013 at 2:09 PM, Amit Mor amit.mor.m...@gmail.com wrote: Empty return: [zk: va-p-zookeeper-01-c:2181(CONNECTED) 10] ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1 []

On Thu, May 23, 2013 at 12:05 AM, Varun Sharma va...@pinterest.com wrote: Do an ls, not a get, here and give the output? ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
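A side note on the "mysterious" percent sign: if I read the 0.94 naming convention right, WAL file names embed the server name with its commas percent-encoded, so the replication znodes inherit names like va-p-hbase-02-e%2C60020%2C..., and %2C is simply an encoded comma, not corruption. A minimal sketch decoding the leaf znode quoted above:

```java
import java.net.URLDecoder;

public class DecodeWalZnode {
    public static void main(String[] args) throws Exception {
        // Leaf znode name quoted in the thread: a WAL file name whose
        // embedded server name has its commas percent-encoded.
        String leaf = "va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895";
        String decoded = URLDecoder.decode(leaf, "UTF-8");
        System.out.println(decoded);
        // -> va-p-hbase-02-e,60020,1369042377129.1369227474895
    }
}
```

Decoded, the name is just the source server (host, port, start code) plus the log roll timestamp, which is why the "deep" znodes read as chains of dead server names.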
Re: RS crash upon replication
No, the servers came up fine only because, after the crash (the RS's - the masters were still running), I immediately pulled the brakes with stop_replication. Then I started the RS's and they came back fine (not replicating). Once I hit 'start_replication' again, they crashed for the second time. Eventually I deleted the heavily nested replication znodes and the 'start_replication' succeeded. I didn't patch 8207 because I'm on CDH with the Cloudera Manager Parcels thing, and I'm still trying to figure out how to replace their jars with mine in a clean and non-intrusive way.

On Thu, May 23, 2013 at 10:33 AM, Varun Sharma va...@pinterest.com wrote: Actually, it seems like something else was wrong here - the servers came up just fine on trying again - so could not really reproduce the issue. Amit: Did you try patching 8207 ? Varun
Re: Poor HBase map-reduce scan performance
I am considering scanning a snapshot instead of the table. I believe this is what the ExportSnapshot class does. If I could use the scanning code from ExportSnapshot, then I would be able to scan the HDFS files directly and bypass the regionservers. This could potentially give me a huge boost in performance for full table scans. However, it doesn't really address the poor scan performance against a table.

On May 22, 2013, at 3:57 PM, Ted Yu yuzhih...@gmail.com wrote: Sandy: Looking at patch v6 of HBASE-8420, I think it is different from your approach below for the case of cache.size() == 0. Maybe log a JIRA for further discussion?

On Wed, May 22, 2013 at 3:33 PM, Sandy Pratt prat...@adobe.com wrote: It seems to be in the ballpark of what I was getting at, but I haven't fully digested the code yet, so I can't say for sure. Here's what I'm getting at. Looking at o.a.h.h.client.ClientScanner.next() in the 94.2 source I have loaded, I see there are three branches with respect to the cache:

public Result next() throws IOException {
  // If the scanner is closed and there's nothing left in the cache, next is a no-op.
  if (cache.size() == 0 && this.closed) {
    return null;
  }
  if (cache.size() == 0) {
    // Request more results from RS
    ...
  }
  if (cache.size() > 0) {
    return cache.poll();
  }
  ...
  return null;
}

I think that middle branch wants to change as follows (pseudo-code):

if the cache size is below a certain threshold then
  initiate asynchronous action to refill it
  if there is no result to return until the cache refill completes then
    block
  done
done

Or something along those lines. I haven't grokked the patch well enough yet to tell if that's what it does. What I think is happening in the 0.94.2 code I've got is that it requests nothing until the cache is empty, then blocks until it's non-empty. We want to eagerly and asynchronously refill the cache so that we ideally never have to block.

Sandy

On 5/22/13 1:39 PM, Ted Yu yuzhih...@gmail.com wrote: Sandy: Do you think the following JIRA would help with what you expect in this regard? HBASE-8420 Port HBASE-6874 "Implement prefetching for scanners" from 0.89-fb. Cheers

On Wed, May 22, 2013 at 1:29 PM, Sandy Pratt prat...@adobe.com wrote: I found this thread on search-hadoop.com just now because I've been wrestling with the same issue for a while and have as yet been unable to solve it. However, I think I have an idea of the problem. My theory is based on assumptions about what's going on in HBase and HDFS internally, so please correct me if I'm wrong.

Briefly, I think the issue is that sequential reads from HDFS are pipelined, whereas sequential reads from HBase are not. Therefore, sequential reads from HDFS tend to keep the IO subsystem saturated, while sequential reads from HBase allow it to idle for a relatively large proportion of time.

To make this more concrete, suppose that I'm reading N bytes of data from a file in HDFS. I issue the calls to open the file and begin to read (from an InputStream, for example). As I'm reading byte 1 of the stream at my client, the datanode is reading byte M (where 1 <= M <= N) from disk. Thus, three activities tend to happen concurrently for the most part (disregarding the beginning and end of the file): 1) processing at the client; 2) streaming over the network from datanode to client; and 3) reading data from disk at the datanode. The proportion of time these three activities overlap tends towards 100% as N -> infinity.

Now suppose I read a batch of R records from HBase (where R = whatever scanner caching happens to be). As I understand it, I issue my call to ResultScanner.next(), and this causes the RegionServer to block as if on a page fault while it loads enough HFile blocks from disk to cover the R records I (implicitly) requested. After the blocks are loaded into the block cache on the RS, the RS returns R records to me over the network. Then I process the R records locally. When they are exhausted, this cycle repeats. The notable upshot is that while the RS is faulting HFile blocks into the cache, my client is blocked. Furthermore, while my client is processing records, the RS is idle with respect to work on behalf of my client.

That last point is really the killer, if I'm correct in my assumptions. It means that scanner caching and larger block sizes work only to amortize the fixed overhead of disk IOs and RPCs -- they do nothing to keep the IO subsystems saturated during sequential reads. What *should* happen is that the RS should treat the scanner caching value (R above) as a hint that it should always have records r + 1 to r + R ready when I'm reading record r, at least up to the region boundary. The RS should be preparing eagerly for the next call to ResultScanner.next(), which I suspect it's currently not doing. Another way to state this
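The "refill eagerly, block only when truly empty" idea Sandy sketches can be illustrated outside HBase. The class below is not ClientScanner and all names are invented; it is a toy sketch (modern Java) of the double-buffering scheme: kick off an asynchronous refill once the cache drops to a low-water mark, and block on the in-flight fetch only if the cache actually runs dry.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

/** Toy sketch of an eagerly refilled scanner cache; not HBase's ClientScanner. */
public class PrefetchingScanner<T> implements AutoCloseable {
    private final Queue<T> cache = new ConcurrentLinkedQueue<>();
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private final Supplier<List<T>> fetchBatch; // stands in for the "fetch from RS" RPC
    private final int lowWater;                 // refill threshold
    private Future<?> inflight;                 // at most one refill outstanding
    private volatile boolean exhausted;

    public PrefetchingScanner(Supplier<List<T>> fetchBatch, int lowWater) {
        this.fetchBatch = fetchBatch;
        this.lowWater = lowWater;
    }

    /** Returns the next result, or null once the source is drained. */
    public synchronized T next() throws Exception {
        // Reap a finished refill so a new one may be scheduled.
        if (inflight != null && inflight.isDone()) { inflight.get(); inflight = null; }
        // Eager branch: start refilling *before* the cache is empty.
        if (!exhausted && inflight == null && cache.size() <= lowWater) {
            inflight = pool.submit(this::refill);
        }
        // Blocking branch: only reached when prefetching couldn't keep up.
        if (cache.isEmpty() && inflight != null) { inflight.get(); inflight = null; }
        return cache.poll();
    }

    private void refill() {
        List<T> batch = fetchBatch.get();
        if (batch == null || batch.isEmpty()) { exhausted = true; } else { cache.addAll(batch); }
    }

    @Override public void close() { pool.shutdownNow(); }

    public static void main(String[] args) throws Exception {
        // Fake "region server" handing out three batches, then an empty one.
        Iterator<List<Integer>> src =
            List.of(List.of(0, 1, 2), List.of(3, 4, 5), List.of(6, 7)).iterator();
        List<Integer> out = new ArrayList<>();
        try (PrefetchingScanner<Integer> s =
                 new PrefetchingScanner<>(() -> src.hasNext() ? src.next() : List.<Integer>of(), 1)) {
            for (Integer r; (r = s.next()) != null; ) out.add(r);
        }
        System.out.println(out); // [0, 1, 2, 3, 4, 5, 6, 7]
    }
}
```

The point of the sketch is the branch order in next(): the refill is scheduled while results are still being consumed, so the blocking get() is the exception rather than the steady state, which is exactly the gap Sandy identifies in the 0.94.2 code.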
Re: RS crash upon replication
fwiw, stop_replication is a kill switch, not a general way to start and stop replicating, and start_replication may put you in an inconsistent state:

hbase(main):001:0> help 'stop_replication'
Stops all the replication features. The state in which each stream stops in is undetermined.
WARNING: start/stop replication is only meant to be used in critical load situations.
Re: hbase region server shutdown after datanode connection exception
You are looking at it the wrong way. Per http://hbase.apache.org/book.html#trouble.general, always walk up the log to the first exception. In this case it's a session timeout. Whatever happens next is most probably a side effect of that. To help debug your issue, I would suggest reading this section of the reference guide: http://hbase.apache.org/book.html#trouble.rs.runtime

J-D

On Tue, May 21, 2013 at 7:17 PM, Cheng Su scarcer...@gmail.com wrote: Hi all. I have a small hbase cluster with 3 physical machines. On 192.168.1.80, there are HMaster and a region server. On 81 and 82, there is a region server on each. The region server on 80 can't sync the HLog after a datanode access exception, and started to shut down. The datanode itself was not shut down and responds to other requests normally. I'll paste the logs below. My questions are:

1. Why does this exception cause a region server shutdown? Can I prevent it?
2. Are there any tools (a shell command is best, like 'hadoop dfsadmin -report') that can monitor an hbase region server, i.e. check whether it is alive or dead? I have done some research and nagios/ganglia can do such things. But actually I just want to know whether the region server is alive or dead, so they are a little overqualified. And I'm not using CDH, so I can't use Cloudera Manager, I think.

Here are the logs.

HBase master:

2013-05-21 17:03:32,675 ERROR org.apache.hadoop.hbase.master.HMaster: Region server hadoop01,60020,1368774173179 reported a fatal error:
ABORTING region server hadoop01,60020,1368774173179: regionserver:60020-0x3eb14c67540002 regionserver:60020-0x3eb14c67540002 received expired from ZooKeeper, aborting
Cause: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:369)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:266)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)

Region Server:

2013-05-21 17:00:16,895 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 12ms for sessionid 0x3eb14c67540002, closing socket connection and attempting reconnect
2013-05-21 17:00:35,896 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 12ms for sessionid 0x13eb14ca4bb, closing socket connection and attempting reconnect
2013-05-21 17:03:31,498 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_9188414668950016309_4925046 java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.80:57020 remote=/192.168.1.82:50010]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2784)
2013-05-21 17:03:31,520 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_9188414668950016309_4925046 bad datanode[0] 192.168.1.82:50010
2013-05-21 17:03:32,315 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server /192.168.1.82:2100
2013-05-21 17:03:32,316 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hadoop03/192.168.1.82:2100, initiating session
2013-05-21 17:03:32,317 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server hadoop03/192.168.1.82:2100, sessionid = 0x13eb14ca4bb, negotiated timeout = 18
2013-05-21 17:03:32,497 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
java.io.IOException: Reflection
at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:230)
at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1091)
at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1195)
at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1057)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.reflect.InvocationTargetException
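On question 2 (a lightweight alive/dead check): if Nagios/Ganglia are overkill, one crude option is simply probing a port the region server listens on, e.g. the info port (60030 by default in 0.94). This is only a sketch of the idea - a successful TCP connect says a process is listening, not that the RS is healthy - and the host/port defaults below are placeholders:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Crude liveness probe: can we open a TCP connection to a region server port? */
public class RsProbe {
    public static boolean isAlive(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;   // something accepted the connection
        } catch (IOException e) {
            return false;  // closed port, timeout, unreachable host...
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 60030; // RS info port in 0.94
        System.out.println(isAlive(host, port, 2000) ? "alive" : "dead");
    }
}
```

Alternatively, the cluster's own view of liveness is the ephemeral znodes: `bin/hbase zkcli` and then `ls /hbase/rs` lists the region servers ZooKeeper currently considers alive.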
Re: RS crash upon replication
But wouldn't a copyTable between timestamps bring you back? Since the mutations are all timestamp based, we should be okay? Basically, doing a copyTable which supersedes the downtime interval?

On Thu, May 23, 2013 at 9:48 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: fwiw stop_replication is a kill switch, not a general way to start and stop replicating, and start_replication may put you in an inconsistent state.
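For reference, CopyTable in 0.94 accepts a timestamp window, so the catch-up Varun describes would look roughly like the sketch below. The table name, peer address, and the start/end timestamps (epoch millis spanning the outage) are placeholders:

```sh
# Replay the outage window to the slave cluster; all values are illustrative.
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --starttime=1369220000000 --endtime=1369260000000 \
  --peer.adr=slave-zk-quorum:2181:/hbase \
  my_table
```

Because HBase mutations carry their original timestamps, rows copied this way land with the same versions they had on the master, which is what makes the timestamp-bounded catch-up workable.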
Re: HBase is not running.
Jay, I was looking at your blog and I noticed these entries for Zookeeper: hbase-master,hbase-regionserver1,hbase-regionserver2,hbase-regionserver3. Since I'm doing everything on my laptop, all that I would need to do is just put localhost in that location, yes?

On Wed, May 22, 2013 at 1:44 PM, Jay Vyas jayunit...@gmail.com wrote: Yves, I'm going through the same issues, very tricky first time around. Use the stable version. Try these 3 tips: I just finished getting it working and wrote it up in distributed mode on a cluster of VMs. http://jayunit100.blogspot.com/2013/05/debugging-hbase-installation.html

On Wed, May 22, 2013 at 1:02 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: How weird. When I start up hbase using start-hbase.sh and then check to make sure that the process is running, I don't see anything with JPS. However, when I run stop-hbase.sh and then check my java processes, I *DO* see a Java process running in JPS... why?

On Wed, May 22, 2013 at 11:28 AM, Yves S. Garret yoursurrogate...@gmail.com wrote: Still stuck on this. I did something different: I tried version 0.92.2. This is the log from that older version. http://bin.cakephp.org/view/617939270 The other weird thing that I noticed with version 0.92.2 is this. I started HBase and this is the output that I got:

$ $HBASE_HOME/bin/start-hbase.sh
starting master, logging to /media/alternative-storage-do-not-touch/hbase-install/hbase-0.92.2/logs/hbase-ysg-master-ysg.connect.out

(pardon the annoyingly long file name) Now, when I run JPS, this is what I see:

$ $JAVA_HOME/bin/jps
14038 RunJar
2059 Jps

HBase... is not running? What the heck?

On Tue, May 21, 2013 at 5:25 PM, Mohammad Tariq donta...@gmail.com wrote: Is this the only thing which appears on your screen? Could you please show me your config files? Warm Regards, Tariq cloudfront.blogspot.com

On Wed, May 22, 2013 at 2:49 AM, Yves S. Garret yoursurrogate...@gmail.com wrote: This is what happens when I start hbase.

$ bin/start-hbase.sh
starting master, logging to /media/alternative-storage-do-not-touch/hbase-0.94.7/logs/hbase-ysg-master-ysg.connect.out

No real problems or errors or warnings... but it does not have the same output that you have in your blog... perhaps that's an issue?

On Tue, May 21, 2013 at 5:14 PM, Mohammad Tariq donta...@gmail.com wrote: No issues. Are the HBase daemons running fine? Are you able to initiate anything from the shell? Warm Regards, Tariq cloudfront.blogspot.com

On Wed, May 22, 2013 at 2:37 AM, Yves S. Garret yoursurrogate...@gmail.com wrote: Hi, sorry, I thought I had more info than what was displayed in my e-mail a little earlier today that had the little list, but I did not. My biggest hangup is getting that Web GUI to work. That's really where I'm stuck. I followed your tutorial on the cloudfront blog and when it came time to go to http://localhost:60010, I did not see any output.

On Tue, May 21, 2013 at 4:21 PM, Mohammad Tariq donta...@gmail.com wrote: sure.. Warm Regards, Tariq cloudfront.blogspot.com

On Wed, May 22, 2013 at 1:48 AM, Yves S. Garret yoursurrogate...@gmail.com wrote: Hello, no, still having issues. I'll give you some more details in a second.

On Tue, May 21, 2013 at 4:07 PM, Mohammad Tariq donta...@gmail.com wrote: Hey Yves, I am sorry for being unresponsive. I was travelling and was out of reach. What's the current status? Are you good now? Warm Regards, Tariq cloudfront.blogspot.com

On Tue, May 21, 2013 at 11:15 PM, Asaf Mesika asaf.mes...@gmail.com wrote: Yes.

On May 21, 2013, at 8:32 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Do you mean this? http://blog.devving.com/hbase-quickstart-guide/

On Tue, May 21, 2013 at 1:29 PM, Asaf Mesika asaf.mes...@gmail.com wrote: Devving.com has a good tutorial on HBase first setup.

On Tuesday, May 21, 2013, Yves S. Garret wrote: Hi Mohammad, I was following your tutorial and when I got to the part where you do $ bin/start-hbase.sh, this is what I get: http://bin.cakephp.org/view/428090088 I'll keep looking online for
Re: Risk about RS logs clean ?
IIRC the version in previous branches should have an epic lock somewhere (cacheFlushLock or something like that) that should make these map manipulations safe also.

On Wed, May 22, 2013 at 6:27 PM, Bing Jiang jiangbinglo...@gmail.com wrote: Hi, Sergey. The version of hbase in our environment is 0.94.3, and the FSHLog.java comes from 0.95 or a version above. And it adds such code in FSHLog::cleanOldLogs:

long oldestOutstandingSeqNum = Long.MAX_VALUE;
synchronized (oldestSeqNumsLock) {
  Long oldestFlushing = (oldestFlushingSeqNums.size() > 0) ?
      Collections.min(oldestFlushingSeqNums.values()) : Long.MAX_VALUE;
  Long oldestUnflushed = (oldestUnflushedSeqNums.size() > 0) ?
      Collections.min(oldestUnflushedSeqNums.values()) : Long.MAX_VALUE;
  oldestOutstandingSeqNum = Math.min(oldestFlushing, oldestUnflushed);
}

which is different from the function in 0.94.3:

private byte [][] cleanOldLogs() throws IOException {
  Long oldestOutstandingSeqNum = getOldestOutstandingSeqNum();
  ...
}

private Long getOldestOutstandingSeqNum() {
  return Collections.min(this.lastSeqWritten.values());
}

And I think the version in trunk is safe. Thanks to Sergey.

2013/5/23 Sergey Shelukhin ser...@hortonworks.com: FSHLog (in trunk) stores the earliest seqnums for each region in the current memstore, and the earliest flushing seqnum (see FSHLog::start/complete/abortCacheFlush). When logs are deleted, the logs with seqnums above the earliest flushing/flushed seqnum for any region are not deleted (see FSHLog::cleanOldLogs).

On Wed, May 22, 2013 at 5:39 AM, Bing Jiang jiangbinglo...@gmail.com wrote: Hi all, I want to know how an RS eliminates unnecessary hlogs. lastSeqNum stores (RegionName, latest KV seq id) and outputfiles stores (last seq id before the new hlog file, file path). So, how does the RS guarantee that the KVs in an hlog about to be cleared have already been flushed from the memstore into an hfile? I have tried to read the source code to make sense of it; however, I am not sure whether there is a risk of data loss here. Thanks.
-- Bing Jiang Tel:(86)134-2619-1361 weibo: http://weibo.com/jiangbinglover BLOG: http://blog.sina.com.cn/jiangbinglover National Research Center for Intelligent Computing Systems, Institute of Computing Technology, Graduate University of Chinese Academy of Sciences
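The safety argument in this thread reduces to one invariant: an hlog may be removed only if its newest edit has a lower sequence id than the oldest edit still unflushed (or mid-flush) in any region. A minimal, self-contained sketch of that check, modeled loosely on the 0.95 FSHLog snippet quoted above — the class and method names here are illustrative, not the actual HBase source:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not HBase source): a WAL file is removable only if
// its highest sequence id is below the oldest edit that is still either
// unflushed in some memstore or in the middle of being flushed.
public class LogCleanerSketch {
    // region name -> oldest seqnum of edits not yet flushed
    final Map<byte[], Long> oldestUnflushedSeqNums = new HashMap<>();
    // region name -> oldest seqnum of edits currently being flushed
    final Map<byte[], Long> oldestFlushingSeqNums = new HashMap<>();
    final Object oldestSeqNumsLock = new Object();

    long oldestOutstandingSeqNum() {
        synchronized (oldestSeqNumsLock) {
            long oldestFlushing = oldestFlushingSeqNums.isEmpty()
                ? Long.MAX_VALUE : Collections.min(oldestFlushingSeqNums.values());
            long oldestUnflushed = oldestUnflushedSeqNums.isEmpty()
                ? Long.MAX_VALUE : Collections.min(oldestUnflushedSeqNums.values());
            return Math.min(oldestFlushing, oldestUnflushed);
        }
    }

    // A log whose last sequence id is strictly below the oldest outstanding
    // seqnum contains only edits already persisted in HFiles.
    boolean isLogRemovable(long lastSeqIdInLog) {
        return lastSeqIdInLog < oldestOutstandingSeqNum();
    }
}
```

Taking the minimum over both maps inside one lock is exactly what makes the 0.95 variant safe against a flush racing with log cleaning: a region that has moved its edits from "unflushed" to "flushing" still holds the log back until the flush completes.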
Re: RS crash upon replication
Thanks for the helpful comments. I will certainly dig deeper now that everything has stabilized. Regarding J-D's comment - once my slave cluster was started, after about 4 hours of downtime (it's for offline stuff), at the very moment it came back online, 5 RS of my master (replicating) cluster crashed. Since I had no time to figure out what went wrong with replication, I issued 'stop_replication', knowing that's a last resort, since I had to get those production RS's online asap. I think renaming that command to something like 'abort_replication' would be more fitting. On the other hand, remove_peer(1) at a time of crisis feels like a developer's solution to a DBA's problem ;) Regarding copyTable, it's all good and well, but one has to consider that I'm on EC2 and the cluster is already stretched by 'online' service requests, and copyTable would hit its resources quite badly. I'll be glad to update. Thanks again, Amit From: Varun Sharma va...@pinterest.com Subject: Re: RS crash upon replication But wouldn't a copyTable between timestamps bring you back? Since the mutations are all timestamp based, we should be okay? Basically doing a copyTable which supersedes the downtime interval? On Thu, May 23, 2013 at 9:48 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: FWIW stop_replication is a kill switch, not a general way to start and stop replicating, and start_replication may put you in an inconsistent state: hbase(main):001:0> help 'stop_replication' Stops all the replication features. The state in which each stream stops in is undetermined. WARNING: start/stop replication is only meant to be used in critical load situations. On Thu, May 23, 2013 at 1:17 AM, Amit Mor amit.mor.m...@gmail.com wrote: No, the servers came out fine just because after the crash (the RS's - the masters were still running), I immediately pulled the brakes with stop_replication. Then I started the RS's and they came back fine (not replicating). 
Once I hit 'start_replication' again they crashed for a second time. Eventually I deleted the heavily nested replication znodes and 'start_replication' succeeded. I didn't patch 8207 because I'm on CDH with Cloudera Manager's Parcels and I'm still trying to figure out how to replace their jars with mine in a clean and non-intrusive way.
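As an answer to the "how deep should my rmr go" question: the replication queues live under /hbase/replication/rs/<server>/<queue-id>/<wal-file>, so removing a whole server-level or queue-level znode recursively is enough. An illustrative zkCli session (the server path is taken verbatim from this thread; note that removing a queue znode abandons any unreplicated WALs it references, so data in those WALs will not be replicated):

```
[zk: localhost:2181(CONNECTED) 0] ls /hbase/replication/rs
[va-p-hbase-02-d,60020,1369249862401, ...]
[zk: localhost:2181(CONNECTED) 1] rmr /hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401
```

After that, a copyTable covering the outage window can fill the gap, as discussed above.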
Re: HBase is not running.
Progress! :) Now, I'm getting this error :) 13/05/23 13:29:00 ERROR client.HConnectionManager$HConnectionImplementation: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master. On Thu, May 23, 2013 at 1:15 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Jay, I was looking at your blog and I noticed these entries for Zookeeper: hbase-master,hbase-regionserver1,hbase-regionserver2,hbase-regionserver3 Since I'm doing everything on my laptop, all that I would need to do is just put localhost in that location, yes? On Wed, May 22, 2013 at 1:44 PM, Jay Vyas jayunit...@gmail.com wrote: Yves, I'm going through the same issues, very tricky the first time around. Use the stable version. Try these 3 tips: I just finished getting it working and wrote it up, in distributed mode on a cluster of VMs: http://jayunit100.blogspot.com/2013/05/debugging-hbase-installation.html On Wed, May 22, 2013 at 1:02 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: How weird. When I start up HBase using start-hbase.sh and then check to make sure that the process is running, I don't see anything with jps. However, when I run stop-hbase.sh and then check my java processes, I *DO* see a Java process running in jps... why? On Wed, May 22, 2013 at 11:28 AM, Yves S. Garret yoursurrogate...@gmail.com wrote: Still stuck on this. I did something different: I tried version 0.92.2. This is the log from that older version: http://bin.cakephp.org/view/617939270 The other weird thing that I noticed with version 0.92.2 is this. I started HBase and this is the output that I got: $ $HBASE_HOME/bin/start-hbase.sh starting master, logging to /media/alternative-storage-do-not-touch/hbase-install/hbase-0.92.2/logs/hbase-ysg-master-ysg.connect.out (pardon the annoyingly long file name) Now, when I run jps, this is what I see: $ $JAVA_HOME/bin/jps 14038 RunJar 2059 Jps HBase... is not running? What the heck? 
On Tue, May 21, 2013 at 5:25 PM, Mohammad Tariq donta...@gmail.com wrote: Is this the only thing which appears on your screen? Could you please show me your config files? Warm Regards, Tariq cloudfront.blogspot.com On Wed, May 22, 2013 at 2:49 AM, Yves S. Garret yoursurrogate...@gmail.com wrote: This is what happens when I start hbase. $ bin/start-hbase.sh starting master, logging to /media/alternative-storage-do-not-touch/hbase-0.94.7/logs/hbase-ysg-master-ysg.connect.out No real problems or errors or warnings... but it does not have the same output that you have in your blog... perhaps that's an issue? On Tue, May 21, 2013 at 5:14 PM, Mohammad Tariq donta...@gmail.com wrote: No issues. Are the HBase daemons running fine? Are you able to initiate anything from the shell? Warm Regards, Tariq cloudfront.blogspot.com On Wed, May 22, 2013 at 2:37 AM, Yves S. Garret yoursurrogate...@gmail.com wrote: Hi, sorry, I thought I had more info than what was displayed in my e-mail a little earlier today that had the little list, but I did not. My biggest hangup is getting that Web GUI to work. That's really where I'm stuck. I followed your tutorial on the cloudfront blog and when it came time to go to http://localhost:60010, I did not see any output. On Tue, May 21, 2013 at 4:21 PM, Mohammad Tariq donta...@gmail.com wrote: sure.. Warm Regards, Tariq cloudfront.blogspot.com On Wed, May 22, 2013 at 1:48 AM, Yves S. Garret yoursurrogate...@gmail.com wrote: Hello, no, still having issues. I'll give you some more details in a second. On Tue, May 21, 2013 at 4:07 PM, Mohammad Tariq donta...@gmail.com wrote: Hey Yves, I am sorry for being unresponsive. I was travelling and was out of reach. What's the current status? Are you good now? Warm Regards, Tariq cloudfront.blogspot.com On Tue, May 21, 2013 at 11:15 PM, Asaf Mesika asaf.mes...@gmail.com wrote: Yes. On May 21, 2013, at 8:32 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Do you mean this? 
http://blog.devving.com/hbase-quickstart-guide/ On Tue, May 21, 2013 at 1:29 PM, Asaf Mesika asaf.mes...@gmail.com wrote: Devving.com has a good tutorial on HBase first setup On Tuesday,
Re: HBase is not running.
Hi all, some more info. The GUI does not work, even though Zookeeper is working. When I start the shell and enter 'list', this is what I get: http://bin.cakephp.org/view/518131554 This is my hbase-site.xml file: http://bin.cakephp.org/view/7632959 For zookeeper.znode.parent, I had created an /hbase (it has regular user privs, not root). But, honestly, I'm not sure what the value should be for that property. Could someone provide some input on what this is supposed to be? I read the docs, but did not understand what they were saying: "Root ZNode for HBase in ZooKeeper."
Re: HBase is not running.
Hi Yves! Okay... so this is probably related to your last question. The zookeeper nodes should match 1-1 with /etc/hosts. How many VMs do you have? Or are you just running in local mode with one JVM per HBase process? I think that, if the latter, having localhost in the zookeeper xml will work.
Re: HBase is not running.
Yup, just a single VM on one machine. /etc/hosts is currently this:

127.0.0.1 localhost
Re: HBase is not running.
I think this is simply an issue, then, of /hbase having the wrong privileges or being inaccessible. (1) Are you sure HDFS (or whatever filesystem you are using) is running, (2) does it have the /hbase directory in it, and (3) are the permissions liberal enough?
Re: RS crash upon replication
I have pasted most of the RS's logs just prior to, and including, their FATAL: http://pastebin.com/qFzycXNS . Would be very thankful if someone could take a look. Interestingly, some RS's hit an IOException for not finding an .oldlogs/ file; the rest get a KeeperException$NoNodeException without the IOE. Thanks
Re: HBase is not running.
This is how hbase is looking from /:

drwxr-xr-x. 2 user user 4096 May 23 14:37 hbase

But, now that you mention it, would I need to have hdfs running as well?
Re: HBase is not running.
Depends on how you define the hbase root in your hbase-site.xml. Can you paste it here?
Re: HBase is not running.
Here is the entire file: http://bin.cakephp.org/view/1159889135 On Thu, May 23, 2013 at 4:35 PM, Jay Vyas jayunit...@gmail.com wrote: depends on how you define hbase root in your hbase-site.xml ? Can you paste it here
Re: HBase is not running.
1) Should hbase-master be changed to localhost? Maybe. Try changing /etc/hosts to match the actual non-loopback IP of your machine (i.e. just run ifconfig | grep 1 and see what IP comes out :)) and make sure your /etc/hosts matches the file in my blog post (you need hbase-master to be defined in your /etc/hosts...). 2) zookeeper parent seems bad. Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with what you defined in the zookeeper parent node. 3) killall -9 java and restart start-hbase.sh On May 23, 2013, at 5:31 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Here is the entire file: http://bin.cakephp.org/view/1159889135
Re: HBase is not running.
On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote: 1) Should hbase-master be changed to localhost? Maybe. Try changing /etc/hosts to match the actual non-loopback IP of your machine... and make sure your /etc/hosts matches the file in my blog post (you need hbase-master to be defined in your /etc/hosts...). hbase.master was dropped around 2009, now that we have ZooKeeper, so you can set it to whatever you want - it won't change anything :) 2) zookeeper parent seems bad. Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with what you defined in the zookeeper parent node. Those two are really unrelated; /hbase is the default so there is no need to override it, and I'm guessing that hbase.rootdir is somewhere writable, so that's all good. Now, regarding "Check the value configured in 'zookeeper.znode.parent'": it's triggered when the client wants to read the /hbase znode in ZooKeeper but is unable to. If it doesn't exist, it might be because your HBase is homed elsewhere. It could also be that HBase isn't running at all, so the Master never got to create it. BTW you can start the shell with -d and it's gonna give more info and dump all the stack traces. Going by this thread I would guess that HBase isn't running, so the shell won't help. Another way to check is pointing your browser to localhost:60010 and seeing if the master is responding. If not, time to open up the log and see what's up. J-D
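For reference, the property J-D is describing lives in hbase-site.xml on both the client and the server side; /hbase is the default, so it only needs to be set explicitly when the master was started with a different parent (the value shown below is the default, included purely for illustration):

```xml
<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase</value>
</property>
```

If the client and the master disagree on this value, the client looks for a znode that was never created, which is exactly the error above.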
Re: HBase is not running.
On Thu, May 23, 2013 at 5:50 PM, Jay Vyas jayunit...@gmail.com wrote: 1) Should hbase-master be changed to localhost? Maybe. Try changing /etc/hosts to match the actual non-loopback IP of your machine... (i.e. just run ifconfig | grep 1 and see what IP comes out :)) and make sure your /etc/hosts matches the file in my blog post (you need hbase-master to be defined in your /etc/hosts...). You mean like this?

127.0.0.1 localhost
192.168.1.3 hbase-master

2) zookeeper parent seems bad. Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with what you defined in the zookeeper parent node. Like so? http://bin.cakephp.org/view/2005790890 3) killall -9 java and restart start-hbase.sh Did that... the web GUI is still not showing up.
Re: HBase is not running.
Here is my dump of the sole log file in the logs directory: http://bin.cakephp.org/view/2116332048
Re: HBase is not running.
That's your problem:

Caused by: java.net.BindException: Problem binding to ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested address

Either it's a public address and you can't bind to it, or someone else is using it. J-D
Re: Poor HBase map-reduce scan performance
I wrote myself a Scanner wrapper that uses a producer/consumer queue to keep the client fed with a full buffer as much as possible. When scanning my table with scanner caching at 100 records, I see about a 24% uplift in performance (~35k records/sec with the ClientScanner and ~44k records/sec with my P/C scanner). However, when I set scanner caching to 5000, it's more of a wash compared to the standard ClientScanner: ~53k records/sec with the ClientScanner and ~60k records/sec with the P/C scanner. I'm not sure what to make of those results. I think next I'll shut down HBase and read the HFiles directly, to see if there's a drop-off in performance between reading them directly vs. via the RegionServer. I still think that to really solve this there needs to be a sliding window of records in flight between disk and RS, and between RS and client. I'm thinking there's probably a single batch of records in flight between RS and client at the moment. Sandy On 5/23/13 8:45 AM, Bryan Keller brya...@gmail.com wrote: I am considering scanning a snapshot instead of the table. I believe this is what the ExportSnapshot class does. If I could use the scanning code from ExportSnapshot then I would be able to scan the HDFS files directly and bypass the regionservers. This could potentially give me a huge boost in performance for full table scans. However, it doesn't really address the poor scan performance against a table.
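The shape of such a producer/consumer wrapper can be sketched generically: a background thread drains the underlying scanner into a bounded queue while the consumer reads from the front, so network fetches overlap with client-side processing. This is a hypothetical, stdlib-only sketch (wrapping a plain Iterator rather than HBase's ClientScanner, which is not Sandy's actual code):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Prefetches elements from an underlying iterator (e.g. a result scanner)
// on a background thread so the consumer always finds a full buffer waiting.
public class PrefetchingScanner<T> implements Iterator<T>, AutoCloseable {
    private static final Object EOS = new Object(); // end-of-stream sentinel
    private final BlockingQueue<Object> queue;
    private final Thread producer;
    private Object next;

    public PrefetchingScanner(final Iterator<T> source, int bufferSize) {
        this.queue = new ArrayBlockingQueue<>(bufferSize);
        this.producer = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    queue.put(source.next()); // blocks when the buffer is full
                }
                queue.put(EOS);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        this.producer.setDaemon(true);
        this.producer.start();
        advance();
    }

    private void advance() {
        try {
            next = queue.take(); // blocks until the producer supplies a row
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            next = EOS;
        }
    }

    @Override public boolean hasNext() { return next != EOS; }

    @Override @SuppressWarnings("unchecked")
    public T next() {
        if (next == EOS) throw new NoSuchElementException();
        T result = (T) next;
        advance();
        return result;
    }

    @Override public void close() { producer.interrupt(); }
}
```

The bounded queue is the sliding window Sandy describes: it caps memory while letting the producer run ahead of the consumer by up to bufferSize records.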
Re: Poor HBase map-reduce scan performance
Thanks for the update, Sandy. If you can open a JIRA and attach your producer / consumer scanner there, that would be great. On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt prat...@adobe.com wrote: I wrote myself a Scanner wrapper that uses a producer/consumer queue to keep the client fed with a full buffer as much as possible. When scanning my table with scanner caching at 100 records, I see about a 24% uplift in performance (~35k records/sec with the ClientScanner and ~44k records/sec with my P/C scanner). However, when I set scanner caching to 5000, it's more of a wash compared to the standard ClientScanner: ~53k records/sec with the ClientScanner and ~60k records/sec with the P/C scanner. I'm not sure what to make of those results. I think next I'll shut down HBase and read the HFiles directly, to see if there's a drop off in performance between reading them directly vs. via the RegionServer. I still think that to really solve this there needs to be sliding window of records in flight between disk and RS, and between RS and client. I'm thinking there's probably a single batch of records in flight between RS and client at the moment. Sandy On 5/23/13 8:45 AM, Bryan Keller brya...@gmail.com wrote: I am considering scanning a snapshot instead of the table. I believe this is what the ExportSnapshot class does. If I could use the scanning code from ExportSnapshot then I will be able to scan the HDFS files directly and bypass the regionservers. This could potentially give me a huge boost in performance for full table scans. However, it doesn't really address the poor scan performance against a table.
Re: HBase is not running.
How weird. Admittedly I'm not terribly knowledgeable about Hadoop and all of its sub-projects, but I don't recall ever setting any networking info to something other than localhost. What would cause this? On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: That's your problem: Caused by: java.net.BindException: Problem binding to ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested address Either it's a public address and you can't bind to it or someone else is using it. J-D On Thu, May 23, 2013 at 3:24 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Here is my dump of the sole log file in the logs directory: http://bin.cakephp.org/view/2116332048 On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote: 1) Should hbase-master be changed to localhost? Maybe Try changing /etc/hosts to match the actual non loopback ip of your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out :)) and make sure your /etc/hosts matches the file in my blog post, (you need hbase-master to be defined in your /etc/hosts...). hbase.master was dropped around 2009 now that we have zookeeper. So you can set it to whatever you want, it won't change anything :) 2) zookeeper parent seems bad.. Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with what you defined in zookeeper parent node. Those two are really unrelated, /hbase is the default so no need to override it, and I'm guessing that hbase.rootdir is somewhere writable so that's all good. Now, regarding the Check the value configured in 'zookeeper.znode.parent, it's triggered when the client wants to read the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist, it might be because your HBase is homed elsewhere. It could also be that HBase isn't running at all so the Master never got to create it. 
BTW you can start the shell with -d and it's gonna give more info and dump all the stack traces. Going by this thread I would guess that HBase isn't running so the shell won't help. Another way to check is pointing your browser to localhost:60010 and see if the master is responding. If not, time to open up the log and see what's up. J-D
Re: querying hbase
Hi James, Thanks for joining the thread to provide more feedback and valuable information about Phoenix. I don't have much knowledge of it, so better to see you around. The only thing I was referring to is that the applications I sent the links for are simple jars that you can download locally and run without requiring any specific rights to install/upload anything on any server. Just download, click on it. I might be wrong because I did not try Phoenix yet, but I think you need to upload the JAR on all the region servers first, and then restart them, right? People might not have the rights to do that. That's why I thought Phoenix was overkill for the need to just list a table's contents on a screen. JM 2013/5/22 James Taylor jtay...@salesforce.com Hey JM, Can you expand on what you mean? Phoenix is a single jar, easily deployed to any HBase cluster. It can map to existing HBase tables or create new ones. It allows you to use SQL (a fairly popular language) to query your data, and it surfaces its functionality as a JDBC driver so that it can interop with the SQL ecosystem (which has been around for a while). Thanks, James On 05/21/2013 08:41 PM, Jean-Marc Spaggiari wrote: Using Phoenix for that is like trying to kill a mosquito with an atomic bomb, no? ;) A few easy to install and use tools which I already tried: - http://sourceforge.net/projects/haredbhbaseclie/files/ - http://sourceforge.net/projects/hbasemanagergui/ - https://github.com/NiceSystems/hrider/wiki There might be others, but those ones at least do the basic things to look into your tables. JM 2013/5/21 lars hofhansl la...@apache.org Maybe Phoenix (http://phoenix-hbase.blogspot.com/) is what you are looking for.
-- Lars From: Aji Janis aji1...@gmail.com To: user user@hbase.apache.org Sent: Tuesday, May 21, 2013 3:43 PM Subject: Re: querying hbase I haven't tried that because I don't know how to. Still, I think I am looking for a nice GUI interface that can take in HBase connection info and help me view the data, something like pgadmin (or its php version), sql developer, etc. On Tue, May 21, 2013 at 6:16 PM, Viral Bajaria viral.baja...@gmail.com wrote: The shell allows you to use filters just like the standard HBase API but with jruby syntax. Have you tried that, or is that too painful and you want a simpler tool? -Viral On Tue, May 21, 2013 at 2:58 PM, Aji Janis aji1...@gmail.com wrote: Are there any tools out there that can help in visualizing data stored in HBase? I know the shell lets you do basic stuff. But if I don't know what rowid I am looking for, or if I want rows with family say *name* (yes, SQL like), are there any tools that can help with this? Not trying to use this on production (although that would be nice), just dev env for now. Thank you for any suggestions
Re: HBase is not running.
It should only be a matter of network configuration and not a matter of whether you are a Hadoop expert or not. HBase is just trying to get the machine's hostname and bind to it and in your case it's given something it cannot use. It's unfortunate. IIUC your machine is hosted on cox.net? And it seems that while providing that machine they at some point set it up so that its hostname would resolve to a public address. Sounds like a misconfiguration. Anyways, you can edit your /etc/hosts so that your hostname points to 127.0.0.1 or, since you are using 0.94.7, set both hbase.master.ipc.address and hbase.regionserver.ipc.address to 0.0.0.0 in your hbase-site.xml so that it binds on the wildcard address instead. J-D On Thu, May 23, 2013 at 4:07 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: How weird. Admittedly I'm not terribly knowledgeable about Hadoop and all of its sub-projects, but I don't recall ever setting any networking info to something other than localhost. What would cause this? On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: That's your problem: Caused by: java.net.BindException: Problem binding to ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested address Either it's a public address and you can't bind to it or someone else is using it. J-D On Thu, May 23, 2013 at 3:24 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Here is my dump of the sole log file in the logs directory: http://bin.cakephp.org/view/2116332048 On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote: 1) Should hbase-master be changed to localhost? Maybe Try changing /etc/hosts to match the actual non loopback ip of your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out :)) and make sure your /etc/hosts matches the file in my blog post, (you need hbase-master to be defined in your /etc/hosts...). 
hbase.master was dropped around 2009 now that we have zookeeper. So you can set it to whatever you want, it won't change anything :) 2) zookeeper parent seems bad.. Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with what you defined in zookeeper parent node. Those two are really unrelated, /hbase is the default so no need to override it, and I'm guessing that hbase.rootdir is somewhere writable so that's all good. Now, regarding the Check the value configured in 'zookeeper.znode.parent, it's triggered when the client wants to read the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist, it might be because your HBase is homed elsewhere. It could also be that HBase isn't running at all so the Master never got to create it. BTW you can start the shell with -d and it's gonna give more info and dump all the stack traces. Going by this thread I would guess that HBase isn't running so the shell won't help. Another way to check is pointing your browser to localhost:60010 and see if the master is responding. If not, time to open up the log and see what's up. J-D
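For reference, the workaround J-D describes earlier in the thread might look like this in hbase-site.xml. The property names are the ones he cites from HBASE-8148 (available in 0.94.7); 0.0.0.0 makes the daemons bind the wildcard address instead of the hostname's resolved IP:

```xml
<!-- hbase-site.xml: bind RPC on the wildcard address (HBASE-8148, 0.94.7+) -->
<property>
  <name>hbase.master.ipc.address</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hbase.regionserver.ipc.address</name>
  <value>0.0.0.0</value>
</property>
```

The alternative fix, also mentioned above, is an /etc/hosts entry mapping the machine's hostname to 127.0.0.1.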
Re: HBase is not running.
Do you mean hbase.master.info.bindAddress and hbase.regionserver.info.bindAddress? I couldn't find anything else in the docs. But having said that, both are set to 0.0.0.0 by default. Also, I checked out 127.0.0.1:60010 and 0.0.0.0:60010, no web gui. On Thu, May 23, 2013 at 7:19 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: It should only be a matter of network configuration and not a matter of whether you are a Hadoop expert or not. HBase is just trying to get the machine's hostname and bind to it and in your case it's given something it cannot use. It's unfortunate. IIUC your machine is hosted on cox.net? And it seems that while providing that machine they at some point set it up so that its hostname would resolve to a public address. Sounds like a misconfiguration. Anyways, you can edit your /etc/hosts so that your hostname points to 127.0.0.1 or, since you are using 0.94.7, set both hbase.master.ipc.address and hbase.regionserver.ipc.address to 0.0.0.0 in your hbase-site.xml so that it binds on the wildcard address instead. J-D On Thu, May 23, 2013 at 4:07 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: How weird. Admittedly I'm not terribly knowledgeable about Hadoop and all of its sub-projects, but I don't recall ever setting any networking info to something other than localhost. What would cause this? On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: That's your problem: Caused by: java.net.BindException: Problem binding to ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested address Either it's a public address and you can't bind to it or someone else is using it. J-D On Thu, May 23, 2013 at 3:24 PM, Yves S. 
Garret yoursurrogate...@gmail.com wrote: Here is my dump of the sole log file in the logs directory: http://bin.cakephp.org/view/2116332048 On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote: 1) Should hbase-master be changed to localhost? Maybe Try changing /etc/hosts to match the actual non loopback ip of your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out :)) and make sure your /etc/hosts matches the file in my blog post, (you need hbase-master to be defined in your /etc/hosts...). hbase.master was dropped around 2009 now that we have zookeeper. So you can set it to whatever you want, it won't change anything :) 2) zookeeper parent seems bad.. Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with what you defined in zookeeper parent node. Those two are really unrelated, /hbase is the default so no need to override it, and I'm guessing that hbase.rootdir is somewhere writable so that's all good. Now, regarding the Check the value configured in 'zookeeper.znode.parent, it's triggered when the client wants to read the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist, it might be because your HBase is homed elsewhere. It could also be that HBase isn't running at all so the Master never got to create it. BTW you can start the shell with -d and it's gonna give more info and dump all the stack traces. Going by this thread I would guess that HBase isn't running so the shell won't help. Another way to check is pointing your browser to localhost:60010 and see if the master is responding. If not, time to open up the log and see what's up. J-D
Re: HBase is not running.
No, I meant hbase.master.ipc.address and hbase.regionserver.ipc.address. See https://issues.apache.org/jira/browse/HBASE-8148. J-D On Thu, May 23, 2013 at 4:34 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Do you mean hbase.master.info.bindAddress and hbase.regionserver.info.bindAddress? I couldn't find anything else in the docs. But having said that, both are set to 0.0.0.0 by default. Also, I checked out 127.0.0.1:60010 and 0.0.0.0:60010, no web gui. On Thu, May 23, 2013 at 7:19 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: It should only be a matter of network configuration and not a matter of whether you are a Hadoop expert or not. HBase is just trying to get the machine's hostname and bind to it and in your case it's given something it cannot use. It's unfortunate. IIUC your machine is hosted on cox.net? And it seems that while providing that machine they at some point set it up so that its hostname would resolve to a public address. Sounds like a misconfiguration. Anyways, you can edit your /etc/hosts so that your hostname points to 127.0.0.1 or, since you are using 0.94.7, set both hbase.master.ipc.address and hbase.regionserver.ipc.address to 0.0.0.0 in your hbase-site.xml so that it binds on the wildcard address instead. J-D On Thu, May 23, 2013 at 4:07 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: How weird. Admittedly I'm not terribly knowledgeable about Hadoop and all of its sub-projects, but I don't recall ever setting any networking info to something other than localhost. What would cause this? On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: That's your problem: Caused by: java.net.BindException: Problem binding to ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested address Either it's a public address and you can't bind to it or someone else is using it. J-D On Thu, May 23, 2013 at 3:24 PM, Yves S. 
Garret yoursurrogate...@gmail.com wrote: Here is my dump of the sole log file in the logs directory: http://bin.cakephp.org/view/2116332048 On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote: 1) Should hbase-master be changed to localhost? Maybe Try changing /etc/hosts to match the actual non loopback ip of your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out :)) and make sure your /etc/hosts matches the file in my blog post, (you need hbase-master to be defined in your /etc/hosts...). hbase.master was dropped around 2009 now that we have zookeeper. So you can set it to whatever you want, it won't change anything :) 2) zookeeper parent seems bad.. Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with what you defined in zookeeper parent node. Those two are really unrelated, /hbase is the default so no need to override it, and I'm guessing that hbase.rootdir is somewhere writable so that's all good. Now, regarding the Check the value configured in 'zookeeper.znode.parent, it's triggered when the client wants to read the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist, it might be because your HBase is homed elsewhere. It could also be that HBase isn't running at all so the Master never got to create it. BTW you can start the shell with -d and it's gonna give more info and dump all the stack traces. Going by this thread I would guess that HBase isn't running so the shell won't help. Another way to check is pointing your browser to localhost:60010 and see if the master is responding. If not, time to open up the log and see what's up. J-D
Re: querying hbase
Actually, with the great work you guys have been doing and the resolution of HBASE-1936 by Jimmy Xiang, we'll be able to ease the installation of Phoenix in our next release. You'll still need to bounce the region servers to reload our custom filters and coprocessors, but you won't need to manually add the phoenix jar to the hbase classpath on each region server (as long as the installing user has permission to write into HDFS). Have there been any discussions on running the HBase server in an OSGi container? That would potentially even alleviate the need to bounce the region servers. I didn't see a JIRA, so I created this one: https://issues.apache.org/jira/browse/HBASE-8607 Thanks, James On 05/23/2013 04:17 PM, Jean-Marc Spaggiari wrote: Hi James, Thanks for joining the thread to provide more feedback and valuable information about Phoenix. I don't have much knowledge of it, so better to see you around. The only thing I was referring to is that the applications I sent the links for are simple jars that you can download locally and run without requiring any specific rights to install/upload anything on any server. Just download, click on it. I might be wrong because I did not try Phoenix yet, but I think you need to upload the JAR on all the region servers first, and then restart them, right? People might not have the rights to do that. That's why I thought Phoenix was overkill for the need to just list a table's contents on a screen. JM 2013/5/22 James Taylor jtay...@salesforce.com Hey JM, Can you expand on what you mean? Phoenix is a single jar, easily deployed to any HBase cluster. It can map to existing HBase tables or create new ones. It allows you to use SQL (a fairly popular language) to query your data, and it surfaces its functionality as a JDBC driver so that it can interop with the SQL ecosystem (which has been around for a while).
Thanks, James On 05/21/2013 08:41 PM, Jean-Marc Spaggiari wrote: Using Phoenix for that is like trying to kill a mosquito with an atomic bomb, no? ;) A few easy to install and use tools which I already tried: - http://sourceforge.net/projects/haredbhbaseclie/files/ - http://sourceforge.net/projects/hbasemanagergui/ - https://github.com/NiceSystems/hrider/wiki There might be others, but those ones at least do the basic things to look into your tables. JM 2013/5/21 lars hofhansl la...@apache.org Maybe Phoenix (http://phoenix-hbase.blogspot.com/) is what you are looking for. -- Lars From: Aji Janis aji1...@gmail.com To: user user@hbase.apache.org Sent: Tuesday, May 21, 2013 3:43 PM Subject: Re: querying hbase I haven't tried that because I don't know how to. Still, I think I am looking for a nice GUI interface that can take in HBase connection info and help me view the data, something like pgadmin (or its php version), sql developer, etc. On Tue, May 21, 2013 at 6:16 PM, Viral Bajaria viral.baja...@gmail.com wrote: The shell allows you to use filters just like the standard HBase API but with jruby syntax. Have you tried that, or is that too painful and you want a simpler tool? -Viral On Tue, May 21, 2013 at 2:58 PM, Aji Janis aji1...@gmail.com wrote: Are there any tools out there that can help in visualizing data stored in HBase? I know the shell lets you do basic stuff. But if I don't know what rowid I am looking for, or if I want rows with family say *name* (yes, SQL like), are there any tools that can help with this? Not trying to use this on production (although that would be nice), just dev env for now. Thank you for any suggestions
Re: HBase is not running.
Ok, didn't see that in hbase-0.94.7/docs/book.html, after doing a more thorough search, found it in here on line 293: hbase-0.94.7/docs/xref/org/apache/hadoop/hbase/master/HMaster.html I'll make the change in hbase-site.xml. On Thu, May 23, 2013 at 7:35 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote: No, I meant hbase.master.ipc.address and hbase.regionserver.ipc.address. See https://issues.apache.org/jira/browse/HBASE-8148. J-D On Thu, May 23, 2013 at 4:34 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Do you mean hbase.master.info.bindAddress and hbase.regionserver.info.bindAddress? I couldn't find anything else in the docs. But having said that, both are set to 0.0.0.0 by default. Also, I checked out 127.0.0.1:60010 and 0.0.0.0:60010, no web gui. On Thu, May 23, 2013 at 7:19 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: It should only be a matter of network configuration and not a matter of whether you are a Hadoop expert or not. HBase is just trying to get the machine's hostname and bind to it and in your case it's given something it cannot use. It's unfortunate. IIUC your machine is hosted on cox.net? And it seems that while providing that machine they at some point set it up so that its hostname would resolve to a public address. Sounds like a misconfiguration. Anyways, you can edit your /etc/hosts so that your hostname points to 127.0.0.1 or, since you are using 0.94.7, set both hbase.master.ipc.address and hbase.regionserver.ipc.address to 0.0.0.0 in your hbase-site.xml so that it binds on the wildcard address instead. J-D On Thu, May 23, 2013 at 4:07 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: How weird. Admittedly I'm not terribly knowledgeable about Hadoop and all of its sub-projects, but I don't recall ever setting any networking info to something other than localhost. What would cause this? 
On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: That's your problem: Caused by: java.net.BindException: Problem binding to ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested address Either it's a public address and you can't bind to it or someone else is using it. J-D On Thu, May 23, 2013 at 3:24 PM, Yves S. Garret yoursurrogate...@gmail.com wrote: Here is my dump of the sole log file in the logs directory: http://bin.cakephp.org/view/2116332048 On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote: 1) Should hbase-master be changed to localhost? Maybe Try changing /etc/hosts to match the actual non loopback ip of your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out :)) and make sure your /etc/hosts matches the file in my blog post, (you need hbase-master to be defined in your /etc/hosts...). hbase.master was dropped around 2009 now that we have zookeeper. So you can set it to whatever you want, it won't change anything :) 2) zookeeper parent seems bad.. Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with what you defined in zookeeper parent node. Those two are really unrelated, /hbase is the default so no need to override it, and I'm guessing that hbase.rootdir is somewhere writable so that's all good. Now, regarding the Check the value configured in 'zookeeper.znode.parent, it's triggered when the client wants to read the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist, it might be because your HBase is homed elsewhere. It could also be that HBase isn't running at all so the Master never got to create it. BTW you can start the shell with -d and it's gonna give more info and dump all the stack traces. Going by this thread I would guess that HBase isn't running so the shell won't help. 
Another way to check is pointing your browser to localhost:60010 and see if the master is responding. If not, time to open up the log and see what's up. J-D