Re: RS crash upon replication

2013-05-23 Thread Varun Sharma
Actually, it seems like something else was wrong here - the servers came up
just fine on trying again - so could not really reproduce the issue.

Amit: Did you try patching 8207 ?

Varun


On Wed, May 22, 2013 at 5:40 PM, Himanshu Vashishtha hv.cs...@gmail.com wrote:

 That sounds like a bug for sure. Could you create a jira with logs/znode
 dump/steps to reproduce it?

 Thanks,
 himanshu


 On Wed, May 22, 2013 at 5:01 PM, Varun Sharma va...@pinterest.com wrote:

  It seems I can reproduce this - I did a few rolling restarts and got
  screwed with NoNode exceptions - I am running 0.94.7 which has the fix but
  my nodes don't contain hyphens - nodes are no longer coming back up...

  Thanks
  Varun
 
 
  On Wed, May 22, 2013 at 3:02 PM, Himanshu Vashishtha hv.cs...@gmail.com
  wrote:
 
   I'd suggest to please patch the code with 8207; cdh4.2.1 doesn't have it.

   With hyphens in the name, ReplicationSource gets confused and tries to
   set data in a znode which doesn't exist.

   Thanks,
   Himanshu
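Himanshu's description of the bug can be illustrated with a toy parser. A recovered replication queue's znode name concatenates a peer id and one or more dead servers' names with '-' (as in the long paths quoted later in this thread), so a naive split on '-' shreds hyphenated hostnames, while matching the host,port,startcode shape of a ServerName survives them. This is a hedged sketch of the idea only, not the actual ReplicationSource code:

```python
import re

# A recovered-queue znode name: peer id "1" plus two dead servers whose
# hostnames themselves contain hyphens (taken from the paths in this thread).
QUEUE = ("1-va-p-hbase-02-e,60020,1369042377129"
         "-va-p-hbase-02-c,60020,1369042377731")

def parse_naive(znode):
    # Treats every '-' as a separator: hyphenated hostnames fall apart.
    parts = znode.split("-")
    return parts[0], parts[1:]

def parse_by_servername(znode):
    # A ServerName looks like "host,port,startcode", so match whole
    # server names instead of splitting blindly on '-'.
    peer_id, _, rest = znode.partition("-")
    servers = re.findall(r"[^,\-][^,]*,\d+,\d+", rest)
    return peer_id, servers
```

With hostnames free of hyphens the two parsers agree, which is why the bug only bites setups like Amit's.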
  
  
   On Wed, May 22, 2013 at 2:42 PM, Amit Mor amit.mor.m...@gmail.com
  wrote:
  
 yes, indeed - hyphens are part of the host name (annoying legacy stuff in
 my company). It's hbase-0.94.2-cdh4.2.1. I have no idea if 0.94.6 was
 backported by Cloudera into their flavor of 0.94.2, but the mysterious
 occurrence of the percent sign in zkcli (ls
 /hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401/1-va-p-hbase-02-e,60020,1369042377129-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895)
 might be a sign of such a problem. How deep should my rmr in zkcli go (an
 example would be most welcome :)? I have no serious problem running
 copyTable with a time period corresponding to the outage and then starting
 the sync back again. One question though: how did it cause a crash?
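On the question of how deep the rmr should go: replication state lives in a tree of the form /hbase/replication/rs/&lt;regionserver&gt;/&lt;queue-id&gt;/&lt;wal-file&gt;, and the recovered queues discussed in this thread are the &lt;queue-id&gt; children whose names carry dead servers' names. A deletion would target a whole queue znode, one level below the region server znode, and costs you the unreplicated WALs listed under it. A toy model of that layout with made-up names (not the HBase API):

```python
# Illustrative znode tree: regionserver -> queue-id -> list of WAL files.
tree = {
    "va-p-hbase-02-d,60020,1369249862401": {
        # normal queue for peer "1"
        "1": ["wal.1369227474895"],
        # recovered queue: peer id plus a dead server's name appended
        "1-va-p-hbase-02-e,60020,1369042377129": ["wal.1369220050719"],
    },
}

def recovered_queue_paths(tree):
    # Candidates for `rmr` are whole queue znodes
    # (/hbase/replication/rs/<rs>/<queue-id>), never individual WAL children.
    paths = []
    for rs, queues in tree.items():
        for queue_id in queues:
            if "-" in queue_id:  # crude test: recovered queues embed server names
                paths.append("/hbase/replication/rs/%s/%s" % (rs, queue_id))
    return paths
```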
   
   
On Thu, May 23, 2013 at 12:32 AM, Varun Sharma va...@pinterest.com
wrote:
   
 I believe there were cascading failures which got these deep nodes
 containing still-to-be-replicated WAL(s) - I suspect there is either some
 parsing bug or something which is causing the replication source to not
 work. Also, which version are you using - does it have
 https://issues.apache.org/jira/browse/HBASE-8207 - since you use hyphens
 in your paths? One way to get back up is to delete these nodes, but then
 you lose data in these WAL(s)...


 On Wed, May 22, 2013 at 2:22 PM, Amit Mor amit.mor.m...@gmail.com
 
wrote:

   va-p-hbase-02-d,60020,1369249862401
 
 
  On Thu, May 23, 2013 at 12:20 AM, Varun Sharma 
  va...@pinterest.com
  wrote:
 
   Basically
  
   ls /hbase/rs and what do you see for va-p-02-d ?
  
  
   On Wed, May 22, 2013 at 2:19 PM, Varun Sharma 
  va...@pinterest.com
   
  wrote:
  
 Can you do ls /hbase/rs and see what you get for 02-d - instead of looking
 in /replication/, could you look in /hbase/replication/rs - I want to see
 if the timestamps are matching or not ?

 Varun
   
   
On Wed, May 22, 2013 at 2:17 PM, Varun Sharma 
   va...@pinterest.com

   wrote:
   
 I see - so looks okay - there's just a lot of deep nesting in there - if
 you look into these nodes by doing ls, you should see a bunch of WAL(s)
 which still need to be replicated...

 Varun
   
   
On Wed, May 22, 2013 at 2:16 PM, Varun Sharma 
va...@pinterest.com
   wrote:
   
 2013-05-22 15:31:25,929 WARN
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient
 ZooKeeper exception:
 org.apache.zookeeper.KeeperException$SessionExpiredException:
 KeeperErrorCode = Session expired for
 /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1-va-p-hbase-01-c,60020,1369042378287-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-01-c%2C60020%2C1369042378287.1369220050719

 01-[01-02-02]-01

 Looks like a bunch of cascading failures causing this deep nesting...
   
   
On Wed, May 22, 2013 at 2:09 PM, Amit Mor 
amit.mor.m...@gmail.com
   wrote:
   
 empty return:

 [zk: va-p-zookeeper-01-c:2181(CONNECTED) 10] ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1
 []
   
   
   
On Thu, May 23, 2013 at 12:05 AM, Varun Sharma 
 va...@pinterest.com
  
wrote:
   
  Do an ls, not a get, here and give the output?

  ls /hbase/replication/rs/va-p-hbase-01-c,60020,1369249873379/1


 

Re: RS crash upon replication

2013-05-23 Thread Amit Mor
No, the servers came out fine just because after the crash (the RS's - the
masters were still running), I immediately pulled the brakes with
stop_replication. Then I started the RS's and they came back fine (not
replicating). Once I hit 'start_replication' again, they crashed for the
second time. Eventually I deleted the heavily nested replication znodes and
the 'start_replication' succeeded. I didn't patch 8207 because I'm on CDH
with the Cloudera Manager Parcels thing and I'm still trying to figure out
how to replace their jars with mine in a clean and non-intrusive way.


 

Re: Poor HBase map-reduce scan performance

2013-05-23 Thread Bryan Keller
I am considering scanning a snapshot instead of the table. I believe this is 
what the ExportSnapshot class does. If I could use the scanning code from 
ExportSnapshot then I will be able to scan the HDFS files directly and bypass 
the regionservers. This could potentially give me a huge boost in performance 
for full table scans. However, it doesn't really address the poor scan 
performance against a table.

On May 22, 2013, at 3:57 PM, Ted Yu yuzhih...@gmail.com wrote:

 Sandy:
 Looking at patch v6 of HBASE-8420, I think it is different from your
 approach below for the case of cache.size() == 0.
 
 Maybe log a JIRA for further discussion ?
 
 On Wed, May 22, 2013 at 3:33 PM, Sandy Pratt prat...@adobe.com wrote:
 
 It seems to be in the ballpark of what I was getting at, but I haven't
 fully digested the code yet, so I can't say for sure.
 
 Here's what I'm getting at.  Looking at
 o.a.h.h.client.ClientScanner.next() in the 94.2 source I have loaded, I
 see there are three branches with respect to the cache:
 
  public Result next() throws IOException {
  
    // If the scanner is closed and there's nothing left in the cache,
    // next is a no-op.
    if (cache.size() == 0 && this.closed) {
      return null;
    }
  
    if (cache.size() == 0) {
      // Request more results from RS
      ...
    }
  
    if (cache.size() > 0) {
      return cache.poll();
    }
  
    ...
    return null;
  
  }
 
 
 I think that middle branch wants to change as follows (pseudo-code):
 
 if the cache size is below a certain threshold then
  initiate asynchronous action to refill it
  if there is no result to return until the cache refill completes then
block
  done
 done
 
 Or something along those lines.  I haven't grokked the patch well enough
 yet to tell if that's what it does.  What I think is happening in the
 0.94.2 code I've got is that it requests nothing until the cache is empty,
 then blocks until it's non-empty.  We want to eagerly and asynchronously
 refill the cache so that we ideally never have to block.
 
 
 Sandy
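 The eager, asynchronous refill described above is essentially double-buffering: a background thread keeps the cache topped up while the caller drains it, so next() only blocks when the producer genuinely can't keep up. A minimal sketch of the pattern (illustrative only, not the ClientScanner patch; fetch_batch stands in for the RPC to the region server):

```python
import queue
import threading

class PrefetchingScanner:
    """Sketch of an eagerly refilled scanner cache (not the HBase client)."""

    _DONE = object()  # sentinel marking end of scan

    def __init__(self, fetch_batch, cache_size=100):
        self._fetch = fetch_batch  # callable returning a list of rows, [] at end
        self._cache = queue.Queue(maxsize=cache_size)
        threading.Thread(target=self._fill, daemon=True).start()

    def _fill(self):
        # Producer: keeps requesting batches; blocks only when the cache is full.
        while True:
            batch = self._fetch()
            if not batch:
                self._cache.put(self._DONE)
                return
            for row in batch:
                self._cache.put(row)

    def next(self):
        # Consumer: blocks only when the cache is empty and a fetch is in flight.
        item = self._cache.get()
        if item is self._DONE:
            return None
        return item
```

 Because the producer starts the next fetch before the cache fully drains, the disk, network, and client-CPU phases listed in the discussion overlap instead of alternating.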
 
 
 On 5/22/13 1:39 PM, Ted Yu yuzhih...@gmail.com wrote:
 
 Sandy:
 Do you think the following JIRA would help with what you expect in this
 regard ?
 
 HBASE-8420 Port HBASE-6874 Implement prefetching for scanners from 0.89-fb
 
 Cheers
 
 On Wed, May 22, 2013 at 1:29 PM, Sandy Pratt prat...@adobe.com wrote:
 
 I found this thread on search-hadoop.com just now because I've been
 wrestling with the same issue for a while and have as yet been unable to
 solve it.  However, I think I have an idea of the problem.  My theory is
 based on assumptions about what's going on in HBase and HDFS internally,
 so please correct me if I'm wrong.
 
 Briefly, I think the issue is that sequential reads from HDFS are
 pipelined, whereas sequential reads from HBase are not.  Therefore,
 sequential reads from HDFS tend to keep the IO subsystem saturated,
 while
 sequential reads from HBase allow it to idle for a relatively large
 proportion of time.
 
  To make this more concrete, suppose that I'm reading N bytes of data from
  a file in HDFS.  I issue the calls to open the file and begin to read
  (from an InputStream, for example).  As I'm reading byte 1 of the stream
  at my client, the datanode is reading byte M, where 1 < M <= N, from disk.
  Thus, three activities tend to happen concurrently for the most part
  (disregarding the beginning and end of the file): 1) processing at the
  client; 2) streaming over the network from datanode to client; and 3)
  reading data from disk at the datanode.  The proportion of time these
  three activities overlap tends towards 100% as N -> infinity.
 
 Now suppose I read a batch of R records from HBase (where R = whatever
 scanner caching happens to be).  As I understand it, I issue my call to
 ResultScanner.next(), and this causes the RegionServer to block as if
 on a
 page fault while it loads enough HFile blocks from disk to cover the R
 records I (implicitly) requested.  After the blocks are loaded into the
 block cache on the RS, the RS returns R records to me over the network.
 Then I process the R records locally.  When they are exhausted, this
 cycle
 repeats.  The notable upshot is that while the RS is faulting HFile
 blocks
 into the cache, my client is blocked.  Furthermore, while my client is
 processing records, the RS is idle with respect to work on behalf of my
 client.
 
 That last point is really the killer, if I'm correct in my assumptions.
 It means that Scanner caching and larger block sizes work only to
 amortize
 the fixed overhead of disk IOs and RPCs -- they do nothing to keep the
 IO
 subsystems saturated during sequential reads.  What *should* happen is
 that the RS should treat the Scanner caching value (R above) as a hint
 that it should always have ready records r + 1 to r + R when I'm reading
 record r, at least up to the region boundary.  The RS should be
 preparing
 eagerly for the next call to ResultScanner.next(), which I suspect it's
 currently not doing.
 
 Another way to state this 

Re: RS crash upon replication

2013-05-23 Thread Jean-Daniel Cryans
fwiw stop_replication is a kill switch, not a general way to start and
stop replicating, and start_replication may put you in an inconsistent
state:

hbase(main):001:0> help 'stop_replication'
Stops all the replication features. The state in which each
stream stops in is undetermined.
WARNING:
start/stop replication is only meant to be used in critical load situations.


Re: hbase region server shutdown after datanode connection exception

2013-05-23 Thread Jean-Daniel Cryans
You are looking at it the wrong way. Per
http://hbase.apache.org/book.html#trouble.general, always walk up the
log to the first exception. In this case it's a session timeout.
Whatever happens next is most probably a side effect of that.

To help debug your issue, I would suggest reading this section of the
reference guide: http://hbase.apache.org/book.html#trouble.rs.runtime

J-D

On Tue, May 21, 2013 at 7:17 PM, Cheng Su scarcer...@gmail.com wrote:
 Hi all.



  I have a small hbase cluster with 3 physical machines.

  On 192.168.1.80, there are HMaster and a region server. On 81  82,
 there is a region server on each.

   The region server on 80 couldn't sync its HLog after a datanode access
  exception, and started to shut down.
 
   The datanode itself was not shut down and responds to other requests
  normally. I'll paste logs below.
 
   My questions are:
 
   1. Why does this exception cause a region server shutdown? Can I
  prevent it?
 
   2. Are there any tools (a shell command is best, like hadoop dfsadmin
  -report) that can monitor an hbase region server, to check whether it is
  alive or dead?
 
 I have done some research; nagios/ganglia can do such things.
 
    But actually I just want to know whether the region server is alive or
  dead, so they are a little overqualified.
 
 And I'm not using CDH, so I can't use Cloudera Manager, I think.
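For the narrow alive-or-dead check, a plain TCP probe of the region server's listening port may be enough - in this era the defaults are 60020 for RPC and 60030 for the info server (verify against your hbase-site.xml). A minimal sketch:

```python
import socket

def rs_alive(host, port=60020, timeout=2.0):
    """Return True if something is accepting TCP connections on host:port.
    A crude liveness probe - it proves the process is listening, not that
    it is healthy."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Looping this over the three hosts from cron and alerting on False covers the "alive or dead" case without a full nagios/ganglia deployment.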



  Here are the logs.



  HBase master:
 2013-05-21 17:03:32,675 ERROR org.apache.hadoop.hbase.master.HMaster: Region
 server ^@^@hadoop01,60020,1368774173179 reported a fatal error:

 ABORTING region server hadoop01,60020,1368774173179:
 regionserver:60020-0x3eb14c67540002 regionserver:60020-0x3eb14c67540002
 received expired from ZooKeeper, aborting

 Cause:

 org.apache.zookeeper.KeeperException$SessionExpiredException:
 KeeperErrorCode = Session expired

 at
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeper
 Watcher.java:369)

 at
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.
 java:266)

 at
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521
 )

 at
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)



  Region Server:

 2013-05-21 17:00:16,895 INFO org.apache.zookeeper.ClientCnxn: Client session
 timed out, have not heard from server in 12ms for sessionid
 0x3eb14c67540002, closing socket connection and attempting re

 connect

 2013-05-21 17:00:35,896 INFO org.apache.zookeeper.ClientCnxn: Client session
 timed out, have not heard from server in 12ms for sessionid
 0x13eb14ca4bb, closing socket connection and attempting r

 econnect

 2013-05-21 17:03:31,498 WARN org.apache.hadoop.hdfs.DFSClient:
 DFSOutputStream ResponseProcessor exception  for block
 blk_9188414668950016309_4925046java.net.SocketTimeoutException: 63000 millis
 timeout

  while waiting for channel to be ready for read. ch :
 java.nio.channels.SocketChannel[connected local=/192.168.1.80:57020
 remote=/192.168.1.82:50010]

 at
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)

 at
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)

 at
 org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)

 at java.io.DataInputStream.readFully(DataInputStream.java:178)

 at java.io.DataInputStream.readLong(DataInputStream.java:399)

 at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.
 readFields(DataTransferProtocol.java:124)

 at
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSCl
 ient.java:2784)



 2013-05-21 17:03:31,520 WARN org.apache.hadoop.hdfs.DFSClient: Error
 Recovery for block blk_9188414668950016309_4925046 bad datanode[0]
 192.168.1.82:50010

 2013-05-21 17:03:32,315 INFO org.apache.zookeeper.ClientCnxn: Opening socket
 connection to server /192.168.1.82:2100

 2013-05-21 17:03:32,316 INFO org.apache.zookeeper.ClientCnxn: Socket
 connection established to hadoop03/192.168.1.82:2100, initiating session

 2013-05-21 17:03:32,317 INFO org.apache.zookeeper.ClientCnxn: Session
 establishment complete on server hadoop03/192.168.1.82:2100, sessionid =
 0x13eb14ca4bb, negotiated timeout = 18

 2013-05-21 17:03:32,497 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog:
 Could not sync. Requesting close of hlog

 java.io.IOException: Reflection

 at
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(Sequence
 FileLogWriter.java:230)

 at
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1091)

 at
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1195)

 at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.
 java:1057)

 at java.lang.Thread.run(Thread.java:662)

 Caused by: java.lang.reflect.InvocationTargetException

  

Re: RS crash upon replication

2013-05-23 Thread Varun Sharma
But wouldn't a copyTable between timestamps bring you back? Since the
mutations are all timestamp based, we should be okay - basically doing a
copyTable which supersedes the downtime interval?
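A toy model of why replaying a window is safe: HBase cells are versioned by timestamp, so re-applying mutations that keep their original timestamps is idempotent, and copying an interval that supersedes the outage restores whatever the peer missed. Illustrative sketch only, not the CopyTable code:

```python
def copy_window(src, dst, start, end):
    # Re-apply every cell whose timestamp falls in [start, end); cells keep
    # their original timestamps, so running this twice changes nothing.
    for row, versions in src.items():
        for ts, value in versions.items():
            if start <= ts < end:
                dst.setdefault(row, {})[ts] = value

# The peer (dst) missed all mutations after ts 100 during the outage.
src = {"r1": {100: "a", 150: "b"}, "r2": {120: "x"}}
dst = {"r1": {100: "a"}}
copy_window(src, dst, 100, 200)
copy_window(src, dst, 100, 200)  # idempotent: second run is a no-op
```

One caveat: data removed during the window isn't represented as a copyable cell, so this converges cleanly for puts but not for deletes that happened during the outage.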



Re: HBase is not running.

2013-05-23 Thread Yves S. Garret
Jay, I was looking at your blog and I noticed these entries for Zookeeper:

hbase-master,hbase-regionserver1,hbase-regionserver2,hbase-regionserver3

Since I'm doing everything on my laptop, all that I would need to do is just
put localhost in that location, yes?
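Yes - for an all-on-one-laptop setup the quorum entry is just the local host. A hypothetical hbase-site.xml fragment (standalone mode; property name as in the 0.9x-era book):

```xml
<!-- Everything on one machine: the ZooKeeper quorum is just localhost. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>
```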


On Wed, May 22, 2013 at 1:44 PM, Jay Vyas jayunit...@gmail.com wrote:

 Yves, im going through the same issues, very tricky first time around.  Use
 the stable version.  Try these 3 tips : I just finished getting it working
 and wrote it up in distributed mode on a cluster of VMs.

 http://jayunit100.blogspot.com/2013/05/debugging-hbase-installation.html




 On Wed, May 22, 2013 at 1:02 PM, Yves S. Garret
  yoursurrogate...@gmail.com wrote:

  How weird.  When I start up hbase using start-hbase.sh and then
  check to make sure that the process is running, I don't see anything
  with JPS.  However, when I run stop-hbase.sh and then check my
  java processes, I *DO* see a Java process running in JPS... why?
 
 
  On Wed, May 22, 2013 at 11:28 AM, Yves S. Garret 
  yoursurrogate...@gmail.com
   wrote:
 
   Still stuck on this.
  
   I did something different, I tried version 0.92.2.  This is the log
 from
   that older version.
   http://bin.cakephp.org/view/617939270
  
    The other weird thing that I noticed with version 0.92.2 is this.  I
    started HBase and this is the output that I got:
   $ $HBASE_HOME/bin/start-hbase.sh
   starting master, logging to
  
 
 /media/alternative-storage-do-not-touch/hbase-install/hbase-0.92.2/logs/hbase-ysg-master-ysg.connect.out
  
   (pardon the annoyingly long file name)
  
   Now, when I run JPS, this is what I see:
   $ $JAVA_HOME/bin/jps
   14038 RunJar
   2059 Jps
  
   HBase... is not running?  What the heck?
  
  
   On Tue, May 21, 2013 at 5:25 PM, Mohammad Tariq donta...@gmail.com
  wrote:
  
    Is this the only thing which appears on your screen? Could you please
    show me your config files?
  
   Warm Regards,
   Tariq
   cloudfront.blogspot.com
  
  
   On Wed, May 22, 2013 at 2:49 AM, Yves S. Garret
    yoursurrogate...@gmail.com wrote:
  
This is what happens when I start hbase.
   
$ bin/start-hbase.sh
starting master, logging to
   
   
  
 
 /media/alternative-storage-do-not-touch/hbase-0.94.7/logs/hbase-ysg-master-ysg.connect.out
   
 No real problems or errors or warnings... but it does not have the same
 output that you have in your blog... perhaps that's an issue?
   
   
On Tue, May 21, 2013 at 5:14 PM, Mohammad Tariq donta...@gmail.com
 
wrote:
   
 No issues.

  Are the HBase daemons running fine? Are you able to initiate anything
  from the shell?

 Warm Regards,
 Tariq
 cloudfront.blogspot.com


 On Wed, May 22, 2013 at 2:37 AM, Yves S. Garret
  yoursurrogate...@gmail.com wrote:

   Hi, sorry, I thought I had more info than what was displayed in my
   e-mail a little earlier today that had the little list, but I did not.

   My biggest hangup is getting that Web GUI to work.  That's really
   where I'm stuck.  I followed your tutorial on the cloudfront blog and
   when it came time to go to http://localhost:60010, I did not see any
   output.
 
 
  On Tue, May 21, 2013 at 4:21 PM, Mohammad Tariq 
  donta...@gmail.com
   
  wrote:
 
   sure..
  
   Warm Regards,
   Tariq
   cloudfront.blogspot.com
  
  
   On Wed, May 22, 2013 at 1:48 AM, Yves S. Garret
    yoursurrogate...@gmail.com wrote:
  
Hello,
   
 No, still having issues.  I'll give you some more details in a second.
   
   
On Tue, May 21, 2013 at 4:07 PM, Mohammad Tariq 
donta...@gmail.com
wrote:
   
 Hey Yves,

  I am sorry for being unresponsive. I was travelling and was out of
  reach. What's the current status? Are you good now?

 Warm Regards,
 Tariq
 cloudfront.blogspot.com


 On Tue, May 21, 2013 at 11:15 PM, Asaf Mesika 
 asaf.mes...@gmail.com
  
 wrote:

  Yes.
 
  On May 21, 2013, at 8:32 PM, Yves S. Garret 
 yoursurrogate...@gmail.com
  wrote:
 
   Do you mean this?
  
   http://blog.devving.com/hbase-quickstart-guide/
  
  
   On Tue, May 21, 2013 at 1:29 PM, Asaf Mesika 
   asaf.mes...@gmail.com
  wrote:
  
   Devving.com has a good tutorial on HBase first setup
  
   On Tuesday, May 21, 2013, Yves S. Garret wrote:
  
   Hi Mohammad,
  
   I was following your tutorial and when I got to the
  part
when
  you
do
   $ bin/start-hbase.sh, this is what I get:
   http://bin.cakephp.org/view/428090088
  
   I'll keep looking online for 

Re: Risk about RS logs clean ?

2013-05-23 Thread Sergey Shelukhin
IIRC the version in previous branches should have an epic lock somewhere
(cacheFlushLock or something like that) that should make these map
manipulations safe as well.

On Wed, May 22, 2013 at 6:27 PM, Bing Jiang jiangbinglo...@gmail.comwrote:

 Hi,Sergey.
 The version of hbase in our environment is 0.94.3, and the FSHLog.java
 comes from 0.95 or version above.
 And it adds such codes in FSHLog::cleanOldLogs,
  long oldestOutstandingSeqNum = Long.MAX_VALUE;
  synchronized (oldestSeqNumsLock) {
    Long oldestFlushing = (oldestFlushingSeqNums.size() > 0)
        ? Collections.min(oldestFlushingSeqNums.values()) : Long.MAX_VALUE;
    Long oldestUnflushed = (oldestUnflushedSeqNums.size() > 0)
        ? Collections.min(oldestUnflushedSeqNums.values()) : Long.MAX_VALUE;
    oldestOutstandingSeqNum = Math.min(oldestFlushing, oldestUnflushed);
  }

 Which is different from the function from 0.94.3.

  private byte [][] cleanOldLogs() throws IOException {
   Long oldestOutstandingSeqNum = getOldestOutstandingSeqNum();
   ...
   }
  private Long getOldestOutstandingSeqNum() {
 return Collections.min(this.lastSeqWritten.values());
   }

 And I think the version in trunk is safe.

 Thanks, Sergey.


 2013/5/23 Sergey Shelukhin ser...@hortonworks.com

 FSHLog (in trunk) stores the earliest seqnums for each region in current
 memstore, and earliest flushing seqnum (see
 FSHLog::start/complete/abortCacheFlush). When logs are deleted the logs
 with seqnums that are above the earliest flushing/flushed seqnum for any
 region are not deleted (see FSHLog::cleanOldLogs).
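The rule Sergey describes can be sketched as a toy model (assumed names, not the actual FSHLog code): a WAL file is deletable only when its newest sequence id is older than the oldest outstanding (flushing or unflushed) edit across all regions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class OldLogCleanupSketch {
    // Oldest seqnum still sitting in each region's memstore, and oldest
    // seqnum currently being flushed; an empty map means nothing outstanding.
    static long oldestOutstanding(Map<String, Long> flushing,
                                  Map<String, Long> unflushed) {
        long oldestFlushing = flushing.isEmpty()
            ? Long.MAX_VALUE : Collections.min(flushing.values());
        long oldestUnflushed = unflushed.isEmpty()
            ? Long.MAX_VALUE : Collections.min(unflushed.values());
        return Math.min(oldestFlushing, oldestUnflushed);
    }

    // A log whose highest seqnum is below that bound holds only edits that
    // already reached HFiles, so it is safe to archive.
    static boolean deletable(long logLastSeqNum, long oldestOutstanding) {
        return logLastSeqNum < oldestOutstanding;
    }
}
```

With that bookkeeping, cleaning old logs is just archiving every file whose last seqnum tests deletable against the current bound.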

 On Wed, May 22, 2013 at 5:39 AM, Bing Jiang jiangbinglo...@gmail.com
 wrote:

  Hi all,
  I want to know how the RS eliminates unnecessary hlogs.
  lastSeqNum stores (RegionName, latest KV seq id),
  and
  outputfiles stores (last seq id before a new hlog file, file path).
 
  So, how does the RS guarantee that the KVs in an hlog about to be cleared
  have already been flushed from the memstore into an HFile?
  I have tried to read the source code to make sense of it, but I am not sure
  whether this is a potential source of data loss.
 
  Thanks.
  --
  Bing Jiang
  Tel:(86)134-2619-1361
  weibo: http://weibo.com/jiangbinglover
  BLOG: http://blog.sina.com.cn/jiangbinglover
  National Research Center for Intelligent Computing Systems
  Institute of Computing technology
  Graduate University of Chinese Academy of Science
 




 --
 Bing Jiang
 Tel:(86)134-2619-1361
 weibo: http://weibo.com/jiangbinglover
 BLOG: http://blog.sina.com.cn/jiangbinglover
 National Research Center for Intelligent Computing Systems
 Institute of Computing technology
 Graduate University of Chinese Academy of Science



Re: RS crash upon replication

2013-05-23 Thread Amit Mor
Thanks for the helpful comments. I would certainly dig deeper now that 
everything has stabilized. Regarding J-D's comment - once my slave cluster was 
started, after about 4 hours of downtime (it's for offline stuff), at the very 
moment it came back online, 5 RS of my master-replication cluster crashed. 
Since I had no time to figure out what went wrong with the replication, I 
issued 'stop_replication', knowing that's a last resort, since I had to 
get those production RS's online asap. I think renaming that command to something 
like 'abort_replication' would be more fitting. On the other hand, 
remove_peer(1) at a time of crisis feels like a developer's solution to a 
DBA's problem ;) 
Regarding copyTable, it's all well and good, but one has to consider that I'm 
on EC2 and the cluster is already stretched out by 'online' service requests; 
copyTable would hit its resources quite badly. I'll be glad to update. 
Thanks again,
Amit
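For the zkcli question quoted further down the thread ("how deep should my rmr be?"): rmr is recursive, so it only needs to target the stale queue znode itself, not anything deeper. Using the path from this thread (verify with ls before deleting anything):

```
ls /hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401
rmr /hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401/1-va-p-hbase-02-e,60020,1369042377129-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475
```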

 Original message 
From: Varun Sharma va...@pinterest.com 
Date:  
To: user@hbase.apache.org 
Subject: Re: RS crash upon replication 
 
But wouldn't a copyTable between timestamps bring you back? Since the mutations
are all timestamp based, we should be okay. Basically, run a copyTable whose
time range supersedes the downtime interval?
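A hedged sketch of the CopyTable invocation Varun suggests. The timestamps are placeholder epoch-millisecond values bracketing the outage (not values from the thread), and slave-zk is a hypothetical ZooKeeper quorum host for the slave cluster:

```shell
# Placeholder outage window in epoch milliseconds (illustrative values).
START_MS=1369290000000
END_MS=1369304400000

# CopyTable job bounded to the outage window; --peer.adr is the slave
# cluster's ZooKeeper quorum, client port, and znode parent.
echo "hbase org.apache.hadoop.hbase.mapreduce.CopyTable" \
     "--starttime=${START_MS} --endtime=${END_MS}" \
     "--peer.adr=slave-zk:2181:/hbase my_table"
```

(The echo only prints the command line; drop it to actually launch the MapReduce job.)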


On Thu, May 23, 2013 at 9:48 AM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 fwiw stop_replication is a kill switch, not a general way to start and
 stop replicating, and start_replication may put you in an inconsistent
 state:

 hbase(main):001:0 help 'stop_replication'
 Stops all the replication features. The state in which each
 stream stops in is undetermined.
 WARNING:
 start/stop replication is only meant to be used in critical load
 situations.
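The per-peer alternative J-D alludes to looks like this in the HBase shell (the peer id and cluster key here are illustrative):

```
hbase(main):001:0> remove_peer '1'
hbase(main):002:0> add_peer '1', 'slave-zk:2181:/hbase'
```

remove_peer drops the peer (and its queues) so the region servers stop shipping to it; add_peer re-establishes replication from that point on, without flipping the global kill switch.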

 On Thu, May 23, 2013 at 1:17 AM, Amit Mor amit.mor.m...@gmail.com wrote:
  No, the servers came up fine only because, after the crash (RS's - the
  masters were still running), I immediately pulled the brakes with
  stop_replication. Then I started the RS's and they came back fine (not
  replicating). Once I hit 'start_replication' again, they crashed for the
  second time. Eventually I deleted the heavily nested replication znodes and
  the 'start_replication' succeeded. I didn't patch 8207 because I'm on CDH
  with the Cloudera Manager parcels thing, and I'm still trying to figure out
  how to replace their jars with mine in a clean and non-intrusive way
 
 
  On Thu, May 23, 2013 at 10:33 AM, Varun Sharma va...@pinterest.com
 wrote:
 
  Actually, it seems like something else was wrong here - the servers
 came up
  just fine on trying again - so could not really reproduce the issue.
 
  Amit: Did you try patching 8207 ?
 
  Varun
 
 
  On Wed, May 22, 2013 at 5:40 PM, Himanshu Vashishtha 
 hv.cs...@gmail.com
  wrote:
 
   That sounds like a bug for sure. Could you create a jira with
 logs/znode
   dump/steps to reproduce it?
  
   Thanks,
   himanshu
  
  
   On Wed, May 22, 2013 at 5:01 PM, Varun Sharma va...@pinterest.com
  wrote:
  
It seems I can reproduce this - I did a few rolling restarts and got
screwed with NoNode exceptions - I am running 0.94.7 which has the
 fix
   but
my nodes don't contain hyphens - nodes are no longer coming back
 up...
   
Thanks
Varun
   
   
On Wed, May 22, 2013 at 3:02 PM, Himanshu Vashishtha 
  hv.cs...@gmail.com
wrote:
   
 I'd suggest to please patch the code with 8207;  cdh4.2.1 doesn't
  have
it.

 With hyphens in the name, ReplicationSource gets confused and
 tried
  to
set
 data in a znode which doesn't exist.

 Thanks,
 Himanshu


 On Wed, May 22, 2013 at 2:42 PM, Amit Mor 
 amit.mor.m...@gmail.com
wrote:

  yes, indeed - hyphens are part of the host name (annoying legacy
   stuff
in
  my company). It's hbase-0.94.2-cdh4.2.1. I have no idea if
 0.94.6
  was
  backported by Cloudera into their flavor of 0.94.2, but
  the mysterious occurrence of the percent sign in zkcli (ls
 
 

   
  
 
 /hbase/replication/rs/va-p-hbase-02-d,60020,1369249862401/1-va-p-hbase-02-e,60020,1369042377129-va-p-hbase-02-c,60020,1369042377731-va-p-hbase-02-d,60020,1369233252475/va-p-hbase-02-e%2C60020%2C1369042377129.1369227474895)
  might be a sign for such problem. How deep should my rmr in
 zkcli
  (an
  example would be most welcomed :) be ? I have no serious problem
running
  copyTable with a time period corresponding to the outage and
 then
  to
 start
  the sync back again. One question though, how did it cause a
 crash
  ?
 
 
  On Thu, May 23, 2013 at 12:32 AM, Varun Sharma 
  va...@pinterest.com
  wrote:
 
   I believe there were cascading failures which got these deep
  nodes
   containing still to be replicated WAL(s) - I suspect there is
   either
 some
   parsing bug or something which is causing the replication
 source
  to
not
   work - also which version are you using - 

Re: HBase is not running.

2013-05-23 Thread Yves S. Garret
Progress!  :)

Now, I'm getting this error :)

13/05/23 13:29:00 ERROR
client.HConnectionManager$HConnectionImplementation: Check the
value configured in 'zookeeper.znode.parent'. There could be a
mismatch with the one configured in the master.


On Thu, May 23, 2013 at 1:15 PM, Yves S. Garret
yoursurrogate...@gmail.comwrote:

 Jay, I was looking at your blog and I noticed these entries for Zookeeper:

 hbase-master,hbase-regionserver1,hbase-regionserver2,hbase-regionserver3

 Since I'm doing everything on my laptop, all that I would need to do is
 just
 put localhost in that location, yes?


 On Wed, May 22, 2013 at 1:44 PM, Jay Vyas jayunit...@gmail.com wrote:

  Yves, I'm going through the same issues - very tricky the first time around.
  Use the stable version. Try these 3 tips - I just finished getting it working
  and wrote it up (distributed mode on a cluster of VMs):

 http://jayunit100.blogspot.com/2013/05/debugging-hbase-installation.html




 On Wed, May 22, 2013 at 1:02 PM, Yves S. Garret
 yoursurrogate...@gmail.comwrote:

  How weird.  When I start up hbase using start-hbase.sh and then
  check to make sure that the process is running, I don't see anything
  with JPS.  However, when I run stop-hbase.sh and then check my
  java processes, I *DO* see a Java process running in JPS... why?
 
 
  On Wed, May 22, 2013 at 11:28 AM, Yves S. Garret 
  yoursurrogate...@gmail.com
   wrote:
 
   Still stuck on this.
  
   I did something different, I tried version 0.92.2.  This is the log
 from
   that older version.
   http://bin.cakephp.org/view/617939270
  
   The other weird thing that I noticed with version on 0.92.2 is this.
  I
   started
   HBase and this is the output that I got:
   $ $HBASE_HOME/bin/start-hbase.sh
   starting master, logging to
  
 
 /media/alternative-storage-do-not-touch/hbase-install/hbase-0.92.2/logs/hbase-ysg-master-ysg.connect.out
  
   (pardon the annoyingly long file name)
  
   Now, when I run JPS, this is what I see:
   $ $JAVA_HOME/bin/jps
   14038 RunJar
   2059 Jps
  
   HBase... is not running?  What the heck?
  
  

Re: HBase is not running.

2013-05-23 Thread Yves S. Garret
Hi all, some more info.  The GUI does not work, even though
Zookeeper is working.  When I start the shell and enter list, this
is what I get:
http://bin.cakephp.org/view/518131554

This is my hbase-site.xml file:
http://bin.cakephp.org/view/7632959

For zookeeper.znode.parent, I had created an /hbase directory (it has regular
user privs, not root). But, honestly, I'm not sure what the value should be
for that property. Could someone provide some input on what this is
supposed to be? I read the docs, but did not understand what they
were saying: Root ZNode for HBase in ZooKeeper.
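For reference, zookeeper.znode.parent is set in hbase-site.xml; /hbase is the default, and the value clients use must match the one the master uses. A minimal sketch:

```xml
<!-- hbase-site.xml: root znode under which HBase keeps its state in
     ZooKeeper. /hbase is the default; master and clients must agree. -->
<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase</value>
</property>
```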



Re: HBase is not running.

2013-05-23 Thread Jay Vyas
Hi Yves !

Okay... So - this is probably related to your last question.

The zookeeper nodes should 1-1 match the /etc/hosts ...
How many VMs do you have?
Or are you just running in local mode with one jvm per HBASE process?

I think that, if the latter, having localhost in the zookeeper xml will
work.








Re: HBase is not running.

2013-05-23 Thread Yves S. Garret
Yup, just a single VM on one machine.  /etc/hosts is currently this:
127.0.0.1   localhost



Re: HBase is not running.

2013-05-23 Thread Jay Vyas
I think this is simply an issue, then, of /hbase having the wrong privileges
or being inaccessible.

(1) Are you sure hdfs or whatever filesystem you are using is running and
(2) has the /hbase directory in it with
(3) liberal enough permissions?


Re: RS crash upon replication

2013-05-23 Thread Amit Mor
I have pasted most of the RS's logs just prior to (and including) their FATAL.
Would be very thankful if someone could take a look:
http://pastebin.com/qFzycXNS . Interestingly, some RS's experience an
IOException for not finding an .oldlogs/ file. The rest get a
KeeperException$NoNodeException
without the IOE.

Thanks


Re: HBase is not running.

2013-05-23 Thread Yves S. Garret
This is what the hbase directory looks like from /:
drwxr-xr-x.   2 user  user   4096 May 23 14:37 hbase

But, now that you mention it, would I need to have hdfs running as well?


On Thu, May 23, 2013 at 2:57 PM, Jay Vyas jayunit...@gmail.com wrote:

 I think this is simply an issue, then, that /hbase has the wrong privileges
 or is inaccessible .

 (1) Are you sure hdfs or whatever filesystem you are using is running and
 (2) has the /hbase directory in it with
 (3) liberal enough permissions?



Re: HBase is not running.

2013-05-23 Thread Jay Vyas
depends on how you define hbase root in your hbase-site.xml ?

Can you paste it here


Re: HBase is not running.

2013-05-23 Thread Yves S. Garret
Here is the entire file:
http://bin.cakephp.org/view/1159889135


On Thu, May 23, 2013 at 4:35 PM, Jay Vyas jayunit...@gmail.com wrote:

 depends on how you define hbase root in your hbase-site.xml ?

 Can you paste it here



Re: HBase is not running.

2013-05-23 Thread Jay Vyas
1) Should hbase-master be changed to localhost?

Maybe try changing /etc/hosts to match the actual non-loopback IP of your 
machine (i.e. just run ifconfig | grep 1 and see what IP comes out :))
 and make sure your /etc/hosts matches the file in my blog post (you need 
hbase-master to be defined in your /etc/hosts...).

2) The zookeeper parent seems bad...

Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent with 
what you defined in the zookeeper parent node.

3) killall -9 java and restart start-hbase.sh



On May 23, 2013, at 5:31 PM, Yves S. Garret yoursurrogate...@gmail.com 
wrote:

 Here is the entire file:
 http://bin.cakephp.org/view/1159889135
 
 
 On Thu, May 23, 2013 at 4:35 PM, Jay Vyas jayunit...@gmail.com wrote:
 
 depends on how you define hbase root in your hbase-site.xml ?
 
 Can you paste it here
 


Re: HBase is not running.

2013-05-23 Thread Jean-Daniel Cryans
On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote:
 1) Should hbase-master be changed to localhost?

 Maybe Try changing /etc/hosts to match the actual non loopback ip of your 
 machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out :))
  and make sure your /etc/hosts matches the file in my blog post, (you need 
 hbase-master to be defined in your /etc/hosts...).

hbase.master was dropped around 2009 now that we have zookeeper. So
you can set it to whatever you want, it won't change anything :)


 2) zookeeper parent seems bad..

 Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent 
 with what you defined in zookeeper parent node.

Those two are really unrelated, /hbase is the default so no need to
override it, and I'm guessing that hbase.rootdir is somewhere writable
so that's all good.

Now, regarding the Check the value configured in 'zookeeper.znode.parent'
error: it's triggered when the client wants to read
the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist,
it might be because your HBase is homed elsewhere. It could also be
that HBase isn't running at all so the Master never got to create it.

BTW you can start the shell with -d and it's gonna give more info and
dump all the stack traces.

Going by this thread I would guess that HBase isn't running so the
shell won't help. Another way to check is pointing your browser to
localhost:60010 and see if the master is responding. If not, time to
open up the log and see what's up.

J-D


Re: HBase is not running.

2013-05-23 Thread Yves S. Garret
On Thu, May 23, 2013 at 5:50 PM, Jay Vyas jayunit...@gmail.com wrote:

 1) Should hbase-master be changed to localhost?

 Maybe Try changing /etc/hosts to match the actual non loopback ip of your
 machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out :))
  and make sure your /etc/hosts matches the file in my blog post, (you need
 hbase-master to be defined in your /etc/hosts...).


You mean like this?
127.0.0.1   localhost
192.168.1.3hbase-master


 2) zookeeper parent seems bad..

 Change hbase-rootdir to hbase (in hbase.rootdir) so that it's consistent
 with what you defined in zookeeper parent node.


Like so?
http://bin.cakephp.org/view/2005790890


 3) killall -9 java and restart start-hbase.sh


Did that... web gui still not showing up.




 On May 23, 2013, at 5:31 PM, Yves S. Garret yoursurrogate...@gmail.com
 wrote:

  Here is the entire file:
  http://bin.cakephp.org/view/1159889135
 
 
  On Thu, May 23, 2013 at 4:35 PM, Jay Vyas jayunit...@gmail.com wrote:
 
  depends on how you define hbase root in your hbase-site.xml ?
 
  Can you paste it here
 



Re: HBase is not running.

2013-05-23 Thread Yves S. Garret
Here is my dump of the sole log file in the logs directory:
http://bin.cakephp.org/view/2116332048


On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote:
  1) Should hbase-master be changed to localhost?
 
  Maybe Try changing /etc/hosts to match the actual non loopback ip of
 your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out
 :))
   and make sure your /etc/hosts matches the file in my blog post, (you
 need hbase-master to be defined in your /etc/hosts...).

 hbase.master was dropped around 2009 now that we have zookeeper. So
 you can set it to whatever you want, it won't change anything :)

 
  2) zookeeper parent seems bad..
 
  Change hbase-rootdir to hbase (in hbase.rootdir) so that it's
 consistent with what you defined in zookeeper parent node.

 Those two are really unrelated, /hbase is the default so no need to
 override it, and I'm guessing that hbase.rootdir is somewhere writable
 so that's all good.

 Now, regarding the "Check the value configured in
 'zookeeper.znode.parent'", it's triggered when the client wants to read
 the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist,
 it might be because your HBase is homed elsewhere. It could also be
 that HBase isn't running at all so the Master never got to create it.

 BTW you can start the shell with -d and it's gonna give more info and
 dump all the stack traces.

 Going by this thread I would guess that HBase isn't running so the
 shell won't help. Another way to check is pointing your browser to
 localhost:60010 and see if the master is responding. If not, time to
 open up the log and see what's up.

 J-D



Re: HBase is not running.

2013-05-23 Thread Jean-Daniel Cryans
That's your problem:

Caused by: java.net.BindException: Problem binding to
ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested
address

Either it's a public address and you can't bind to it or someone else
is using it.

J-D

On Thu, May 23, 2013 at 3:24 PM, Yves S. Garret
yoursurrogate...@gmail.com wrote:
 Here is my dump of the sole log file in the logs directory:
 http://bin.cakephp.org/view/2116332048


 On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans 
 jdcry...@apache.orgwrote:

 On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote:
  1) Should hbase-master be changed to localhost?
 
  Maybe Try changing /etc/hosts to match the actual non loopback ip of
 your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes out
 :))
   and make sure your /etc/hosts matches the file in my blog post, (you
 need hbase-master to be defined in your /etc/hosts...).

 hbase.master was dropped around 2009 now that we have zookeeper. So
 you can set it to whatever you want, it won't change anything :)

 
  2) zookeeper parent seems bad..
 
  Change hbase-rootdir to hbase (in hbase.rootdir) so that it's
 consistent with what you defined in zookeeper parent node.

 Those two are really unrelated, /hbase is the default so no need to
 override it, and I'm guessing that hbase.rootdir is somewhere writable
 so that's all good.

 Now, regarding the "Check the value configured in
 'zookeeper.znode.parent'", it's triggered when the client wants to read
 the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist,
 it might be because your HBase is homed elsewhere. It could also be
 that HBase isn't running at all so the Master never got to create it.

 BTW you can start the shell with -d and it's gonna give more info and
 dump all the stack traces.

 Going by this thread I would guess that HBase isn't running so the
 shell won't help. Another way to check is pointing your browser to
 localhost:60010 and see if the master is responding. If not, time to
 open up the log and see what's up.

 J-D



Re: Poor HBase map-reduce scan performance

2013-05-23 Thread Sandy Pratt
I wrote myself a Scanner wrapper that uses a producer/consumer queue to
keep the client fed with a full buffer as much as possible.  When scanning
my table with scanner caching at 100 records, I see about a 24% uplift in
performance (~35k records/sec with the ClientScanner and ~44k records/sec
with my P/C scanner).  However, when I set scanner caching to 5000, it's
more of a wash compared to the standard ClientScanner: ~53k records/sec
with the ClientScanner and ~60k records/sec with the P/C scanner.

I'm not sure what to make of those results.  I think next I'll shut down
HBase and read the HFiles directly, to see if there's a drop off in
performance between reading them directly vs. via the RegionServer.

I still think that to really solve this there needs to be a sliding window
of records in flight between disk and RS, and between RS and client.  I'm
thinking there's probably a single batch of records in flight between RS
and client at the moment.

Sandy

On 5/23/13 8:45 AM, Bryan Keller brya...@gmail.com wrote:

I am considering scanning a snapshot instead of the table. I believe this
is what the ExportSnapshot class does. If I could use the scanning code
from ExportSnapshot then I will be able to scan the HDFS files directly
and bypass the regionservers. This could potentially give me a huge boost
in performance for full table scans. However, it doesn't really address
the poor scan performance against a table.
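Sandy's wrapper itself isn't attached to the thread, but the producer/consumer idea he describes can be sketched hypothetically as a background thread that keeps a bounded BlockingQueue full while the caller drains it. The sketch below is generic over any Iterator so it runs without a cluster; wrapping a real HBase ResultScanner the same way is assumed to follow the same shape:

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch of a prefetching (producer/consumer) scanner wrapper.
// A daemon thread pulls from the source iterator into a bounded queue so the
// consumer rarely waits on the next fetch; EOF is signaled with a sentinel.
public class PrefetchScanner<T> implements Iterator<T>, AutoCloseable {
    private static final Object EOF = new Object();
    private final BlockingQueue<Object> queue;
    private final Thread producer;
    private Object next;

    public PrefetchScanner(Iterator<T> source, int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
        producer = new Thread(() -> {
            try {
                while (source.hasNext()) queue.put(source.next());
                queue.put(EOF); // tell the consumer we're done
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
        advance(); // pre-load the first element
    }

    private void advance() {
        try { next = queue.take(); }
        catch (InterruptedException ie) { throw new RuntimeException(ie); }
    }

    @Override public boolean hasNext() { return next != EOF; }

    @Override public T next() {
        if (next == EOF) throw new NoSuchElementException();
        @SuppressWarnings("unchecked") T out = (T) next;
        advance();
        return out;
    }

    @Override public void close() { producer.interrupt(); }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) data.add(i);
        int sum = 0;
        try (PrefetchScanner<Integer> s = new PrefetchScanner<>(data.iterator(), 100)) {
            while (s.hasNext()) sum += s.next();
        }
        System.out.println(sum); // prints 499500
    }
}
```

The bounded queue is what provides the sliding window: the producer keeps fetching while the consumer processes, and backpressure kicks in once `capacity` results are buffered.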



Re: Poor HBase map-reduce scan performance

2013-05-23 Thread Ted Yu
Thanks for the update, Sandy.

If you can open a JIRA and attach your producer / consumer scanner there,
that would be great.

On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt prat...@adobe.com wrote:

 I wrote myself a Scanner wrapper that uses a producer/consumer queue to
 keep the client fed with a full buffer as much as possible.  When scanning
 my table with scanner caching at 100 records, I see about a 24% uplift in
 performance (~35k records/sec with the ClientScanner and ~44k records/sec
 with my P/C scanner).  However, when I set scanner caching to 5000, it's
 more of a wash compared to the standard ClientScanner: ~53k records/sec
 with the ClientScanner and ~60k records/sec with the P/C scanner.

 I'm not sure what to make of those results.  I think next I'll shut down
 HBase and read the HFiles directly, to see if there's a drop off in
 performance between reading them directly vs. via the RegionServer.

 I still think that to really solve this there needs to be a sliding window
 of records in flight between disk and RS, and between RS and client.  I'm
 thinking there's probably a single batch of records in flight between RS
 and client at the moment.

 Sandy

 On 5/23/13 8:45 AM, Bryan Keller brya...@gmail.com wrote:

 I am considering scanning a snapshot instead of the table. I believe this
 is what the ExportSnapshot class does. If I could use the scanning code
 from ExportSnapshot then I will be able to scan the HDFS files directly
 and bypass the regionservers. This could potentially give me a huge boost
 in performance for full table scans. However, it doesn't really address
 the poor scan performance against a table.




Re: HBase is not running.

2013-05-23 Thread Yves S. Garret
How weird.  Admittedly I'm not terribly knowledgeable about Hadoop
and all of its sub-projects, but I don't recall ever setting any networking
info to something other than localhost.  What would cause this?


On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 That's your problem:

 Caused by: java.net.BindException: Problem binding to
 ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested
 address

 Either it's a public address and you can't bind to it or someone else
 is using it.

 J-D

 On Thu, May 23, 2013 at 3:24 PM, Yves S. Garret
 yoursurrogate...@gmail.com wrote:
  Here is my dump of the sole log file in the logs directory:
  http://bin.cakephp.org/view/2116332048
 
 
  On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans jdcry...@apache.org
 wrote:
 
  On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote:
   1) Should hbase-master be changed to localhost?
  
   Maybe Try changing /etc/hosts to match the actual non loopback ip of
  your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes
 out
  :))
and make sure your /etc/hosts matches the file in my blog post, (you
  need hbase-master to be defined in your /etc/hosts...).
 
  hbase.master was dropped around 2009 now that we have zookeeper. So
  you can set it to whatever you want, it won't change anything :)
 
  
   2) zookeeper parent seems bad..
  
   Change hbase-rootdir to hbase (in hbase.rootdir) so that it's
  consistent with what you defined in zookeeper parent node.
 
  Those two are really unrelated, /hbase is the default so no need to
  override it, and I'm guessing that hbase.rootdir is somewhere writable
  so that's all good.
 
  Now, regarding the "Check the value configured in
  'zookeeper.znode.parent'", it's triggered when the client wants to read
  the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist,
  it might be because your HBase is homed elsewhere. It could also be
  that HBase isn't running at all so the Master never got to create it.
 
  BTW you can start the shell with -d and it's gonna give more info and
  dump all the stack traces.
 
  Going by this thread I would guess that HBase isn't running so the
  shell won't help. Another way to check is pointing your browser to
  localhost:60010 and see if the master is responding. If not, time to
  open up the log and see what's up.
 
  J-D
 



Re: querying hbase

2013-05-23 Thread Jean-Marc Spaggiari
Hi James,

Thanks for joining the thread to provide more feedback and valuable
information about Phoenix. I don't have a big knowledge on it, so better to
see you around.

The only thing I was referring to is that the applications I sent the links
for are simple jars that you can download locally and run without requiring
any specific rights to install/upload anything on any server. Just download
and click on it.

I might be wrong because I have not tried Phoenix yet, but I think you need to
upload the JAR to all the region servers first, and then restart them,
right? People might not have the rights to do that. That's why I thought
Phoenix was overkill when the need is just to list a table's content on a
screen.

JM

2013/5/22 James Taylor jtay...@salesforce.com

 Hey JM,
 Can you expand on what you mean? Phoenix is a single jar, easily deployed
 to any HBase cluster. It can map to existing HBase tables or create new
 ones. It allows you to use SQL (a fairly popular language) to query your
 data, and it surfaces its functionality as a JDBC driver so that it can
 interop with the SQL ecosystem (which has been around for a while).
 Thanks,
 James


 On 05/21/2013 08:41 PM, Jean-Marc Spaggiari wrote:

 Using Phoenix for that is like trying to kill a mosquito with an atomic
 bomb, no? ;)

 Few easy to install and use tools which I already tried:
  - http://sourceforge.net/projects/haredbhbaseclie/files/
  - http://sourceforge.net/projects/hbasemanagergui/
  - https://github.com/NiceSystems/hrider/wiki

  There might be others, but those at least do the basic things to
  look into your tables.

 JM

 2013/5/21 lars hofhansl la...@apache.org

  Maybe Phoenix (http://phoenix-hbase.blogspot.com/) is what you are
  looking for.

 -- Lars

 ________________________________
 From: Aji Janis aji1...@gmail.com
 To: user user@hbase.apache.org
 Sent: Tuesday, May 21, 2013 3:43 PM
 Subject: Re: querying hbase


 I haven't tried that because I don't know how to. Still, I think I am
 looking for a nice GUI interface that can take in HBase connection info and
 help me view the data, something like pgAdmin (or its php version), SQL
 Developer, etc.


 On Tue, May 21, 2013 at 6:16 PM, Viral Bajaria viral.baja...@gmail.com

 wrote:
 The shell allows you to use filters just like the standard HBase API, but
 with JRuby syntax. Have you tried that, or is that too painful and you
 want a simpler tool?

 -Viral

 On Tue, May 21, 2013 at 2:58 PM, Aji Janis aji1...@gmail.com wrote:

  are there any tools out there that can help in visualizing data stored in
 Hbase? I know the shell lets you do basic stuff. But if I don't know what
 rowid I am looking for, or if I want rows with family say *name* (yes, SQL
 like), are there any tools that can help with this? Not trying to use this
 on production (although that would be nice), just dev env for now. Thank you
 for any suggestions.





Re: HBase is not running.

2013-05-23 Thread Jean-Daniel Cryans
It should only be a matter of network configuration and not a matter
of whether you are a Hadoop expert or not. HBase is just trying to get
the machine's hostname and bind to it and in your case it's given
something it cannot use. It's unfortunate.

IIUC your machine is hosted on cox.net? And it seems that while
providing that machine they at some point set it up so that its
hostname would resolve to a public address. Sounds like a
misconfiguration. Anyways, you can edit your /etc/hosts so that your
hostname points to 127.0.0.1 or, since you are using 0.94.7, set both
hbase.master.ipc.address and hbase.regionserver.ipc.address to 0.0.0.0
in your hbase-site.xml so that it binds on the wildcard address
instead.

J-D
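As a sketch of J-D's second suggestion, the hbase-site.xml additions would look something like the fragment below (property names are the ones cited from HBASE-8148; treat the exact XML as an assumption, not a tested configuration):

```xml
<!-- Bind the master and regionserver RPC services on the wildcard
     address instead of the machine's (misconfigured) hostname. -->
<property>
  <name>hbase.master.ipc.address</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>hbase.regionserver.ipc.address</name>
  <value>0.0.0.0</value>
</property>
```

These go inside the `<configuration>` element of hbase-site.xml, followed by a restart of the daemons.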

On Thu, May 23, 2013 at 4:07 PM, Yves S. Garret
yoursurrogate...@gmail.com wrote:
 How weird.  Admittedly I'm not terribly knowledgeable about Hadoop
 and all of its sub-projects, but I don't recall ever setting any networking
 info to something other than localhost.  What would cause this?


 On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans 
 jdcry...@apache.orgwrote:

 That's your problem:

 Caused by: java.net.BindException: Problem binding to
 ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested
 address

 Either it's a public address and you can't bind to it or someone else
 is using it.

 J-D

 On Thu, May 23, 2013 at 3:24 PM, Yves S. Garret
 yoursurrogate...@gmail.com wrote:
  Here is my dump of the sole log file in the logs directory:
  http://bin.cakephp.org/view/2116332048
 
 
  On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans jdcry...@apache.org
 wrote:
 
  On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com wrote:
   1) Should hbase-master be changed to localhost?
  
   Maybe Try changing /etc/hosts to match the actual non loopback ip of
  your machine... (i.e. just run Ifconfig | grep 1 and see what ip comes
 out
  :))
and make sure your /etc/hosts matches the file in my blog post, (you
  need hbase-master to be defined in your /etc/hosts...).
 
  hbase.master was dropped around 2009 now that we have zookeeper. So
  you can set it to whatever you want, it won't change anything :)
 
  
   2) zookeeper parent seems bad..
  
   Change hbase-rootdir to hbase (in hbase.rootdir) so that it's
  consistent with what you defined in zookeeper parent node.
 
  Those two are really unrelated, /hbase is the default so no need to
  override it, and I'm guessing that hbase.rootdir is somewhere writable
  so that's all good.
 
  Now, regarding the "Check the value configured in
  'zookeeper.znode.parent'", it's triggered when the client wants to read
  the /hbase znode in ZooKeeper but it's unable to. If it doesn't exist,
  it might be because your HBase is homed elsewhere. It could also be
  that HBase isn't running at all so the Master never got to create it.
 
  BTW you can start the shell with -d and it's gonna give more info and
  dump all the stack traces.
 
  Going by this thread I would guess that HBase isn't running so the
  shell won't help. Another way to check is pointing your browser to
  localhost:60010 and see if the master is responding. If not, time to
  open up the log and see what's up.
 
  J-D
 



Re: HBase is not running.

2013-05-23 Thread Yves S. Garret
Do you mean hbase.master.info.bindAddress and
hbase.regionserver.info.bindAddress?  I couldn't find
anything else in the docs.  But having said that, both
are set to 0.0.0.0 by default.

Also, I checked out 127.0.0.1:60010 and 0.0.0.0:60010,
no web gui.


On Thu, May 23, 2013 at 7:19 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 It should only be a matter of network configuration and not a matter
 of whether you are a Hadoop expert or not. HBase is just trying to get
 the machine's hostname and bind to it and in your case it's given
 something it cannot use. It's unfortunate.

 IIUC your machine is hosted on cox.net? And it seems that while
 providing that machine they at some point set it up so that its
 hostname would resolve to a public address. Sounds like a
 misconfiguration. Anyways, you can edit your /etc/hosts so that your
 hostname points to 127.0.0.1 or, since you are using 0.94.7, set both
 hbase.master.ipc.address and hbase.regionserver.ipc.address to 0.0.0.0
 in your hbase-site.xml so that it binds on the wildcard address
 instead.

 J-D

 On Thu, May 23, 2013 at 4:07 PM, Yves S. Garret
 yoursurrogate...@gmail.com wrote:
  How weird.  Admittedly I'm not terribly knowledgeable about Hadoop
  and all of its sub-projects, but I don't recall ever setting any
 networking
  info to something other than localhost.  What would cause this?
 
 
  On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans jdcry...@apache.org
 wrote:
 
  That's your problem:
 
  Caused by: java.net.BindException: Problem binding to
  ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested
  address
 
  Either it's a public address and you can't bind to it or someone else
  is using it.
 
  J-D
 
  On Thu, May 23, 2013 at 3:24 PM, Yves S. Garret
  yoursurrogate...@gmail.com wrote:
   Here is my dump of the sole log file in the logs directory:
   http://bin.cakephp.org/view/2116332048
  
  
   On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans 
 jdcry...@apache.org
  wrote:
  
   On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com
 wrote:
1) Should hbase-master be changed to localhost?
   
Maybe Try changing /etc/hosts to match the actual non loopback ip
 of
   your machine... (i.e. just run Ifconfig | grep 1 and see what ip
 comes
  out
   :))
 and make sure your /etc/hosts matches the file in my blog post,
 (you
   need hbase-master to be defined in your /etc/hosts...).
  
   hbase.master was dropped around 2009 now that we have zookeeper. So
   you can set it to whatever you want, it won't change anything :)
  
   
2) zookeeper parent seems bad..
   
Change hbase-rootdir to hbase (in hbase.rootdir) so that it's
   consistent with what you defined in zookeeper parent node.
  
   Those two are really unrelated, /hbase is the default so no need to
   override it, and I'm guessing that hbase.rootdir is somewhere
 writable
   so that's all good.
  
   Now, regarding the "Check the value configured in
   'zookeeper.znode.parent'", it's triggered when the client wants to
 read
   the /hbase znode in ZooKeeper but it's unable to. If it doesn't
 exist,
   it might be because your HBase is homed elsewhere. It could also be
   that HBase isn't running at all so the Master never got to create it.
  
   BTW you can start the shell with -d and it's gonna give more info and
   dump all the stack traces.
  
   Going by this thread I would guess that HBase isn't running so the
   shell won't help. Another way to check is pointing your browser to
   localhost:60010 and see if the master is responding. If not, time to
   open up the log and see what's up.
  
   J-D
  
 



Re: HBase is not running.

2013-05-23 Thread Jean-Daniel Cryans
No, I meant hbase.master.ipc.address and
hbase.regionserver.ipc.address. See
https://issues.apache.org/jira/browse/HBASE-8148.

J-D

On Thu, May 23, 2013 at 4:34 PM, Yves S. Garret
yoursurrogate...@gmail.com wrote:
 Do you mean hbase.master.info.bindAddress and
 hbase.regionserver.info.bindAddress?  I couldn't find
 anything else in the docs.  But having said that, both
 are set to 0.0.0.0 by default.

 Also, I checked out 127.0.0.1:60010 and 0.0.0.0:60010,
 no web gui.


 On Thu, May 23, 2013 at 7:19 PM, Jean-Daniel Cryans 
 jdcry...@apache.orgwrote:

 It should only be a matter of network configuration and not a matter
 of whether you are a Hadoop expert or not. HBase is just trying to get
 the machine's hostname and bind to it and in your case it's given
 something it cannot use. It's unfortunate.

 IIUC your machine is hosted on cox.net? And it seems that while
 providing that machine they at some point set it up so that its
 hostname would resolve to a public address. Sounds like a
 misconfiguration. Anyways, you can edit your /etc/hosts so that your
 hostname points to 127.0.0.1 or, since you are using 0.94.7, set both
 hbase.master.ipc.address and hbase.regionserver.ipc.address to 0.0.0.0
 in your hbase-site.xml so that it binds on the wildcard address
 instead.

 J-D

 On Thu, May 23, 2013 at 4:07 PM, Yves S. Garret
 yoursurrogate...@gmail.com wrote:
  How weird.  Admittedly I'm not terribly knowledgeable about Hadoop
  and all of its sub-projects, but I don't recall ever setting any
 networking
  info to something other than localhost.  What would cause this?
 
 
  On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans jdcry...@apache.org
 wrote:
 
  That's your problem:
 
  Caused by: java.net.BindException: Problem binding to
  ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign requested
  address
 
  Either it's a public address and you can't bind to it or someone else
  is using it.
 
  J-D
 
  On Thu, May 23, 2013 at 3:24 PM, Yves S. Garret
  yoursurrogate...@gmail.com wrote:
   Here is my dump of the sole log file in the logs directory:
   http://bin.cakephp.org/view/2116332048
  
  
   On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans 
 jdcry...@apache.org
  wrote:
  
   On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com
 wrote:
1) Should hbase-master be changed to localhost?
   
Maybe Try changing /etc/hosts to match the actual non loopback ip
 of
   your machine... (i.e. just run Ifconfig | grep 1 and see what ip
 comes
  out
   :))
 and make sure your /etc/hosts matches the file in my blog post,
 (you
   need hbase-master to be defined in your /etc/hosts...).
  
   hbase.master was dropped around 2009 now that we have zookeeper. So
   you can set it to whatever you want, it won't change anything :)
  
   
2) zookeeper parent seems bad..
   
Change hbase-rootdir to hbase (in hbase.rootdir) so that it's
   consistent with what you defined in zookeeper parent node.
  
   Those two are really unrelated, /hbase is the default so no need to
   override it, and I'm guessing that hbase.rootdir is somewhere
 writable
   so that's all good.
  
   Now, regarding the "Check the value configured in
   'zookeeper.znode.parent'", it's triggered when the client wants to
 read
   the /hbase znode in ZooKeeper but it's unable to. If it doesn't
 exist,
   it might be because your HBase is homed elsewhere. It could also be
   that HBase isn't running at all so the Master never got to create it.
  
   BTW you can start the shell with -d and it's gonna give more info and
   dump all the stack traces.
  
   Going by this thread I would guess that HBase isn't running so the
   shell won't help. Another way to check is pointing your browser to
   localhost:60010 and see if the master is responding. If not, time to
   open up the log and see what's up.
  
   J-D
  
 



Re: querying hbase

2013-05-23 Thread James Taylor
Actually, with the great work you guys have been doing and the 
resolution of HBASE-1936 by Jimmy Xiang, we'll be able to ease the 
installation of Phoenix in our next release. You'll still need to bounce 
the region servers to reload our custom filters and coprocessors, but 
you won't need to manually add the phoenix jar to the hbase classpath on 
each region server (as long as the installing user has permission to 
write into HDFS).


Has there been any discussions on running the HBase server in an OSGi 
container? That would potentially even alleviate the need to bounce the 
region servers. I didn't see a JIRA, so I created this one: 
https://issues.apache.org/jira/browse/HBASE-8607


Thanks,
James

On 05/23/2013 04:17 PM, Jean-Marc Spaggiari wrote:

Hi James,

Thanks for joining the thread to provide more feedback and valuable
information about Phoenix. I don't have a big knowledge on it, so better to
see you around.

The only thing I was referring to is that the applications I sent the links
for are simple jars that you can download locally and run without requiring
any specific rights to install/upload anything on any server. Just download
and click on it.

I might be wrong because I have not tried Phoenix yet, but I think you need to
upload the JAR to all the region servers first, and then restart them,
right? People might not have the rights to do that. That's why I thought
Phoenix was overkill when the need is just to list a table's content on a
screen.

JM

2013/5/22 James Taylor jtay...@salesforce.com


Hey JM,
Can you expand on what you mean? Phoenix is a single jar, easily deployed
to any HBase cluster. It can map to existing HBase tables or create new
ones. It allows you to use SQL (a fairly popular language) to query your
data, and it surfaces its functionality as a JDBC driver so that it can
interop with the SQL ecosystem (which has been around for a while).
Thanks,
James


On 05/21/2013 08:41 PM, Jean-Marc Spaggiari wrote:


Using Phoenix for that is like trying to kill a mosquito with an atomic
bomb, no? ;)

Few easy to install and use tools which I already tried:
 - http://sourceforge.net/projects/haredbhbaseclie/files/
 - http://sourceforge.net/projects/hbasemanagergui/
 - https://github.com/NiceSystems/hrider/wiki

There might be others, but those at least do the basic things to
look into your tables.

JM

2013/5/21 lars hofhansl la...@apache.org

  Maybe Phoenix (http://phoenix-hbase.blogspot.com/) is what you are
looking for.

-- Lars

________________________________
From: Aji Janis aji1...@gmail.com
To: user user@hbase.apache.org
Sent: Tuesday, May 21, 2013 3:43 PM
Subject: Re: querying hbase


I haven't tried that because I don't know how to. Still, I think I am
looking for a nice GUI interface that can take in HBase connection info and
help me view the data, something like pgAdmin (or its php version), SQL
Developer, etc.


On Tue, May 21, 2013 at 6:16 PM, Viral Bajaria viral.baja...@gmail.com


wrote:
The shell allows you to use filters just like the standard HBase API, but
with JRuby syntax. Have you tried that, or is that too painful and you
want a simpler tool?

-Viral

On Tue, May 21, 2013 at 2:58 PM, Aji Janis aji1...@gmail.com wrote:

  are there any tools out there that can help in visualizing data stored in
Hbase? I know the shell lets you do basic stuff. But if I don't know what
rowid I am looking for, or if I want rows with family say *name* (yes, SQL
like), are there any tools that can help with this? Not trying to use this
on production (although that would be nice), just dev env for now. Thank you
for any suggestions.






Re: HBase is not running.

2013-05-23 Thread Yves S. Garret
Ok, I didn't see that in hbase-0.94.7/docs/book.html. After doing a more
thorough search, I found it on line 293 of:
hbase-0.94.7/docs/xref/org/apache/hadoop/hbase/master/HMaster.html

I'll make the change in hbase-site.xml.


On Thu, May 23, 2013 at 7:35 PM, Jean-Daniel Cryans jdcry...@apache.orgwrote:

 No, I meant hbase.master.ipc.address and
 hbase.regionserver.ipc.address. See
 https://issues.apache.org/jira/browse/HBASE-8148.

 J-D

 On Thu, May 23, 2013 at 4:34 PM, Yves S. Garret
 yoursurrogate...@gmail.com wrote:
  Do you mean hbase.master.info.bindAddress and
  hbase.regionserver.info.bindAddress?  I couldn't find
  anything else in the docs.  But having said that, both
  are set to 0.0.0.0 by default.
 
  Also, I checked out 127.0.0.1:60010 and 0.0.0.0:60010,
  no web gui.
 
 
  On Thu, May 23, 2013 at 7:19 PM, Jean-Daniel Cryans jdcry...@apache.org
 wrote:
 
  It should only be a matter of network configuration and not a matter
  of whether you are a Hadoop expert or not. HBase is just trying to get
  the machine's hostname and bind to it and in your case it's given
  something it cannot use. It's unfortunate.
 
  IIUC your machine is hosted on cox.net? And it seems that while
  providing that machine they at some point set it up so that its
  hostname would resolve to a public address. Sounds like a
  misconfiguration. Anyways, you can edit your /etc/hosts so that your
  hostname points to 127.0.0.1 or, since you are using 0.94.7, set both
  hbase.master.ipc.address and hbase.regionserver.ipc.address to 0.0.0.0
  in your hbase-site.xml so that it binds on the wildcard address
  instead.
 
  J-D
 
  On Thu, May 23, 2013 at 4:07 PM, Yves S. Garret
  yoursurrogate...@gmail.com wrote:
   How weird.  Admittedly I'm not terribly knowledgeable about Hadoop
   and all of its sub-projects, but I don't recall ever setting any
  networking
   info to something other than localhost.  What would cause this?
  
  
   On Thu, May 23, 2013 at 6:26 PM, Jean-Daniel Cryans 
 jdcry...@apache.org
  wrote:
  
   That's your problem:
  
   Caused by: java.net.BindException: Problem binding to
   ip72-215-225-9.at.at.cox.net/72.215.225.9:0 : Cannot assign
 requested
   address
  
   Either it's a public address and you can't bind to it or someone else
   is using it.
  
   J-D
  
   On Thu, May 23, 2013 at 3:24 PM, Yves S. Garret
   yoursurrogate...@gmail.com wrote:
Here is my dump of the sole log file in the logs directory:
http://bin.cakephp.org/view/2116332048
   
   
On Thu, May 23, 2013 at 6:20 PM, Jean-Daniel Cryans 
  jdcry...@apache.org
   wrote:
   
On Thu, May 23, 2013 at 2:50 PM, Jay Vyas jayunit...@gmail.com
  wrote:
 1) Should hbase-master be changed to localhost?

 Maybe Try changing /etc/hosts to match the actual non loopback
 ip
  of
your machine... (i.e. just run Ifconfig | grep 1 and see what ip
  comes
   out
:))
  and make sure your /etc/hosts matches the file in my blog post,
  (you
need hbase-master to be defined in your /etc/hosts...).
   
hbase.master was dropped around 2009 now that we have zookeeper.
 So
you can set it to whatever you want, it won't change anything :)
   

 2) zookeeper parent seems bad..

 Change hbase-rootdir to hbase (in hbase.rootdir) so that it's
consistent with what you defined in zookeeper parent node.
   
Those two are really unrelated, /hbase is the default so no need
 to
override it, and I'm guessing that hbase.rootdir is somewhere
  writable
so that's all good.
   
Now, regarding the "Check the value configured in
'zookeeper.znode.parent'", it's triggered when the client wants to
  read
the /hbase znode in ZooKeeper but it's unable to. If it doesn't
  exist,
it might be because your HBase is homed elsewhere. It could also
 be
that HBase isn't running at all so the Master never got to create
 it.
   
BTW you can start the shell with -d and it's gonna give more info
 and
dump all the stack traces.
   
Going by this thread I would guess that HBase isn't running so the
shell won't help. Another way to check is pointing your browser to
localhost:60010 and see if the master is responding. If not, time
 to
open up the log and see what's up.
   
J-D