Re: HBase is not running.

2013-04-26 Thread Jean-Marc Spaggiari
Hi Yves,

You need to add an entry with your host name and your local IP.

As an example, here is mine:

127.0.0.1   localhost
192.168.23.2    buldo

My host name is buldo.
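In the same way, the machine here just needs a line mapping its own LAN IP to its
own host name, for example (placeholder values):

192.168.1.X    yourhostname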

JM

2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
 Hi Jean, this is my /etc/hosts.

 127.0.0.1   localhost localhost.localdomain localhost4
 localhost4.localdomain4
 127.0.0.1   localhost
 ::1 localhost localhost.localdomain localhost6
 localhost6.localdomain6


 On Thu, Apr 25, 2013 at 5:22 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

 Hi Yves,

 You seem to have some network configuration issue with your installation.

 java.net.BindException: Cannot assign requested address and
 ip72-215-225-9.at.at.cox.net/72.215.225.9:0

 How is your host file configured? You need to have your host name
 pointing to your local IP (and not 127.0.0.1).

 2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
  My mistake.  I thought I had all of those logs.  This is what I currently
  have:
  http://bin.cakephp.org/view/2112130549
 
  I have $JAVA_HOME set to this:
  /usr/java/jdk1.7.0_17
  I have extracted 0.94 and ran bin/start-hbase.sh
 
  Thanks for your help!
 
 
 
  On Thu, Apr 25, 2013 at 4:42 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
  Hi Mohammad,
 
  He is running standalone, so no need to update the zookeeper quorum yet.
 
  Yes, can you share the entire hbase-ysg-master-ysg.connect.log file?
  Not just the first lines. Or is what you sent already everything?
 
  So what have you done so far? Downloaded 0.94, extracted it, set up
  JAVA_HOME and ran bin/start-hbase.sh?
 
  JMS
 
  2013/4/25 Mohammad Tariq donta...@gmail.com:
   Hello Yves,
  
   The log seems to be incomplete. Could you please share the complete
   logs? Have you set the hbase.zookeeper.quorum property properly? Is your
   Hadoop running fine?
  
   Warm Regards,
   Tariq
   https://mtariq.jux.com/
   cloudfront.blogspot.com
  
  
   On Fri, Apr 26, 2013 at 2:00 AM, Yves S. Garret
    yoursurrogate...@gmail.com wrote:
  
   Hi again.  I have 3 log files and only one of them had anything in
 them,
   here are the file names.  I'm assuming that you're talking about the
   directory ${APACHE_HBASE_HOME}/logs, yes?
  
   Here are the file names:
   -rw-rw-r--. 1 user user 12465 Apr 25 14:54
  hbase-ysg-master-ysg.connect.log
   -rw-rw-r--. 1 user user 0 Apr 25 14:54
  hbase-ysg-master-ysg.connect.out
   -rw-rw-r--. 1 user user 0 Apr 25 14:54 SecurityAuth.audit
  
    Also, to answer your question about the UI, I tried that URL (I'm doing
    all of this on my laptop just to learn at the moment) and neither the URL
    nor localhost:60010 worked.  So, the answer to your question is that the
    UI is not showing up.  This could be due to not being far along in the
    tutorial, perhaps?
  
   Thanks again!
  
  
   On Thu, Apr 25, 2013 at 4:22 PM, Jean-Marc Spaggiari 
   jean-m...@spaggiari.org wrote:
  
There is no stupid question ;)
   
 Are the logs truncated? Is there anything else after that, or is that all
 you have?
   
For the UI, you can access it with
   http://192.168.X.X:60010/master-status
   
Replace the X with your own IP. You should see some information
 about
your HBase cluster (even in Standalone mode).
   
JMS
   
2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
 Here are the logs, what should I be looking for?  Seems like
  everything
 is fine for the moment, no?

 http://bin.cakephp.org/view/2144893539

 The web UI?  What do you mean?  Sorry if this is a stupid
 question,
  I'm
 a Hadoop newb.

 On Thu, Apr 25, 2013 at 3:19 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

 Before trying the shell, can you look at the server logs and
 see if
 everything is fine?

 Also, is the web UI working fine?

 2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
  Ok, spoke too soon :) .
 
  I ran this command [ create 'test', 'cf' ] and this is the
 result
that I
  got:
  http://bin.cakephp.org/view/168926019
 
   This is after running 'help' and having it run just fine.
 
 
  On Thu, Apr 25, 2013 at 1:23 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
  Hi Yves,
 
   0.95.0 is a developer version. If you are starting with HBase, I
   would recommend choosing a more stable version like 0.94.6.1.
 
   Regarding the 3 choices you listed below:
  1) This one is HBase 0.95 running over Hadoop 1.0
  2) This one is HBase 0.95 running over Hadoop 2.0
   3) This one is the HBase source code.
 
   Again, I think you are better off going with a stable version for the
   first steps:
  http://www.bizdirusa.com/mirrors/apache/hbase/stable/
 
   Would you mind retrying your tests with this version and letting me
   know if it works better?
 
  JM
 
  2013/4/25 Yves S. 

Re: undefined method `internal_command' for Shell::Formatter::Console

2013-04-26 Thread Robin Gowin
Ok thanks for the clarification.

I tried that (removing ruby but not yet re-installing it) and I got the
same error message.


On Thu, Apr 25, 2013 at 5:02 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Hi Robin,

 No, the idea is to run yum remove, and then test the HBase shell.
 Don't run yum install ruby until we get that fixed. I want to see if
 your installed version of Ruby could be causing the issue.

 The 'it' was referring to the Ruby package.

 JM

 2013/4/25 Robin Gowin landr...@gmail.com:
  To be more explicit:
 
  I'm running CentOS release 6.4 in a VM on Mac OS X 10.6.
  I ran yum remove ruby and then yum install ruby (inside the VM). Is that
  what you meant?
 
  Also I put in some simple print statements in several of the ruby
  scripts called by the hbase shell, and they are getting executed.
  (for example: admin.rb, hbase.rb, and table.rb)
 
  (I wasn't sure what it referred to in your email)
 
  Robin
 
 
  On Thu, Apr 25, 2013 at 3:58 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
  No, don't re-install it ;)
 
  Remove it and retry. To make sure it's not using any lib anywhere
 else...
 
  JM
 
  2013/4/25 Robin Gowin landr...@gmail.com:
   I removed ruby and reinstalled it; same results.
  
   On Thu, Apr 25, 2013 at 11:59 AM, Jean-Marc Spaggiari 
   jean-m...@spaggiari.org wrote:
  
   Is it easy for you to de-install it and re-install it? If so, would
   you mind giving it a try?
  



Re: HBase is not running.

2013-04-26 Thread Yves S. Garret
Hi, thanks for your reply.

I ran [ hostname ] on my Linux OS and this is what I have for a
hostname: [ ysg.connect ].

This is what my hosts file looks like.
127.0.0.1   localhost localhost.localdomain localhost4
localhost4.localdomain4
127.0.0.1   localhost
192.168.1.6 ysg.connect
::1 localhost localhost.localdomain localhost6
localhost6.localdomain6

Now, I fired up the shell and this is the result that I got when I
tried to execute [ create 'test', 'cf' ].  This is the error that I got:
http://bin.cakephp.org/view/1016732333

The weird thing is that after starting the shell, executing that
command, having it error out, and then exiting the shell, I checked
the logs and... nothing was displayed.  It's as if nothing was stored.


On Fri, Apr 26, 2013 at 7:12 AM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Hi Yves,

 You need to add an entry with your host name and your local IP.

 As an example, here is mine:

 127.0.0.1   localhost
 192.168.23.2    buldo

 My host name is buldo.

 JM

 2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
  Hi Jean, this is my /etc/hosts.
 
  127.0.0.1   localhost localhost.localdomain localhost4
  localhost4.localdomain4
  127.0.0.1   localhost
  ::1 localhost localhost.localdomain localhost6
  localhost6.localdomain6
 
 
  On Thu, Apr 25, 2013 at 5:22 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
  Hi Yves,
 
  You seem to have some network configuration issue with your
 installation.
 
  java.net.BindException: Cannot assign requested address and
  ip72-215-225-9.at.at.cox.net/72.215.225.9:0
 
  How is your host file configured? You need to have your host name
  pointing to your local IP (and not 127.0.0.1).
 
  2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
   My mistake.  I thought I had all of those logs.  This is what I
 currently
   have:
   http://bin.cakephp.org/view/2112130549
  
   I have $JAVA_HOME set to this:
   /usr/java/jdk1.7.0_17
   I have extracted 0.94 and ran bin/start-hbase.sh
  
   Thanks for your help!
  
  
  
   On Thu, Apr 25, 2013 at 4:42 PM, Jean-Marc Spaggiari 
   jean-m...@spaggiari.org wrote:
  
   Hi Mohammad,
  
   He is running standalone, so no need to update the zookeeper quorum
 yet.
  
   Yes, can you share the entire hbase-ysg-master-ysg.connect.log file?
   Not just the first lines. Or is what you sent already everything?
  
   So what have you done so far? Downloaded 0.94, extracted it, set up
   JAVA_HOME and ran bin/start-hbase.sh?
  
   JMS
  
   2013/4/25 Mohammad Tariq donta...@gmail.com:
Hello Yves,
   
    The log seems to be incomplete. Could you please share the complete
    logs? Have you set the hbase.zookeeper.quorum property properly? Is your
    Hadoop running fine?
   
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
   
   
On Fri, Apr 26, 2013 at 2:00 AM, Yves S. Garret
 yoursurrogate...@gmail.com wrote:
   
Hi again.  I have 3 log files and only one of them had anything in
  them,
here are the file names.  I'm assuming that you're talking about
 the
directory ${APACHE_HBASE_HOME}/logs, yes?
   
Here are the file names:
-rw-rw-r--. 1 user user 12465 Apr 25 14:54
   hbase-ysg-master-ysg.connect.log
-rw-rw-r--. 1 user user 0 Apr 25 14:54
   hbase-ysg-master-ysg.connect.out
-rw-rw-r--. 1 user user 0 Apr 25 14:54 SecurityAuth.audit
   
Also, to answer your question about the UI, I tried that URL (I'm
  doing
   all
of this on my laptop just to learn at the moment) and neither the
 URL
   nor
localhost:60010 worked.  So, the answer to your question is that
 the
  UI
   is
not showing up.  This could be due to not being far along in the
   tutorial,
perhaps?
   
Thanks again!
   
   
On Thu, Apr 25, 2013 at 4:22 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:
   
 There is no stupid question ;)

  Are the logs truncated? Is there anything else after that, or is that all
  you have?

 For the UI, you can access it with
http://192.168.X.X:60010/master-status

 Replace the X with your own IP. You should see some information
  about
 your HBase cluster (even in Standalone mode).

 JMS

 2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
  Here are the logs, what should I be looking for?  Seems like
   everything
  is fine for the moment, no?
 
  http://bin.cakephp.org/view/2144893539
 
  The web UI?  What do you mean?  Sorry if this is a stupid
  question,
   I'm
  a Hadoop newb.
 
  On Thu, Apr 25, 2013 at 3:19 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
  Before trying the shell, can you look at the server logs and
  see if
  everything is fine?
 
  Also, is the web UI working fine?
 
  2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
   Ok, spoke too soon :) .
  
   I ran this 

Schema Design Question

2013-04-26 Thread Cameron Gandevia
Hi

I am new to HBase. I have been trying to POC an application and have a
design question.

Currently we have a single table with the following key design

jobId_batchId_bundleId_uniquefileId

This is an offline processing system so data would be bulk loaded into
HBase via map/reduce jobs. We only need to support report generation
queries using map/reduce over a batch (And possibly a single column filter)
with the batchId as the start/end scan key. Once we have finished
processing a job we are free to remove the data from HBase.

We have varied workloads so a job could be made up of 10 rows, 100,000 rows
or 1 billion rows with the average falling somewhere around 10 million rows.

My question is related to pre-splitting. If we have a billion rows all with
the same batchId (Our map/reduce scan key) my understanding is we should
perform pre-splitting to create buckets hosted by different regions. If a
job's workload can be so varied, would it make sense to have a single table
containing all jobs? Or should we create 1 table per job and pre-split the
table for the given workload? If we had separate tables we could drop them
when no longer needed.

If we didn't have a separate table per job how should we perform splitting?
Should we choose our largest possible workload and split for that, even
though 90% of our jobs would fall in the lower bound in terms of row count?
Would we experience any issue purging jobs of varying sizes if everything
was in a single table?

Any advice would be greatly appreciated.

Thanks


Re: HBase is not running.

2013-04-26 Thread Leonid Fedotov
Looks like your zookeeper configuration is incorrect in HBase.

Check it out.
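For reference, in a standalone install those settings live in conf/hbase-site.xml; a
minimal file usually only needs something like the following (paths are placeholders):

hbase.rootdir = file:///home/<user>/hbase
hbase.zookeeper.property.dataDir = /home/<user>/zookeeper

plus hbase.zookeeper.quorum if a quorum has been configured explicitly.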

Thank you!

Sincerely,
Leonid Fedotov
Technical Support Engineer

On Apr 26, 2013, at 9:59 AM, Yves S. Garret wrote:

 Hi, thanks for your reply.
 
 I did [ hostname ] in my linux OS and this is what I have for a
 hostname [ ysg.connect ].
 
 This is what my hosts file looks like.
 127.0.0.1   localhost localhost.localdomain localhost4
 localhost4.localdomain4
 127.0.0.1   localhost
 192.168.1.6 ysg.connect
 ::1 localhost localhost.localdomain localhost6
 localhost6.localdomain6
 
 Now, I fired up the shell and this is the result that I got when I
 tried to execute [ create 'test', 'cf' ].  This is the error that I got:
 http://bin.cakephp.org/view/1016732333
 
 The weird thing is that after starting the shell, executing that
 command, having that command error out and keep going and
 then exiting the command, I checked the logs and... nothing
 was displayed.  It's as if nothing was stored.
 
 
 On Fri, Apr 26, 2013 at 7:12 AM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:
 
 Hi Yves,
 
 You need to add an entry with your host name and your local IP.
 
 As an example, here is mine:
 
 127.0.0.1   localhost
 192.168.23.2    buldo
 
 My host name is buldo.
 
 JM
 
 2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
 Hi Jean, this is my /etc/hosts.
 
 127.0.0.1   localhost localhost.localdomain localhost4
 localhost4.localdomain4
 127.0.0.1   localhost
 ::1 localhost localhost.localdomain localhost6
 localhost6.localdomain6
 
 
 On Thu, Apr 25, 2013 at 5:22 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:
 
 Hi Yves,
 
 You seem to have some network configuration issue with your
 installation.
 
 java.net.BindException: Cannot assign requested address and
 ip72-215-225-9.at.at.cox.net/72.215.225.9:0
 
 How is your host file configured? You need to have your host name
 pointing to your local IP (and not 127.0.0.1).
 
 2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
 My mistake.  I thought I had all of those logs.  This is what I
 currently
 have:
 http://bin.cakephp.org/view/2112130549
 
 I have $JAVA_HOME set to this:
 /usr/java/jdk1.7.0_17
 I have extracted 0.94 and ran bin/start-hbase.sh
 
 Thanks for your help!
 
 
 
 On Thu, Apr 25, 2013 at 4:42 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:
 
 Hi Mohammad,
 
 He is running standalone, so no need to update the zookeeper quorum
 yet.
 
 Yes, can you share the entire hbase-ysg-master-ysg.connect.log file?
 Not just the first lines. Or is what you sent already everything?
 
 So what have you done so far? Downloaded 0.94, extracted it, set up
 JAVA_HOME and ran bin/start-hbase.sh?
 
 JMS
 
 2013/4/25 Mohammad Tariq donta...@gmail.com:
 Hello Yves,
 
 The log seems to be incomplete. Could you please share the complete
 logs? Have you set the hbase.zookeeper.quorum property properly? Is your
 Hadoop running fine?
 
 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com
 
 
 On Fri, Apr 26, 2013 at 2:00 AM, Yves S. Garret
 yoursurrogate...@gmail.com wrote:
 
 Hi again.  I have 3 log files and only one of them had anything in
 them,
 here are the file names.  I'm assuming that you're talking about
 the
 directory ${APACHE_HBASE_HOME}/logs, yes?
 
 Here are the file names:
 -rw-rw-r--. 1 user user 12465 Apr 25 14:54
 hbase-ysg-master-ysg.connect.log
 -rw-rw-r--. 1 user user 0 Apr 25 14:54
 hbase-ysg-master-ysg.connect.out
 -rw-rw-r--. 1 user user 0 Apr 25 14:54 SecurityAuth.audit
 
 Also, to answer your question about the UI, I tried that URL (I'm
 doing
 all
 of this on my laptop just to learn at the moment) and neither the
 URL
 nor
 localhost:60010 worked.  So, the answer to your question is that
 the
 UI
 is
 not showing up.  This could be due to not being far along in the
 tutorial,
 perhaps?
 
 Thanks again!
 
 
 On Thu, Apr 25, 2013 at 4:22 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:
 
 There is no stupid question ;)
 
 Are the logs truncated? Is there anything else after that, or is that all
 you have?
 
 For the UI, you can access it with
 http://192.168.X.X:60010/master-status
 
 Replace the X with your own IP. You should see some information
 about
 your HBase cluster (even in Standalone mode).
 
 JMS
 
 2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
 Here are the logs, what should I be looking for?  Seems like
 everything
 is fine for the moment, no?
 
 http://bin.cakephp.org/view/2144893539
 
 The web UI?  What do you mean?  Sorry if this is a stupid
 question,
 I'm
 a Hadoop newb.
 
 On Thu, Apr 25, 2013 at 3:19 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:
 
 Before trying the shell, can you look at the server logs and
 see if
 everything is fine?
 
 Also, is the web UI working fine?
 
 2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
 Ok, spoke too soon :) .
 
 I ran this command [ create 'test', 'cf' ] and this is the
 result
 that I
 got:
 

Re: Snapshot Export Problem

2013-04-26 Thread Sean MacDonald
Hi Jon, 

I've actually discovered another issue with snapshot export. If you have a 
region that has recently split and you take a snapshot of that table and try to 
export it while the children still have references to the files in the split 
parent, the files will not be transferred and will be counted in the missing 
total. You end up with error messages like:

java.io.FileNotFoundException: Unable to open link: 
org.apache.hadoop.hbase.io.HLogLink

Please let me know if you would like any additional information.

Thanks and have a great day,

Sean 


On Wednesday, 24 April, 2013 at 9:19 AM, Sean MacDonald wrote:

 Hi Jon, 
 
 No problem. We do have snapshots enabled on the target cluster, and we are 
 using the default hfile archiver settings on both clusters.
 
 Thanks,
 
 Sean 
 
 
 On Tuesday, 23 April, 2013 at 1:54 PM, Jonathan Hsieh wrote:
 
  Sean,
  
  Thanks for finding this problem. Can you provide some more information so
  that we can try to duplicate and fix this problem?
  
  Are snapshots enabled on the target cluster?
  What are the hfile archiver settings in your hbase-site.xml on both
  clusters?
  
  Thanks,
  Jon.
  
  
  On Mon, Apr 22, 2013 at 4:47 PM, Sean MacDonald s...@opendns.com 
  (mailto:s...@opendns.com) wrote:
  
   It looks like you can't export a snapshot to a running cluster or it will
   start cleaning up files from the archive after a period of time. I have
   turned off HBase on the destination cluster and the export is working as
   expected now.
   
   Sean
   
   
   On Monday, 22 April, 2013 at 9:22 AM, Sean MacDonald wrote:
   
Hello,

I am using HBase 0.94.6 on CDH 4.2 and trying to export a snapshot to
   another cluster (also CDH 4.2), but this is failing repeatedly. The table 
   I
   am trying to export is approximately 4TB in size and has 10GB regions. 
   Each
   of the map jobs runs for about 6 minutes and appears to be running
   properly, but then fails with a message like the following:

2013-04-22 16:12:50,699 WARN org.apache.hadoop.hdfs.DFSClient:
   DataStreamer Exception
   org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
   No lease on
   /hbase/.archive/queries/533fcbb7858ef34b103a4f8804fa8719/d/651e974dafb64eefb9c49032aec4a35b
   File does not exist. Holder DFSClient_NONMAPREDUCE_-192704511_1 does not
   have any open files. at
   org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
   at
   org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
   at
   org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2183)
   at
   org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:481)
   at
   org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
   at
   org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtoc
   ol
$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080) at
   
   
   
   
   org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002) at
   org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695) at
   org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691) at
   java.security.AccessController.doPrivileged(Native Method) at
   javax.security.auth.Subject.doAs(Subject.java:396) at
   org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

I was able to see the file that the LeaseExpiredException mentions on
   the destination cluster before the exception happened (it is gone
   afterwards).

Any help that could be provided in resolving this would be greatly
   appreciated.

Thanks and have a great day,

Sean
  
  
  -- 
  // Jonathan Hsieh (shay)
  // Software Engineer, Cloudera
  // j...@cloudera.com (mailto:j...@cloudera.com)
 





Re: Snapshot Export Problem

2013-04-26 Thread Matteo Bertozzi
Hey Sean,

could you provide us the full stack trace of the FileNotFoundException
(Unable to open link), and also the output of:
hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -files -stats -snapshot SNAPSHOT_NAME
to give us a better idea of the state of the snapshot?
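For example, with a hypothetical snapshot name:

hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -files -stats -snapshot my_table_snapshot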

Thanks!


On Fri, Apr 26, 2013 at 9:51 PM, Sean MacDonald s...@opendns.com wrote:

 Hi Jon,

 I've actually discovered another issue with snapshot export. If you have a
 region that has recently split and you take a snapshot of that table and
 try to export it while the children still have references to the files in
 the split parent, the files will not be transferred and will be counted in
 the missing total. You end up with error messages like:

 java.io.FileNotFoundException: Unable to open link:
 org.apache.hadoop.hbase.io.HLogLink

 Please let me know if you would like any additional information.

 Thanks and have a great day,

 Sean


 On Wednesday, 24 April, 2013 at 9:19 AM, Sean MacDonald wrote:

  Hi Jon,
 
  No problem. We do have snapshots enabled on the target cluster, and we
 are using the default hfile archiver settings on both clusters.
 
  Thanks,
 
  Sean
 
 
  On Tuesday, 23 April, 2013 at 1:54 PM, Jonathan Hsieh wrote:
 
   Sean,
  
   Thanks for finding this problem. Can you provide some more information
 so
   that we can try to duplicate and fix this problem?
  
    Are snapshots enabled on the target cluster?
   What are the hfile archiver settings in your hbase-site.xml on both
   clusters?
  
   Thanks,
   Jon.
  
  
    On Mon, Apr 22, 2013 at 4:47 PM, Sean MacDonald s...@opendns.com (mailto:
 s...@opendns.com) wrote:
  
It looks like you can't export a snapshot to a running cluster or it
 will
start cleaning up files from the archive after a period of time. I
 have
turned off HBase on the destination cluster and the export is
 working as
expected now.
   
Sean
   
   
On Monday, 22 April, 2013 at 9:22 AM, Sean MacDonald wrote:
   
 Hello,

 I am using HBase 0.94.6 on CDH 4.2 and trying to export a snapshot
 to
another cluster (also CDH 4.2), but this is failing repeatedly. The
 table I
am trying to export is approximately 4TB in size and has 10GB
 regions. Each
of the map jobs runs for about 6 minutes and appears to be running
properly, but then fails with a message like the following:

 2013-04-22 16:12:50,699 WARN org.apache.hadoop.hdfs.DFSClient:
DataStreamer Exception
   
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on
   
 /hbase/.archive/queries/533fcbb7858ef34b103a4f8804fa8719/d/651e974dafb64eefb9c49032aec4a35b
File does not exist. Holder DFSClient_NONMAPREDUCE_-192704511_1 does
 not
have any open files. at
   
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
at
   
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
at
   
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2183)
at
   
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:481)
at
   
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
at
   
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtoc
ol
 $2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080) at
   
   
   
   
   
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695) at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691) at
java.security.AccessController.doPrivileged(Native Method) at
javax.security.auth.Subject.doAs(Subject.java:396) at
   
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

 I was able to see the file that the LeaseExpiredException mentions
 on
the destination cluster before the exception happened (it is gone
afterwards).

 Any help that could be provided in resolving this would be greatly
appreciated.

 Thanks and have a great day,

 Sean
  
  
   --
   // Jonathan Hsieh (shay)
   // Software Engineer, Cloudera
   // j...@cloudera.com (mailto:j...@cloudera.com)
 






Re: Schema Design Question

2013-04-26 Thread Ted Yu
My understanding of your use case is that data for different jobIds would
be continuously loaded into the underlying table(s).

Looks like you can have one table per job. This way you can drop the table
after the map reduce job is complete. In the single-table approach, you would
delete many rows in the table, which is not as fast as dropping the separate
table.
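As a rough sketch (hypothetical table name, 0.94-era HBaseAdmin API), cleaning up a
finished job would then just be:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class DropJobTable {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    String table = "job_12345";    // hypothetical per-job table name
    admin.disableTable(table);     // a table must be disabled before it can be deleted
    admin.deleteTable(table);
    admin.close();
  }
}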

Cheers

On Sat, Apr 27, 2013 at 3:49 AM, Cameron Gandevia cgande...@gmail.com wrote:

 Hi

 I am new to HBase. I have been trying to POC an application and have a
 design question.

 Currently we have a single table with the following key design

 jobId_batchId_bundleId_uniquefileId

 This is an offline processing system so data would be bulk loaded into
 HBase via map/reduce jobs. We only need to support report generation
 queries using map/reduce over a batch (And possibly a single column filter)
 with the batchId as the start/end scan key. Once we have finished
 processing a job we are free to remove the data from HBase.

 We have varied workloads so a job could be made up of 10 rows, 100,000 rows
 or 1 billion rows with the average falling somewhere around 10 million
 rows.

 My question is related to pre-splitting. If we have a billion rows all with
 the same batchId (Our map/reduce scan key) my understanding is we should
 perform pre-splitting to create buckets hosted by different regions. If a
 job's workload can be so varied, would it make sense to have a single table
 containing all jobs? Or should we create 1 table per job and pre-split the
 table for the given workload? If we had separate tables we could drop them
 when no longer needed.

 If we didn't have a separate table per job how should we perform splitting?
 Should we choose our largest possible workload and split for that, even
 though 90% of our jobs would fall in the lower bound in terms of row count?
 Would we experience any issue purging jobs of varying sizes if everything
 was in a single table?

 Any advice would be greatly appreciated.

 Thanks



Re: Dual Hadoop/HBase configuration through same client

2013-04-26 Thread Ted Yu
Looks like the easiest solution is to use separate clients, one for each
cluster you want to connect to.

Cheers

On Sat, Apr 27, 2013 at 6:51 AM, Shahab Yunus shahab.yu...@gmail.com wrote:

 Hello,

 This is a follow-up to my previous post a few days back. I am trying to
 connect to 2 different Hadoop clusters' setups through the same client, but I
 am running into the issue that the config of one overwrites the other.

 The scenario is that I want to read data from an HBase table from one
 cluster and write it as a file on HDFS on the other. Individually, if I try
 to write to them they both work, but when I try this through the same Java
 client, they fail.

 I have tried loading the core-site.xml through the addResource method of the
 Configuration class, but only the first config file found is picked up. I have
 also tried renaming the config files and then adding them as resources
 (again through the addResource method).

 The situation is compounded by the fact that one cluster is using Kerberos
 authentication and the other is not. If the Kerberos cluster's file is found
 first, then authentication failures occur for the other server when
 Hadoop tries to find client authentication information. If the 'simple'
 cluster's config is loaded first, then an 'Authentication is Required' error is
 encountered against the Kerberos server.

 I will gladly provide more information. Is it even possible if, let us
 say, both servers have the same security configuration, or none? Any ideas?
 Thanks a million.

 Regards,
 Shahab



Re: Schema Design Question

2013-04-26 Thread Enis Söztutar
Hi,

Interesting use case. I think it depends on how many jobIds you expect to
have. If it is on the order of thousands, I would caution against going the
one-table-per-jobId approach, since for every table there is some master
overhead, as well as file structures in HDFS. If the number of jobIds is
manageable, going with separate tables makes sense if you want to efficiently
delete all the data related to a job.

Also, pre-splitting will depend on the expected number of jobIds / batchIds and
their ranges vs. the desired number of regions. You would want to keep the
number of regions hosted by a single region server in the low tens; thus, your
splits can be across jobs or within jobs depending on cardinality. Can you share
some more?
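As a rough illustration of that (hypothetical table name and split points, 0.94-era
HBaseAdmin API), creating a per-job table pre-split into ten bucket-prefixed regions
might look like:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreatePreSplitJobTable {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("job_12345");  // hypothetical per-job table
    desc.addFamily(new HColumnDescriptor("d"));
    // Pre-split into 10 regions on a one-byte bucket prefix prepended to the row key.
    byte[][] splitKeys = new byte[9][];
    for (int i = 1; i <= 9; i++) {
      splitKeys[i - 1] = new byte[] { (byte) i };
    }
    admin.createTable(desc, splitKeys);
    admin.close();
  }
}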

Enis


On Fri, Apr 26, 2013 at 2:34 PM, Ted Yu yuzhih...@gmail.com wrote:

 My understanding of your use case is that data for different jobIds would
 be continuously loaded into the underlying table(s).

 Looks like you can have one table per job. This way you drop the table
 after map reduce is complete. In the single table approach, you would
 delete many rows in the table which is not as fast as dropping the separate
 table.

 Cheers

 On Sat, Apr 27, 2013 at 3:49 AM, Cameron Gandevia cgande...@gmail.com
 wrote:

  Hi
 
  I am new to HBase, I have been trying to POC an application and have a
  design questions.
 
  Currently we have a single table with the following key design
 
  jobId_batchId_bundleId_uniquefileId
 
  This is an offline processing system so data would be bulk loaded into
  HBase via map/reduce jobs. We only need to support report generation
  queries using map/reduce over a batch (And possibly a single column
 filter)
  with the batchId as the start/end scan key. Once we have finished
  processing a job we are free to remove the data from HBase.
 
  We have varied workloads so a job could be made up of 10 rows, 100,000
 rows
  or 1 billion rows with the average falling somewhere around 10 million
  rows.
 
  My question is related to pre-splitting. If we have a billion rows all
 with
  the same batchId (Our map/reduce scan key) my understanding is we should
  perform pre-splitting to create buckets hosted by different regions. If a
  jobs workload can be so varied would it make sense to have a single table
  containing all jobs? Or should we create 1 table per job and pre-split
 the
  table for the given workload? If we had separate table we could drop them
  when no longer needed.
 
  If we didn't have a separate table per job how should we perform
 splitting?
  Should we choose our largest possible workload and split for that? even
  though 90% of our jobs would fall in the lower bound in terms of row
 count.
  Would we experience any issue purging jobs of varying sizes if everything
  was in a single table?
 
  any advice would be greatly appreciated.
 
  Thanks
 



Re: How practical is it to add a timestamp oracle on Zookeeper

2013-04-26 Thread Enis Söztutar
Hi,

I presume you have read the Percolator paper. The design there uses a
single TS oracle, and BigTable itself as the transaction manager. In Omid,
they also have a TS oracle, but I do not know how scalable it is. Using
ZK as the TS oracle would not work, since ZK can scale up to 40-50K
requests per second, while depending on the cluster size you could be
seeing much more than that, especially considering that all clients doing
reads and writes have to obtain a TS. Instead, what you want is a TS oracle
that can scale to millions of requests per second. This can be achieved by
the technique in the Percolator paper: pre-allocating a range of timestamps
by persisting the upper bound to disk, combined with an extremely lightweight
RPC. I do not know whether Omid provides this. There is a Twitter project,
https://github.com/twitter/snowflake, that you might want to look at.
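For illustration only, a minimal sketch of that pre-allocation technique (not
Percolator's or Omid's actual code; the persistence call is left as a stub) could
look like:

import java.util.concurrent.atomic.AtomicLong;

// Illustration of the range-pre-allocation idea: timestamps are handed out from
// memory, and only the upper bound of each block is made durable, so a restart
// can never reissue a timestamp (it resumes from the last persisted bound).
public class BlockTimestampOracle {
  private static final long BLOCK_SIZE = 1000000L;
  private final AtomicLong next = new AtomicLong();
  private volatile long persistedUpperBound;

  public synchronized void start(long lastPersistedBound) {
    long newBound = lastPersistedBound + BLOCK_SIZE;
    persistDurably(newBound);                  // stub, see below
    persistedUpperBound = newBound;
    next.set(lastPersistedBound + 1);
  }

  public long nextTimestamp() {
    long ts = next.getAndIncrement();
    while (ts >= persistedUpperBound) {        // rarely taken: once per block
      synchronized (this) {
        if (ts >= persistedUpperBound) {
          long newBound = persistedUpperBound + BLOCK_SIZE;
          persistDurably(newBound);            // make the new bound durable first
          persistedUpperBound = newBound;
        }
      }
    }
    return ts;
  }

  private void persistDurably(long bound) {
    // Placeholder: write 'bound' to stable storage (disk/ZK) and sync before returning.
  }
}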

Hope this helps.

Enis


On Sun, Apr 21, 2013 at 9:36 AM, Michel Segel michael_se...@hotmail.com wrote:

 Time is relative.
 What does the timestamp mean?

 Sounds like a simple question, but it's not. Is it the time your
 application says it wrote to HBase? Is it the time HBase first gets the
 row? Or is it the time that the row was written to the memstore?

 Each RS has its own clock in addition to your app server.


 Sent from a remote device. Please excuse any typos...

 Mike Segel

 On Apr 16, 2013, at 7:14 AM, yun peng pengyunm...@gmail.com wrote:

  Hi, All,
  I'd like to add a global timestamp oracle on Zookeeper to assign a globally
  unique timestamp to each Put/Get issued from the HBase cluster. The reason I
  put it on Zookeeper is that each Put/Get needs to go through it, and a unique
  timestamp needs some global centralised facility. But I am asking
  how practical this scheme is; has anyone used it in practice?
 
  Also, how difficult is it to extend Zookeeper, or to inject code into the
  code path of HBase inside Zookeeper? I know HBase has coprocessors on the
  region server to let programmers extend it without recompiling HBase itself.
  Does ZK allow such extensibility? Thanks.
 
  Regards
  Yun



Re: HBase is not running.

2013-04-26 Thread Yves S. Garret
Hi, but I don't understand what you mean.  Did I miss a step
in the tutorial?


On Fri, Apr 26, 2013 at 4:26 PM, Leonid Fedotov lfedo...@hortonworks.com wrote:

 Looks like your zookeeper configuration is incorrect in HBase.

 Check it out.

 Thank you!

 Sincerely,
 Leonid Fedotov
 Technical Support Engineer

 On Apr 26, 2013, at 9:59 AM, Yves S. Garret wrote:

  Hi, thanks for your reply.
 
  I did [ hostname ] in my linux OS and this is what I have for a
  hostname [ ysg.connect ].
 
  This is what my hosts file looks like.
  127.0.0.1   localhost localhost.localdomain localhost4
  localhost4.localdomain4
  127.0.0.1   localhost
  192.168.1.6 ysg.connect
  ::1 localhost localhost.localdomain localhost6
  localhost6.localdomain6
 
  Now, I fired up the shell and this is the result that I got when I
  tried to execute [ create 'test', 'cf' ].  This is the error that I got:
  http://bin.cakephp.org/view/1016732333
 
  The weird thing is that after starting the shell, executing that
  command, having that command error out and keep going and
  then exiting the command, I checked the logs and... nothing
  was displayed.  It's as if nothing was stored.
 
 
  On Fri, Apr 26, 2013 at 7:12 AM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
  Hi Yves,
 
  You need to add an entry with your host name and your local IP.
 
  As an example, here is mine:
 
  127.0.0.1   localhost
  192.168.23.2    buldo
 
  My host name is buldo.
 
  JM
 
  2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
  Hi Jean, this is my /etc/hosts.
 
  127.0.0.1   localhost localhost.localdomain localhost4
  localhost4.localdomain4
  127.0.0.1   localhost
  ::1 localhost localhost.localdomain localhost6
  localhost6.localdomain6
 
 
  On Thu, Apr 25, 2013 at 5:22 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
  Hi Yves,
 
  You seem to have some network configuration issue with your
  installation.
 
  java.net.BindException: Cannot assign requested address and
  ip72-215-225-9.at.at.cox.net/72.215.225.9:0
 
  How is your host file configured? You need to have your host name
  pointing to your local IP (and not 127.0.0.1).
 
  2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
  My mistake.  I thought I had all of those logs.  This is what I
  currently
  have:
  http://bin.cakephp.org/view/2112130549
 
  I have $JAVA_HOME set to this:
  /usr/java/jdk1.7.0_17
  I have extracted 0.94 and ran bin/start-hbase.sh
 
  Thanks for your help!
 
 
 
  On Thu, Apr 25, 2013 at 4:42 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
  Hi Mohammad,
 
  He is running standalone, so no need to update the zookeeper quorum
  yet.
 
  Yes, can you share the entire hbase-ysg-master-ysg.connect.log file?
  Not just the first lines. Or is what you sent already everything?
 
  So what have you done so far? Downloaded 0.94, extracted it, set up
  JAVA_HOME and ran bin/start-hbase.sh?
 
  JMS
 
  2013/4/25 Mohammad Tariq donta...@gmail.com:
  Hello Yves,
 
  The log seems to be incomplete. Could you please share the complete
  logs? Have you set the hbase.zookeeper.quorum property properly? Is your
  Hadoop running fine?
 
  Warm Regards,
  Tariq
  https://mtariq.jux.com/
  cloudfront.blogspot.com
 
 
  On Fri, Apr 26, 2013 at 2:00 AM, Yves S. Garret
  yoursurrogate...@gmail.com wrote:
 
  Hi again.  I have 3 log files and only one of them had anything in
  them,
  here are the file names.  I'm assuming that you're talking about
  the
  directory ${APACHE_HBASE_HOME}/logs, yes?
 
  Here are the file names:
  -rw-rw-r--. 1 user user 12465 Apr 25 14:54
  hbase-ysg-master-ysg.connect.log
  -rw-rw-r--. 1 user user 0 Apr 25 14:54
  hbase-ysg-master-ysg.connect.out
  -rw-rw-r--. 1 user user 0 Apr 25 14:54 SecurityAuth.audit
 
  Also, to answer your question about the UI, I tried that URL (I'm
  doing
  all
  of this on my laptop just to learn at the moment) and neither the
  URL
  nor
  localhost:60010 worked.  So, the answer to your question is that
  the
  UI
  is
  not showing up.  This could be due to not being far along in the
  tutorial,
  perhaps?
 
  Thanks again!
 
 
  On Thu, Apr 25, 2013 at 4:22 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
  There is no stupid question ;)
 
  Are the logs truncated? Is there anything else after that, or is that all
  you have?
 
  For the UI, you can access it with
  http://192.168.X.X:60010/master-status
 
  Replace the X with your own IP. You should see some information
  about
  your HBase cluster (even in Standalone mode).
 
  JMS
 
  2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
  Here are the logs, what should I be looking for?  Seems like
  everything
  is fine for the moment, no?
 
  http://bin.cakephp.org/view/2144893539
 
  The web UI?  What do you mean?  Sorry if this is a stupid
  question,
  I'm
  a Hadoop newb.
 
  On Thu, Apr 25, 2013 at 3:19 PM, Jean-Marc Spaggiari 
  jean-m...@spaggiari.org wrote:
 
  Before trying the shell, can you 

Re: Dual Hadoop/HBase configuration through same client

2013-04-26 Thread Shahab Yunus
Thanks Ted for the response. But the issue is that I want to read from one
cluster and write to another. If I have two clients, then how will they
communicate with each other? Essentially, what I am trying to do here is
cross-cluster data copy/exchange. Any other ideas or suggestions? Even if
both clusters have no security, or one has Kerberos, or both have
authentication, how do we exchange data between them?

I was actually not expecting that I could not load multiple Hadoop or HBase
configurations into 2 different Configuration objects in one application.
As mentioned, I have tried overwriting properties as well, but the
security/authentication properties get overwritten somehow.
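For reference, the kind of dual-Configuration loading being described would look
roughly like the sketch below (hypothetical paths; this only illustrates the
isolation being attempted and is not a confirmed fix for the Kerberos/simple-auth
conflict):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class DualClusterConfigs {
  public static void main(String[] args) {
    // Cluster A: built only from cluster A's files; 'false' skips the default
    // classpath resources so cluster B's core-site.xml can never overwrite it.
    // Note: with loadDefaults=false the *-default.xml files are skipped too,
    // so they may need to be added explicitly.
    Configuration confA = new Configuration(false);
    confA.addResource(new Path("/etc/clusterA/core-site.xml"));  // hypothetical paths
    confA.addResource(new Path("/etc/clusterA/hbase-site.xml"));

    // Cluster B: a second, fully independent Configuration object.
    Configuration confB = new Configuration(false);
    confB.addResource(new Path("/etc/clusterB/core-site.xml"));
    confB.addResource(new Path("/etc/clusterB/hdfs-site.xml"));
  }
}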

Regards,
Shahab


On Fri, Apr 26, 2013 at 7:43 PM, Ted Yu yuzhih...@gmail.com wrote:

 Looks like the easiest solution is to use separate clients, one for each
 cluster you want to connect to.

 Cheers

 On Sat, Apr 27, 2013 at 6:51 AM, Shahab Yunus shahab.yu...@gmail.com
 wrote:

  Hello,
 
  This is a follow-up to my previous post a few days back. I am trying to
  connect to 2 different Hadoop clusters' setups through a same client but
 I
  am running into the issue that the config of one overwrites the other.
 
  The scenario is that I want to read data from an HBase table from one
  cluster and write it as a file on HDFS on the other. Individually, if I
 try
  to write to them they both work but when I try this through a same Java
  client, they fail.
 
  I have tried loading the core-site.xml through addResource method of the
  Configuration class but only the first found config file is picked? I
 have
  also tried by renaming the config files and then adding them as a
 resource
  (again through the addResource method).
 
  The situation is compounded by the fact that one cluster is using
 Kerberos
  authentication and the other is not? If the Kerberos server's file is
 found
  first then authentication failures are faced for the other server when
  Hadoop tries to find client authentication information. If the 'simple'
  cluster's config is loaded first then 'Authentication is Required' error
 is
  encountered against the Kerberos server.
 
  I will gladly provide more information. Is it even possible even if let
 us
  say both servers have same security configuration or none? Any ideas?
  Thanks a million.
 
  Regards,
  Shahab