Re: HBase is not running.
Hi Yves,

You need to add an entry with your host name and your local IP. As an example, here is mine:

    127.0.0.1    localhost
    192.168.23.2 buldo

My host name is buldo.

JM

2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
Hi Jean, this is my /etc/hosts.

    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    127.0.0.1 localhost
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

On Thu, Apr 25, 2013 at 5:22 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Hi Yves,
You seem to have a network configuration issue with your installation:
java.net.BindException: Cannot assign requested address
and
ip72-215-225-9.at.at.cox.net/72.215.225.9:0
How is your hosts file configured? You need to have your host name pointing to your local IP (and not 127.0.0.1).

2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
My mistake. I thought I had all of those logs. This is what I currently have: http://bin.cakephp.org/view/2112130549
I have $JAVA_HOME set to this: /usr/java/jdk1.7.0_17
I have extracted 0.94 and ran bin/start-hbase.sh
Thanks for your help!

On Thu, Apr 25, 2013 at 4:42 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Hi Mohammad,
He is running standalone, so no need to update the zookeeper quorum yet.
Yves, can you share the entire hbase-ysg-master-ysg.connect.log file? Not just the first lines. Or is what you sent already everything?
So what have you done so far? Downloaded 0.94, extracted it, set up JAVA_HOME, and ran bin/start-hbase.sh?
JMS

2013/4/25 Mohammad Tariq donta...@gmail.com:
Hello Yves,
The log seems to be incomplete. Could you please send the complete logs? Have you set the hbase.zookeeper.quorum property properly? Is your Hadoop running fine?
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Fri, Apr 26, 2013 at 2:00 AM, Yves S. Garret yoursurrogate...@gmail.com wrote:
Hi again. I have 3 log files and only one of them had anything in it. I'm assuming that you're talking about the directory ${APACHE_HBASE_HOME}/logs, yes? Here are the file names:

    -rw-rw-r--. 1 user user 12465 Apr 25 14:54 hbase-ysg-master-ysg.connect.log
    -rw-rw-r--. 1 user user     0 Apr 25 14:54 hbase-ysg-master-ysg.connect.out
    -rw-rw-r--. 1 user user     0 Apr 25 14:54 SecurityAuth.audit

Also, to answer your question about the UI: I tried that URL (I'm doing all of this on my laptop just to learn at the moment) and neither the URL nor localhost:60010 worked. So the answer to your question is that the UI is not showing up. This could be due to not being far along in the tutorial, perhaps? Thanks again!

On Thu, Apr 25, 2013 at 4:22 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
There is no stupid question ;)
Are the logs truncated? Is there anything else after that, or is that all you have?
For the UI, you can access it with http://192.168.X.X:60010/master-status (replace the Xs with your own IP). You should see some information about your HBase cluster (even in standalone mode).
JMS

2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
Here are the logs; what should I be looking for? Seems like everything is fine for the moment, no? http://bin.cakephp.org/view/2144893539
The web UI? What do you mean? Sorry if this is a stupid question, I'm a Hadoop newb.

On Thu, Apr 25, 2013 at 3:19 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Before trying the shell, can you look at the server logs and see if everything is fine? Also, is the web UI working fine?

2013/4/25 Yves S. Garret yoursurrogate...@gmail.com:
Ok, spoke too soon :) .
I ran this command [ create 'test', 'cf' ] and this is the result that I got: http://bin.cakephp.org/view/168926019
This is after typing 'help' in the shell and having that run just fine.

On Thu, Apr 25, 2013 at 1:23 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Hi Yves,
0.95.0 is a developer version. If you are starting with HBase, I would recommend choosing a more stable version like 0.94.6.1.
Regarding the 3 choices you are listing below:
1) This one is HBase 0.95 running over Hadoop 1.0.
2) This one is HBase 0.95 running over Hadoop 2.0.
3) These are the HBase source classes.
Again, I think you are better off going with a stable version for the first steps: http://www.bizdirusa.com/mirrors/apache/hbase/stable/
Would you mind retrying your tests with this version and letting me know if it works better?
JM

2013/4/25 Yves S.
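The fix JM describes amounts to one extra line in /etc/hosts mapping the machine's host name to its LAN address. Using the host name and IP Yves reports later in this digest (ysg.connect on 192.168.1.6; substitute your own), the file would look like:

    127.0.0.1    localhost localhost.localdomain localhost4 localhost4.localdomain4
    192.168.1.6  ysg.connect
    ::1          localhost localhost.localdomain localhost6 localhost6.localdomain6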
Re: undefined method `internal_command' for Shell::Formatter::Console
Ok, thanks for the clarification. I tried that (removing Ruby but not yet re-installing it) and I got the same error message.

On Thu, Apr 25, 2013 at 5:02 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Hi Robin,
No, the idea is to run yum remove, and then test the HBase shell. Don't run yum install ruby until we get this fixed. I want to see if your installed version of Ruby could be causing the issue. The "it" was referring to the Ruby package.
JM

2013/4/25 Robin Gowin landr...@gmail.com:
To be more explicit: I'm running CentOS release 6.4 in a VM on Mac OS X 10.6. I ran yum remove ruby and then yum install ruby (inside the VM). Is that what you meant?
Also, I put some simple print statements in several of the Ruby scripts called by the HBase shell, and they are getting executed (for example: admin.rb, hbase.rb, and table.rb).
(I wasn't sure what "it" referred to in your email.)
Robin

On Thu, Apr 25, 2013 at 3:58 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
No, don't re-install it ;) Remove it and retry. To make sure it's not using any lib anywhere else...
JM

2013/4/25 Robin Gowin landr...@gmail.com:
I removed ruby and reinstalled it; same results.

On Thu, Apr 25, 2013 at 11:59 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Is it easy for you to de-install it and re-install it? If so, would you mind giving it a try?
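For clarity, the test JM is asking for is: remove the system Ruby, then start the HBase shell without reinstalling anything. The HBase shell runs on the JRuby bundled with HBase, so it should not need a system Ruby at all. A sketch of the sequence (the install directory name is illustrative):

    sudo yum remove ruby   # take the system Ruby out of the picture
    cd ~/hbase-0.94.6      # illustrative install directory
    bin/hbase shell        # runs on the JRuby bundled with HBase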
Re: HBase is not running.
Hi, thanks for your reply. I ran [ hostname ] on my Linux OS and my host name is [ ysg.connect ]. This is how my hosts file looks now:

    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    127.0.0.1 localhost
    192.168.1.6 ysg.connect
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

Now, I fired up the shell, and this is the result I got when I tried to execute [ create 'test', 'cf' ]: http://bin.cakephp.org/view/1016732333

The weird thing is that after starting the shell, running that command, watching it error out, and then exiting, I checked the logs and... nothing was there. It's as if nothing was logged.

On Fri, Apr 26, 2013 at 7:12 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote:
Hi Yves,
You need to add an entry with your host name and your local IP. [...]
Schema Design Question
Hi, I am new to HBase; I have been trying to POC an application and have a design question.

Currently we have a single table with the following key design:

    jobId_batchId_bundleId_uniquefileId

This is an offline processing system, so data would be bulk loaded into HBase via map/reduce jobs. We only need to support report-generation queries using map/reduce over a batch (and possibly a single column filter), with the batchId as the start/end scan key. Once we have finished processing a job, we are free to remove the data from HBase.

We have varied workloads, so a job could be made up of 10 rows, 100,000 rows, or 1 billion rows, with the average falling somewhere around 10 million rows.

My question is related to pre-splitting. If we have a billion rows all with the same batchId (our map/reduce scan key), my understanding is we should perform pre-splitting to create buckets hosted by different regions. If a job's workload can be so varied, would it make sense to have a single table containing all jobs? Or should we create one table per job and pre-split the table for the given workload? If we had separate tables, we could drop them when no longer needed.

If we didn't have a separate table per job, how should we perform splitting? Should we choose our largest possible workload and split for that, even though 90% of our jobs would fall in the lower bound in terms of row count? Would we experience any issues purging jobs of varying sizes if everything was in a single table?

Any advice would be greatly appreciated. Thanks
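For reference, a scan over one batch of the kind described (batchId as the start/stop key) would look like this against the 0.94 client API; in a map/reduce job the same start/stop rows would be handed to the job setup instead. The table name, ids, and key spacing here are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchScan {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "jobs"); // table name is hypothetical

        // Rows are keyed jobId_batchId_bundleId_uniquefileId, so one batch is
        // a contiguous key range; the ids below are hypothetical.
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("job1_batch42_"));
        scan.setStopRow(Bytes.toBytes("job1_batch43_")); // stop row is exclusive
        scan.setCaching(500); // fewer RPCs when sweeping a whole batch

        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            // feed r into the report-generation logic
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }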
Re: HBase is not running.
Looks like your ZooKeeper configuration in HBase is incorrect. Check it out.

Thank you!
Sincerely,
Leonid Fedotov
Technical Support Engineer

On Apr 26, 2013, at 9:59 AM, Yves S. Garret wrote:
Hi, thanks for your reply. I ran [ hostname ] on my Linux OS and my host name is [ ysg.connect ]. [...]
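For a standalone install, the settings Leonid is pointing at live in conf/hbase-site.xml; HBase manages its own ZooKeeper in standalone mode, so in the simplest case this file is nearly empty. A minimal sketch with illustrative values (the path and host name are assumptions, not Yves's actual config):

    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>file:///home/ysg/hbase</value>
      </property>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>ysg.connect</value>
      </property>
    </configuration>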
Re: Snapshot Export Problem
Hi Jon,

I've actually discovered another issue with snapshot export. If you have a region that has recently split, and you take a snapshot of that table and try to export it while the children still have references to the files in the split parent, those files will not be transferred and will be counted in the missing total. You end up with error messages like:

java.io.FileNotFoundException: Unable to open link: org.apache.hadoop.hbase.io.HLogLink

Please let me know if you would like any additional information.
Thanks and have a great day,
Sean

On Wednesday, 24 April, 2013 at 9:19 AM, Sean MacDonald wrote:
Hi Jon,
No problem. We do have snapshots enabled on the target cluster, and we are using the default hfile archiver settings on both clusters.
Thanks,
Sean

On Tuesday, 23 April, 2013 at 1:54 PM, Jonathan Hsieh wrote:
Sean,
Thanks for finding this problem. Can you provide some more information so that we can try to duplicate and fix it? Are snapshots on on the target cluster? What are the hfile archiver settings in your hbase-site.xml on both clusters?
Thanks, Jon.

On Mon, Apr 22, 2013 at 4:47 PM, Sean MacDonald s...@opendns.com wrote:
It looks like you can't export a snapshot to a running cluster, or it will start cleaning up files from the archive after a period of time. I have turned off HBase on the destination cluster and the export is working as expected now.
Sean

On Monday, 22 April, 2013 at 9:22 AM, Sean MacDonald wrote:
Hello,
I am using HBase 0.94.6 on CDH 4.2 and trying to export a snapshot to another cluster (also CDH 4.2), but this is failing repeatedly. The table I am trying to export is approximately 4 TB in size and has 10 GB regions. Each of the map jobs runs for about 6 minutes and appears to be running properly, but then fails with a message like the following:

2013-04-22 16:12:50,699 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /hbase/.archive/queries/533fcbb7858ef34b103a4f8804fa8719/d/651e974dafb64eefb9c49032aec4a35b File does not exist. Holder DFSClient_NONMAPREDUCE_-192704511_1 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2183)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:481)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)

I was able to see the file that the LeaseExpiredException mentions on the destination cluster before the exception happened (it is gone afterwards). Any help that could be provided in resolving this would be greatly appreciated.
Thanks and have a great day,
Sean

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// j...@cloudera.com
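For context, the export in question would be run with HBase's bundled ExportSnapshot tool along these lines (the snapshot name and destination URI are illustrative):

    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot my_table_snapshot \
      -copy-to hdfs://dest-nn:8020/hbase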
Re: Snapshot Export Problem
Hey Sean,

Could you provide us the full stack trace of the "FileNotFoundException: Unable to open link", and also the output of:

    hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -files -stats -snapshot SNAPSHOT_NAME

to give us a better idea of the state of the snapshot?
Thanks!

On Fri, Apr 26, 2013 at 9:51 PM, Sean MacDonald s...@opendns.com wrote:
Hi Jon,
I've actually discovered another issue with snapshot export. If you have a region that has recently split, and you take a snapshot of that table and try to export it while the children still have references to the files in the split parent, those files will not be transferred and will be counted in the missing total. [...]
Re: Schema Design Question
My understanding of your use case is that data for different jobIds would be continuously loaded into the underlying table(s).

Looks like you can have one table per job. This way you can drop the table after the map/reduce is complete. With the single-table approach, you would instead delete many rows from the table, which is not as fast as dropping a separate table.

Cheers

On Sat, Apr 27, 2013 at 3:49 AM, Cameron Gandevia cgande...@gmail.com wrote:
Hi, I am new to HBase; I have been trying to POC an application and have a design question. [...]
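For example, purging a finished job with per-job tables is just a disable plus drop in the HBase shell (the table name is hypothetical):

    hbase(main):001:0> disable 'job_1234'
    hbase(main):002:0> drop 'job_1234'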
Re: Dual Hadoop/HBase configuration through same client
Looks like the easiest solution is to use separate clients, one for each cluster you want to connect to.

Cheers

On Sat, Apr 27, 2013 at 6:51 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
Hello,

This is a follow-up to my previous post from a few days back. I am trying to connect to 2 different Hadoop clusters' setups through the same client, but I am running into the issue that the config of one overwrites the other.

The scenario is that I want to read data from an HBase table on one cluster and write it as a file to HDFS on the other. Individually, if I try to write to each of them, they both work, but when I try this from the same Java client, they fail. I have tried loading the core-site.xml through the addResource method of the Configuration class, but only the first config file found is picked up. I have also tried renaming the config files and then adding them as resources (again through the addResource method).

The situation is compounded by the fact that one cluster is using Kerberos authentication and the other is not. If the Kerberos server's file is found first, then authentication failures occur for the other server when Hadoop tries to find client authentication information. If the 'simple' cluster's config is loaded first, then an 'Authentication is Required' error is encountered against the Kerberos server.

I will gladly provide more information. Is it even possible, even if, let us say, both servers have the same security configuration or none? Any ideas? Thanks a million.

Regards,
Shahab
Re: Schema Design Question
Hi,

Interesting use case. I think it depends on how many jobIds you expect to have. If it is on the order of thousands, I would caution against the one-table-per-jobId approach, since every table carries some master overhead as well as file structures in HDFS. If the number of jobIds is manageable, going with separate tables makes sense if you want to efficiently delete all the data related to a job.

Also, pre-splitting will depend on the expected number of jobIds/batchIds and their ranges versus the desired number of regions. You would want to keep the number of regions hosted by a single region server in the low tens; thus your splits can be across jobs or within jobs, depending on cardinality.

Can you share some more?
Enis

On Fri, Apr 26, 2013 at 2:34 PM, Ted Yu yuzhih...@gmail.com wrote:
My understanding of your use case is that data for different jobIds would be continuously loaded into the underlying table(s). [...]
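To make the pre-splitting concrete, here is a sketch of creating a per-job table with split points supplied up front, against the 0.94 admin API. The table name, column family, region count, and key spacing are all hypothetical; real split points must mirror the actual row-key layout:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitJobTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("job_1234"); // per-job table, name hypothetical
        desc.addFamily(new HColumnDescriptor("d"));

        // 15 split points give 16 regions. These keys assume the per-job table
        // drops the jobId prefix and uses zero-padded batch ids spread over
        // roughly 10,000 batches, which is purely illustrative.
        int regions = 16;
        byte[][] splits = new byte[regions - 1][];
        for (int i = 1; i < regions; i++) {
          splits[i - 1] = Bytes.toBytes(String.format("batch%05d", i * 625));
        }

        admin.createTable(desc, splits);
        admin.close();
      }
    }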
Re: How practical is it to add a timestamp oracle on Zookeeper
Hi,

I presume you have read the Percolator paper. The design there uses a single TS oracle, and BigTable itself as the transaction manager. In Omid, they also have a TS oracle, but I do not know how scalable it is.

Using ZK as the TS oracle would not work, though: ZK can scale up to 40-50K requests per second, but depending on the cluster size you will need much more than that, especially considering that every client doing reads and writes has to obtain a TS. Instead, what you want is a TS oracle that can scale to millions of requests per second. This can be achieved with the technique in the Percolator paper: pre-allocate a range of timestamps by persisting the upper bound to disk, and serve requests with an extremely lightweight RPC. I do not know whether Omid provides this. There is a Twitter project, https://github.com/twitter/snowflake, that you might want to look at.

Hope this helps.
Enis

On Sun, Apr 21, 2013 at 9:36 AM, Michel Segel michael_se...@hotmail.com wrote:
Time is relative. What does the timestamp mean? Sounds like a simple question, but it's not. Is it the time your application says it wrote to HBase? Is it the time HBase first gets the row? Or is it the time that the row was written to the memstore? Each RS has its own clock, in addition to your app server's.

Sent from a remote device. Please excuse any typos...
Mike Segel

On Apr 16, 2013, at 7:14 AM, yun peng pengyunm...@gmail.com wrote:
Hi, all,
I'd like to add a global timestamp oracle on ZooKeeper to assign a globally unique timestamp to each Put/Get issued from the HBase cluster. The reason I would put it on ZooKeeper is that each Put/Get needs to go through it, and unique timestamps need some global, centralized facility. But how practical is this scheme? Has anyone used it in practice? Also, how difficult is it to extend ZooKeeper, or to inject code into the path HBase takes inside ZooKeeper? I know HBase has coprocessors on the region server that let programmers extend it without recompiling HBase itself. Does ZK allow such extensibility? Thanks.
Regards,
Yun
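To illustrate the pre-allocation technique Enis describes, here is a minimal sketch (not Omid's or Percolator's actual code): the oracle persists only a high-water mark, so it pays one disk write per block of a million timestamps and serves everything else from memory. On restart it resumes past the persisted mark, so no timestamp is ever handed out twice:

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.concurrent.atomic.AtomicLong;

    public class BlockAllocatingOracle {
      private static final long BLOCK = 1_000_000L; // timestamps claimed per disk write
      private final RandomAccessFile file;
      private final AtomicLong next = new AtomicLong();
      private volatile long limit;

      public BlockAllocatingOracle(String path) throws IOException {
        file = new RandomAccessFile(path, "rwd"); // "rwd" syncs data on every write
        long persisted = file.length() >= 8 ? file.readLong() : 0L;
        next.set(persisted); // resume past anything possibly issued before a crash
        limit = persisted;
      }

      public long nextTimestamp() throws IOException {
        long ts = next.incrementAndGet();
        if (ts > limit) {
          synchronized (this) {
            // Persist a new high-water mark before serving past the old one.
            while (ts > limit) {
              file.seek(0);
              file.writeLong(limit + BLOCK);
              limit += BLOCK;
            }
          }
        }
        return ts;
      }
    }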
Re: HBase is not running.
Hi, but I don't understand what you mean. Did I miss a step in the tutorial?

On Fri, Apr 26, 2013 at 4:26 PM, Leonid Fedotov lfedo...@hortonworks.com wrote:
Looks like your ZooKeeper configuration in HBase is incorrect. Check it out. [...]
Re: Dual Hadoop/HBase configuration through same client
Thanks, Ted, for the response. But the issue is that I want to read from one cluster and write to another. If I have two clients, how will they communicate with each other? Essentially, what I am trying to do here is inter-cluster data copy/exchange. Any other ideas or suggestions? Even if, let us say, both servers have no security, or one has Kerberos, or both have authentication, how do I exchange data between them?

I was actually not expecting that I could not load multiple Hadoop or HBase configurations into 2 different Configuration objects in one application. As mentioned, I have tried overwriting properties as well, but the security/authentication properties get overwritten somehow.

Regards,
Shahab

On Fri, Apr 26, 2013 at 7:43 PM, Ted Yu yuzhih...@gmail.com wrote:
Looks like the easiest solution is to use separate clients, one for each cluster you want to connect to. [...]
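For what it's worth, the isolation Shahab is attempting might be sketched like this: build the second Configuration with loadDefaults=false so classpath copies of the first cluster's files never leak in, and bind the FileSystem by explicit URI. All paths and host names below are hypothetical, and note that Kerberos login state (UserGroupInformation) is process-wide in this Hadoop vintage, which may be exactly why the mixed-security case still fails:

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class TwoClusterSketch {
      public static void main(String[] args) throws Exception {
        // Cluster A (Kerberos): the HBase read side.
        Configuration srcConf = HBaseConfiguration.create();
        srcConf.addResource(new Path("/etc/cluster-a/core-site.xml")); // hypothetical paths
        srcConf.addResource(new Path("/etc/cluster-a/hbase-site.xml"));

        // Cluster B (simple auth): the HDFS write side. loadDefaults=false keeps
        // classpath copies of cluster A's files out, but it also skips
        // core-default.xml, so every required setting must come from the files
        // added here.
        Configuration dstConf = new Configuration(false);
        dstConf.addResource(new Path("/etc/cluster-b/core-site.xml"));
        dstConf.addResource(new Path("/etc/cluster-b/hdfs-site.xml"));

        // Pin the FileSystem handle to cluster B by explicit URI rather than
        // relying on whichever fs.defaultFS won the config race.
        FileSystem dstFs = FileSystem.get(URI.create("hdfs://cluster-b-nn:8020"), dstConf);

        // ... scan the HBase table with srcConf, write the results via dstFs ...
        dstFs.close();
      }
    }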