How do you check that?

--
Marco Gallotta | Mountain View, California
Software Engineer, Infrastructure | Loki Studios
fb.me/marco.gallotta | twitter.com/marcog
[email protected] | +1 (650) 417-3313
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Friday 10 August 2012 at 4:44 PM, Mohammad Tariq wrote:
> This is pretty strange. I mean, everything seems to be in place, but we
> are stuck. Please check once whether your HDFS is in safemode.
>
> Regards,
> Mohammad Tariq
>
> On Sat, Aug 11, 2012 at 5:13 AM, Mohammad Tariq <[email protected]> wrote:
> > What about fs.default.name?
> >
> > Regards,
> > Mohammad Tariq
> >
> > On Sat, Aug 11, 2012 at 5:10 AM, Marco Gallotta <[email protected]> wrote:
> > > It's in /var, which is persistent across reboots.
> > >
> > > On Friday 10 August 2012 at 4:31 PM, anil gupta wrote:
> > > > Where are you storing your HDFS data? Is it /tmp? If it's /tmp and you
> > > > have rebooted your machine, then you will have problems.
> > > >
> > > > On Fri, Aug 10, 2012 at 4:19 PM, Marco Gallotta <[email protected]> wrote:
> > > > > It's a pseudo-distributed cluster, as I plan to add more nodes as we
> > > > > start gathering more data.
> > > > >
> > > > > I get the following error when running hbck -repair, and then it
> > > > > stalls:
> > > > >
> > > > > 12/08/10 16:17:27 INFO util.HBaseFsck: Sleeping 10000ms before re-checking after fix...
> > > > > Version: 0.94.0
> > > > > 12/08/10 16:17:37 INFO util.HBaseFsck: Loading regioninfos HDFS
> > > > > 12/08/10 16:17:37 INFO util.HBaseFsck: Loading HBase regioninfo from HDFS...
> > > > > Exception in thread "main" java.util.concurrent.RejectedExecutionException
> > > > >     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1956)
> > > > >     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
> > > > >     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
> > > > >     at org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegionDirs(HBaseFsck.java:1059)
> > > > >     at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:504)
> > > > >     at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:304)
> > > > >     at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:377)
> > > > >     at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3139)
> > > > >
> > > > > On Friday 10 August 2012 at 4:09 PM, anil gupta wrote:
> > > > > > Is it a standalone installation or pseudo-distributed?
> > > > > > I faced a similar problem a few days back in a distributed cluster
> > > > > > and used the hbck -repair option. You might give it a try.
> > > > > >
> > > > > > ~Anil
> > > > > >
> > > > > > On Fri, Aug 10, 2012 at 3:39 PM, Mohammad Tariq <[email protected]> wrote:
> > > > > > > Could you please share your /etc/hosts file? Meantime, do a manual
> > > > > > > compaction and see if it works.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Mohammad Tariq
> > > > > > >
> > > > > > > On Sat, Aug 11, 2012 at 4:07 AM, Marco Gallotta <[email protected]> wrote:
> > > > > > > > It's not a distributed cluster. I'm not processing enough data yet.
> > > > > > > > So the reference to localhost is correct.
> > > > > > > >
> > > > > > > > On Friday 10 August 2012 at 3:30 PM, anil gupta wrote:
> > > > > > > > > Are you running a distributed cluster?
> > > > > > > > > If yes, do you have localhost in your /etc/hosts file?
> > > > > > > > >
> > > > > > > > > You are getting a reference to localhost in the hbck output:
> > > > > > > > > ERROR: Region { meta => null, hdfs => hdfs://localhost:9000/hbase/test2/b0d4a5f294809c94fccb3d4ce10c3b23, deployed => } on HDFS, but not listed in META or deployed on any region server
> > > > > > > > >
> > > > > > > > > ~Anil
> > > > > > > > >
> > > > > > > > > On Fri, Aug 10, 2012 at 3:08 PM, Marco Gallotta <[email protected]> wrote:
> > > > > > > > > > Here's the output from hbck -details: http://pastebin.com/ZxVZEctY
> > > > > > > > > >
> > > > > > > > > > Extract:
> > > > > > > > > >
> > > > > > > > > > 6 inconsistencies detected.
> > > > > > > > > > Status: INCONSISTENT
> > > > > > > > > >
> > > > > > > > > > 6 is the number of tables that appear in "list" but cannot be operated on
> > > > > > > > > > (which, btw, includes not being able to run disable/drop on them - both ops
> > > > > > > > > > say table not found). I also just noticed "foo" does not occur in a table
> > > > > > > > > > list, although I did create it at one point but was able to clear it from .META.
> > > > > > > > > > when it also was reporting table not found when trying to
> > > > > > > > > > disable/drop it. All these come from when I ^C'ed (i.e. killed) table
> > > > > > > > > > creation when I was trying to get lzo compression working and table
> > > > > > > > > > creation was hanging.
> > > > > > > > > >
> > > > > > > > > > Is there any way to repair this? I see hbck has repair options, but I
> > > > > > > > > > want to proceed with caution.
> > > > > > > > > >
> > > > > > > > > > On Friday 10 August 2012 at 2:49 PM, anil gupta wrote:
> > > > > > > > > > > Hi Marco,
> > > > > > > > > > >
> > > > > > > > > > > Did anything disastrous happen to the cluster?
> > > > > > > > > > > Can you try using the hbck utility of HBase?
> > > > > > > > > > > Run 'hbase hbck -help' to get all the available options.
> > > > > > > > > > >
> > > > > > > > > > > ~Anil
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Aug 10, 2012 at 2:22 PM, Marco Gallotta <[email protected]> wrote:
> > > > > > > > > > > > Hi there
> > > > > > > > > > > >
> > > > > > > > > > > > I have a few tables which show up in a "list" in the shell, but produce
> > > > > > > > > > > > "table not found" when performing any operation on them. There is no
> > > > > > > > > > > > reference to them in the .META. table. It seems to be resulting in some of
> > > > > > > > > > > > the hbase services being killed every so often.
> > > > > > > > > > > >
> > > > > > > > > > > > Here are some logs from master (foo is one of the tables not found):
> > > > > > > > > > > >
> > > > > > > > > > > > 2012-08-09 20:40:44,301 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
> > > > > > > > > > > > 2012-08-09 20:40:44,301 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : foo,,1343175078663.527bb34f4bb5e40dd42e82054d7c5485. state=PENDING_OPEN, ts=1344570044277, server=ip-10-170-150-10.us-west-1.compute.internal,60020,1344559455110 .. Cannot transit it to OFFLINE.
> > > > > > > > > > > >
> > > > > > > > > > > > There are also a number of the following types of error logs:
> > > > > > > > > > > >
> > > > > > > > > > > > 2012-08-09 20:10:04,308 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment in: ip-10-170-150-10.us-west-1.compute.internal,60020,1344559455110 due to org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: Received:OPEN for the region:foo,,1343175078663.527bb34f4bb5e40dd42e82054d7c5485. ,which we are already trying to OPEN.
> > > > > > > > > > > >
> > > > > > > > > > > > Any ideas how to find and remove any references to these non-existent
> > > > > > > > > > > > tables?
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Thanks & Regards,
> > > > > > > > > > > Anil Gupta
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Thanks & Regards,
> > > > > > > > > Anil Gupta
> > > > > >
> > > > > > --
> > > > > > Thanks & Regards,
> > > > > > Anil Gupta
> > > >
> > > > --
> > > > Thanks & Regards,
> > > > Anil Gupta
