Have you specified the "hadoop.tmp.dir" property in your core-site.xml and
the "dfs.data.dir" and "dfs.name.dir" properties in your hdfs-site.xml
file?
If not, HDFS keeps everything under /tmp by default, and you will lose all
your data along with your meta information once /tmp is cleared (for
example on a reboot), as Anil has said.
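
For reference, the settings would look roughly like the sketch below. It
assumes Hadoop 1.x property names, and the /var/lib/hadoop paths are only
placeholders; use any directories outside /tmp that the user running Hadoop
can write to.

```xml
<!-- core-site.xml (path below is a placeholder, not a required location) -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/lib/hadoop/tmp</value>
  </property>
</configuration>

<!-- hdfs-site.xml (paths below are placeholders as well) -->
<configuration>
  <property>
    <!-- where the NameNode keeps its metadata (fsimage and edit log) -->
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop/dfs/name</value>
  </property>
  <property>
    <!-- where the DataNode keeps the actual blocks -->
    <name>dfs.data.dir</name>
    <value>/var/lib/hadoop/dfs/data</value>
  </property>
</configuration>
```

If these are left unset, dfs.name.dir and dfs.data.dir default to
subdirectories of hadoop.tmp.dir, which itself defaults to a directory under
/tmp.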
Regards,
Mohammad Tariq
On Sat, Aug 11, 2012 at 5:01 AM, anil gupta <[email protected]> wrote:
> Where are you storing your HDFS data? Is it /tmp? If it's /tmp and you have
> rebooted your machine, then you will have problems.
>
> On Fri, Aug 10, 2012 at 4:19 PM, Marco Gallotta <[email protected]> wrote:
>
>> It's a pseudo-distributed cluster, as I plan to add more nodes as we start
>> gathering more data.
>>
>> I get the following error when running hbck -repair, and then it stalls:
>>
>> 12/08/10 16:17:27 INFO util.HBaseFsck: Sleeping 10000ms before re-checking after fix...
>> Version: 0.94.0
>> 12/08/10 16:17:37 INFO util.HBaseFsck: Loading regioninfos HDFS
>> 12/08/10 16:17:37 INFO util.HBaseFsck: Loading HBase regioninfo from HDFS...
>> Exception in thread "main" java.util.concurrent.RejectedExecutionException
>>     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1956)
>>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:816)
>>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1337)
>>     at org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegionDirs(HBaseFsck.java:1059)
>>     at org.apache.hadoop.hbase.util.HBaseFsck.restoreHdfsIntegrity(HBaseFsck.java:504)
>>     at org.apache.hadoop.hbase.util.HBaseFsck.offlineHdfsIntegrityRepair(HBaseFsck.java:304)
>>     at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:377)
>>     at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3139)
>>
>>
>>
>> --
>> Marco Gallotta | Mountain View, California
>> Software Engineer, Infrastructure | Loki Studios
>> fb.me/marco.gallotta | twitter.com/marcog
>> [email protected] | +1 (650) 417-3313
>>
>> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>>
>>
>> On Friday 10 August 2012 at 4:09 PM, anil gupta wrote:
>>
>> > Is it a standalone installation or pseudo-distributed?
>> > I faced a similar problem a few days back in a distributed cluster and used
>> > the hbck -repair option. You might give it a try.
>> >
>> > ~Anil
>> >
>> > On Fri, Aug 10, 2012 at 3:39 PM, Mohammad Tariq <[email protected]> wrote:
>> >
>> > > Could you please share your /etc/hosts file? In the meantime, do a manual
>> > > compaction and see if it works.
>> > >
>> > > Regards,
>> > > Mohammad Tariq
>> > >
>> > >
>> > > On Sat, Aug 11, 2012 at 4:07 AM, Marco Gallotta <[email protected]> wrote:
>> > > > It's not a distributed cluster. I'm not processing enough data yet. So
>> > > > the reference to localhost is correct.
>> > > >
>> > > > On Friday 10 August 2012 at 3:30 PM, anil gupta wrote:
>> > > >
>> > > > > Are you running a distributed cluster?
>> > > > > If yes, do you have localhost in /etc/hosts file?
>> > > > >
>> > > > > You are getting a reference to localhost in the hbck output:
>> > > > > ERROR: Region { meta => null, hdfs =>
>> > > > > hdfs://localhost:9000/hbase/test2/b0d4a5f294809c94fccb3d4ce10c3b23,
>> > > > > deployed => } on HDFS, but not listed in META or deployed on any region
>> > > > > server
>> > > > >
>> > > > > ~Anil
>> > > > >
>> > > > > On Fri, Aug 10, 2012 at 3:08 PM, Marco Gallotta <[email protected]> wrote:
>> > > > >
>> > > > > > Here's the output from hbck -details: http://pastebin.com/ZxVZEctY
>> > > > > >
>> > > > > > Extract:
>> > > > > >
>> > > > > > 6 inconsistencies detected.
>> > > > > > Status: INCONSISTENT
>> > > > > >
>> > > > > > 6 is the number of tables that appear in "list" but cannot be operated on
>> > > > > > (which btw, includes not being able to run disable/drop on them - both ops
>> > > > > > say table not found). I also just noticed "foo" does not occur in a table
>> > > > > > list, although I did create it at one point but was able to clear it from
>> > > > > > .META. when it also was reporting table not found when trying to
>> > > > > > disable/drop it. All these come from when I ^C'ed (i.e. killed) table
>> > > > > > creation when I was trying to get lzo compression working and table
>> > > > > > creation was hanging.
>> > > > > >
>> > > > > > Is there any way to repair this? I see hbck has repair options, but I want
>> > > > > > to proceed with caution.
>> > > > > >
>> > > > > > On Friday 10 August 2012 at 2:49 PM, anil gupta wrote:
>> > > > > >
>> > > > > > > Hi Marco,
>> > > > > > >
>> > > > > > > Did anything disastrous happen to the cluster?
>> > > > > > > Can you try using the hbck utility of HBase?
>> > > > > > > Run 'hbase hbck -help' to get all the available options.
>> > > > > > >
>> > > > > > > ~Anil
>> > > > > > >
>> > > > > > > On Fri, Aug 10, 2012 at 2:22 PM, Marco Gallotta <[email protected]> wrote:
>> > > > > > >
>> > > > > > > > Hi there
>> > > > > > > >
>> > > > > > > > I have a few tables which show up in a "list" in the shell, but produce
>> > > > > > > > "table not found" when performing any operation on them. There is no
>> > > > > > > > reference of them in the .META. table. It seems to be resulting in some of
>> > > > > > > > the hbase services being killed every so often.
>> > > > > > > >
>> > > > > > > > Here are some logs from master (foo is one of the tables not found):
>> > > > > > > >
>> > > > > > > > 2012-08-09 20:40:44,301 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
>> > > > > > > > 2012-08-09 20:40:44,301 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : foo,,1343175078663.527bb34f4bb5e40dd42e82054d7c5485. state=PENDING_OPEN, ts=1344570044277, server=ip-10-170-150-10.us-west-1.compute.internal,60020,1344559455110 .. Cannot transit it to OFFLINE.
>> > > > > > > >
>> > > > > > > > There are also a number of the following types of error logs:
>> > > > > > > >
>> > > > > > > > 2012-08-09 20:10:04,308 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment in: ip-10-170-150-10.us-west-1.compute.internal,60020,1344559455110 due to org.apache.hadoop.hbase.regionserver.RegionAlreadyInTransitionException: Received:OPEN for the region:foo,,1343175078663.527bb34f4bb5e40dd42e82054d7c5485. ,which we are already trying to OPEN.
>> > > > > > > >
>> > > > > > > > Any ideas how to find and remove any references to these non-existent
>> > > > > > > > tables?
>> > > > > > > >
>> > > > > > > --
>> > > > > > > Thanks & Regards,
>> > > > > > > Anil Gupta