It doesn't straight up shut down, although you can certainly put enough pressure on the namenode or even individual datanodes to make them non-responsive. The errors you are getting are pretty specific about the state of things, though: they claim the process is actively shutting down.
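That "Shutdown in progress" line in your second trace is the JVM itself talking: Runtime.addShutdownHook() throws exactly that IllegalStateException once JVM shutdown has begun, and that's what FileSystem$Cache hit when the spill code asked for the local filesystem. A quick standalone demo of the JDK behavior (hypothetical class name, nothing to do with your job):

    // Hypothetical demo: registering a hook while the JVM is already
    // shutting down throws java.lang.IllegalStateException: Shutdown in progress
    public class ShutdownDemo {
        public static void main(String[] args) {
            Runtime.getRuntime().addShutdownHook(new Thread() {
                public void run() {
                    try {
                        // shutdown has already begun by the time this hook runs
                        Runtime.getRuntime().addShutdownHook(new Thread());
                    } catch (IllegalStateException e) {
                        System.err.println(e); // "Shutdown in progress"
                    }
                }
            });
        }
    }

So by the time your map task tried to flush its output, something had already kicked off a shutdown of the task JVM.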
Any chance you have bad hard drives? Datanodes will shut down after a while if they repeatedly fail to write to a disk. (I am told that's been fixed in later versions, so a datanode will keep going as long as it's got at least one working drive, but at least in CDH2 this still happens.) The hadoop user list is probably a better place to troubleshoot this; it's not a Pig issue.

D

On Thu, Jan 27, 2011 at 3:25 PM, Robert Waddell <[email protected]> wrote:

> Dmitriy,
>
> Thanks for the reply.
>
> I don't manage the cluster, but it's unlikely to be someone manually
> cutting off hdfs. The particular job it happened on was outputting a
> large amount of records. Does hdfs shut down under load, or on low
> storage, or ... ?
>
> Robert.
>
> On 27 Jan 2011 23:20, "Dmitriy Ryaboy" <[email protected]> wrote:
>> Robert,
>> Are you managing your own cluster? Any chance someone / something is
>> shutting down HDFS?
>>
>> On Thu, Jan 27, 2011 at 3:04 PM, Robert Waddell <[email protected]> wrote:
>>
>>> Sorry to repeat, but would appreciate any insight into the trouble I
>>> have been having ...
>>>
>>> Hey Guys,
>>>
>>> I was just wondering if any of you might have come across the
>>> FileSystem closed error message as below:
>>>
>>> java.io.IOException: Filesystem closed
>>>     at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:230)
>>>     at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:65)
>>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1748)
>>>     at java.io.DataInputStream.read(DataInputStream.java:132)
>>>     at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:221)
>>>     at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
>>>     at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
>>>     at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:105)
>>>     at XMLLoaderBufferedPositionedInputStream.read(LinkLoader.java:166)
>>>     at XMLLoaderBufferedPositionedInputStream.collectUntilEndTag(LinkLoader.java:202)
>>>     at XMLLoaderBufferedPositionedInputStream.collectTag(LinkLoader.java:352)
>>>     at LinkLoader$XMLFileRecordReader.getCurrentValue(LinkLoader.java:772)
>>>     at LinkLoader.getNext(LinkLoader.java:498)
>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:142)
>>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>>>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> accompanied by the syslog:
>>>
>>> 2011-01-27 12:27:32,943 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
>>> 2011-01-27 12:27:35,144 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
>>> 2011-01-27 12:27:35,914 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
>>> 2011-01-27 12:27:35,914 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
>>> 2011-01-27 12:27:48,830 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
>>> 2011-01-27 12:27:48,864 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>>> java.lang.IllegalStateException: Shutdown in progress
>>>     at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:39)
>>>     at java.lang.Runtime.addShutdownHook(Runtime.java:192)
>>>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1478)
>>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:197)
>>>     at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:168)
>>>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:213)
>>>     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:289)
>>>     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
>>>     at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
>>>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:549)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>
>>> This error kills the task and makes the job fail. Does anyone know why
>>> this might be caused, or what it means?
>>>
>>> Thanks,
>>>
>>> Robert.
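P.S. One note on the first trace: FileSystem.get() hands back a shared, cached instance per filesystem and user, so "Filesystem closed" usually means some other code in the same JVM closed the instance your record reader was holding (the shutdown hook that closes all cached filesystems will do that too). A minimal sketch of the sharing behavior, assuming a default Configuration with fs.default.name pointed at your HDFS namenode:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hypothetical demo class, not from the job in question.
    public class FsCacheDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs1 = FileSystem.get(conf);
            FileSystem fs2 = FileSystem.get(conf); // same cached object as fs1
            fs2.close();                           // closes the shared instance
            fs1.exists(new Path("/"));             // on HDFS: java.io.IOException: Filesystem closed
        }
    }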
