Hi,
I try to run some test with the pseudodistributed recommender job at AWS
using one of the late 0.7 snapshots.
Every time I run the jobs processing stops a few seconds. Some hours
later I find warning that some processes can't be found.
Did someone observe something similar one AWS?
I attached a logfile containing the relevant messages.
Bye,
Oliver
2012-06-27 21:39:11,763 INFO org.apache.hadoop.util.NativeCodeLoader (main):
Loaded the native-hadoop library
2012-06-27 21:39:12,960 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl
(main): Source name ugi already exists!
2012-06-27 21:39:13,153 INFO org.apache.hadoop.mapred.ReduceTask (main): Host
name: domU-12-31-39-06-AC-42.compute-1.internal
2012-06-27 21:39:14,731 INFO org.apache.hadoop.util.ProcessTree (main): setsid
exited with exit code 0
2012-06-27 21:39:14,776 INFO org.apache.hadoop.mapred.Task (main): Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@82d603
2012-06-27 21:39:14,862 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy
(main): Snappy native library is available
2012-06-27 21:39:14,862 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy
(main): Snappy native library loaded
2012-06-27 21:39:14,881 INFO org.apache.hadoop.mapred.ReduceTask (main):
ShuffleRamManager: MemoryLimit=272498688, MaxSingleShuffleLimit=68124672
2012-06-27 21:39:14,938 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:14,939 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:14,940 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:14,941 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:14,942 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:14,944 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:14,956 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:14,960 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:14,964 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:14,995 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:14,997 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:14,999 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:15,001 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:15,036 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:15,038 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:15,041 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:15,043 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:15,045 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:15,064 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:15,065 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new decompressor
2012-06-27 21:39:15,068 INFO org.apache.hadoop.mapred.ReduceTask (Thread for
merging on-disk files): attempt_201206272135_0001_r_000000_0 Thread started:
Thread for merging on-disk files
2012-06-27 21:39:15,068 INFO org.apache.hadoop.mapred.ReduceTask (Thread for
merging on-disk files): attempt_201206272135_0001_r_000000_0 Thread waiting:
Thread for merging on-disk files
2012-06-27 21:39:15,069 INFO org.apache.hadoop.mapred.ReduceTask (Thread for
merging in memory files): attempt_201206272135_0001_r_000000_0 Thread started:
Thread for merging in memory files
2012-06-27 21:39:15,069 INFO org.apache.hadoop.mapred.ReduceTask (main):
attempt_201206272135_0001_r_000000_0 Need another 2 map output(s) where 0 is
already in progress
2012-06-27 21:39:15,070 INFO org.apache.hadoop.mapred.ReduceTask (main):
attempt_201206272135_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup
hosts)
2012-06-27 21:39:15,070 INFO org.apache.hadoop.mapred.ReduceTask (Thread for
polling Map Completion Events): attempt_201206272135_0001_r_000000_0 Thread
started: Thread for polling Map Completion Events
2012-06-27 21:39:20,071 INFO org.apache.hadoop.mapred.ReduceTask (main):
attempt_201206272135_0001_r_000000_0 Scheduled 2 outputs (0 slow hosts and0 dup
hosts)
2012-06-27 21:39:22,149 INFO org.apache.hadoop.mapred.ReduceTask (Thread for
polling Map Completion Events): GetMapEventsThread exiting
2012-06-27 21:39:22,149 INFO org.apache.hadoop.mapred.ReduceTask (main):
getMapsEventsThread joined.
2012-06-27 21:39:22,150 INFO org.apache.hadoop.mapred.ReduceTask (main): Closed
ram manager
2012-06-27 21:39:22,152 INFO org.apache.hadoop.mapred.ReduceTask (main):
Interleaved on-disk merge complete: 0 files left.
2012-06-27 21:39:22,152 INFO org.apache.hadoop.mapred.ReduceTask (main):
In-memory merge complete: 2 files left.
2012-06-27 21:39:22,183 INFO org.apache.hadoop.mapred.Merger (main): Merging 2
sorted segments
2012-06-27 21:39:22,185 INFO org.apache.hadoop.mapred.Merger (main): Down to
the last merge-pass, with 2 segments left of total size: 1026561 bytes
2012-06-27 21:39:22,211 INFO org.apache.hadoop.io.compress.CodecPool (main):
Got brand-new compressor
2012-06-27 21:39:22,645 INFO org.apache.hadoop.mapred.ReduceTask (main): Merged
2 segments, 1026561 bytes to disk to satisfy reduce memory limit
2012-06-27 21:39:22,646 INFO org.apache.hadoop.mapred.ReduceTask (main):
Merging 1 files, 89170 bytes from disk
2012-06-27 21:39:22,646 INFO org.apache.hadoop.mapred.ReduceTask (main):
Merging 0 segments, 0 bytes from memory into reduce
2012-06-27 21:39:22,646 INFO org.apache.hadoop.mapred.Merger (main): Merging 1
sorted segments
2012-06-27 21:39:22,649 INFO org.apache.hadoop.mapred.Merger (main): Down to
the last merge-pass, with 1 segments left of total size: 89166 bytes
2012-06-27 21:39:22,780 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem
(main): Creating new file
's3://thotti-trash/001/2012-06-27-T-21-38-06-262+0000-AbstractCityBlockSimilarityJob/part-r-00000.gz'
in S3
2012-06-27 21:39:22,782 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem
(main): Outputstream for key
'001/2012-06-27-T-21-38-06-262+0000-AbstractCityBlockSimilarityJob/part-r-00000.gz'
writing to tempfile '/mnt/var/lib/hadoop/s3/output-3143824437320689581.tmp'
2012-06-27 21:39:22,787 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory
(main): Successfully loaded & initialized native-zlib library
2012-06-27 21:39:22,901 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem
(main): Opening 's3://thotti-hadoop-datasets/001/titles-001.txt' for reading
2012-06-27 21:39:27,391 INFO
org.apache.mahout.cf.taste.impl.model.file.FileDataModel (main): Creating
FileDataModel for file
/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201206272135_0001/attempt_201206272135_0001_r_000000_0/work/tmp/mahout-taste-hadoop2083471312744858636txt
2012-06-27 21:39:27,449 INFO
org.apache.mahout.cf.taste.impl.model.file.FileDataModel (main): Reading file
info...
2012-06-27 21:39:52,172 INFO
org.apache.mahout.cf.taste.impl.model.file.FileDataModel (main): Processed
1000000 lines
2012-06-27 21:39:58,884 INFO
org.apache.mahout.cf.taste.impl.model.file.FileDataModel (main): Read lines:
1280685
2012-06-27 21:40:00,577 INFO
org.apache.mahout.cf.taste.impl.model.GenericDataModel (main): Processed 10000
users
2012-06-27 21:40:02,680 INFO
org.apache.mahout.cf.taste.impl.model.GenericDataModel (main): Processed 20000
users
2012-06-27 21:40:04,378 INFO
org.apache.mahout.cf.taste.impl.model.GenericDataModel (main): Processed 30000
users
2012-06-27 21:40:04,813 INFO
org.apache.mahout.cf.taste.impl.model.GenericDataModel (main): Processed 40000
users
2012-06-27 21:40:08,268 INFO
org.apache.mahout.cf.taste.impl.model.GenericDataModel (main): Processed 50000
users
2012-06-27 21:40:08,601 INFO
org.apache.mahout.cf.taste.impl.model.GenericDataModel (main): Processed 57060
users
2012-06-28 02:27:47,013 WARN org.apache.hadoop.util.ProcfsBasedProcessTree
(communication thread): Error reading the stream java.io.IOException: No such
process
2012-06-28 02:51:14,577 WARN org.apache.hadoop.util.ProcfsBasedProcessTree
(communication thread): Error reading the stream java.io.IOException: No such
process
2012-06-28 02:59:58,992 WARN org.apache.hadoop.util.ProcfsBasedProcessTree
(communication thread): Error reading the stream java.io.IOException: No such
process
2012-06-28 03:55:41,973 WARN org.apache.hadoop.util.ProcfsBasedProcessTree
(communication thread): Error reading the stream java.io.IOException: No such
process
2012-06-28 04:26:30,730 WARN org.apache.hadoop.util.ProcfsBasedProcessTree
(communication thread): Error reading the stream java.io.IOException: No such
process
2012-06-28 05:58:51,747 WARN org.apache.hadoop.util.ProcfsBasedProcessTree
(communication thread): Error reading the stream java.io.IOException: No such
process
2012-06-28 10:42:57,140 WARN org.apache.hadoop.util.ProcfsBasedProcessTree
(communication thread): Error reading the stream java.io.IOException: No such
process