Hi,

I try to run some test with the pseudodistributed recommender job at AWS using one of the late 0.7 snapshots.

Every time I run the jobs processing stops a few seconds. Some hours later I find warning that some processes can't be found.

Did someone observe something similar one AWS?

I attached a logfile containing the relevant messages.

Bye,

Oliver
2012-06-27 21:39:11,763 INFO org.apache.hadoop.util.NativeCodeLoader (main): 
Loaded the native-hadoop library
2012-06-27 21:39:12,960 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl 
(main): Source name ugi already exists!
2012-06-27 21:39:13,153 INFO org.apache.hadoop.mapred.ReduceTask (main): Host 
name: domU-12-31-39-06-AC-42.compute-1.internal
2012-06-27 21:39:14,731 INFO org.apache.hadoop.util.ProcessTree (main): setsid 
exited with exit code 0
2012-06-27 21:39:14,776 INFO org.apache.hadoop.mapred.Task (main):  Using 
ResourceCalculatorPlugin : 
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@82d603
2012-06-27 21:39:14,862 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy 
(main): Snappy native library is available
2012-06-27 21:39:14,862 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy 
(main): Snappy native library loaded
2012-06-27 21:39:14,881 INFO org.apache.hadoop.mapred.ReduceTask (main): 
ShuffleRamManager: MemoryLimit=272498688, MaxSingleShuffleLimit=68124672
2012-06-27 21:39:14,938 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:14,939 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:14,940 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:14,941 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:14,942 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:14,944 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:14,956 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:14,960 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:14,964 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:14,995 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:14,997 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:14,999 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:15,001 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:15,036 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:15,038 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:15,041 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:15,043 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:15,045 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:15,064 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:15,065 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new decompressor
2012-06-27 21:39:15,068 INFO org.apache.hadoop.mapred.ReduceTask (Thread for 
merging on-disk files): attempt_201206272135_0001_r_000000_0 Thread started: 
Thread for merging on-disk files
2012-06-27 21:39:15,068 INFO org.apache.hadoop.mapred.ReduceTask (Thread for 
merging on-disk files): attempt_201206272135_0001_r_000000_0 Thread waiting: 
Thread for merging on-disk files
2012-06-27 21:39:15,069 INFO org.apache.hadoop.mapred.ReduceTask (Thread for 
merging in memory files): attempt_201206272135_0001_r_000000_0 Thread started: 
Thread for merging in memory files
2012-06-27 21:39:15,069 INFO org.apache.hadoop.mapred.ReduceTask (main): 
attempt_201206272135_0001_r_000000_0 Need another 2 map output(s) where 0 is 
already in progress
2012-06-27 21:39:15,070 INFO org.apache.hadoop.mapred.ReduceTask (main): 
attempt_201206272135_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup 
hosts)
2012-06-27 21:39:15,070 INFO org.apache.hadoop.mapred.ReduceTask (Thread for 
polling Map Completion Events): attempt_201206272135_0001_r_000000_0 Thread 
started: Thread for polling Map Completion Events
2012-06-27 21:39:20,071 INFO org.apache.hadoop.mapred.ReduceTask (main): 
attempt_201206272135_0001_r_000000_0 Scheduled 2 outputs (0 slow hosts and0 dup 
hosts)
2012-06-27 21:39:22,149 INFO org.apache.hadoop.mapred.ReduceTask (Thread for 
polling Map Completion Events): GetMapEventsThread exiting
2012-06-27 21:39:22,149 INFO org.apache.hadoop.mapred.ReduceTask (main): 
getMapsEventsThread joined.
2012-06-27 21:39:22,150 INFO org.apache.hadoop.mapred.ReduceTask (main): Closed 
ram manager
2012-06-27 21:39:22,152 INFO org.apache.hadoop.mapred.ReduceTask (main): 
Interleaved on-disk merge complete: 0 files left.
2012-06-27 21:39:22,152 INFO org.apache.hadoop.mapred.ReduceTask (main): 
In-memory merge complete: 2 files left.
2012-06-27 21:39:22,183 INFO org.apache.hadoop.mapred.Merger (main): Merging 2 
sorted segments
2012-06-27 21:39:22,185 INFO org.apache.hadoop.mapred.Merger (main): Down to 
the last merge-pass, with 2 segments left of total size: 1026561 bytes
2012-06-27 21:39:22,211 INFO org.apache.hadoop.io.compress.CodecPool (main): 
Got brand-new compressor
2012-06-27 21:39:22,645 INFO org.apache.hadoop.mapred.ReduceTask (main): Merged 
2 segments, 1026561 bytes to disk to satisfy reduce memory limit
2012-06-27 21:39:22,646 INFO org.apache.hadoop.mapred.ReduceTask (main): 
Merging 1 files, 89170 bytes from disk
2012-06-27 21:39:22,646 INFO org.apache.hadoop.mapred.ReduceTask (main): 
Merging 0 segments, 0 bytes from memory into reduce
2012-06-27 21:39:22,646 INFO org.apache.hadoop.mapred.Merger (main): Merging 1 
sorted segments
2012-06-27 21:39:22,649 INFO org.apache.hadoop.mapred.Merger (main): Down to 
the last merge-pass, with 1 segments left of total size: 89166 bytes
2012-06-27 21:39:22,780 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem 
(main): Creating new file 
's3://thotti-trash/001/2012-06-27-T-21-38-06-262+0000-AbstractCityBlockSimilarityJob/part-r-00000.gz'
 in S3
2012-06-27 21:39:22,782 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem 
(main): Outputstream for key 
'001/2012-06-27-T-21-38-06-262+0000-AbstractCityBlockSimilarityJob/part-r-00000.gz'
 writing to tempfile '/mnt/var/lib/hadoop/s3/output-3143824437320689581.tmp'
2012-06-27 21:39:22,787 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory 
(main): Successfully loaded & initialized native-zlib library
2012-06-27 21:39:22,901 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem 
(main): Opening 's3://thotti-hadoop-datasets/001/titles-001.txt' for reading
2012-06-27 21:39:27,391 INFO 
org.apache.mahout.cf.taste.impl.model.file.FileDataModel (main): Creating 
FileDataModel for file 
/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201206272135_0001/attempt_201206272135_0001_r_000000_0/work/tmp/mahout-taste-hadoop2083471312744858636txt
2012-06-27 21:39:27,449 INFO 
org.apache.mahout.cf.taste.impl.model.file.FileDataModel (main): Reading file 
info...
2012-06-27 21:39:52,172 INFO 
org.apache.mahout.cf.taste.impl.model.file.FileDataModel (main): Processed 
1000000 lines
2012-06-27 21:39:58,884 INFO 
org.apache.mahout.cf.taste.impl.model.file.FileDataModel (main): Read lines: 
1280685
2012-06-27 21:40:00,577 INFO 
org.apache.mahout.cf.taste.impl.model.GenericDataModel (main): Processed 10000 
users
2012-06-27 21:40:02,680 INFO 
org.apache.mahout.cf.taste.impl.model.GenericDataModel (main): Processed 20000 
users
2012-06-27 21:40:04,378 INFO 
org.apache.mahout.cf.taste.impl.model.GenericDataModel (main): Processed 30000 
users
2012-06-27 21:40:04,813 INFO 
org.apache.mahout.cf.taste.impl.model.GenericDataModel (main): Processed 40000 
users
2012-06-27 21:40:08,268 INFO 
org.apache.mahout.cf.taste.impl.model.GenericDataModel (main): Processed 50000 
users
2012-06-27 21:40:08,601 INFO 
org.apache.mahout.cf.taste.impl.model.GenericDataModel (main): Processed 57060 
users
2012-06-28 02:27:47,013 WARN org.apache.hadoop.util.ProcfsBasedProcessTree 
(communication thread): Error reading the stream java.io.IOException: No such 
process
2012-06-28 02:51:14,577 WARN org.apache.hadoop.util.ProcfsBasedProcessTree 
(communication thread): Error reading the stream java.io.IOException: No such 
process
2012-06-28 02:59:58,992 WARN org.apache.hadoop.util.ProcfsBasedProcessTree 
(communication thread): Error reading the stream java.io.IOException: No such 
process
2012-06-28 03:55:41,973 WARN org.apache.hadoop.util.ProcfsBasedProcessTree 
(communication thread): Error reading the stream java.io.IOException: No such 
process
2012-06-28 04:26:30,730 WARN org.apache.hadoop.util.ProcfsBasedProcessTree 
(communication thread): Error reading the stream java.io.IOException: No such 
process
2012-06-28 05:58:51,747 WARN org.apache.hadoop.util.ProcfsBasedProcessTree 
(communication thread): Error reading the stream java.io.IOException: No such 
process
2012-06-28 10:42:57,140 WARN org.apache.hadoop.util.ProcfsBasedProcessTree 
(communication thread): Error reading the stream java.io.IOException: No such 
process

Reply via email to