Re: Is intermediate data produced by mappers always flushed to disk ?
The only way to do something like this is to get the mappers to use something like /dev/shm, which is 100% memory, as their storage folder. Outside of that, everything is flushed, because the mapper exits when it is done; the tasktracker is the one delivering the output to the reduce task.

Billy

"paula_ta" wrote in message news:23617347.p...@talk.nabble.com...
Is it possible that some intermediate data produced by mappers and written to the local file system resides in memory in the file system cache and is never flushed to disk? Eventually reducers will retrieve this data via HTTP - possibly without the data ever being written to disk?
thanks
Paula
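For anyone wanting to experiment with the /dev/shm idea above, a sketch of the relevant setting (the path value is hypothetical; /dev/shm is backed by RAM, so any spill larger than available memory will fail the task):

```xml
<!-- hadoop-site.xml sketch: put intermediate map output on a tmpfs.
     Hypothetical path; sized by RAM, so large shuffles will run out. -->
<property>
  <name>mapred.local.dir</name>
  <value>/dev/shm/mapred-local</value>
</property>
```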
Re: Hadoop & Python
I have used streaming and PHP before to process a data set of about 1 TB without any problems at all.

Billy

"s d" wrote in message news:24b53fa00905191035w41b115c1q94502ee82be43...@mail.gmail.com...
Thanks. So in the overall scheme of things, what is the general feeling about using Python for this? I like the ease of deploying and reading Python compared with Java, but want to make sure using Python over Hadoop is scalable, is standard practice, and is not something done only for prototyping and small-scale tests.

On Tue, May 19, 2009 at 9:48 AM, Alex Loddengaard wrote:
Streaming is slightly slower than native Java jobs. Otherwise Python works great in streaming.
Alex

On Tue, May 19, 2009 at 8:36 AM, s d wrote:
> Hi,
> How robust is using Hadoop with Python over the streaming protocol? Any
> disadvantages (performance? flexibility?)? It just strikes me that Python
> is so much more convenient when it comes to deploying and crunching text
> files.
> Thanks,
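For reference, a streaming mapper in Python is just a stdin-to-stdout filter; here is a minimal word-count sketch (the tab-separated key/value lines are Streaming's default convention; the surrounding `hadoop jar ... streaming` job command is omitted):

```python
import sys

def map_stream(lines):
    """Word-count mapper: emit 'word<TAB>1' for every token seen."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

# In a real streaming job the script would end with:
#   for record in map_stream(sys.stdin):
#       print(record)
```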
Re: Regarding Capacity Scheduler
I am seeing the same problem; it was posted on the list on the 11th and has not gotten any reply.

Billy

- Original Message -
From: "Manish Katyal"
Newsgroups: gmane.comp.jakarta.lucene.hadoop.user
Sent: Wednesday, May 13, 2009 11:48 AM
Subject: Regarding Capacity Scheduler

I'm experimenting with the Capacity Scheduler (0.19.0) in a multi-cluster environment. I noticed that, unlike the mappers, the reducers are not pre-empted. I have two queues (high and low) that are each running big jobs (70+ maps each). The scheduler splits the mappers as per the queue guaranteed capacity (5/8ths for the high and the rest for the low). However, the reduce tasks are not interleaved -- the reduce job in the high queue is blocked waiting for the reduce job in the low queue to complete. Is this a bug or by design?

Low queue:
Guaranteed Capacity (%) : 37.5
Guaranteed Capacity Maps : 3
Guaranteed Capacity Reduces : 3
User Limit : 100
Reclaim Time limit : 300
Number of Running Maps : 3
Number of Running Reduces : 7
Number of Waiting Maps : 131
Number of Waiting Reduces : 0
Priority Supported : NO

High queue:
Guaranteed Capacity (%) : 62.5
Guaranteed Capacity Maps : 5
Guaranteed Capacity Reduces : 5
User Limit : 100
Reclaim Time limit : 300
Number of Running Maps : 4
Number of Running Reduces : 0
Number of Waiting Maps : 68
Number of Waiting Reduces : 7
Priority Supported : NO
Capacity Scheduler?
Does the Capacity Scheduler not reclaim reduce tasks per the setting mapred.capacity-scheduler.queue.{name}.reclaim-time-limit? In my test it only reclaims map tasks when a queue cannot get its full guaranteed capacity.

Billy
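For context, the setting in question lives in conf/capacity-scheduler.xml; a sketch using a hypothetical queue name ("low") and the 300-second value that appears in the scheduler output quoted earlier (the limit is in seconds, per the 0.19 capacity-scheduler docs):

```xml
<!-- conf/capacity-scheduler.xml sketch; "low" is a hypothetical
     queue name.  reclaim-time-limit is in seconds. -->
<property>
  <name>mapred.capacity-scheduler.queue.low.reclaim-time-limit</name>
  <value>300</value>
</property>
```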
Re: How to do load control of MapReduce
You might try setting the tasktracker's Linux nice level to, say, 5 or 10, leaving the DFS and HBase daemons at 0.

Billy

"zsongbo" wrote in message news:fa03480d0905110549j7f09be13qd434ca41c9f84...@mail.gmail.com...
Hi all,
Now, if we have a large dataset to process with MapReduce, MapReduce will take as many machine resources as it can. So when one such big MapReduce job is running, the cluster becomes very busy and almost cannot do anything else.
For example, we have an HDFS+MapReduce+HBase cluster. There is a large dataset in HDFS to be processed by MapReduce periodically, and the workload is CPU- and I/O-heavy. The cluster also provides other services for queries (querying HBase and reading files in HDFS). So, when the job is running, the query latency becomes very long.
Since the MapReduce job is not time-sensitive, I want to control the load of MapReduce. Do you have any advice?
Thanks in advance.
Schubert
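One way to apply that, assuming your bin/hadoop-daemon.sh applies the HADOOP_NICENESS variable via `nice` (the 0.19-era scripts do; check yours before relying on it):

```shell
# conf/hadoop-env.sh sketch; the value 10 is a hypothetical choice.
# Note: hadoop-env.sh is sourced by every daemon started through
# hadoop-daemon.sh on the node, so to keep the datanode at nice 0,
# set the variable per-daemon instead, e.g.:
#   HADOOP_NICENESS=10 bin/hadoop-daemon.sh start tasktracker
export HADOOP_NICENESS=10
```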
Re: Logging in Hadoop Stream jobs
When I was looking to capture debugging data about my scripts, I would just write to the stderr stream; in PHP it is like fwrite(STDERR, "message you want here"); and then it gets captured in the task logs, which you can see when you view the detail of each task.

Billy

"Mayuran Yogarajah" wrote in message news:4a049154.6070...@casalemedia.com...
How do people handle logging in a Hadoop streaming job? I'm currently looking at using syslog for this but would like to know of other ways people are doing this currently.
thanks
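The same idea in Python: stdout is reserved for key/value output in a streaming job, so diagnostics must go to stderr. Streaming also interprets specially formatted stderr lines as counter updates (the reporter:counter: form below is the stock Streaming convention; the group and counter names are made up for illustration):

```python
import sys

def counter_line(group, counter, amount=1):
    # Hadoop Streaming treats a stderr line of exactly this form as a
    # counter increment rather than plain log text.
    return "reporter:counter:%s,%s,%d" % (group, counter, amount)

def log_debug(message, stream=sys.stderr):
    # Anything else written to stderr shows up in the task's log page.
    stream.write(message + "\n")
```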
Re: Sequence of Streaming Jobs
In PHP I run the job commands with exec(), and it has a parameter that stores the exit status code.

Billy

"Mayuran Yogarajah" wrote in message news:49fc975a.3030...@casalemedia.com...
Billy Pearson wrote:
I did this with an array of commands for the jobs in a PHP script, checking the return code of each job to tell whether it failed or not.
Billy

I have this same issue. How do you check if a job failed or not? You mentioned checking the return code? How are you doing that?
thanks

"Dan Milstein" wrote in message news:58d66a11-b59c-49f8-b72f-7507482c3...@hubteam.com...
If I've got a sequence of streaming jobs, each of which depends on the output of the previous one, is there a good way to launch that sequence? Meaning, I want step B to only start once step A has finished. From within Java JobClient code, I can do submitJob/runJob, but is there any sort of clean way to do this for a sequence of streaming jobs?
Thanks,
-Dan Milstein
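The same pattern in Python, for anyone not using PHP: run each job command in order and stop at the first nonzero exit status (the hadoop command line itself exits nonzero when a job fails; the argv lists you would pass here, e.g. ["hadoop", "jar", "hadoop-streaming.jar", ...], are placeholders):

```python
import subprocess

def run_in_sequence(commands):
    """Run each job command in order; stop at the first failure.

    commands is a list of argv lists.  Returns (jobs_completed, all_ok);
    a nonzero exit status from any command aborts the sequence.
    """
    done = 0
    for argv in commands:
        if subprocess.call(argv) != 0:  # nonzero exit = job failed
            return done, False
        done += 1
    return done, True
```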
Re: Sequence of Streaming Jobs
I did this with an array of commands for the jobs in a PHP script, checking the return code of each job to tell whether it failed or not.

Billy

"Dan Milstein" wrote in message news:58d66a11-b59c-49f8-b72f-7507482c3...@hubteam.com...
If I've got a sequence of streaming jobs, each of which depends on the output of the previous one, is there a good way to launch that sequence? Meaning, I want step B to only start once step A has finished. From within Java JobClient code, I can do submitJob/runJob, but is there any sort of clean way to do this for a sequence of streaming jobs?
Thanks,
-Dan Milstein
Re: How to run many jobs at the same time?
The only way I know of is to try using a different scheduling queue for each group.

Billy

"nguyenhuynh.mr" wrote in message news:49ee6e56.7080...@gmail.com...
Tom White wrote:
You need to start each JobControl in its own thread so they can run concurrently. Something like:

  Thread t = new Thread(jobControl);
  t.start();

Then poll the jobControl.allFinished() method.
Tom

On Tue, Apr 21, 2009 at 10:02 AM, nguyenhuynh.mr wrote:
Hi all!
I have some jobs: job1, job2, job3, ... Each job works with a group. To control the jobs I have JobControllers; each JobController controls the jobs of a specified group.
Example:
- There are 2 groups, g1 and g2 -> 2 JobControllers: jController1 and jController2
  + jController1 contains jobs: job1, job2, job3, ...
  + jController2 contains jobs: job1, job2, job3, ...
* To run the jobs, I use:

  for (i = 0; i < 2; i++) {
      jCtrl[i] = new JobController(group[i]);
      jCtrl[i].run();
  }

* I want jController1 and jController2 to run in parallel. But actually, only when jController1 has finished does jController2 begin to run. Why? Please help me!
* P/s: JobController uses org.apache.hadoop.mapred.jobcontrol.JobControl
Thanks, cheers, Nguyen.

Thanks for your response! I have used a Thread to start JobControl, something like:

  public class JobController {
      public JobController(String g) { ... }
      public void run() {
          Job j1 = new Job(..);
          Job j2 = new Job(..);
          JobControl jc = new JobControl("group1");
          Thread t = new Thread(jc);
          t.start();
          while (!jc.allFinished()) {
              // Display state
          }
      }
  }

* To run the code, something like:

  JobController[] jController = new JobController[2];
  for (int i = 0; i < 2; i++) {
      jController[i] = new JobController(group[i]);
      jController[i].run();
  }

* But they do not run in parallel :( ! Please help me!
Thanks, best regards,
Nguyen.
Re: NameNode resilency
Not 100% sure, but I think they plan on using ZooKeeper to help with namenode failover; that may have changed, though.

Billy

"Stas Oskin" wrote in message news:77938bc20904110243u7a2baa6dw6d710e4e51ae0...@mail.gmail.com...
Hi.
I wonder, what does the Hadoop community use in order to make the NameNode resilient to failures? I mean, what high-availability measures are taken to keep HDFS available even in case of NameNode failure?
So far I have read about a possible solution using DRBD, and another one using CARP. Both of them had the downside of keeping a passive machine aside, taking the IP of the NameNode. Perhaps there is a way to keep only a passive NameNode service on another machine (which does other tasks), taking over the IP only when the main one has failed? That of course until the human operator restores the main node to action.
Regards.
Re: Reduce task attempt retry strategy
I have seen the same thing happening on the 0.19 branch. When a task fails on the reduce end, it always retries on the same node until it kills the job for too many failed attempts on one reduce task. I am running a cluster of 7 nodes.

Billy

"Stefan Will" wrote in message news:c5ff7f91.18c09%stefan.w...@gmx.net...
Hi,
I had a flaky machine the other day that was still accepting jobs and sending heartbeats, but caused all reduce task attempts to fail. This in turn caused the whole job to fail because the same reduce task was retried 3 times on that particular machine.
Perhaps I'm confusing this with the block placement strategy in HDFS, but I always thought that the framework would retry tasks on a different machine if retries on the original machine keep failing. E.g. I would have expected it to retry once or twice on the same machine, but then switch to a different one to minimize the likelihood of getting stuck on a bad machine.
What is the expected behavior in 0.19.1 (which I'm running)? Any plans for improving on this in the future?
Thanks,
Stefan
Re: hdfs-doubt
Your client doesn't have to be on the namenode; it can be on any system that can access the namenode and the datanodes.
Hadoop stores files in 64 MB blocks by default, so file sizes >= 64 MB should be about as efficient as 128 MB or 1 GB file sizes.
More reading and information here:
http://wiki.apache.org/hadoop/HadoopPresentations
http://wiki.apache.org/hadoop/HadoopArticles

Billy

"sree deepya" wrote in message news:258814520903281055u540dc36do71aa26403dd03...@mail.gmail.com...
Hi sir/madam,
I am SreeDeepya, doing an MTech in IIIT. I am working on a project named cost-effective and scalable storage server. The main goal of the project is to be able to store images in a server, and the data can be up to petabytes. For that we are using HDFS. I am new to Hadoop and am just learning about it. Can you please clarify some of the doubts I have?
At present we configured one datanode and one namenode. The jobtracker is running on the namenode and the tasktracker on the datanode. The namenode also acts as the client; that is, we are writing programs on the namenode to store or retrieve images. My doubts are:
1. Can we put the client and namenode on two separate systems?
2. Can we access the images in the datanode of a Hadoop cluster from a machine on which HDFS is not installed?
3. At present we may not have data up to petabytes, but rather gigabytes. Is Hadoop still efficient for storing mega- and gigabytes of data?
Thanking you,
Yours sincerely,
SreeDeepya
Re: Reducer handing at 66%
66% is the start of the reduce function (reduce task progress counts the copy phase as 0-33%, the sort as 33-66%, and the reduce function itself as 66-100%), so it is likely an endless loop there burning the CPU cycles.

"Amandeep Khurana" wrote in message news:35a22e220903271631i25ff749bx5814348e66ff4...@mail.gmail.com...
I have an MR job running on approximately 15 lines of data in a text file. The reducer hangs at 66% and at that moment the CPU usage is at 100%. After 600 seconds, it kills the job... What could be going wrong, and where should I look for the problem?
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
Re: Typical hardware configurations
I run a 10-node cluster with 2 cores at 2.4 GHz, 4 GB RAM, and dual 250 GB drives per node. I run on used 32-bit servers, so I can only give HBase 2 GB, but I still have memory left for the tasktracker and datanode.
More files in Hadoop = more memory used on the namenode. The HBase master is lightly loaded, so I run mine on the same node as the namenode.
My personal opinion is that a large-memory 64-bit machine cannot be fully loaded by HBase at this time, but it will give you better performance. Maybe if you have lots of MR jobs or need the better response times, then it would be worth it. I think there are still some open issues (too many open file handles, etc.) that can keep larger servers from being used to their full capacity.
Think in terms of Google: they stick with small (cheap to replace) hard drives, medium memory, and cost/performance CPUs, but have lots of them.

Billy

"Amandeep Khurana" wrote in message news:35a22e220903272207s30f26310y3ecbec723b83e...@mail.gmail.com...
What are the typical hardware configs for a node that people are using for Hadoop and HBase? I am setting up a new 10-node cluster which will also have HBase running, feeding my front end directly. Currently, I had a 3-node cluster with 2 GB of RAM on the slaves and 4 GB of RAM on the master. This didn't work very well due to the RAM being a little low. I got some config details from the "powered by" page on the Hadoop wiki, but nothing like that for HBase.
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
Re: reduce task failing after 24 hours waiting
mapred.jobtracker.retirejob.interval is not in the default config; should it not be in the config?

Billy

"Amar Kamat" wrote in message news:49caff11.8070...@yahoo-inc.com...
Amar Kamat wrote:
Amareshwari Sriramadasu wrote:
Set mapred.jobtracker.retirejob.interval
This is used to retire completed jobs.
and mapred.userlog.retain.hours to a higher value.
This is used to discard user logs.
As Amareshwari pointed out, this might be the cause. Can you increase this value and try?
Amar
By default, their values are 24 hours. These might be the reason for the failure, though I'm not sure.
Thanks
Amareshwari

Billy Pearson wrote:
I am seeing on one of my long-running jobs (about 50-60 hours) that after 24 hours all active reduce tasks fail with the error message
java.io.IOException: Task process exit with nonzero status of 255.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
Is there something in the config that I can change to stop this? Every time, within 1 minute of the 24-hour mark, they all fail at the same time. It wastes a lot of resources downloading the map outputs and merging them again.

What is the state of the reducers (copy or sort)? Check the jobtracker/tasktracker logs to see what the state of these reducers is and whether a kill signal was issued. Either the jobtracker/tasktracker is issuing a kill signal or the reducers are committing suicide. Were there any failures on the reducer side while pulling the map output? Also, what is the nature of the job? How fast do the maps finish?
Amar
Billy
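For reference, both settings mentioned above go in hadoop-site.xml; a sketch with hypothetical 72-hour values (the interval is in milliseconds and the retain setting in hours, both defaulting to 24 hours):

```xml
<!-- hadoop-site.xml sketch; values are hypothetical (72 hours). -->
<property>
  <name>mapred.jobtracker.retirejob.interval</name>
  <value>259200000</value> <!-- milliseconds -->
</property>
<property>
  <name>mapred.userlog.retain.hours</name>
  <value>72</value>
</property>
```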
reduce task failing after 24 hours waiting
I am seeing on one of my long-running jobs (about 50-60 hours) that after 24 hours all active reduce tasks fail with the error message
java.io.IOException: Task process exit with nonzero status of 255.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
Is there something in the config that I can change to stop this? Every time, within 1 minute of the 24-hour mark, they all fail at the same time. It wastes a lot of resources downloading the map outputs and merging them again.

Billy
Re: intermediate results not getting compressed
I opened an issue here: https://issues.apache.org/jira/browse/HADOOP-5539 if you would like to comment on it.

Billy

"Stefan Will" wrote in message news:c5e7dc6d.1840d%stefan.w...@gmx.net...
I noticed this too. I think the compression only applies to the final mapper and reducer outputs, but not any intermediate files produced. The reducer will decompress the map output files after copying them, and then compress its own output only after it has finished. I wonder if this is by design, or just an oversight.
-- Stefan

From: Billy Pearson
Date: Wed, 18 Mar 2009 22:14:07 -0500
Subject: Re: intermediate results not getting compressed

I can run head on the map.out files and I get compressed garbage, but if I run head on an intermediate file I can read the data in the file clearly, so the compression is not getting passed along, even though I am setting CompressMapOutput to true by default in my hadoop-site.conf file.
Billy

"Billy Pearson" wrote in message news:gpscu3$66...@ger.gmane.org...
The intermediate.X files are not getting compressed for some reason, not sure why. I downloaded and built the latest branch for 0.19.
o.a.h.mapred.Merger.class, line 432:
  new Writer(conf, fs, outputFile, keyClass, valueClass, codec);
This seems to use the codec defined above, but for some reason it is not working correctly; the compression is not passing from the map output files to the on-disk merge of the intermediate.X files.
Tail of the task report from one server:
2009-03-18 19:19:02,643 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 1730 files left.
2009-03-18 19:19:02,645 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 3 files left.
2009-03-18 19:19:02,650 INFO org.apache.hadoop.mapred.ReduceTask: Keeping 3 segments, 39835369 bytes in memory for intermediate, on-disk merge
2009-03-18 19:19:03,878 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1730 files, 70359998581 bytes from disk
2009-03-18 19:19:03,909 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
2009-03-18 19:19:03,909 INFO org.apache.hadoop.mapred.Merger: Merging 1733 sorted segments
2009-03-18 19:19:04,161 INFO org.apache.hadoop.mapred.Merger: Merging 22 intermediate segments out of a total of 1733
2009-03-18 19:21:43,693 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1712
2009-03-18 19:27:07,033 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1683
2009-03-18 19:33:27,669 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1654
2009-03-18 19:40:38,243 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1625
2009-03-18 19:48:08,151 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1596
2009-03-18 19:57:16,300 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1567
2009-03-18 20:07:34,224 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1538
2009-03-18 20:17:54,715 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1509
2009-03-18 20:28:49,273 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1480
2009-03-18 20:39:28,830 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1451
2009-03-18 20:50:23,706 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1422
2009-03-18 21:01:36,818 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1393
2009-03-18 21:13:09,509 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1364
2009-03-18 21:25:17,304 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1335
2009-03-18 21:36:48,536 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1306

See the size of the files: about ~70 GB (70359998581 bytes), and these are compressed. At this point it went from 1733 files to 1306 left to merge, and the intermediate.X files are well over 200 GB, and we are not even close to done. If compression were working, we should not see tasks failing at this point for lack of hard drive space, since as we merge we delete the merged files from the output folder. I only see this happening when there are too many files left that did not get merged during the shuffle stage and it starts on-disk merging. The tasks that complete the merges and keep the count below the io.sort size (in my case 30) skip the on-disk merge and complete using normal hard drive space.
Anyone care to take a look? This job takes two or more days to get to this point, so it is getting kind of a pain in the butt to run and watch the reduces fail and the job keep failing no matter what.
Re: intermediate results not getting compressed
Open issue: https://issues.apache.org/jira/browse/HADOOP-5539

Billy

"Billy Pearson" wrote in message news:cecf0598d9ca40a08e777568361de...@billypc...
How are you concluding that the intermediate output is compressed from the map, but not in the reduce? -C

My hadoop-site.xml:
mapred.compress.map.output = true ("Should the job outputs be compressed?")
mapred.output.compression.type = BLOCK ("If the job outputs are to be compressed as SequenceFiles, how should they be compressed? Should be one of NONE, RECORD or BLOCK.")

From the job.xml:
mapred.output.compress = false  // final output
mapred.compress.map.output = true  // map output

Plus, I can head the files from the command line and read the key/value pairs in the reduce intermediate merges, but not in the map.out files.
Re: intermediate results not getting compressed
If CompressMapOutput is set, then it should carry all the way to the reduce, including the map.out files and the intermediates. I added some logging to the Merger; I have to wait until some more jobs finish before I can rebuild and restart to see the logging, but that will confirm whether or not the codec is null when it gets to line 432 and the writer is created for the intermediate files. If it is null I will open an issue.

Billy

"Stefan Will" wrote in message news:c5e7dc6d.1840d%stefan.w...@gmx.net...
I noticed this too. I think the compression only applies to the final mapper and reducer outputs, but not any intermediate files produced. The reducer will decompress the map output files after copying them, and then compress its own output only after it has finished. I wonder if this is by design, or just an oversight.
-- Stefan
[snip]
Re: intermediate results not getting compressed
How are you concluding that the intermediate output is compressed from the map, but not in the reduce? -C

My hadoop-site.xml:
mapred.compress.map.output = true ("Should the job outputs be compressed?")
mapred.output.compression.type = BLOCK ("If the job outputs are to be compressed as SequenceFiles, how should they be compressed? Should be one of NONE, RECORD or BLOCK.")

From the job.xml:
mapred.output.compress = false  // final output
mapred.compress.map.output = true  // map output

Plus, I can head the files from the command line and read the key/value pairs in the reduce intermediate merges, but not in the map.out files.
Re: intermediate results not getting compressed
I can run head on the map.out files and I get compressed garbage, but if I run head on an intermediate file I can read the data in the file clearly, so the compression is not getting passed along, even though I am setting CompressMapOutput to true by default in my hadoop-site.conf file.

Billy

"Billy Pearson" wrote in message news:gpscu3$66...@ger.gmane.org...
The intermediate.X files are not getting compressed for some reason, not sure why. I downloaded and built the latest branch for 0.19.
o.a.h.mapred.Merger.class, line 432:
  new Writer(conf, fs, outputFile, keyClass, valueClass, codec);
This seems to use the codec defined above, but for some reason it is not working correctly; the compression is not passing from the map output files to the on-disk merge of the intermediate.X files.
Tail of the task report from one server:
2009-03-18 19:19:02,643 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 1730 files left.
2009-03-18 19:19:02,645 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 3 files left.
See the size of the files: about ~70 GB (70359998581 bytes), and these are compressed. At this point it went from 1733 files to 1306 left to merge, and the intermediate.X files are well over 200 GB, and we are not even close to done. If compression were working, we should not see tasks failing at this point for lack of hard drive space, since as we merge we delete the merged files from the output folder. I only see this happening when there are too many files left that did not get merged during the shuffle stage and it starts on-disk merging. The tasks that complete the merges and keep the count below the io.sort size (in my case 30) skip the on-disk merge and complete using normal hard drive space.
Anyone care to take a look? This job takes two or more days to get to this point, so it is getting kind of a pain in the butt to run and watch the reduces fail and the job keep failing no matter what. I can post the tail of this task log when it fails to show you how far it gets before it runs out of space. Before the reduce on-disk merge starts, the disks are about 35-40% used on 500 GB drives with two tasks running at the same time.
Billy Pearson
Re: intermediate results not getting compressed
The intermediate.X files are not getting compressed, and I am not sure why. I downloaded and built the latest branch for 0.19. o.a.h.mapred.Merger.class, line 432:

new Writer(conf, fs, outputFile, keyClass, valueClass, codec);

This seems to use the codec defined above, but for some reason it is not working correctly: the compression is not carried over from the map output files to the on-disk merge that produces the intermediate.X files.

Tail of the task report from one server:

2009-03-18 19:19:02,643 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 1730 files left.
2009-03-18 19:19:02,645 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 3 files left.
2009-03-18 19:19:02,650 INFO org.apache.hadoop.mapred.ReduceTask: Keeping 3 segments, 39835369 bytes in memory for intermediate, on-disk merge
2009-03-18 19:19:03,878 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1730 files, 70359998581 bytes from disk
2009-03-18 19:19:03,909 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
2009-03-18 19:19:03,909 INFO org.apache.hadoop.mapred.Merger: Merging 1733 sorted segments
2009-03-18 19:19:04,161 INFO org.apache.hadoop.mapred.Merger: Merging 22 intermediate segments out of a total of 1733
2009-03-18 19:21:43,693 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1712
2009-03-18 19:27:07,033 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1683
2009-03-18 19:33:27,669 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1654
2009-03-18 19:40:38,243 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1625
2009-03-18 19:48:08,151 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1596
2009-03-18 19:57:16,300 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1567
2009-03-18 20:07:34,224 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1538
2009-03-18 20:17:54,715 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1509
2009-03-18 20:28:49,273 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1480
2009-03-18 20:39:28,830 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1451
2009-03-18 20:50:23,706 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1422
2009-03-18 21:01:36,818 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1393
2009-03-18 21:13:09,509 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1364
2009-03-18 21:25:17,304 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1335
2009-03-18 21:36:48,536 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 1306

Note the size of the files: about ~70GB (70359998581 bytes), and these are compressed. At this point the merge has gone from 1733 files to 1306 left to merge, and the intermediate.X files are already well over 200GB, and we are not even close to done. If compression were working, we should not see tasks failing at this point for lack of hard drive space, since as we merge we delete the merged files from the output folder. I only see this happening when too many files are left unmerged during the shuffle stage and on-disk merging starts. Tasks that complete their merges and keep the file count below io.sort.factor (30 in my case) skip the on-disk merge and complete using a normal amount of hard drive space.

Anyone care to take a look? This job takes two or more days to get to this point, so it is getting to be a pain to run and watch the reduces fail and the job keep failing no matter what. I can post the tail of this task log when it fails to show you how far it gets before it runs out of space. Before the reduce-side on-disk merge starts, the disks are about 35-40% used on 500GB drives, with two tasks running at the same time.

Billy Pearson
Re: intermediate results not getting compressed
Watching a second job with more reduce tasks running, it looks like the in-memory merges are working correctly with compression. The task I was watching failed and ran again; it shuffled all the map output files and only started the merge after everything was copied, so none of it was merged in memory: the in-memory merger was closed before the merging started. If it helps, the output files are named intermediate.X and are stored in the folder mapred/local/job-taskname/intermediate.X, while the in-memory merge outputs are stored in mapred/local/taskTracker/jobcache/job-name/taskname/. The non-compressed files are the intermediate.X files above.

Billy

"Chris Douglas" wrote in message news:9bb78c3a-efab-45c3-8cc3-25aab60df...@yahoo-inc.com...

My problem is the output from merging the intermediate map output files is not compressed, so I lose all the benefit of compressing the map output to save disk space, because the merged map outputs are no longer compressed.

It should still be compressed, unless there's some bizarre regression. More segments will be around simultaneously (since the segments not yet merged are still on disk), which clearly puts pressure on intermediate storage, but if the map outputs are compressed, then the merged map outputs at the reduce must also be compressed. There's no place in the intermediate format to store compression metadata, so either all are or none are. Intermediate merges should also follow the compression spec of the initiating merger (o.a.h.mapred.Merger: 447). How are you concluding that the intermediate output is compressed from the map, but not in the reduce? -C

- Original Message - From: "Chris Douglas" Newsgroups: gmane.comp.jakarta.lucene.hadoop.user To: Sent: Tuesday, March 17, 2009 12:33 AM Subject: Re: intermediate results not getting compressed

I am running 0.19.1-dev, r744282. I have searched the issues but found nothing about the compression.

AFAIK, there are no open issues that prevent intermediate compression from working.
The following might be useful: http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Data+Compression

Should the intermediate results not be compressed also if the map output files are set to be compressed?

These are controlled by separate options:
FileOutputFormat::setCompressOutput enables/disables compression of the final output.
JobConf::setCompressMapOutput enables/disables compression of the intermediate output.

If not, then why do we have the map compression option, just to save network traffic?

That's part of it. Also to save on disk bandwidth and intermediate space. -C
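[Editor's note: for reference, the two switches Chris describes can also be set via configuration properties; the names below are the 0.19-era ones, so double-check them against your version's hadoop-default.xml.]

```xml
<!-- Compress intermediate (map) output: saves shuffle bandwidth and local disk.
     Equivalent to JobConf::setCompressMapOutput(true). -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<!-- Compress the job's final output.
     Equivalent to FileOutputFormat::setCompressOutput(job, true). -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
```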
Re: intermediate results not getting compressed
I understand that. I have CompressMapOutput set, and it works: the map outputs are compressed. But on the reduce end, it downloads X files, then merges those X files into one intermediate file to keep the number of files to a minimum (<= io.sort.factor). My problem is that the output from merging the intermediate map output files is not compressed, so I lose all the benefit of compressing the map output to save disk space, because the merged map outputs are no longer compressed. Note there are two different types of intermediate files: the map outputs, and the ones the reduce produces by merging map outputs to meet the configured io.sort.factor.

Billy

- Original Message - From: "Chris Douglas" Newsgroups: gmane.comp.jakarta.lucene.hadoop.user To: Sent: Tuesday, March 17, 2009 12:33 AM Subject: Re: intermediate results not getting compressed

I am running 0.19.1-dev, r744282. I have searched the issues but found nothing about the compression.

AFAIK, there are no open issues that prevent intermediate compression from working. The following might be useful: http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Data+Compression

Should the intermediate results not be compressed also if the map output files are set to be compressed?

These are controlled by separate options:
FileOutputFormat::setCompressOutput enables/disables compression of the final output.
JobConf::setCompressMapOutput enables/disables compression of the intermediate output.

If not, then why do we have the map compression option, just to save network traffic?

That's part of it. Also to save on disk bandwidth and intermediate space. -C
intermediate results not getting compressed
I am running a large streaming job that processes about 3TB of data, and I am seeing large jumps in hard drive space usage in the reduce part of the job. I tracked the problem down. The job is set to compress map outputs, but looking at the intermediate files on the local drives, they are not getting compressed during/after merges. I am going from having, say, 2GB of mapfile.out files to having one intermediate.X file 100-350% larger than the map files. I have looked at one of the files and confirmed that it is not getting compressed, as I can read the data in it. If it were only one merge it would not be a problem, but when you are merging 70-100 of these you use tons of GBs, and my tasks are starting to die as they run out of hard drive space, which in the end kills the job. I am running 0.19.1-dev, r744282. I have searched the issues but found nothing about the compression. Should the intermediate results not be compressed also if the map output files are set to be compressed? If not, then why do we have the map compression option, just to save network traffic?
Re: Hadoop job using multiple input files
If it were me, I would prefix the map output values with "a:" and "n:" (a: for address and n: for name). Then in the reduce you can test each value to see whether it is the address or the name with if statements. There is no need to worry about which one comes first; just make sure both have been set before outputting in the reduce.

Billy

"Amandeep Khurana" wrote in message news:35a22e220902061646m941a545o554b189ed5bdb...@mail.gmail.com...

Ok. I was able to get this to run but have a slight problem.

*File 1*
1 10
2 20
3 30
3 35
4 40
4 45
4 49
5 50

*File 2*
a 10 123
b 20 21321
c 45 2131
d 40 213

I want to join the above two based on the second column of file 1. Here's what I am getting as the output:

*Output* 1 a 123 b 21321 2 3 3 4 d 213 c 2131 4 4 5

The ones in red are in the format I want. The ones in blue have their order reversed. How can I get them to be in the correct order too? Basically, the order in which the iterator iterates over the values is not consistent. How can I get this to be consistent?

Amandeep

Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

On Fri, Feb 6, 2009 at 2:58 PM, Amandeep Khurana wrote:

Ok. Got it. Now, how would my reducer know whether the name is coming first or the address? Is it going to be in the same order in the iterator as the files are read (alphabetically) in the mapper?

Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz

On Fri, Feb 6, 2009 at 5:22 AM, Jeff Hammerbacher wrote:

You put the files into a common directory and use that as your input to the MapReduce job. You write a single Mapper class that has an "if" statement examining the map.input.file property, outputting "number" as the key for both files, but "address" as the value for one and "name" for the other. By using a common key ("number"), you'll ensure that the name and address make it to the same reducer after the shuffle. In the reducer, you'll then have the relevant information (in the values) you need to create the name, address pair.

On Fri, Feb 6, 2009 at 2:17 AM, Amandeep Khurana wrote:

> Thanks Jeff...
> I am not 100% clear about the first solution you have given. How do I get the multiple files to be read and then fed into a single reducer? Should I have multiple mappers in the same class with different job configs for them, and run two separate jobs, with one outputting the key as (name, number) and the other outputting the value as (number, address) into the reducer? Not clear what I'll be doing with the map.input.file here...
> Amandeep

On Fri, Feb 6, 2009 at 1:55 AM, Jeff Hammerbacher wrote:

> > Hey Amandeep,
> > You can get the file name for a task via the "map.input.file" property. For the join you're doing, you could inspect this property and output (number, name) and (number, address) as your (key, value) pairs, depending on the file you're working with. Then you can do the combination in your reducer.
> > You could also check out the join package in contrib/utils (http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/contrib/utils/join/package-summary.html), but I'd say your job is simple enough that you'll get it done faster with the above method.
> > This task would be a simple join in Hive, so you could consider using Hive to manage the data and perform the join.
> > Later,
> > Jeff

On Fri, Feb 6, 2009 at 1:34 AM, Amandeep Khurana wrote:

> > > Is it possible to write a map reduce job using multiple input files?
> > > For example:
> > > File 1 has data like - Name, Number
> > > File 2 has data like - Number, Address
> > > Using these, I want to create a third file which has something like - Name, Address
> > > How can a map reduce job be written to do this?
> > > Amandeep
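[Editor's note: a streaming-style sketch of the tagging approach Billy describes, in plain Python with no Hadoop dependencies. The tag letters and the tab-separated record layouts follow the thread; the sample names and addresses are made up for illustration.]

```python
# Reduce-side join by tagging values: the mapper prefixes each value with
# "n:" (name) or "a:" (address) so the reducer can tell them apart,
# regardless of the order in which the shuffle delivers the values.

def map_names(line):
    # File 1 record: "name<TAB>number" -> (number, "n:name")
    name, number = line.split("\t")
    return number, "n:" + name

def map_addresses(line):
    # File 2 record: "number<TAB>address" -> (number, "a:address")
    number, address = line.split("\t")
    return number, "a:" + address

def reduce_join(key, values):
    # Emit (name, address) only once both tagged values have been seen,
    # so value ordering in the iterator does not matter.
    name = address = None
    for v in values:
        if v.startswith("n:"):
            name = v[2:]
        elif v.startswith("a:"):
            address = v[2:]
    if name is not None and address is not None:
        return name, address
    return None

# Simulate the shuffle: group tagged pairs by key; value order is unspecified.
pairs = [map_names("alice\t10"), map_addresses("10\t123 Elm St")]
grouped = {}
for k, v in pairs:
    grouped.setdefault(k, []).append(v)
results = [reduce_join(k, vs) for k, vs in grouped.items()]
print(results)  # -> [('alice', '123 Elm St')]
```

In a real job the two map functions would be one Mapper branching on map.input.file, as Jeff suggests; the tagging makes the reducer order-independent, which addresses the red/blue inconsistency above.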
Re: hadoop balanceing data
I did not think about that; good point. I found a way to keep it from happening: I set dfs.datanode.du.reserved in the config file.

"Hairong Kuang" wrote in message news:c59f9164.ed09%hair...@yahoo-inc.com...

%Remaining fluctuates much more than %dfs-used. This is because dfs shares the disks with mapred, and mapred tasks may use a lot of disk temporarily. So trying to keep the same %free is impossible most of the time.

Hairong

On 1/19/09 10:28 PM, "Billy Pearson" wrote:

Why do we not use the Remaining % in place of the Used % when we are selecting datanodes for new data and when running the balancer? From what I can tell, we use the Used % and do not factor in non-DFS usage at all. I see a datanode with only a 60GB hard drive fill up completely to 100% before other servers with 130+GB hard drives get half full. It seems like trying to keep the same % free on the drives across the cluster would be more optimal in production. I know this still may not be perfect, but it would be nice if we tried.

Billy
hadoop balanceing data
Why do we not use the Remaining % in place of the Used % when we are selecting datanodes for new data and when running the balancer? From what I can tell, we use the Used % and do not factor in non-DFS usage at all. I see a datanode with only a 60GB hard drive fill up completely to 100% before other servers with 130+GB hard drives get half full. It seems like trying to keep the same % free on the drives across the cluster would be more optimal in production. I know this still may not be perfect, but it would be nice if we tried.

Billy
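[Editor's note: a toy illustration of Billy's point in plain Python. The node sizes loosely match the 60GB/130GB example; the exact numbers are hypothetical. Ranking nodes by DFS-used % can prefer a small, nearly-full disk, while ranking by remaining space does not.]

```python
# Each node: total capacity, DFS-used, and non-DFS used space (all in GB).
nodes = {
    "small": {"capacity": 60,  "dfs_used": 5,  "non_dfs": 40},  # 15 GB free
    "large": {"capacity": 130, "dfs_used": 20, "non_dfs": 10},  # 100 GB free
}

def dfs_used_pct(n):
    # What used-%-based placement effectively ranks on: DFS-used / capacity.
    # Non-DFS usage (mapred spill files, etc.) is invisible to this metric.
    return n["dfs_used"] / n["capacity"]

def remaining(n):
    # What the poster argues for: actual free space on the drive.
    return n["capacity"] - n["dfs_used"] - n["non_dfs"]

by_used = min(nodes, key=lambda k: dfs_used_pct(nodes[k]))  # placement target
by_free = max(nodes, key=lambda k: remaining(nodes[k]))     # placement target
print(by_used, by_free)  # -> small large
```

The used-% metric picks the 60GB node (8.3% DFS-used) even though it has only 15GB free, which is exactly how such a node fills to 100% first.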
Re: Namenode BlocksMap on Disk
Doug: If we use the heap as a cache, then on a large cluster you will have the memory on the NN to keep the whole namespace in memory. We are also looking for a way to support smaller clusters that might overrun their heap size, causing the cluster to crash. So if the NN has the room to cache the whole namespace, the larger clusters will not see any disk hits once the namespace is fully loaded into memory.

Billy

"Doug Cutting" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

Dennis Kubes wrote:

2) Besides possible slight degradation in performance, is there a reason why the BlocksMap shouldn't or couldn't be stored on disk?

I think the assumption is that it would be considerably more than slight degradation. I've seen the namenode benchmarked at over 50,000 opens per second. If file data is on disk, and the namespace is considerably bigger than RAM, then a seek would be required per access. At 10ms/seek, that would give only 100 opens per second, or 500x slower. Flash storage today peaks at around 5k seeks/second. For smaller clusters the namenode might not need to be able to perform 50k opens/second, but for larger clusters we do not want the namenode to become a bottleneck.

Doug
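[Editor's note: Doug's back-of-the-envelope numbers check out; a quick sketch of the arithmetic.]

```python
# One disk seek per namespace access, if the namespace doesn't fit in RAM:
seek_time_s = 0.010                       # ~10 ms per seek on a spinning disk
opens_per_sec_disk = 1 / seek_time_s      # seek-bound rate
opens_per_sec_ram = 50_000                # benchmarked in-memory rate
slowdown = opens_per_sec_ram / opens_per_sec_disk
print(opens_per_sec_disk, slowdown)  # -> 100.0 500.0
```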
Re: Namenode BlocksMap on Disk
I would like to see something like this also. I run 32-bit servers, so I am limited in how much memory I can use for heap. Besides just storing to disk, I would like to see some sort of cache, like a block cache, that caches parts of the BlocksMap. This would help reduce disk hits for lookups and still give us the ability to lower the memory requirement for the namenode.

Billy

"Dennis Kubes" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

From time to time a message pops up on the mailing list about OOM errors for the namenode because of too many files. Most recently there was a 1.7 million file installation that was failing. I know the simple solution to this is to have a larger java heap for the namenode. But the non-simple way would be to convert the BlocksMap for the NameNode to be stored on disk and then queried and updated for operations. This would eliminate memory problems for large file installations but might also degrade performance slightly. Questions:

1) Is there any current work to allow the namenode to store on disk versus in memory? This could be a configurable option.
2) Besides possible slight degradation in performance, is there a reason why the BlocksMap shouldn't or couldn't be stored on disk?

I am willing to put forth the work to make this happen. Just want to make sure I am not going down the wrong path to begin with.

Dennis
Re: The Case of a Long Running Hadoop System
If I understand correctly, the secondary namenode merges the edits log into the fsimage and reduces the edits log size. That is likely the root of your problems: 8.5GB seems large and is likely putting a strain on your master server's memory and I/O bandwidth. Why do you not have a secondary namenode? If you do not have the memory on the master, I would look into stopping a datanode/tasktracker on one server and loading the secondary namenode on it. Let it run for a while and watch the secondary namenode's log; you should see your edits log get smaller. I am not an expert, but that would be my first action.

Billy

"Abhijit Bagri" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

Hi,

This is a long mail, as I have tried to put in as many details as might help any of the Hadoop devs/users to help us out. The gist is this: we have a long-running Hadoop system (masters not restarted for about 3 months). We have recently started seeing the DFS respond very slowly, which has resulted in failures on a system which depends on Hadoop. Further, the DFS seems to be in an unstable state (i.e., if fsck is a good representation, which I believe it is). The edits file

These are the details (skip/return here later and jump to the questions at the end of the mail for a quicker read):

Hadoop version: 0.15.3 on 32-bit systems.
Number of slaves: 12
Slave heap size: 1G
Namenode heap: 2G
Jobtracker heap: 2G

The namenode and jobtracker have not been restarted for about 3 months. We did restart the slaves (all of them within a few hours) a few times for some maintenance in between, though. We do not have a secondary namenode in place.

There is another system X which talks to this Hadoop cluster. X writes to the Hadoop DFS and submits jobs to the jobtracker. The number of jobs submitted to Hadoop so far is over 650,000 (I am using the job id for this); each job may read/write multiple files and has several dependent libraries which it loads from the Distributed Cache.

Recently, we started seeing several timeouts happening while X tries to read/write to the DFS. This in turn results in the DFS becoming very slow to respond. The writes are especially slow. The trace we get in the logs is:

java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:129)
 at java.net.SocketInputStream.read(SocketInputStream.java:182)
 at java.io.DataInputStream.readShort(DataInputStream.java:284)
 at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1660)
 at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1733)
 at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
 ...

Also, the datanode logs show a lot of traces like these:

2008-11-14 21:21:49,429 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.io.IOException: Block blk_-1310124865741110666 is valid, and cannot be written to.
 at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:551)
 at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1257)
 at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:901)
 at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804)
 at java.lang.Thread.run(Thread.java:595)

and these:

2008-11-14 21:21:50,695 WARN org.apache.hadoop.dfs.DataNode: java.io.IOException: Error in deleting blocks.
 at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:719)
 at org.apache.hadoop.

The first one seems to be the same as HADOOP-1845. We have been seeing this for a long time, but as HADOOP-1845 says, it hasn't been creating any problems. We have thus ignored it for a while, but these recent problems have raised eyebrows again about whether it really is a non-issue. We also saw a spike in the number of TCP connections on the system X.

Also, an lsof on the system X shows a lot of connections to datanodes in the CLOSE_WAIT state. While investigating the issue, the fsck output started saying that DFS is corrupt. The output is:

Total size: 152873519485 B
Total blocks: 1963655 (avg. block size 77851 B)
Total dirs: 1390820
Total files: 1962680
Over-replicated blocks: 111 (0.0061619785 %)
Under-replicated blocks: 13 (6.6203077E-4 %)
Target replication factor: 3
Real replication factor: 3.503

The filesystem under path '/' is CORRUPT

After running fsck -move, the system went back to a healthy state. However, after some time it again started showing corrupt, and fsck -move restored it to a healthy state. We are considering restarting the masters as a possible solution. I have earlier had issues with restarting the master with a 2G heap and an edits file size of about 2.5G. So, we decided to look up the size of the
Re: Any Way to Skip Mapping?
I need the reduce to sort so I can merge the records and output them in sorted order. I do not need to join any data, just merge rows together, so I do not think the join package will be any help. I am storing the data as key/value pairs with a sorted map as the value, and on the merge I need to take all the rows that have the same key, merge all the sorted maps together, and output one row that has all the data for that key: something like what HBase is doing, but without the in-memory indexes. Maybe it will become an option later down the road to skip the maps and let the reduce shuffle directly from the InputSplits.

Billy

"Owen O'Malley" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

If you don't need a sort, which is what it sounds like, Hadoop supports that by turning off the reduce. That is done by setting the number of reduces to 0. This typically is much faster than if you need the sort. It also sounds like you may need/want the library that does map-side joins. http://tinyurl.com/43j5pp

-- Owen
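[Editor's note: what Billy's reduce step does, in miniature, in plain Python. The row layout is a guess based on his description: each record's value is a sorted map of column to value, and all maps for one key fold into a single output row.]

```python
# Merge rows that share a key: the reducer receives every sorted map
# emitted for one key and combines them into one row.

def reduce_merge(key, sorted_maps):
    merged = {}
    for m in sorted_maps:
        merged.update(m)  # later fragments win on column collisions
    # Emit one row per key with all of its data, columns in sorted order.
    return key, dict(sorted(merged.items()))

key, row = reduce_merge("row1", [{"a": 1}, {"b": 2}, {"a": 3, "c": 4}])
print(key, row)  # -> row1 {'a': 3, 'b': 2, 'c': 4}
```

The shuffle's sort-by-key is what delivers all fragments of one row to the same reduce call, which is why the identity map phase cannot simply be skipped.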
Any Way to Skip Mapping?
I have a job that merges multiple output directories of MR jobs that run over time. Their outputs are all the same, and the MR job that merges them uses a mapper that just outputs the same key,value it is given, basically the same as the IdentityMapper. The problem I am seeing is that as I add more and more data to the records, I see longer and longer map times. Is there a way to skip the mapping of the InputSplits and just let the reduces copy from HDFS? In my job, here is an example: 20 reduce tasks run at once, but my job's reduce count is 60. The first round of reduces finishes the shuffle in 2hrs, 29mins, 25secs; this includes the map times. The second and third rounds of reduces have the map outputs ready to copy, so there is no wait (13mins, 30secs). So right now I am wasting 2hrs, 10mins for the IdentityMapper to run on all the inputs, and this will get longer and longer as the amount of data increases. I need the reduce to merge and sort the inputs. Does anyone know of a better way to do a merge like this in an MR job?
Re: Improving locality of table access...
Generate a patch and post it here: https://issues.apache.org/jira/browse/HBASE-675

Billy

"Arthur van Hoff" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

Hi,

Below is some code for improving the read performance of large tables by processing each region on the host holding that region. We measured 50-60% lower network bandwidth. To use this class instead of the org.apache.hadoop.hbase.mapred.TableInputFormat class, use:

jobconf.setInputFormat(ellerdale.mapreduce.TableInputFormatFix);

Please send me feedback if you can think of better ways to do this.

-- Arthur van Hoff - Grand Master of Alphabetical Order The Ellerdale Project, Menlo Park, CA [EMAIL PROTECTED], 650-283-0842

-- TableInputFormatFix.java --

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// Author: Arthur van Hoff, [EMAIL PROTECTED]

package ellerdale.mapreduce;

import java.io.*;
import java.util.*;

import org.apache.hadoop.io.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.hbase.*;
import org.apache.hadoop.hbase.mapred.*;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.*;
import org.apache.hadoop.hbase.util.*;

//
// Attempt to fix the localized nature of table segments.
// Compute table splits so that they are processed locally.
// Combine multiple splits to avoid the number of splits exceeding numSplits.
// Sort the resulting splits so that the shortest ones are processed last.
// The resulting savings in network bandwidth are significant (we measured 60%).
//
public class TableInputFormatFix extends TableInputFormat {
    public static final int ORIGINAL = 0;
    public static final int LOCALIZED = 1;
    public static final int OPTIMIZED = 2; // not yet functional

    //
    // A table split with a location.
    //
    static class LocationTableSplit extends TableSplit implements Comparable {
        String location;

        public LocationTableSplit() {
        }

        public LocationTableSplit(byte[] tableName, byte[] startRow, byte[] endRow, String location) {
            super(tableName, startRow, endRow);
            this.location = location;
        }

        public String[] getLocations() {
            return new String[] {location};
        }

        public void readFields(DataInput in) throws IOException {
            super.readFields(in);
            this.location = Bytes.toString(Bytes.readByteArray(in));
        }

        public void write(DataOutput out) throws IOException {
            super.write(out);
            Bytes.writeByteArray(out, Bytes.toBytes(location));
        }

        public int compareTo(Object other) {
            LocationTableSplit otherSplit = (LocationTableSplit) other;
            int result = Bytes.compareTo(getStartRow(), otherSplit.getStartRow());
            return result;
        }

        public String toString() {
            return location.substring(0, location.indexOf('.')) + ": "
                + Bytes.toString(getStartRow()) + "-" + Bytes.toString(getEndRow());
        }
    }

    //
    // A table split with a location that covers multiple regions.
    //
    static class MultiRegionTableSplit extends LocationTableSplit {
        byte[][] regions;

        public MultiRegionTableSplit() {
        }

        public MultiRegionTableSplit(byte[] tableName, String location, byte[][] regions) throws IOException {
            super(tableName, regions[0], regions[regions.length - 1], location);
            this.location = location;
            this.regions = regions;
        }

        public void readFields(DataInput in) throws IOException {
            super.readFields(in);
            int n = in.readInt();
            regions = new byte[n][];
            for (int i = 0; i < n; i++) {
                regions[i] = Bytes.readByteArray(in);
            }
        }

        public void write(DataOutput out) throws IOException {
            super.write(out);
            out.writeInt(regions.length);
            for (int i = 0; i < regions.length; i++) {
                Bytes.writeByteArray(out, regions[i]);
            }
        }

        public String toString() {
            String str = location.substring(0, location.indexOf('.')) + ": ";
            for (int i = 0; i < regions.length; i += 2) {
                if (i > 0) {
                    str += ", ";
                }
                str += Bytes.toString(regi
Re: Maps running after reducers complete successfully?
Do we not have an option to store the map results in HDFS?

Billy

"Owen O'Malley" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

It isn't optimal, but it is the expected behavior. In general, when we lose a TaskTracker, we want the map outputs regenerated so that any reduces that need to re-run (including speculative execution) can fetch them. We could handle it as a special case if:

1. We didn't lose any running reduces.
2. All of the reduces (including speculative tasks) are done with shuffling.
3. We don't plan on launching any more speculative reduces.

If all 3 hold, we don't need to re-run the map tasks. Actually doing so would be a pretty involved patch to the JobTracker/Schedulers.

-- Owen
Re: Can hadoop sort by values rather than keys?
You might be able to use InverseMapper.class to help flip the key/value to value/key.

Billy

"Jeremy Chow" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

Hi list,

The default way Hadoop does its sorting is by keys; can it sort by values rather than keys?

Regards, Jeremy

-- My research interests are distributed systems, parallel computing and bytecode based virtual machine. http://coderplay.javaeye.com
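[Editor's note: the InverseMapper trick in miniature, in plain Python with no Hadoop. It emulates map, the framework's sort-by-key, and the flip back; the sample records are made up.]

```python
# Sort (key, value) pairs by value: invert each pair so the value becomes
# the key, let the framework's sort-by-key do the work, then flip back.

records = [("apple", 3), ("banana", 1), ("cherry", 2)]

inverted = [(v, k) for k, v in records]   # InverseMapper step
inverted.sort()                           # shuffle/sort phase sorts by key
by_value = [(k, v) for v, k in inverted]  # flip back afterwards
print(by_value)  # -> [('banana', 1), ('cherry', 2), ('apple', 3)]
```

In a real job the second flip would happen in the reducer (or a second pass), since reducers receive the pairs already sorted by the inverted key.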
Re: adding nodes while computing
You should be able to add nodes to the cluster while jobs are running; the jobtracker should start assigning tasks to the new tasktrackers, and DFS should start using the nodes for storage. But map output files are stored on the slaves and copied to the reduce tasks, so if a node goes down during an MR job, its maps will have to be run again.

Billy

"Francois Berenger" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]

Hello,

Is it possible to add slaves, with IP addresses not known in advance, to a Hadoop cluster while a computation is going on? And the reverse capability: is it possible to cleanly and permanently remove a slave node from the Hadoop cluster?

Thank you, François.
Re: Is Hadoop the thing for us ?
I do not totally understand you job you are running but if each simulation can run independent of each other then you could run a map reduce job that will spread the simulation's over many servers so each one can run one or more at the same time this will give you a level of protection on servers going down and take care of the work on spreading out the work to server also this should be able to handle more then the 100K simulation mark you stated you would like to run. You would just need to write a the input code to handle splitting the simulations into splits that the MR framework could work with. Billy "Igor Nikolic" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] Thank you for your comment, it did confirm my suspicions. You framed the problem correctly. I will probably invest a bit of time studying the framework anyway, to see if a rewrite is interesting, since we hit scaling limitations on our Agent scheduler framework. Our main computational load is the massive amount of agent reasoning ( think JbossRules) and inter-agent communication ( they need to sell and buy stuff to each other) so I am not sure if it is at all possible to break it down to small tasks, specially if this needs to happen across CPU's, the latency is going to kill us. Thanks igor John Martyniak wrote: I am new to Hadoop. So take this information with a grain of salt. But the power of Hadoop is breaking down big problems into small pieces and spreading it across many (thousands) of machines, in effect creating a massively parallel processing engine. But in order to take advantage of that functionality you must write your application to take advantage of it, using the Hadoop frameworks. So if I understand your dilemma correctly. I do not think that Hadoop is for you, unless you want to re-write your app to take advantage of it. And I suspect that if you have access to a traditional cluster, that will be a better alternative for you. Hope that this helps some. 
-John On Wed, Jun 25, 2008 at 7:33 AM, Igor Nikolic <[EMAIL PROTECTED]> wrote: Hello list We will be getting access to a cluster soon, and I was wondering whether I should use Hadoop, or am I better off with the usual batch schedulers such as ProActive etc.? I am not a CS/CE person, and from reading the website I cannot get a sense of whether Hadoop is for me. A little background: we have a relatively large agent-based simulation (20+ MB jar) that needs to be swept across very large parameter spaces. Agents communicate only within the simulation, so there is no interprocess communication. The parameter vector is at most 20 long, the simulation may take 5-10 minutes on a normal desktop, and it might return a few MB of raw data. We need 10k-100K runs, more if possible. Thanks for advice, even a short yes/no is welcome Greetings Igor -- ir. Igor Nikolic PhD Researcher Section Energy & Industry Faculty of Technology, Policy and Management Delft University of Technology, The Netherlands Tel: +31152781135 Email: [EMAIL PROTECTED] Web: http://www.igornikolic.com wiki server: http://wiki.tudelft.nl
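The approach Billy describes above can be sketched as a streaming job: each input line carries one parameter vector, and the mapper runs one simulation per line. Everything here is a stand-in (the run_simulation stub, the file paths, and the output format are all assumptions), but it shows the shape of the job:

```shell
#!/bin/sh
# Hypothetical streaming mapper for a parameter sweep: one simulation per
# input line. run_simulation is a stand-in for the real 20+ MB simulation jar.

run_simulation() {
  # Replace with something like: java -jar simulation.jar $*
  echo "result-for:$*"
}

map() {
  # Read one parameter vector per line and emit "vector<TAB>result" so
  # results can be grouped per vector downstream (or reduces set to zero).
  while IFS= read -r params; do
    printf '%s\t%s\n' "$params" "$(run_simulation $params)"
  done
}

# A deployed mapper script would end with:  map
# and be submitted roughly as (paths are placeholders):
#   hadoop jar hadoop-streaming.jar -input params.txt -output sweep-out \
#     -mapper mapper.sh -file mapper.sh
```

With 5-10 minutes per simulation, a line-per-vector input file also gives the framework natural split points, and task retries cover the machine-failure case Billy mentions.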
Re: how to write data to one file on HDFS by some clients Synchronized?
https://issues.apache.org/jira/browse/HADOOP-1700
"过佳" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] Does HDFS support it? I need it to be synchronized, e.g. I call many clients to write a lot of IntWritables to one file. Best. Jarvis.
MR input Format Type mismatch
2008-06-21 20:30:18,928 WARN org.apache.hadoop.mapred.TaskTracker: Error running child java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.LongWritable, recieved org.apache.hadoop.io.Text at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:419) at com.compspy.mapred.RecordImport$MapClass.map(RecordImport.java:96) at com.compspy.mapred.RecordImport$MapClass.map(RecordImport.java:1) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
I am trying to run an MR job that takes a file made with PHP as input. The lines are formatted like this: Number\tText\r\n So each row has a line number from 0 - X, then a tab, then the text, then a \r\n newline for each record. I do not need the line number, but I had been getting errors like the above, so I reformatted my file to include one while trying to get this to work. Is there something I am missing, e.g. should there be "" around the text or <> around the numbers? The text is going to be used to make HBase updates/inserts in the reduce, so I want to output Text,RowResults. If I avoid the type mismatch, I get a casting error instead when I try different things. I can reformat my input file if needed; I just need to get my mapper to work. How can I feed my file into an MR job? The mapper I am trying to use is below: public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> { public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException { // do some work here to derive rowkey and newvals from the input line output.collect(rowkey, newvals); } }
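A likely cause of the mismatch above: in the old mapred API the map output key class defaults to the job's output key class (LongWritable unless set), so a mapper emitting Text keys usually needs conf.setMapOutputKeyClass(Text.class) and conf.setMapOutputValueClass(...) in the driver. Alternatively, since the input is already a plain tab-separated text file, a streaming mapper sidesteps the Java type declarations entirely. A sketch (the CRLF handling and field layout follow the description above; the placeholder value "1" is an assumption):

```shell
#!/bin/sh
# Streaming mapper for lines of the form "Number\tText\r\n": strip the \r,
# drop the leading line number, and emit the text as the output key.

map() {
  # cut's default delimiter is tab; -f2- keeps everything after the number.
  tr -d '\r' | cut -f2- | while IFS= read -r text; do
    # key<TAB>value is the default key/value split in streaming.
    printf '%s\t1\n' "$text"
  done
}
```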
Re: Ec2 and MR Job question
My second question is about the EC2 machines: has anyone solved the hostname problem in an automated way? For example, if I launch an EC2 server to run a tasktracker, the hostname reported back to my local cluster is its internal address, so with the default hostname the local reduce tasks cannot access the map files on the EC2 machine. I get an error: WARN org.apache.hadoop.mapred.ReduceTask: java.net.UnknownHostException: domU-12-31-39-00-A4-05.compute-1.internal Is there an automated way to start a tasktracker on an EC2 machine so that it uses the public hostname, letting the local tasks get the maps from the EC2 machines? For example, something like bin/hadoop-daemon.sh start tasktracker host=ec2-xx-xx-xx-xx.z-2.compute-1.amazonaws.com that I could run to start just the tasktracker with the correct hostname. What I am trying to do is build a custom AMI image that I can launch whenever I need to add extra CPU power to my cluster, with the tasktracker started automatically via a shell script run at startup. Billy
"Billy Pearson" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] I have a question someone may have answered here before, but I cannot find the answer. Assume I have a cluster of servers hosting a large amount of data, and I want to run a large job whose maps take a lot of CPU power to run while the reduces take only a small amount. I want to run the maps on a group of EC2 servers and run the reduces on the local cluster of 10 machines. The problem I am seeing is the map outputs: if I run the maps on EC2, they are stored locally on the instance. What I am looking to do is have the map output files stored in HDFS so I can kill the EC2 instances, since I do not need them for the reduces. The only way I can think to do this is to run two jobs: one map-only job that stores its output in HDFS, and then a second job that runs the reduces from the map outputs stored in HDFS. Is there a way to make the mappers store their final output in HDFS?
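One automated route (a sketch only; the slave.host.name property and file locations are assumptions based on the stock hadoop-ec2 contrib scripts, so verify against your version): have the AMI's startup script ask EC2's instance metadata service for the public hostname and patch it into hadoop-site.xml before starting the tasktracker.

```shell
#!/bin/sh
# Insert a slave.host.name property into a hadoop-site.xml so the daemon
# advertises the given hostname. $1 = path to hadoop-site.xml, $2 = hostname.
set_slave_hostname() {
  sed "s|</configuration>|<property><name>slave.host.name</name><value>$2</value></property></configuration>|" "$1" > "$1.tmp" \
    && mv "$1.tmp" "$1"
}

# An AMI startup script would then do roughly:
#   PUBLIC_HOST=$(curl -s http://169.254.169.254/latest/meta-data/public-hostname)
#   set_slave_hostname $HADOOP_HOME/conf/hadoop-site.xml "$PUBLIC_HOST"
#   $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker
```

The metadata URL is the standard EC2 one; the tasktracker then reports the public DNS name to the jobtracker instead of the internal compute-1.internal address.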
Re: Ec2 and MR Job question
I understand how to run it as two jobs; my only question is: is there a way to make the mappers store their final output in HDFS, so I can kill the EC2 machines without waiting for the reduce stage to end? Billy
"Chris K Wensel" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] well, to answer your last question first, just set the # reducers to zero. but you can't just run reducers without mappers (as far as I know, having never tried). so your local job will need to run identity mappers in order to feed your reducers. http://hadoop.apache.org/core/docs/r0.16.4/api/org/apache/hadoop/mapred/lib/IdentityMapper.html ckw On Jun 14, 2008, at 1:31 PM, Billy Pearson wrote: I have a question someone may have answered here before, but I cannot find the answer. Assume I have a cluster of servers hosting a large amount of data, and I want to run a large job whose maps take a lot of CPU power to run while the reduces take only a small amount. I want to run the maps on a group of EC2 servers and run the reduces on the local cluster of 10 machines. The problem I am seeing is the map outputs: if I run the maps on EC2, they are stored locally on the instance. What I am looking to do is have the map output files stored in HDFS so I can kill the EC2 instances, since I do not need them for the reduces. The only way I can think to do this is to run two jobs: one map-only job that stores its output in HDFS, and then a second job that runs the reduces from the map outputs stored in HDFS. Is there a way to make the mappers store their final output in HDFS? -- Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/
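Chris's two-job approach can be written down concretely with streaming (the flag names are from the 0.16-era streaming jar and worth double-checking; paths and script names are placeholders): job one runs the real mappers with zero reduces so the map output lands in HDFS, after which the EC2 instances can be killed; job two runs an identity mapper (plain cat) on the local cluster to feed the real reducers.

```shell
#!/bin/sh
# Two-job sketch, wrapped in a function so nothing runs until it is called
# from a node with hadoop on the PATH.
submit_jobs() {
  # Job 1 (on the EC2 tasktrackers): maps only, output written to HDFS.
  hadoop jar hadoop-streaming.jar \
    -input /data/in -output /data/mapped \
    -mapper my_mapper.sh -file my_mapper.sh \
    -jobconf mapred.reduce.tasks=0

  # ...the EC2 instances can be terminated here...

  # Job 2 (on the local cluster): identity mapper, real reducers.
  hadoop jar hadoop-streaming.jar \
    -input /data/mapped -output /data/out \
    -mapper cat \
    -reducer my_reducer.sh -file my_reducer.sh
}
```

In streaming, cat serves as the identity mapper: it passes each key/value line through unchanged, which is all job two's map phase needs to do.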
Ec2 and MR Job question
I have a question someone may have answered here before, but I cannot find the answer. Assume I have a cluster of servers hosting a large amount of data, and I want to run a large job whose maps take a lot of CPU power to run while the reduces take only a small amount. I want to run the maps on a group of EC2 servers and run the reduces on the local cluster of 10 machines. The problem I am seeing is the map outputs: if I run the maps on EC2, they are stored locally on the instance. What I am looking to do is have the map output files stored in HDFS so I can kill the EC2 instances, since I do not need them for the reduces. The only way I can think to do this is to run two jobs: one map-only job that stores its output in HDFS, and then a second job that runs the reduces from the map outputs stored in HDFS. Is there a way to make the mappers store their final output in HDFS?
Re: Streaming --counters question
Streaming works on stdin and stdout, so unless there were a way to capture the stdout as a counter, I do not see any other way to report to the jobtracker, unless there were a URL the task could call on the jobtracker to update counters. Billy
"Miles Osborne" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] Is there support for counters in streaming? In particular, it would be nice to be able to access these after a job has run. Thanks! Miles -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
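There is in fact a supported mechanism close to what Billy suggests: a streaming task can write specially formatted lines to stderr, and the framework turns them into counter and status updates (I believe this came in via HADOOP-1328, so check it is in your release; the group and counter names below are made up):

```shell
#!/bin/sh
# Streaming mapper that updates a job counter through stderr.
map() {
  while IFS= read -r line; do
    # Format: reporter:counter:<group>,<counter>,<amount>
    echo "reporter:counter:MyApp,LinesSeen,1" >&2
    printf '%s\t1\n' "$line"
  done
}

# Status updates use the same channel:
#   echo "reporter:status:still working" >&2
```

Counter values then show up on the job's web page and via the job client after the run, which covers the "access these after a job has run" part of the question.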