MapReduce Traces
Hi all,

I'm looking for some real-world MapReduce traces (job history) to analyze their characteristics, but I couldn't find any except for SWIM (https://github.com/SWIMProjectUCB/SWIM/wiki/Workloads-repository), which contains several traces from Facebook. However, they're incomplete and lack job output path information. Are there any other publicly available MapReduce traces?

Thanks!
Lixiang
Re: MapReduce Traces
Those look like some nice sample applications, but I'd like more realistic and larger-scale traces. Thanks anyway.

On Sat, Apr 25, 2015 at 10:12 PM, Ted Yu yuzhih...@gmail.com wrote:

Have you looked at http://nuage.cs.washington.edu/repository.php ?

Cheers

On Sat, Apr 25, 2015 at 2:43 AM, Lixiang Ao aolixi...@gmail.com wrote:

Hi all, I'm looking for some real-world MapReduce traces (job history) to analyze their characteristics, but I couldn't find any except for SWIM (https://github.com/SWIMProjectUCB/SWIM/wiki/Workloads-repository), which contains several traces from Facebook. However, they're incomplete and lack job output path information. Are there any other publicly available MapReduce traces? Thanks! Lixiang
Re: Namenode HA failover time
I am curious about this, too.

On Sat, Nov 29, 2014 at 2:35 PM, Alice 6900848...@gmail.com wrote:

Hi all:

Namenode HA (NFS, QJM) is available in Hadoop 2.x (HDFS-1623). It provides fast failover for the Namenode, but I can't find any description of how long it takes to recover from a failure. Could anyone tell me? Thanks.
Re: Benchmark Failure
Checked the logs, and it turned out to be a configuration problem. Setting dfs.namenode.fs-limits.min-block-size to 1 fixed it. Thanks.

On Wed, Mar 19, 2014 at 2:51 PM, Brahma Reddy Battula brahmareddy.batt...@huawei.com wrote:

This seems to be a known issue, which is already logged. Please check the following JIRA for the same; hope you are facing the same issue: https://issues.apache.org/jira/browse/HDFS-4929

Thanks & Regards,
Brahma Reddy Battula

From: Lixiang Ao [aolixi...@gmail.com]
Sent: Tuesday, March 18, 2014 10:34 AM
To: user@hadoop.apache.org
Subject: Re: Benchmark Failure

The version is release 2.2.0.

On Mar 18, 2014, at 12:26 AM, Lixiang Ao aolixi...@gmail.com wrote:

Hi all,

I'm running the jobclient tests (on a single node). Other tests like TestDFSIO and mrbench succeed, but nnbench fails: I get a lot of exceptions without any explanation (see below). Could anyone tell me what might have gone wrong? Thanks!

14/03/17 23:54:22 INFO hdfs.NNBench: Waiting in barrier for: 112819 ms
14/03/17 23:54:23 INFO mapreduce.Job: Job job_local2133868569_0001 running in uber mode : false
14/03/17 23:54:23 INFO mapreduce.Job: map 0% reduce 0%
14/03/17 23:54:28 INFO mapred.LocalJobRunner: hdfs://0.0.0.0:9000/benchmarks/NNBench-aolx-PC/control/NNBench_Controlfile_10:0+125 map
14/03/17 23:54:29 INFO mapreduce.Job: map 6% reduce 0%
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: Create/Write/Close
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: Create/Write/Close
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: Create/Write/Close
(repeated; 1000 exceptions in total)
. . .
results:

File System Counters
  FILE: Number of bytes read=18769411
  FILE: Number of bytes written=21398315
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=11185
  HDFS: Number of bytes written=19540
  HDFS: Number of read operations=325
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=13210
Map-Reduce Framework
  Map input records=12
  Map output records=95
  Map output bytes=1829
  Map output materialized bytes=2091
  Input split bytes=1538
  Combine input records=0
  Combine output records=0
  Reduce input groups=8
  Reduce shuffle bytes=0
  Reduce input records=95
  Reduce output records=8
  Spilled Records=214
  Shuffled Maps=0
  Failed Shuffles=0
  Merged Map outputs=0
  GC time elapsed (ms)=211
  CPU time spent (ms)=0
  Physical memory (bytes) snapshot=0
  Virtual memory (bytes) snapshot=0
  Total committed heap usage (bytes)=4401004544
File Input Format Counters
  Bytes Read=1490
File Output Format Counters
  Bytes Written=170

14/03/17 23:56:18 INFO hdfs.NNBench: -- NNBench -- :
14/03/17 23:56:18 INFO hdfs.NNBench: Version: NameNode Benchmark 0.4
14/03/17 23:56:18 INFO hdfs.NNBench: Date & time: 2014-03-17 23:56:18,619
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench: Test Operation: create_write
14/03/17 23:56:18 INFO hdfs.NNBench: Start time: 2014-03-17 23:56:15,521
14/03/17 23:56:18 INFO hdfs.NNBench: Maps to run: 12
14/03/17 23:56:18 INFO hdfs.NNBench: Reduces to run: 6
14/03/17 23:56:18 INFO hdfs.NNBench: Block Size (bytes): 1
14/03/17 23:56:18 INFO hdfs.NNBench: Bytes to write: 0
14/03/17 23:56:18 INFO hdfs.NNBench: Bytes per checksum: 1
14/03/17 23:56:18 INFO hdfs.NNBench: Number of files: 1000
14/03/17 23:56:18 INFO hdfs.NNBench: Replication factor: 3
14/03/17 23:56:18 INFO hdfs.NNBench: Successful file operations: 0
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench: # maps that missed the barrier: 11
14/03/17 23:56:18 INFO hdfs.NNBench: # exceptions: 1000
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench: TPS: Create/Write/Close: 0
14/03/17 23:56:18 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity
14/03/17 23:56:18 INFO hdfs.NNBench: Avg Lat (ms): Create/Write: NaN
14/03/17 23:56:18 INFO hdfs.NNBench: Avg Lat (ms): Close: NaN
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench: RAW DATA: AL Total #1: 0
. . .
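For reference, the fix described in the reply above is a one-property change in hdfs-site.xml. The property name is the real HDFS setting; lowering it lets the NameNode accept NNBench's 1-byte block size, which is below the default minimum (1048576 bytes in Hadoop 2.x):

```xml
<!-- hdfs-site.xml: relax the minimum block size so NNBench's
     1-byte blocks are accepted by the NameNode -->
<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <value>1</value>
</property>
```

Note that this weakens a sanity check meant to catch misconfigured clients, so it is best kept to test clusters.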
Benchmark Failure
Hi all,

I'm running the jobclient tests (on a single node). Other tests like TestDFSIO and mrbench succeed, but nnbench fails: I get a lot of exceptions without any explanation (see below). Could anyone tell me what might have gone wrong? Thanks!

14/03/17 23:54:22 INFO hdfs.NNBench: Waiting in barrier for: 112819 ms
14/03/17 23:54:23 INFO mapreduce.Job: Job job_local2133868569_0001 running in uber mode : false
14/03/17 23:54:23 INFO mapreduce.Job: map 0% reduce 0%
14/03/17 23:54:28 INFO mapred.LocalJobRunner: hdfs://0.0.0.0:9000/benchmarks/NNBench-aolx-PC/control/NNBench_Controlfile_10:0+125 map
14/03/17 23:54:29 INFO mapreduce.Job: map 6% reduce 0%
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: Create/Write/Close
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: Create/Write/Close
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: Create/Write/Close
(repeated; 1000 exceptions in total)
. . .
results:

File System Counters
  FILE: Number of bytes read=18769411
  FILE: Number of bytes written=21398315
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=11185
  HDFS: Number of bytes written=19540
  HDFS: Number of read operations=325
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=13210
Map-Reduce Framework
  Map input records=12
  Map output records=95
  Map output bytes=1829
  Map output materialized bytes=2091
  Input split bytes=1538
  Combine input records=0
  Combine output records=0
  Reduce input groups=8
  Reduce shuffle bytes=0
  Reduce input records=95
  Reduce output records=8
  Spilled Records=214
  Shuffled Maps=0
  Failed Shuffles=0
  Merged Map outputs=0
  GC time elapsed (ms)=211
  CPU time spent (ms)=0
  Physical memory (bytes) snapshot=0
  Virtual memory (bytes) snapshot=0
  Total committed heap usage (bytes)=4401004544
File Input Format Counters
  Bytes Read=1490
File Output Format Counters
  Bytes Written=170

14/03/17 23:56:18 INFO hdfs.NNBench: -- NNBench -- :
14/03/17 23:56:18 INFO hdfs.NNBench: Version: NameNode Benchmark 0.4
14/03/17 23:56:18 INFO hdfs.NNBench: Date & time: 2014-03-17 23:56:18,619
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench: Test Operation: create_write
14/03/17 23:56:18 INFO hdfs.NNBench: Start time: 2014-03-17 23:56:15,521
14/03/17 23:56:18 INFO hdfs.NNBench: Maps to run: 12
14/03/17 23:56:18 INFO hdfs.NNBench: Reduces to run: 6
14/03/17 23:56:18 INFO hdfs.NNBench: Block Size (bytes): 1
14/03/17 23:56:18 INFO hdfs.NNBench: Bytes to write: 0
14/03/17 23:56:18 INFO hdfs.NNBench: Bytes per checksum: 1
14/03/17 23:56:18 INFO hdfs.NNBench: Number of files: 1000
14/03/17 23:56:18 INFO hdfs.NNBench: Replication factor: 3
14/03/17 23:56:18 INFO hdfs.NNBench: Successful file operations: 0
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench: # maps that missed the barrier: 11
14/03/17 23:56:18 INFO hdfs.NNBench: # exceptions: 1000
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench: TPS: Create/Write/Close: 0
14/03/17 23:56:18 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity
14/03/17 23:56:18 INFO hdfs.NNBench: Avg Lat (ms): Create/Write: NaN
14/03/17 23:56:18 INFO hdfs.NNBench: Avg Lat (ms): Close: NaN
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench: RAW DATA: AL Total #1: 0
14/03/17 23:56:18 INFO hdfs.NNBench: RAW DATA: AL Total #2: 0
14/03/17 23:56:18 INFO hdfs.NNBench: RAW DATA: TPS Total (ms): 1131
14/03/17 23:56:18 INFO hdfs.NNBench: RAW DATA: Longest Map Time (ms): 1.395071776653E12
14/03/17 23:56:18 INFO hdfs.NNBench: RAW DATA: Late maps: 11
14/03/17 23:56:18 INFO hdfs.NNBench: RAW DATA: # of exceptions: 1000
Run multiple HDFS instances
Hi all,

Can I run multiple HDFS instances, that is, n separate namenodes and n datanodes, on a single machine? I've modified core-site.xml and hdfs-site.xml to avoid port and file conflicts between the HDFS instances, but when I started the second HDFS, I got these errors:

Starting namenodes on [localhost]
localhost: namenode running as process 20544. Stop it first.
localhost: datanode running as process 20786. Stop it first.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 21074. Stop it first.

Is there a way to solve this?

Thank you in advance,
Lixiang Ao
Re: Run multiple HDFS instances
I modified sbin/hadoop-daemon.sh, where HADOOP_PID_DIR is set, and it works! Everything looks fine now. The direct command hdfs namenode does give a better sense of control :) Thanks a lot.

On Thursday, April 18, 2013, Harsh J wrote:

Yes you can, but if you want the scripts to work, you should have them use a different PID directory (I think it's called HADOOP_PID_DIR) every time you invoke them. I instead prefer to start the daemons via their direct commands, such as hdfs namenode, and move them to the background, with a redirect for logging.

On Thu, Apr 18, 2013 at 2:34 PM, Lixiang Ao aolixi...@gmail.com wrote:

Hi all, Can I run multiple HDFS instances, that is, n separate namenodes and n datanodes, on a single machine? I've modified core-site.xml and hdfs-site.xml to avoid port and file conflicts between the HDFS instances, but when I started the second HDFS, I got these errors:

Starting namenodes on [localhost]
localhost: namenode running as process 20544. Stop it first.
localhost: datanode running as process 20786. Stop it first.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 21074. Stop it first.

Is there a way to solve this? Thank you in advance, Lixiang Ao

--
Harsh J
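An alternative to editing sbin/hadoop-daemon.sh itself is to override HADOOP_PID_DIR in the environment (or in a per-instance hadoop-env.sh) before launching each instance. A minimal sketch; the directory paths here are hypothetical, and the daemon-start commands are shown only as comments:

```shell
# Give the second HDFS instance its own PID directory, so the stock
# start scripts don't see the first instance's PID files and refuse to start.
export HADOOP_PID_DIR=/tmp/hadoop-instance2/pids
mkdir -p "$HADOOP_PID_DIR"

# Alternatively, start the daemons directly (as suggested in the thread)
# and background them with a log redirect, e.g.:
#   hdfs namenode > /tmp/hadoop-instance2/namenode.log 2>&1 &
#   hdfs datanode > /tmp/hadoop-instance2/datanode.log 2>&1 &
echo "PID dir for instance 2: $HADOOP_PID_DIR"
```

Each instance also still needs its own ports and storage directories in core-site.xml and hdfs-site.xml, as described in the original question.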
Re: Run multiple HDFS instances
Actually, I'm trying to do something like combining multiple namenodes so that they present themselves to clients as a single namespace, implementing basic namenode functionality.

On Thursday, April 18, 2013, Chris Embree wrote:

Glad you got this working... can you explain your use case a little? I'm trying to understand why you might want to do that.

On Thu, Apr 18, 2013 at 11:29 AM, Lixiang Ao aolixi...@gmail.com wrote:

I modified sbin/hadoop-daemon.sh, where HADOOP_PID_DIR is set, and it works! Everything looks fine now. The direct command hdfs namenode does give a better sense of control :) Thanks a lot.
Re: Run multiple HDFS instances
Not really; federation provides separate namespaces, but I want it to look like one namespace. My basic idea is to maintain a map from files to namenodes: a layer receives RPC calls from clients and forwards each call to the specific namenode that is in charge of the file. It's challenging for me, but I'll figure out whether it works.

On Friday, April 19, 2013, Hemanth Yamijala wrote:

Are you trying to implement something like namespace federation? That's a part of Hadoop 2.0: http://hadoop.apache.org/docs/r2.0.3-alpha/hadoop-project-dist/hadoop-hdfs/Federation.html

On Thu, Apr 18, 2013 at 10:02 PM, Lixiang Ao aolixi...@gmail.com wrote:

Actually, I'm trying to do something like combining multiple namenodes so that they present themselves to clients as a single namespace, implementing basic namenode functionality.
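The forwarding idea described above can be sketched as a thin routing layer in front of the namenodes. This is only an illustrative sketch, not Hadoop code: the NamenodeRouter class, the hash-based file-to-namenode map, and the method names stand in for the real ClientProtocol RPCs.

```python
import hashlib


class NamenodeRouter:
    """Hypothetical router presenting several namenodes as one namespace.

    Each file path is deterministically mapped to one backing namenode;
    client calls are forwarded to whichever namenode owns the path.
    """

    def __init__(self, namenodes):
        # namenodes: list of RPC client stubs, one per backing namenode.
        self.namenodes = namenodes

    def owner(self, path):
        # A stable hash of the path picks the owning namenode, so every
        # call for the same file lands on the same namenode.
        h = int(hashlib.md5(path.encode()).hexdigest(), 16)
        return self.namenodes[h % len(self.namenodes)]

    def create(self, path, *args, **kwargs):
        # Forward the call to the namenode in charge of this file.
        return self.owner(path).create(path, *args, **kwargs)

    def get_block_locations(self, path, *args, **kwargs):
        return self.owner(path).get_block_locations(path, *args, **kwargs)
```

The hard part this sketch glosses over is exactly what makes the idea challenging: operations that span namenodes, such as renaming a file to a path owned by a different namenode, or listing a directory whose children hash to several namenodes.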