Increase node-mappers capacity in single node
Hi, I tried setting mapreduce.job.maps to more than 2, but since I'm running in pseudo-distributed mode the JobTracker is local, and hence this property has no effect. I'm running on a 12-core machine and would like to make use of it... Is there a way to trick Hadoop? I also tried using my virtual machine's name instead of localhost, but no luck. Please help. Thanks, Mark
Can not access hadoop cluster from outside
Hi all, I've run into a weird problem: I cannot access the Hadoop cluster from outside. I have a client machine, and from it I can telnet to the namenode's port 9000, but I cannot access the namenode through the command hadoop fs 10.249.68.39:9000 -ls / It tells me: Bad connection to FS. command aborted. exception: Call to /10.249.68.39:9000 failed on local exception: java.io.IOException: Connection reset by peer Has anyone met this problem before? I guess it may be some network configuration problem, but I'm not sure what's wrong. Thanks -- Best Regards Jeff Zhang
Re: Can not access hadoop cluster from outside
What is your ${fs.default.name} set to? On Fri, May 27, 2011 at 12:29 PM, Jeff Zhang zjf...@gmail.com wrote: [...] -- Harsh J
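A common cause of "Connection reset by peer" when connecting from outside the cluster is a NameNode that is reachable only on the loopback interface because fs.default.name points at localhost. As a hedged illustration (assuming the NameNode host really is 10.249.68.39, or a hostname that every machine resolves to that address), core-site.xml on the cluster would look like:

    <property>
      <!-- Address external clients must be able to reach; not localhost -->
      <name>fs.default.name</name>
      <value>hdfs://10.249.68.39:9000</value>
    </property>

After a NameNode restart, a client can then use the same URI, e.g. hadoop fs -ls hdfs://10.249.68.39:9000/.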
Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
On Thu, May 26, 2011 at 07:01 PM, Xu, Richard wrote: 2011-05-26 12:30:29,175 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000, call addBlock(/tmp/hadoop-cfadm/mapred/system/jobtracker.info, DFSClient_2146408809) from 169.193.181.212:55334: error: java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1 java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1 Is your DFS up and running, by any chance? Cos
Re: java.lang.NoClassDefFoundError: com.sun.security.auth.UnixPrincipal
On 05/26/2011 07:45 PM, subhransu wrote: Hello Geeks, I am a newbie to Hadoop and I have installed hadoop-0.20.203.0. I am running the sample programs that are part of this package but getting this error. Any pointer to fix this?

~/Hadoop/hadoop-0.20.203.0 788 bin/hadoop jar hadoop-examples-0.20.203.0.jar sort
java.lang.NoClassDefFoundError: com.sun.security.auth.UnixPrincipal
at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:246)
at java.lang.J9VMInternals.initializeImpl(Native Method)
at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:437)
at org.apache.hadoop.examples.Sort.run(Sort.java:82)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.Sort.main(Sort.java:187)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at ...

You're running the IBM JVM. https://issues.apache.org/jira/browse/HADOOP-7211 Go to the IBM web site and download their slightly-modified version of Hadoop that works with their JVM, or switch to the Sun JVM, which is the only one Hadoop is rigorously tested on. Sorry. -steve
RE: Re:Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
That setting is 3.

From: DAN [mailto:chaidong...@163.com] Sent: Thursday, May 26, 2011 10:23 PM To: common-user@hadoop.apache.org; Xu, Richard [ICG-IT] Subject: Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

Hi, Richard, Pay attention to "Not able to place enough replicas, still in need of 1". Please confirm the right setting of dfs.replication in hdfs-site.xml. Good luck! Dan

At 2011-05-27 08:01:37, Xu, Richard richard...@citi.com wrote:

Hi Folks, We are trying to get HBase and Hadoop running on a cluster; take 2 Solaris servers for now. Because of the incompatibility issue between HBase and Hadoop, we have to stick with the hadoop 0.20.2-append release. It was very straightforward to get hadoop-0.20.203 running, but we have been stuck for several days with hadoop-0.20.2, even the official release, not the append version.

1. Once we try to run start-mapred.sh (hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker), the following errors show up in the namenode and jobtracker logs:

2011-05-26 12:30:29,169 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1
2011-05-26 12:30:29,175 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000, call addBlock(/tmp/hadoop-cfadm/mapred/system/jobtracker.info, DFSClient_2146408809) from 169.193.181.212:55334: error: java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

2. Also, Configured Capacity is 0; we cannot put any file into HDFS.

3. On the datanode server there are no errors in the logs, but the tasktracker log has the following suspicious entries:

2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 41904: starting
2011-05-25 23:36:10,852 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 41904: starting
2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 41904: starting
2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 41904: starting
2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 41904: starting
...
2011-05-25 23:36:10,855 INFO org.apache.hadoop.ipc.Server: IPC Server handler 63 on 41904: starting
2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:41904
2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_loanps3d:localhost/127.0.0.1:41904

I have tried all suggestions found so far, including 1) removing the hadoop-name and hadoop-data folders and reformatting the namenode; 2) cleaning up all temp files/folders under /tmp; but nothing works. Your help is greatly appreciated. Thanks, RX
Re: Re:Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
First you need to make sure that your DFS daemons are running. You can start your namenode and datanode separately on the master and slave nodes and see what happens, with the following commands:

hadoop namenode
hadoop datanode

The chances are that your datanode cannot be started correctly. Let us know your error logs if there are errors. HTH~ Thanks Simon

2011/5/27 Xu, Richard richard...@citi.com wrote: [...]

-- Regards, Simon
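As a quick way to act on the advice above, a short sketch of commands (standard 0.20-era CLI; the log path is an assumption based on the default log directory) that show whether HDFS is healthy and whether any datanodes have registered:

    # List the running Java daemons on this box; NameNode/DataNode should appear.
    jps

    # Ask the namenode how many datanodes have checked in and what capacity
    # they report ("Configured Capacity: 0" usually means no live datanodes).
    hadoop dfsadmin -report

    # Tail the datanode log for connection or storage errors.
    tail -100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log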
Error while trying to use S3 with Hadoop in pseudo mode
I am trying to use Amazon S3 with Hadoop in pseudo-distributed mode. I am getting some errors in the logs for the datanode, namenode, jobtracker, etc. I did hadoop namenode -format before starting the Hadoop services. Please help. I am able to use hadoop and list the directories in my S3 bucket. I am using the Cloudera CDH3 version. Here are the errors I am getting for the different services:

STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = ip-edited
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2-cdh3u0
STARTUP_MSG: build = -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; compiled by 'hudson' on Fri Mar 25 20:19:33 PDT 2011
2011-05-27 12:57:58,329 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
2011-05-27 12:57:58,345 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.default.name): s3n://bucketnameedited is not of scheme 'hdfs'.
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:220)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:205)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:325)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:280)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1533)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1473)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1491)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1616)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1626)
2011-05-27 12:57:58,348 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: Shutting down DataNode at ip-edited

2011-05-27 12:58:05,472 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ipedited
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2-cdh3u0
STARTUP_MSG: build = -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; compiled by 'hudson' on Fri Mar 25 20:19:33 PDT 2011
2011-05-27 12:58:05,785 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.default.name): s3n://bucketname is not of scheme 'hdfs'.
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:220)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:260)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:461)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1208)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1217)
2011-05-27 12:58:05,786 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: Shutting down NameNode at ipedited

2011-05-27 12:58:04,074 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG:
STARTUP_MSG: Starting JobTracker
STARTUP_MSG: host = ip-edited
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2-cdh3u0
STARTUP_MSG: build = -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14; compiled by 'hudson' on Fri Mar 25 20:19:33 PDT 2011
2011-05-27 12:58:04,590 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2011-05-27 12:58:04,593 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2011-05-27 12:58:04,594 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
2011-05-27 12:58:04,601 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2011-05-27 12:58:04,602 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2011-05-27 12:58:04,724 INFO org.apache.hadoop.mapred.JobTracker: Starting jobtracker with owner as mapred
2011-05-27 12:58:04,780 INFO
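There is no reply in this digest, but the exception itself names the problem: the HDFS daemons require fs.default.name to be an hdfs:// URI, so it cannot be set to an s3n:// bucket. One arrangement that avoids the error, sketched with placeholder values (the property names are the 0.20-era S3 native filesystem settings), is to leave fs.default.name pointing at HDFS and supply the S3 credentials separately:

    <!-- core-site.xml -->
    <property>
      <!-- must stay an hdfs:// URI for the NameNode/DataNode to start -->
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>
    <property>
      <name>fs.s3n.awsAccessKeyId</name>
      <value>YOUR_ACCESS_KEY</value>
    </property>
    <property>
      <name>fs.s3n.awsSecretAccessKey</name>
      <value>YOUR_SECRET_KEY</value>
    </property>

S3 paths can then be addressed explicitly where needed, e.g. hadoop fs -ls s3n://bucketname/.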
Re:RE: Re:Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
Hi, Richard, You see that you have 2 Solaris servers for now, while dfs.replication is set to 3; these don't match. Good luck, Dan

At 2011-05-27 19:34:10, Xu, Richard richard...@citi.com wrote: That setting is 3. [...]
Re: Increase node-mappers capacity in single node
Hello Mark, This is due to a default configuration limit (TaskTracker slots, as we generally call them) and is covered in the FAQ: http://wiki.apache.org/hadoop/FAQ#I_see_a_maximum_of_2_maps.2BAC8-reduces_spawned_concurrently_on_each_TaskTracker.2C_how_do_I_increase_that.3F On Fri, May 27, 2011 at 11:56 AM, Mark question markq2...@gmail.com wrote: [...] -- Harsh J
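To make the FAQ answer concrete: the per-node limit comes from the TaskTracker's slot settings, not from mapreduce.job.maps. A sketch of the relevant mapred-site.xml entries for a 12-core box follows; the 8/4 split between map and reduce slots is only an assumption to illustrate the knobs, not a recommendation:

    <property>
      <!-- concurrent map tasks per TaskTracker (default 2) -->
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>
    </property>
    <property>
      <!-- concurrent reduce tasks per TaskTracker (default 2) -->
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>

The TaskTracker has to be restarted for the new values to take effect.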
Re: web site doc link broken
I'm not sure if someone has already fixed this, but I went to the first link and clicked Learn About, and it redirected to current/ just fine. There's only one such link on the page as well. On Fri, May 27, 2011 at 3:42 AM, Lee Fisher blib...@gmail.com wrote: The Hadoop Common home page: http://hadoop.apache.org/common/ has a broken link (Learn About) to the docs. It tries to use: http://hadoop.apache.org/common/docs/stable/ which doesn't exist (404). It should probably be: http://hadoop.apache.org/common/docs/current/ Or someone has deleted the stable docs, which I can't help you with. :-) Thanks. -- Harsh J
Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
On May 27, 2011, at 7:26 AM, DAN wrote: You see you have 2 Solaris servers for now, and dfs.replication is set to 3. These don't match. That doesn't matter. HDFS will basically flag any files written with a warning that they are under-replicated. The problem is that the datanode processes aren't running and/or aren't communicating with the namenode. That's what the "java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1" means. It should also be pointed out that writing to /tmp (the default) is a bad idea. This should get changed. Also, since you are running Solaris, check the FAQ for the settings you'll need to apply in order to make Hadoop's broken username detection work properly, amongst other things.
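As a sketch of what moving the default storage out of /tmp can look like (the directory paths below are placeholders, not from the thread): hadoop.tmp.dir lives in core-site.xml and the HDFS directories in hdfs-site.xml.

    <!-- core-site.xml -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/var/hadoop/tmp</value>
    </property>

    <!-- hdfs-site.xml -->
    <property>
      <!-- where the namenode keeps the filesystem image and edit log -->
      <name>dfs.name.dir</name>
      <value>/var/hadoop/dfs/name</value>
    </property>
    <property>
      <!-- where datanodes store block data -->
      <name>dfs.data.dir</name>
      <value>/var/hadoop/dfs/data</value>
    </property>

Note that changing dfs.name.dir on an existing cluster means moving the existing metadata there or reformatting.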
Using own InputSplit
I am new to Hadoop, and from what I understand, by default Hadoop splits the input into blocks. Now this might result in a record's line being split into 2 pieces and spread across 2 maps. For example, the line abcd might get split into ab and cd. How can one prevent this in Hadoop and Pig? I am looking for some examples where I can see how I can specify my own split so that it splits logically, based on the record delimiter and not the block size. For some reason I am not able to find the right examples online.
Re: Using own InputSplit
Mohit, Please do not cross-post a question to multiple lists unless you're announcing something. What you describe does not happen; the way the splitting is done for text files is explained in good detail here: http://wiki.apache.org/hadoop/HadoopMapReduce Hope this solves your doubt :) On Fri, May 27, 2011 at 10:25 PM, Mohit Anchlia mohitanch...@gmail.com wrote: [...] -- Harsh J
Re: Using own InputSplit
Thanks! I just thought it was better to post to multiple groups together since I didn't know where it belonged :) On Fri, May 27, 2011 at 10:04 AM, Harsh J ha...@cloudera.com wrote: [...]
Re: Using own InputSplit
The query fit into mapreduce-user, since it primarily dealt with how Map/Reduce operates over data, just to clarify :) On Fri, May 27, 2011 at 10:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote: [...] -- Harsh J
Re: Using own InputSplit
Actually, this link confused me: http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Job+Input "Clearly, logical splits based on input-size is insufficient for many applications since record boundaries must be respected. In such cases, the application should implement a RecordReader, who is responsible for respecting record-boundaries and presents a record-oriented view of the logical InputSplit to the individual task." But it looks like the application doesn't need to do that, since it's done by default? Or am I misinterpreting this entirely? On Fri, May 27, 2011 at 10:08 AM, Mohit Anchlia mohitanch...@gmail.com wrote: [...]
Re: Using own InputSplit
Mohit, On Fri, May 27, 2011 at 10:44 PM, Mohit Anchlia mohitanch...@gmail.com wrote: [...] For any InputFormat that Hadoop provides along with itself, this should already be handled for you (text files (say, \n-ended), Sequence Files, Avro data files). If you have a custom file format that defines its own record delimiter character(s), you would surely need to write your own InputFormat that splits properly (the wiki still helps on how to manage the reads across the first split and the subsequent ones). -- Harsh J
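To tie the two answers together with code: the stock TextInputFormat's LineRecordReader already reads past a split boundary to finish the last line, so plain line-oriented input needs nothing extra. If a format has multi-line records with its own delimiter, the simplest (if coarse) approach is an InputFormat that refuses to split files at all. A minimal sketch against the 0.20 "new" API; the class name is made up for illustration:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // One split per file: no record can straddle two map tasks, at the
    // cost of losing parallelism within a single large file.
    public class WholeFileTextInputFormat extends TextInputFormat {
      @Override
      protected boolean isSplitable(JobContext context, Path file) {
        return false;
      }
    }

A delimiter-aware InputFormat would instead keep splitting enabled and have its RecordReader skip to the first delimiter in its split and read past the split's end to finish the last record, which is exactly what LineRecordReader does for newlines.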
How to copy over using dfs
If I have to overwrite a file I generally use:

hadoop dfs -rm file
hadoop dfs -copyFromLocal (or -put) file

Is there a command to overwrite/replace the file instead of doing rm first?
RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
Hi Allen, Thanks a lot for your response. I agree with you that the replication setting does not matter. What really bothers me is that with the same environment and the same configuration, hadoop 0.20.203 took us 3 minutes while 0.20.2 has taken 3 days. Can you please shed more light on how to make Hadoop's broken username detection work properly? -----Original Message----- From: Allen Wittenauer [mailto:a...@apache.org] Sent: Friday, May 27, 2011 11:42 AM To: common-user@hadoop.apache.org Cc: Xu, Richard [ICG-IT] Subject: Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster [...]
RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
To add more to that: I also tried starting 0.20.2 on a Linux machine in distributed mode and got the same error. I had successfully started 0.20.203 on this Linux machine with the same config, so it seems this is not related to Solaris. Could it be caused by a port? I checked a few and did not find any blocked. -----Original Message----- From: Xu, Richard [ICG-IT] Sent: Friday, May 27, 2011 4:18 PM To: 'Allen Wittenauer'; 'common-user@hadoop.apache.org' Subject: RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster [...]
Has anyone else seen out of memory errors at the start of combiner tasks?
I have a job that uses an identity mapper and the same code for both the combiner and the reducer. In a small percentage of combiner tasks, after a few seconds I get errors that look like this:

FATAL mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:524)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

Those tasks fail, though they subsequently restart and complete successfully. Eventually the whole job completes successfully. Nevertheless, this happens consistently enough that it is clearly a problem with my code rather than a transient glitch on my cluster. From the stack it looks like the out of memory error is happening before any of my combiner code has had a chance to run. If I don't specify a combiner class and run everything through reducers, there are no out of memory errors and everything works fine. Obviously I have a bug, but I'm wondering if anyone has seen this particular failure mode before and has insights into why it is happening. My hypothesis is that I have some memory usage within the combiner/reducer code that doesn't scale to the largest inputs my job is getting. This is a problem for combiners and not reducers because more combiners than reducers run on a single task tracker node. The problematic job is not the one that's failing during initialization but one that is running at the same time on the same node and chewing up all the memory. Does this hypothesis sound plausible?
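No answer appears in this digest, but one detail of that stack is worth noting: MapTask$MapOutputBuffer's constructor is where the map-side sort buffer of io.sort.mb megabytes is allocated, so this particular OutOfMemoryError can fire before any combiner code runs simply because that buffer does not fit in the child JVM's heap. A hedged sketch of the two 0.20-era settings to compare (the values shown are placeholders, not recommendations):

    <property>
      <!-- heap given to each map/reduce child JVM -->
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m</value>
    </property>
    <property>
      <!-- MB of in-memory buffer for sorting map output; allocated up front
           in MapOutputBuffer's constructor -->
      <name>io.sort.mb</name>
      <value>100</value>
    </property>

If io.sort.mb is close to, or above, the -Xmx value, the allocation alone can exhaust the heap regardless of what the combiner later does.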
Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster
On May 27, 2011, at 1:18 PM, Xu, Richard wrote: Hi Allen, Thanks a lot for your response. I agree with you that the replication setting does not matter. What really bothers me is that with the same environment and the same configuration, hadoop 0.20.203 took us 3 minutes while 0.20.2 has taken 3 days. Can you please shed more light on how to make Hadoop's broken username detection work properly? It's in the FAQ so that I don't have to do that. http://wiki.apache.org/hadoop/FAQ Also, check your logs. All your logs. Not just the namenode log.
Re: How to copy over using dfs
I don't think so, because I read somewhere that this is to ensure the safety of the produced data; hence Hadoop forces you to do it this way so you know exactly what is happening. Mark On Fri, May 27, 2011 at 12:28 PM, Mohit Anchlia mohitanch...@gmail.com wrote: If I have to overwrite a file I generally use hadoop dfs -rm file, then hadoop dfs -copyFromLocal (or -put) file. Is there a command to overwrite/replace the file instead of doing rm first?
Re: web site doc link broken
I also got the following from Learn About: Not Found. The requested URL /common/docs/stable/ was not found on this server. -- Apache/2.3.8 (Unix) mod_ssl/2.3.8 OpenSSL/1.0.0c Server at hadoop.apache.org Port 80 Mark On Fri, May 27, 2011 at 8:03 AM, Harsh J ha...@cloudera.com wrote: [...]
Re: web site doc link broken
Alright, I see it's stable/ now. Weird, is my cache playing with me? On Sat, May 28, 2011 at 5:08 AM, Mark question markq2...@gmail.com wrote: [...] -- Harsh J
Re: How to copy over using dfs
Mohit, On Sat, May 28, 2011 at 12:58 AM, Mohit Anchlia mohitanch...@gmail.com wrote: If I have to overwrite a file I generally use hadoop dfs -rm file, then hadoop dfs -copyFromLocal (or -put) file. Is there a command to overwrite/replace the file instead of doing rm first? There's no command available right now to do this (best to write a wrapper script acting as a command, or a custom shell utility program). That said, https://issues.apache.org/jira/browse/HDFS-1608 covers the addition of an -f/-overwrite option, so it may be available in future releases (possibly 0.23+). -- Harsh J
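A minimal sketch of the wrapper-script idea mentioned above; the function name hput is made up, and it assumes the 0.20-era hadoop dfs CLI:

    # Overwrite a file in HDFS: remove the old copy if present, then put the new one.
    hput() {
      local src="$1" dst="$2"
      hadoop dfs -rm "$dst" 2>/dev/null || true   # ignore "file does not exist"
      hadoop dfs -put "$src" "$dst"
    }

    # Usage: hput localfile /user/mohit/file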