Fwd: Need help
Hello, I am doing my master's, and my final-year project is on Hadoop, so I would like to know something about Hadoop clusters, i.e., are newer versions of Hadoop able to handle heterogeneous hardware? If you have any information regarding this, please mail me, as my project is in a heterogeneous environment. Thanks! Regards, Ashish Pareek
Re: Need help
Does that mean Hadoop is not scalable with respect to heterogeneous environments? And one more question: can we run different applications on the same Hadoop cluster? Thanks. Regards, Ashish

On Thu, Jun 18, 2009 at 8:30 PM, jason hadoop jason.had...@gmail.com wrote: Hadoop has always been reasonably agnostic with respect to hardware and homogeneity. There are optimizations in configuration for near-homogeneous machines. [...] -- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall www.prohadoopbook.com a community for Hadoop Professionals
Re: Need help
Can you tell me a few of the challenges in configuring a heterogeneous cluster, or pass on some link where I could get information regarding the challenges of running Hadoop on heterogeneous hardware? One more thing: how about running different applications on the same Hadoop cluster, and what challenges are involved in that? Thanks, Regards, Ashish

On Thu, Jun 18, 2009 at 8:53 PM, jason hadoop jason.had...@gmail.com wrote: I don't know anyone who has a completely homogeneous cluster, so Hadoop is scalable across heterogeneous environments. I stated that configuration is simpler if the machines are similar (There are optimizations in configuration for near-homogeneous machines.) [...] -- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall www.prohadoopbook.com a community for Hadoop Professionals
Re: Need help
Hello everybody, How can we handle different applications with different requirements being run on the same Hadoop cluster? What are the various approaches to solving such a problem? If possible, please mention some of those ideas. Does such an implementation exist? Thanks, Regards, Ashish

On Thu, Jun 18, 2009 at 9:36 PM, jason hadoop jason.had...@gmail.com wrote: For me, I like to have one configuration file that I distribute to all of the machines in my cluster via rsync. In there are things like the number of tasks per node to run, where to store DFS data and local temporary data, and the limits on storage for the machines. If the machines are very different, it becomes important to tailor the configuration file per machine or per type of machine. At this point, you are pretty much going to have to spend the time reading through the details of configuring a Hadoop cluster. [...] -- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall www.prohadoopbook.com a community for Hadoop Professionals
Re: Need help
Hadoop can be run on a hardware-heterogeneous cluster. Currently, Hadoop clusters really only run well on Linux, although you can run a Hadoop client on non-Linux machines. You will need a specific configuration for each of the machines in your cluster based on its hardware profile. Ideally, you'll be able to group the machines in your cluster into classes of machines (e.g. machines with 1 GB of RAM and 2 cores versus 4 GB of RAM and 4 cores) to reduce the burden of managing multiple configurations. If you are talking about a Hadoop cluster that is completely heterogeneous (each machine is completely different), the management overhead could be high.

Configuration variables like mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum should be set based on the number of cores and the amount of memory in each machine. Variables like mapred.child.java.opts need to be set differently based on the amount of memory the machine has (e.g. -Xmx250m). You should have at least 250 MB of memory dedicated to each task, although more is better. It's also wise to make sure that each task gets the same amount of memory regardless of the machine it's scheduled on; otherwise, tasks might succeed or fail depending on which machine gets the task. That asymmetry will make debugging harder.

You can use our online configurator (http://www.cloudera.com/configurator/) to generate optimized configurations for each class of machines in your cluster. It will ask simple questions about your configuration and then produce a hadoop-site.xml file. Good luck! -Matt

On Jun 18, 2009, at 8:33 AM, ashish pareek wrote: Can you tell me a few of the challenges in configuring a heterogeneous cluster, or pass on some link where I could get information regarding the challenges of running Hadoop on heterogeneous hardware? One more thing: how about running different applications on the same Hadoop cluster, and what challenges are involved in that? Thanks, Regards, Ashish [...]
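As a concrete illustration of the per-class settings Matt describes, a hadoop-site.xml fragment for a hypothetical 4-core/4 GB class of nodes might look like the following; the numbers are examples to adapt, not recommendations from this thread:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>   <!-- roughly one map slot per core on this class of machine -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx250m</value>   <!-- keep the per-task heap identical across machine classes -->
</property>

A smaller 2-core/1 GB class would get lower task maximums in its copy of the file but the same -Xmx value, so a task behaves the same wherever it is scheduled.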
Re: I need help
Razen Alharbi wrote: Thanks everybody, the issue was that Hadoop writes all the output to stderr instead of stdout, and I don't know why. I would really love to know why the usual Hadoop job progress is written to stderr.

Because there is a line in log4j.properties telling it to do just that: log4j.appender.console.target=System.err

-- Steve Loughran http://www.1060.org/blogxter/publish/5 Author: Ant in Action http://antbook.org/
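If the goal is simply to have that progress appear on stdout instead, the target can be flipped in conf/log4j.properties. A minimal sketch; only the target line differs from the stock file, and the exact stock contents may vary between releases:

log4j.appender.console=org.apache.log4j.ConsoleAppender
# send console logging, including job progress, to stdout instead of stderr
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n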
Re: I need help
Thanks everybody, the issue was that Hadoop writes all the output to stderr instead of stdout, and I don't know why. I would really love to know why the usual Hadoop job progress is written to stderr. Thanks again. Razen

Razen Alharbi wrote: Hi all, I am writing an application in which I create a forked process to execute a specific Map/Reduce job. The problem is that when I try to read the output stream of the forked process I get nothing, but when I execute the same job manually it starts printing the output I am expecting. [...]

-- View this message in context: http://www.nabble.com/I-need-help-tp23273273p23307094.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
I need help
Hi all, I am writing an application in which I create a forked process to execute a specific Map/Reduce job. The problem is that when I try to read the output stream of the forked process I get nothing, but when I execute the same job manually it starts printing the output I am expecting. For clarification I will go through the simple code snippet:

Process p = rt.exec("hadoop jar GraphClean args");
BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
String line = null;
check = true;
while (check) {
    line = reader.readLine();
    if (line != null) { // I know this will not finish; it's only for testing.
        System.out.println(line);
    }
}

If I run this code nothing shows up, but if I execute the command (hadoop jar GraphClean args) from the command line it works fine. I am using Hadoop 0.19.0. Thanks, Razen
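As the rest of this thread explains, the hadoop script writes its job progress to stderr, so reading only getInputStream() shows nothing. A minimal sketch of one way to capture everything by merging stderr into stdout; the jar name and argument are placeholders for whatever GraphClean actually takes, and ProcessBuilder is plain java.lang API:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class RunGraphClean {
    public static void main(String[] args) throws Exception {
        // Merge stderr into stdout so the job's progress output is readable from one stream.
        ProcessBuilder pb = new ProcessBuilder("hadoop", "jar", "GraphClean.jar", "arg1");
        pb.redirectErrorStream(true);
        Process p = pb.start();
        BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        int exit = p.waitFor(); // wait for the hadoop client to finish
        System.out.println("hadoop exited with code " + exit);
    }
}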
Re: I need help
Razen Al Harbi wrote: Hi all, I am writing an application in which I create a forked process to execute a specific Map/Reduce job. The problem is that when I try to read the output stream of the forked process I get nothing, but when I execute the same job manually it starts printing the output I am expecting. [...]

Why not just invoke the Hadoop job submission calls yourself? There is no need to exec anything. Look at org.apache.hadoop.util.RunJar to see what you need to do. Avoid calling RunJar.main() directly, as:
- it calls System.exit() when it wants to exit with an error
- it adds shutdown hooks

-steve
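Steve's suggestion in code form: a rough sketch of driving the job in-process with the 0.19-era JobClient API instead of exec'ing the hadoop script. The identity mapper/reducer and the key/value classes are stand-ins for whatever GraphClean actually uses:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class GraphCleanDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(GraphCleanDriver.class);
        conf.setJobName("graph-clean");
        // Substitute the real GraphClean mapper and reducer classes here.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // Runs the job and reports progress from this JVM; no forked process needed.
        JobClient.runJob(conf);
    }
}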
Re: I need help
Hi, Is that command available on all nodes? Did you try something like the below? ;)

Process proc = rt.exec("/bin/hostname");
..
output.collect(hostname, disk usage);

On Tue, Apr 28, 2009 at 6:13 PM, Razen Al Harbi razen.alha...@yahoo.com wrote: Hi all, I am writing an application in which I create a forked process to execute a specific Map/Reduce job. The problem is that when I try to read the output stream of the forked process I get nothing, but when I execute the same job manually it starts printing the output I am expecting. [...]

-- Best Regards, Edward J. Yoon @ NHN, corp. edwardy...@apache.org http://blog.udanax.org
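Fleshing out Edward's sketch a little: a minimal old-API (org.apache.hadoop.mapred) mapper that runs a local command on whichever node executes the task and collects its output. The class name and the choice of emitting the input record as the value are illustrative assumptions:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class HostInfoMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        // Run a local command on the node this map task lands on.
        Process proc = Runtime.getRuntime().exec("/bin/hostname");
        BufferedReader r = new BufferedReader(new InputStreamReader(proc.getInputStream()));
        String hostname = r.readLine();
        r.close();
        // Emit the node's hostname as the key; the value could be disk usage or anything else gathered locally.
        output.collect(new Text(hostname), value);
    }
}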
Re: I need help
Thanks for the reply.

Steve: I know that I can use the JobClient to run or submit jobs; however, for the time being I need to exec the job as a separate process.

Edward: The forked job is not executed from within a map or reduce, so I don't need to do data collection.

It seems that for some reason the output of the reduce tasks is not written to stdout, because when I tried to direct the output to a tmp file using the command (hadoop jar GraphClean args > tmp), nothing was written to the file and the output still went to the screen.

Regards, Razen

Razen Alharbi wrote: Hi all, I am writing an application in which I create a forked process to execute a specific Map/Reduce job. The problem is that when I try to read the output stream of the forked process I get nothing, but when I execute the same job manually it starts printing the output I am expecting. [...]

-- View this message in context: http://www.nabble.com/I-need-help-tp23273273p23284528.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
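A small aside on that test: since the progress lines go to stderr, a plain stdout redirect leaves the file empty and the messages on the screen. Capturing both streams would need something along these lines (ordinary shell redirection, applied to the same command used above):

hadoop jar GraphClean args > tmp 2>&1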
Re: I need help
Why not read the output result after the job is done? And if you want to see the log4j log, you need to set the stdout option in log4j.properties.

On Wed, Apr 29, 2009 at 4:35 AM, Razen Alharbi razen.alha...@yahoo.com wrote: Thanks for the reply. Steve: I know that I can use the JobClient to run or submit jobs; however, for the time being I need to exec the job as a separate process. [...]

-- Best Regards, Edward J. Yoon @ NHN, corp. edwardy...@apache.org http://blog.udanax.org
Re: hadoop need help please suggest
Sorry for the inconvenience caused; I will not spam core-dev. The scale we are thinking of, in terms of more nodes, can in the coming future go to petabytes of data. Can you please give some pointers for handling this issue? I am quite new to Hadoop. Regards, Snehal

Raghu Angadi wrote: What is the scale you are thinking of? (10s, 100s or more nodes)? The memory for metadata at the NameNode you mentioned is the main issue with small files. There are multiple alternatives for dealing with that; this issue has been discussed many times here. Also, please use the core-user@ id alone for asking for help; you don't need to send to core-devel@. Raghu. [...]

-- View this message in context: http://www.nabble.com/hadoop-need-help-please-suggest-tp22666530p22721718.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Need Help hdfs -How to minimize access Time
Hey Snehal (removing the core-dev list; please only post to one at a time),

The access time should be fine, but it depends on what you define as an acceptable access time. If it is not acceptable, I'd suggest putting it behind a web cache like Squid. The best way to find out is to use the system as a prototype and evaluate it against your requirements. Hadoop is useful for small data, but it is optimized for, and was originally designed only for, big data. The primary drawback of small files is that they may cost more per file in terms of memory. Hadoop as a solution may be overkill, however, if your total storage size is never going to grow very large. We currently use HDFS for mostly random access. Brian

On Mar 25, 2009, at 6:10 AM, snehal nagmote wrote: Hello Sir, I am doing an M.Tech at IIIT Hyderabad, and I am doing a research project whose aim is to develop a scalable storage system for eSagu. [...]
Re: hadoop need help please suggest
What is the scale you are thinking of? (10s, 100s or more nodes)? The memory for metadata at the NameNode you mentioned is the main issue with small files. There are multiple alternatives for dealing with that; this issue has been discussed many times here. Also, please use the core-user@ id alone for asking for help; you don't need to send to core-devel@. Raghu.

snehal nagmote wrote: Hello Sir, I have some doubts; please help me. We have a requirement for a scalable storage system: we have developed an agro-advisory system in which farmers send crop pictures, some 6-7 photos of 3-4 KB each, which are stored on a storage server and read sequentially by a scientist to detect the problem. [...]
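One of the alternatives Raghu alludes to is packing the many small image files into a single SequenceFile keyed by file name, so the NameNode tracks one large file instead of thousands of tiny ones. A minimal sketch; the paths, class name, and the decision to key by file name are illustrative assumptions rather than anything prescribed in this thread:

import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackImages {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/user/esagu/images.seq"); // illustrative HDFS path
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, out, Text.class, BytesWritable.class);
        try {
            for (File img : new File(args[0]).listFiles()) { // local directory of small images
                byte[] bytes = new byte[(int) img.length()];
                FileInputStream in = new FileInputStream(img);
                IOUtils.readFully(in, bytes, 0, bytes.length);
                in.close();
                // key = original file name, value = raw image bytes
                writer.append(new Text(img.getName()), new BytesWritable(bytes));
            }
        } finally {
            writer.close();
        }
    }
}

Readers then seek within the one big file instead of opening thousands of small HDFS files, which keeps the NameNode's metadata footprint roughly constant as the image count grows.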
Need Help hdfs -How to minimize access Time
Hello Sir, I am doing an M.Tech at IIIT Hyderabad, and I am doing a research project whose aim is to develop a scalable storage system for eSagu. eSagu is all about taking crop images from the fields and storing them in a filesystem; those images are then accessed by agricultural scientists to detect problems. Many fields in A.P. are currently using this system, and it may go beyond A.P., so we require a storage system.

1) My problem is that we are using Hadoop for the storage, but Hadoop retrieves (reads/writes) in 64 MB chunks, and the stored images would be of very small size, say 2 to 3 MB at most, so the access time for the images would be larger. Can you suggest how this access time can be reduced? Is there anything else we could do to improve performance, like building our own cache? To what extent would that be feasible or helpful for this kind of application?

2) Second, would Hadoop be useful for small data like this? If not, what tricks could we use to make it usable for this kind of application?

Please help. Thanks in advance. Regards, Snehal Nagmote IIIT Hyderabad
hadoop need help please suggest
Hello Sir, I have some doubts; please help me. We have a requirement for a scalable storage system. We have developed an agro-advisory system in which farmers send crop pictures, in a sequential manner, some 6-7 photos of 3-4 KB each, which are stored on a storage server; these photos are then read sequentially by a scientist to detect the problem, and the images are not written to again. For storing these images we are using the Hadoop file system; is it feasible to use the Hadoop file system for this purpose? Also, since the images are only 3-4 KB and Hadoop reads data in blocks of 64 MB, how can we increase performance, and what tricks and tweaks should be done to use Hadoop for such a purpose? The next problem is that, as Hadoop stores all the metadata in memory, can we use some mechanism to store the files in blocks of some greater size? Because the files are small, Hadoop will store lots of metadata and overflow main memory. Please suggest what could be done. Regards, Snehal
extreme newbie need help setting up hadoop
Good afternoon all, I work in tech and am an extreme newbie at Hadoop. I could sure use some help. I have a professor wanting Hadoop installed on multiple Linux computers in a lab. The computers are running CentOS 5. I know I have something configured wrong and am not sure where to go. I am following the instructions at http://www.cs.brandeis.edu/~cs147a/lab/hadoop-cluster/ and I get to the part "Testing Your Hadoop Cluster", but when I use the command hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' it hangs. Could anyone be kind enough to point me to a step-by-step installation and configuration website? Thank you, Brian
Re: extreme newbie need help setting up hadoop
Hello Brian, Here is the Hadoop project wiki link, which covers detailed Hadoop setup and running your first program on a single node as well as on multiple nodes. Below are some more useful links to start understanding and using Hadoop:

http://hadoop.apache.org/core/docs/current/quickstart.html
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)

If you still have difficulties running Hadoop-based programs, please reply with the error output so that experts can comment. - Ravi

On 2/3/09 10:08 AM, bjday bj...@cse.usf.edu wrote: Good afternoon all, I work in tech and am an extreme newbie at Hadoop. I could sure use some help. I have a professor wanting Hadoop installed on multiple Linux computers in a lab. [...]

Ravi Phulari Yahoo! IM: ravescorp | Office Phone: (408)-336-0806 |
Need help regarding DataNodeCluster
Hello, I tried using DataNodeCluster to simulate a set of 3000 datanodes. The namenode is already running. When I ran the command below:

$ java DataNodeCluster -n 3000 -simulated -inject 1 1 -d someDirectory

I got the following error:

Starting 3000 Simulated Data Nodes that will connect to Name Node at gs301850.inktomisearch.com:50830
Starting DataNode 0 with dfs.data.dir: someDirectory/dfs/data/data1,someDirectory/dfs/data/data2
08/12/22 11:42:33 INFO datanode.DataNode: Registered FSDatasetStatusMBean
08/12/22 11:42:33 INFO datanode.DataNode: Opened info server at 60195
08/12/22 11:42:33 INFO datanode.DataNode: Balancing bandwith is 1048576 bytes/s
08/12/22 11:42:33 INFO datanode.DataNode: Periodic Block Verification is disabled because verifcation is supported only with FSDataset.
08/12/22 11:42:33 INFO http.HttpServer: Version Jetty/5.1.4
08/12/22 11:42:33 INFO util.Container: Started HttpContext[/static,/static]
08/12/22 11:42:34 INFO util.Credential: Checking Resource aliases
08/12/22 11:42:34 INFO util.Container: Started org.mortbay.jetty.servlet.webapplicationhand...@1f03691
08/12/22 11:42:34 INFO http.SocketListener: Started SocketListener on 127.0.0.1:60196
08/12/22 11:42:34 INFO datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
Error creating data node: java.io.IOException: Problem starting http server

The system just hangs after the above message. Can anyone please let me know why I am getting this error? Thanks in advance, Ramya
RE: Need help in hdfs configuration fully distributed way in Mac OSX...
Hi Mafish, Thanks for your suggestions. I could finally resolve the issue: the *-site.xml on the namenode had fs.default.name set to localhost, whereas on the data nodes it was the actual IP. I changed localhost to the actual IP on the name node and it started working. Regards, Sourav

-----Original Message----- From: Mafish Liu [mailto:[EMAIL PROTECTED]] Sent: Tuesday, September 16, 2008 7:37 PM To: core-user@hadoop.apache.org Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

Hi, souravm: I don't know exactly what's wrong with your configuration from your post, and I guess the possible causes are: 1. Make sure the firewall on the namenode is off, or that port 9000 is open in your firewall configuration. 2. Namenode: check the namenode startup log to see if the namenode started correctly, or try running 'jps' on your namenode to see if there is a process called NameNode. May this help.

On Tue, Sep 16, 2008 at 10:41 PM, souravm [EMAIL PROTECTED] wrote: Hi, The namenode in machine 1 has started; I can see the following log. Is there a specific way to provide the master name in the masters file (in hadoop/conf) on the datanode? [...]
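For anyone hitting the same symptom, the fix Sourav describes amounts to making fs.default.name on the namenode use an address the datanodes can also reach, rather than localhost, and using the identical value on every node. Roughly (the IP and port below are the ones appearing elsewhere in this thread):

<property>
  <name>fs.default.name</name>
  <!-- use the namenode's real IP or hostname here, not localhost -->
  <value>hdfs://192.168.1.102:9000</value>
</property>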
Re: Need help in hdfs configuration fully distributed way in Mac OSX...
Hi: You need to configure your nodes to ensure that node 1 can connect to node 2 without a password.

On Tue, Sep 16, 2008 at 2:04 PM, souravm [EMAIL PROTECTED] wrote: Hi All, I'm facing a problem configuring HDFS in a fully distributed way on Mac OS X. Here is the topology:

1. The namenode is on machine 1.
2. There is one datanode on machine 2.

Now when I execute start-dfs.sh from machine 1, it connects to machine 2 (after it asks for the password for connecting to machine 2) and starts the datanode on machine 2 (as the console message says). However:

1. When I go to http://machine1:50070 it does not show the data node at all; it says 0 data nodes configured.
2. In the log file on machine 2 what I see is:

STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = rc0902b-dhcp169.apple.com/17.229.22.169
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.17.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r 684969; compiled by 'oom' on Wed Aug 20 22:29:32 UTC 2008
2008-09-15 18:54:44,626 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 1 time(s).
2008-09-15 18:54:45,627 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 2 time(s).
2008-09-15 18:54:46,628 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 3 time(s).
2008-09-15 18:54:47,629 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 4 time(s).
2008-09-15 18:54:48,630 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 5 time(s).
2008-09-15 18:54:49,631 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 6 time(s).
2008-09-15 18:54:50,632 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 7 time(s).
2008-09-15 18:54:51,633 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 8 time(s).
2008-09-15 18:54:52,635 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 9 time(s).
2008-09-15 18:54:53,640 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /17.229.23.77:9000. Already tried 10 time(s).
2008-09-15 18:54:54,641 INFO org.apache.hadoop.ipc.RPC: Server at /17.229.23.77:9000 not available yet, Z...
... and this retrying goes on repeating.

The hadoop-site.xml files are like this:

1. On machine 1:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/Users/souravm/hdpn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

2. On machine 2:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://machine1 ip:9000</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/Users/nirdosh/hdfsd1</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

The slaves file on machine 1 has a single entry: user name@ip of machine2.

The exact steps I did:
1. Reformat the namenode on machine 1.
2. Execute start-dfs.sh on machine 1.
3. Then I try to see whether the datanode is created, through http://machine1:50070.

Any pointer to resolve this issue would be appreciated.

Regards, Sourav

-- [EMAIL PROTECTED] Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
Re: Need help in hdfs configuration fully distributed way in Mac OSX...
Hi, I tried the way you suggested and set up ssh without a password, so now the namenode can connect to the datanode without a password; the start-dfs.sh script does not ask for any password. However, even with this fix I still face the same problem. Regards, Sourav

----- Original Message ----- From: Mafish Liu [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Mon Sep 15 23:26:10 2008 Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

Hi: You need to configure your nodes to ensure that node 1 can connect to node 2 without a password.

On Tue, Sep 16, 2008 at 2:04 PM, souravm [EMAIL PROTECTED] wrote: Hi All, I'm facing a problem configuring HDFS in a fully distributed way on Mac OS X. Here is the topology: 1. The namenode is on machine 1. 2. There is one datanode on machine 2. [...]
Re: Need help in hdfs configuration fully distributed way in Mac OSX...
Check the namenode's log on machine1 to see if your namenode started successfully :)

On Tue, Sep 16, 2008 at 2:04 PM, souravm [EMAIL PROTECTED] wrote: Hi All, I'm facing a problem configuring HDFS in a fully distributed way on Mac OS X. Here is the topology: 1. The namenode is on machine 1. 2. There is one datanode on machine 2. [...]
RE: Need help in hdfs configuration fully distributed way in Mac OSX...
Hi, The namenode on machine 1 has started; I can see the following log:

2008-09-16 07:23:46,321 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2008-09-16 07:23:46,325 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: localhost/127.0.0.1:9000
2008-09-16 07:23:46,327 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2008-09-16 07:23:46,329 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2008-09-16 07:23:46,404 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=souravm,souravm,_lpadmin,_appserveradm,_appserverusr,admin
2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
2008-09-16 07:23:46,473 INFO org.apache.hadoop.fs.FSNamesystem: Finished loading FSImage in 112 msecs
2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE* Leaving safe mode after 0 secs.
2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2008-09-16 07:23:46,480 INFO org.apache.hadoop.dfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2008-09-16 07:23:46,486 INFO org.apache.hadoop.fs.FSNamesystem: Registered FSNamesystemStatusMBean
2008-09-16 07:23:46,561 INFO org.mortbay.util.Credential: Checking Resource aliases
2008-09-16 07:23:46,627 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2008-09-16 07:23:46,907 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED]
2008-09-16 07:23:46,937 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2008-09-16 07:23:46,939 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50070
2008-09-16 07:23:46,939 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED]
2008-09-16 07:23:46,940 INFO org.apache.hadoop.fs.FSNamesystem: Web-server up at: 0.0.0.0:50070
2008-09-16 07:23:46,940 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2008-09-16 07:23:46,942 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9000: starting
2008-09-16 07:23:46,944 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000: starting

Is there a specific way to provide the master name in the masters file (in hadoop/conf) on the datanode? I've currently specified username@namenode server ip. I'm thinking there might be a problem, as in the log file of the data node I can see the message '2008-09-16 14:38:51,501 INFO org.apache.hadoop.ipc.RPC: Server at /192.168.1.102:9000 not available yet, Z...'

Any help? Regards, Sourav

From: Samuel Guo [EMAIL PROTECTED] Sent: Tuesday, September 16, 2008 5:49 AM To: core-user@hadoop.apache.org Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

Check the namenode's log on machine1 to see if your namenode started successfully :)

On Tue, Sep 16, 2008 at 2:04 PM, souravm [EMAIL PROTECTED] wrote: Hi All, I'm facing a problem configuring HDFS in a fully distributed way on Mac OS X. [...]
Re: Need help to setup Hadoop on Fedora Core 6
I tried this. Frankly, the hardest part was getting Java set up on that machine. GIJ got in the way of -everything-, causing me much frustration and furious anger. Even if you install Sun Java, it's possible that all the symbolic links don't point to Sun Java, but rather to GIJ. I'm not sure if this is the case for you, but if you do a:

/usr/sbin/alternatives --config java

and you don't see two options, something is messed up. Also, the Ubuntu tutorial on how to set up Hadoop on a single node can be applied to Fedora; you just need to find the associated Fedora packages and use yum. Hope this helps. -SM

PS - How my Fedora saga ended: since I had the option of reformatting, and Fedora and I have had lots of disagreements in the past, I switched to a Linux distro I was more comfortable with and took it from there. Best of luck.

On Thu, Jul 24, 2008 at 6:03 PM, hadoop hadoop-chetan [EMAIL PROTECTED] wrote: Hello Folks, if somebody has successfully installed Hadoop on FC 6, please help!!! I am just bootstrapping into the Hadoop madness and was attempting to install Hadoop on Fedora Core 6. I tried all sorts of things but couldn't get past this error, which keeps the reduce tasks from starting:

2008-07-24 13:04:06,642 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200807241301_0001_r_00_0: java.lang.NullPointerException
    at java.util.Hashtable.get(Hashtable.java:334)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1103)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:328)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

Before you ask, here are the details:
1. Running Hadoop as a single-node cluster
2. Disabled IPv6
3. Using Hadoop version hadoop-0.17.1
4. Enabled ssh access to the local machine
5. Masters and slaves are set to localhost
6. Created a simple sample file and loaded it into DFS
7. Encountered the error when running the sample with the wordcount example provided with the package
8. Here is my hadoop-site.xml configuration:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>1</value>
    <description>define mapred.map tasks to be number of slave hosts</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>1</value>
    <description>define mapred.reduce tasks to be number of slave hosts</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1800m</value>
    <description>Java opts for the task tracker child processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc</description>
  </property>
</configuration>
Re: Configuration: I need help.
Allen Wittenauer wrote:

On 8/6/08 11:52 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote: You can put the same hadoop-site.xml on all machines. Yes, you do want a secondary NN - a single NN is a SPOF. Browse the archives a few days back to find an email from Paul about DRBD (disk replication) to avoid this SPOF.

Keep in mind that even with a secondary namenode, you still have a SPOF. If the NameNode process dies, so does your HDFS.

There's always a SPOF; it just moves. Sometimes it moves out of your own infrastructure, and then you have big problems :)
Configuration: I need help.
Seeing as there is no search function on the archives, I'm relegated to asking a possibly redundant question or four. I have, as a sample setup:
idx1-tracker    JobTracker
idx2-namenode   NameNode
idx3-slave      DataTracker
...
idx20-slave     DataTracker
Q1: Can I put the same hadoop-site.xml file on all machines or do I need to configure each machine separately?
Q2: My current setup does not seem to find a primary namenode, but instead wants to put idx1 and idx2 as secondary namenodes; as a result, I am not getting anything usable on any of the web addresses (50030, 50050, 50070, 50090).
Q3: Possibly connected to Q1: the current setup seems to go out and start on all machines (masters/slaves); when I say bin/start-mapred.sh on the JobTracker, I get the answer jobtracker running...kill it first.
Q4: Do I even *need* a secondary namenode? IWBN if I did not have to maintain three separate configuration files (jobtracker/namenode/datatracker).
-- James Graham (Greywolf) | 650.930.1138|925.768.4053 * [EMAIL PROTECTED] | Check out what people are saying about SearchMe! -- click below http://www.searchme.com/stack/109aa
Re: Configuration: I need help.
Hi James, You can put the same hadoop-site.xml on all machines. Yes, you do want a secondary NN - a single NN is a SPOF. Browse the archives a few days back to find an email from Paul about DRBD (disk replication) to avoid this SPOF. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message From: James Graham (Greywolf) [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Wednesday, August 6, 2008 1:37:20 PM Subject: Configuration: I need help.
Seeing as there is no search function on the archives, I'm relegated to asking a possibly redundant question or four. I have, as a sample setup:
idx1-tracker    JobTracker
idx2-namenode   NameNode
idx3-slave      DataTracker
...
idx20-slave     DataTracker
Q1: Can I put the same hadoop-site.xml file on all machines or do I need to configure each machine separately?
Q2: My current setup does not seem to find a primary namenode, but instead wants to put idx1 and idx2 as secondary namenodes; as a result, I am not getting anything usable on any of the web addresses (50030, 50050, 50070, 50090).
Q3: Possibly connected to Q1: the current setup seems to go out and start on all machines (masters/slaves); when I say bin/start-mapred.sh on the JobTracker, I get the answer jobtracker running...kill it first.
Q4: Do I even *need* a secondary namenode? IWBN if I did not have to maintain three separate configuration files (jobtracker/namenode/datatracker).
-- James Graham (Greywolf) | 650.930.1138|925.768.4053 * [EMAIL PROTECTED] | Check out what people are saying about SearchMe! -- click below http://www.searchme.com/stack/109aa
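To make the "same file everywhere" point concrete, here is a minimal sketch of a hadoop-site.xml that could be copied unchanged to every machine in James's example layout. The hostnames come from his message and the 54310/54311 ports simply follow the convention used elsewhere in this archive, so treat the exact values as illustrative. Which machines actually run the NameNode, JobTracker and workers is then decided by conf/masters, conf/slaves and where the start scripts are run, not by this file.

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://idx2-namenode:54310</value>
    <!-- every node points at the same NameNode -->
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>idx1-tracker:54311</value>
    <!-- every node points at the same JobTracker -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>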
Re: Configuration: I need help.
Thus spake Otis Gospodnetic:: Hi James, You can put the same hadoop-site.xml on all machines. Yes, you do want a secondary NN - a single NN is a SPOF. Browse the archives a few days back to find an email from Paul about DRBD (disk replication) to avoid this SPOF.
Okay, thank you! Good to know (even though the documentation seems to state that secondary (NN) is a misnomer, since it never takes over for the primary NN).
Now I have something interesting going on. Given the following configuration file, what am I doing wrong? When I type start-dfs.sh on the namenode, as instructed in the docs, I end up with, effectively, Address already in use; shutting down NameNode. I do not understand this. It's like it's trying to start it twice; netstat shows no port 50070 in use after shutdown. I feel like an idiot trying to wrap my mind around this! What the heck am I doing wrong?
<configuration>
  <!-- HOST:PORT MAPPINGS -->
  <property>
    <name>dfs.secondary.http.address</name>
    <value>0.0.0.0:50090</value>
    <description>The secondary namenode http server address and port. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50010</value>
    <description>The address where the datanode server will listen to. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50075</value>
    <description>The datanode http server address and port. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>idx2-r70:50070</value>
    <description>The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>idx1-r70:50030</value>
    <description>The host and port that the MapReduce job tracker runs at. If local, then jobs are run in-process as a single map and reduce task.</description>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>idx1-r70:50030</value>
    <description>The job tracker http server address and port the server will listen on. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://idx2-r70:50070/</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>
###
-- James Graham (Greywolf) | 650.930.1138|925.768.4053 * [EMAIL PROTECTED] | Check out what people are saying about SearchMe! -- click below http://www.searchme.com/stack/109aa
Re: Configuration: I need help.
On 8/6/08 11:52 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote: You can put the same hadoop-site.xml on all machines. Yes, you do want a secondary NN - a single NN is a SPOF. Browse the archives a few days back to find an email from Paul about DRBD (disk replication) to avoid this SPOF. Keep in mind that even with a secondary name node, you still have a SPOF. If the NameNode process dies, so does your HDFS.
Re: Configuration: I need help.
Thus spake James Graham (Greywolf):: Now I have something interesting going on. Given the following configuration file, what am I doing wrong? When I type start-dfs.sh on the namenode, as instructed in the docs, I end up with, effectively, Address already in use; shutting down NameNode. I do not understand this. It's like it's trying to start it twice; netstat shows no port 50070 in use after shutdown. I feel like an idiot trying to wrap my mind around this! What the heck am I doing wrong?
Never mind. Declaring multiple services at the same port never works. -- James Graham (Greywolf) | 650.930.1138|925.768.4053 * [EMAIL PROTECTED] | Check out what people are saying about SearchMe! -- click below http://www.searchme.com/stack/109aa
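For anyone who trips over the same thing: in the configuration above, fs.default.name (hdfs://idx2-r70:50070/) and dfs.http.address (idx2-r70:50070) both claim port 50070, and mapred.job.tracker shares port 50030 with the job tracker web UI, so the second daemon to bind each port dies with Address already in use. A sketch of non-conflicting assignments follows; the RPC ports 54310/54311 just follow the convention used elsewhere in this archive, any free ports would do.

<property>
  <name>fs.default.name</name>
  <value>hdfs://idx2-r70:54310</value>
  <!-- NameNode RPC gets its own port -->
</property>
<property>
  <name>dfs.http.address</name>
  <value>idx2-r70:50070</value>
  <!-- NameNode web UI stays on 50070 -->
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>idx1-r70:54311</value>
  <!-- JobTracker RPC gets its own port -->
</property>
<property>
  <name>mapred.job.tracker.http.address</name>
  <value>idx1-r70:50030</value>
  <!-- JobTracker web UI stays on 50030 -->
</property>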
Need help to setup Hadoop on Fedora Core 6
Hello Folks, if somebody has successfully installed Hadoop on FC 6, please help!!! Just bootstrapping into the Hadoop madness, I was attempting to install Hadoop on Fedora Core 6. Tried all sorts of things but couldn't get past this error, which keeps the reduce tasks from starting:
2008-07-24 13:04:06,642 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_200807241301_0001_r_00_0: java.lang.NullPointerException
  at java.util.Hashtable.get(Hashtable.java:334)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1103)
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:328)
  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
Before you ask, here are the details:
1. Running hadoop as a single node cluster
2. Disabled IPV6
3. Using Hadoop version hadoop-0.17.1
4. Enabled ssh access to the local machine
5. Master and slaves are set to localhost
6. Created a simple sample file and loaded it into DFS
7. Encountered the error when running the wordcount example provided with the package
8. Here is my hadoop-site.xml configuration:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
    <description>The host and port that the MapReduce job tracker runs at. If local, then jobs are run in-process as a single map and reduce task.</description>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>1</value>
    <description>define mapred.map tasks to be number of slave hosts</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>1</value>
    <description>define mapred.reduce tasks to be number of slave hosts</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1800m</value>
    <description>Java opts for the task tracker child processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose gc logging to a file named for the taskid in /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@[EMAIL PROTECTED]</description>
  </property>
</configuration>
Re: Need Help
hemal patel wrote:
Hello, can you help me solve this problem? When I try to run this program it gives me an error like this:
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
08/05/12 17:32:59 INFO mapred.FileInputFormat: Total input paths to process : 12
java.io.IOException: Not a file: hdfs://localhost:9000/user/hemal/input/conf
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:170)
  at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:515)
  at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
  at org.apache.hadoop.examples.Grep.run(Grep.java:69)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  at org.apache.hadoop.examples.Grep.main(Grep.java:93)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
  at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
  at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:49)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
And also one more error:
[EMAIL PROTECTED]:~/hadoop-0.15.3 bin/hadoop jar usr/hemal/wordconut.jar
Two things to check:
1) The jar file should be on the local disk and not on the DFS. It looks like 'usr/hemal/wordconut.jar' is the dfs path. So the command would look like: bin/hadoop jar /local/path/to/jar/job.jar args. So if you have the jar file in your home folder then you can use ~/job.jar or /home/user-name/job.jar.
2) Also make sure it really is wordconut.jar and not wordcount.jar.
Amar
P.S. Changing the mailing list to core-user.
WordCount /usr/hemal/wordconut/input /usr/hemal/wordcount/output
Exception in thread main java.io.IOException: Error opening job jar: usr/hemal/wordconut.jar
  at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
  at java.util.zip.ZipFile.open(Native Method)
  at java.util.zip.ZipFile.init(ZipFile.java:114)
  at java.util.jar.JarFile.init(JarFile.java:133)
  at java.util.jar.JarFile.init(JarFile.java:70)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
Please help me out. Thanks, Hemal
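A concrete version of Amar's first suggestion, assuming the jar actually sits in Hemal's local home directory and the driver class is WordCount (the local path here is only an example):

# Point bin/hadoop jar at a jar on the local filesystem, not at a DFS path.
bin/hadoop jar /home/hemal/wordconut.jar WordCount /usr/hemal/wordconut/input /usr/hemal/wordcount/output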
I need help to set HADOOP
Hello! I am trying to set up Hadoop on two PCs. In conf/master I have: master master.visid.com, and in conf/slave: master.visid.com slave3.visid.com. When I execute bin/start-dfs.sh and bin/start-mapred.sh, the following error appears in logs/hadoop-hadoop-datanode-slave3.visid.com.log:
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = slave3.visid.com/127.0.0.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.16.2
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 642481; compiled by 'hadoopqa' on Sat Mar 29 01:59:04 UTC 2008
2008-05-08 21:15:40,133 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master.visid.com/192.168.46.242:54310. Already tried 1 time(s).
2008-05-08 21:15:41,133 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master.visid.com/192.168.46.242:54310. Already tried 2 time(s).
2008-05-08 21:15:42,134 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master.visid.com/192.168.46.242:54310. Already tried 3 time(s).
2008-05-08 21:15:43,135 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master.visid.com/192.168.46.242:54310. Already tried 4 time(s).
2008-05-08 21:15:44,135 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master.visid.com/192.168.46.242:54310. Already tried 5 time(s).
2008-05-08 21:15:45,136 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master.visid.com/192.168.46.242:54310. Already tried 6 time(s).
2008-05-08 21:15:46,136 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master.visid.com/192.168.46.242:54310. Already tried 7 time(s).
2008-05-08 21:15:47,137 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master.visid.com/192.168.46.242:54310. Already tried 8 time(s).
2008-05-08 21:15:48,138 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master.visid.com/192.168.46.242:54310. Already tried 9 time(s).
2008-05-08 21:15:49,138 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master.visid.com/192.168.46.242:54310. Already tried 10 time(s).
2008-05-08 21:15:50,141 ERROR org.apache.hadoop.dfs.DataNode: java.net.NoRouteToHostException: No route to host
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
  at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
  at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
  at java.net.Socket.connect(Socket.java:519)
  at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:161)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:578)
  at org.apache.hadoop.ipc.Client.call(Client.java:501)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
  at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:291)
  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:278)
  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:315)
  at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:260)
  at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:207)
  at org.apache.hadoop.dfs.DataNode.init(DataNode.java:162)
  at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2512)
  at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2456)
  at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2477)
  at org.apache.hadoop.dfs.DataNode.main(DataNode.java:2673)
On master.visid.com I execute jps and all services are running:
3984 DataNode
4148 SecondaryNameNode
4373 TaskTracker
4461 Jps
3873 NameNode
4246 JobTracker
but on slave3.visid.com no service is running when I execute jps.
Here is my config file, hadoop-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/nutch/filesystem/hadoop-datastore/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>master.visid.com:54310</value>
    <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property
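Two things commonly produce exactly this pattern and are worth ruling out first; the commands below are only a sketch (the slave3 IP address is invented for illustration, and the firewall commands assume a Fedora/RHEL-style init setup). First, the startup banner reports host = slave3.visid.com/127.0.0.1, i.e. the slave resolves its own name to the loopback address, so check /etc/hosts on both machines. Second, "No route to host" while every daemon on the master is running usually means a firewall on the master is blocking the NameNode port 54310.

# /etc/hosts on both machines should map real IPs, not 127.0.0.1, to the hostnames, e.g.:
#   192.168.46.242   master.visid.com   master
#   192.168.46.243   slave3.visid.com   slave3

# From slave3, check whether the NameNode port is reachable at all:
telnet master.visid.com 54310

# If it is not, look at (or temporarily stop) the firewall on the master:
/etc/init.d/iptables status
/etc/init.d/iptables stop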
need help
Hi, I started using Hadoop very recently and I am stuck with the basic example. When I try to run:
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
I get this output:
08/04/09 21:23:12 INFO mapred.FileInputFormat: Total input paths to process : 2
java.io.IOException: Not a file: hdfs://localhost:9000/user/Administrator/input/conf
  at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:170)
  at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:515)
  at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
  at org.apache.hadoop.examples.WordCount.run(WordCount.java:147)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  at org.apache.hadoop.examples.WordCount.main(WordCount.java:153)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
  at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
  at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:49)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
Where are these libraries residing? How should I configure this? Thanks. Regards, Krishna. Unlimited freedom, unlimited storage. Get it now, on http://help.yahoo.com/l/in/yahoo/mail/yahoomail/tools/tools-08.html/
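"Not a file: .../input/conf" usually just means a subdirectory (here conf) was copied into the input directory, and FileInputFormat in these Hadoop versions refuses to use a directory as an input path. A possible way to confirm and clean this up with the standard dfs shell of that era (paths follow the error message above):

# See what actually ended up under the input directory
bin/hadoop dfs -ls /user/Administrator/input

# Remove the stray subdirectory, or re-upload only plain files
bin/hadoop dfs -rmr /user/Administrator/input/conf
bin/hadoop dfs -put conf/*.xml /user/Administrator/input

# Then rerun the example
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'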