HOD is locked on Ringmaster at : xxx
Good afternoon everybody, I've got a problem with HOD. When I try to allocate a new cluster with the command below, HOD creates a job with Torque and then hangs after logging:
INFO/20 hadoop:541 - Cluster Id 61777.co-admin
DEBUG/10 hadoop:545 - Ringmaster at: xxx
The command is: hod allocate -d myDirectory -n 3
And if I run hod info -d myDirectory, HOD returns no cluster. Where is the problem? Thank you for your help.
Benchmarks with different workloads
Hi, I am looking for Hadoop benchmarks that could characterize the following workloads: 1) IO-intensive workload 2) CPU-intensive workload 3) Mixed (IO + CPU) workload. Could someone please share some pointers on these? Thanks, Matthew
Re: Benchmarks with different workloads
You could try SWIM [1]. -Cristina [1] Yanpei Chen, Archana Ganapathi, Rean Griffith, Randy Katz. SWIM - Statistical Workload Injector for MapReduce. Available at: http://www.eecs.berkeley.edu/~ychen2/SWIM.html -- Forwarded message -- From: Matthew John tmatthewjohn1...@gmail.com To: common-user common-user@hadoop.apache.org Date: Tue, 31 May 2011 20:01:25 +0530 Subject: Benchmarks with different workloads Hi, I am looking for Hadoop benchmarks that could characterize the following workloads: 1) IO-intensive workload 2) CPU-intensive workload 3) Mixed (IO + CPU) workload. Could someone please share some pointers on these? Thanks, Matthew
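Besides SWIM, the benchmarks that ship with Hadoop itself are often used to approximate these three workload classes. A rough sketch of how they are typically invoked is below; the jar names, file counts and sizes are assumptions and vary by Hadoop version and distribution.

# Sketch only -- jar names and paths vary by Hadoop version/distribution.
# IO-intensive: TestDFSIO writes and reads files through HDFS.
hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -read  -nrFiles 10 -fileSize 1000

# CPU-intensive: the bundled pi estimator does almost no IO.
hadoop jar $HADOOP_HOME/hadoop-*examples*.jar pi 16 1000000

# Mixed IO + CPU: sort a dataset generated by randomwriter.
hadoop jar $HADOOP_HOME/hadoop-*examples*.jar randomwriter /bench/random-in
hadoop jar $HADOOP_HOME/hadoop-*examples*.jar sort /bench/random-in /bench/random-out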
Hadoop project - help needed
Hello dear forum, I am working on a project on Apache Hadoop. I am totally new to this software and I need some help understanding the basic features! To sum up, for my project I have configured Hadoop so that it runs 3 datanodes on one machine. The project's main goal is to use both the Flickr API (flickr.com) libraries and the Hadoop libraries in Java, so that each one of the 3 datanodes chooses a Flickr group and returns photo info from that group. In order to do that, I have 3 Flickr accounts, each one with a different API key. I don't need any help on the Flickr side of the code, of course. But what I don't understand is how to use the Mapper and Reducer parts of the code. What input do I have to give the map() function? Do I have to contain this whole info-downloading process in the map() function? In a few words, how do I convert my code so that it runs distributed on Hadoop? Thank you!
Re: Starting a Hadoop job outside the cluster
I have tried what you suggest (well, sort of); a good example would help a lot. My reducer is set to, among other things, emit the local OS and user.dir. When I try running from my Windows box these appear on HDFS but show the Windows OS and user.dir, leading me to believe that the reducer is still running on my Windows machine. I will check the values, but a working example would be very useful.

On Sun, May 29, 2011 at 6:19 AM, Ferdy Galema ferdy.gal...@kalooga.com wrote: Would it not also be possible for a Windows machine to submit the job directly from a Java process? This way you don't need Cygwin / a full local copy of the installation (correct me if I'm wrong). The steps would then just be: 1) Create a basic Java project, add the minimum required libraries (Hadoop/logging) 2) Set the essential properties (at least the jobtracker and the filesystem) 3) Implement the Tool 4) Run the process (from either the IDE or a stand-alone jar). Steps 1-3 could technically be implemented on another machine, if you choose to compile a stand-alone jar. Ferdy.

On 05/29/2011 04:50 AM, Harsh J wrote: Keep a local Hadoop installation with a mirror-copy config, and use hadoop jar <jar> to submit as usual (since the config points to the right areas, the jobs go there). For Windows you'd need Cygwin installed, however.

On Sun, May 29, 2011 at 12:56 AM, Steve Lewis lordjoe2...@gmail.com wrote: When I want to launch a Hadoop job I use SCP to execute a command on the NameNode machine. I am wondering if there is a way to launch a Hadoop job from a machine that is not on the cluster. How to do this on a Windows box or a Mac would be of special interest. -- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com

-- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com
Re: Hadoop project - help needed
Parismav, So you are more or less trying to scrape some data in a distributed way. Well, there are several things that you could do; just be careful, as I am not sure of the terms of service for the Flickr APIs, so make sure that you are not violating them by downloading too much data. You probably want to use the map input data as command/control for what the mappers do. I would probably put it in a format like ACCOUNT INFO\tGROUP INFO\n. Then you could use the N-line input format so that each mapper will process one line out of the file. Something like (this is just pseudocode):

Mapper<Long, String, ?, ?> {
  map(Long offset, String line, ...) {
    String[] parts = line.split("\t");
    openConnection(parts[0]);
    GroupData gd = getDataAboutGroup(parts[1]);
    ...
  }
}

I would probably not bother with a reducer if all you are doing is pulling down data. Also, the output format you choose really depends on the type of data you are downloading and how you want to use that data later. For example, if you want to download the actual picture then you probably want to use a sequence file format or some other binary format, because converting a picture to text can be very costly. --Bobby Evans

On 5/31/11 10:35 AM, parismav paok_gate...@hotmail.com wrote: Hello dear forum, I am working on a project on Apache Hadoop. I am totally new to this software and I need some help understanding the basic features! [...]
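For context, a driver that wires up the N-line input format mentioned above might look roughly like the sketch below (old 0.20 "mapred" API). The class names, paths and placeholder map logic are made up for illustration; the Flickr calls themselves are elided.

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class FlickrFetchDriver {

  // Hypothetical mapper: receives one "ACCOUNT INFO\tGROUP INFO" line per call.
  public static class FlickrGroupMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable offset, Text line, OutputCollector<Text, Text> out,
                    Reporter reporter) throws IOException {
      String[] parts = line.toString().split("\t");  // [0]=account info, [1]=group info
      // ... open the Flickr connection with parts[0] and fetch data for parts[1] ...
      out.collect(new Text(parts[1]), new Text("fetched"));  // placeholder output
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(FlickrFetchDriver.class);
    conf.setJobName("flickr-group-fetch");

    // Each mapper receives one line of the control file, so three lines
    // (one per account/group pair) produce three map tasks.
    conf.setInputFormat(NLineInputFormat.class);
    conf.setInt("mapred.line.input.format.linespermap", 1);

    conf.setMapperClass(FlickrGroupMapper.class);
    conf.setNumReduceTasks(0);  // map-only job, no reducer needed for plain downloading

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));   // control file
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // output directory

    JobClient.runJob(conf);
  }
}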
Re: Starting a Hadoop job outside the cluster
Steve, What do you mean when you say it shows the Windows OS and user.dir? There will be a few properties in the job.xml that may carry client machine information, but these shouldn't be a hindrance. Unless a TaskTracker was started on the Windows box (no daemons ought to be started on the client machine), no task may run on it.

On Tue, May 31, 2011 at 9:15 PM, Steve Lewis lordjoe2...@gmail.com wrote: I have tried what you suggest (well, sort of); a good example would help a lot - My reducer is set to, among other things, emit the local OS and user.dir - when I try running from my Windows box these appear on HDFS but show the Windows OS and user.dir, leading me to believe that the reducer is still running on my Windows machine - I will check the values but a working example would be very useful [...]

-- Harsh J
Is something wrong when I'm shuffling 2 bytes into RAM?
For some of my jobs I'll see long stretches of log files that look like this:
INFO mapred.ReduceTask: Shuffling 2 bytes (14 raw bytes) into RAM from attempt_201105041713_5850_m_002764_0
INFO mapred.ReduceTask: Read 2 bytes from map-output for attempt_201105041713_5850_m_002764_0
INFO mapred.ReduceTask: attempt_201105041713_5850_r_17_2 Scheduled 1 outputs (0 slow hosts and 74 dup hosts)
INFO mapred.ReduceTask: Rec #1 from attempt_201105041713_5850_m_002764_0 -> (-1, -1) from hnode52.tuk2.intelius.com
INFO mapred.ReduceTask: header: attempt_201105041713_5850_m_002729_0, compressed len: 14, decompressed len: 2
INFO mapred.ReduceTask: Shuffling 2 bytes (14 raw bytes) into RAM from attempt_201105041713_5850_m_002729_0
INFO mapred.ReduceTask: Read 2 bytes from map-output for attempt_201105041713_5850_m_002729_0
INFO mapred.ReduceTask: Rec #1 from attempt_201105041713_5850_m_002729_0 -> (-1, -1) from hnode42.tuk2.intelius.com
INFO mapred.ReduceTask: attempt_201105041713_5850_r_17_2 Scheduled 1 outputs (0 slow hosts and 70 dup hosts)
INFO mapred.ReduceTask: attempt_201105041713_5850_r_17_2 Scheduled 1 outputs (0 slow hosts and 64 dup hosts)
INFO mapred.ReduceTask: header: attempt_201105041713_5850_m_003036_0, compressed len: 14, decompressed len: 2
This looks really wrong to me. Am I correct in thinking that when I'm shuffling 2 bytes into memory at a time I've got a real performance problem? Does anyone have ideas as to what might be going on here? The jobs that hit this sometimes work and sometimes fail, for reasons that may or may not be related to the logs excerpted above.
Re: Starting a Hadoop job outside the cluster
My Reducer code says this:

public static class Reduce extends Reducer<Text, Text, Text, Text> {
    private boolean m_DateSent;

    /**
     * This method is called once for each key. Most applications will define
     * their reduce class by overriding this method. The default implementation
     * is an identity function.
     */
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        if (!m_DateSent) {
            Text dkey = new Text("CreationDate");
            Text dValue = new Text();
            writeKeyValue(context, dkey, dValue, "CreationDate", new Date().toString());
            writeKeyValue(context, dkey, dValue, "user.dir", System.getProperty("user.dir"));
            writeKeyValue(context, dkey, dValue, "os.arch", System.getProperty("os.arch"));
            writeKeyValue(context, dkey, dValue, "os.name", System.getProperty("os.name"));
            // dkey.set("ip");
            // java.net.InetAddress addr = java.net.InetAddress.getLocalHost();
            // dValue.set(System.getProperty(addr.toString()));
            // context.write(dkey, dValue);
            m_DateSent = true;
        }
        Iterator<Text> itr = values.iterator();
        // Add interesting code here
        while (itr.hasNext()) {
            Text vCheck = itr.next();
            context.write(key, vCheck);
        }
    }
}

If os.arch is linux I am running on the cluster; if windows I am running locally. I run this main hoping to run on the cluster with the NameNode and JobTracker at glados:

public static void main(String[] args) throws Exception {
    String outFile = "./out";
    Configuration conf = new Configuration();
    // cause output to go to the cluster
    conf.set("fs.default.name", "hdfs://glados:9000/");
    conf.set("mapreduce.jobtracker.address", "glados:9000/");
    conf.set("mapred.jar", "NShot.jar");
    conf.set("fs.defaultFS", "hdfs://glados:9000/");
    Job job = new Job(conf, "Generated data");
    conf = job.getConfiguration();
    job.setJarByClass(NShotInputFormat.class);
    // ... other setup code ...
    boolean ans = job.waitForCompletion(true);
    int ret = ans ? 0 : 1;
}

On Tue, May 31, 2011 at 9:35 AM, Harsh J ha...@cloudera.com wrote: Steve, What do you mean when you say it shows the Windows OS and user.dir? There will be a few properties in the job.xml that may carry client machine information, but these shouldn't be a hindrance. Unless a TaskTracker was started on the Windows box (no daemons ought to be started on the client machine), no task may run on it. [...]

-- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com
Re: Starting a Hadoop job outside the cluster
Simply remove that trailing slash (forgot to catch it earlier, sorry) and you should be set (or at least more set than before, surely.)

On Tue, May 31, 2011 at 10:51 PM, Steve Lewis lordjoe2...@gmail.com wrote: 0.20.2 - we have been avoiding 0.21 because it is not terribly stable and made some MAJOR changes to critical classes. When I say

Configuration conf = new Configuration();
// cause output to go to the cluster
conf.set("fs.default.name", "hdfs://glados:9000/");
// conf.set("mapreduce.jobtracker.address", "glados:9000/");
conf.set("mapred.job.tracker", "glados:9000/");
conf.set("mapred.jar", "NShot.jar");
// conf.set("fs.defaultFS", "hdfs://glados:9000/");
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
// if (otherArgs.length != 2) {
//     System.err.println("Usage: wordcount <in> <out>");
//     System.exit(2);
// }
Job job = new Job(conf, "Generated data");

I get

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: glados:9000
    at org.apache.hadoop.fs.Path.initialize(Path.java:140)
    at org.apache.hadoop.fs.Path.<init>(Path.java:126)
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:150)
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:123)
    at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:1807)
    at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
    at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
    at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
    at org.apache.hadoop.mapreduce.Job.<init>(Job.java:54)
    at org.systemsbiology.hadoopgenerated.NShotTest.main(NShotTest.java:188)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: glados:9000
    at java.net.URI.checkPath(URI.java:1787)
    at java.net.URI.<init>(URI.java:735)
    at org.apache.hadoop.fs.Path.initialize(Path.java:137)

I promise to publish a working example if this ever works.

On Tue, May 31, 2011 at 10:02 AM, Harsh J ha...@cloudera.com wrote: Steve, [...] conf.set("mapreduce.jobtracker.address", "glados:9000/"); This here might be your problem. That form of property would only work with 0.21.x, while on 0.20.x if you do not set it as mapred.job.tracker then the local job runner takes over by default, thereby making this odd thing happen (that's my guess). What version of Hadoop are you using? -- Harsh J

-- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com

-- Harsh J
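For reference, the client-side settings under discussion, with the trailing slash removed, would look roughly like the sketch below. The JobTracker port shown (9001) is an assumption; it should be whatever mapred.job.tracker is actually set to in the cluster's mapred-site.xml. The jar and class names are carried over from the thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RemoteSubmitSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // NameNode URI: scheme + host + port, no trailing slash needed.
    conf.set("fs.default.name", "hdfs://glados:9000");

    // JobTracker address: host:port only -- no scheme, no trailing slash.
    // Port 9001 is an assumption here; use the cluster's real mapred.job.tracker value.
    conf.set("mapred.job.tracker", "glados:9001");

    // Ship the job jar so the remote TaskTrackers can load the user classes.
    conf.set("mapred.jar", "NShot.jar");

    Job job = new Job(conf, "Generated data");
    // ... setMapperClass/setReducerClass, input/output paths, etc. ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}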
Re: Hadoop project - help needed
Hi, To be very precise, the input to the mapper should be whatever you want to filter, on the basis of which you want to do the aggregation. The Reducer is where you aggregate the output from the mapper. Check the WordCount example in Hadoop; it can help you to understand the basic concepts. Cheers, Jagaran

From: parismav paok_gate...@hotmail.com To: core-u...@hadoop.apache.org Sent: Tue, 31 May, 2011 8:35:27 AM Subject: Hadoop project - help needed Hello dear forum, I am working on a project on Apache Hadoop. I am totally new to this software and I need some help understanding the basic features! [...]
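The WordCount example referred to above is roughly the following (0.20 "new" API); input and output paths are passed as arguments. This is a sketch of the well-known bundled example, not the project's own code.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: turn each input line into (word, 1) pairs -- the "filter/transform" step.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: aggregate all counts seen for the same word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}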
trying to select technology
Hello All, I am planning to start a project where I have to do extensive storage of XML and text files. On top of that I have to implement an efficient algorithm for searching over thousands or millions of files, and also build some indexes to make search faster next time. I looked into the Oracle database but it delivers very poor results. Can I use Hadoop for this? Which Hadoop project would be the best fit? Is there anything from Google I can use? Thanks a lot in advance.
Re: trying to select technology
Think of Lucene and Apache Solr. Cheers, Jagaran

From: cs230 chintanjs...@gmail.com To: core-u...@hadoop.apache.org Sent: Tue, 31 May, 2011 10:50:49 AM Subject: trying to select technology Hello All, I am planning to start a project where I have to do extensive storage of XML and text files. [...]
Re: trying to select technology
Sounds like you're looking for a full-text inverted index. Lucene is a good open-source implementation of that. I believe it has an option for storing the original full text as well as the indexes. --Matt

On May 31, 2011, at 10:50 AM, cs230 wrote: Hello All, I am planning to start a project where I have to do extensive storage of XML and text files. On top of that I have to implement an efficient algorithm for searching over thousands or millions of files, and also build some indexes to make search faster next time. [...]
Re: trying to select technology
To pile on: thousands or millions of documents are well within the range that is well addressed by Lucene. Solr may be an even better option than bare Lucene, since it handles lots of the boilerplate problems like document parsing and index update scheduling.

On Tue, May 31, 2011 at 11:56 AM, Matthew Foley ma...@yahoo-inc.com wrote: Sounds like you're looking for a full-text inverted index. Lucene is a good open-source implementation of that. I believe it has an option for storing the original full text as well as the indexes. --Matt [...]
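As a rough illustration of the Lucene option (written against the Lucene 3.x API that was current at the time; the index path, field names and document content are made up), indexing a document while also storing its original text looks something like this:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SimpleIndexer {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File("/tmp/index"));  // index location (assumed path)
    IndexWriterConfig cfg =
        new IndexWriterConfig(Version.LUCENE_33, new StandardAnalyzer(Version.LUCENE_33));
    IndexWriter writer = new IndexWriter(dir, cfg);

    Document doc = new Document();
    // Store the original text alongside the inverted index so it can be returned with hits.
    doc.add(new Field("path", "docs/sample.xml", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("content", "<note>example xml body</note>", Field.Store.YES, Field.Index.ANALYZED));
    writer.addDocument(doc);

    writer.close();
  }
}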
Re: No. of Map and reduce tasks
What if I had multiple files in the input directory? Hadoop should then fire parallel map tasks?

On Thu, May 26, 2011 at 7:21 PM, jagaran das jagaran_...@yahoo.co.in wrote: If you feed it really small files, then the benefit of Hadoop's big block size goes away. Instead, try merging files. Hope that helps.

From: James Seigel ja...@tynt.com To: common-user@hadoop.apache.org Sent: Thu, 26 May, 2011 6:04:07 PM Subject: Re: No. of Map and reduce tasks Set the input split size really low; you might get something. I'd rather you fire up some *nix commands and pack that file onto itself a bunch of times, then put it back into HDFS and let 'er rip. Sent from my mobile. Please excuse the typos.

On 2011-05-26, at 4:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I think I understand that from the last 2 replies :) But my question is: can I change this configuration to, say, split the file into 250K chunks so that multiple mappers can be invoked?

On Thu, May 26, 2011 at 3:41 PM, James Seigel ja...@tynt.com wrote: have more data for it to process :)

On 2011-05-26, at 4:30 PM, Mohit Anchlia wrote: I ran a simple pig script on this file: -rw-r--r-- 1 root root 208348 May 26 13:43 excite-small.log that orders the contents by name. But it only created one mapper. How can I change this to distribute across multiple machines?

On Thu, May 26, 2011 at 3:08 PM, jagaran das jagaran_...@yahoo.co.in wrote: Hi Mohit, No. of maps - it depends on the total file size / block size. No. of reducers - you can specify. Regards, Jagaran

From: Mohit Anchlia mohitanch...@gmail.com To: common-user@hadoop.apache.org Sent: Thu, 26 May, 2011 2:48:20 PM Subject: No. of Map and reduce tasks How can I tell how the map and reduce tasks were spread across the cluster? I looked at the jobtracker web page but can't find that info. Also, can I specify how many map or reduce tasks I want to be launched? From what I understand it's based on the number of input files passed to Hadoop. So if I have 4 files there will be 4 map tasks launched, and the reducer is dependent on the hash partitioner.
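To illustrate the point about splits and task counts, here is a minimal sketch using the old 0.20 "mapred" API. setNumMapTasks is only a hint that influences the split size the framework computes, while the reducer count is taken literally; the class name and paths are placeholders, and the default identity mapper/reducer simply copy records so the job is only useful for watching how many map tasks get launched.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SplitHintDemo {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SplitHintDemo.class);
    conf.setJobName("split-hint-demo");

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Hint, not a guarantee: FileInputFormat divides the total input size by
    // this number when computing a split size, so asking for 4 maps over a
    // ~200 KB file gives ~50 KB splits (provided the format is splittable
    // and mapred.min.split.size does not get in the way).
    conf.setNumMapTasks(4);

    // Reducers, by contrast, are set explicitly and honored as given.
    conf.setNumReduceTasks(2);

    // Identity mapper/reducer by default -- the job just copies records.
    JobClient.runJob(conf);
  }
}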
Re: Why don't my jobs get preempted?
On Tue, May 31, 2011 at 2:50 PM, W.P. McNeill bill...@gmail.com wrote: I'm launching long-running tasks on a cluster running the Fair Scheduler. As I understand it, the Fair Scheduler is preemptive. What I expect to see is that my long-running jobs sometimes get killed to make room for other people's jobs. This never happens; instead my long-running jobs hog mapper and reducer slots and starve other people out. Am I misunderstanding how the Fair Scheduler works?

Try adding

<minSharePreemptionTimeout>120</minSharePreemptionTimeout>
<fairSharePreemptionTimeout>180</fairSharePreemptionTimeout>

to one of your pools and see if that pool preempts other pools.
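A sketch of what such an allocations file might look like is below. The pool name is made up, the timeouts are in seconds, and on versions that support preemption it typically also has to be switched on in mapred-site.xml (e.g. mapred.fairscheduler.preemption set to true).

<?xml version="1.0"?>
<!-- Sketch of a fair-scheduler allocations file; pool name is illustrative only. -->
<allocations>
  <pool name="adhoc">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <!-- If this pool stays below its minimum share for this long (seconds),
         tasks from other pools may be killed to make room. -->
    <minSharePreemptionTimeout>120</minSharePreemptionTimeout>
  </pool>
  <!-- Pools below half their fair share for this long may also preempt. -->
  <fairSharePreemptionTimeout>180</fairSharePreemptionTimeout>
</allocations>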
Starting JobTracker Locally but binding to remote Address
Hi Guys, I recently configured my cluster to have 2 VMs. I configured 1 machine (slave3) to be the namenode and another to be the jobtracker (slave2). They both work as datanode/tasktracker as well. Both configs have the following contents in their masters and slaves files:

slave2
slave3

Both machines have the following contents in their mapred-site.xml file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>slave2:9001</value>
  </property>
</configuration>

Both machines have the following contents in their core-site.xml file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://slave3:9000</value>
  </property>
</configuration>

When I log into the namenode and I run the start-all.sh script, everything but the jobtracker starts. In the log files I get the following exception:

/************************************************************
STARTUP_MSG: Starting JobTracker
STARTUP_MSG:   host = slave3/10.20.11.112
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2011-05-31 13:54:06,940 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2011-05-31 13:54:07,086 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to slave2/10.20.11.166:9001 : Cannot assign requested address
    at org.apache.hadoop.ipc.Server.bind(Server.java:190)
    at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:253)
    at org.apache.hadoop.ipc.Server.<init>(Server.java:1026)
    at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:488)
    at org.apache.hadoop.ipc.RPC.getServer(RPC.java:450)
    at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1595)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:183)
    at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:175)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3702)
Caused by: java.net.BindException: Cannot assign requested address
    at sun.nio.ch.Net.bind(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
    at org.apache.hadoop.ipc.Server.bind(Server.java:188)
    ... 8 more
2011-05-31 13:54:07,096 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down JobTracker at slave3/10.20.11.112
************************************************************/

As I see it, from the lines

STARTUP_MSG: Starting JobTracker
STARTUP_MSG:   host = slave3/10.20.11.112

the namenode (slave3) is trying to run the jobtracker locally, but when it starts the jobtracker server it binds it to the slave2 address and of course fails: Problem binding to slave2/10.20.11.166:9001. What do you guys think could be going wrong? Thanks! Pony
Re: Starting JobTracker Locally but binding to remote Address
This seems to be your problem, really...

  <name>mapred.job.tracker</name>
  <value>slave2:9001</value>

On Tue, May 31, 2011 at 06:07PM, Juan P. wrote: Hi Guys, I recently configured my cluster to have 2 VMs. I configured 1 machine (slave3) to be the namenode and another to be the jobtracker (slave2). They both work as datanode/tasktracker as well. [...]
Re: Starting JobTracker Locally but binding to remote Address
The problem is that start-all.sh isn't all that intelligent. The way that start-all.sh works is by running start-dfs.sh and start-mapred.sh. The start-mapred.sh script always starts a job tracker on the local host and a task tracker on all of the hosts listed in slaves (it uses SSH to do the remote execution). The start-dfs.sh script always starts a name node on the local host, a data node on all of the hosts listed in slaves, and a secondary name node on all of the hosts listed in masters. In your case, you'll want to run start-dfs.sh on slave3 and start-mapred.sh on slave2. -Joey

On Tue, May 31, 2011 at 5:07 PM, Juan P. gordoslo...@gmail.com wrote: Hi Guys, I recently configured my cluster to have 2 VMs. I configured 1 machine (slave3) to be the namenode and another to be the jobtracker (slave2). They both work as datanode/tasktracker as well. [...]

-- Joseph Echeverria Cloudera, Inc. 443.305.9434
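Concretely, the suggestion above amounts to something like the following, assuming HADOOP_HOME points at the install on each box:

# On slave3 (the NameNode): bring up HDFS -- NameNode locally,
# DataNodes on every host in conf/slaves, SecondaryNameNode on conf/masters.
slave3$ $HADOOP_HOME/bin/start-dfs.sh

# On slave2 (the JobTracker): bring up MapReduce -- JobTracker locally,
# TaskTrackers on every host in conf/slaves.
slave2$ $HADOOP_HOME/bin/start-mapred.sh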
Re: Starting JobTracker Locally but binding to remote Address
Eeeeh why? Isn't that the config for the jobtracker? Slave2 has been defined in my /etc/hosts file. Should those lines not be in both nodes? Thanks for helping! Pony

On 31/05/2011, at 18:12, Konstantin Boudnik c...@apache.org wrote: This seems to be your problem, really... <name>mapred.job.tracker</name> <value>slave2:9001</value> [...]
Re: Starting JobTracker Locally but binding to remote Address
:D I'll give that a try first thing in the morning! Thanks a lot Joey!! Sent from my iPhone

On 31/05/2011, at 18:18, Joey Echeverria j...@cloudera.com wrote: The problem is that start-all.sh isn't all that intelligent. The way that start-all.sh works is by running start-dfs.sh and start-mapred.sh. The start-mapred.sh script always starts a job tracker on the local host and a task tracker on all of the hosts listed in slaves (it uses SSH to do the remote execution). The start-dfs.sh script always starts a name node on the local host, a data node on all of the hosts listed in slaves, and a secondary name node on all of the hosts listed in masters. In your case, you'll want to run start-dfs.sh on slave3 and start-mapred.sh on slave2. -Joey [...]

-- Joseph Echeverria Cloudera, Inc. 443.305.9434
Re: Starting JobTracker Locally but binding to remote Address
On Tue, May 31, 2011 at 06:21PM, gordoslocos wrote: Eeeeh why? Isn't that the config for the jobtracker? Slave2 has been defined in my /etc/hosts file. Should those lines not be in both nodes?

Indeed, but you are running the MR start script on slave3, meaning that the JT will be started on slave3 whatever the configuration says: start-mapred.sh isn't that smart and doesn't check your configs. Cos

Thanks for helping! Pony On 31/05/2011, at 18:12, Konstantin Boudnik c...@apache.org wrote: This seems to be your problem, really... <name>mapred.job.tracker</name> <value>slave2:9001</value> [...]
copyToLocal (from Amazon AWS)
Hi, I am not sure if this question has been asked; it's more of a hadoop fs question. I am trying to execute the following hadoop fs command:

hadoop fs -copyToLocal s3n://<Access Key>:<Secret Key>@<bucket name>/file.txt /home/hadoop/workspace/file.txt

When I execute this command directly from the terminal shell, it works perfectly fine; however, the above command doesn't execute from code. In fact, it says: Exception in thread "main" copyToLocal: null. Please note I am using Runtime.getRuntime().exec(cmdStr), where cmdStr is the above hadoop command. Also, please note that the hadoop fs -cp and hadoop fs -rmr commands work fine with the source and destination both being Amazon AWS locations. In the above command (hadoop fs -copyToLocal) the destination is a location local to my machine (Ubuntu installed). Your help would be greatly appreciated. Thanks, Neeral
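One way to see what is actually failing is to capture the shelled-out command's output and exit code rather than discarding them. A minimal sketch is below; the credentials, bucket and paths are placeholders, and /usr/bin/hadoop is an assumed location for the hadoop script. Passing the arguments individually also avoids any surprises from tokenizing a single command string.

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class HadoopCopyToLocal {
  public static void main(String[] args) throws Exception {
    // Placeholders -- substitute real credentials, bucket and paths.
    String src = "s3n://ACCESS_KEY:SECRET_KEY@my-bucket/file.txt";
    String dst = "/home/hadoop/workspace/file.txt";

    ProcessBuilder pb = new ProcessBuilder(
        "/usr/bin/hadoop", "fs", "-copyToLocal", src, dst);
    pb.redirectErrorStream(true);  // merge stderr into stdout so the real error is visible

    Process p = pb.start();
    BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
    String line;
    while ((line = r.readLine()) != null) {
      System.out.println(line);    // print whatever the command reports
    }
    int exitCode = p.waitFor();
    System.out.println("hadoop fs -copyToLocal exited with " + exitCode);
  }
}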
Re: Why don't my jobs get preempted?
Preemption is only available in Hadoop 0.20+ or in distributions of Hadoop that have applied that patch, such as Cloudera's distribution. If you are running one of these, check out http://hadoop.apache.org/mapreduce/docs/r0.21.0/fair_scheduler.html for information on how to enable preemption. Matei

On May 31, 2011, at 12:20 PM, Edward Capriolo wrote: On Tue, May 31, 2011 at 2:50 PM, W.P. McNeill bill...@gmail.com wrote: I'm launching long-running tasks on a cluster running the Fair Scheduler. As I understand it, the Fair Scheduler is preemptive. What I expect to see is that my long-running jobs sometimes get killed to make room for other people's jobs. This never happens; instead my long-running jobs hog mapper and reducer slots and starve other people out. Am I misunderstanding how the Fair Scheduler works? Try adding <minSharePreemptionTimeout>120</minSharePreemptionTimeout> <fairSharePreemptionTimeout>180</fairSharePreemptionTimeout> to one of your pools and see if that pool preempts other pools.
Re: Why don't my jobs get preempted?
Sorry, I meant 0.21+, not 0.20+ for the Apache releases. Matei

On May 31, 2011, at 4:05 PM, Matei Zaharia wrote: Preemption is only available in Hadoop 0.20+ or in distributions of Hadoop that have applied that patch, such as Cloudera's distribution. If you are running one of these, check out http://hadoop.apache.org/mapreduce/docs/r0.21.0/fair_scheduler.html for information on how to enable preemption. Matei [...]
Re: copyToLocal (from Amazon AWS)
Try using the complete path to wherever your hadoop binary is present, e.g. /usr/bin/hadoop instead of hadoop...

On Tue, May 31, 2011 at 3:56 PM, neeral beladia neeral_bela...@yahoo.com wrote: Hi, I am not sure if this question has been asked; it's more of a hadoop fs question. I am trying to execute the following hadoop fs command: hadoop fs -copyToLocal s3n://<Access Key>:<Secret Key>@<bucket name>/file.txt /home/hadoop/workspace/file.txt When I execute this command directly from the terminal shell, it works perfectly fine; however, the above command doesn't execute from code. In fact, it says: Exception in thread "main" copyToLocal: null. [...]
Re: copyToLocal (from Amazon AWS)
Oops... reading again, the command itself is working from the shell. What is the exact string that you have in cmdStr?

On Tue, May 31, 2011 at 4:51 PM, Mapred Learn mapred.le...@gmail.com wrote: Try using the complete path to wherever your hadoop binary is present, e.g. /usr/bin/hadoop instead of hadoop...

On Tue, May 31, 2011 at 3:56 PM, neeral beladia neeral_bela...@yahoo.com wrote: Hi, I am not sure if this question has been asked; it's more of a hadoop fs question. I am trying to execute the following hadoop fs command: hadoop fs -copyToLocal s3n://<Access Key>:<Secret Key>@<bucket name>/file.txt /home/hadoop/workspace/file.txt [...]
Re: trying to select technology
Hi, I think you should check out MarkLogic, a product with database and search capabilities especially designed for XML and unstructured data. We also allow you to run Hadoop MapReduce jobs on top of data stored in MarkLogic. For more information on MarkLogic, please check out: http://www.marklogic.com/products/overview.html Thanks, Jane

--- On Tue, 5/31/11, cs230 chintanjs...@gmail.com wrote: From: cs230 chintanjs...@gmail.com Subject: trying to select technology To: core-u...@hadoop.apache.org Date: Tuesday, May 31, 2011, 10:50 AM Hello All, I am planning to start a project where I have to do extensive storage of XML and text files. [...]
DistributedCache - getLocalCacheFiles method returns null
Hi, I have a file on Amazon AWS under s3n://<Access Key>:<Secret Key>@<Bucket Name>/file.txt. I want this file to be accessible by the slave nodes via the Distributed Cache. I put the following after the job configuration statements in the driver program:

DistributedCache.addCacheFile(new Path("s3n://<Access Key>:<Secret Key>@<Bucket Name>/file.txt").toUri(), job.getConfiguration());

Also, in my setup method in the mapper class, I have the statement below:

Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());

cacheFiles is getting assigned null. Could you please let me know what I am doing wrong here? The file does exist on S3. Thanks, Neeral
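For comparison, the conventional pattern is sketched below (0.20 "new" API; the S3 URI, class names and paths are placeholders): register the file on the job's configuration before submission, then read the local copies back in setup(). One thing worth checking is whether the job is running through the local job runner, since local-mode runs in that era were known not to materialize the distributed cache, which can also show up as a null from getLocalCacheFiles.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CacheDemo {

  public static class CacheMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
      // The framework copies each registered file to the task node's local disk
      // before the task starts; these are the resulting local paths.
      Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
      if (cached != null && cached.length > 0) {
        BufferedReader in = new BufferedReader(new FileReader(cached[0].toString()));
        // ... read the side data here ...
        in.close();
      }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(value, new Text(""));  // placeholder map logic
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "cache-demo");
    job.setJarByClass(CacheDemo.class);

    // Register the cache file on the job's configuration *before* submitting it.
    // Placeholder URI: the real access key, secret key and bucket go here.
    DistributedCache.addCacheFile(
        new URI("s3n://ACCESS_KEY:SECRET_KEY@my-bucket/file.txt"),
        job.getConfiguration());

    job.setMapperClass(CacheMapper.class);
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}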
Re: trying to select technology
My suggestion: ElasticSearch, http://elasticsearch.org

----- Original Message ----- From: Jane Chen Sent: Wednesday, June 01, 2011 12:19 PM To: core-u...@hadoop.apache.org ; common-user@hadoop.apache.org Subject: Re: trying to select technology Hi, I think you should check out MarkLogic, a product with database and search capabilities especially designed for XML and unstructured data. We also allow you to run Hadoop MapReduce jobs on top of data stored in MarkLogic. For more information on MarkLogic, please check out: http://www.marklogic.com/products/overview.html Thanks, Jane [...]