How to write simple programs using Hadoop?
Is there any chance to see some simple programs for Hadoop (such as Hello World, counting the numbers 1-10, reading two numbers and printing the larger one, and other number, string, and file processing examples, etc.) written in Java or C++? It seems that the only publicly available code on the Internet is the WordCount program. I learn programming more easily and quickly from examples, and I would appreciate it if anyone could share some simple Java or C++ programs for Hadoop. If there are any manuals, examples, or links about writing programs for Hadoop, please share them.
Re: How to write simple programs using Hadoop?
On May 7, 2008, at 12:33 AM, Hadoop wrote:
> Is there any chance to see some simple programs for Hadoop (such as Hello World, counting the numbers 1-10, reading two numbers and printing the larger one, ...) written in Java/C++?

Take a look at the src/examples directory in your Hadoop distribution:
http://svn.apache.org/viewvc/hadoop/core/trunk/src/examples/org/apache/hadoop/examples/
and
http://svn.apache.org/viewvc/hadoop/core/trunk/src/examples/pipes/impl/

Map-Reduce tutorial: http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
Hadoop Streaming: http://hadoop.apache.org/core/docs/current/streaming.html

Arun
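To give one concrete example beyond WordCount: here is roughly what "read numbers and print the largest" could look like - an untested sketch against the 0.16-era org.apache.hadoop.mapred API (setInputPath/setOutputPath moved to FileInputFormat/FileOutputFormat in later releases). It reads one integer per line and emits the overall maximum:

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class MaxNumber {

  // Map: parse each line as an integer and emit it under a single shared key.
  public static class MaxMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final Text KEY = new Text("max");
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, IntWritable> out, Reporter reporter)
        throws IOException {
      out.collect(KEY, new IntWritable(Integer.parseInt(line.toString().trim())));
    }
  }

  // Reduce: keep the largest value seen for the key.
  public static class MaxReducer extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> out, Reporter reporter)
        throws IOException {
      int max = Integer.MIN_VALUE;
      while (values.hasNext()) {
        max = Math.max(max, values.next().get());
      }
      out.collect(key, new IntWritable(max));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(MaxNumber.class);
    conf.setJobName("max-number");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(MaxMapper.class);
    conf.setCombinerClass(MaxReducer.class);  // max is associative, so reuse as combiner
    conf.setReducerClass(MaxReducer.class);
    conf.setInputPath(new Path(args[0]));     // 0.16-era JobConf calls
    conf.setOutputPath(new Path(args[1]));
    JobClient.runJob(conf);
  }
}

You would run it with something like: bin/hadoop jar max.jar MaxNumber in-dir out-dir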
Re: Collecting output not to file
Derek Shaw wrote:
> Hey, From the examples that I have seen thus far, all of the results from the reduce function are being written to a file. Instead of writing results to a file, I want to store them and inspect them after the job is completed. (I think that I need to implement my own OutputCollector, but I don't know how to tell Hadoop to use it.) How can I do this? -Derek

What do you mean by store and inspect?
Re: single node HBase
Try this one: http://hadoop.apache.org/hbase/docs/r0.1.1/api/overview-summary.html#overview_description - Yuri.

On Wed, May 7, 2008 at 4:40 PM, Ahmed Shiraz Memon [EMAIL PROTECTED] wrote:
> the link is not working... Shiraz

On Mon, Mar 17, 2008 at 9:34 PM, stack [EMAIL PROTECTED] wrote:
> Try our 'getting started': http://hadoop.apache.org/hbase/docs/current/api/index.html St.Ack

Peter W. wrote:
> Hello, Are there any Hadoop documentation resources showing how to run the current version of HBase on a single node? Thanks, Peter W.
Not allow file split
Hi all, I'm a newbie and I have the following problem. I need to implement an InputFormat whose isSplitable always returns false, as shown in http://wiki.apache.org/hadoop/FAQ (question no. 10). And here is the problem: I also have to implement the RecordReader interface to return the whole content of the input file, but I don't know how. I have only found examples that use the LineRecordReader. Can someone help me? Thanks -- Roberto Zandonati
Re: Not allow file split
You can implement a custom input format and a record reader. Assuming your record data type is class RecType, the input format should subclass FileInputFormat<LongWritable, RecType> and the record reader should implement RecordReader<LongWritable, RecType>. In this case the key could be the offset into the file, although it is not very useful since you treat the entire file as one record. The isSplitable() method in the input format should return false. The RecordReader.next(LongWritable pos, RecType val) method should read the entire file and set val to the file contents. This will ensure that the entire file goes to one map task as a single record. -Rahul Sood [EMAIL PROTECTED]
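Something like the following, perhaps - an untested sketch against the 0.16-era org.apache.hadoop.mapred API, using BytesWritable as the record type (the WholeFileInputFormat and WholeFileRecordReader names are made up for illustration). Note it buffers the entire file in memory, so it is only sensible for files that fit in a task's heap:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.*;

public class WholeFileInputFormat extends FileInputFormat<LongWritable, BytesWritable> {

  // Never split a file: each file becomes exactly one record.
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }

  public RecordReader<LongWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }

  static class WholeFileRecordReader implements RecordReader<LongWritable, BytesWritable> {
    private final FileSplit split;
    private final JobConf job;
    private boolean done = false;

    WholeFileRecordReader(FileSplit split, JobConf job) {
      this.split = split;
      this.job = job;
    }

    public boolean next(LongWritable key, BytesWritable value) throws IOException {
      if (done) return false;            // only one record per file
      done = true;
      key.set(0);
      byte[] contents = new byte[(int) split.getLength()];
      Path file = split.getPath();
      FSDataInputStream in = file.getFileSystem(job).open(file);
      try {
        in.readFully(0, contents);       // slurp the whole file into memory
      } finally {
        in.close();
      }
      value.set(contents, 0, contents.length);
      return true;
    }

    public LongWritable createKey() { return new LongWritable(); }
    public BytesWritable createValue() { return new BytesWritable(); }
    public long getPos() { return done ? split.getLength() : 0; }
    public float getProgress() { return done ? 1.0f : 0.0f; }
    public void close() throws IOException {}
  }
}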
Where are the files?
Hi All, I started Hadoop in standalone mode and put some files onto HDFS. I strictly followed the instructions in the Hadoop Quick Start. HDFS is mapped to a local directory in my local file system, right? And where is it? Thank you in advance!
Re: Where are the files?
It will be mapped to /tmp - the equivalent of HADOOP_ROOT/tmp on Windows. Regards, -Vikas.

On Wed, May 7, 2008 at 8:06 PM, hong [EMAIL PROTECTED] wrote:
> HDFS is mapped to a local directory in my local file system, right? and where is it?
Re: Not allow file split
On May 7, 2008, at 6:30 AM, Roberto Zandonati wrote:
> I need to implement an InputFormat such that isSplitable always returns false, as shown in http://wiki.apache.org/hadoop/FAQ (question no. 10). And here is the problem: I also have to implement the RecordReader interface to return the whole content of the input file, but I don't know how.

A couple of things.

1. Take a look at SequenceFileRecordReader: http://svn.apache.org/viewvc/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/SequenceFileRecordReader.java?view=log

2. If you just want to process a text file as a whole, or a sequence file as a whole (or any existing format), you do not need to implement a RecordReader at all. Just sub-class the InputFormat, override isSplitable, and the RecordReader will work correctly. Take a look at SortValidator (http://svn.apache.org/viewvc/hadoop/core/trunk/src/test/org/apache/hadoop/mapred/SortValidator.java) and how it sub-classes SequenceFileInputFormat to implement a NonSplittableSequenceFileInputFormat.

Arun
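In case a concrete skeleton helps, a minimal sketch of option 2 against the 0.16-era API (the class name mirrors the one inside SortValidator; nothing else needs overriding because the parent's record reader already works):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.SequenceFileInputFormat;

// Reuses SequenceFileInputFormat's record reader unchanged; only disables
// splitting, so every sequence file is consumed by a single map task.
public class NonSplittableSequenceFileInputFormat extends SequenceFileInputFormat {
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }
}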
Re: Where are the files?
DFS files are mapped into blocks. Blocks are stored under dfs.data.dir/current. Hairong

On 5/7/08 7:36 AM, hong [EMAIL PROTECTED] wrote:
> HDFS is mapped to a local directory in my local file system, right? and where is it?
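For reference, these locations are controlled by configuration, not fixed; a sketch of pinning them down in hadoop-site.xml (the /var/hadoop paths are illustrative - out of the box, dfs.data.dir defaults to ${hadoop.tmp.dir}/dfs/data and hadoop.tmp.dir defaults to a directory under /tmp, which is why a fresh install writes there):

<?xml version="1.0"?>
<configuration>
  <!-- Base for Hadoop's local working directories; defaults under /tmp. -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/tmp</value>
  </property>
  <!-- Where the datanode keeps HDFS blocks; defaults to ${hadoop.tmp.dir}/dfs/data. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hadoop/dfs/data</value>
  </property>
</configuration>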
Read timed out, Abandoning block blk_-5476242061384228962
What is this bit of the log trying to tell me, and what sorts of things should I be looking at to make sure it doesn't happen? I don't think the network has any basic configuration issues - I can telnet from the machine creating this log to the destination - telnet 10.252.222.239 50010 works fine when I ssh into the box with this error.

2008-05-07 13:20:31,194 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
2008-05-07 13:20:31,194 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_-5476242061384228962
2008-05-07 13:20:31,196 INFO org.apache.hadoop.dfs.DFSClient: Waiting to find target node: 10.252.222.239:50010

I'm seeing a fair number of these. My reduces finally complete, but there are usually a couple at the end that take longer than I think they should, and they frequently have these sorts of errors. I'm running 20 machines on EC2 right now, with Hadoop version 0.16.4. -- James Moore | [EMAIL PROTECTED] blog.restphone.com
Re: Read timed out, Abandoning block blk_-5476242061384228962
I noticed that there was a hard-coded timeout value of 6000 (ms) in src/java/org/apache/hadoop/dfs/DFSClient.java - as an experiment, I took that way down and now I'm not noticing the problem. (Doesn't mean it's not there, I just don't feel the pain...) This feels like a terrible solution^H^H^H^H^H^hack though, particularly since I haven't yet taken the time to actually understand the code. -- James Moore | [EMAIL PROTECTED] blog.restphone.com
Hadoop Permission Problem
Hi, My datanode and jobtracker are started by user hadoop, and user Test needs to submit the job. So if user Test copies a file to HDFS, there is a permission error. /usr/local/hadoop/bin/hadoop dfs -copyFromLocal /home/Test/somefile.txt myapps copyFromLocal: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=Test, access=WRITE, inode=user:hadoop:supergroup:rwxr-xr-x Could you please let me know how other users (other than hadoop) can access HDFS and then submit MapReduce jobs? Where do I configure this, or what default configuration needs to be changed? Thanks, Senthil
Re: Read timed out, Abandoning block blk_-5476242061384228962
Taking the timeout out is very dangerous. It may cause your application to hang. You could change the timeout parameter to a larger number. HADOOP-2188 fixed the problem. Check https://issues.apache.org/jira/browse/HADOOP-2188. Hairong

On 5/7/08 2:36 PM, James Moore [EMAIL PROTECTED] wrote:
> I noticed that there was a hard-coded timeout value of 6000 (ms) in src/java/org/apache/hadoop/dfs/DFSClient.java - as an experiment, I took that way down and now I'm not noticing the problem.
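If upgrading, newer releases turn the hard-coded value into configuration; a sketch for hadoop-site.xml, assuming the dfs.socket.timeout and dfs.datanode.socket.write.timeout property names used in 0.17-era code (these names are an assumption - check the release you actually run):

<!-- DFS socket read timeout in milliseconds (0.17-era name; verify for your release). -->
<property>
  <name>dfs.socket.timeout</name>
  <value>120000</value>
</property>
<!-- Datanode socket write timeout in milliseconds; 0 disables it. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>480000</value>
</property>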
Re: Read timed out, Abandoning block blk_-5476242061384228962
Hi James, Were you able to start all the nodes in the same 'availability zone'? Are you using the new AMI kernels? If you are using the contrib/ec2 scripts, you might upgrade (just the scripts) to http://svn.apache.org/viewvc/hadoop/core/branches/branch-0.17/src/contrib/ec2/ These support the new kernels and availability zones. My transient errors went away when upgrading. The functional changes are documented here: http://wiki.apache.org/hadoop/AmazonEC2 FYI, you will need to build your own images (via the create-image command) with whatever version of Hadoop you are comfortable with. This will also get you a Ganglia install... ckw

On May 7, 2008, at 1:29 PM, James Moore wrote:
> What is this bit of the log trying to tell me, and what sorts of things should I be looking at to make sure it doesn't happen? [...]

Chris K Wensel [EMAIL PROTECTED] http://chris.wensel.net/ http://www.cascading.org/
Re: Hadoop Permission Problem
Hi Senthil, Since the path myapps is relative, copyFromLocal will copy the file to the home directory, i.e., /user/Test/myapps in your case. If /user/Test doesn't exist, it will first try to create it. You got AccessControlException because the permission of /user is 755. Hope this helps. Nicholas

- Original Message - From: Natarajan, Senthil [EMAIL PROTECTED] Sent: Wednesday, May 7, 2008 2:36:22 PM Subject: Hadoop Permission Problem
> Could you please let me know how other users (other than hadoop) can access HDFS and then submit MapReduce jobs?
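One common fix, assuming the hadoop user is the HDFS superuser (it started the namenode): run these as hadoop to create Test's home directory and hand it over (the paths mirror Senthil's setup):

/usr/local/hadoop/bin/hadoop dfs -mkdir /user/Test        # create Test's home directory
/usr/local/hadoop/bin/hadoop dfs -chown Test /user/Test   # make Test its owner

After that, user Test can create myapps under /user/Test and submit jobs that read from it.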
Fwd: Collecting output not to file
To clarify:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputFormat;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Progressable;

static class TestOutputFormat implements OutputFormat<Text, Text> {
  static class TestRecordWriter implements RecordWriter<Text, Text> {
    TestOutputFormat output;

    public TestRecordWriter(TestOutputFormat output,
        org.apache.hadoop.fs.FileSystem ignored, JobConf job,
        String name, Progressable progress) {
      this.output = output;
    }

    public void close(Reporter reporter) {}

    public void write(Text key, Text value) {
      output.addResults(value.toString());
    }
  }

  protected String results = "";

  public void checkOutputSpecs(org.apache.hadoop.fs.FileSystem ignored, JobConf job)
      throws IOException {}

  public RecordWriter<Text, Text> getRecordWriter(
      org.apache.hadoop.fs.FileSystem ignored, JobConf job,
      String name, Progressable progress) {
    return new TestRecordWriter(this, ignored, job, name, progress);
  }

  public void addResults(String r) { results += r + ","; }
  public String getResults() { return results; }
}

And then running the task:

public int run(String[] args) throws Exception {
  JobClient.runJob(job);
  // getOutputFormat creates a new instance of the output format. I want the
  // instance of the output format that the reduce function wrote to.
  // (The RecordWriter that reduce wrote to would be just as good.)
  TestOutputFormat results = (TestOutputFormat) job.getOutputFormat();
  // Always prints the empty string, not the populated results.
  System.out.println("results: " + results.getResults());
  return 0;
}

Derek Shaw [EMAIL PROTECTED] wrote:
> From the examples that I have seen thus far, all of the results from the reduce function are being written to a file. Instead of writing results to a file, I want to store them and inspect them after the job is completed. (I think that I need to implement my own OutputCollector, but I don't know how to tell Hadoop to use it.) How can I do this? -Derek
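For what it's worth, this cannot work as written: the reduce tasks run in separate task JVMs (possibly on other machines), so the TestOutputFormat they populate is never the client's instance, and JobConf.getOutputFormat() just constructs a fresh, empty object. A sketch of the usual workaround - let the job write to a normal output path and read the part files back in run() after runJob() returns (the path and the single-reducer assumption are illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;

public int run(String[] args) throws Exception {
  Path outDir = new Path("results");          // illustrative HDFS path
  job.setOutputPath(outDir);                  // 0.16-era JobConf call
  job.setNumReduceTasks(1);                   // so there is a single part-00000
  JobClient.runJob(job);

  // Read the reducer's output back in the submitting client once the job is done.
  FileSystem fs = FileSystem.get(job);
  BufferedReader in = new BufferedReader(
      new InputStreamReader(fs.open(new Path(outDir, "part-00000"))));
  try {
    String line;
    while ((line = in.readLine()) != null) {
      System.out.println(line);               // each "key<TAB>value" pair
    }
  } finally {
    in.close();
  }
  return 0;
}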