Hadoop Engineers ...
Hello - Attributor is looking for talented senior backend/infrastructure software engineers with Hadoop skills. Contact me if you are interested, and I will send along the full job description.

Thanks,
Htin Hlaing
Attributor
RE: HDFS error
Thanks for your help, Samuel. I was having problems with both writing and reading. I ran fsck, removed some damaged files, and restarted DFS. Things seem to be OK now, though I'm not exactly sure what happened.

Thanks,
Htin

-----Original Message-----
From: Samuel Guo [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 09, 2008 6:15 PM
To: core-user@hadoop.apache.org
Subject: Re: HDFS error

Does this happen when you write files to HDFS? If so, please check that the disks on your datanodes have enough free space. If it happens when you read files from HDFS, you can run fsck to check whether the files are healthy.

Hope this helps.

On Fri, Oct 10, 2008 at 8:21 AM, Htin Hlaing <[EMAIL PROTECTED]> wrote:
> Hello - I am experiencing the following HDFS problem across the clusters
> sharing the DFS. It's not specific to this particular data node IP
> address; the exception occurs across all the other data nodes as well.
> Any help is appreciated.
> [log output snipped - the full message appears in the "HDFS error"
> thread below]
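For reference, a minimal sketch of the fsck workflow mentioned above; the path is a placeholder, and the flags shown are standard hadoop fsck options:

    # check the health of everything under a path
    bin/hadoop fsck /

    # list corrupt files with their blocks and locations
    bin/hadoop fsck / -files -blocks -locations

    # then either move damaged files to /lost+found or delete them
    bin/hadoop fsck / -move
    bin/hadoop fsck / -delete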
HDFS error
Hello - I am experiencing the following HDFS problem across the clusters sharing the DFS. It's not specific to this particular data node IP address; the exception occurs across all the other data nodes as well. Any help is appreciated.

2008-10-09 14:13:59,732 INFO org.apache.hadoop.fs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.50.80.108:50010
2008-10-09 14:13:59,732 INFO org.apache.hadoop.fs.DFSClient: Abandoning block blk_2383100013215057496
2008-10-09 14:13:59,732 INFO org.apache.hadoop.fs.DFSClient: Waiting to find target node: 10.50.80.112:50010
2008-10-09 14:14:16,604 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_3359685166656187008 from any node: java.io.IOException: No live nodes contain current block
2008-10-09 14:14:54,370 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_-4901580690304720524 from any node: java.io.IOException: No live nodes contain current block
2008-10-09 14:17:19,619 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_3359685166656187008 from any node: java.io.IOException: No live nodes contain current block
2008-10-09 14:17:57,385 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_-4901580690304720524 from any node: java.io.IOException: No live nodes contain current block
2008-10-09 14:20:25,634 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_3359685166656187008 from any node: java.io.IOException: No live nodes contain current block
2008-10-09 14:21:09,401 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_-4901580690304720524 from any node: java.io.IOException: No live nodes contain current block
2008-10-09 14:23:28,649 INFO org.apache.hadoop.fs.DFSClient: Could not obtain block blk_3359685166656187008 from any node: java.io.IOException: No live nodes contain current block

Is there a knowledge base of old posts to the mailing list that I can search?

Thanks,
Htin
RE: How to input a hdfs file inside a mapper?
I would imagine something like:

    FSDataInputStream inFileStream = dfsFileSystem.open(dfsFilePath);

Don't forget to close the stream afterwards.

Thanks,
Htin

-----Original Message-----
From: Amit_Gupta [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 26, 2008 5:47 AM
To: core-user@hadoop.apache.org
Subject: How to input a hdfs file inside a mapper?

How can I get an input stream on a file stored in HDFS inside a mapper or a reducer?

thanks
Amit

--
View this message in context: http://www.nabble.com/How-to-input-a-hdfs-file-inside-a-mapper--tp19687785p19687785.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
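To flesh that out, here is a minimal sketch using the old org.apache.hadoop.mapred API current at the time; the class name and the side-file path are made up for illustration, and a real job would pass the path in through the JobConf:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class SideFileMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      // Hypothetical side file living in HDFS.
      private static final Path SIDE_FILE = new Path("/data/lookup.txt");

      private String sideData;

      @Override
      public void configure(JobConf job) {
        try {
          FileSystem fs = FileSystem.get(job);        // the HDFS the job runs against
          FSDataInputStream in = fs.open(SIDE_FILE);  // the open() call from above
          try {
            BufferedReader reader =
                new BufferedReader(new InputStreamReader(in));
            sideData = reader.readLine();             // read whatever is needed
          } finally {
            in.close();                               // don't forget to close
          }
        } catch (IOException e) {
          throw new RuntimeException("could not read side file", e);
        }
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        output.collect(new Text(sideData), value);    // use the side data
      }
    }

Opening the file once in configure() avoids reopening it for every record; for read-only side files, DistributedCache is another option worth a look.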
RE: How to instruct Job Tracker to use certain hosts only
Thanks, Owen, for the suggestion. I wonder whether there would be side effects from consistently failing tasks on those nodes. Would the JobTracker blacklist the nodes for other jobs as well?

Htin

-----Original Message-----
From: Owen O'Malley [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 21, 2008 10:53 PM
To: core-user@hadoop.apache.org
Subject: Re: How to instruct Job Tracker to use certain hosts only

On Apr 18, 2008, at 1:52 PM, Htin Hlaing wrote:

> I would like the first job to run on all the compute hosts in the
> cluster (which is the default), and then I would like to run the
> second job on only a subset of the hosts (due to a licensing issue).

One option would be to set mapred.map.max.attempts and mapred.reduce.max.attempts to larger numbers and have the map or reduce fail if it is run on a bad node. When the task re-runs, it will run on a different node. Eventually it will find a valid node.

-- Owen
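A rough sketch of the approach Owen describes, assuming the old mapred API; the host list and class name are made-up illustrations:

    import java.net.InetAddress;
    import java.net.UnknownHostException;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    // Mappers for the license-restricted job extend this base class.
    public abstract class LicensedHostMapperBase extends MapReduceBase {

      // Assumed list of licensed hosts; in practice it could come from
      // the JobConf or a file rather than being hard-coded.
      private static final Set<String> LICENSED_HOSTS = new HashSet<String>(
          Arrays.asList("node01.example.com", "node02.example.com"));

      @Override
      public void configure(JobConf job) {
        try {
          String host = InetAddress.getLocalHost().getHostName();
          if (!LICENSED_HOSTS.contains(host)) {
            // Failing the attempt makes the JobTracker reschedule it on
            // another node, up to mapred.map.max.attempts times.
            throw new RuntimeException("host " + host + " is not licensed");
          }
        } catch (UnknownHostException e) {
          throw new RuntimeException(e);
        }
      }
    }

In the driver, raise the retry budget so the scheduler has enough attempts to land on a licensed node (the property names are the ones Owen quotes; the job class is hypothetical):

    JobConf conf = new JobConf(SecondJob.class);
    conf.setInt("mapred.map.max.attempts", 20);     // default is 4
    conf.setInt("mapred.reduce.max.attempts", 20);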
How to instruct Job Tracker to use certain hosts only
Hi - I have a situation that I cannot seem to resolve myself. I am using 0.16.2.

Basically, I have two jobs that I run in order from the same Java driver process. I would like the first job to run on all the compute hosts in the cluster (which is the default), and then I would like to run the second job on only a subset of the hosts (due to a licensing issue).

First, I thought I might be able to do this by supplying a different hadoop-site.xml containing a mapred.hosts specification when I run the second job. That does not work, because mapred.hosts is only read when the JobTracker starts.

What are my options?

1) Shrink the cluster to just the subset. Not a good option for me.
2) Restart the JobTracker between the jobs with a different hadoop-site.xml. I would like to avoid this.
3) Possibly Hadoop On Demand, but that probably means the job chain cannot stay in the same Java process. Plus, I cannot upgrade to HOD easily yet.
4) I would really appreciate it if someone could suggest a better alternative.

Thanks,
Htin
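For anyone searching later: mapred.hosts names a file listing the hosts allowed to connect to the JobTracker, and, as noted above, it is only read at JobTracker startup. The hadoop-site.xml entry looks something like this, with a placeholder path:

    <property>
      <name>mapred.hosts</name>
      <value>/path/to/allowed-hosts-file</value>
    </property>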