finding the input file of a failed map task

2009-04-27 Thread Sandhya E
On the JobTracker web page, when I click on a job ID, there is a listing
of completed maps and killed maps. When I click on the number in the
completed or killed column, there is a table with the columns
mentioned below.

Task, Complete, Status, Start Time, Finish Time, Errors

The Status column is blank for failed tasks, while for completed tasks it
lists the actual input file/block on which the map was executed. This
is exactly the information I'm looking for in the case of a failed task.
Our jobs run over numerous files, and sometimes some input files are
corrupt. If a failed map task also showed me which input file it was
working on, I could quickly remove the corrupt file and rerun the job.

Please let me know if this information can be obtained in any other way.

Thanks  Regards
Sandhya
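
A minimal sketch of one workaround, assuming the 0.18-era mapred API and a
FileInputFormat-based job (the class name and the logging are illustrative):
the mapper reads the map.input.file property and surfaces it in the task
status and stderr, so the file name stays visible even when the task fails.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Illustrative mapper that remembers which input file its split came from.
public class InputAwareMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String inputFile = "unknown";

  public void configure(JobConf conf) {
    // map.input.file is set by the framework for FileInputFormat splits.
    inputFile = conf.get("map.input.file", "unknown");
    System.err.println("map input file: " + inputFile);  // lands in the task's stderr log
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Keep the file name in the task status so it shows on the web UI
    // even if the task later fails.
    reporter.setStatus("processing " + inputFile);
    // ... actual map logic goes here ...
  }
}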


IO Exception in Map Tasks

2009-04-27 Thread Rakhi Khatwani
Hi,

  In one of the map tasks, I get the following exception:
  java.io.IOException: Task process exit with nonzero status of 255.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)

java.io.IOException: Task process exit with nonzero status of 255.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)

what could be the reason?

Thanks,
Raakhi


Re: Storing data-node content to other machine

2009-04-27 Thread jason hadoop
There is no requirement that your hdfs and mapred clusters share an
installation directory, it is just done that way because it is simple and
most people have a datanode and tasktracker on each slave node.

Simply have two configuration directories on your cluster machines, use the
bin/start-dfs.sh script in one and the bin/start-mapred.sh script in the
other, and maintain different slaves files in the two directories.

You will lose the benefit of data locality for any tasktrackers that do
not reside on the datanode machines.

On Sun, Apr 26, 2009 at 10:06 PM, Vishal Ghawate 
vishal_ghaw...@persistent.co.in wrote:

 Hi,
 I want to store the contents of all the client machines (datanodes) of the
 hadoop cluster on a centralized machine with high storage capacity, so that
 the tasktracker runs on the client machine but the data is stored on the
 centralized machine.
 Can anybody help me with this, please?





-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Re: IO Exception in Map Tasks

2009-04-27 Thread jason hadoop
The JVM had a hard failure and crashed.


On Sun, Apr 26, 2009 at 11:34 PM, Rakhi Khatwani
rakhi.khatw...@gmail.comwrote:

 Hi,

  In one of the map tasks, i get the following exception:
  java.io.IOException: Task process exit with nonzero status of 255.
 at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)

 java.io.IOException: Task process exit with nonzero status of 255.
 at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)

 what could be the reason?

 Thanks,
 Raakhi




-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Re: IO Exception in Map Tasks

2009-04-27 Thread Rakhi Khatwani
Thanks Jason,
  is there any way we can avoid this exception??

Thanks,
Raakhi

On Mon, Apr 27, 2009 at 1:20 PM, jason hadoop jason.had...@gmail.comwrote:

 The jvm had a hard failure and crashed


 On Sun, Apr 26, 2009 at 11:34 PM, Rakhi Khatwani
 rakhi.khatw...@gmail.comwrote:

  Hi,
 
   In one of the map tasks, i get the following exception:
   java.io.IOException: Task process exit with nonzero status of 255.
  at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)
 
  java.io.IOException: Task process exit with nonzero status of 255.
  at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:424)
 
  what could be the reason?
 
  Thanks,
  Raakhi
 



 --
 Alpha Chapters of my book on Hadoop are available
 http://www.apress.com/book/view/9781430219422



Balancing datanodes - Running hadoop 0.18.3

2009-04-27 Thread Usman Waheed

Hi,
I had sent out an email yesterday asking how to balance the
cluster after setting the replication level to 2. I have 4 datanodes and
one namenode in my setup.
Using the -R switch with -setrep did the trick, but one of my nodes
became under-utilized. I then ran hadoop balancer and it did help, but
only up to a certain extent.

Datanode 4, shown below, is now up to almost 5%, but when I try to
rebalance again using the hadoop balancer command it says that the
cluster is already balanced, which it isn't.
Is there an alternative, or will Datanode 4 simply pick up more blocks
over time?


Any clues?

Thanks,
Usman

Name: 1
State  : In Service
Total raw bytes: 293778976768 (273.6 GB)
Remaining raw bytes: 35858599(206.97 GB)
Used raw bytes: 48140136448 (44.83 GB)
% used: 16.39%
Last contact: Mon Apr 27 08:34:46 UTC 2009


Name: 2
State  : In Service
Total raw bytes: 293778976768 (273.6 GB)
Remaining raw bytes: 231235100994(215.35 GB)
Used raw bytes: 40704245760 (37.91 GB)
% used: 13.86%
Last contact: Mon Apr 27 08:34:45 UTC 2009


Name: 3
State  : In Service
Total raw bytes: 293778976768 (273.6 GB)
Remaining raw bytes: 211936026161(197.38 GB)
Used raw bytes: 59591700480 (55.5 GB)
% used: 20.28%
Last contact: Mon Apr 27 08:34:45 UTC 2009


Name: 4
State  : In Service
Total raw bytes: 293778976768 (273.6 GB)
Remaining raw bytes: 258876991693(241.1 GB)
Used raw bytes: 12142653440 (11.31 GB)
% used: 4.13%
Last contact: Mon Apr 27 08:34:46 UTC 2009



write a large file to HDFS?

2009-04-27 Thread Xie, Tao

hi,
If I write a large file to HDFS, will it be split into blocks with
multiple blocks written to HDFS at the same time, or can HDFS only
write block by block?
Thanks.
-- 
View this message in context: 
http://www.nabble.com/write-a-large-file-to-HDFS--tp23252754p23252754.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Blocks replication in downtime event

2009-04-27 Thread Stas Oskin
Hi.

I have a question:

If I have N DataNodes, and one or several of the nodes become
unavailable, will HDFS re-synchronize the blocks automatically, according
to the replication level set?
And if yes, when? As soon as the offline node is detected, or only on file
access?

Regards.


Re: Balancing datanodes - Running hadoop 0.18.3

2009-04-27 Thread Tamir Kamara
Hi,

The balancer works with the average utilization of all the nodes in the
cluster - in your case it's about 13%. Only nodes that are more than 10%
off the average get rebalanced. Node 4 doesn't count as under-utilized
because the lower bound is 13-10=3%, which is below its 4.13% usage. You
can use a threshold other than the default 10% (hadoop balancer
-threshold 5). Read more here:
http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Rebalancer

Tamir
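
A rough sketch of the threshold test described above (simplified; the real
Balancer also accounts for node capacities and block-movement limits), fed
with the utilization figures from this thread:

// Simplified illustration only -- not the actual Balancer code.
public class BalancerThresholdDemo {

  // A node is a rebalancing candidate when its utilization is more than
  // `threshold` percentage points away from the cluster average.
  static boolean outsideThreshold(double usedPct, double avgPct, double threshold) {
    return Math.abs(usedPct - avgPct) > threshold;
  }

  public static void main(String[] args) {
    double[] nodes = {16.39, 13.86, 20.28, 4.13};      // % used, from the report above
    double avg = (16.39 + 13.86 + 20.28 + 4.13) / 4;   // about 13.7%
    for (double used : nodes) {
      System.out.printf("%.2f%% used: default 10%% threshold -> %b, 5%% threshold -> %b%n",
          used, outsideThreshold(used, avg, 10), outsideThreshold(used, avg, 5));
    }
  }
}

With the default 10% threshold, node 4 (4.13% used) sits inside the band
around the 13.7% average, which is why the balancer reports the cluster as
already balanced; with -threshold 5 it would be moved.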


On Mon, Apr 27, 2009 at 11:36 AM, Usman Waheed usm...@opera.com wrote:

 Hi,
 I had sent out an email yesterday asking about how to balance the cluster
 after setting the replication level to 2. I have 4 datanodes and one
 namenode in my setup.
 Using the -R switch with -setrep did the trick but one of my nodes became
 under utilized. I then ran hadoop balancer and it did help but upto a
 certain extent.

 Datanode 4 noted below is now up to almost 5% but when i try to balance the
 datanode again using the hadoop balance command it says that the cluster
 is already balanced which isnt.
 I wonder if there is an alternate way(s) or maybe overtime Datanode-4 will
 pick up more blocks?

 Any clues?

 Thanks,
 Usman





Re: Balancing datanodes - Running hadoop 0.18.3

2009-04-27 Thread Usman Waheed

Hi Tamir,

Thanks for the info, makes sense now :).

Cheers,
Usman





.20.0, Partitioners?

2009-04-27 Thread Ryan Farris
Is there some magic to get a Partitioner working on .20.0?  Setting
the partitioner class on the Job object doesn't take; hadoop always
uses the HashPartitioner.  Looking through the source code, it looks
like the MapOutputBuffer in MapTask only ever fetches
mapred.partitioner.class and doesn't check for the new API's
mapreduce.partitioner.class, but I'm not confident in my
understanding of how things work.

I was eventually able to get my test program working correctly by:
  1) Creating a partitioner that extends the deprecated
org.apache.hadoop.mapred.Partitioner class.
  2) Calling job.getConfiguration().set("mapred.partitioner.class",
DeprecatedTestPartitioner.class.getCanonicalName());
  3) Commenting out line 395 of org.apache.hadoop.mapreduce.Job.java,
where it asserts that mapred.partitioner.class is null

But I'm assuming editing the hadoop core sourcecode is not the
intended path.  Am I missing some simple switch or something?

rf
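
A sketch of items 1 and 2 above, assuming the deprecated mapred API on
0.20.0; the key/value types and the partition logic are just examples, and,
as noted, item 3 (the assertion in Job.java) may still get in the way.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;
import org.apache.hadoop.mapreduce.Job;

// Item 1: a partitioner written against the deprecated mapred API.
public class DeprecatedTestPartitioner implements Partitioner<Text, IntWritable> {

  public void configure(JobConf conf) { }

  public int getPartition(Text key, IntWritable value, int numPartitions) {
    // Example logic only: partition by the first character of the key.
    String k = key.toString();
    int c = k.length() == 0 ? 0 : k.charAt(0);
    return c % numPartitions;
  }

  // Item 2: point the old configuration key at this class.
  public static void applyTo(Job job) {
    job.getConfiguration().set("mapred.partitioner.class",
        DeprecatedTestPartitioner.class.getCanonicalName());
  }
}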


ANN: R and Hadoop = RHIPE 0.1

2009-04-27 Thread Saptarshi Guha
Hello,
I'd like to announce the release of the 0.1 version of RHIPE - R and
Hadoop Integrated Processing Environment. Using RHIPE, it is possible
to write map-reduce algorithms using the R language and start them
from within R.
RHIPE is built on Hadoop and so benefits from Hadoop's fault
tolerance, distributed file system and job scheduling features.
For the R user, there is rhlapply which runs an lapply across the cluster.
For the Hadoop user, there is rhmr which runs a general map-reduce program.

The tired example of counting words:

m <- function(key, val){
  # split the value into words on whitespace
  words <- strsplit(val, " +")[[1]]
  wc <- table(words)
  cln <- names(wc)
  return(sapply(1:length(wc), function(r)
    list(key=cln[r], value=wc[[r]]), simplify=F))
}
r <- function(key, value){
  value <- do.call(rbind, value)
  return(list(list(key=key, value=sum(value))))
}
rhmr(mapper=m, reduce=r, input.folder="X", output.folder="Y")

URL: http://ml.stat.purdue.edu/rhipe

There are some downsides to RHIPE which are described at
http://ml.stat.purdue.edu/rhipe/install.html#sec-5

Regards
Saptarshi Guha


Re: Can't start fully-distributed operation of Hadoop in Sun Grid Engine

2009-04-27 Thread Jasmine (Xuanjing) Huang
I have contacted the administrator of our cluster and he gave me
access. Now my program works in fully distributed mode.

Thanks a lot.

Jasmine
- Original Message - 
From: jason hadoop jason.had...@gmail.com

To: core-user@hadoop.apache.org
Sent: Sunday, April 26, 2009 12:13 PM
Subject: Re: Can't start fully-distributed operation of Hadoop in Sun Grid 
Engine




It may be that the Sun Grid is similar to EC2, where the machines have an
internal IP address/name that MUST be used for inter-machine communication
and an external IP address/name that is only for internet access.

The above overly complex sentence basically states that there may be some
firewall rules/tools in the Sun Grid that you need to be aware of and use.

On Sun, Apr 26, 2009 at 6:31 AM, Jasmine (Xuanjing) Huang 
xjhu...@cs.umass.edu wrote:


Hi, Jason,

Thanks for your advice. After inserting the port into hadoop-site.xml,
I can start the namenode and run jobs now.
But my system works only when I set localhost in the masters file and add
localhost (as well as some other nodes) to the slaves file. And all the
tasks are data-local map tasks. I wonder whether I have entered fully
distributed mode, or am still in pseudo-distributed mode.

As for the SGE, I am only a user and know little about it. This is the
user manual of our cluster:
http://www.cs.umass.edu/~swarm/index.php?n=Main.UserDoc

Best,
Jasmine

- Original Message - From: jason hadoop 
jason.had...@gmail.com

To: core-user@hadoop.apache.org
Sent: Sunday, April 26, 2009 12:06 AM
Subject: Re: Can't start fully-distributed operation of Hadoop in Sun 
Grid

Engine



The parameter you specify for fs.default.name should be of the form
hdfs://host:port, and the parameter you specify for mapred.job.tracker
MUST be host:port. I haven't looked at 18.3, but it appears that the
:port is mandatory.

In your case, the piece of code parsing the fs.default.name variable is
not able to tokenize it into protocol, host and port correctly.

recap:
fs.default.name      hdfs://namenodeHost:port
mapred.job.tracker   jobtrackerHost:port
Specify all the parts above and try again.

Can you please point me at information on using the sun grid, I want to
include a paragraph or two about it in my book.

On Sat, Apr 25, 2009 at 4:28 PM, Jasmine (Xuanjing) Huang 
xjhu...@cs.umass.edu wrote:

 Hi, there,


My hadoop system (version 0.18.3) works well in standalone and
pseudo-distributed operation. But if I try to run hadoop in
fully-distributed mode in Sun Grid Engine, Hadoop always fails -- in
fact, the JobTracker and TaskTracker can be started, but the namenode and
secondary namenode cannot be started. Could anyone help me with it?

My SGE scripts looks like:

#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -l long=TRUE
#$ -v JAVA_HOME=/usr/java/latest
#$ -v HADOOP_HOME=*
#$ -pe hadoop 6
PATH=$HADOOP_HOME/bin:$PATH
hadoop fs -put 
hadoop jar *
hadoop fs -get *

Then the output looks like:
Exception in thread main java.lang.NumberFormatException: For input
string: 
 at
java.lang.NumberFormatException.forInputString(NumberFormatException.
java:48)
 at java.lang.Integer.parseInt(Integer.java:468)
 at java.lang.Integer.parseInt(Integer.java:497)
 at
org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
 at org.apache.hadoop.dfs.NameNode.getAddress(NameNode.java:116)
 at
org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFil
eSystem.java:66)
 at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1339
)
 at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
 at org.apache.hadoop.fs.FsShell.init(FsShell.java:88)
 at org.apache.hadoop.fs.FsShell.run(FsShell.java:1703)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.fs.FsShell.main(FsShell.java:1852)

And the log of NameNode looks like
2009-04-25 17:27:17,032 INFO org.apache.hadoop.dfs.NameNode: 
STARTUP_MSG:

/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = 
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.18.3
/
2009-04-25 17:27:17,147 ERROR org.apache.hadoop.dfs.NameNode:
java.lang.NumberFormatException: For i
nput string: 
 at

java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
 at java.lang.Integer.parseInt(Integer.java:468)
 at java.lang.Integer.parseInt(Integer.java:497)
 at
org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:144)
 at 
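
For reference, the recap from jason's reply above as a minimal sketch (the
host names and ports are placeholders; these values normally go in
hadoop-site.xml rather than code):

import org.apache.hadoop.conf.Configuration;

public class ClusterAddresses {
  public static Configuration clusterConf() {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://namenodeHost:9000");  // scheme + host + port
    conf.set("mapred.job.tracker", "jobtrackerHost:9001");    // host + port, no scheme
    return conf;
  }
}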

RE: Blocks replication in downtime event

2009-04-27 Thread Koji Noguchi
http://hadoop.apache.org/core/docs/current/hdfs_design.html#Data+Disk+Failure%2C+Heartbeats+and+Re-Replication

hope this helps.

Koji

-Original Message-
From: Stas Oskin [mailto:stas.os...@gmail.com] 
Sent: Monday, April 27, 2009 4:11 AM
To: core-user@hadoop.apache.org
Subject: Blocks replication in downtime event

Hi.

I have a question:

If I have N of DataNodes, and one or several of the nodes have become
unavailable, would HDFS re-synchronize the blocks automatically,
according
to replication level set?
And if yes, when? As soon as the offline node was detected, or only on
file
access?

Regards.


Re: Datanode Setup

2009-04-27 Thread jpe30

*bump*

Any suggestions?

-- 
View this message in context: 
http://www.nabble.com/Datanode-Setup-tp23064660p23259364.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: IO Exception in Map Tasks

2009-04-27 Thread jason hadoop
You will need to figure out why your task crashed.
Check the task logs; there may be some messages there that give you a hint
as to what is going on.

You can enable saving failed task files and then run the task standalone in
the IsolationRunner.
Chapter 7 of my book (alpha available) provides details on this, hoping the
failure repeats in the controlled environment.

You could also unlimit the core dump size via hadoop-env.sh (ulimit -c
unlimited), but that requires that the failed task files be available, as
the core will be in the task working directory.
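
A small sketch of the first step, assuming the 0.18/0.19 JobConf API (the
pattern in the comment is hypothetical): keep the failed task's files
around so it can be re-run under the IsolationRunner.

import org.apache.hadoop.mapred.JobConf;

public class KeepFailedTaskFiles {
  // Configure the job so failed task directories are not cleaned up.
  public static void enable(JobConf conf) {
    conf.setKeepFailedTaskFiles(true);   // sets keep.failed.task.files=true
    // Alternatively, keep only tasks matching a pattern, e.g.:
    // conf.setKeepTaskFilesPattern(".*_m_000123_0");
  }
}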


On Mon, Apr 27, 2009 at 1:30 AM, Rakhi Khatwani rakhi.khatw...@gmail.comwrote:

 Thanks Jason,
  is there any way we can avoid this exception??

 Thanks,
 Raakhi





-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Re: write a large file to HDFS?

2009-04-27 Thread jason hadoop
Block by block.
Open multiple connections and write multiple files if you are not saturating
your network connection.
Generally a single file writer writing large blocks rapidly will do a decent
job of saturating things.
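
A minimal sketch of a single-writer client (the path and sizes are
arbitrary): each open stream writes its blocks one after another; to push
more data in parallel you would open several such streams to different
files.

import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // One stream fills and ships its blocks sequentially through a single
    // pipeline of datanodes.
    byte[] chunk = new byte[64 * 1024];
    OutputStream out = fs.create(new Path("/tmp/large-file"));
    try {
      for (int i = 0; i < 16 * 1024; i++) {
        out.write(chunk);               // roughly 1 GB in total
      }
    } finally {
      out.close();
    }
    fs.close();
  }
}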

On Mon, Apr 27, 2009 at 2:22 AM, Xie, Tao xietao1...@gmail.com wrote:


 hi,
 If I write a large file to HDFS, will it be split into blocks and
 multi-blocks are written to HDFS at the same time? Or HDFS can only write
 block by block?
 Thanks.
 --
 View this message in context:
 http://www.nabble.com/write-a-large-file-to-HDFS--tp23252754p23252754.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Re: .20.0, Partitioners?

2009-04-27 Thread Jothi Padmanabhan
Ryan,

I observed this behavior too -- Partitioner does not seem to work with the
new API, exactly for the reason you have mentioned. Till this gets fixed, you
probably need to use the old API.

Jothi


On 4/27/09 7:14 PM, Ryan Farris farri...@gmail.com wrote:

 Is there some magic to get a Partitioner working on .20.0?  Setting
 the partitioner class on the Job object doesn't take, hadoop always
 uses the HashPartitioner.  Looking through the source code, it looks
 like the MapOutputBuffer in MapTask only ever fetches the
 mapred.partitioner.class, and doesn't check for new api's
 mapreduce.partitioner.class, but I'm not confident in my
 understanding of how things work.
 
 I was eventually able to get my test program working correctly by:
   1) Creating a partitioner that extends the deprecated
 org.apache.hadoop.mapred.Partitioner class.
   2) Calling job.getConfiguration().set(mapred.partitioner.class,
 DeprecatedTestPartitioner.class.getCanonicalName());
   3) Commenting out line 395 of org.apache.hadoop.mapreduce.Job.java,
 where it asserts that mapred.partitioner.class is null
 
 But I'm assuming editing the hadoop core sourcecode is not the
 intended path.  Am I missing some simple switch or something?
 
 rf



Re: .20.0, Partitioners?

2009-04-27 Thread Jothi Padmanabhan
I created 

https://issues.apache.org/jira/browse/HADOOP-5750

to follow this up. 

Thanks
Jothi





Rescheduling of already completed map/reduce task

2009-04-27 Thread Sagar Naik

Hi,
The job froze after the filesystem hung on a machine that had
successfully completed a map task.

Is there a flag to enable the rescheduling of such a task?


Jstack of job tracker

SocketListener0-2 prio=10 tid=0x08916000 nid=0x4a4f runnable 
[0x4d05c000..0x4d05ce30]

  java.lang.Thread.State: RUNNABLE
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(SocketInputStream.java:129)
   at org.mortbay.util.LineInput.fill(LineInput.java:469)
   at org.mortbay.util.LineInput.fillLine(LineInput.java:547)
   at org.mortbay.util.LineInput.readLineBuffer(LineInput.java:293)
   at org.mortbay.util.LineInput.readLineBuffer(LineInput.java:277)
   at org.mortbay.http.HttpRequest.readHeader(HttpRequest.java:238)
   at 
org.mortbay.http.HttpConnection.readRequest(HttpConnection.java:861)
   at 
org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:907)

   at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
   at 
org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)

   at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
   at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

  Locked ownable synchronizers:
   - None


SocketListener0-1 prio=10 tid=0x4da8c800 nid=0xeeb runnable 
[0x4d266000..0x4d2670b0]

  java.lang.Thread.State: RUNNABLE
   at java.net.SocketInputStream.socketRead0(Native Method)
   at java.net.SocketInputStream.read(SocketInputStream.java:129)
   at org.mortbay.util.LineInput.fill(LineInput.java:469)
   at org.mortbay.util.LineInput.fillLine(LineInput.java:547)
   at org.mortbay.util.LineInput.readLineBuffer(LineInput.java:293)
   at org.mortbay.util.LineInput.readLineBuffer(LineInput.java:277)
   at org.mortbay.http.HttpRequest.readHeader(HttpRequest.java:238)
   at 
org.mortbay.http.HttpConnection.readRequest(HttpConnection.java:861)
   at 
org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:907)

   at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
   at 
org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)

   at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
   at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

IPC Server listener on 54311 daemon prio=10 tid=0x4df70400 nid=0xe86 
runnable [0x4d9fe000..0x4d9feeb0]

  java.lang.Thread.State: RUNNABLE
   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
   at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
   - locked 0x54fb4320 (a sun.nio.ch.Util$1)
   - locked 0x54fb4310 (a java.util.Collections$UnmodifiableSet)
   - locked 0x54fb40b8 (a sun.nio.ch.EPollSelectorImpl)
   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
   at org.apache.hadoop.ipc.Server$Listener.run(Server.java:296)

  Locked ownable synchronizers:
   - None

IPC Server Responder daemon prio=10 tid=0x4da22800 nid=0xe85 runnable 
[0x4db75000..0x4db75e30]

  java.lang.Thread.State: RUNNABLE
   at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
   at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
   at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
   - locked 0x54f0 (a sun.nio.ch.Util$1)
   - locked 0x54fdce10 (a java.util.Collections$UnmodifiableSet)
   - locked 0x54fdcc18 (a sun.nio.ch.EPollSelectorImpl)
   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
   at org.apache.hadoop.ipc.Server$Responder.run(Server.java:455)

  Locked ownable synchronizers:
   - None

RMI TCP Accept-0 daemon prio=10 tid=0x4da13400 nid=0xe31 runnable 
[0x4de55000..0x4de56130]

  java.lang.Thread.State: RUNNABLE
   at java.net.PlainSocketImpl.socketAccept(Native Method)
   at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
   - locked 0x54f6dae0 (a java.net.SocksSocketImpl)
   at java.net.ServerSocket.implAccept(ServerSocket.java:453)
   at java.net.ServerSocket.accept(ServerSocket.java:421)
   at 
sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
   at 
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
   at 
sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)

   at java.lang.Thread.run(Thread.java:619)

  Locked ownable synchronizers:
   - None

-Sagar


Re: Blocks replication in downtime event

2009-04-27 Thread Stas Oskin
Thanks.

2009/4/27 Koji Noguchi knogu...@yahoo-inc.com

 http://hadoop.apache.org/core/docs/current/hdfs_design.html#Data+Disk+Failure%2C+Heartbeats+and+Re-Replication

 hope this helps.

 Koji




Re: How to set System property for my job

2009-04-27 Thread mlimotte

I think what you want is the "Task Execution & Environment" section in
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html .  Here is a
sample from that document:

<property>
  <name>mapred.child.java.opts</name>
  <value>
     -Xmx512M -Djava.library.path=/home/mycompany/lib -verbose:gc
     -Xloggc:/tmp/@tas...@.gc
     -Dcom.sun.management.jmxremote.authenticate=false
     -Dcom.sun.management.jmxremote.ssl=false
  </value>
</property>
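
The same thing can also be done from the job driver instead of the config
file -- a minimal sketch, assuming the old JobConf API and the default
child options; the property name my.prop is the one from the question. In
the task, the value then comes back via System.getProperty("my.prop").

import org.apache.hadoop.mapred.JobConf;

public class ChildJvmProps {
  // Append a -D system property to whatever child JVM options are already set.
  public static void addSystemProperty(JobConf conf, String key, String value) {
    String opts = conf.get("mapred.child.java.opts", "-Xmx200m");
    conf.set("mapred.child.java.opts", opts + " -D" + key + "=" + value);
  }
}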


-Marc


Tarandeep wrote:
 
 Hi,
 
 While submitting a job to Hadoop, how can I set system properties that are
 required by my code?
 Passing -Dmy.prop=myvalue to the hadoop job command is not going to work,
 as the hadoop command will pass this to my program as a command-line
 argument.
 
 Is there any way to achieve this ?
 
 Thanks,
 Taran
 
 
 

-- 
View this message in context: 
http://www.nabble.com/How-to-set-System-property-for-my-job-tp18896188p23264520.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Debian support for Cloudera's Distribution

2009-04-27 Thread Christophe Bisciglia
Hey Hadoop fans, just wanted to drop a quick note to let you know that
we now have debian packages for our distribution in addition to RPMs.
We will continue to support both platforms going forward.

Todd Lipcon put in many late nights for this, so next time you see
him, buy him a beer :-)

http://www.cloudera.com/hadoop-deb

Cheers,
Christophe

-- 
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training
blog: cloudera.com/blog
twitter: twitter.com/cloudera


Hadoop Training, May 15th: SF Bay Area with Online Participation Available

2009-04-27 Thread Christophe Bisciglia
OK, last announcement from me today :-)

We're hosting a training session in the SF bay area (at the Cloudera
office) on Friday, May 15th.

We're doing two things differently:
1) We've allocated a chunk of discounted early bird registrations -
first come first serve until May 1st, at which point, only regular
registration is available.
2) We're enabling people from outside the bay area to attend through
some pretty impressive web based video remote presence software we've
been piloting - all you need is a browser with flash. If you have a
webcam and mic, all the better. We're working with a startup on this,
and we're really impressed with the technology. Since this is new for
us, we've discounted web based participation significantly for this
session.

registration: http://cloudera.eventbrite.com/

Cheers,
Christophe

-- 
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training
blog: cloudera.com/blog
twitter: twitter.com/cloudera