jobtracker always say 'tip is null'

2012-02-27 Thread Li, Yonggang
Hi All,
I am running Hadoop 0.19.1 on HP-UX and have encountered a problem. The
JobTracker always says:
Tip is null
Serious problem.  While updating status, cannot find taskid

Below is jobtrack log:
2012-02-24 19:20:41,894 INFO org.apache.hadoop.mapred.TaskInProgress: oldState is RUNNING,newState is RUNNING
2012-02-24 19:20:41,895 INFO org.apache.hadoop.mapred.JobTracker: prevStatus is 1, newStatus is 1
2012-02-24 19:20:42,259 INFO org.apache.hadoop.mapred.JobTracker: tip is org.apache.hadoop.mapred.TaskInProgress@3bf9ff
2012-02-24 19:20:42,259 INFO org.apache.hadoop.mapred.TaskInProgress: oldState is RUNNING,newState is KILLED
2012-02-24 19:20:42,259 INFO org.apache.hadoop.mapred.JobInProgress: state is KILLED
2012-02-24 19:20:42,259 INFO org.apache.hadoop.mapred.TaskInProgress: shouldFail is null
2012-02-24 19:20:42,260 INFO org.apache.hadoop.mapred.JobTracker: prevStatus is 1, newStatus is 1
2012-02-24 19:20:42,260 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_01_3' from 'tracker_psns200n:localhost/127.0.0.1:56471'
2012-02-24 19:20:42,500 INFO org.apache.hadoop.mapred.JobTracker: tip is org.apache.hadoop.mapred.TaskInProgress@a11b29
2012-02-24 19:20:42,500 INFO org.apache.hadoop.mapred.JobInProgress: state is SUCCEEDED
2012-02-24 19:20:42,500 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_20120223171354_20120224185829_0019_m_04_0' has completed task_20120223171354_20120224185829_0019_m_04 successfully.
2012-02-24 19:20:42,536 INFO org.apache.hadoop.mapred.JobTracker: Retired job with id: 'job_20120223171354_20120224160112_0006' of user: 'ecip'
2012-02-24 19:20:42,570 INFO org.apache.hadoop.mapred.JobTracker: prevStatus is 1, newStatus is 3
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_01_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_02_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_04_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_05_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_02_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_03_0' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,571 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_03_1' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,572 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_03_2' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:42,572 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_03_3' from 'tracker_psns280n:localhost/127.0.0.1:61244'
2012-02-24 19:20:43,499 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_00_0' from 'tracker_psns250n:localhost/127.0.0.1:59955'
2012-02-24 19:20:43,499 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_00_0' from 'tracker_psns250n:localhost/127.0.0.1:59955'
2012-02-24 19:20:43,500 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_04_0' from 'tracker_psns250n:localhost/127.0.0.1:59955'
2012-02-24 19:20:47,312 INFO org.apache.hadoop.mapred.JobTracker: tip is null
2012-02-24 19:20:47,313 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_m_03_0' from 'tracker_psns200n:localhost/127.0.0.1:56471'
2012-02-24 19:20:47,313 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_01_0' from 'tracker_psns200n:localhost/127.0.0.1:56471'
2012-02-24 19:20:47,313 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_01_1' from 'tracker_psns200n:localhost/127.0.0.1:56471'
2012-02-24 19:20:47,313 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_20120223171354_20120224185829_0019_r_01_2' from

RE: BZip2 Splittable?

2012-02-27 Thread Daniel Baptista
Thanks to everyone with their help on this. 

We are currently using Pig, but I don't think this is a feature we are
using at the moment; I will pass this recommendation on!

Thanks again, Dan.

-Original Message-
From: Srinivas Surasani [mailto:hivehadooplearn...@gmail.com] 
Sent: 24 February 2012 21:08
To: common-user@hadoop.apache.org
Subject: Re: BZip2 Splittable?

@Daniel,

If you want to process bz2 files in parallel (more than one mapper/reducer),
you can go with Pig.

See below.

Pig has inbuilt support for processing .bz2 files in parallel (.gz support
is coming soon). If the input file name extension is .bz2, Pig decompresses
the file on the fly and passes the decompressed input stream to your load
function.
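
A minimal sketch of checking this from Java, for reference; it assumes
Hadoop 0.21 or later (where HADOOP-4012 made BZip2Codec implement
SplittableCompressionCodec), and the input path is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.io.compress.SplittableCompressionCodec;

    public class CodecCheck {
      public static void main(String[] args) {
        // Codecs are chosen by file extension, e.g. ".bz2" -> BZip2Codec.
        CompressionCodecFactory factory =
            new CompressionCodecFactory(new Configuration());
        CompressionCodec codec = factory.getCodec(new Path("input/data.bz2"));
        if (codec == null) {
          System.out.println("No codec registered for this extension");
          return;
        }
        // A codec implementing SplittableCompressionCodec lets the input
        // format create more than one split per compressed file.
        System.out.println(codec.getClass().getSimpleName() + " splittable: "
            + (codec instanceof SplittableCompressionCodec));
      }
    }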

Regards,


On Fri, Feb 24, 2012 at 2:59 PM, Rohit ro...@hortonworks.com wrote:

 Hi Daniel,

 Because your MapReduce jobs will not split bzip2 files, each entire bzip2
 file will be processed by one Map task. Thus, if your job takes multiple
 bzip2 text files as the input, then you'll have as many Map tasks as you
 have files running in parallel.

 The Map tasks will be run by your TaskTrackers. Usually the cluster setup
 has the DataNode and the TaskTracker processes running on the same
 machines - so with 6 data nodes, you have 6 TaskTrackers.

 Hope that answers your question.


 Rohit Bakhshi



 www.hortonworks.com (http://www.hortonworks.com/)



 On Friday, February 24, 2012 at 7:59 AM, Daniel Baptista wrote:
  Hi Rohit, thanks for the response, this is pretty much as I expected and
 hopefully adds weight to my other thoughts...
 
  Could this mean that all my datanodes are being sent all of the data, or
 that only one datanode is executing the job?
 
  Thanks again , Dan.
 
  -Original Message-
  From: Rohit Bakhshi [mailto:ro...@hortonworks.com]
  Sent: 24 February 2012 15:54
  To: common-user@hadoop.apache.org (mailto:common-user@hadoop.apache.org)
  Subject: Re: BZip2 Splittable?
 
  Daniel,
 
  I just noticed your Hadoop version - 0.20.2.
 
  The JIRA fix below is for Hadoop 0.21.0, which is a different version.
 So it may not be supported on your version of Hadoop.
 
  --
  Rohit Bakhshi
  www.hortonworks.com (http://www.hortonworks.com/)
 
 
 
 
  On Friday, February 24, 2012 at 7:49 AM, Rohit Bakhshi wrote:
 
   Hi Daniel,
  
   Bzip2 compression codec allows for splittable files.
  
   According to this Hadoop JIRA improvement, splitting of bzip2
 compressed files in Hadoop jobs is supported:
   https://issues.apache.org/jira/browse/HADOOP-4012
  
   --
   Rohit Bakhshi
   www.hortonworks.com (http://www.hortonworks.com/)
  
  
  
  
   On Friday, February 24, 2012 at 7:43 AM, Daniel Baptista wrote:
  
Hi All,
   
 I have a cluster of 6 datanodes, all running hadoop version 0.20.2,
 r911707, that takes a series of bzip2-compressed text files as input.
   
I have read conflicting articles regarding whether or not hadoop can
 split these bzip2 files, can anyone give me a definite answer?
   
 Thanks in advance, Dan.
 
 
  
 




-- 
Regards,
-- Srinivas
srini...@cloudwick.com


Re: jobtracker always say 'tip is null'

2012-02-27 Thread Harsh J
Hi Yonggang,

Unfortunately you're using a very old version, so it's hard to tell
what went wrong.

Could you please try upgrading to the most recent stable release
(1.0.x)? We've not seen this issue come up in the last couple of
years, so it may have been a bug that was fixed quite some time ago.

On Mon, Feb 27, 2012 at 1:47 PM, Li, Yonggang yongga...@hp.com wrote:
 Hi All,
 I am running Hadoop 0.19.1 on HP-UX and have encountered a problem. The
 JobTracker always says:
 Tip is null
 Serious problem.  While updating status, cannot find taskid

 [quoted JobTracker log snipped]

Re: dfs.block.size

2012-02-27 Thread Mohit Anchlia
Can someone please suggest whether parameters like dfs.block.size and
mapred.tasktracker.map.tasks.maximum are only cluster-wide settings, or can
these be set per client job configuration?

On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 If I want to change the block size, can I use Configuration in the
 MapReduce job and set it when writing to the sequence file, or does it need
 to be a cluster-wide setting in the .xml files?

 Also, is there a way to check the block size of a given file?



Re: dfs.block.size

2012-02-27 Thread Joey Echeverria
dfs.block.size can be set per job.

mapred.tasktracker.map.tasks.maximum is per tasktracker.

-Joey
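
To make the per-job part concrete, here is a minimal sketch against the
0.20.x API: dfs.block.size is read from the client-side Configuration, so
files created with that Configuration get the requested block size (the
path and size below are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class BlockSizeWriter {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Client-side override; applies to files created with this conf.
        conf.setLong("dfs.block.size", 128L * 1024 * 1024); // 128 MB
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, new Path("/tmp/example.seq"),
            LongWritable.class, Text.class);
        writer.append(new LongWritable(1L), new Text("hello"));
        writer.close();
      }
    }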

On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
 Can someone please suggest if parameters like dfs.block.size,
 mapred.tasktracker.map.tasks.maximum are only cluster wide settings or can
 these be set per client job configuration?

 On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

 If I want to change the block size then can I use Configuration in
 mapreduce job and set it when writing to the sequence file or does it need
 to be cluster wide setting in .xml files?

 Also, is there a way to check the block of a given file?




-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434


Setting up Hadoop single node setup on Mac OS X

2012-02-27 Thread Sriram Ganesan
Hello All,

I am a beginner Hadoop user. I am trying to install Hadoop as part of a
single-node setup. I read in the documentation that the supported platforms
are GNU/Linux and Win32. I have a Mac OS X machine and wish to run the
single-node setup. I am guessing I need to use a virtualization solution
like VirtualBox to run Linux. If anyone has a better way of running Hadoop
on a Mac, please kindly share your experiences. If this question is not
appropriate for this mailing list, I apologize; please kindly let me know
the best mailing list to post this question.

Thanks
Sriram


Re: Setting up Hadoop single node setup on Mac OS X

2012-02-27 Thread Serge Blazhievsky
Hi 

I have detailed instructions online here:

http://hadoopway.blogspot.com/


It works on Mac and all the software is open source.

Serge

On 2/26/12 8:28 PM, Sriram Ganesan sriram.b...@gmail.com wrote:

[original message snipped]



Re: Setting up Hadoop single node setup on Mac OS X

2012-02-27 Thread W.P. McNeill
You don't need any virtualization. Mac OS X is Unix-like and runs Hadoop as is.


Re: Setting up Hadoop single node setup on Mac OS X

2012-02-27 Thread Art Ignacio
Good to know about the VirtualBox instructions.

Here are a couple of other links that might help on single node:

Single Node Setup
http://hadoop.apache.org/common/docs/stable/single_node_setup.html

Running_Hadoop_On_OS_X_10.5_64-bit_(Single-Node_Cluster)
http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_(Single-Node_Cluster)

Art Ignacio
hortonworks.com

On Mon, Feb 27, 2012 at 8:49 AM, Serge Blazhievsky 
serge.blazhiyevs...@nice.com wrote:

 Hi

 I have detailed instructions online here:

 http://hadoopway.blogspot.com/


 It works on MAC and all software is open source.

 Serge

 On 2/26/12 8:28 PM, Sriram Ganesan sriram.b...@gmail.com wrote:

 [original message snipped]




Re: Setting up Hadoop single node setup on Mac OS X

2012-02-27 Thread Jamack, Peter
You could also use VMware Fusion on a Mac... I do this when I'm creating a
distributed Hadoop cluster with a few data nodes, but for just a single
node you can install Hadoop on Mac OS X directly, no need for virtualization.

Peter J

On 2/26/12 8:28 PM, Sriram Ganesan sriram.b...@gmail.com wrote:

[original message snipped]



Re: Can't build hadoop-1.0.1 -- Break building fuse-dfs

2012-02-27 Thread Kumar Ravi

Hello,

I found a workaround for this problem:

 -- The libhdfs files were elsewhere in the build, in
$HADOOP_HOME/build/c++/Linux-amd64-64/lib/, and not in the
$HADOOP_HOME/build/libhdfs directory that the Makefile in fuse-dfs was
pointing to.

Regards,
Kumar

Kumar Ravi



   
From: Kumar Ravi/Austin/IBM@IBMUS
To: common-user@hadoop.apache.org
Date: 02/27/2012 10:22 AM
Subject: Can't build hadoop-1.0.1 -- Break building fuse-dfs

Hello,

 I am running into the following problem building hadoop-1.0.1:


-
 [exec] make[1]: Entering directory `/home/kumar/hadoop-1.0.1/src/contrib/fuse-dfs'
 [exec] make[1]: Nothing to be done for `all-am'.
 [exec] make[1]: Leaving directory `/home/kumar/hadoop-1.0.1/src/contrib/fuse-dfs'
 [exec] Making all in src
 [exec] make[1]: Entering directory `/home/kumar/hadoop-1.0.1/src/contrib/fuse-dfs/src'
 [exec] gcc -Wall -O3 -L/home/kumar/hadoop-1.0.1/build/libhdfs -lhdfs -L/lib -lfuse -L/usr/java/jdk1.6.0_27//jre/lib/amd64/server -ljvm -o fuse_dfs fuse_dfs.o fuse_options.o fuse_trash.o fuse_stat_struct.o fuse_users.o fuse_init.o fuse_connect.o fuse_impls_access.o fuse_impls_chmod.o fuse_impls_chown.o fuse_impls_create.o fuse_impls_flush.o fuse_impls_getattr.o fuse_impls_mkdir.o fuse_impls_mknod.o fuse_impls_open.o fuse_impls_read.o fuse_impls_release.o fuse_impls_readdir.o fuse_impls_rename.o fuse_impls_rmdir.o fuse_impls_statfs.o fuse_impls_symlink.o fuse_impls_truncate.o fuse_impls_utimens.o fuse_impls_unlink.o fuse_impls_write.o
 [exec] /usr/bin/ld: cannot find -lhdfs
 [exec] collect2: ld returned 1 exit status

---

The source was downloaded from
http://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.1/ using
svn, and the ant command with targets used was:

ant -Dlibhdfs=true -Dcompile.native=true -Dfusedfs=true -Dcompile.c++=true
-Dforrest.home=/apache-forrest-0.8/ compile-core-native compile-c++
compile-c++-examples task-controller tar record-parser compile-hdfs-classes
package -Djava5.home=/opt/sun/jdk1.5.0_22/


I am using Sun Java JDK 1.6.0_31 -

java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)


I would appreciate any pointers to getting past this problem.




Kumar Ravi

Re: Setting up Hadoop single node setup on Mac OS X

2012-02-27 Thread Keith Wiley
Seconded. I've set up and run Hadoop CDH3 on a recent 10.7(.2) Mac. Works
like a charm.

Sent from my phone, please excuse my brevity.
Keith Wiley, kwi...@keithwiley.com, http://keithwiley.com


Serge Blazhievsky serge.blazhiyevs...@nice.com wrote:

Hi

I have detailed instructions online here:

http://hadoopway.blogspot.com/


It works on MAC and all software is open source.

Serge

On 2/26/12 8:28 PM, Sriram Ganesan sriram.b...@gmail.com wrote:

[original message snipped]



Task Killed but no errors

2012-02-27 Thread Mohit Anchlia
I submitted a MapReduce job that had 9 tasks killed out of 139, but I
don't see any errors in the admin page. The entire job, however, has
SUCCEEDED. How can I track down the reason?

Also, how do I determine if this is something to worry about?


Re: Task Killed but no errors

2012-02-27 Thread Shi Yu

On 2/27/2012 1:55 PM, Mohit Anchlia wrote:

I submitted a map reduce job that had 9 tasks killed out of 139. But I
don't see any errors in the admin page. The entire job however has
SUCCEDED. How can I track down the reason?

Also, how do I determine if this is something to worry about?


Hi,

You should go to the data nodes and check the /logs/userlogs/ directories
there.


Though I am not very clear about one case: if you are working on an
administered cluster and you don't have access to the data nodes, how do
you check the error logs? Sometimes I have to ask the administrator to
forward me the error logs. The logs on the JobTracker and NameNode are
very limited, and the datanode info in the admin web page may be blocked
for security reasons. I haven't followed the new releases on this issue;
maybe 1.0.x has improved this?


Shi


Re: dfs.block.size

2012-02-27 Thread Mohit Anchlia
How do I verify the block size of a given file? Is there a command?

On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria j...@cloudera.com wrote:

 dfs.block.size can be set per job.

 mapred.tasktracker.map.tasks.maximum is per tasktracker.

 -Joey

 On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia mohitanch...@gmail.com
 wrote:
  Can someone please suggest if parameters like dfs.block.size,
  mapred.tasktracker.map.tasks.maximum are only cluster wide settings or
 can
  these be set per client job configuration?
 
  On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia mohitanch...@gmail.com
 wrote:
 
  If I want to change the block size then can I use Configuration in
  mapreduce job and set it when writing to the sequence file or does it
 need
  to be cluster wide setting in .xml files?
 
  Also, is there a way to check the block of a given file?
 



 --
 Joseph Echeverria
 Cloudera, Inc.
 443.305.9434



Re: dfs.block.size

2012-02-27 Thread Kai Voigt
hadoop fsck filename -blocks is what comes to mind first.

http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck has
more details.

Kai
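
If a programmatic check is preferred over fsck, a small sketch using the
plain FileSystem API also works; FileStatus.getBlockSize() reports the
block size the file was written with (args[0] is the file path):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockSize {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Asks the NameNode for the file's metadata, including block size.
        FileStatus status = fs.getFileStatus(new Path(args[0]));
        System.out.println(status.getPath() + " blockSize="
            + status.getBlockSize());
      }
    }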

On 28.02.2012 at 02:30, Mohit Anchlia wrote:

 How do I verify the block size of a given file? Is there a command?
 
 On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria j...@cloudera.com wrote:

 [quoted message snipped]
 

-- 
Kai Voigt
k...@123.org






Handling bad records

2012-02-27 Thread Mohit Anchlia
What's the best way to write bad records to a different file? I am doing
XML processing, and during processing I might come across invalid XML.
Currently I have it in a try/catch block and write the errors to log4j, but
I think it would be better to write them to an output file that contains
only the errors.


Re: Bypassing reducer

2012-02-27 Thread Serge Blazhievsky
Try setting the number of reducers to 0.
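
A minimal sketch of a map-only job on the old 0.20.x API: with zero reduce
tasks, map output goes straight through the output format and no
shuffle/sort runs. IdentityMapper and the paths here are just illustrative:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;

    public class MapOnlyJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapOnlyJob.class);
        conf.setJobName("map-only");
        conf.setNumReduceTasks(0);                  // this is what bypasses reduce
        conf.setMapperClass(IdentityMapper.class);  // mapper output is the final output
        conf.setOutputKeyClass(LongWritable.class); // TextInputFormat's key type
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }

Note that commenting out setReducerClass alone does not make a job
map-only; mapred.reduce.tasks (or setNumReduceTasks) has to be 0, otherwise
the default of one IdentityReducer still runs.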


On 2/27/12 2:34 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

Is there a way to completely bypass the reduce step? Pig is able to do it,
but it doesn't work for me in my MapReduce program even though I've
commented out setReducerClass.



RE: jobtracker always say 'tip is null'

2012-02-27 Thread Li, Yonggang
Hi Harsh,
I have tried to install Hadoop 1.0 on HP-UX but failed to run it, because
the shell syntax on HP-UX differs slightly from Linux.


Best Regards

Yonggang Li


-Original Message-
From: Harsh J [mailto:ha...@cloudera.com] 
Sent: Monday, February 27, 2012 8:01 PM
To: common-user@hadoop.apache.org
Subject: Re: jobtracker always say 'tip is null'

Hi Yonggang,

Unfortunately you're using a very old version, so it's hard to tell
what went wrong.

Could you please try upgrading to the most recent stable release
(1.0.x)? We've not seen this issue come up in the last couple of
years, so it may have been a bug that was fixed quite some time ago.

On Mon, Feb 27, 2012 at 1:47 PM, Li, Yonggang yongga...@hp.com wrote:
 Hi All,
 I am running Hadoop 0.19.1 on HP-UX and have encountered a problem. The
 JobTracker always says:
 Tip is null
 Serious problem.  While updating status, cannot find taskid

 [quoted JobTracker log snipped]

Re: Invocation exception

2012-02-27 Thread Subir S
On Tue, Feb 28, 2012 at 4:30 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

 For some reason I am getting an invocation exception and I don't see any
 more details other than this exception:

 My job is configured as:


 JobConf conf = new JobConf(FormMLProcessor.class);
 conf.addResource("hdfs-site.xml");
 conf.addResource("core-site.xml");
 conf.addResource("mapred-site.xml");
 conf.set("mapred.reduce.tasks", "0");
 conf.setJobName("mlprocessor");
 DistributedCache.addFileToClassPath(new Path("/jars/analytics.jar"), conf);
 DistributedCache.addFileToClassPath(new Path("/jars/common.jar"), conf);
 conf.setOutputKeyClass(Text.class);
 conf.setOutputValueClass(Text.class);
 conf.setMapperClass(Map.class);
 conf.setCombinerClass(Reduce.class);
 conf.setReducerClass(IdentityReducer.class);


Why would you set the Reducer when the number of reducers is set to zero?
Not sure if this is the real cause.



 conf.setInputFormat(SequenceFileAsTextInputFormat.class);
 conf.setOutputFormat(TextOutputFormat.class);
 FileInputFormat.setInputPaths(conf, new Path(args[0]));
 FileOutputFormat.setOutputPath(conf, new Path(args[1]));
 JobClient.runJob(conf);

 -

 java.lang.RuntimeException: Error in configuring object
 at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
 at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav



Re: Invocation exception

2012-02-27 Thread Mohit Anchlia
Does it matter if a reducer is set even if the number of reducers is 0? Is
there a way to get a clearer reason?

On Mon, Feb 27, 2012 at 8:23 PM, Subir S subir.sasiku...@gmail.com wrote:

 On Tue, Feb 28, 2012 at 4:30 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

 [quoted message snipped]


Re: Invocation exception

2012-02-27 Thread Mohit Anchlia
On Mon, Feb 27, 2012 at 8:58 PM, Prashant Kommireddi prash1...@gmail.com wrote:

 Tom White's Definitive Guide is a great reference. Answers to
 most of your questions can be found there.

 I've been through that book but haven't come across how to debug this
exception. Can you point me to the topic in that book where I'll find this
information?


 Sent from my iPhone

 On Feb 27, 2012, at 8:54 PM, Mohit Anchlia mohitanch...@gmail.com wrote:

  Does it matter if reducer is set even if the no of reducers is 0? Is
 there
  a way to get more clear reason?
 
  On Mon, Feb 27, 2012 at 8:23 PM, Subir S subir.sasiku...@gmail.com wrote:

  [quoted message snipped]



Re: Handling bad records

2012-02-27 Thread Harsh J
Mohit,

Use the MultipleOutputs API:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
to have a named output for bad records. There is a usage example detailed
at that link.
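
To make that concrete, here is a sketch of a mapper on the old 0.20.x API
that routes unparseable records to a named output; the output name "errors"
and the parse() helper are illustrative, not from Mohit's code:

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    public class XmlMapper extends MapReduceBase
        implements Mapper<Text, Text, Text, Text> {

      // The driver must register the named output once, e.g.:
      // MultipleOutputs.addNamedOutput(conf, "errors",
      //     TextOutputFormat.class, Text.class, Text.class);
      private MultipleOutputs mos;

      @Override
      public void configure(JobConf conf) {
        mos = new MultipleOutputs(conf);
      }

      @Override
      public void map(Text key, Text value, OutputCollector<Text, Text> out,
          Reporter reporter) throws IOException {
        try {
          out.collect(key, parse(value));   // normal output path
        } catch (Exception badXml) {
          // Bad records land in side files named errors-m-NNNNN.
          mos.getCollector("errors", reporter).collect(key, value);
        }
      }

      private Text parse(Text raw) { return raw; } // stand-in for real XML parsing

      @Override
      public void close() throws IOException {
        mos.close(); // flushes the named outputs
      }
    }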

On Tue, Feb 28, 2012 at 3:48 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
 What's the best way to write records to a different file? I am doing xml
 processing and during processing I might come accross invalid xml format.
 Current I have it under try catch block and writing to log4j. But I think
 it would be better to just write it to an output file that just contains
 errors.



-- 
Harsh J


Re: Handling bad records

2012-02-27 Thread Mohit Anchlia
Thanks, that's helpful. In that example, what are A and B referring to? Are
those the output file names?

mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));
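
For what it's worth, a reading of that javadoc snippet: "seq" is a *multi*
named output and "A"/"B" are sub-names passed at collect time; they end up
embedded in the generated part-file names (roughly seq_A-r-00000 and
seq_B-r-00000) rather than being complete file names themselves. The
driver-side registration would look something like this sketch:

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    public class MultiNamedOutputSetup {
      public static void register(JobConf conf) {
        // "seq" is the multi named output; the "A"/"B" sub-names are
        // chosen later, at getCollector() time.
        MultipleOutputs.addMultiNamedOutput(conf, "seq",
            SequenceFileOutputFormat.class, LongWritable.class, Text.class);
      }
    }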


On Mon, Feb 27, 2012 at 9:53 PM, Harsh J ha...@cloudera.com wrote:

 [quoted message snipped]



Need help on hadoop eclipse plugin

2012-02-27 Thread praveenesh kumar
Hi all,

I am trying to use the Hadoop Eclipse plugin on my Windows machine to
connect to my remote Hadoop cluster. I currently use PuTTY to log in to the
cluster, so SSH is enabled and my Windows machine can reach the Hadoop
cluster.

I am using Hadoop 0.20.205, hadoop-eclipse-plugin-0.20.205.jar, Eclipse
Helios (3.6.2), and Oracle JDK 1.7.

If I use the original eclipse-plugin jar by putting it inside my
$ECLIPSE_HOME/dropins or /plugins folder, I am able to see the Hadoop
MapReduce perspective.

But after specifying the Hadoop NameNode / JobTracker connections, I see
the following error whenever I try to access HDFS:

An internal error occurred during: Connecting to DFS lxe9700.
org/apache/commons/configuration/Configuration

'Connecting to DFS lxe9700' has encountered a problem.
An internal error occurred during Connecting to DFS.

Looking at the .log file, I see the following lines:

!MESSAGE An internal error occurred during: Connecting to DFS lxe9700.
!STACK 0
java.lang.NoClassDefFoundError:
org/apache/commons/configuration/Configuration
at
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)
at
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:34)
at
org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
at
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:196)
at
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
at
org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
at
org.apache.hadoop.security.KerberosName.<clinit>(KerberosName.java:83)
at
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:189)
at
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
at
org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
at
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:409)
at
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:395)
at
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1436)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1337)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:244)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:122)
at
org.apache.hadoop.eclipse.server.HadoopServer.getDFS(HadoopServer.java:469)
at org.apache.hadoop.eclipse.dfs.DFSPath.getDFS(DFSPath.java:146)
at
org.apache.hadoop.eclipse.dfs.DFSFolder.loadDFSFolderChildren(DFSFolder.java:61)
at org.apache.hadoop.eclipse.dfs.DFSFolder$1.run(DFSFolder.java:178)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Caused by: java.lang.ClassNotFoundException:
org.apache.commons.configuration.Configuration
at
org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:506)
at
org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:422)
at
org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:410)
at
org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 21 more

!ENTRY org.eclipse.jface 4 0 2012-01-03 02:47:50.812
!MESSAGE The command (dfs.browser.action.download) is undefined
!STACK 0
java.lang.Exception
at
org.eclipse.jface.action.ExternalActionManager$CommandCallback.isActive(ExternalActionManager.java:370)
at
org.eclipse.jface.action.ActionContributionItem.isCommandActive(ActionContributionItem.java:647)
at
org.eclipse.jface.action.ActionContributionItem.isVisible(ActionContributionItem.java:703)
at
org.eclipse.jface.action.MenuManager.isChildVisible(MenuManager.java:985)
at org.eclipse.jface.action.MenuManager.update(MenuManager.java:759)
at
org.eclipse.jface.action.MenuManager.handleAboutToShow(MenuManager.java:470)
at org.eclipse.jface.action.MenuManager.access$1(MenuManager.java:465)
at
org.eclipse.jface.action.MenuManager$2.menuShown(MenuManager.java:491)
at
org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:241)
at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1053)
at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1077)
at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1058)
at org.eclipse.swt.widgets.Control.WM_INITMENUPOPUP(Control.java:4487)
at org.eclipse.swt.widgets.Control.windowProc(Control.java:4190)
at org.eclipse.swt.widgets.Canvas.windowProc(Canvas.java:341)
at org.eclipse.swt.widgets.Decorations.windowProc(Decorations.java:1598)
at org.eclipse.swt.widgets.Shell.windowProc(Shell.java:2038)
at 

hadoop streaming : need help in using custom key value separator

2012-02-27 Thread Austin Chungath
When I use more than one reducer in Hadoop streaming with a custom
separator rather than the tab, it looks like the Hadoop shuffling process
is not happening as it should.

This is the reducer output when I am using '\t' to separate the key/value
pairs output from the mapper.

Output from reducer 1:
10321,22
23644,37
41231,42
23448,20
12325,39
71234,20
Output from reducer 2:
24123,43
33213,46
11321,29
21232,32

The above output is as expected: the first column is the key and the second
is the count. There are 10 unique keys; 6 of them are in the output of the
first reducer and the remaining 4 in the output of the second reducer.

But now I use a custom separator for the key/value pairs output from my
mapper. Here I am using '*' as the separator:
-D stream.mapred.output.field.separator=*
-D mapred.reduce.tasks=2

Output from reducer 1:
10321,5
21232,19
24123,16
33213,28
23644,21
41231,12
23448,18
11321,29
12325,24
71234,9
Output from reducer 2:
10321,17
21232,13
33213,18
23644,16
41231,30
23448,2
24123,27
12325,15
71234,11

Now both reducers are getting all the keys, and part of the values go to
reducer 1 while the rest go to reducer 2.
Why does it behave like this when I use a custom separator? Shouldn't each
reducer get a unique set of keys after the shuffle?
I am using Hadoop 0.20.205.0, and below is the command I am using to run
Hadoop streaming. Are there more options I should specify for streaming to
work properly with a custom separator?

hadoop jar $HADOOP_PREFIX/contrib/streaming/hadoop-streaming-0.20.205.0.jar \
-D stream.mapred.output.field.separator=* \
-D mapred.reduce.tasks=2 \
-mapper ./map.py \
-reducer ./reducer.py \
-file ./map.py \
-file ./reducer.py \
-input /user/inputdata \
-output /user/outputdata \
-verbose


Any help is much appreciated,
Thanks,
Austin
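
One thing worth checking against the command above: in stock streaming the
map-output separator property is stream.map.output.field.separator, with
stream.num.map.output.key.fields controlling how many fields form the key.
If the stream.mapred.* name is not recognized, the framework falls back to
splitting on tab, treats the whole "key*value" line as the key, and
partitions in exactly the way described. A hedged variant to try, with the
'*' quoted so the shell does not glob it:

    hadoop jar $HADOOP_PREFIX/contrib/streaming/hadoop-streaming-0.20.205.0.jar \
    -D stream.map.output.field.separator='*' \
    -D stream.num.map.output.key.fields=1 \
    -D mapred.reduce.tasks=2 \
    -mapper ./map.py \
    -reducer ./reducer.py \
    -file ./map.py \
    -file ./reducer.py \
    -input /user/inputdata \
    -output /user/outputdata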