Increase node-mappers capacity in single node

2011-05-27 Thread Mark question
Hi,

  I tried changing mapreduce.job.maps to more than 2, but since I'm
running in pseudo-distributed mode, the JobTracker is local and hence this
setting does not take effect.

  I'm running on a 12-core machine and would like to make use of that... Is
there a way to trick Hadoop?

I also tried using my virtual machine name instead of localhost, but no
luck.

Please help,
Thanks,
Mark


Can not access hadoop cluster from outside

2011-05-27 Thread Jeff Zhang
Hi all,

I've met a weird problem: I cannot access the Hadoop cluster from outside. I
have a client machine, and I can telnet to the namenode's port 9000 from this
client machine, but I cannot access the namenode through the command  hadoop fs
10.249.68.39:9000 -ls /

It tells me "Bad connection to FS. command aborted. exception: Call to
/10.249.68.39:9000 failed on local exception: java.io.IOException: Connection
reset by peer"

Has anyone met this problem before? I guess it may be some network
configuration problem, but I'm not sure what's wrong. Thanks



-- 
Best Regards

Jeff Zhang


Re: Can not access hadoop cluster from outside

2011-05-27 Thread Harsh J
What is your ${fs.default.name} set to?
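
If it is set to a localhost address, remote clients can run into exactly this
kind of connection problem; for external access it is usually set to the
NameNode's routable hostname or IP, and clients use the same URI. A minimal
core-site.xml sketch, using the address from the command above (port and
address are whatever the NameNode should actually listen on):

<property>
  <name>fs.default.name</name>
  <value>hdfs://10.249.68.39:9000</value>
</property>

Clients would then reach the filesystem with the same URI, e.g.
hadoop fs -ls hdfs://10.249.68.39:9000/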

On Fri, May 27, 2011 at 12:29 PM, Jeff Zhang zjf...@gmail.com wrote:
 Hi all,

 I meet a wried problem that I can not access hadoop cluster from outside. I
 have a client machine, and I can telnet namenode's port 9000 in this client
 machine , but I can not access the namenode through command  hadoop fs
 10.249.68.39:9000 -ls /

 It tells me Bad connection to FS. command aborted. exception: Call to /
 10.249.68.39:9000 failed on local exception: java.io.IOException: Connection
 reset by peer

 Has anyone meet this problem before ? I guess maybe some network
 configuration problem, but not sure what's wrong. Thanks



 --
 Best Regards

 Jeff Zhang




-- 
Harsh J


Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Konstantin Boudnik
On Thu, May 26, 2011 at 07:01PM, Xu, Richard  wrote:
 2011-05-26 12:30:29,175 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000, call addBlock(/tmp/hadoop-cfadm/mapred/system/jobtracker.info, DFSClient_2146408809) from 169.193.181.212:55334: error: java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
 java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

Is your DFS up and running, by any chance?
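
A quick way to check, assuming a standard 0.20-era install:

jps                      # should list NameNode and DataNode processes
hadoop dfsadmin -report  # prints configured capacity and the number of live datanodes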

Cos


Re: java.lang.NoClassDefFoundError: com.sun.security.auth.UnixPrincipal

2011-05-27 Thread Steve Loughran

On 05/26/2011 07:45 PM, subhransu wrote:

Hello Geeks,
  I am a newbie to Hadoop and I currently have hadoop-0.20.203.0 installed.
I am running the sample programs that are part of this package but am getting this error.

Any pointer to fix this?

~/Hadoop/hadoop-0.20.203.0 788  bin/hadoop jar
hadoop-examples-0.20.203.0.jar sort
java.lang.NoClassDefFoundError: com.sun.security.auth.UnixPrincipal
  at
org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:246)
  at java.lang.J9VMInternals.initializeImpl(Native Method)
  at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
  at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
  at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:437)
  at org.apache.hadoop.examples.Sort.run(Sort.java:82)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  at org.apache.hadoop.examples.Sort.main(Sort.java:187)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at


you're running the IBM JVM.
https://issues.apache.org/jira/browse/HADOOP-7211

Go to the IBM web site and download their slightly-modified version of 
Hadoop that works with their JVM, or switch to the Sun JVM, which is the 
only one that Hadoop is rigorously tested on. Sorry.


-steve




RE: Re:Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Xu, Richard
That setting is 3.

From: DAN [mailto:chaidong...@163.com]
Sent: Thursday, May 26, 2011 10:23 PM
To: common-user@hadoop.apache.org; Xu, Richard [ICG-IT]
Subject: Re:Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 
cluster

Hi, Richard

Pay attention to "Not able to place enough replicas, still in need of 1". Please
confirm the right setting of dfs.replication in hdfs-site.xml.

Good luck!
Dan
--


At 2011-05-27 08:01:37, Xu, Richard richard...@citi.com wrote:



Hi Folks,



We are trying to get HBase and Hadoop running on a cluster; take 2 Solaris servers for
now.



Because of the incompatibility issue between hbase and hadoop, we have to 
stick with hadoop 0.20.2-append release.



It is very straightforward to get hadoop-0.20.203 running, but we have been stuck for
several days with hadoop-0.20.2, even with the official release, not the append
version.



1. Once we try to run start-mapred.sh (hadoop-daemon.sh --config $HADOOP_CONF_DIR
start jobtracker), the following errors are shown in the namenode and jobtracker logs:



2011-05-26 12:30:29,169 WARN 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough 
replicas, still in need of 1

2011-05-26 12:30:29,175 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000, call addBlock(/tmp/hadoop-cfadm/mapred/system/jobtracker.info, DFSClient_2146408809) from 169.193.181.212:55334: error: java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)

at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:396)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)





2. Also, Configured Capacity is 0, and we cannot put any file into HDFS.



3. On the datanode server there are no errors in the logs, but the tasktracker log has the
following suspicious entries:

2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting

2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server listener 
on 41904: starting

2011-05-25 23:36:10,852 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
0 on 41904: starting

2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
1 on 41904: starting

2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
2 on 41904: starting

2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
3 on 41904: starting

.

2011-05-25 23:36:10,855 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
63 on 41904: starting

2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker 
up at: localhost/127.0.0.1:41904

2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker: Starting 
tracker tracker_loanps3d:localhost/127.0.0.1:41904





I have tried all the suggestions found so far, including

 1) removing the hadoop-name and hadoop-data folders and reformatting the namenode;

 2) cleaning up all temp files/folders under /tmp;



But nothing works.



Your help is greatly appreciated.



Thanks,



RX



Re: Re:Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Simon
First you need to make sure that your DFS daemons are running.
You can start your namenode and datanode separately on the master and slave
nodes, and see what happens with the following commands:

hadoop namenode
hadoop datanode

The chances are that your datanode cannot be started correctly.
Let us know the errors from your logs if there are any.
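
The datanode log is usually the first place to look; assuming the default log
directory, something like:

tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log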

HTH~

Thanks
Simon

2011/5/27 Xu, Richard richard...@citi.com

 That setting is 3.

 From: DAN [mailto:chaidong...@163.com]
 Sent: Thursday, May 26, 2011 10:23 PM
 To: common-user@hadoop.apache.org; Xu, Richard [ICG-IT]
 Subject: Re:Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203
 cluster

 Hi, Richard

 Pay attention to Not able to place enough replicas, still in need of 1.
 Pls confirm right
 setting of dfs.replication in hdfs-site.xml.

 Good luck!
 Dan
 --


 At 2011-05-27 08:01:37, Xu, Richard richard...@citi.com wrote:



 Hi Folks,

 

 We try to get hbase and hadoop running on clusters, take 2 Solaris servers
 for now.

 

 Because of the incompatibility issue between hbase and hadoop, we have to
 stick with hadoop 0.20.2-append release.

 

 It is very straight forward to make hadoop-0.20.203 running, but stuck for
 several days with hadoop-0.20.2, even the official release, not the append
 version.

 

 1. Once try to run start-mapred.sh(hadoop-daemon.sh --config
 $HADOOP_CONF_DIR start jobtracker), following errors shown in namenode and
 jobtracker logs:

 

 2011-05-26 12:30:29,169 WARN
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
 enough replicas, still in need of 1

 2011-05-26 12:30:29,175 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000, call addBlock(/tmp/hadoop-cfadm/mapred/system/jobtracker.info, DFSClient_2146408809) from 169.193.181.212:55334: error: java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

 java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)

 at
 org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

 at java.lang.reflect.Method.invoke(Method.java:597)

 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)

 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)

 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)

 at java.security.AccessController.doPrivileged(Native Method)

 at javax.security.auth.Subject.doAs(Subject.java:396)

 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)

 

 

 2. Also, Configured Capacity is 0, cannot put any file to HDFS.

 

 3. in datanode server, no error in logs, but tasktracker logs has the
 following suspicious thing:

 2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server
 Responder: starting

 2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server
 listener on 41904: starting

 2011-05-25 23:36:10,852 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 0 on 41904: starting

 2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 1 on 41904: starting

 2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 2 on 41904: starting

 2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 3 on 41904: starting

 .

 2011-05-25 23:36:10,855 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 63 on 41904: starting

 2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker:
 TaskTracker up at: localhost/127.0.0.1:41904

 2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker:
 Starting tracker tracker_loanps3d:localhost/127.0.0.1:41904

 

 

 I have tried all suggestions found so far, including

  1) remove hadoop-name and hadoop-data folders and reformat namenode;

  2) clean up all temp files/folders under /tmp;

 

 But nothing works.

 

 Your help is greatly appreciated.

 

 Thanks,

 

 RX




-- 
Regards,
Simon


Error while trying to connect and use S3 with Hadoop in pseudo mode

2011-05-27 Thread Subhramanian, Deepak
I am trying to use Amazon S3 with Hadoop in pseudo-distributed mode. I am getting some
errors in the logs for the datanode, namenode, jobtracker, etc. I ran hadoop
namenode -format before starting the Hadoop services. Please help. I am able
to use Hadoop and list the directories in my S3 bucket. I am using the
Cloudera CDH3 version.

Here are the errors I am getting for the different services.

/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = ip-edited
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2-cdh3u0
STARTUP_MSG:   build =  -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14;
compiled by 'hudson' on Fri Mar 25 20:19:33 PDT 2011
/
2011-05-27 12:57:58,329 INFO
org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already
set up for Hadoop, not re-installing.
2011-05-27 12:57:58,345 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
java.lang.IllegalArgumentException: Invalid URI for NameNode address (check
fs.default.name): s3n://bucketnameedited is not of scheme 'hdfs'.
at
org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:220)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:205)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:325)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:280)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1533)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1473)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1491)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1616)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1626)

2011-05-27 12:57:58,348 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down DataNode at ip-edited

/
2011-05-27 12:58:05,472 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ipedited
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2-cdh3u0
STARTUP_MSG:   build =  -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14;
compiled by 'hudson' on Fri Mar 25 20:19:33 PDT 2011
/
2011-05-27 12:58:05,785 ERROR
org.apache.hadoop.hdfs.server.namenode.NameNode:
java.lang.IllegalArgumentException: Invalid URI for NameNode address (check
fs.default.name): s3n://bucketname is not of scheme 'hdfs'.
at
org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:220)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:260)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:461)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1208)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1217)

2011-05-27 12:58:05,786 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down NameNode at ipedited

2011-05-27 12:58:04,074 INFO org.apache.hadoop.mapred.JobTracker:
STARTUP_MSG:
/
STARTUP_MSG: Starting JobTracker
STARTUP_MSG:   host = ip-edited
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2-cdh3u0
STARTUP_MSG:   build =  -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14;
compiled by 'hudson' on Fri Mar 25 20:19:33 PDT 2011
/
2011-05-27 12:58:04,590 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
2011-05-27 12:58:04,593 INFO org.apache.hadoop.mapred.JobTracker: Scheduler
configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
2011-05-27 12:58:04,594 INFO org.apache.hadoop.util.HostsFileReader:
Refreshing hosts (include/exclude) list
2011-05-27 12:58:04,601 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Starting expired delegation token remover thread,
tokenRemoverScanInterval=60 min(s)
2011-05-27 12:58:04,602 INFO
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens
2011-05-27 12:58:04,724 INFO org.apache.hadoop.mapred.JobTracker: Starting
jobtracker with owner as mapred
2011-05-27 12:58:04,780 INFO 
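
The errors above all point at the same cause: fs.default.name is set to an
s3n:// URI, and the HDFS daemons shut down because they require an hdfs://
URI there. A hedged core-site.xml sketch that keeps HDFS as the default
filesystem while still allowing S3 access (bucket name, host, port and keys
are placeholders):

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>

S3 data is then addressed explicitly, e.g. hadoop fs -ls s3n://bucketname/path,
while paths without a scheme go to HDFS.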

Re:RE: Re:Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread DAN
Hi, Richard

You see, you have 2 Solaris servers for now, but dfs.replication is set to 3.
These don't match.
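
For a two-node cluster that would look something like the following in
hdfs-site.xml (the value is illustrative; it should not exceed the number of
datanodes actually running):

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>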

Good Luck
Dan


At 2011-05-27 19:34:10,Xu, Richard  richard...@citi.com wrote:

That setting is 3.

From: DAN [mailto:chaidong...@163.com]
Sent: Thursday, May 26, 2011 10:23 PM
To: common-user@hadoop.apache.org; Xu, Richard [ICG-IT]
Subject: Re:Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 
cluster

Hi, Richard

Pay attention to Not able to place enough replicas, still in need of 1. Pls 
confirm right
setting of dfs.replication in hdfs-site.xml.

Good luck!
Dan
--


At 2011-05-27 08:01:37, Xu, Richard richard...@citi.com wrote:



Hi Folks,



We try to get hbase and hadoop running on clusters, take 2 Solaris servers 
for now.



Because of the incompatibility issue between hbase and hadoop, we have to 
stick with hadoop 0.20.2-append release.



It is very straight forward to make hadoop-0.20.203 running, but stuck for 
several days with hadoop-0.20.2, even the official release, not the append 
version.



1. Once try to run start-mapred.sh(hadoop-daemon.sh --config $HADOOP_CONF_DIR 
start jobtracker), following errors shown in namenode and jobtracker logs:



2011-05-26 12:30:29,169 WARN 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough 
replicas, still in need of 1

2011-05-26 12:30:29,175 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000, call addBlock(/tmp/hadoop-cfadm/mapred/system/jobtracker.info, DFSClient_2146408809) from 169.193.181.212:55334: error: java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)

at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:396)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)





2. Also, Configured Capacity is 0, cannot put any file to HDFS.



3. in datanode server, no error in logs, but tasktracker logs has the 
following suspicious thing:

2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting

2011-05-25 23:36:10,839 INFO org.apache.hadoop.ipc.Server: IPC Server 
listener on 41904: starting

2011-05-25 23:36:10,852 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
0 on 41904: starting

2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
1 on 41904: starting

2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
2 on 41904: starting

2011-05-25 23:36:10,853 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
3 on 41904: starting

.

2011-05-25 23:36:10,855 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
63 on 41904: starting

2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker: 
TaskTracker up at: localhost/127.0.0.1:41904

2011-05-25 23:36:10,950 INFO org.apache.hadoop.mapred.TaskTracker: Starting 
tracker tracker_loanps3d:localhost/127.0.0.1:41904





I have tried all suggestions found so far, including

 1) remove hadoop-name and hadoop-data folders and reformat namenode;

 2) clean up all temp files/folders under /tmp;



But nothing works.



Your help is greatly appreciated.



Thanks,



RX



Re: Increase node-mappers capacity in single node

2011-05-27 Thread Harsh J
Hello Mark,

This is due to a default configuration (tasktracker slots, as we
generally call it) and is covered in the FAQ:
http://wiki.apache.org/hadoop/FAQ#I_see_a_maximum_of_2_maps.2BAC8-reduces_spawned_concurrently_on_each_TaskTracker.2C_how_do_I_increase_that.3F
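
For reference, the FAQ entry boils down to the per-TaskTracker slot settings
in mapred-site.xml on the node, followed by a TaskTracker restart. A sketch
for a 12-core box (the values are illustrative, not tuned):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>12</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>6</value>
</property>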

On Fri, May 27, 2011 at 11:56 AM, Mark question markq2...@gmail.com wrote:
 Hi,

  I tried changing mapreduce.job.maps to be more than 2 , but since I'm
 running in pseudo distributed mode, JobTracker is local and hence this
 property is not changed.

  I'm running on a 12 core machine and would like to make use of that ... Is
 there a way to trick Hadoop?

 I also tried using my virtual machine name instead of localhost, but no
 luck.

 Please help,
 Thanks,
 Mark




-- 
Harsh J


Re: web site doc link broken

2011-05-27 Thread Harsh J
I'm not sure if someone's already fixed this, but I went to the first
link and clicked Learn About, and it got redirected to current/
just fine. There's only one such link on the page as well.

On Fri, May 27, 2011 at 3:42 AM, Lee Fisher blib...@gmail.com wrote:
 The Hadoop Common home page:
 http://hadoop.apache.org/common/
 has a broken link (Learn About) to the docs. It tries to use:
 http://hadoop.apache.org/common/docs/stable/
 which doesn't exist (404). It should probably be:
 http://hadoop.apache.org/common/docs/current/
 Or, someone has deleted the stable docs, which I can't help you with. :-)
 Thanks.




-- 
Harsh J


Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Allen Wittenauer

On May 27, 2011, at 7:26 AM, DAN wrote:
 You see you have 2 Solaris servers for now, and dfs.replication is setted 
 as 3.
 These don't match.


That doesn't matter.  HDFS will basically flag any files written with a 
warning that they are under-replicated.

The problem is that the datanode processes aren't running and/or aren't 
communicating to the namenode. That's what the java.io.IOException: File 
/tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 
nodes, instead of 1 means.

It should also be pointed out that writing to /tmp (the default) is a 
bad idea.  This should get changed.
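
A sketch of moving the storage off /tmp with 0.20-era property names (the
paths are placeholders; on a fresh cluster, reformat the namenode after
changing them):

In core-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop/tmp</value>
</property>

In hdfs-site.xml:
<property>
  <name>dfs.name.dir</name>
  <value>/var/hadoop/name</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/var/hadoop/data</value>
</property>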

Also, since you are running Solaris, check the FAQ for some settings 
you'll need in order to make Hadoop's broken username detection work 
properly, amongst other things.

Using own InputSplit

2011-05-27 Thread Mohit Anchlia
I am new to Hadoop, and from what I understand, by default Hadoop splits
the input into blocks. Now this might result in a record line being split
into 2 pieces and spread across 2 maps. For example, the line
abcd might get split into ab and cd. How can one prevent this in
Hadoop and Pig? I am looking for some examples of how I
can specify my own split so that it splits logically based on the
record delimiter and not the block size. For some reason I am not able
to find the right examples online.


Re: Using own InputSplit

2011-05-27 Thread Harsh J
Mohit,

Please do not cross-post a question to multiple lists unless you're
announcing something.

What you describe does not happen; the way the splitting is done
for text files is explained in good detail here:
http://wiki.apache.org/hadoop/HadoopMapReduce

Hope this solves your doubt :)

On Fri, May 27, 2011 at 10:25 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
 I am new to hadoop and from what I understand by default hadoop splits
 the input into blocks. Now this might result in splitting a line of
 record into 2 pieces and getting spread accross 2 maps. For eg: Line
 abcd might get split into ab and cd. How can one prevent this in
 hadoop and pig? I am looking for some examples where I can see how I
 can specify my own split so that it logically splits based on the
 record delimiter and not the block size. For some reason I am not able
 to get right examples online.




-- 
Harsh J


Re: Using own InputSplit

2011-05-27 Thread Mohit Anchlia
thanks! Just thought it's better to post to multiple groups together
since I didn't know where it belongs :)

On Fri, May 27, 2011 at 10:04 AM, Harsh J ha...@cloudera.com wrote:
 Mohit,

 Please do not cross-post a question to multiple lists unless you're
 announcing something.

 What you describe, does not happen; and the way the splitting is done
 for Text files is explained in good detail here:
 http://wiki.apache.org/hadoop/HadoopMapReduce

 Hope this solves your doubt :)

 On Fri, May 27, 2011 at 10:25 PM, Mohit Anchlia mohitanch...@gmail.com 
 wrote:
 I am new to hadoop and from what I understand by default hadoop splits
 the input into blocks. Now this might result in splitting a line of
 record into 2 pieces and getting spread accross 2 maps. For eg: Line
 abcd might get split into ab and cd. How can one prevent this in
 hadoop and pig? I am looking for some examples where I can see how I
 can specify my own split so that it logically splits based on the
 record delimiter and not the block size. For some reason I am not able
 to get right examples online.




 --
 Harsh J



Re: Using own InputSplit

2011-05-27 Thread Harsh J
The query fit into mapreduce-user, since it primarily dealt with how
Map/Reduce operates over data, just to clarify :)

On Fri, May 27, 2011 at 10:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
 thanks! Just thought it's better to post to multiple groups together
 since I didn't know where it belongs :)

 On Fri, May 27, 2011 at 10:04 AM, Harsh J ha...@cloudera.com wrote:
 Mohit,

 Please do not cross-post a question to multiple lists unless you're
 announcing something.

 What you describe, does not happen; and the way the splitting is done
 for Text files is explained in good detail here:
 http://wiki.apache.org/hadoop/HadoopMapReduce

 Hope this solves your doubt :)

 On Fri, May 27, 2011 at 10:25 PM, Mohit Anchlia mohitanch...@gmail.com 
 wrote:
 I am new to hadoop and from what I understand by default hadoop splits
 the input into blocks. Now this might result in splitting a line of
 record into 2 pieces and getting spread accross 2 maps. For eg: Line
 abcd might get split into ab and cd. How can one prevent this in
 hadoop and pig? I am looking for some examples where I can see how I
 can specify my own split so that it logically splits based on the
 record delimiter and not the block size. For some reason I am not able
 to get right examples online.




 --
 Harsh J





-- 
Harsh J


Re: Using own InputSplit

2011-05-27 Thread Mohit Anchlia
Actually this link confused me

http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Job+Input

"Clearly, logical splits based on input-size is insufficient for many
applications since record boundaries must be respected. In such cases,
the application should implement a RecordReader, who is responsible
for respecting record-boundaries and presents a record-oriented view
of the logical InputSplit to the individual task."

But it looks like the application doesn't need to do that since it's done by
default? Or am I misinterpreting this entirely?

On Fri, May 27, 2011 at 10:08 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
 thanks! Just thought it's better to post to multiple groups together
 since I didn't know where it belongs :)

 On Fri, May 27, 2011 at 10:04 AM, Harsh J ha...@cloudera.com wrote:
 Mohit,

 Please do not cross-post a question to multiple lists unless you're
 announcing something.

 What you describe, does not happen; and the way the splitting is done
 for Text files is explained in good detail here:
 http://wiki.apache.org/hadoop/HadoopMapReduce

 Hope this solves your doubt :)

 On Fri, May 27, 2011 at 10:25 PM, Mohit Anchlia mohitanch...@gmail.com 
 wrote:
 I am new to hadoop and from what I understand by default hadoop splits
 the input into blocks. Now this might result in splitting a line of
 record into 2 pieces and getting spread accross 2 maps. For eg: Line
 abcd might get split into ab and cd. How can one prevent this in
 hadoop and pig? I am looking for some examples where I can see how I
 can specify my own split so that it logically splits based on the
 record delimiter and not the block size. For some reason I am not able
 to get right examples online.




 --
 Harsh J




Re: Using own InputSplit

2011-05-27 Thread Harsh J
Mohit,

On Fri, May 27, 2011 at 10:44 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
 Actually this link confused me

 http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Job+Input

 Clearly, logical splits based on input-size is insufficient for many
 applications since record boundaries must be respected. In such cases,
 the application should implement a RecordReader, who is responsible
 for respecting record-boundaries and presents a record-oriented view
 of the logical InputSplit to the individual task.

 But it looks like application doesn't need to do that since it's done
 default? Or am I misinterpreting this entirely?

For any InputFormat that Hadoop provides out of the box, it
should already handle this for you (text files (say, \n-terminated),
SequenceFiles, Avro data files). If you have a custom file format that
defines its own record delimiter character(s), you would need
to write your own InputFormat that splits properly (the wiki
still helps on how to manage the reads across the first split and the
subsequent ones).

-- 
Harsh J


How to copy over using dfs

2011-05-27 Thread Mohit Anchlia
If I have to overwrite a file I generally use

hadoop dfs -rm file
hadoop dfs -copyFromLocal or -put file

Is there a command to overwrite/replace the file instead of doing rm first?


RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Xu, Richard
Hi Allen,

Thanks a lot for your response.

I agree with you that the replication setting does not matter.

What really bothers me is that in the same environment, with the same configuration, 
hadoop 0.20.203 took us 3 minutes, while 0.20.2 has taken 3 days.

Can you please shed more light on how to make Hadoop's broken username detection 
work properly?

-Original Message-
From: Allen Wittenauer [mailto:a...@apache.org]
Sent: Friday, May 27, 2011 11:42 AM
To: common-user@hadoop.apache.org
Cc: Xu, Richard [ICG-IT]
Subject: Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 
cluster


On May 27, 2011, at 7:26 AM, DAN wrote:
 You see you have 2 Solaris servers for now, and dfs.replication is setted 
 as 3.
 These don't match.


That doesn't matter.  HDFS will basically flag any files written with a 
warning that they are under-replicated.

The problem is that the datanode processes aren't running and/or aren't 
communicating to the namenode. That's what the java.io.IOException: File 
/tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 
nodes, instead of 1 means.

It should also be pointed out that writing to /tmp (the default) is a 
bad idea.  This should get changed.

Also, since you are running Solaris, check the FAQ on some settings 
you'll need to do in order to make Hadoop's broken username detection to work 
properly, amongst other things.


RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Xu, Richard
To add more to that:

I also tried starting 0.20.2 on a Linux machine in distributed mode; same error.

I had successfully started 0.20.203 on this Linux machine with the same config.

So it seems that this is not related to Solaris.

Could it be caused by a port? I checked a few and did not find any that were blocked.



-Original Message-
From: Xu, Richard [ICG-IT]
Sent: Friday, May 27, 2011 4:18 PM
To: 'Allen Wittenauer'; 'common-user@hadoop.apache.org'
Subject: RE: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 
cluster

Hi Allen,

Thanks a lot for your response.

I agree with you that it does not matter with replication settings.

What really bothered me is same environment, same configures, hadoop 0.20.203 
takes us 3 mins, why 0.20.2 took 3 days.

Can you pls. shed more light on how to make Hadoop's broken username detection 
to work properly?

-Original Message-
From: Allen Wittenauer [mailto:a...@apache.org]
Sent: Friday, May 27, 2011 11:42 AM
To: common-user@hadoop.apache.org
Cc: Xu, Richard [ICG-IT]
Subject: Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 
cluster


On May 27, 2011, at 7:26 AM, DAN wrote:
 You see you have 2 Solaris servers for now, and dfs.replication is setted 
 as 3.
 These don't match.


That doesn't matter.  HDFS will basically flag any files written with a 
warning that they are under-replicated.

The problem is that the datanode processes aren't running and/or aren't 
communicating to the namenode. That's what the java.io.IOException: File 
/tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated to 0 
nodes, instead of 1 means.

It should also be pointed out that writing to /tmp (the default) is a 
bad idea.  This should get changed.

Also, since you are running Solaris, check the FAQ on some settings 
you'll need to do in order to make Hadoop's broken username detection to work 
properly, amongst other things.


Has anyone else seen out of memory errors at the start of combiner tasks?

2011-05-27 Thread W.P. McNeill
I have a job that uses an identity mapper and the same code for both the
combiner and the reducer.  In a small percentage of combiner tasks, after a
few seconds I get errors that look like this:

FATAL mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError:
Java heap space
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:781)
org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:524)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

Those tasks fail, though then subsequently restart and complete
successfully.  Eventually the whole job completes successfully.
 Nevertheless this happens consistently enough that it is clearly a problem
with my code rather than a transient glitch on my cluster.

From the stack it looks like the out of memory error is happening before any
of my combiner code has had a chance to run.  If I don't specify a combiner
class and run everything through reducers, there are no out of memory errors
and everything works fine.

Obviously I have a bug, but I'm wondering if anyone has seen this particular
failure mode before and has insights into why it is happening.  My
hypothesis is that I have some memory usage within the combiner/reducer code
that doesn't scale to the largest inputs my job is getting. This is a
problem for combiners and not reducers because more combiners than reducers
run on a single task tracker node. The problematic job is not the one that's
failing during initialization but one that is running at the same time on
the same node and chewing up all the memory.  Does this hypothesis sound
plausible?
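
For reference, the frame at MapTask$MapOutputBuffer.<init> is where the
map-side sort buffer (io.sort.mb) is allocated, so a heap-space failure at
that exact point is often a mismatch between io.sort.mb and the child task
heap rather than anything in the combiner itself. A hedged sketch of the two
settings usually compared (values are illustrative only):

<property>
  <name>io.sort.mb</name>
  <value>100</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>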


Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Allen Wittenauer

On May 27, 2011, at 1:18 PM, Xu, Richard wrote:

 Hi Allen,
 
 Thanks a lot for your response.
 
 I agree with you that it does not matter with replication settings.
 
 What really bothered me is same environment, same configures, hadoop 0.20.203 
 takes us 3 mins, why 0.20.2 took 3 days.
 
 Can you pls. shed more light on how to make Hadoop's broken username 
 detection to work properly?

It's in the FAQ so that I don't have to do that.

http://wiki.apache.org/hadoop/FAQ


Also, check your logs.  All your logs.  Not just the namenode log.

Re: How to copy over using dfs

2011-05-27 Thread Mark question
I don't think so, because I read somewhere that this is to ensure the safety
of the produced data. Hence Hadoop will force you to do this so that you know
exactly what is happening.

Mark

On Fri, May 27, 2011 at 12:28 PM, Mohit Anchlia mohitanch...@gmail.comwrote:

 If I have to overwrite a file I generally use

 hadoop dfs -rm file
 hadoop dfs -copyFromLocal or -put file

 Is there a command to overwrite/replace the file instead of doing rm first?



Re: web site doc link broken

2011-05-27 Thread Mark question
I also got the following from Learn About:
Not Found

The requested URL /common/docs/stable/ was not found on this server.
--
Apache/2.3.8 (Unix) mod_ssl/2.3.8 OpenSSL/1.0.0c Server at
hadoop.apache.org Port 80


Mark


On Fri, May 27, 2011 at 8:03 AM, Harsh J ha...@cloudera.com wrote:

 Am not sure if someone's already fixed this, but I head to the first
 link and click Learn About, and it gets redirected to the current/
 just fine. There's only one such link on the page as well.

 On Fri, May 27, 2011 at 3:42 AM, Lee Fisher blib...@gmail.com wrote:
  Th Hadoop Common home page:
  http://hadoop.apache.org/common/
  has a broken link (Learn About) to the docs. It tries to use:
  http://hadoop.apache.org/common/docs/stable/
  which doesn't exist (404). It should probably be:
  http://hadoop.apache.org/common/docs/current/
  Or, someone has deleted the stable docs, which I can't help you with. :-)
  Thanks.
 



 --
 Harsh J



Re: web site doc link broken

2011-05-27 Thread Harsh J
Alright, I see it's stable/ now. Weird, is my cache playing with me?

On Sat, May 28, 2011 at 5:08 AM, Mark question markq2...@gmail.com wrote:
 I also got the following from learn about :
 Not Found

 The requested URL /common/docs/stable/ was not found on this server.
 --
 Apache/2.3.8 (Unix) mod_ssl/2.3.8 OpenSSL/1.0.0c Server at
 hadoop.apache.orgPort 80


 Mark


 On Fri, May 27, 2011 at 8:03 AM, Harsh J ha...@cloudera.com wrote:

 Am not sure if someone's already fixed this, but I head to the first
 link and click Learn About, and it gets redirected to the current/
 just fine. There's only one such link on the page as well.

 On Fri, May 27, 2011 at 3:42 AM, Lee Fisher blib...@gmail.com wrote:
  Th Hadoop Common home page:
  http://hadoop.apache.org/common/
  has a broken link (Learn About) to the docs. It tries to use:
  http://hadoop.apache.org/common/docs/stable/
  which doesn't exist (404). It should probably be:
  http://hadoop.apache.org/common/docs/current/
  Or, someone has deleted the stable docs, which I can't help you with. :-)
  Thanks.
 



 --
 Harsh J





-- 
Harsh J


Re: How to copy over using dfs

2011-05-27 Thread Harsh J
Mohit,

On Sat, May 28, 2011 at 12:58 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
 If I have to overwrite a file I generally use

 hadoop dfs -rm file
 hadoop dfs -copyFromLocal or -put file

 Is there a command to overwrite/replace the file instead of doing rm first?


There's no command available right now to do this (best to write a
wrapper script acting as a command, or a custom shell utility
program?)

That said, https://issues.apache.org/jira/browse/HDFS-1608 covers the
addition of a -f/-overwrite feature, and it may be available in future
releases (possibly 0.23+).
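
A minimal wrapper along those lines (a sketch only; the script name hput is
made up, and error handling is limited to ignoring a missing target):

#!/bin/sh
# hput <local-src> <hdfs-dst>: overwrite a file in HDFS by removing any existing copy first
src=$1
dst=$2
hadoop dfs -rm "$dst" 2>/dev/null   # ignore the complaint if the target does not exist yet
hadoop dfs -put "$src" "$dst"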

-- 
Harsh J