Thanks Ted!

Ruhua
On Jan 16, 2015, at 4:08 PM, Ted Yu <[email protected]> wrote:

> Have you looked at:
> http://sourceforge.net/p/myhadoop/mailman/?source=navbar
> 
> Cheers
> 
> On Fri, Jan 16, 2015 at 12:55 PM, Ruhua Jiang <[email protected]> wrote:
> Hello,
> 
> I am quite new to Hadoop. I am trying to run Hadoop on top of an HPC 
> infrastructure using a solution called “myHadoop”. Basically, it dynamically 
> allocates some nodes from the HPC cluster and runs Hadoop on them (one node 
> as the NameNode, the others as DataNodes). If anybody is familiar with it, 
> that would be perfect, but I think my problem is mostly on the Hadoop side.
> I am using Hadoop 1.2.1 due to the limited support of myHadoop.
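> 
> For context, the job driver looks roughly like this (a minimal sketch, not my 
> exact script; the myhadoop-configure.sh options and the examples jar path are 
> assumptions from a myHadoop 0.2-style setup):
> 
>     #!/bin/bash
>     # Sketch of a myHadoop batch job; option names and paths are assumed.
>     export HADOOP_HOME=$HOME/hadoop-stack/hadoop-1.2.1
>     export HADOOP_CONF_DIR=$HOME/hadoop/conf/hadoop-conf.$PBS_JOBID
> 
>     # Generate a per-job Hadoop configuration for the allocated nodes.
>     $MY_HADOOP_HOME/bin/myhadoop-configure.sh -n 4 -c $HADOOP_CONF_DIR
> 
>     # Start the daemons, run the example job, then tear everything down.
>     $HADOOP_HOME/bin/start-all.sh
>     $HADOOP_HOME/bin/hadoop --config $HADOOP_CONF_DIR jar \
>         $HADOOP_HOME/hadoop-examples-1.2.1.jar wordcount data wordcount-output
>     $HADOOP_HOME/bin/stop-all.sh
>     $MY_HADOOP_HOME/bin/myhadoop-cleanup.sh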
> 
> Here is the log:
> ===
> myHadoop: Using HADOOP_HOME=/home/hpc-ruhua/hadoop-stack/hadoop-1.2.1
> myHadoop: Using MH_SCRATCH_DIR=/tmp/hpc-ruhua/4128
> myHadoop: Using JAVA_HOME=/usr
> myHadoop: Generating Hadoop configuration in directory in 
> /home/hpc-ruhua/hadoop/conf/hadoop-conf.4128...
> myHadoop: Using directory /home/hpc-ruhua/hadoop/hdfs for persisting HDFS 
> state...
> myHadoop: Designating cn53 as master node (namenode, secondary namenode, and 
> jobtracker)
> myHadoop: The following nodes will be slaves (datanode, tasktracer):
> cn53
> cn54
> cn55
> cn56
> Linking /home/hpc-ruhua/hadoop/hdfs/0 to /tmp/hpc-ruhua/4128/hdfs_data on cn53
> Linking /home/hpc-ruhua/hadoop/hdfs/1 to /tmp/hpc-ruhua/4128/hdfs_data on cn54
> Linking /home/hpc-ruhua/hadoop/hdfs/2 to /tmp/hpc-ruhua/4128/hdfs_data on cn55
> Warning: Permanently added 'cn55,192.168.100.55' (RSA) to the list of known 
> hosts.
> Linking /home/hpc-ruhua/hadoop/hdfs/3 to /tmp/hpc-ruhua/4128/hdfs_data on cn56
> Warning: Permanently added 'cn56,192.168.100.56' (RSA) to the list of known 
> hosts.
> starting namenode, logging to 
> /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-namenode-cn53.out
> cn53: starting datanode, logging to 
> /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn53.out
> cn54: starting datanode, logging to 
> /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn54.out
> cn55: starting datanode, logging to 
> /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn55.out
> cn56: starting datanode, logging to 
> /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-datanode-cn56.out
> cn53: starting secondarynamenode, logging to 
> /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-secondarynamenode-cn53.out
> starting jobtracker, logging to 
> /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-jobtracker-cn53.out
> cn53: starting tasktracker, logging to 
> /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn53.out
> cn56: starting tasktracker, logging to 
> /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn56.out
> cn55: starting tasktracker, logging to 
> /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn55.out
> cn54: starting tasktracker, logging to 
> /tmp/hpc-ruhua/4128/logs/hadoop-hpc-ruhua-tasktracker-cn54.out
> mkdir: cannot create directory data: File exists
> put: Target data/pg2701.txt already exists
> Found 1 items
> -rw-r--r--   3 hpc-ruhua supergroup          0 2015-01-07 00:09 
> /user/hpc-ruhua/data/pg2701.txt
> 15/01/14 12:21:08 ERROR security.UserGroupInformation: 
> PriviledgedActionException as:hpc-ruhua 
> cause:org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.mapred.JobTrackerNotYetInitializedException: JobTracker is 
> not yet RUNNING
>       at 
> org.apache.hadoop.mapred.JobTracker.checkJobTrackerState(JobTracker.java:5199)
>       at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:3543)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
> 
> org.apache.hadoop.ipc.RemoteException: 
> org.apache.hadoop.mapred.JobTrackerNotYetInitializedException: JobTracker is 
> not yet RUNNING
>       at 
> org.apache.hadoop.mapred.JobTracker.checkJobTrackerState(JobTracker.java:5199)
>       at org.apache.hadoop.mapred.JobTracker.getNewJobId(JobTracker.java:3543)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1432)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1428)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1426)
> 
>       at org.apache.hadoop.ipc.Client.call(Client.java:1113)
>       at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>       at org.apache.hadoop.mapred.$Proxy2.getNewJobId(Unknown Source)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
>       at org.apache.hadoop.mapred.$Proxy2.getNewJobId(Unknown Source)
>       at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944)
>       at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>       at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
>       at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
>       at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
>       at org.apache.hadoop.examples.WordCount.main(WordCount.java:82)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>       at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>       at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> ls: Cannot access wordcount-output: No such file or directory.
> get: null
> stopping jobtracker
> cn54: stopping tasktracker
> cn55: stopping tasktracker
> cn53: stopping tasktracker
> cn56: stopping tasktracker
> stopping namenode
> cn53: no datanode to stop
> cn54: no datanode to stop
> cn56: no datanode to stop
> cn55: no datanode to stop
> ===
> The error is “ERROR security.UserGroupInformation: PriviledgedActionException 
> as:hpc-ruhua cause:org.apache.hadoop.ipc.RemoteException: “. Does anybody 
> have an idea of what the problem might be?
> That is the result of running “$HADOOP_HOME/bin/start-all.sh”.
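> 
> (If this is just a race where the job is submitted before the JobTracker has 
> finished initializing, I suppose I could wait for it first. A minimal sketch, 
> polling with “hadoop job -list” until it succeeds; the retry count and sleep 
> interval are arbitrary:)
> 
>     # Wait until the JobTracker answers RPCs before submitting the job.
>     # Sketch: assumes "hadoop job -list" exits non-zero while it cannot
>     # reach the JobTracker.
>     for i in $(seq 1 30); do
>         if $HADOOP_HOME/bin/hadoop job -list >/dev/null 2>&1; then
>             echo "JobTracker is up"
>             break
>         fi
>         echo "JobTracker not ready yet, retrying ($i)..."
>         sleep 5
>     done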
> 
> I also tried splitting the startup into separate commands:
> 
>     $HADOOP_HOME/bin/hadoop namenode
>     $HADOOP_HOME/bin/hadoop datanode
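> 
> (Since “hadoop namenode” runs the daemon in the foreground, I may also try 
> the stock per-daemon helper scripts instead; a sketch, assuming the standard 
> Hadoop 1.x bin/ layout:)
> 
>     # Start one daemon type at a time using the stock Hadoop 1.x helpers
>     # (hadoop-daemon.sh runs locally, hadoop-daemons.sh runs on the slaves).
>     $HADOOP_HOME/bin/hadoop-daemon.sh  --config $HADOOP_CONF_DIR start namenode
>     $HADOOP_HOME/bin/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode
>     $HADOOP_HOME/bin/hadoop-daemon.sh  --config $HADOOP_CONF_DIR start jobtracker
>     $HADOOP_HOME/bin/hadoop-daemons.sh --config $HADOOP_CONF_DIR start tasktracker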
> 
> Below is the log:
> myHadoop: Using HADOOP_HOME=/home/hpc-ruhua/hadoop-stack/hadoop-1.2.1
> myHadoop: Using MH_SCRATCH_DIR=/tmp/hpc-ruhua/4178
> myHadoop: Using JAVA_HOME=/usr
> myHadoop: Generating Hadoop configuration in directory in 
> /home/hpc-ruhua/hadoop/conf/hadoop-conf.4178...
> myHadoop: Using directory /home/hpc-ruhua/hadoop/hdfs for persisting HDFS 
> state...
> myHadoop: Designating cn53 as master node (namenode, secondary namenode, and 
> jobtracker)
> myHadoop: The following nodes will be slaves (datanode, tasktracer):
> cn53
> cn54
> cn55
> cn56
> Linking /home/hpc-ruhua/hadoop/hdfs/0 to /tmp/hpc-ruhua/4178/hdfs_data on cn53
> Linking /home/hpc-ruhua/hadoop/hdfs/1 to /tmp/hpc-ruhua/4178/hdfs_data on cn54
> Linking /home/hpc-ruhua/hadoop/hdfs/2 to /tmp/hpc-ruhua/4178/hdfs_data on cn55
> Linking /home/hpc-ruhua/hadoop/hdfs/3 to /tmp/hpc-ruhua/4178/hdfs_data on cn56
> 15/01/16 15:35:14 INFO namenode.NameNode: STARTUP_MSG: 
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = cn53/192.168.100.53
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 1.2.1
> STARTUP_MSG:   build = 
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 
> 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
> STARTUP_MSG:   java = 1.7.0_71
> ************************************************************/
> 15/01/16 15:35:14 INFO impl.MetricsConfig: loaded properties from 
> hadoop-metrics2.properties
> 15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source 
> MetricsSystem,sub=Stats registered.
> 15/01/16 15:35:14 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 
> 10 second(s).
> 15/01/16 15:35:14 INFO impl.MetricsSystemImpl: NameNode metrics system started
> 15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source ugi 
> registered.
> 15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source jvm 
> registered.
> 15/01/16 15:35:14 INFO impl.MetricsSourceAdapter: MBean for source NameNode 
> registered.
> 15/01/16 15:35:14 INFO util.GSet: Computing capacity for map BlocksMap
> 15/01/16 15:35:14 INFO util.GSet: VM type       = 64-bit
> 15/01/16 15:35:14 INFO util.GSet: 2.0% max memory = 932184064
> 15/01/16 15:35:14 INFO util.GSet: capacity      = 2^21 = 2097152 entries
> 15/01/16 15:35:14 INFO util.GSet: recommended=2097152, actual=2097152
> 15/01/16 15:35:15 INFO namenode.FSNamesystem: fsOwner=hpc-ruhua
> 15/01/16 15:35:15 INFO namenode.FSNamesystem: supergroup=supergroup
> 15/01/16 15:35:15 INFO namenode.FSNamesystem: isPermissionEnabled=true
> 15/01/16 15:35:15 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
> 15/01/16 15:35:15 INFO namenode.FSNamesystem: isAccessTokenEnabled=false 
> accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
> 15/01/16 15:35:15 INFO namenode.FSNamesystem: Registered 
> FSNamesystemStateMBean and NameNodeMXBean
> 15/01/16 15:35:15 INFO namenode.FSEditLog: 
> dfs.namenode.edits.toleration.length = 0
> 15/01/16 15:35:15 INFO namenode.NameNode: Caching file names occuring more 
> than 10 times 
> 15/01/16 15:35:15 INFO common.Storage: Start loading image file 
> /tmp/hpc-ruhua/4178/namenode_data/current/fsimage
> 15/01/16 15:35:15 INFO common.Storage: Number of files = 28
> 15/01/16 15:35:15 INFO common.Storage: Number of files under construction = 1
> 15/01/16 15:35:15 INFO common.Storage: Image file 
> /tmp/hpc-ruhua/4178/namenode_data/current/fsimage of size 2996 bytes loaded 
> in 0 seconds.
> 15/01/16 15:35:15 INFO namenode.FSEditLog: Start loading edits file 
> /tmp/hpc-ruhua/4178/namenode_data/current/edits
> 15/01/16 15:35:15 INFO namenode.FSEditLog: Invalid opcode, reached end of 
> edit log Number of transactions found: 32.  Bytes read: 2579
> 15/01/16 15:35:15 INFO namenode.FSEditLog: Start checking end of edit log 
> (/tmp/hpc-ruhua/4178/namenode_data/current/edits) ...
> 15/01/16 15:35:15 INFO namenode.FSEditLog: Checked the bytes after the end of 
> edit log (/tmp/hpc-ruhua/4178/namenode_data/current/edits):
> 15/01/16 15:35:15 INFO namenode.FSEditLog:   Padding position  = 2579 (-1 
> means padding not found)
> 15/01/16 15:35:15 INFO namenode.FSEditLog:   Edit log length   = 1048580
> 15/01/16 15:35:15 INFO namenode.FSEditLog:   Read length       = 2579
> 15/01/16 15:35:15 INFO namenode.FSEditLog:   Corruption length = 0
> 15/01/16 15:35:15 INFO namenode.FSEditLog:   Toleration length = 0 (= 
> dfs.namenode.edits.toleration.length)
> 15/01/16 15:35:15 INFO namenode.FSEditLog: Summary: |---------- Read=2579 
> ----------|-- Corrupt=0 --|-- Pad=1046001 --|
> 15/01/16 15:35:15 INFO namenode.FSEditLog: Edits file 
> /tmp/hpc-ruhua/4178/namenode_data/current/edits of size 1048580 edits # 32 
> loaded in 0 seconds.
> 15/01/16 15:35:15 INFO common.Storage: Image file 
> /tmp/hpc-ruhua/4178/namenode_data/current/fsimage of size 3745 bytes saved in 
> 0 seconds.
> 15/01/16 15:35:15 INFO namenode.FSEditLog: closing edit log: position=4, 
> editlog=/tmp/hpc-ruhua/4178/namenode_data/current/edits
> 15/01/16 15:35:15 INFO namenode.FSEditLog: close success: truncate to 4, 
> editlog=/tmp/hpc-ruhua/4178/namenode_data/current/edits
> 15/01/16 15:35:16 INFO namenode.NameCache: initialized with 0 entries 0 
> lookups
> 15/01/16 15:35:16 INFO namenode.FSNamesystem: Finished loading FSImage in 
> 1162 msecs
> 15/01/16 15:35:16 INFO namenode.FSNamesystem: dfs.safemode.threshold.pct      
>     = 0.9990000128746033
> 15/01/16 15:35:16 INFO namenode.FSNamesystem: 
> dfs.namenode.safemode.min.datanodes = 0
> 15/01/16 15:35:16 INFO namenode.FSNamesystem: dfs.safemode.extension          
>     = 30000
> 15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of blocks excluded by 
> safe block count: 0 total blocks: 0 and thus the safe blocks: 0
> 15/01/16 15:35:16 INFO namenode.FSNamesystem: Total number of blocks = 0
> 15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of invalid blocks = 0
> 15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of under-replicated 
> blocks = 0
> 15/01/16 15:35:16 INFO namenode.FSNamesystem: Number of  over-replicated 
> blocks = 0
> 15/01/16 15:35:16 INFO hdfs.StateChange: STATE* Safe mode termination scan 
> for invalid, over- and under-replicated blocks completed in 7 msec
> 15/01/16 15:35:16 INFO hdfs.StateChange: STATE* Leaving safe mode after 1 secs
> 15/01/16 15:35:16 INFO hdfs.StateChange: STATE* Network topology has 0 racks 
> and 0 datanodes
> 15/01/16 15:35:16 INFO hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 
> blocks
> 15/01/16 15:35:16 INFO util.HostsFileReader: Refreshing hosts 
> (include/exclude) list
> 15/01/16 15:35:16 INFO namenode.FSNamesystem: ReplicateQueue 
> QueueProcessingStatistics: First cycle completed 0 blocks in 0 msec
> 15/01/16 15:35:16 INFO namenode.FSNamesystem: ReplicateQueue 
> QueueProcessingStatistics: Queue flush completed 0 blocks in 0 msec 
> processing time, 0 msec clock time, 1 cycles
> 15/01/16 15:35:16 INFO namenode.FSNamesystem: InvalidateQueue 
> QueueProcessingStatistics: First cycle completed 0 blocks in 0 msec
> 15/01/16 15:35:16 INFO namenode.FSNamesystem: InvalidateQueue 
> QueueProcessingStatistics: Queue flush completed 0 blocks in 0 msec 
> processing time, 0 msec clock time, 1 cycles
> 15/01/16 15:35:16 INFO impl.MetricsSourceAdapter: MBean for source 
> FSNamesystemMetrics registered.
> 15/01/16 15:35:16 INFO ipc.Server: Starting SocketReader
> 15/01/16 15:35:16 INFO impl.MetricsSourceAdapter: MBean for source 
> RpcDetailedActivityForPort54310 registered.
> 15/01/16 15:35:16 INFO impl.MetricsSourceAdapter: MBean for source 
> RpcActivityForPort54310 registered.
> 15/01/16 15:35:16 INFO namenode.NameNode: Namenode up at: 
> cn53/192.168.100.53:54310
> 15/01/16 15:35:16 INFO mortbay.log: Logging to 
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
> org.mortbay.log.Slf4jLog
> 15/01/16 15:35:16 INFO http.HttpServer: Added global filtersafety 
> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> 15/01/16 15:35:16 INFO http.HttpServer: dfs.webhdfs.enabled = false
> 15/01/16 15:35:16 INFO http.HttpServer: Port returned by 
> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the 
> listener on 50070
> 15/01/16 15:35:16 INFO http.HttpServer: listener.getLocalPort() returned 
> 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
> 15/01/16 15:35:16 INFO http.HttpServer: Jetty bound to port 50070
> 15/01/16 15:35:16 INFO mortbay.log: jetty-6.1.26
> 15/01/16 15:35:16 INFO mortbay.log: Started 
> [email protected]:50070
> 15/01/16 15:35:16 INFO namenode.NameNode: Web-server up at: 0.0.0.0:50070
> 15/01/16 15:35:16 INFO ipc.Server: IPC Server Responder: starting
> 15/01/16 15:35:16 INFO ipc.Server: IPC Server listener on 54310: starting
> 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 0 on 54310: starting
> 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 1 on 54310: starting
> 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 2 on 54310: starting
> 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 3 on 54310: starting
> 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 4 on 54310: starting
> 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 5 on 54310: starting
> 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 6 on 54310: starting
> 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 8 on 54310: starting
> 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 7 on 54310: starting
> 15/01/16 15:35:16 INFO ipc.Server: IPC Server handler 9 on 54310: starting
> 
> ===
> I can also provide the script that runs myHadoop, or other system 
> information, if that helps. I have been struggling with this problem for 
> quite a long time. Could anyone help? 
> 
> Best,
> Ruhua
> 
