[help]how to stop HDFS

2011-11-30 Thread hailong.yang1115
Actually I started to play with the latest release 0.23.0 on two nodes 
yesterday. It was easy to start HDFS, but it took me a while to configure 
YARN. I set the variable HADOOP_COMMON_HOME to where I extracted the tarball 
and HADOOP_HDFS_HOME to the local dir I pointed HDFS to. After that I could 
bring up YARN and run the benchmark. But I am facing a problem: I cannot see 
the jobs in the UI. And also, when I started the historyserver, I got the 
following error.

11/11/30 20:53:19 FATAL hs.JobHistoryServer: Error starting JobHistoryServer
java.lang.RuntimeException: java.lang.ClassNotFoundException: 
org.apache.hadoop.fs.Hdfs
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1179)
at 
org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:142)
at 
org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:233)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:315)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:313)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
at 
org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:313)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:426)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:448)
at 
org.apache.hadoop.mapreduce.v2.hs.JobHistory.init(JobHistory.java:183)
at 
org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at 
org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.init(JobHistoryServer.java:62)
at 
org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:77)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.Hdfs
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1125)
at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1177)
... 14 more
Any clue?

Hailong




***
* Hailong Yang, PhD. Candidate 
* Sino-German Joint Software Institute, 
* School of Computer Science & Engineering, Beihang University
* Phone: (86-010)82315908
* Email: hailong.yang1...@gmail.com
* Address: G413, New Main Building in Beihang University, 
*  No.37 XueYuan Road,HaiDian District, 
*  Beijing,P.R.China,100191
***

From: cat fa
Sent: 2011-11-30 10:28
To: common-user
Subject: Re: Re: [help]how to stop HDFS
In fact it's me who should say sorry. I used the word "install", which was misleading.

What I actually did was download a tar file and extract it to /usr/bin/hadoop

Could you please tell me where to point those variables?

2011/11/30, Prashant Sharma prashant.ii...@gmail.com:
 I am sorry, I had no idea you had done an RPM install; my suggestion was
 based on the assumption that you had done a tar-extract install, where all
 three distributions have to be extracted and the variables then exported.
 Also, I have no experience with RPM-based installs - so no comments about
 what went wrong in your case.

 Basically, from the error I can say that it is not able to find the jars it
 needs on the classpath, which the scripts derive from HADOOP_COMMON_HOME.
 I would say check the access permissions: which user was it installed with,
 and which user is it running as?

 On Tue, Nov 29, 2011 at 10:48 PM, cat fa boost.subscrib...@gmail.com wrote:

 Thank you for your help, but I'm still a little confused.
 Suppose I installed hadoop in /usr/bin/hadoop/. Should I
 point HADOOP_COMMON_HOME to /usr/bin/hadoop? Where should I
 point HADOOP_HDFS_HOME? Also to /usr/bin/hadoop/?

 2011/11/30 Prashant Sharma prashant.ii...@gmail.com

  I mean, you have to export the variables
 
  export HADOOP_CONF_DIR=/path/to/your/configdirectory.
 
  Also export HADOOP_HDFS_HOME and HADOOP_COMMON_HOME before you run your
  command. I suppose this should fix the problem.
  -P
 
  On Tue, Nov 29, 2011 at 6:23 PM, cat fa boost.subscrib...@gmail.com
  wrote:
 
   it didn't work. It gave me the Usage information.
  
   2011/11/29 hailong.yang1115 hailong.yang1...@gmail.com
  
Try $HADOOP_PREFIX_HOME/bin/hdfs namenode stop --config
  $HADOOP_CONF_DIR
and 

mapreduce matrix multiplication on hadoop

2011-11-30 Thread ChWaqas

Hi, I am trying to run the matrix multiplication example (with source code)
mentioned at the following link:

http://www.norstad.org/matrix-multiply/index.html

I have Hadoop set up in pseudo-distributed mode and I configured it using this
tutorial:

http://hadoop-tutorial.blogspot.com/2010/11/running-hadoop-in-pseudo-distributed.html?showComment=1321528406255#c3661776111033973764

When I run my jar file, I get the following error:

Identity test
11/11/30 10:37:34 INFO input.FileInputFormat: Total input paths to process :
2
11/11/30 10:37:34 INFO mapred.JobClient: Running job: job_20291041_0010
11/11/30 10:37:35 INFO mapred.JobClient:  map 0% reduce 0%
11/11/30 10:37:44 INFO mapred.JobClient:  map 100% reduce 0%
11/11/30 10:37:56 INFO mapred.JobClient:  map 100% reduce 100%
11/11/30 10:37:58 INFO mapred.JobClient: Job complete: job_20291041_0010
11/11/30 10:37:58 INFO mapred.JobClient: Counters: 17
11/11/30 10:37:58 INFO mapred.JobClient:   Job Counters
11/11/30 10:37:58 INFO mapred.JobClient: Launched reduce tasks=1
11/11/30 10:37:58 INFO mapred.JobClient: Launched map tasks=2
11/11/30 10:37:58 INFO mapred.JobClient: Data-local map tasks=2
11/11/30 10:37:58 INFO mapred.JobClient:   FileSystemCounters
11/11/30 10:37:58 INFO mapred.JobClient: FILE_BYTES_READ=114
11/11/30 10:37:58 INFO mapred.JobClient: HDFS_BYTES_READ=248
11/11/30 10:37:58 INFO mapred.JobClient: FILE_BYTES_WRITTEN=298
11/11/30 10:37:58 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=124
11/11/30 10:37:58 INFO mapred.JobClient:   Map-Reduce Framework
11/11/30 10:37:58 INFO mapred.JobClient: Reduce input groups=2
11/11/30 10:37:58 INFO mapred.JobClient: Combine output records=0
11/11/30 10:37:58 INFO mapred.JobClient: Map input records=4
11/11/30 10:37:58 INFO mapred.JobClient: Reduce shuffle bytes=60
11/11/30 10:37:58 INFO mapred.JobClient: Reduce output records=2
11/11/30 10:37:58 INFO mapred.JobClient: Spilled Records=8
11/11/30 10:37:58 INFO mapred.JobClient: Map output bytes=100
11/11/30 10:37:58 INFO mapred.JobClient: Combine input records=0
11/11/30 10:37:58 INFO mapred.JobClient: Map output records=4
11/11/30 10:37:58 INFO mapred.JobClient: Reduce input records=4
11/11/30 10:37:58 INFO input.FileInputFormat: Total input paths to process :
1
11/11/30 10:37:59 INFO mapred.JobClient: Running job: job_20291041_0011
11/11/30 10:38:00 INFO mapred.JobClient:  map 0% reduce 0%
11/11/30 10:38:09 INFO mapred.JobClient:  map 100% reduce 0%
11/11/30 10:38:21 INFO mapred.JobClient:  map 100% reduce 100%
11/11/30 10:38:23 INFO mapred.JobClient: Job complete: job_20291041_0011
11/11/30 10:38:23 INFO mapred.JobClient: Counters: 17
11/11/30 10:38:23 INFO mapred.JobClient:   Job Counters
11/11/30 10:38:23 INFO mapred.JobClient: Launched reduce tasks=1
11/11/30 10:38:23 INFO mapred.JobClient: Launched map tasks=1
11/11/30 10:38:23 INFO mapred.JobClient: Data-local map tasks=1
11/11/30 10:38:23 INFO mapred.JobClient:   FileSystemCounters
11/11/30 10:38:23 INFO mapred.JobClient: FILE_BYTES_READ=34
11/11/30 10:38:23 INFO mapred.JobClient: HDFS_BYTES_READ=124
11/11/30 10:38:23 INFO mapred.JobClient: FILE_BYTES_WRITTEN=100
11/11/30 10:38:23 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=124
11/11/30 10:38:23 INFO mapred.JobClient:   Map-Reduce Framework
11/11/30 10:38:23 INFO mapred.JobClient: Reduce input groups=2
11/11/30 10:38:23 INFO mapred.JobClient: Combine output records=2
11/11/30 10:38:23 INFO mapred.JobClient: Map input records=2
11/11/30 10:38:23 INFO mapred.JobClient: Reduce shuffle bytes=0
11/11/30 10:38:23 INFO mapred.JobClient: Reduce output records=2
11/11/30 10:38:23 INFO mapred.JobClient: Spilled Records=4
11/11/30 10:38:23 INFO mapred.JobClient: Map output bytes=24
11/11/30 10:38:23 INFO mapred.JobClient: Combine input records=2
11/11/30 10:38:23 INFO mapred.JobClient: Map output records=2
11/11/30 10:38:23 INFO mapred.JobClient: Reduce input records=2
Exception in thread "main" java.io.IOException: Cannot open filename
/tmp/MatrixMultiply/out/_logs
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
        at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at TestMatrixMultiply.fillMatrix(TestMatrixMultiply.java:62)
        at TestMatrixMultiply.readMatrix(TestMatrixMultiply.java:84)
        at 

Re: [help]how to stop HDFS

2011-11-30 Thread Steve Loughran

On 30/11/11 04:29, Nitin Khandelwal wrote:

Thanks,
I missed the sbin directory, was using the normal bin directory.
Thanks,
Nitin

On 30 November 2011 09:54, Harsh J ha...@cloudera.com wrote:


Like I wrote earlier, it's in the $HADOOP_HOME/sbin directory, not the
regular bin/ directory.

On Wed, Nov 30, 2011 at 9:52 AM, Nitin Khandelwal
nitin.khandel...@germinait.com  wrote:

I am using Hadoop 0.23.0
There is no hadoop-daemon.sh in the bin directory.



I found the 0.23 scripts hard to set up and get working:

https://issues.apache.org/jira/browse/HADOOP-7838
https://issues.apache.org/jira/browse/MAPREDUCE-3430
https://issues.apache.org/jira/browse/MAPREDUCE-3432

I'd like to see what Bigtop will offer in this area, as their test 
process will involve installing onto system images and walking through 
the scripts. The basic Hadoop tars assume your system is well configured 
and that you know how to do this - and how to debug problems.
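
For anyone else tripping over the 0.23 layout, a rough sketch of the 
per-daemon scripts as shipped under sbin/ (the paths below are placeholders; 
adjust them to wherever you extracted the tarball and to your own conf dir):

  export HADOOP_PREFIX=/usr/bin/hadoop              # where the tarball was extracted
  export HADOOP_CONF_DIR=/path/to/your/configdirectory

  # start the HDFS daemons one by one
  $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode
  $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode

  # and stop them the same way
  $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop namenode
  $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop datanode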


How Jobtracker stores Tasktrackers Information

2011-11-30 Thread mohmmadanis moulavi


 
 

Friends,

    I want to know how the Jobtracker stores information about tasktrackers
and their tasks.
    Is it stored in memory or in a file?
    If anyone knows, please let me know.

 





Thanks & Regards,

Mohmmadanis Moulavi

Student,
MTech (Computer Sci. & Engg.)
Walchand college of Engg. Sangli (M.S.) India

Re: How Jobtracker stores Tasktrackers Information

2011-11-30 Thread Nan Zhu
I'm not sure what the exact meaning of the tasktracker information you
mentioned is.

There is a TaskTrackerStatus class; when the system runs, the tasktracker
transmits serialized objects of this class, which contain some information,
to the jobtracker through the heartbeat.

And there is a HashMap<String, TaskTrackerStatus> in the JobTracker.

Best,

Nan

On Wed, Nov 30, 2011 at 9:26 PM, mohmmadanis moulavi 
anis_moul...@yahoo.co.in wrote:






 Friends,



 I want to know how the Jobtracker stores information about
 tasktrackers and their tasks.
 Is it stored in memory or in a file?
 If anyone knows, please let me know.







 Thanks & Regards,

 Mohmmadanis Moulavi

 Student,
 MTech (Computer Sci. & Engg.)
 Walchand college of Engg. Sangli (M.S.) India




-- 
Nan Zhu
School of Electronic, Information and Electrical Engineering,229
Shanghai Jiao Tong University
800,Dongchuan Road,Shanghai,China
E-Mail: zhunans...@gmail.com


Re: Re: [help]how to stop HDFS

2011-11-30 Thread cat fa
Thank you for your help.
I can use the /sbin/hadoop-daemon.sh {start|stop} {service} script to start a
namenode, but I can't start a resourcemanager.
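
For the YARN side, what I have been trying looks roughly like the sketch
below; this is only based on my reading of the 0.23 scripts, and the
yarn-daemon.sh wrapper may sit in bin/ or sbin/ depending on the layout:

  # resourcemanager and nodemanager have their own daemon wrapper
  yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
  yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager

  # stopping works the same way
  yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager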

2011/11/30 Harsh J ha...@cloudera.com

 I simply use the /sbin/hadoop-daemon.sh {start|stop} {service} script
 to control daemons at my end.

 Does this not work for you? Or perhaps this thread is more about
 documenting that?

 2011/11/30 Nitin Khandelwal nitin.khandel...@germinait.com:
  Hi,
 
  Even I am facing the same problem. There may be some issue with the script.
  The doc says that to start the namenode you type:
  bin/hdfs namenode start
 
  But start is not recognized. There is a hack to start the namenode with the
  command bin/hdfs namenode, but no idea how to stop it.
  If it had been an issue with the config, the latter also should not have
 worked.
 
  Thanks,
  Nitin
 
 
  2011/11/30 cat fa boost.subscrib...@gmail.com
 
  In fact it's me to say sorry. I used the word install which was
  misleading.
 
  In fact I downloaded a tar file and extracted it to /usr/bin/hadoop
 
  Could you please tell me where to point those variables?
 
  2011/11/30, Prashant Sharma prashant.ii...@gmail.com:
   I am sorry, I had no idea you have done a rpm install, my suggestion
 was
   based on the assumption that you have done a tar extract install where
  all
   three distribution have to extracted and then export variables.
   Also I have no experience with rpm based installs - so no comments
 about
   what went wrong in your case.
  
   Basically from the error i can say that it is not able to find the
 jars
   needed  on classpath which is referred by scripts through
   HADOOP_COMMON_HOME. I would say check with the access permission as in
   which user was it installed with and which user is it running with ?
  
   On Tue, Nov 29, 2011 at 10:48 PM, cat fa boost.subscrib...@gmail.com
  wrote:
  
   Thank you for your help, but I'm still a little confused.
   Suppose I installed hadoop in /usr/bin/hadoop/ .Should I
   point HADOOP_COMMON_HOME to /usr/bin/hadoop ? Where should I
   point HADOOP_HDFS_HOME? Also to /usr/bin/hadoop/ ?
  
   2011/11/30 Prashant Sharma prashant.ii...@gmail.com
  
I mean, you have to export the variables
   
export HADOOP_CONF_DIR=/path/to/your/configdirectory.
   
also export HADOOP_HDFS_HOME ,HADOOP_COMMON_HOME. before your run
 your
command. I suppose this should fix the problem.
-P
   
On Tue, Nov 29, 2011 at 6:23 PM, cat fa 
 boost.subscrib...@gmail.com
wrote:
   
 it didn't work. It gave me the Usage information.

 2011/11/29 hailong.yang1115 hailong.yang1...@gmail.com

  Try $HADOOP_PREFIX_HOME/bin/hdfs namenode stop --config
$HADOOP_CONF_DIR
  and $HADOOP_PREFIX_HOME/bin/hdfs datanode stop --config
$HADOOP_CONF_DIR.
  It would stop namenode and datanode separately.
  The HADOOP_CONF_DIR is the directory where you store your
   configuration
  files.
  Hailong
 
 
 
 
  ***
  * Hailong Yang, PhD. Candidate
  * Sino-German Joint Software Institute,
  * School of Computer Science & Engineering, Beihang University
  * Phone: (86-010)82315908
  * Email: hailong.yang1...@gmail.com
  * Address: G413, New Main Building in Beihang University,
  *  No.37 XueYuan Road,HaiDian District,
  *  Beijing,P.R.China,100191
  ***
 
  From: cat fa
  Date: 2011-11-29 20:22
  To: common-user
  Subject: Re: [help]how to stop HDFS
  use $HADOOP_CONF or $HADOOP_CONF_DIR ? I'm using hadoop 0.23.
 
  you mean which class? the class of hadoop or of java?
 
  2011/11/29 Prashant Sharma prashant.ii...@gmail.com
 
   Try making $HADOOP_CONF point to right classpath including
 your
   configuration folder.
  
  
   On Tue, Nov 29, 2011 at 3:58 PM, cat fa 
   boost.subscrib...@gmail.com

   wrote:
  
I used the command :
   
$HADOOP_PREFIX_HOME/bin/hdfs start namenode --config
$HADOOP_CONF_DIR
   
to sart HDFS.
   
This command is in Hadoop document (here

   
  
 

   
  
 
 http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html
)
   
However, I got errors as
   
Exception in thread main
  java.lang.NoClassDefFoundError:start
   
Could anyone tell me how to start and stop HDFS?
   
By the way, how to set Gmail so that it doesn't top post my
   reply?
   
  
 

   
  
  
 
 
 
 
  --
 
 
  Nitin Khandelwal
 



 --
 Harsh J



Re: mapreduce matrix multiplication on hadoop

2011-11-30 Thread J. Rottinghuis
The error is that you cannot open /tmp/MatrixMultiply/out/_logs
Does the directory exist?
Do you have proper access rights set?
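
A quick way to check, sketched with the path from your log (treat the exact
commands as a starting point only):

  # list the output directory, its contents and permissions
  hadoop fs -ls /tmp/MatrixMultiply/out

  # if the test code is choking on the _logs subdirectory the job wrote into
  # the output, one workaround people use is to remove it (or to read only
  # the part-* files) before the verification step
  hadoop fs -rmr /tmp/MatrixMultiply/out/_logs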

Joep

On Wed, Nov 30, 2011 at 3:23 AM, ChWaqas waqas...@gmail.com wrote:


 Hi I am trying to run the matrix multiplication example mentioned(with
 source
 code) on the following link:

 http://www.norstad.org/matrix-multiply/index.html

 I have hadoop setup in pseudodistributed mode and I configured it using
 this
 tutorial:


 http://hadoop-tutorial.blogspot.com/2010/11/running-hadoop-in-pseudo-distributed.html?showComment=1321528406255#c3661776111033973764

 When I run my jar file then I get the following error:

 Identity test
 11/11/30 10:37:34 INFO input.FileInputFormat: Total input paths to process
 :
 2
 11/11/30 10:37:34 INFO mapred.JobClient: Running job: job_20291041_0010
 11/11/30 10:37:35 INFO mapred.JobClient:  map 0% reduce 0%
 11/11/30 10:37:44 INFO mapred.JobClient:  map 100% reduce 0%
 11/11/30 10:37:56 INFO mapred.JobClient:  map 100% reduce 100%
 11/11/30 10:37:58 INFO mapred.JobClient: Job complete:
 job_20291041_0010
 11/11/30 10:37:58 INFO mapred.JobClient: Counters: 17
 11/11/30 10:37:58 INFO mapred.JobClient:   Job Counters
 11/11/30 10:37:58 INFO mapred.JobClient: Launched reduce tasks=1
 11/11/30 10:37:58 INFO mapred.JobClient: Launched map tasks=2
 11/11/30 10:37:58 INFO mapred.JobClient: Data-local map tasks=2
 11/11/30 10:37:58 INFO mapred.JobClient:   FileSystemCounters
 11/11/30 10:37:58 INFO mapred.JobClient: FILE_BYTES_READ=114
 11/11/30 10:37:58 INFO mapred.JobClient: HDFS_BYTES_READ=248
 11/11/30 10:37:58 INFO mapred.JobClient: FILE_BYTES_WRITTEN=298
 11/11/30 10:37:58 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=124
 11/11/30 10:37:58 INFO mapred.JobClient:   Map-Reduce Framework
 11/11/30 10:37:58 INFO mapred.JobClient: Reduce input groups=2
 11/11/30 10:37:58 INFO mapred.JobClient: Combine output records=0
 11/11/30 10:37:58 INFO mapred.JobClient: Map input records=4
 11/11/30 10:37:58 INFO mapred.JobClient: Reduce shuffle bytes=60
 11/11/30 10:37:58 INFO mapred.JobClient: Reduce output records=2
 11/11/30 10:37:58 INFO mapred.JobClient: Spilled Records=8
 11/11/30 10:37:58 INFO mapred.JobClient: Map output bytes=100
 11/11/30 10:37:58 INFO mapred.JobClient: Combine input records=0
 11/11/30 10:37:58 INFO mapred.JobClient: Map output records=4
 11/11/30 10:37:58 INFO mapred.JobClient: Reduce input records=4
 11/11/30 10:37:58 INFO input.FileInputFormat: Total input paths to process
 :
 1
 11/11/30 10:37:59 INFO mapred.JobClient: Running job: job_20291041_0011
 11/11/30 10:38:00 INFO mapred.JobClient:  map 0% reduce 0%
 11/11/30 10:38:09 INFO mapred.JobClient:  map 100% reduce 0%
 11/11/30 10:38:21 INFO mapred.JobClient:  map 100% reduce 100%
 11/11/30 10:38:23 INFO mapred.JobClient: Job complete:
 job_20291041_0011
 11/11/30 10:38:23 INFO mapred.JobClient: Counters: 17
 11/11/30 10:38:23 INFO mapred.JobClient:   Job Counters
 11/11/30 10:38:23 INFO mapred.JobClient: Launched reduce tasks=1
 11/11/30 10:38:23 INFO mapred.JobClient: Launched map tasks=1
 11/11/30 10:38:23 INFO mapred.JobClient: Data-local map tasks=1
 11/11/30 10:38:23 INFO mapred.JobClient:   FileSystemCounters
 11/11/30 10:38:23 INFO mapred.JobClient: FILE_BYTES_READ=34
 11/11/30 10:38:23 INFO mapred.JobClient: HDFS_BYTES_READ=124
 11/11/30 10:38:23 INFO mapred.JobClient: FILE_BYTES_WRITTEN=100
 11/11/30 10:38:23 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=124
 11/11/30 10:38:23 INFO mapred.JobClient:   Map-Reduce Framework
 11/11/30 10:38:23 INFO mapred.JobClient: Reduce input groups=2
 11/11/30 10:38:23 INFO mapred.JobClient: Combine output records=2
 11/11/30 10:38:23 INFO mapred.JobClient: Map input records=2
 11/11/30 10:38:23 INFO mapred.JobClient: Reduce shuffle bytes=0
 11/11/30 10:38:23 INFO mapred.JobClient: Reduce output records=2
 11/11/30 10:38:23 INFO mapred.JobClient: Spilled Records=4
 11/11/30 10:38:23 INFO mapred.JobClient: Map output bytes=24
 11/11/30 10:38:23 INFO mapred.JobClient: Combine input records=2
 11/11/30 10:38:23 INFO mapred.JobClient: Map output records=2
 11/11/30 10:38:23 INFO mapred.JobClient: Reduce input records=2
 Exception in thread "main" java.io.IOException: Cannot open filename
 /tmp/MatrixMultiply/out/_logs
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1497)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1488)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:376)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:178)
        at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
        at 

Re: [help]how to stop HDFS

2011-11-30 Thread cat fa
It seems the ClassNotFoundException is the most common problem.
Try pointing HADOOP_COMMON_HOME to HADOOP_HOME/share/hadoop/common.

On my machine it's /usr/bin/hadoop/share/hadoop/common
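
What I mean is roughly this sketch; the paths are from my machine, and the
HADOOP_HDFS_HOME line is only an assumption by analogy:

  export HADOOP_COMMON_HOME=/usr/bin/hadoop/share/hadoop/common
  export HADOOP_HDFS_HOME=/usr/bin/hadoop/share/hadoop/hdfs

  # sanity check: the hdfs jars should now show up on the classpath
  hadoop classpath | tr ':' '\n' | grep -i hdfs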

On Nov 30, 2011 at 6:50 PM, hailong.yang1115 hailong.yang1...@gmail.com wrote:

 Actually I started to play with the latest release 0.23.0 on two nodes
 yesterday. And it was easy to start the hdfs. However it took me a while to
 configure the yarn. I set the variables HADOOP_COMMON_HOME to where you
 extracted the tarball and HADOOP_HDFS_HOME to the local dir where I pointed
 the hdfs to. After that I could bring up yarn and run the benchmark. But I
 am facing a problem that I could not see the jobs in the UI. And also when
 I started the historyserver, I got the following error.

 11/11/30 20:53:19 FATAL hs.JobHistoryServer: Error starting
 JobHistoryServer
 java.lang.RuntimeException: java.lang.ClassNotFoundException:
 org.apache.hadoop.fs.Hdfs
 at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1179)
at
 org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:142)
at
 org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:233)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:315)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:313)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
at
 org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:313)
at
 org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:426)
at
 org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:448)
at
 org.apache.hadoop.mapreduce.v2.hs.JobHistory.init(JobHistory.java:183)
at
 org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at
 org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.init(JobHistoryServer.java:62)
at
 org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:77)
 Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.Hdfs
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1125)
at
 org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1177)
... 14 more
 Any clue?

 Hailong




 ***
 * Hailong Yang, PhD. Candidate
 * Sino-German Joint Software Institute,
 * School of Computer Science & Engineering, Beihang University
 * Phone: (86-010)82315908
 * Email: hailong.yang1...@gmail.com
 * Address: G413, New Main Building in Beihang University,
 *  No.37 XueYuan Road,HaiDian District,
 *  Beijing,P.R.China,100191
 ***

 From: cat fa
 Sent: 2011-11-30 10:28
 To: common-user
 Subject: Re: Re: [help]how to stop HDFS
 In fact it's me to say sorry. I used the word install which was
 misleading.

 In fact I downloaded a tar file and extracted it to /usr/bin/hadoop

 Could you please tell me where to point those variables?

 2011/11/30, Prashant Sharma prashant.ii...@gmail.com:
  I am sorry, I had no idea you have done a rpm install, my suggestion was
  based on the assumption that you have done a tar extract install where
 all
  three distribution have to extracted and then export variables.
  Also I have no experience with rpm based installs - so no comments about
  what went wrong in your case.
 
  Basically from the error i can say that it is not able to find the jars
  needed  on classpath which is referred by scripts through
  HADOOP_COMMON_HOME. I would say check with the access permission as in
  which user was it installed with and which user is it running with ?
 
  On Tue, Nov 29, 2011 at 10:48 PM, cat fa boost.subscrib...@gmail.com
 wrote:
 
  Thank you for your help, but I'm still a little confused.
  Suppose I installed hadoop in /usr/bin/hadoop/ .Should I
  point HADOOP_COMMON_HOME to /usr/bin/hadoop ? Where should I
  point HADOOP_HDFS_HOME? Also to /usr/bin/hadoop/ ?
 
  2011/11/30 Prashant Sharma prashant.ii...@gmail.com
 
   I mean, you have to export the variables
  
   export HADOOP_CONF_DIR=/path/to/your/configdirectory.
  
   also export HADOOP_HDFS_HOME ,HADOOP_COMMON_HOME. before your run your
   command. I suppose this should fix the 

HDFS Explained as Comics

2011-11-30 Thread maneesh varshney
For your reading pleasure!

PDF (3.3MB) uploaded here (the mailing list has a 1MB cap on attachments):
https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1


I would appreciate it if you could spare some time to peruse this little
experiment of mine: using comics as a medium to explain computer science
topics. This particular issue explains the protocols and internals of HDFS.

I am eager to hear your opinions on the usefulness of this visual medium for
teaching complex protocols and algorithms.

[My personal motivations: I have always found text descriptions to be too
verbose, as a lot of effort is spent putting the concepts in proper
time-space context (which can be easily avoided in a visual medium);
sequence diagrams are unwieldy for non-trivial protocols, and they do not
explain concepts; and finally, animations/videos happen too fast and do not
offer a self-paced learning experience.]

All forms of criticisms, comments (and encouragements) welcome :)

Thanks
Maneesh


Re: HDFS Explained as Comics

2011-11-30 Thread Dejan Menges
Hi Maneesh,

Thanks a lot for this! Just distributed it over the team and comments are
great :)

Best regards,
Dejan

On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney mvarsh...@gmail.com wrote:

 For your reading pleasure!

 PDF 3.3MB uploaded at (the mailing list has a cap of 1MB attachments):

 https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1


 Appreciate if you can spare some time to peruse this little experiment of
 mine to use Comics as a medium to explain computer science topics. This
 particular issue explains the protocols and internals of HDFS.

 I am eager to hear your opinions on the usefulness of this visual medium to
 teach complex protocols and algorithms.

 [My personal motivations: I have always found text descriptions to be too
 verbose as lot of effort is spent putting the concepts in proper time-space
 context (which can be easily avoided in a visual medium); sequence diagrams
 are unwieldy for non-trivial protocols, and they do not explain concepts;
 and finally, animations/videos happen too fast and do not offer
 self-paced learning experience.]

 All forms of criticisms, comments (and encouragements) welcome :)

 Thanks
 Maneesh



Re: HDFS Explained as Comics

2011-11-30 Thread Prashant Kommireddi
Thanks Maneesh.

Quick question: does a client really need to know block size and
replication factor? A lot of the time the client has no control over these
(they are set at the cluster level).

-Prashant Kommireddi

On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges dejan.men...@gmail.com wrote:

 Hi Maneesh,

 Thanks a lot for this! Just distributed it over the team and comments are
 great :)

 Best regards,
 Dejan

 On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney mvarsh...@gmail.com
 wrote:

  For your reading pleasure!
 
  PDF 3.3MB uploaded at (the mailing list has a cap of 1MB attachments):
 
 
 https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
 
 
  Appreciate if you can spare some time to peruse this little experiment of
  mine to use Comics as a medium to explain computer science topics. This
  particular issue explains the protocols and internals of HDFS.
 
  I am eager to hear your opinions on the usefulness of this visual medium
 to
  teach complex protocols and algorithms.
 
  [My personal motivations: I have always found text descriptions to be too
  verbose as lot of effort is spent putting the concepts in proper
 time-space
  context (which can be easily avoided in a visual medium); sequence
 diagrams
  are unwieldy for non-trivial protocols, and they do not explain concepts;
  and finally, animations/videos happen too fast and do not offer
  self-paced learning experience.]
 
  All forms of criticisms, comments (and encouragements) welcome :)
 
  Thanks
  Maneesh
 



Re: HDFS Explained as Comics

2011-11-30 Thread maneesh varshney
Hi Prashant

Others may correct me if I am wrong here..

The client (org.apache.hadoop.hdfs.DFSClient) has knowledge of the block size
and replication factor. In the source code, I see the following in the
DFSClient constructor:

defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);

defaultReplication = (short) conf.getInt("dfs.replication", 3);

My understanding is that the client considers the following chain for the
values:
1. Manual values (the long-form constructor; when a user provides these
values)
2. Configuration file values (these are cluster-level defaults:
dfs.block.size and dfs.replication)
3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)

Moreover, in org.apache.hadoop.hdfs.protocol.ClientProtocol the API to
create a file is
void create(..., short replication, long blocksize);

I presume it means that the client already has knowledge of these values
and passes them to the NameNode when creating a new file.
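
A quick way to see what a client actually ended up using is to stat a file
it wrote; a sketch (if I read the FsShell help right, %o prints the block
size and %r the replication; the path is just a placeholder):

  hadoop fs -stat "%o %r" /user/someuser/somefile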

Hope that helps.

thanks
-Maneesh

On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi prash1...@gmail.com wrote:

 Thanks Maneesh.

 Quick question, does a client really need to know Block size and
 replication factor - A lot of times client has no control over these (set
 at cluster level)

 -Prashant Kommireddi

 On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges dejan.men...@gmail.com
 wrote:

  Hi Maneesh,
 
  Thanks a lot for this! Just distributed it over the team and comments are
  great :)
 
  Best regards,
  Dejan
 
  On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney mvarsh...@gmail.com
  wrote:
 
   For your reading pleasure!
  
   PDF 3.3MB uploaded at (the mailing list has a cap of 1MB attachments):
  
  
 
 https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
  
  
   Appreciate if you can spare some time to peruse this little experiment
 of
   mine to use Comics as a medium to explain computer science topics. This
   particular issue explains the protocols and internals of HDFS.
  
   I am eager to hear your opinions on the usefulness of this visual
 medium
  to
   teach complex protocols and algorithms.
  
   [My personal motivations: I have always found text descriptions to be
 too
   verbose as lot of effort is spent putting the concepts in proper
  time-space
   context (which can be easily avoided in a visual medium); sequence
  diagrams
   are unwieldy for non-trivial protocols, and they do not explain
 concepts;
   and finally, animations/videos happen too fast and do not offer
   self-paced learning experience.]
  
   All forms of criticisms, comments (and encouragements) welcome :)
  
   Thanks
   Maneesh
  
 



Re: HDFS Explained as Comics

2011-11-30 Thread Prashant Kommireddi
Sure, it's just a case of how readers interpret it.

   1. Client is required to specify block size and replication factor each
   time
   2. Client does not need to worry about it since an admin has set the
   properties in default configuration files

A client would not be allowed to override the default configs if they are
set final (well, there are ways to get around that too, as you suggest, by
using create() :)

The information is great and helpful. Just want to make sure a beginner who
wants to write a WordCount in MapReduce does not worry about specifying
'block size' and 'replication factor' in his code.

Thanks,
Prashant

On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney mvarsh...@gmail.com wrote:

 Hi Prashant

 Others may correct me if I am wrong here..

 The client (org.apache.hadoop.hdfs.DFSClient) has a knowledge of block size
 and replication factor. In the source code, I see the following in the
 DFSClient constructor:

defaultBlockSize = conf.getLong(dfs.block.size, DEFAULT_BLOCK_SIZE);

defaultReplication = (short) conf.getInt(dfs.replication, 3);

 My understanding is that the client considers the following chain for the
 values:
 1. Manual values (the long form constructor; when a user provides these
 values)
 2. Configuration file values (these are cluster level defaults:
 dfs.block.size and dfs.replication)
 3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)

 Moreover, in the org.apache.hadoop.hdfs.protocool.ClientProtocol the API to
 create a file is
 void create(, short replication, long blocksize);

 I presume it means that the client already has knowledge of these values
 and passes them to the NameNode when creating a new file.

 Hope that helps.

 thanks
 -Maneesh

 On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi prash1...@gmail.com
 wrote:

  Thanks Maneesh.
 
  Quick question, does a client really need to know Block size and
  replication factor - A lot of times client has no control over these (set
  at cluster level)
 
  -Prashant Kommireddi
 
  On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges dejan.men...@gmail.com
  wrote:
 
   Hi Maneesh,
  
   Thanks a lot for this! Just distributed it over the team and comments
 are
   great :)
  
   Best regards,
   Dejan
  
   On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney mvarsh...@gmail.com
   wrote:
  
For your reading pleasure!
   
PDF 3.3MB uploaded at (the mailing list has a cap of 1MB
 attachments):
   
   
  
 
 https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
   
   
Appreciate if you can spare some time to peruse this little
 experiment
  of
mine to use Comics as a medium to explain computer science topics.
 This
particular issue explains the protocols and internals of HDFS.
   
I am eager to hear your opinions on the usefulness of this visual
  medium
   to
teach complex protocols and algorithms.
   
[My personal motivations: I have always found text descriptions to be
  too
verbose as lot of effort is spent putting the concepts in proper
   time-space
context (which can be easily avoided in a visual medium); sequence
   diagrams
are unwieldy for non-trivial protocols, and they do not explain
  concepts;
and finally, animations/videos happen too fast and do not offer
self-paced learning experience.]
   
All forms of criticisms, comments (and encouragements) welcome :)
   
Thanks
Maneesh
   
  
 



RE: HDFS Explained as Comics

2011-11-30 Thread GOEKE, MATTHEW (AG/1000)
Maneesh,

Firstly, I love the comic :)

Secondly, I am inclined to agree with Prashant on this latest point. While one 
code path could take us through the user defining command-line overrides (e.g. 
hadoop fs -D blah -put foo bar), I think it might confuse a person new to 
Hadoop. The most common flow would be using admin-determined values from 
hdfs-site, and the only thing that would need to change is that the 
conversation happens between client / server and not user / client.
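
Concretely, the two flows side by side might look like this sketch (property
names as used earlier in the thread; the sizes are just example numbers):

  # beginner flow: no block size or replication anywhere in user code,
  # the admin-set defaults from the config files apply
  hadoop fs -put foo bar

  # override flow: per-command -D overrides of the cluster defaults
  hadoop fs -D dfs.block.size=134217728 -D dfs.replication=2 -put foo bar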

Matt

-Original Message-
From: Prashant Kommireddi [mailto:prash1...@gmail.com] 
Sent: Wednesday, November 30, 2011 3:28 PM
To: common-user@hadoop.apache.org
Subject: Re: HDFS Explained as Comics

Sure, its just a case of how readers interpret it.

   1. Client is required to specify block size and replication factor each
   time
   2. Client does not need to worry about it since an admin has set the
   properties in default configuration files

A client could not be allowed to override the default configs if they are
set final (well there are ways to go around it as well as you suggest by
using create() :)

The information is great and helpful. Just want to make sure a beginner who
wants to write a WordCount in Mapreduce does not worry about specifying
block size' and replication factor in his code.

Thanks,
Prashant

On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney mvarsh...@gmail.com wrote:

 Hi Prashant

 Others may correct me if I am wrong here..

 The client (org.apache.hadoop.hdfs.DFSClient) has a knowledge of block size
 and replication factor. In the source code, I see the following in the
 DFSClient constructor:

defaultBlockSize = conf.getLong(dfs.block.size, DEFAULT_BLOCK_SIZE);

defaultReplication = (short) conf.getInt(dfs.replication, 3);

 My understanding is that the client considers the following chain for the
 values:
 1. Manual values (the long form constructor; when a user provides these
 values)
 2. Configuration file values (these are cluster level defaults:
 dfs.block.size and dfs.replication)
 3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)

 Moreover, in the org.apache.hadoop.hdfs.protocool.ClientProtocol the API to
 create a file is
 void create(, short replication, long blocksize);

 I presume it means that the client already has knowledge of these values
 and passes them to the NameNode when creating a new file.

 Hope that helps.

 thanks
 -Maneesh

 On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi prash1...@gmail.com
 wrote:

  Thanks Maneesh.
 
  Quick question, does a client really need to know Block size and
  replication factor - A lot of times client has no control over these (set
  at cluster level)
 
  -Prashant Kommireddi
 
  On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges dejan.men...@gmail.com
  wrote:
 
   Hi Maneesh,
  
   Thanks a lot for this! Just distributed it over the team and comments
 are
   great :)
  
   Best regards,
   Dejan
  
   On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney mvarsh...@gmail.com
   wrote:
  
For your reading pleasure!
   
PDF 3.3MB uploaded at (the mailing list has a cap of 1MB
 attachments):
   
   
  
 
 https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
   
   
Appreciate if you can spare some time to peruse this little
 experiment
  of
mine to use Comics as a medium to explain computer science topics.
 This
particular issue explains the protocols and internals of HDFS.
   
I am eager to hear your opinions on the usefulness of this visual
  medium
   to
teach complex protocols and algorithms.
   
[My personal motivations: I have always found text descriptions to be
  too
verbose as lot of effort is spent putting the concepts in proper
   time-space
context (which can be easily avoided in a visual medium); sequence
   diagrams
are unwieldy for non-trivial protocols, and they do not explain
  concepts;
and finally, animations/videos happen too fast and do not offer
self-paced learning experience.]
   
All forms of criticisms, comments (and encouragements) welcome :)
   
Thanks
Maneesh
   
  
 


Re: HDFS Explained as Comics

2011-11-30 Thread Abhishek Pratap Singh
Hi,

This is indeed a good way to explain; most of the improvements have already
been discussed. Waiting for the sequel of this comic.

Regards,
Abhishek

On Wed, Nov 30, 2011 at 1:55 PM, maneesh varshney mvarsh...@gmail.com wrote:

 Hi Matthew

 I agree with both you and Prashant. The strip needs to be modified to
 explain that these can be default values that can be optionally overridden
 (which I will fix in the next iteration).

 However, from the 'understanding concepts of HDFS' point of view, I still
 think that block size and replication factor are among the real strengths of
 HDFS, and learners should be exposed to them so that they get to see how
 HDFS is significantly different from conventional file systems.

 On personal note: thanks for the first part of your message :)

 -Maneesh


 On Wed, Nov 30, 2011 at 1:36 PM, GOEKE, MATTHEW (AG/1000) 
 matthew.go...@monsanto.com wrote:

  Maneesh,
 
  Firstly, I love the comic :)
 
  Secondly, I am inclined to agree with Prashant on this latest point.
 While
  one code path could take us through the user defining command line
  overrides (e.g. hadoop fs -D blah -put foo bar) I think it might confuse
 a
  person new to Hadoop. The most common flow would be using admin
 determined
  values from hdfs-site and the only thing that would need to change is
 that
  conversation happening between client / server and not user / client.
 
  Matt
 
  -Original Message-
  From: Prashant Kommireddi [mailto:prash1...@gmail.com]
  Sent: Wednesday, November 30, 2011 3:28 PM
  To: common-user@hadoop.apache.org
  Subject: Re: HDFS Explained as Comics
 
  Sure, its just a case of how readers interpret it.
 
1. Client is required to specify block size and replication factor each
time
2. Client does not need to worry about it since an admin has set the
properties in default configuration files
 
  A client could not be allowed to override the default configs if they are
  set final (well there are ways to go around it as well as you suggest by
  using create() :)
 
  The information is great and helpful. Just want to make sure a beginner
 who
  wants to write a WordCount in Mapreduce does not worry about specifying
  block size' and replication factor in his code.
 
  Thanks,
  Prashant
 
  On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney mvarsh...@gmail.com
  wrote:
 
   Hi Prashant
  
   Others may correct me if I am wrong here..
  
   The client (org.apache.hadoop.hdfs.DFSClient) has a knowledge of block
  size
   and replication factor. In the source code, I see the following in the
   DFSClient constructor:
  
  defaultBlockSize = conf.getLong(dfs.block.size,
 DEFAULT_BLOCK_SIZE);
  
  defaultReplication = (short) conf.getInt(dfs.replication, 3);
  
   My understanding is that the client considers the following chain for
 the
   values:
   1. Manual values (the long form constructor; when a user provides these
   values)
   2. Configuration file values (these are cluster level defaults:
   dfs.block.size and dfs.replication)
   3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
  
   Moreover, in the org.apache.hadoop.hdfs.protocool.ClientProtocol the
 API
  to
   create a file is
   void create(, short replication, long blocksize);
  
   I presume it means that the client already has knowledge of these
 values
   and passes them to the NameNode when creating a new file.
  
   Hope that helps.
  
   thanks
   -Maneesh
  
   On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi 
  prash1...@gmail.com
   wrote:
  
Thanks Maneesh.
   
Quick question, does a client really need to know Block size and
replication factor - A lot of times client has no control over these
  (set
at cluster level)
   
-Prashant Kommireddi
   
On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges 
 dejan.men...@gmail.com
wrote:
   
 Hi Maneesh,

 Thanks a lot for this! Just distributed it over the team and
 comments
   are
 great :)

 Best regards,
 Dejan

 On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney 
  mvarsh...@gmail.com
 wrote:

  For your reading pleasure!
 
  PDF 3.3MB uploaded at (the mailing list has a cap of 1MB
   attachments):
 
 

   
  
 
 https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
 
 
  Appreciate if you can spare some time to peruse this little
   experiment
of
  mine to use Comics as a medium to explain computer science
 topics.
   This
  particular issue explains the protocols and internals of HDFS.
 
  I am eager to hear your opinions on the usefulness of this visual
medium
 to
  teach complex protocols and algorithms.
 
  [My personal motivations: I have always found text descriptions
 to
  be
too
  verbose as lot of effort is spent putting the concepts in proper
 time-space
  context (which can be easily avoided in a visual 

Re: HDFS Explained as Comics

2011-11-30 Thread Alexander C.H. Lorenz
Hi all,

very cool comic!

Thanks,
 Alex

On Wed, Nov 30, 2011 at 11:58 PM, Abhishek Pratap Singh manu.i...@gmail.com
 wrote:

 Hi,

 This is indeed a good way to explain, most of the improvement has already
 been discussed. waiting for sequel of this comic.

 Regards,
 Abhishek

 On Wed, Nov 30, 2011 at 1:55 PM, maneesh varshney mvarsh...@gmail.com
 wrote:

  Hi Matthew
 
  I agree with both you and Prashant. The strip needs to be modified to
  explain that these can be default values that can be optionally
 overridden
  (which I will fix in the next iteration).
 
  However, from the 'understanding concepts of HDFS' point of view, I still
  think that block size and replication factors are the real strengths of
  HDFS, and the learners must be exposed to them so that they get to see
 how
  hdfs is significantly different from conventional file systems.
 
  On personal note: thanks for the first part of your message :)
 
  -Maneesh
 
 
  On Wed, Nov 30, 2011 at 1:36 PM, GOEKE, MATTHEW (AG/1000) 
  matthew.go...@monsanto.com wrote:
 
   Maneesh,
  
   Firstly, I love the comic :)
  
   Secondly, I am inclined to agree with Prashant on this latest point.
  While
   one code path could take us through the user defining command line
   overrides (e.g. hadoop fs -D blah -put foo bar) I think it might
 confuse
  a
   person new to Hadoop. The most common flow would be using admin
  determined
   values from hdfs-site and the only thing that would need to change is
  that
   conversation happening between client / server and not user / client.
  
   Matt
  
   -Original Message-
   From: Prashant Kommireddi [mailto:prash1...@gmail.com]
   Sent: Wednesday, November 30, 2011 3:28 PM
   To: common-user@hadoop.apache.org
   Subject: Re: HDFS Explained as Comics
  
   Sure, its just a case of how readers interpret it.
  
 1. Client is required to specify block size and replication factor
 each
 time
 2. Client does not need to worry about it since an admin has set the
 properties in default configuration files
  
   A client could not be allowed to override the default configs if they
 are
   set final (well there are ways to go around it as well as you suggest
 by
   using create() :)
  
   The information is great and helpful. Just want to make sure a beginner
  who
   wants to write a WordCount in Mapreduce does not worry about
 specifying
   block size' and replication factor in his code.
  
   Thanks,
   Prashant
  
   On Wed, Nov 30, 2011 at 1:18 PM, maneesh varshney mvarsh...@gmail.com
   wrote:
  
Hi Prashant
   
Others may correct me if I am wrong here..
   
The client (org.apache.hadoop.hdfs.DFSClient) has a knowledge of
 block
   size
and replication factor. In the source code, I see the following in
 the
DFSClient constructor:
   
   defaultBlockSize = conf.getLong(dfs.block.size,
  DEFAULT_BLOCK_SIZE);
   
   defaultReplication = (short) conf.getInt(dfs.replication, 3);
   
My understanding is that the client considers the following chain for
  the
values:
1. Manual values (the long form constructor; when a user provides
 these
values)
2. Configuration file values (these are cluster level defaults:
dfs.block.size and dfs.replication)
3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)
   
Moreover, in the org.apache.hadoop.hdfs.protocool.ClientProtocol the
  API
   to
create a file is
void create(, short replication, long blocksize);
   
I presume it means that the client already has knowledge of these
  values
and passes them to the NameNode when creating a new file.
   
Hope that helps.
   
thanks
-Maneesh
   
On Wed, Nov 30, 2011 at 1:04 PM, Prashant Kommireddi 
   prash1...@gmail.com
wrote:
   
 Thanks Maneesh.

 Quick question, does a client really need to know Block size and
 replication factor - A lot of times client has no control over
 these
   (set
 at cluster level)

 -Prashant Kommireddi

 On Wed, Nov 30, 2011 at 12:51 PM, Dejan Menges 
  dejan.men...@gmail.com
 wrote:

  Hi Maneesh,
 
  Thanks a lot for this! Just distributed it over the team and
  comments
are
  great :)
 
  Best regards,
  Dejan
 
  On Wed, Nov 30, 2011 at 9:28 PM, maneesh varshney 
   mvarsh...@gmail.com
  wrote:
 
   For your reading pleasure!
  
   PDF 3.3MB uploaded at (the mailing list has a cap of 1MB
attachments):
  
  
 

   
  
 
 https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1
  
  
   Appreciate if you can spare some time to peruse this little
experiment
 of
   mine to use Comics as a medium to explain computer science
  topics.
This
   particular issue explains the protocols and internals of HDFS.
  
   I am eager to hear your opinions on the usefulness of this
 visual