Re: HDFS Explained as Comics

2011-12-01 Thread Dieter Plaetinck
Very clear. The comic format indeed works quite well.
I never considered comics a serious (professional) way to explain something
efficiently, but this shows people should think twice before writing their
next piece of documentation.

One question though: if a DN has a corrupted block, why does the NN only remove
the bad DN from the block's list, and not also the block from the DN's list?
(Also, does it really store the data in two separate tables? They look to me
like two different views of the same data.)

Dieter

RE: HDFS Explained as Comics

2011-12-01 Thread Ravi teja ch n v
That's indeed a great piece of work, Maneesh... Waiting for the MapReduce comic :)

Regards,
Ravi Teja

Re: HDFS Explained as Comics

2011-12-01 Thread maneesh varshney
Hi Dieter

 Very clear. The comic format indeed works quite well.
 I never considered comics a serious (professional) way to explain
 something efficiently, but this shows people should think twice before
 writing their next piece of documentation.


Thanks! :)


 One question though: if a DN has a corrupted block, why does the NN only
 remove the bad DN from the block's list, and not also the block from the DN's list?


You are right. This needs to be fixed.


 (Also, does it really store the data in two separate tables? They look to
 me like two different views of the same data.)


Actually, it's more than two tables... I have personally found the data
structures rather contrived.

In the org.apache.hadoop.hdfs.server.namenode package, the information is kept
in multiple places:
- INodeFile, which has the list of blocks for a given file
- FSNamesystem, which has a map of block -> {inode, datanodes}
- BlockInfo, which stores information in a rather strange manner:

/**
 * This array contains triplets of references.
 * For each i-th data-node the block belongs to,
 * triplets[3*i] is the reference to the DatanodeDescriptor
 * and triplets[3*i+1] and triplets[3*i+2] are references
 * to the previous and the next blocks, respectively, in the
 * list of blocks belonging to this data-node.
 */
private Object[] triplets;
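
To make that layout concrete, here is a small sketch (hypothetical helpers,
not Hadoop source) of how code would walk the triplets array. For replica i
of a block, slot 3*i holds the datanode, and the two slots after it link the
block into that datanode's own doubly-linked list of blocks:

    // Hypothetical accessors over the layout described above.
    static Object datanodeOf(Object[] triplets, int i) {
        return triplets[3 * i];         // DatanodeDescriptor of replica i
    }
    static Object prevBlockOn(Object[] triplets, int i) {
        return triplets[3 * i + 1];     // previous BlockInfo on that datanode
    }
    static Object nextBlockOn(Object[] triplets, int i) {
        return triplets[3 * i + 2];     // next BlockInfo on that datanode
    }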


HDFS Explained as Comics

2011-11-30 Thread maneesh varshney
For your reading pleasure!

PDF (3.3 MB) uploaded at the link below, since the mailing list caps
attachments at 1 MB:
https://docs.google.com/open?id=0B-zw6KHOtbT4MmRkZWJjYzEtYjI3Ni00NTFjLWE0OGItYTU5OGMxYjc0N2M1


I would appreciate it if you could spare some time to peruse this little
experiment of mine: using comics as a medium to explain computer science
topics. This particular issue explains the protocols and internals of HDFS.

I am eager to hear your opinions on the usefulness of this visual medium to
teach complex protocols and algorithms.

[My personal motivations: I have always found text descriptions to be too
verbose, as a lot of effort is spent putting the concepts in their proper
time-space context (which can easily be avoided in a visual medium); sequence
diagrams are unwieldy for non-trivial protocols, and they do not explain
concepts; and finally, animations/videos happen too fast and do not offer a
self-paced learning experience.]

All forms of criticisms, comments (and encouragements) welcome :)

Thanks
Maneesh


Re: HDFS Explained as Comics

2011-11-30 Thread Dejan Menges
Hi Maneesh,

Thanks a lot for this! I just distributed it to the team and the comments are
great :)

Best regards,
Dejan



Re: HDFS Explained as Comics

2011-11-30 Thread Prashant Kommireddi
Thanks Maneesh.

Quick question: does a client really need to know the block size and
replication factor? A lot of the time the client has no control over these
(they are set at the cluster level).

-Prashant Kommireddi


Re: HDFS Explained as Comics

2011-11-30 Thread maneesh varshney
Hi Prashant

Others may correct me if I am wrong here..

The client (org.apache.hadoop.hdfs.DFSClient) has knowledge of the block size
and replication factor. In the source code, I see the following in the
DFSClient constructor:

    defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);

    defaultReplication = (short) conf.getInt("dfs.replication", 3);

My understanding is that the client considers the following chain for the
values:
1. Manual values (the long-form constructor, when a user provides these
values)
2. Configuration file values (these are cluster-level defaults:
dfs.block.size and dfs.replication)
3. Finally, the hardcoded values (DEFAULT_BLOCK_SIZE and 3)

Moreover, in the org.apache.hadoop.hdfs.protocol.ClientProtocol the API to
create a file is
    void create(..., short replication, long blocksize);

I presume it means that the client already has knowledge of these values
and passes them to the NameNode when creating a new file.
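
As a concrete illustration of that long-form path, a client could override
both values per file through the public FileSystem API. This is only a sketch
against the 1.x-era API (the path and numbers are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CreateWithOverrides {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // loads *-site.xml defaults
            FileSystem fs = FileSystem.get(conf);
            // Long-form create(): per-file replication and block size,
            // overriding the dfs.replication / dfs.block.size defaults.
            FSDataOutputStream out = fs.create(
                new Path("/user/demo/foo.txt"),  // hypothetical path
                true,                 // overwrite if it already exists
                4096,                 // io buffer size in bytes
                (short) 2,            // replication factor for this file
                64L * 1024 * 1024);   // block size: 64 MB
            out.writeUTF("hello hdfs");
            out.close();
        }
    }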

Hope that helps.

thanks
-Maneesh


Re: HDFS Explained as Comics

2011-11-30 Thread Prashant Kommireddi
Sure, it's just a case of how readers interpret it.

   1. Client is required to specify block size and replication factor each
   time
   2. Client does not need to worry about it since an admin has set the
   properties in default configuration files

A client would not be allowed to override the default configs if they are
set final (well, there are ways to get around that too, as you suggest, by
using create() :)
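
For reference, "set final" means the admin has pinned the property in
hdfs-site.xml so that client-side configuration cannot override it; the value
below is only an example:

    <property>
      <name>dfs.replication</name>
      <value>3</value>
      <final>true</final>
    </property>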

The information is great and helpful. Just want to make sure a beginner who
wants to write a WordCount in MapReduce does not worry about specifying
block size and replication factor in his code.

Thanks,
Prashant


RE: HDFS Explained as Comics

2011-11-30 Thread GOEKE, MATTHEW (AG/1000)
Maneesh,

Firstly, I love the comic :)

Secondly, I am inclined to agree with Prashant on this latest point. While one
code path could take us through the user defining command-line overrides (e.g.
hadoop fs -D blah -put foo bar), I think it might confuse a person new to
Hadoop. The most common flow would be using admin-determined values from
hdfs-site, and the only thing that would need to change is that the
conversation happens between client / server rather than user / client.
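
For instance, a concrete form of that override (a hypothetical invocation,
spelling out the "blah") might be:

    hadoop fs -D dfs.block.size=67108864 -D dfs.replication=2 -put foo bar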

Matt


Re: HDFS Explained as Comics

2011-11-30 Thread Abhishek Pratap Singh
Hi,

This is indeed a good way to explain; most of the improvements have already
been discussed. Waiting for the sequel of this comic.

Regards,
Abhishek

On Wed, Nov 30, 2011 at 1:55 PM, maneesh varshney mvarsh...@gmail.com wrote:

 Hi Matthew

 I agree with both you and Prashant. The strip needs to be modified to
 explain that these can be default values that can be optionally overridden
 (which I will fix in the next iteration).

 However, from the 'understanding concepts of HDFS' point of view, I still
 think that block size and replication factors are the real strengths of
 HDFS, and learners must be exposed to them so that they get to see how
 HDFS is significantly different from conventional file systems.

 On a personal note: thanks for the first part of your message :)

 -Maneesh


Re: HDFS Explained as Comics

2011-11-30 Thread Alexander C.H. Lorenz
Hi all,

very cool comic!

Thanks,
 Alex
