DataNode/TaskTracker memory constraints.

2008-12-15 Thread Marcus Herou
Hi.

All Hadoop components are started with -Xmx1000M by default. I am
planning to throw in some data/task nodes here and there in my architecture. However,
most machines have only 4G of physical RAM, so allocating 2G + overhead (~2.5G) to
Hadoop is a little risky, since the machines could very well become inaccessible if
Hadoop has to compete with other processes for RAM. I have experienced this
many times with Java processes going haywire where I run other services in
parallel.
Anyway, I would like to understand the reasoning behind allocating 1G
per process. I figure the DataNode could survive with a little less, as
could the TaskTracker if the jobs running in it do not consume much
memory. Of course each process would like even more memory than 1G,
but if I need to cut down I would like to know which to cut and what I lose
by doing so.

Any thoughts? Trial and error is of course an option, but I would like to
hear the basic reasoning about how memory should be allocated to get the
most out of the boxes.
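
For reference, a rough sketch of where I understand those knobs to live, assuming
the standard conf/ layout (the 512 value below is just an example of a possible
cut, not something I have tested):

# conf/hadoop-env.sh -- heap for the daemons themselves (NameNode, DataNode,
# JobTracker, TaskTracker); the stock value is 1000 (MB).
export HADOOP_HEAPSIZE=512

<!-- conf/hadoop-site.xml -- heap for the child JVMs that run the actual tasks;
     the stock value is -Xmx200m. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m</value>
</property>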

Kindly

//Marcus





-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/


concurrent modification of input files

2008-12-15 Thread Sandhya E
Hi

I have a scenario where, while a map/reduce job is working on a file, the
input file may get deleted and replaced with a new version of the file.
All my files are compressed, and hence each file is worked on by a
single node. I tried simulating the scenario of deleting the file
before the map/reduce was over, and the map/reduce went ahead without
complaining. Can I always assume this will be the case?

Many thanks in adv
Sandhya


Re: The error occurred when a lot of files created use fuse-dfs

2008-12-15 Thread Brian Bockelman

Hey,

What version of Hadoop are you running?  Have you taken a look at  
HADOOP-4775?


https://issues.apache.org/jira/browse/HADOOP-4775

Basically, fuse-dfs is not usable on Hadoop 0.19.0 without a patch.

Brian

On Dec 15, 2008, at 12:24 AM, zhuweimin wrote:


Dear fuse-dfs users

I copied 1000 files from local disk into Hadoop using fuse-dfs.
The following errors appeared when around the 600th file was being copied:

cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_33.dat': Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_34.dat': Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_35.dat': Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_36.dat': Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_37.dat': Input/output error
cp: cannot create regular file `/mnt/dfs/user/hadoop/fuse3/10m/10m_38.dat': Input/output error
...

It is then necessary to remount fuse-dfs.

Do you have any thoughts about this error?

thanks






Re: Suggestions of proper usage of key parameter ?

2008-12-15 Thread Aaron Kimball
To expand a bit on Owen's remarks:

It should be pointed out that in the case of a single MapReduce pass from an
input dataset to an output dataset, the keys into the mapper and out of the
reducer may not be particularly interesting to you. However, more
complicated algorithms often involve multiple MapReduce jobs, where an input
dataset has an MR pass over it, yielding an intermediate dataset which
undergoes yet another MR pass, yielding a final dataset.

In such cases, this provides continuity of interface, where every stage has
both key and value components. Very often the dataset gets partitioned
or organized in such a way that it makes sense to stamp a key on to values
early on in the process, and continue to allow the keys to flow through the
system between passes. This interface makes that much more convenient. For
example, you may have an input record which is joined against multiple other
datasets. Each other data set join may involve a separate mapreduce pass,
but the primary key will be the same the whole time.

As for determining the input key types: The default TextInputFormat gives
you a line offset and a line of text. This offset may not be particularly
useful to your application.  On the other hand, the KeyValueTextInputFormat
will read each line of text from the input file, and split this into a key
Text object and a value Text object based on the first tab char in the line.
This matches the formatting of output files done by the default
TextOutputFormat.  Chained MRs should set the input format to this one, as
Hadoop won't know that this is your intended use case.
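
As a rough sketch with the old JobConf API (paths and job name below are
placeholders, and the identity mapper/reducer defaults are left in place to
keep it short):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;

public class SecondPass {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SecondPass.class);
    conf.setJobName("second-pass");

    // The first job's TextOutputFormat wrote "key <TAB> value" lines, so the
    // chained job has to be told to split them back apart.
    conf.setInputFormat(KeyValueTextInputFormat.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path("first-pass-output"));
    FileOutputFormat.setOutputPath(conf, new Path("second-pass-output"));

    JobClient.runJob(conf);
  }
}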

If your intermediate reducer outputs more complicated datatypes, you may
want to use SequenceFileOutputFormat, which marshals your data types into a
binary file format. The SequenceFileInputFormat will then read in the data,
and demarshal it into the same data types that you had already encoded.
(Your final pass may want to use TextOutputFormat to get everything back to
a human-readable form)
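
And, under the same assumptions, a sketch of the binary hand-off between two
chained jobs (class and path names are again placeholders):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class BinaryHandoff {
  // First pass: write (Text, LongWritable) records as a binary SequenceFile.
  static void configureFirstPassOutput(JobConf conf) {
    conf.setOutputFormat(SequenceFileOutputFormat.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
    FileOutputFormat.setOutputPath(conf, new Path("intermediate"));
  }

  // Second pass: read the same key/value types back without any text parsing.
  static void configureSecondPassInput(JobConf conf) {
    conf.setInputFormat(SequenceFileInputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path("intermediate"));
  }
}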

- Aaron


On Sun, Dec 14, 2008 at 11:39 PM, Owen O'Malley omal...@apache.org wrote:


 On Dec 14, 2008, at 4:47 PM, Ricky Ho wrote:

  Yes, I am referring to the key INPUT INTO the map() function and the
 key EMITTED FROM the reduce() function.  Can someone explain why we
 need a key in these cases and what the proper use of it is?


 It was a design choice and could have been done as:

 R1 -> map -> K,V -> reduce -> R2

 instead of

 K1,V1 -> map -> K2,V2 -> reduce -> K3,V3

 but since the input of the reduce is sorted on K2, the output of the reduce
 is also typically sorted and therefore keyed. Since jobs are often chained
 together, it makes sense to make the reduce input match the map input. Of
 course everything you could do with the first option is possible with the
 second using either K1 = R1 or V1 = R1. Having the keys is often
 convenient...

  Who determines what the key should be ?  (by the corresponding
 InputFormat implementation class) ?


 The InputFormat makes the choice.

  In this case, what is the key in the map() call ?  (name of the input
 file) ?


 TextInputFormat uses the byte offset as the key and the line as the value.
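
 For concreteness, a mapper in the old API that receives that pair would look
 roughly like the sketch below (the class name and output types are made up):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// With TextInputFormat the key is the byte offset of the line and the value is
// the line itself; many mappers simply ignore the offset.
public class LineMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {
  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    output.collect(line, new LongWritable(1L));
  }
}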

  What if the reduce() function emits multiple key, value entries or not
 emitting any entry at all ?  Is this considered OK ?


 Yes.

  What if the reduce() function emits a key, value entry whose key is not
 the same as the input key parameter to the reduce() function ?  Is this OK ?


 Yes, although the reduce output is not re-sorted, so the results won't be
 sorted unless K3 is a subset of K2.

  If there is a two Map/Reduce cycle chained together.  Is the key input
 into the 2nd round map() function determined by the key emitted from the
 1st round reduce() function ?


 Yes.

 -- Owen



RE: Suggestions of proper usage of key parameter ?

2008-12-15 Thread Ricky Ho
Thanks to Aaron and Owen for the detailed explanations.

One key takeaway is that the OutputFormat of a previous job needs to be 
compatible with the InputFormat of its subsequent job.  Otherwise, the 
key/value demarcation will be screwed up.

It is still not very clear to me when to emit multiple entries versus a single 
entry in the reduce() function.  Here is a typical case where a reduce() function 
counts the number of times a particular word appears in a particular document.  
There are multiple possible choices in how the reduce() function can be 
structured, and which one to choose is not clear to me ...

Choice 1:  Emit one entry in the reduce(), using doc_name as key
==

# k2 is doc_name, v2_list is a list of [word, count] pairs

def reduce(k2, v2_list)
  count_holder = Hash.new(0)        # default each word's count to 0

  v2_list.each do |word, count|
    count_holder[word] += count
  end

  arr = []
  count_holder.each do |word, count|
    arr << [word, count]
  end

  doc_name = k2
  emit(doc_name, arr)
end

In this case, the output will still be a single entry per invocation of 
reduce() ...

{
  key:file1,
  value:[[apple, 25],
   [orange, 16]]
}

Choice 2:  Emit multiple entries in the reduce(), using [doc_name, word] as key


def reduce(k2, v2_list)
  count_holder = Hash.new(0)        # default each word's count to 0

  v2_list.each do |word, count|
    count_holder[word] += count
  end

  doc_name = k2
  count_holder.each do |word, count|
    emit([doc_name, word], count)
  end
end

In this case, the output will have multiple entries per invocation of reduce() 
...

{
  key:[file1, apple],
  value:25
}

{
  key:[file1, orange],
  value:16
}


Choice 3:  Emit one entry in the reduce(), using null key


def reduce(k2, v2_list)
  doc_name = k2
  count_holder = Hash.new(0)        # default each word's count to 0

  v2_list.each do |word, count|
    count_holder[word] += count
  end

  arr = []
  count_holder.each do |word, count|
    arr << [doc_name, word, count]
  end

  emit(nil, arr)
end

In this case, the output will be a single entry per invocation of reduce() ...

{
  key:null,
  value:[[file1, apple, 25],
   [file1, orange, 16]]
}

Can you compare the above options and shed some light ?

Rgds, Ricky



Re: concurrent modification of input files

2008-12-15 Thread Owen O'Malley


On Dec 15, 2008, at 8:08 AM, Sandhya E wrote:


I have a scenario where, while a map/reduce job is working on a file, the
input file may get deleted and replaced with a new version of the file.
All my files are compressed, and hence each file is worked on by a
single node. I tried simulating the scenario of deleting the file
before the map/reduce was over, and the map/reduce went ahead without
complaining. Can I always assume this will be the case?


No, the results will be completely non-deterministic. Don't do this.  
That said, the thing that will save you in micro-tests of this is that  
if the file is missing at some point, the task will fail and retry.


-- Owen


NameNode fatal crash - 0.18.1

2008-12-15 Thread Jonathan Gray
I have a 10+1 node cluster, each slave running DataNode/TaskTracker/HBase
RegionServer.

At the time of this crash, NameNode and SecondaryNameNode were both hosted
on same master.

We do a nightly backup and about 95% of the way through, HDFS crashed
with...

NameNode shows:

2008-12-15 01:49:31,178 ERROR org.apache.hadoop.fs.FSNamesystem: Unable to
sync edit log. Fatal Error.
2008-12-15 01:49:31,178 FATAL org.apache.hadoop.fs.FSNamesystem: Fatal Error
: All storage directories are inaccessible.
2008-12-15 01:49:31,179 INFO org.apache.hadoop.dfs.NameNode: SHUTDOWN_MSG:

Every single DataNode shows:

2008-12-15 01:49:32,340 WARN org.apache.hadoop.dfs.DataNode:
java.io.IOException: Call failed on local exception
at org.apache.hadoop.ipc.Client.call(Client.java:718)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.dfs.$Proxy4.sendHeartbeat(Unknown Source)
at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:655)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:499)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:441)


This is virtually all of the information I have.  At the same time as the
backup, we have normal HBase traffic and our hourly batch MR jobs, so the slave
nodes were pretty heavily loaded, but I don't see anything in the DN logs besides
this "Call failed".  There are no space issues or anything else; Ganglia shows
high CPU load around this time, which has been typical every night, but I
don't see any issues in the DNs or NN about expired leases, missing heartbeats, etc.

Is there a way to prevent this failure from happening in the first place?  I
guess just reduce the total load across the cluster?
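
For what it's worth, my understanding (untested; the paths below are just
placeholders) is that dfs.name.dir can take a comma-separated list of
directories, e.g. a second disk or an NFS mount, so that a single inaccessible
directory does not take the whole namesystem down:

<!-- hadoop-site.xml on the NameNode: redundant image/edits directories -->
<property>
  <name>dfs.name.dir</name>
  <value>/data1/hadoop/name,/data2/hadoop/name,/mnt/nfs/hadoop/name</value>
</property>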

Second question is about how to recover once NameNode does fail...

When trying to bring HDFS back up, we get hundreds of:

2008-12-15 07:54:13,265 ERROR org.apache.hadoop.dfs.LeaseManager: XXX not
found in lease.paths

And then

2008-12-15 07:54:13,267 ERROR org.apache.hadoop.fs.FSNamesystem:
FSNamesystem initialization failed.


Is there a way to recover from this?  As of the time of this crash, we had the
SecondaryNameNode on the same node.  I am moving it to another node with
sufficient memory now, but would that even prevent this kind of FS botching?

Also, my SecondaryNameNode is telling me it cannot connect when trying to do
a checkpoint:

2008-12-15 09:59:48,017 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
doCheckpoint:
2008-12-15 09:59:48,018 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)

I changed my masters file to contain just the hostname of the
SecondaryNameNode; this seems to have properly started the NameNode on the node
where I launched ./bin/start-dfs.sh and started the SecondaryNameNode on the
correct node as well.  But the secondary seems to be unable to connect back to
the primary.  I have hadoop-site.xml pointing fs.default.name at the primary,
but otherwise there are no links back.  Where would I specify to the secondary
where the primary is located?

We're also upgrading to Hadoop 0.19.0 at this time.

Thank you for any help.

Jonathan Gray



Re: occasional createBlockException in Hadoop .18.1

2008-12-15 Thread Sagar Naik

Hi,
Some data points on this issue.
1) du runs for 20-30 secs.
2) After some time, I don't see any activity in the datanode logs.
3) I can't even jstack the datanode (I forced it and got a
DebuggerException, after double-checking the pid), and datanode:50075/stacks 
takes forever to respond.


I can telnet to datanode:50010

I think the disk is bad or something.

Please suggest some pointers for analyzing this problem.

-Sagar
Sagar Naik wrote:



CLIENT EXCEPTION:

2008-12-14 08:41:46,919 [Thread-90] INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.50.80.133:54045 remote=/10.50.80.108:50010]
2008-12-14 08:41:46,919 [Thread-90] INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_-7364265396616885025_5870078
2008-12-14 08:41:46,920 [Thread-90] INFO org.apache.hadoop.dfs.DFSClient: Waiting to find target node: 10.50.80.108:50010



DATANODE

2008-12-14 08:40:39,215 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_-7364265396616885025_5870078 src: /10.50.80.133:54045 dest: /10.50.80.133:50010

...
I occasionally see the datanode reported as a dead node. When the datanode is
a dead node, I see the du forked from the datanode, and the du process is stuck
in the D (uninterruptible sleep) state.

Any pointers for debugging this would help me.

-Sagar




RE: NameNode fatal crash - 0.18.1

2008-12-15 Thread Jonathan Gray
I have fixed the issue with the SecondaryNameNode not contacting primary
with the 'dfs.http.address' config option.
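
For the archives, this is roughly the shape of the entry that did it (the
hostname is a placeholder), set in hadoop-site.xml on the SecondaryNameNode host:

<property>
  <name>dfs.http.address</name>
  <value>namenode-host:50070</value>
</property>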

Other issues still unsolved.




Re: 5 node Hadoop Cluster!!! Some Doubts...

2008-12-15 Thread Jean-Daniel Cryans
Sid,

For such a small cluster, just put the Jobtracker and Namenode on the same
machine and the Tasktrackers and Datanodes in pairs on the other machines. I
can't think of anything else that would have an impact on performance for
you.
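
Concretely (hostnames below are made up), that layout just comes down to the
files the start scripts read: run start-dfs.sh/start-mapred.sh on the master,
point fs.default.name and mapred.job.tracker at it, and list the workers in
conf/slaves; conf/masters only names the host for the SecondaryNameNode.

# conf/masters -- host that will run the SecondaryNameNode
master1

# conf/slaves -- hosts that will each run a DataNode and a TaskTracker
worker1
worker2
worker3
worker4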

J-D

On Thu, Dec 11, 2008 at 6:20 PM, Siddharth Malhotra 
sid86.malho...@gmail.com wrote:

 Hey,

 I am a student working on a project using Hadoop. I have successfully
 implemented the project on a single node and in pseudo-distributed mode.

 I would now be implementing it on a 5 node cluster, but I wanted to know if
 there is any specific way I should set up the TaskTracker, JobTracker,
 NameNode, and DataNode. For example, should I set up the NameNode and
 JobTracker on the same node, or should I put them on different nodes?
 Similarly, can I also set up the DataNode and TaskTracker on the master node,
 or only on the slave nodes? Does any specific type of configuration help to
 improve performance?

 Awaiting a reply soon.

 Thanks

 Sid

 --
 Siddharth Malhotra

 Mobile: +41 76 275 3991

 Mail: : sid86.malho...@gmail.com