Sam,

There is no formula for determining how much memory one should give to the
datanode and tasktracker. A formula is available only for how many slots you
want to have on a machine.

In my prior experience, we gave 512 MB of memory each to the datanode and
the tasktracker.
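
To make the slot arithmetic concrete, here is a rough sketch of the kind of
budget I mean. It is not an official Hadoop formula. The function name
max_task_slots, the 1024 MB reserved for the OS, and the 200 MB heap per task
are all assumptions for illustration; only the 3834 MB total (node3) and the
512 MB per daemon come from this thread.

```python
def max_task_slots(total_mb, daemon_heaps_mb, os_reserve_mb, per_task_heap_mb):
    """Return how many task JVMs fit once the daemons and the OS are paid for."""
    left_over = total_mb - sum(daemon_heaps_mb) - os_reserve_mb
    # Floor division rounds toward negative infinity, so clamp at zero
    # when the daemons and OS reserve already exceed the machine's RAM.
    return max(left_over // per_task_heap_mb, 0)


# node3 reports 3834 MB in total; datanode and tasktracker get 512 MB each.
# The OS reserve (1024 MB) and per-task heap (200 MB) are assumed values.
slots = max_task_slots(3834, [512, 512], 1024, 200)
print(slots)  # 8
```

Split the result between map and reduce slots as suits your workload, and
lower it if the node also runs the namenode or jobtracker.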


On Mon, May 13, 2013 at 11:18 AM, sam liu <[email protected]> wrote:

> For node3, the memory is:
>              total       used       free     shared    buffers     cached
> Mem:          3834       3666        167          0        187       1136
> -/+ buffers/cache:       2342       1491
> Swap:         8196          0       8196
>
> For a 3-node cluster like mine, what is the minimum free/available
> memory required by the datanode and tasktracker processes when no
> map/reduce task is running?
> Is there a formula to determine it?
>
>
> 2013/5/13 Rishi Yadav <[email protected]>
>
>> Can you share the specs of node3? Even on a test/demo cluster, anything
>> below 4 GB of RAM makes the node almost inaccessible, in my experience.
>>
>>
>>
>> On Sun, May 12, 2013 at 8:25 PM, sam liu <[email protected]> wrote:
>>
>>> Got some exceptions on node3:
>>> 1. datanode log:
>>> 2013-04-17 11:13:44,719 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>> blk_2478755809192724446_1477 received exception
>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>> channel to be ready for read. ch :
>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371
>>> remote=/9.50.102.79:50010]
>>> 2013-04-17 11:13:44,721 ERROR
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>> 9.50.102.80:50010,
>>> storageID=DS-2038715921-9.50.102.80-50010-1366091297051, infoPort=50075,
>>> ipcPort=50020):DataXceiver
>>> java.net.SocketTimeoutException: 63000 millis timeout while waiting for
>>> channel to be ready for read. ch :
>>> java.nio.channels.SocketChannel[connected local=/9.50.102.80:58371
>>> remote=/9.50.102.79:50010]
>>>         at
>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
>>>         at java.io.DataInputStream.readShort(DataInputStream.java:306)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:359)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:112)
>>>         at java.lang.Thread.run(Thread.java:738)
>>> 2013-04-17 11:13:44,818 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>> blk_8413378381769505032_1477 src: /9.50.102.81:35279 dest: /
>>> 9.50.102.80:50010
>>>
>>>
>>> 2. tasktracker log:
>>> 2013-04-23 11:48:26,783 INFO org.apache.hadoop.mapred.UserLogCleaner:
>>> Deleting user log path job_201304152248_0011
>>> 2013-04-30 14:48:15,506 ERROR org.apache.hadoop.mapred.TaskTracker:
>>> Caught exception: java.io.IOException: Call to node1/9.50.102.81:9001
>>> failed on local exception: java.io.IOException: Connection reset by peer
>>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:1144)
>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1112)
>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
>>>         at org.apache.hadoop.mapred.$Proxy2.heartbeat(Unknown Source)
>>>         at
>>> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:2008)
>>>         at
>>> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1802)
>>>         at
>>> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2654)
>>>         at
>>> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3909)
>>> Caused by: java.io.IOException: Connection reset by peer
>>>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:33)
>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:210)
>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:183)
>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:257)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
>>>         at
>>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>         at
>>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>         at java.io.FilterInputStream.read(FilterInputStream.java:127)
>>>         at
>>> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:361)
>>>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
>>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:248)
>>>         at java.io.DataInputStream.readInt(DataInputStream.java:381)
>>>         at
>>> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:841)
>>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:786)
>>>
>>> 2013-04-30 14:48:15,517 INFO org.apache.hadoop.mapred.TaskTracker:
>>> Resending 'status' to 'node1' with reponseId '-12904
>>> 2013-04-30 14:48:16,404 INFO org.apache.hadoop.mapred.TaskTracker:
>>> SHUTDOWN_MSG:
>>>
>>>
>>>
>>> 2013/5/13 Rishi Yadav <[email protected]>
>>>
>>>> Do you get any error when trying to connect to the cluster, something
>>>> like 'tried n times' or 'replicated 0 times'?
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, May 12, 2013 at 7:28 PM, sam liu <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I set up a cluster with 3 nodes and have not submitted any job to it
>>>>> since. But after a few days, I found the cluster was unhealthy:
>>>>> - No result was returned for a while after issuing the command
>>>>> 'hadoop dfs -ls /' or 'hadoop dfsadmin -report'
>>>>> - The page at 'http://namenode:50070' could not be opened as
>>>>> expected...
>>>>> - ...
>>>>>
>>>>> I did not find any useful info in the logs, but found that the
>>>>> available memory on the cluster nodes was very low at that time:
>>>>> - node1(NN,JT,DN,TT): 158 mb mem is available
>>>>> - node2(DN,TT): 75 mb mem is available
>>>>> - node3(DN,TT): 174 mb mem is available
>>>>>
>>>>> I guess the issue with my cluster is caused by a lack of memory, and
>>>>> my questions are:
>>>>> - Without running jobs, what are the minimum memory requirements for
>>>>> the datanode and namenode?
>>>>> - How do I define the minimum memory for the datanode and namenode?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Sam Liu
>>>>>
>>>>
>>>>
>>>
>>
>


-- 
Nitin Pawar
