Re: problem with completion notification from block movement

2009-02-03 Thread Karl Kleinpaste
On Mon, 2009-02-02 at 20:06 -0800, jason hadoop wrote:
 This can be made significantly worse by your underlying host file
 system and the disks that support it.

Oh, yes, we know...  It was a late-realized mistake just yesterday that
we weren't using noatime on that cluster's slaves.

The attached graph is instructive.  We have our nightly-rotated logs for
DataNode all the way back to when this test cluster was created in
November.  This morning on one node, I sampled the first 10 BlockReport
scan lines from each day's log, up through the current hour today, and
handed them to gnuplot to graph.  The seriously erratic behavior that
begins around the 900K-1M block mark is very disturbing.
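
For anyone who wants to reproduce the graph, the extraction is nothing
fancy; a rough Python sketch follows.  The log directory and the
hadoop-*-datanode-*.log* naming are only assumptions about our own
layout, and the regex matches the "BlockReport of N blocks got processed
in M msecs" lines quoted in my earlier mail below.  The two-column output
drops straight into gnuplot with something like "plot 'data' using 1:2".

#!/usr/bin/env python
# Sketch: pull (block count, duration) pairs out of rotated DataNode logs
# so gnuplot can graph them.  Paths and file naming below are assumptions
# about our setup, not anything Hadoop mandates.
import glob
import re

# Matches lines such as:
#   BlockReport of 1158499 blocks got processed in 308860 msecs
PATTERN = re.compile(r"BlockReport of (\d+) blocks got processed in (\d+) msecs")

def sample_log(path, limit=10):
    # Yield up to `limit` (blocks, msecs) pairs from one day's log.
    found = 0
    with open(path, errors="replace") as log:
        for line in log:
            m = PATTERN.search(line)
            if m:
                yield int(m.group(1)), int(m.group(2))
                found += 1
                if found >= limit:
                    break

if __name__ == "__main__":
    for path in sorted(glob.glob("/var/log/hadoop/hadoop-*-datanode-*.log*")):
        for blocks, msecs in sample_log(path):
            print(blocks, msecs)    # two columns: block count, scan msecs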

Immediate solutions for us include mounting with noatime and nodiratime,
a BIOS/firmware upgrade for the disks, and eliminating enough small files
(blocks) in DFS to get the total count below 400K.
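
As a quick sanity check that the noatime/nodiratime change actually took
effect after remounting, something along these lines works; the /data1
and /data2 mount points are placeholders for wherever a slave's
dfs.data.dir volumes really live.

#!/usr/bin/env python
# Sketch: verify the filesystems holding DataNode storage are mounted
# with noatime/nodiratime, by reading /proc/mounts on the slave.
DATA_MOUNTS = ["/data1", "/data2"]   # placeholder dfs.data.dir mount points

def mount_options():
    opts = {}
    with open("/proc/mounts") as mounts:
        for line in mounts:
            fields = line.split()    # device mountpoint fstype options ...
            if len(fields) >= 4:
                opts[fields[1]] = set(fields[3].split(","))
    return opts

if __name__ == "__main__":
    opts = mount_options()
    for mp in DATA_MOUNTS:
        missing = {"noatime", "nodiratime"} - opts.get(mp, set())
        if missing:
            print("%s is missing: %s" % (mp, ", ".join(sorted(missing))))
        else:
            print("%s looks fine" % mp)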


Re: problem with completion notification from block movement

2009-02-02 Thread Karl Kleinpaste
On Sun, 2009-02-01 at 17:58 -0800, jason hadoop wrote:
 The DataNodes use multiple threads with locking, and one of the
 assumptions is that the block report (once per hour by default) takes
 little time.  The DataNode will pause while the block report is running,
 and if it happens to take a while, weird things start to happen.

Thank you for responding, this is very informative for us.

Having looked through the source code with a co-worker regarding the
periodic scan, and then checking the logs once again, we find reports of
this sort:

BlockReport of 1158499 blocks got processed in 308860 msecs
BlockReport of 1159840 blocks got processed in 237925 msecs
BlockReport of 1161274 blocks got processed in 177853 msecs
BlockReport of 1162408 blocks got processed in 285094 msecs
BlockReport of 1164194 blocks got processed in 184478 msecs
BlockReport of 1165673 blocks got processed in 226401 msecs

The 3rd of these (roughly three minutes' worth) exactly straddles the
particular example timeline I discussed in my original email about this
question.  I suspect I'll find more of the same as I look through other
related errors.

--karl



problem with completion notification from block movement

2009-01-30 Thread Karl Kleinpaste
We have a small test cluster, a double master (NameNode+JobTracker) plus
2 slaves, running 0.18.1.  We are seeing an intermittent problem where
our application logs failures out of DFSClient, thus:

2009-01-30 01:59:42,072 WARN org.apache.hadoop.dfs.DFSClient:
DFSOutputStream ResponseProcessor exception for block
blk_7603130349014268849_2349933
java.net.SocketTimeoutException: 66000 millis timeout while waiting for
channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.0.10.102:54700
remote=/10.0.10.108:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
        at java.io.DataInputStream.readFully(DataInputStream.java:176)
        at java.io.DataInputStream.readLong(DataInputStream.java:380)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2044)

(Apologies for paste formatting.  I hate Evolution.)

Our application here is our JobConsole, which is responsible for
taking notifications from an external data-generating application: The
external app scribbles files into DFS and then tells JobConsole about
them.  JobConsole submits jobs to crunch that data in response to the
external app's notifications of data availability.  JobConsole runs on
the master node.

Chasing that block identifier through our JobConsole log plus the
DataNode logs on the slaves, we have an odd timeline, which is this:
01:58:32  slave (.108, above): receiving blk from master (.102)
01:58:35  other slave (.107): receiving blk from .108
01:58:36  .107: received blk
01:58:38  .107: terminate PacketResponder
01:59:42  JobConsole (.102): 66s t.o. + Error Recovery (above)
01:59:42  .107: invoke recoverBlock on that blk
02:01:15  .108: received blk + terminate PacketResponder
03:03:24  .108: deleting blk, from Linux pathname in DFS storage
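
Mechanically, pulling that timeline together is just a matter of grepping
each log for the block id and sorting the hits by their leading
timestamp.  A rough Python sketch of it follows; the log file names on
the command line are hypothetical, standing in for whichever JobConsole
and DataNode logs have been copied to one place.

#!/usr/bin/env python
# Sketch: chase one block through several logs.
# Usage (hypothetical file names):
#   chase_block.py blk_7603130349014268849 jobconsole.log dn-107.log dn-108.log
import re
import sys

# Leading timestamp as in the logs above, e.g. "2009-01-30 01:59:42,072"
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")

def chase(block_id, log_paths):
    hits = []
    for path in log_paths:
        with open(path, errors="replace") as log:
            for line in log:
                if block_id in line:
                    m = TS.match(line)
                    hits.append((m.group(1) if m else "", path, line.rstrip()))
    for ts, path, line in sorted(hits):
        print("%s  [%s]  %s" % (ts, path, line))

if __name__ == "__main__":
    chase(sys.argv[1], sys.argv[2:])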

What's clear from that timeline is that .108 got the block quickly,
because it was immediately in a position to send a copy to .107, which
responded promptly enough to say that it had it.  But .108's DataNode
sat on the block for nearly 3 minutes before announcing what appears to
have been ordinary completion and responder termination.  After the
first minute-plus of that long period, JobConsole gave up and did a
recovery operation, which appears to have worked.  If .108's DataNode
sent a notification when it finally logged completed reception, no doubt
there was nobody listening for it any more.

What's of particular interest to us is that the NameNode log shows that
the data being moved is job.jar:

2009-01-30 01:58:32,353 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.allocateBlock: /usr/local/rinera/hadoop/hadoop-runtime/0/mapred/system/job_200901291752_3021/job.jar. blk_7603130349014268849_2349933

Note block name and timestamp.

Does anyone else have knowledge of, or history with, such glitches?
We've recently begun seeing a number of problems in communication
between task-management processes and DFS that we had not seen before,
and we're trying, so far without success, to nail down where they're
coming from.



Re: question: NameNode hanging on startup as it intends to leave safe mode

2008-12-11 Thread Karl Kleinpaste
On Wed, 2008-12-10 at 11:52 -0800, Konstantin Shvachko wrote:
 This is probably related to HADOOP-4795.

Thanx for the observation and reference.  However, my sense is that the
bug report you reference describes NameNode going into an infinite-loop
spin, whereas the situation we have faced concerns NameNode being
stuck/hung, as though in a deadly embrace (deadlock): using no CPU,
reacting to no outside events other than to accept incoming connections
without proceeding to handle them.

Can your referenced bug manifest itself as a NameNode hang as well as a
spin?



question: NameNode hanging on startup as it intends to leave safe mode

2008-12-10 Thread Karl Kleinpaste
We have a cluster comprising 21 nodes with a total capacity of about
55T, where we have had a problem twice in the last couple of weeks on
startup of NameNode.  We are running 0.18.1.  DFS usage is currently
just below half of that capacity, at about 25T.

The symptom is that there is normal startup logging on NameNode's part,
where it self-analyzes its expected DFS content, reports the number of
files known, and begins to accept reports from the slaves' DataNodes
about the blocks they hold.  During this time, NameNode is in safe mode,
pending adequate block discovery from the slaves.  As the fraction of
reported blocks rises, it eventually hits the required 0.9990 threshold
and announces that it will leave safe mode in 30 seconds.

The problem occurs when, at the point of logging 0 seconds remaining
before leaving safe mode, NameNode hangs: it uses no more CPU; it logs
nothing further; it stops responding on its port 50070 web interface;
hadoop fs commands report no contact with NameNode; and netstat -atp
shows a number of open connections on ports 9000 and 50070, indicating
that connections are being accepted but NameNode never processes them.
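
One cheap way to notice this state from the outside, while we hunt for
the cause, is a watchdog that polls "hadoop dfsadmin -safemode get" and
treats a timeout as failure; since the hung NameNode accepts connections
but never answers, the timeout is what actually catches it.  This is
only a sketch, the 30s/60s values are arbitrary, and it assumes the
hadoop command is on PATH on the monitoring host.

#!/usr/bin/env python
# Sketch: flag a NameNode that accepts connections but never responds.
import subprocess
import time

def namenode_alive(timeout=30):
    # True if "hadoop dfsadmin -safemode get" answers within `timeout` seconds.
    try:
        result = subprocess.run(["hadoop", "dfsadmin", "-safemode", "get"],
                                capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

if __name__ == "__main__":
    while True:
        if not namenode_alive():
            print(time.strftime("%Y-%m-%d %H:%M:%S"),
                  "NameNode not responding -- investigate")
        time.sleep(60)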

This has happened twice in the last 2 weeks, and it has us fairly
concerned.  Both times, it has been sufficient simply to start NameNode
over again, and it successfully comes to life the 2nd time around.  Is
anyone else familiar with this sort of hang, and do you know of any
solutions?