During a MapReduce job, Hadoop creates a number of temporary files. These
include the map outputs and any spills that the sort/merge step has to
write. All of these go to the local file system; only the final output is
written to HDFS. That is why you are seeing so much more local I/O.
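
To make the arithmetic concrete, here is a toy sketch in Python of a
spill-then-merge sort. It is not Hadoop's code and does not reproduce its
counters; the function name, the buffer size, and the file names are all
made up for illustration. It only shows how the same records can hit the
local disk more than once before any final output exists.

```python
import heapq
import os
import tempfile


def simulate_map_side_sort(records, buffer_size, workdir):
    """Toy model (hypothetical, not Hadoop's implementation).

    Sorts `records` with a bounded in-memory buffer, spilling sorted runs
    to `workdir` and then merging them, while counting local bytes.
    """
    local_written = 0
    local_read = 0
    spill_paths = []

    # Phase 1: fill the buffer, sort it, and spill it to a local file.
    for start in range(0, len(records), buffer_size):
        chunk = sorted(
            rec.encode("utf-8") + b"\n"
            for rec in records[start:start + buffer_size]
        )
        path = os.path.join(workdir, "spill_%d.bin" % len(spill_paths))
        with open(path, "wb") as f:
            for line in chunk:
                local_written += f.write(line)
        spill_paths.append(path)

    # Phase 2: merge the sorted spills into one file. Every byte is read
    # back from local disk and written to local disk a second time.
    merged_path = os.path.join(workdir, "merged.bin")
    files = [open(p, "rb") for p in spill_paths]
    try:
        with open(merged_path, "wb") as out:
            for line in heapq.merge(*files):
                local_read += len(line)
                local_written += out.write(line)
    finally:
        for f in files:
            f.close()

    return {
        "local_written": local_written,
        "local_read": local_read,
        "sorted_output_bytes": os.path.getsize(merged_path),
    }


if __name__ == "__main__":
    # 1000 records of 8 bytes each (7 characters plus a newline).
    data = ["key%04d" % i for i in range(1000, 0, -1)]
    with tempfile.TemporaryDirectory() as tmp:
        print(simulate_map_side_sort(data, buffer_size=100, workdir=tmp))
```

In this toy model the local bytes written come out at twice the size of
the sorted data, because each record is written once in a spill and once
more in the merge. A real job adds further passes on the reduce side, so
treat the ratio as an illustration only.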
Alan.
Haijun Cao wrote:
I am getting worried about the huge number of bytes written to the local
fs. I have a 2-machine cluster: one machine is at 100% I/O utilization and
the other at 10-20% during the map phase. The input data is replicated on
both machines (replication = 2), so I suspect the extra 80-90% I/O on the
first machine is caused by reads/writes to the local fs.
Which machine and which directory does this "local fs" refer to? I would
like to check for myself whether it is the culprit.
Thanks.
Haijun
-----Original Message-----
From: Haijun Cao [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 04, 2008 10:44 PM
To: [EMAIL PROTECTED]
Subject: local bytes read/written
Hi,
I just started using Pig; it is really fun to write Pig queries.
I noticed that the map reduce job page reports bytes read from/written
to the local file system, and the number is 2x to 3x the bytes
read from/written to Hadoop. I just want to understand the internal
workings of Pig a little better: which operations read/write to the local
fs, and for what purpose? Is it the local fs of the data nodes? Which
directory?
Thanks
Haijun