Hadoop IO performance, prefetch, etc.

2009-02-04 Thread Songting Chen
Hi, most of our map jobs are IO bound. However, for the same node, the IO throughput during the map phase is only 20% of its real sequential IO capability (we measured the sequential IO throughput with iozone). I think the reason is that while each map has a sequential IO request, since there

issue map/reduce job to linux hadoop cluster from MS Windows, Eclipse

2008-12-13 Thread Songting Chen
Is it possible to do that? I can access files on HDFS by specifying the URI below. FileSystem fileSys = FileSystem.get(new URI("hdfs://server:9000"), conf); But I don't know how to do the same for JobConf. Thanks, -Songting
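A minimal sketch of what submitting to a remote cluster from a client machine (e.g. Eclipse on Windows) might look like with the 0.18-era JobConf API. The hostnames, ports, and paths are placeholders, and `mapred.job.tracker` must match the JobTracker's RPC address, not a guess from this sketch.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(RemoteSubmit.class);
        // Point the client at the remote HDFS and JobTracker
        // (hostnames and ports here are placeholders).
        conf.set("fs.default.name", "hdfs://server:9000");
        conf.set("mapred.job.tracker", "server:9001");
        // Set mapper/reducer classes and key/value types as usual, then:
        FileInputFormat.setInputPaths(conf, new Path("/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/output"));
        JobClient.runJob(conf);
    }
}
```

Note that the client must also ship a job jar containing the map/reduce classes (e.g. via the `JobConf(Class)` constructor used above, or `conf.setJar(...)`), or the TaskTrackers cannot load them.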

HDFS bytes read counter

2008-12-09 Thread Songting Chen
I am reading the performance counters and have the following question: Map Input Bytes = 6.8G, while HDFS Bytes Read = 10G. What are the additional 3.2G? Thanks, -Songting

End of block/file for Map

2008-12-09 Thread Songting Chen
Is there a way for the Map process to know it has reached the end of its records? I need to flush some additional data at the end of the Map process, but I am wondering where I should put that code. Thanks, -Songting
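In the old (org.apache.hadoop.mapred) API, the framework calls `close()` once after the last input record of a map task, so trailing output can be flushed there. A sketch, assuming the collector handle is cached during `map()` (the key/value types and the "SUMMARY" record are illustrative only):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FlushingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    // Keep a handle to the collector so close() can emit trailing records.
    private OutputCollector<Text, Text> out;

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        this.out = output;
        // ... normal per-record processing ...
    }

    @Override
    public void close() throws IOException {
        // Called once after the last input record of this map task.
        if (out != null) {
            out.collect(new Text("SUMMARY"), new Text("flushed-at-end"));
        }
    }
}
```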

Re: slow shuffle

2008-12-06 Thread Songting Chen
To summarize the slow shuffle issue: 1. I think one problem is that the Reducer starts very late in the process, slowing the entire job significantly. Is there a way to let the reducer start earlier? http://issues.apache.org/jira/browse/HADOOP-3136
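For reference, later Hadoop releases expose the reduce launch point as a configuration knob; the property name below is from the later mapred-default configuration and may not exist in the 0.18/0.19 versions discussed in this thread:

```xml
<!-- Fraction of map tasks that must complete before reduce tasks are
     scheduled. Lowering it starts reducers (and the shuffle) earlier;
     raising it frees slots for maps. Default is 0.05. -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.05</value>
</property>
```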

slow shuffle

2008-12-05 Thread Songting Chen
We encountered a bottleneck during the shuffle phase, even though there is not much data to be shuffled across the network at all - less than 10 MB in total (the combiner aggregated most of the data). Are there any parameters or anything else we can tune to improve the shuffle performance? Thanks,

Re: slow shuffle

2008-12-05 Thread Songting Chen
map outputs in memory must consume less than this threshold before the reduce can begin.</description></property> How long did the shuffle take relative to the rest of the job? Alex

Re: slow shuffle

2008-12-05 Thread Songting Chen
How many reduce tasks do you have? Look into increasing mapred.reduce.parallel.copies from the default of 5 to something more like 20 or 30. - Aaron
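The suggestion above as a hadoop-site.xml fragment; the value of 20 is just the illustrative figure from the reply, not a recommendation for any particular cluster:

```xml
<!-- Number of parallel fetcher threads each reduce task uses to copy
     map outputs during the shuffle. Default is 5. -->
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
</property>
```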

Re: slow shuffle

2008-12-05 Thread Songting Chen
I think one of the issues is that the Reducer starts very late in the process, slowing the entire job significantly. Is there a way to let the reducer start earlier?

Re: slow shuffle

2008-12-05 Thread Songting Chen
puzzles me what's behind the scenes (note that sorting takes 1 sec). Thanks, -Songting

Issues with V0.19 upgrade

2008-12-03 Thread Songting Chen
1. The namenode webpage shows: Upgrades: Upgrade for version -18 has been completed. Upgrade is not finalized. 2. SequenceFile.Writer failed when trying to create a new file with the following error: (we have two Hadoop clusters; both have issue 1; one has issue 2, but the other is

namenode failure

2008-10-27 Thread Songting Chen
Hi, I modified the classpath in hadoop-env.sh on the namenode and datanodes before shutting down the cluster. Then a problem appeared: I could not stop the Hadoop cluster at all. stop-all.sh reported no datanode/namenode, while all the Java processes were still running. So I manually killed the Java processes.

Update Fw: namenode failure

2008-10-27 Thread Songting Chen
to go away. So basically my problem was fixed - I just hope my experience may help find some potential bugs. Thanks, -Songting

RE: LZO and native hadoop libraries

2008-10-10 Thread Songting Chen
It seems that I encountered a similar problem: zlib and LZO are installed, but running ant -Dcompile.native=true gave the following error. [exec] /server/hadoop-0.18.1/src/native/src/org/apache/hadoop/io/compress/lzo/LzoCompressor.c: In function

Re: How to make LZO work?

2008-10-10 Thread Songting Chen
Does that mean I have to rebuild the native library? Also, the LZO installation puts liblzo2.a and liblzo2.la under /usr/local/lib. There is no liblzo2.so there. Do I need to rename

Re: How to make LZO work?

2008-10-10 Thread Songting Chen
I switched to the lzo-2.02 package. This time liblzo2.so was built, and now everything works. Thanks, -Songting

How to make LZO work?

2008-10-09 Thread Songting Chen
Hi, I have installed lzo-2.03 on my Linux box. But still my code for writing a SequenceFile using LzoCodec fails with the following error: util.NativeCodeLoader: Loaded the native-hadoop library java.lang.UnsatisfiedLinkError: Cannot load liblzo2.so! What needs to be done to make this
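For context, a sketch of the kind of writer code that triggers this error, using the 0.18-era SequenceFile API (the output path is a placeholder). It requires liblzo2.so to be resolvable by the native hadoop library at runtime, which is what the thread resolves:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.LzoCodec;

public class LzoSeqFileWrite {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/test.lzo.seq"); // placeholder path
        // Block-compressed SequenceFile using the LZO codec; constructing
        // the codec is where liblzo2.so gets loaded via JNI.
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, path, IntWritable.class, Text.class,
            SequenceFile.CompressionType.BLOCK, new LzoCodec());
        try {
            writer.append(new IntWritable(1), new Text("hello"));
        } finally {
            writer.close();
        }
    }
}
```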

Re: How to make LZO work?

2008-10-09 Thread Songting Chen
On Oct 9, 2008, at 5:58 PM, Songting Chen wrote: Hi, I have installed lzo-2.03 to my Linux box. But still my code for writing a SequenceFile using

compressed files on HDFS

2008-09-25 Thread Songting Chen
datanodes, that could introduce significant network transfer cost. Any ideas on that? Thanks, -Songting Chen