question on write failure and commit in HDFS

2011-07-02 Thread Yang
128MB block, and then tries to replicate it to all 3 replicas? Or is every byte immediately copied to 3 replicas? Thanks Yang

AvatarNode by FB?

2011-07-18 Thread Yang
ifferent, is the FB AvatarNode feature open-sourced? If so, is it a better or worse solution than the current BackupNode/CheckpointNode in the Apache distribution? Thanks Yang

reducer out of memory?

2012-05-09 Thread Yang
-Xmx since my box is ultimately bounded in memory capacity) thanks Yang

Re: reducer out of memory?

2012-05-10 Thread Yang
as shuffle > buffer. > > On Thu, May 10, 2012 at 2:50 AM, Yang wrote: > >> it seems that if I put too many records into the same mapper output >> key, all these records are grouped into one key on one reducer, >> >> then the reducer ran out of memory. >>

Re: reducer out of memory?

2012-05-10 Thread Yang
vided values/keys in memory in your > implementation, it can easily cause an OOME if not handled properly. > The reducer by itself does read the values off a sorted file on the > disk and doesn't cache the whole group in memory. > > On Thu, May 10, 2012 at 12:20 AM, Yang wrote: &g
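
As the reply explains, the framework itself streams each key's values off a sorted on-disk file; the OOME typically comes from user code copying the whole group into a collection. A minimal sketch (hypothetical code, not from the thread) of a reducer that aggregates incrementally and so stays at constant memory per key:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class StreamingSumReducer
    extends Reducer<Text, LongWritable, Text, LongWritable> {

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    // Consume the iterator directly; never collect the values into a List,
    // which is what blows the heap when one key carries many records.
    for (LongWritable value : values) {
      sum += value.get();
    }
    context.write(key, new LongWritable(sum));
  }
}
```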

Re: can't disable speculative execution?

2012-07-11 Thread Yang
d basically fed empty input to 2 of the mappers Thanks Yang On Wed, Jul 11, 2012 at 10:00 PM, Harsh J wrote: > Yang, > > No, those three are individual task attempts. > > This is how you may generally dissect an attempt ID when reading it: > > attempt_201207111710_0024_m_0

Re: can't disable speculative execution?

2012-07-11 Thread Yang
TT, and multiple mappers are getting run at the same time, all > trying to bind to the same port. Limit your TT's max map tasks to 1 > when you're relying on such techniques to debug, or use the > LocalJobRunner/Apache MRUnit instead. > > On Thu, Jul 12, 2012 at 9:16 AM, Yang

Re: can't disable speculative execution?

2012-07-11 Thread Yang
simple pig test Yang On Wed, Jul 11, 2012 at 10:15 PM, Harsh J wrote: > Er, sorry I meant mapred.map.tasks = 1 > > On Thu, Jul 12, 2012 at 10:44 AM, Harsh J wrote: > > Try passing mapred.map.tasks = 0 or set a higher min-split size? > > > > On Thu, Jul 12, 20
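
A sketch of the settings discussed across this thread, using the Configuration API and the old (pre-2.x) property names that match the era of the discussion; the exact values are illustrative, not prescriptive:

```java
import org.apache.hadoop.conf.Configuration;

public class SingleMapperDebugConf {

  public static Configuration build() {
    Configuration conf = new Configuration();
    // Disable speculative execution for both phases.
    conf.setBoolean("mapred.map.tasks.speculative.execution", false);
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
    // Hint at a single map task, per the suggestion in the thread; this is
    // only a hint, so raising the minimum split size helps enforce it.
    conf.setInt("mapred.map.tasks", 1);
    conf.setLong("mapred.min.split.size", Long.MAX_VALUE);
    return conf;
  }
}
```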

Re: Current available Memory

2011-02-23 Thread Yang Xiaoliang
I had also encountered the same problem a few days ago. Anyone have another method? 2011/2/24 maha > Based on the Java function documentation, it gives approximately the > available memory, so I need to tweak it with other functions. > So it's a Java issue, not Hadoop. > > Thanks anyways, > Maha

Re: Current available Memory

2011-02-24 Thread Yang Xiaoliang
Thanks a lot! Yang Xiaoliang 2011/2/25 maha > Hi Yang, > > The problem could be solved using the following link: > http://www.roseindia.net/java/java-get-example/get-memory-usage.shtml > You need to use other memory managers like the Garbage collector and its > finalize
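
The usual Runtime-based approximation, sketched here as the kind of approach the linked page covers; as the thread notes, the numbers are only approximate:

```java
public class MemoryProbe {

  // Free heap right now, plus the headroom the heap is still allowed to grow by.
  public static long approxAvailableBytes() {
    Runtime rt = Runtime.getRuntime();
    return rt.freeMemory() + (rt.maxMemory() - rt.totalMemory());
  }

  public static void main(String[] args) {
    System.out.printf("~%d MB available%n", approxAvailableBytes() / (1024 * 1024));
  }
}
```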

Re: Chukwa?

2011-03-21 Thread Eric Yang
Chukwa is waiting on an official release of Hadoop and HBase which work together. In Chukwa trunk, Chukwa is using HBase as data storage, and using pig+hbase for data analytics. Unfortunately, the Hadoop security release branch and Hadoop trunk are both broken for HBase. Hence, Chukwa is in hibernat

Re: Chukwa?

2011-03-21 Thread Eric Yang
primarily used for analytics or log aggregation. I thought it was the latter, but it seems more and more it's like the former. On 3/21/11 8:27 AM, Eric Yang wrote: > Chukwa is waiting on an official release of Hadoop and HBase which > works together. In Chukwa trunk, Chukwa is using HBase a

TestDFSIO error: libhdfs.so.1 does not exist

2011-07-28 Thread Yang Xiaoliang
Hi all, I am benchmarking a Hadoop cluster with the hadoop-*-test.jar TestDFSIO, but the following error returns: File /usr/hadoop-0.20.2/libhdfs/libhdfs.so.1 does not exist. How can I solve this problem? Thanks!

Subscribe to List

2011-09-27 Thread Yu Yang
hello, Subscribe to List thx

Re: Run hadoop Map/Reduce app from another machine

2011-10-05 Thread Yang Xiaoliang
Install hadoop on your local machine, copy the configuration files from the remote hadoop cluster server to your local machine (including the hosts file), then you can just submit a *.jar locally as before. 2011/10/5 oleksiy > > Hello, > > I'm trying to find a way how to run hadoop MapReduce ap
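
A lighter-weight variant of this advice, sketched below: instead of copying the whole conf directory, a client can point its Configuration at the remote cluster directly. Hostnames and ports are placeholders, and the property names are the old (pre-2.x) ones:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class RemoteClusterClient {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder addresses; use your cluster's NameNode and JobTracker.
    conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
    conf.set("mapred.job.tracker", "jobtracker.example.com:9001");

    FileSystem fs = FileSystem.get(conf);
    System.out.println("Connected; home dir: " + fs.getHomeDirectory());
  }
}
```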

Re: hadoop input buffer size

2011-10-05 Thread Yang Xiaoliang
Hi, Hadoop neither reads one line at a time, nor fetches dfs.block.size worth of lines into a buffer. Actually, for TextInputFormat, it reads io.file.buffer.size bytes of text into a buffer each time; this can be seen in the Hadoop source file LineReader.java 2011/10/5 Mark question > Hello, >
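
A small sketch of what this answer describes, written against org.apache.hadoop.util.LineReader as it appears in later releases: the reader fills a byte buffer of io.file.buffer.size bytes (4096 by default) and carves lines out of it, regardless of line or block boundaries:

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.LineReader;

public class BufferedLineScan {

  public static long countLines(InputStream in, Configuration conf) throws IOException {
    // This is the knob the answer refers to; 4096 bytes is the default.
    int bufferSize = conf.getInt("io.file.buffer.size", 4096);
    LineReader reader = new LineReader(in, bufferSize);
    Text line = new Text();
    long count = 0;
    while (reader.readLine(line) > 0) {
      count++;
    }
    reader.close();
    return count;
  }
}
```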

Could I call a DLL written in C++ in the Hadoop-version MapReduce?

2012-05-23 Thread jason Yang
Hi, Currently, I'm trying to rewrite an algorithm into a parallel form. Since the algorithm depends on lots of third-party DLLs, I was wondering whether I could call the DLLs written in C++ from the Hadoop-version MapReduce by using JNI? Thanks. -- YANG, Lin
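
One plausible shape of the JNI route, entirely hypothetical: the library name "algo" and processNative() stand in for whatever the third-party DLLs actually expose, and the native libraries must be present on every task node (e.g. shipped via the distributed cache and placed on java.library.path):

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NativeCallMapper extends Mapper<LongWritable, Text, Text, Text> {

  static {
    // Loads algo.dll (Windows) / libalgo.so (Linux) from java.library.path;
    // the task JVM must be able to find it on every node.
    System.loadLibrary("algo");
  }

  // Hypothetical JNI binding to the C++ routine.
  private native String processNative(String input);

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(value, new Text(processNative(value.toString())));
  }
}
```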

Re: how to dump data from a mysql cluster to hdfs?

2009-08-05 Thread Yang Zhou
Write a Java program which will dump data from the mysql cluster and save it into HDFS at the same time. Run it on the namenode. I assume the namenode should be able to connect to the mysql gateway. Will it work? On Thu, Aug 6, 2009 at 12:02 PM, Min Zhou wrote: > Hi Aaron, > > We couldn't run mysqldump on the
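
A sketch of that suggestion, with placeholder connection details, URL, table, and output path: stream rows out of MySQL via JDBC and write them into HDFS as they arrive, so nothing is staged on local disk:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MysqlToHdfs {

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataOutputStream out = fs.create(new Path("/dumps/mytable.tsv"));
         Connection conn = DriverManager.getConnection(
             "jdbc:mysql://mysql-gateway:3306/mydb", "user", "pass");
         Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery("SELECT id, payload FROM mytable")) {
      while (rs.next()) {
        // One tab-separated line per row, written straight to HDFS.
        out.writeBytes(rs.getLong(1) + "\t" + rs.getString(2) + "\n");
      }
    }
  }
}
```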

What will we encounter if we add a lot of nodes into the current cluster?

2009-08-11 Thread yang song
Dear all I'm sorry to disturb you. Our cluster has 200 nodes now. To increase its capacity, we hope to add 60 nodes into the current cluster. However, we don't know what will happen if we add so many nodes at the same time. Could you give me some tips and notes? During the proces

Re: What will we encounter if we add a lot of nodes into the current cluster?

2009-08-12 Thread yang song
u should rebalance the storage to avoid age > related surprises in how files are arranged in your cluster. > > Other than that, your addition should cause little in the way of surprises. > > On Tue, Aug 11, 2009 at 11:00 PM, yang song > wrote: > > > Dear all > >

Why the jobs are suspended when I add new nodes?

2009-08-16 Thread yang song
Hi, all When I add another 50 nodes into the current cluster (200 nodes) at the same time, the jobs run very smoothly at first. However, after a while, all the jobs are suspended and never continue. I have no choice but to remove the new nodes, and the jobs run smoothly again. Now I have to ad

Re: Why the jobs are suspended when I add new nodes?

2009-08-17 Thread yang song
The situation is that I can't find anything unusual in the logs. Maybe there is a lot of data to transfer since so many new nodes were added, and the jobs are waiting for it. 2009/8/17 Ted Dunning > Have you looked at the logs? > > On Sun, Aug 16, 2009 at 11:36 PM, yang song > wro

How to deal with "too many fetch failures"?

2009-08-18 Thread yang song
Hello, all I have met the problem "too many fetch failures" when I submit a big job (e.g. tasks>1). And I know this error occurs when several reducers are unable to fetch the given map output. However, I'm sure the slaves can contact each other. I feel puzzled and have no idea how to deal with i

Re: Hadoop for Independant Tasks not using Map/Reduce?

2009-08-18 Thread yang song
Hadoop streaming is a utility that allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer. I'm not familiar with it, but I think you can find something useful here http://hadoop.apache.org/common/docs/current/streaming.html 2009/8/19 Poole, Samuel

Re: How to deal with "too many fetch failures"?

2009-08-19 Thread yang song
I'm sorry, the version is 0.19.1 2009/8/19 Ted Dunning > Which version of hadoop are you running? > > On Tue, Aug 18, 2009 at 10:23 PM, yang song > wrote: > > > Hello, all > >I have met the problem "too many fetch failures" when I submit a big

How does hadoop deal with hadoop-site.xml?

2009-08-19 Thread yang song
Hello, everybody I feel puzzled about setting properties in hadoop-site.xml. Suppose I submit the job from machine A, and the JobTracker runs on machine B. So there are two hadoop-site.xml files. Now, I increase "mapred.reduce.parallel.copies" (e.g. to 10) on machine B since I want to make the copy phr

Re: Why the jobs are suspended when I add new nodes?

2009-08-19 Thread yang song
ith running > jobs > without issue - when the machines were correctly configured for the > cluster, > so this is known to work at least in the 0.18 release series (when I was > doing this operation). > > On Mon, Aug 17, 2009 at 6:56 AM, yang song > wrote: > > > The s

Re: How does hadoop deal with hadoop-site.xml?

2009-08-19 Thread yang song
licitly put in your code, > are drawn from the hadoop-site.xml file on the machine where the job is > submitted from. > > In general, I strongly recommend you save yourself some pain by keeping > your > configuration files as identical as possible :) > Good luck, > - Aaron >

Re: How to deal with "too many fetch failures"?

2009-08-19 Thread yang song
er using an updated 19 or moving to 20 as well. > > On Wed, Aug 19, 2009 at 5:19 AM, yang song > wrote: > > > I'm sorry, the version is 0.19.1 > > > > >

Re: How does hadoop deal with hadoop-site.xml?

2009-08-20 Thread yang song
Thank you very much! I'm clear about it now. 2009/8/20 Aaron Kimball > On Wed, Aug 19, 2009 at 8:39 PM, yang song > wrote: > > >Thank you, Aaron. I've benefited a lot. "per-node" means some settings > > associated with the node. e.g., "fs.defa
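
The practical upshot of this thread, sketched with the old JobConf API: per-job properties are read on the submitting side, so set them on the job's configuration (or in the client's hadoop-site.xml) rather than editing the file on the JobTracker machine:

```java
import org.apache.hadoop.mapred.JobConf;

public class SubmitSideConfig {

  public static JobConf build() {
    JobConf conf = new JobConf(SubmitSideConfig.class);
    // Takes effect because it is set on the client side at submit time;
    // changing hadoop-site.xml on the JobTracker machine would not help here.
    conf.setInt("mapred.reduce.parallel.copies", 10);
    return conf;
  }
}
```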

How to speed up the copy phase?

2009-08-23 Thread yang song
Hello, everyone When I submit a big job (e.g. maptasks:1, reducetasks:500), I find that the copy phase lasts for a long, long time. From the WebUI, the message "reduce > copy ( of 1 at 0.01 MB/s) >" tells me the transfer speed is just 0.01 MB/s. Is that a normal value? How can I solve

Re: Failed to install Hadoop on WinXP

2010-01-28 Thread Yang Li
e, it seems to be a JVM issue on windows. Hope this helps. Best regards, --- Li Yang, ext. 22056, User Technologies Development, Shanghai, China

Re: java.io.FileNotFoundException

2010-05-09 Thread Yang Li
ed on my laptop. Best regards, --- Li Yang, ext. 22056, User Technologies Development, Shanghai, China

Re: java.io.FileNotFoundException

2010-05-09 Thread Yang Li
e can run wordcount example at least, like `hadoop/bin/hadoop jar hadoop/hadoop-0.20.2-examples.jar wordcount input output`. I never tried a distributed cluster on windows; you'll definitely want to go Linux if you decide to do something more serious. Best regards, --- Li Yang,

RE: Configure Secondary Namenode

2010-08-18 Thread xiujin yang
"conf/secondarynamenode" and list the machine names in it. Best, Xiujin Yang. > Date: Wed, 18 Aug 2010 13:08:03 +0530 > From: adarsh.sha...@orkash.com > To: core-u...@hadoop.apache.org > Subject: Configure Secondary Namenode > > I am not able to find any command or paramet

RE: mapreduce doesn't work in my cluster

2010-08-18 Thread xiujin yang
Hi Shangan, Please check your /etc/hosts to make sure all machines are set there. Best, Yang. > Date: Wed, 18 Aug 2010 15:01:46 +0800 > From: shan...@corp.kaixin001.com > To: common-user@hadoop.apache.org > Subject: mapreduce doesn't work in my cluster > > my cluster consists

RE: mapreduce doesn't work in my cluster

2010-08-18 Thread xiujin yang
300 (the number of seconds between two periodic checkpoints). [shan...@vm153 conf]$ more hdfs-site.xml: dfs.replication = 2, dfs.hosts.exclude = /home/shangan/bin/hadoop-0.20.2/conf/exclude Best, Xiujin Yang > From: akasha...@gmail.com > Date: Wed, 18 Aug 2010 19:30:34

RE: mapreduce doesn't work in my cluster

2010-08-18 Thread xiujin yang
condary.http.address, dfs.datanode.address, the IP of which is 0.0.0.0, do I need to change them? No, the defaults will be OK. Best, Yang. > Date: Wed, 18 Aug 2010 17:16:42 +0800 > From: shan...@corp.kaixin001.com > To: common-user@hadoop.apache.org > Subject: Re: RE: mapreduce doesn

RE: HDFS efficiently concat&split files

2010-08-20 Thread xiujin yang
Hi For mapred it is easy to make the first job's output be the second job's input. You just need to point the second job's input path at the first job's output path. Xiujinyang > Date: Thu, 19 Aug 2010 19:11:53 +0200 > From: teodor.maci...@epfl.ch > To: common-user@hadoop.apache.org > Subject: Re: HDFS efficiently conca
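
A sketch of that chaining in a new-API driver: the first job's output path is handed to the second job as its input path. Paths are placeholders, and the default identity mapper/reducer is assumed for brevity:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoStageDriver {

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);
    Path intermediate = new Path(args[1]); // job 1 writes here, job 2 reads it
    Path output = new Path(args[2]);

    Job first = new Job(conf, "stage-1");
    FileInputFormat.addInputPath(first, input);
    FileOutputFormat.setOutputPath(first, intermediate);
    if (!first.waitForCompletion(true)) {
      System.exit(1);
    }

    Job second = new Job(conf, "stage-2");
    FileInputFormat.addInputPath(second, intermediate);
    FileOutputFormat.setOutputPath(second, output);
    System.exit(second.waitForCompletion(true) ? 0 : 1);
  }
}
```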

RE: how to add user identity authentication

2010-08-20 Thread xiujin yang
For 0.20.2 (which version do you use?), the UI only has view functions; it can't do admin actions (kill, stop, or so), except that the fair scheduler can be controlled. You need to develop a module based on Hadoop to implement user identity. It's not very difficult to add this function. X

multi-thread problem in map

2010-08-27 Thread xiujin yang
Hi All, Under Hadoop 0.20.2, according to the Map documentation, if you want to use multiple threads in the map phase, you can override the run method of Mapper. Has anyone used this successfully? Could someone give an example of using it? Thank you.

RE: multi-thread problem in map

2010-08-27 Thread xiujin yang
Hi Amareshwari, Thank you for your great help. I will check the source in 0.21 or trunk. Best Xiujin Yang. > From: amar...@yahoo-inc.com > To: common-user@hadoop.apache.org > Date: Fri, 27 Aug 2010 13:50:50 +0530 > Subject: Re: multi-thread problem in map > > You
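
The 0.21/trunk source referenced above includes MultithreadedMapper in the new API. A hedged sketch of wiring it up; WorkerMapper is a placeholder for your own map logic, and each pool thread runs its own mapper instance:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultithreadedMapJob {

  // Placeholder map logic; MultithreadedMapper creates one instance per thread.
  public static class WorkerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(value, new IntWritable(1));
    }
  }

  public static Job build(Configuration conf) throws IOException {
    Job job = new Job(conf, "multithreaded-map");
    job.setMapperClass(MultithreadedMapper.class);
    // The real mapper, and the size of the thread pool inside each map task.
    MultithreadedMapper.setMapperClass(job, WorkerMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 4);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    return job;
  }
}
```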

RE: cluster startup problem

2010-08-29 Thread xiujin yang
> Date: Mon, 30 Aug 2010 10:49:50 +0800 > From: lgpub...@yahoo.com.cn > Subject: cluster startup problem > To: common-user@hadoop.apache.org > > Hi all, > I am trying to configure and start a hadoop cluster on EC2. I got some > problems > here. > > > 1. Can I share hadoop code and its config

Does fair scheduler in Hadoop 0.20.2 support preemption or not?

2010-08-29 Thread xiujin yang
Best, Xiujin Yang

[UI] HTTP ERROR: 410 Failed to retrieve syslog log for task: attempt*

2010-10-11 Thread xiujin yang
009090239_0693_m_000001_0# ll total 28 -rw-r--r-- 1 root root 83 Oct 11 04:07 log.index -rw-r--r-- 1 root root 0 Oct 11 04:06 stderr -rw-r--r-- 1 root root 22440 Oct 11 04:07 stdout Could someone tell me what's the matter? Thank you in advance. Best Xiujin Yang.

found an inconsistent entry in 0.21 API

2011-01-13 Thread Yang Sun
I searched for the MultipleOutputs class on Google and found a 0.21 API documentation page that describes the class in the new version of hadoop. But the downloaded jar file doesn't include this class. There are also a few errors in the example on the MultipleOutputs API documentation page.
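
For context, the new-API usage that the 0.21 documentation page describes looks roughly like the sketch below; whether your downloaded jar includes the class depends on the release, as the poster found. The named output "summary" is a placeholder and must also be declared in the driver via MultipleOutputs.addNamedOutput:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class SplitOutputReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  private MultipleOutputs<Text, IntWritable> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, IntWritable>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    // Route the record to the named output instead of the job's default one.
    mos.write("summary", key, new IntWritable(sum));
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close(); // flush every named output
  }
}
```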