128MB block, and then tries to replicate it to all 3 replicas? Or is
every byte immediately copied to 3 replicas?
Thanks
Yang
ifferent, is the FB AvatarNode feature open-sourced?
If so, is it a better or worse solution than the current
BackupNode/CheckpointNode in the Apache distribution?
Thanks
yang
-Xmx since my box is ultimately bounded in memory
capacity)
thanks
Yang
as shuffle
> buffer.
>
> On Thu, May 10, 2012 at 2:50 AM, Yang wrote:
>
>> it seems that if I put too many records under the same mapper output
>> key, all these records are grouped into one key on one reducer,
>>
>> and then the reducer runs out of memory.
>>
> If you hold the provided values/keys in memory in your
> implementation, it can easily cause an OOME if not handled properly.
> The reducer by itself does read the values off a sorted file on the
> disk and doesn't cache the whole group in memory.
>
> On Thu, May 10, 2012 at 12:20 AM, Yang wrote:
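A minimal sketch of the distinction described above, assuming the 0.20-era
org.apache.hadoop.mapreduce API (the class name is just illustrative):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The values Iterable streams records off the sorted on-disk data one at a
// time, so summing as you go uses constant memory. Building a
// List<IntWritable> out of 'values' is what typically blows the heap when one
// key has millions of records (and the framework reuses the value object, so
// caching would need defensive copies anyway).
public class StreamingSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable((int) sum));
  }
}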
d basically fed
empty input to 2 of the mappers
Thanks
Yang
On Wed, Jul 11, 2012 at 10:00 PM, Harsh J wrote:
> Yang,
>
> No, those three are individual task attempts.
>
> This is how you may generally dissect an attempt ID when reading it:
>
> attempt_201207111710_0024_m_0
TT, and multiple mappers are getting run at the same time, all
> trying to bind to the same port. Limit your TT's max map tasks to 1
> when you're relying on such techniques to debug, or use the
> LocalJobRunner/Apache MRUnit instead.
>
> On Thu, Jul 12, 2012 at 9:16 AM, Yang
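For reference, a minimal driver skeleton for the LocalJobRunner route
mentioned above, assuming the 0.20-era property names (the class name is a
placeholder and the usual mapper/reducer/path setup is omitted):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class LocalDebugRun {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    conf.set("mapred.job.tracker", "local");  // run everything in one JVM via LocalJobRunner
    conf.set("fs.default.name", "file:///");  // read and write the local filesystem
    // ... configure mapper, reducer, input and output paths as usual, then:
    JobClient.runJob(conf);
  }
}

With everything in a single JVM, a mapper that binds a debug port is only
started once, so there is no port clash.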
simple pig test
Yang
On Wed, Jul 11, 2012 at 10:15 PM, Harsh J wrote:
> Er, sorry I meant mapred.map.tasks = 1
>
> On Thu, Jul 12, 2012 at 10:44 AM, Harsh J wrote:
> > Try passing mapred.map.tasks = 0 or set a higher min-split size?
> >
> > On Thu, Jul 12, 20
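A rough sketch of the two suggestions above, using the 0.20-era property
names; the map-task count is only a hint, while a larger minimum split size
directly reduces how many splits (and therefore mappers) are created:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class FewerMappersRun {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    conf.setNumMapTasks(1);                                     // i.e. mapred.map.tasks=1, a hint only
    conf.setLong("mapred.min.split.size", 512L * 1024 * 1024);  // larger minimum split => fewer splits/mappers
    // ... configure mapper, reducer, input and output paths as usual, then:
    JobClient.runJob(conf);
  }
}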
I had also encountered the same problem a few days ago.
Does anyone have another method?
2011/2/24 maha
> Based on the Java function documentation, it gives approximately the
> available memory, so I need to tweak it with other functions.
> So it's a Java issue, not a Hadoop one.
>
> Thanks anyways,
> Maha
Thanks a lot!
Yang Xiaoliang
2011/2/25 maha
> Hi Yang,
>
> The problem could be solved using the following link:
> http://www.roseindia.net/java/java-get-example/get-memory-usage.shtml
> You need to use other memory managers like the Garbage collector and its
> finalize
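A small self-contained sketch of that approach, assuming only the standard
java.lang.Runtime calls (the figures are approximations, which is why a gc()
hint beforehand steadies them a little):

public class MemoryEstimate {
  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    System.gc();                                      // best-effort hint only
    long max  = rt.maxMemory();                       // the -Xmx ceiling
    long used = rt.totalMemory() - rt.freeMemory();   // currently allocated and in use
    System.out.println("max=" + max + " used=" + used
        + " roughly available=" + (max - used));
  }
}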
Chukwa is waiting on an official release of Hadoop and HBase that
work together. In Chukwa trunk, Chukwa is using HBase as data
storage, and pig+hbase for data analytics. Unfortunately, the
Hadoop security release branch and Hadoop trunk are both broken for
HBase. Hence, Chukwa is in hibernation
primarily used for analytics or log aggregation. I thought it
was the latter, but it seems more and more it's like the former.
On 3/21/11 8:27 AM, Eric Yang wrote:
> Chukwa is waiting on a official release of Hadoop and HBase which
> works together. In Chukwa trunk, Chukwa is using HBase a
Hi all,
I am benchmarking a Hadoop cluster with the hadoop-*-test.jar TestDFSIO,
but the following error is returned:
File /usr/hadoop-0.20.2/libhdfs/libhdfs.so.1 does not exist.
How can I solve this problem?
Thanks!
hello,
Subscribe to List
thx
Install Hadoop on your local machine, copy the configuration files from the
remote Hadoop cluster server to your local machine (including the hosts file),
and then you can just submit a *.jar locally as before.
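The same idea can also be expressed in code instead of copied config files; a
hedged sketch with the old mapred API, where the host names, ports and the
identity mapper/reducer are placeholders for your own setup:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class RemoteSubmit {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(RemoteSubmit.class);
    conf.set("fs.default.name", "hdfs://remote-namenode:9000");  // placeholder NameNode address
    conf.set("mapred.job.tracker", "remote-jobtracker:9001");    // placeholder JobTracker address
    conf.setJobName("submitted-from-local-box");
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);
    conf.setMapperClass(IdentityMapper.class);    // stand-ins; use your real mapper/reducer
    conf.setReducerClass(IdentityReducer.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

Copying the cluster's config files onto the classpath achieves the same thing
without hard-coding the addresses.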
2011/10/5 oleksiy
>
> Hello,
>
> I'm trying to find a way how to run hadoop MapReduce ap
Hi,
Hadoop neither reads one line at a time, nor fetches dfs.block.size worth of
lines into a buffer.
Actually, for TextInputFormat, it reads io.file.buffer.size bytes of text
into a buffer each time;
this can be seen in the Hadoop source file LineReader.java.
2011/10/5 Mark question
> Hello,
>
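A rough illustration of that buffering idea (this is not the real LineReader
source, just a simplified sketch that assumes ASCII input): fill a fixed-size
buffer from the stream, carve lines out of it, and refill only when the buffer
is exhausted; io.file.buffer.size plays the role of the buffer size in the
real code.

import java.io.IOException;
import java.io.InputStream;

public class BufferedLineScanner {
  private final InputStream in;
  private final byte[] buffer;   // analogous to an io.file.buffer.size-sized buffer
  private int length = 0;        // number of valid bytes currently in the buffer
  private int pos = 0;           // next unread position in the buffer

  public BufferedLineScanner(InputStream in, int bufferSize) {
    this.in = in;
    this.buffer = new byte[bufferSize];
  }

  /** Returns the next line without its trailing '\n', or null at end of stream. */
  public String readLine() throws IOException {
    StringBuilder line = new StringBuilder();
    while (true) {
      if (pos == length) {            // buffer exhausted: one bulk read refills it
        length = in.read(buffer);
        pos = 0;
        if (length == -1) {
          return line.length() == 0 ? null : line.toString();
        }
      }
      byte b = buffer[pos++];
      if (b == '\n') {
        return line.toString();
      }
      line.append((char) b);
    }
  }
}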
Hi,
Currently, I'm trying to rewrite an algorithm into a parallel form. Since
the algorithm depends on lots of third-party DLLs, I was wondering whether I
could call the DLLs written in C++ from Hadoop MapReduce by using JNI?
Thanks.
--
YANG, Lin
Write a Java program which will dump data from the MySQL cluster and save it
into HDFS at the same time.
Run it on the namenode. I assume the namenode should be able to connect to
the MySQL gateway.
Will it work?
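A hedged sketch of that program: stream rows out of MySQL over JDBC and write
them straight into an HDFS file so nothing is staged on local disk. The JDBC
URL, credentials, query and HDFS path are all placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MysqlToHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up fs.default.name from configs on the classpath
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.create(new Path("/data/mysql/dump.tsv"));  // placeholder path

    Class.forName("com.mysql.jdbc.Driver");
    Connection conn = DriverManager.getConnection(
        "jdbc:mysql://mysql-gateway:3306/mydb", "user", "password");       // placeholders
    Statement st = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    st.setFetchSize(Integer.MIN_VALUE);   // Connector/J hint: stream rows instead of buffering the result
    ResultSet rs = st.executeQuery("SELECT id, name FROM mytable");         // placeholder query

    while (rs.next()) {
      out.writeBytes(rs.getLong(1) + "\t" + rs.getString(2) + "\n");
    }

    rs.close();
    st.close();
    conn.close();
    out.close();
  }
}

Whether the namenode can reach the MySQL gateway is purely a network question;
the program itself can run on any machine that has the Hadoop configs.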
On Thu, Aug 6, 2009 at 12:02 PM, Min Zhou wrote:
> Hi Aaron,
>
> We could not run mysqldump on the
Dear all
I'm sorry to disturb you.
Our cluster has 200 nodes now. In order to improve its capacity, we hope
to add 60 nodes to the current cluster. However, we don't know what
will happen if we add so many nodes at the same time. Could you give me some
tips and notes? During the proces
u should rebalance the storage to avoid age
> related surprises in how files are arranged in your cluster.
>
> Other than that, your addition should cause little in the way of surprises.
>
> On Tue, Aug 11, 2009 at 11:00 PM, yang song
> wrote:
>
> > Dear all
> >
Hi, all
When I add another 50 nodes to the current cluster (200 nodes) at the
same time, the jobs run very smoothly at first. However, after a while, all
the jobs are suspended and never continue.
I have no choice but to remove the new nodes, and then the jobs run smoothly
again. Now I have to ad
The situation is that I can't find anything unusual in the logs.
Maybe there is a lot of data to transfer, since there are so many new nodes,
and the jobs are waiting for it
2009/8/17 Ted Dunning
> Have you looked at the logs?
>
> On Sun, Aug 16, 2009 at 11:36 PM, yang song
> wro
Hello, all
I have met the problem "too many fetch failures" when I submit a big
job (e.g. tasks>1). I know this error occurs when several reducers
are unable to fetch a given map output. However, I'm sure the slaves can
contact each other.
I feel puzzled and have no idea how to deal with i
Hadoop Streaming is a utility that allows you to create and run Map/Reduce
jobs with any executable or script as the mapper and/or the reducer. I'm not
familiar with it, but I think you can find something useful here:
http://hadoop.apache.org/common/docs/current/streaming.html
2009/8/19 Poole, Samuel
I'm sorry, the version is 0.19.1
2009/8/19 Ted Dunning
> Which version of hadoop are you running?
>
> On Tue, Aug 18, 2009 at 10:23 PM, yang song
> wrote:
>
> > Hello, all
> >I have met the problem "too many fetch failures" when I submit a big
Hello, everybody
I feel puzzled about setting properties in hadoop-site.xml.
Suppose I submit the job from machine A, and JobTracker runs on machine
B. So there are two hadoop-site.xml files. Now, I increase
"mapred.reduce.parallel.copies"(e.g. 10) on machine B since I want to make
copy phr
ith running
> jobs
> without issue - when the machines were correctly configured for the
> cluster,
> so this is known to work at least in the 0.18 release series (when I was
> doing this operation).
>
> On Mon, Aug 17, 2009 at 6:56 AM, yang song
> wrote:
>
> > The s
licitly put in your code,
> are drawn from the hadoop-site.xml file on the machine where the job is
> submitted from.
>
> In general, I strongly recommend you save yourself some pain by keeping
> your
> configuration files as identical as possible :)
> Good luck,
> - Aaron
>
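In sketch form, the practical consequence of the advice above (property names
as in 0.20; the driver boilerplate is omitted): a value set explicitly on the
job's configuration travels with the job and wins over either machine's
hadoop-site.xml.

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitWithParallelCopies {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();                      // loads hadoop-site.xml from the submitting machine
    conf.setInt("mapred.reduce.parallel.copies", 10);  // explicit setting overrides both site files
    // ... configure mapper, reducer, input and output paths as usual, then:
    JobClient.runJob(conf);
  }
}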
er using an updated 19 or moving to 20 as well.
>
> On Wed, Aug 19, 2009 at 5:19 AM, yang song
> wrote:
>
> > I'm sorry, the version is 0.19.1
> >
> >
>
Thank you very much! I'm clear about it now.
2009/8/20 Aaron Kimball
> On Wed, Aug 19, 2009 at 8:39 PM, yang song
> wrote:
>
> >Thank you, Aaron. I've benefited a lot. "per-node" means some settings
> > associated with the node. e.g., "fs.defa
Hello, everyone
When I submit a big job (e.g. maptasks:1, reducetasks:500), I find that
the copy phase lasts for a very long time. From the WebUI, the message
"reduce > copy ( of 1 at 0.01 MB/s) >" tells me the transfer speed
is just 0.01 MB/s. Is that a normal value? How can I solve
e, it seems to be a JVM issue on windows. Hope this helps.
Best regards,
---
Li Yang, ext. 22056, User Technologies Development, Shanghai, China
Yura Taras, 2010-01-28 00:41, to common-user@hadoop.apache.org, subject: Fail
ed on my laptop.
Best regards,
---
Li Yang, ext. 22056, User Technologies Development, Shanghai, China
Carlos Eduardo Moreira dos Santos (cem...@gmail.com), 2010-05-02 12:42, to common-user, subject: java.io.FileNo
e can run the wordcount example at least, like `hadoop/bin/hadoop jar
hadoop/hadoop-0.20.2-examples.jar wordcount input output`. I never tried
a distributed cluster on Windows; you'll definitely want to go with Linux if
you decide to do something more serious.
Best regards,
---
Li Yang,
"conf/secondarynamenode" and list the machine name in it.
Best,
Xiujin Yang.
> Date: Wed, 18 Aug 2010 13:08:03 +0530
> From: adarsh.sha...@orkash.com
> To: core-u...@hadoop.apache.org
> Subject: Configure Secondary Namenode
>
> I am not able to find any command or paramet
Hi Shangan,
Please check your /etc/hosts to make sure all machines are set there.
Best,
Yang.
> Date: Wed, 18 Aug 2010 15:01:46 +0800
> From: shan...@corp.kaixin001.com
> To: common-user@hadoop.apache.org
> Subject: mapreduce doesn't work in my cluster
>
> my cluster consists
300 (the number of seconds between two periodic checkpoints)
[shan...@vm153 conf]$ more hdfs-site.xml
  dfs.replication = 2
  dfs.hosts.exclude = /home/shangan/bin/hadoop-0.20.2/conf/exclude
Best,
Xiujin Yang
> From: akasha...@gmail.com
> Date: Wed, 18 Aug 2010 19:30:34
condary.http.address, dfs.datanode.address, the IP of which is
0.0.0.0. Do I need to change them?
No, the defaults will be OK.
Best,
Yang.
> Date: Wed, 18 Aug 2010 17:16:42 +0800
> From: shan...@corp.kaixin001.com
> To: common-user@hadoop.apache.org
> Subject: Re: RE: mapreduce doesn
Hi
For MapReduce it is easy to make the first job's output the second job's
input.
You just need to point the second job at that path (see the sketch below
this message).
Xiujinyang
> Date: Thu, 19 Aug 2010 19:11:53 +0200
> From: teodor.maci...@epfl.ch
> To: common-user@hadoop.apache.org
> Subject: Re: HDFS efficiently conca
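A hedged driver sketch of that chaining with the old mapred API (paths and
the per-stage job setup are placeholders):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TwoStageDriver {
  public static void main(String[] args) throws Exception {
    Path stageOneOut = new Path("/tmp/stage1-out");      // placeholder intermediate directory

    JobConf first = new JobConf();
    // ... mapper/reducer/formats for stage one ...
    FileInputFormat.setInputPaths(first, new Path(args[0]));
    FileOutputFormat.setOutputPath(first, stageOneOut);
    JobClient.runJob(first);                             // blocks until stage one finishes

    JobConf second = new JobConf();
    // ... mapper/reducer/formats for stage two ...
    FileInputFormat.setInputPaths(second, stageOneOut);  // first job's output is second job's input
    FileOutputFormat.setOutputPath(second, new Path(args[1]));
    JobClient.runJob(second);
  }
}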
For 0.20.2 (which version do you use?),
the UI only has display functions; it can't do administration (kill, stop,
and so on), except that the fair scheduler can be controlled.
You would need to develop a module on top of Hadoop to implement user identity.
It's not very difficult to add this function.
X
Hi All,
Under Hadoop 0.20.2, according to the Mapper documentation,
if you want to use multiple threads in the map, you can override the run
method of Mapper.
Has anyone used it successfully? Can someone give an example of using it?
Thank you.
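Not an authoritative answer, but here is a hedged sketch of overriding run()
with the new 0.20 mapreduce API. Context is not thread-safe, so all reading
and writing stays on the calling thread and only the per-record work (the
transform method here is a placeholder) runs on the pool; 0.21/trunk ship a
MultithreadedMapper that does this in a supported way.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ThreadedMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int THREADS = 4;

  // Placeholder for the expensive per-record computation you want to parallelize.
  private static String transform(String line) {
    return line.trim().toLowerCase();
  }

  @Override
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    ExecutorService pool = Executors.newFixedThreadPool(THREADS);
    try {
      List<Future<String>> batch = new ArrayList<Future<String>>();
      while (context.nextKeyValue()) {
        final String line = context.getCurrentValue().toString();
        batch.add(pool.submit(new Callable<String>() {
          public String call() { return transform(line); }
        }));
        if (batch.size() == THREADS) {
          drain(batch, context);   // write results on the calling thread only
        }
      }
      drain(batch, context);       // flush any leftover records
    } finally {
      pool.shutdown();
      cleanup(context);
    }
  }

  private void drain(List<Future<String>> batch, Context context)
      throws IOException, InterruptedException {
    for (Future<String> f : batch) {
      try {
        context.write(new Text(f.get()), new IntWritable(1));
      } catch (ExecutionException e) {
        throw new IOException(e.getCause());
      }
    }
    batch.clear();
  }
}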
Hi Amareshwari,
Thank you for your great help.
I will check the source in 0.21 or trunk.
Best
Xiujin Yang.
> From: amar...@yahoo-inc.com
> To: common-user@hadoop.apache.org
> Date: Fri, 27 Aug 2010 13:50:50 +0530
> Subject: Re: multi-thread problem in map
>
> You
> Date: Mon, 30 Aug 2010 10:49:50 +0800
> From: lgpub...@yahoo.com.cn
> Subject: cluster startup problem
> To: common-user@hadoop.apache.org
>
> Hi all,
> I am trying to configure and start a hadoop cluster on EC2. I got some
> problems
> here.
>
>
> 1. Can I share hadoop code and its config
Best,
Xiujin Yang
009090239_0693_m_000001_0# ll
total 28
-rw-r--r-- 1 root root    83 Oct 11 04:07 log.index
-rw-r--r-- 1 root root     0 Oct 11 04:06 stderr
-rw-r--r-- 1 root root 22440 Oct 11 04:07 stdout
Could someone tell me what's the matter?
Thank you in advance.
Best
Xiujin Yang.
I searched for the MultipleOutputs class on Google and found a 0.21 API
documentation page that describes the class in the new version of Hadoop.
But the downloaded jar file doesn't include this class. There are also a few
errors in the example on the MultipleOutputs API documentation page.