Hey there,
I've set up rack awareness on my Hadoop cluster with replication 3. I have 2
racks, and each contains 50% of the nodes.
I can see that the blocks are spread across the 2 racks; the problem is that all
nodes from one rack are storing 2 replicas and the nodes of the other rack
just one. If I
Jobs run on the whole cluster. After rebalancing, everything is properly
allocated. Then I start running jobs using all the slots of the 2 racks, and
the problem starts to happen.
Maybe I'm missing something. When using rack awareness, do you have to
specify to the jobs to run in slots from both
When you rebalance, the block is fully written, so writer locality does
not have to be taken into account (there is no writer anymore); hence the
balancer can spread blocks across the racks. That's why job asymmetry was the
easy guess. What's your Hadoop version, by the way? I remember a bug around rack
I'm on cdh3u4 (0.20.2); gonna try to read a bit about this bug.
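(For reference, the rebalancing step discussed above is the HDFS balancer; a
minimal invocation, with -threshold as the allowed per-node utilization
deviation in percent:)

  hadoop balancer -threshold 10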
Hi Rohit,
Did you succeed in running an R script from an Oozie action?
If so, can you share your action configuration?
I am trying to figure out how to run an R script from Oozie.
I'm not aware of a bug in 0.20.2 that would not honor rack
awareness, but have you done the two checks below as well?
1. Ensuring the JT has the same rack awareness script and configuration,
so it can use it for scheduling, and,
2. Checking if the map and reduce tasks are being evenly spread
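For check 1, a minimal sketch of a topology script plus the property that wires
it in (IP ranges and paths are hypothetical; the same script and setting must
be visible to both the NN and the JT):

  #!/bin/bash
  # Hypothetical topology script: prints one rack path per host/IP argument.
  while [ $# -gt 0 ]; do
    case "$1" in
      10.0.1.*) echo "/rack1" ;;
      10.0.2.*) echo "/rack2" ;;
      *)        echo "/default-rack" ;;
    esac
    shift
  done

And in core-site.xml:

  <property>
    <name>topology.script.file.name</name>
    <value>/etc/hadoop/conf/topology.sh</value>
  </property>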
Rack awareness is an artificial concept,
meaning you can define where a node is regardless of its real position in the
rack.
Going from memory, and it's probably been changed in later versions of the
code...
Isn't the replication... copy on node 1, copy on the same rack, third copy on a
different rack?
For 3 replicas, the placement sequence is: 1st on the writer's local node, 2nd
on a node in a remote rack, 3rd on another node in the same rack as the 2nd
replica.
There can be special cases, like the disk being full on the 1st node, or no node
being available in the 2nd replica's rack, and Hadoop already takes care of
them well.
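A toy sketch of that order (this is not Hadoop's actual
BlockPlacementPolicyDefault, and it ignores the special cases; all names are
made up):

  import java.util.ArrayList;
  import java.util.List;

  class PlacementSketch {
      static class Node {
          final String name, rack;
          Node(String name, String rack) { this.name = name; this.rack = rack; }
      }

      // Picks 3 replica targets in the order described above.
      static List<Node> chooseTargets(Node writer, List<Node> cluster) {
          List<Node> targets = new ArrayList<Node>();
          targets.add(writer);                       // 1st: the writer's own node
          Node second = null;
          for (Node n : cluster)                     // 2nd: any node on a different rack
              if (!n.rack.equals(writer.rack)) { second = n; break; }
          targets.add(second);
          for (Node n : cluster)                     // 3rd: another node on the 2nd's rack
              if (n != second && n.rack.equals(second.rack)) { targets.add(n); break; }
          return targets;
      }
  }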
For readability, I haven't posted the code, output, etc. in this mail - please
check the thread below:
http://stackoverflow.com/questions/18354664/spring-data-hadoop-connectivity
I'm trying to connect to a remote Hadoop (1.1.2) cluster from my local Windows
machine via Spring Data (later,
Hello,
Here is the newbie question of the day.
For one of my use cases, I want to use hadoop map reduce without HDFS.
Here, I will have a text file containing a list of file names to process.
Assume that I have 10 lines (10 files to process) in the input text file
and I wish to generate 10 map
Hi all,
Hadoop 2.0.0-cdh4.3.0 has no hadoop-env.sh; where can I tune JVM options?
Hi all,
I have a big problem!
I tried two clusters. One cluster was upgraded from CDH3 to CDH4; I
changed hadoop-env.sh and restarted the node, and the heap size changed.
But on another, newly installed CDH4.3 cluster, I find hadoop-env.sh has no
effect. Why? How can I change the heap size on the new cluster?
Create a new file. It's a bug and has been resolved in 2.0.2-alpha:
https://issues.apache.org/jira/browse/HADOOP-8287
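A minimal sketch of the file to create (path and values are assumptions;
adjust for your layout):

  # /etc/hadoop/conf/hadoop-env.sh -- absent from the packaged install, per HADOOP-8287
  export HADOOP_HEAPSIZE=2048              # daemon heap size, in MB
  export HADOOP_NAMENODE_OPTS="-Xmx2g"     # example per-daemon override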
On Thu, Aug 22, 2013 at 2:44 PM, ch huang justlo...@gmail.com wrote:
Hi all,
I have a big problem!
I tried two clusters. One cluster was upgraded from CDH3 to CDH4; I
-- Forwarded message --
From: rab ra rab...@gmail.com
Date: 22 Aug 2013 15:14
Subject: Create a file in local file system in map method
To: us...@hadoop.apache.org us...@hadoop.apache.org
Hi,
I am not able to create a file in my local file system from my map method.
Is there a way
Can you share what error you run into in trying to write to a local
filesystem location from within a map task?
Note that the map tasks will run as the same user as the TaskTracker
daemon in insecure environments, or as the job submitting user in
secure environments. The location you're writing
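A minimal sketch of such a write, assuming the task's user may write under
/tmp on the slave node (class name and path are hypothetical):

  import java.io.File;
  import java.io.FileWriter;
  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  // Writes each input line to a local file on whichever node runs the task.
  public class LocalWriteMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void map(LongWritable key, Text value, Context ctx)
              throws IOException, InterruptedException {
          File out = new File("/tmp", "map-" + ctx.getTaskAttemptID() + ".txt");
          FileWriter w = new FileWriter(out, true);    // plain java.io hits the local FS
          try {
              w.write(value.toString());
              w.write('\n');
          } finally {
              w.close();
          }
          ctx.write(new Text(out.getAbsolutePath()), value);
      }
  }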
If you don't plan to use HDFS, what kind of shared file system are you going
to use between the cluster nodes? NFS? For what you want to do, even though it
doesn't make too much sense, you first need to solve this shared file system
problem.
Second, if you want to process the files file by file,
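For the earlier "10 lines, 10 map tasks" goal, one common approach is
NLineInputFormat; a job-setup sketch, assuming an existing Configuration conf:

  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

  // One map task per line of the file-list input, so 10 lines => 10 maps.
  Job job = new Job(conf, "process-file-list");
  job.setInputFormatClass(NLineInputFormat.class);
  NLineInputFormat.setNumLinesPerSplit(job, 1);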
Yes.
It was a permission issue. I could fix it now.
Thanks
On 22 Aug 2013 15:49, Harsh J ha...@cloudera.com wrote:
Can you share what error you run into in trying to write to a local
filesystem location from within a map task?
Note that the map tasks will run as the same user as the
Just because I always appreciate it when someone posts the answer to
their own question:
We have some Java that does
// compress an existing OutputStream 'out' with Hadoop's bzip2 codec
BZip2Codec bz2 = new BZip2Codec();
CompressionOutputStream cout = bz2.createOutputStream(out);
for compression.
We just wrote another version that does
Friends, does anyone know how WebHDFS internally works, or how it uses the
Jetty server within Hadoop?
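In short, WebHDFS is a REST API served from the NameNode's embedded web server
(Jetty, in releases of this era), with reads and writes redirected to the
DataNodes. For example (host and path are hypothetical):

  curl -i "http://namenode:50070/webhdfs/v1/tmp/file.txt?op=GETFILESTATUS"
  curl -i -L "http://namenode:50070/webhdfs/v1/tmp/file.txt?op=OPEN"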
Pls ask CDH lists. Thanks.
On Aug 22, 2013, at 1:39 AM, ch huang justlo...@gmail.com wrote:
Hi all,
Hadoop 2.0.0-cdh4.3.0 has no hadoop-env.sh; where can I tune JVM options?
I think you have to use Cloudera Manager. Select the Hadoop service and set
it there.
Kim
On Thu, Aug 22, 2013 at 11:28 AM, Arun C Murthy a...@hortonworks.com wrote:
Pls ask CDH lists. Thanks.
On Aug 22, 2013, at 1:39 AM, ch huang justlo...@gmail.com wrote:
Hi all,
Hadoop 2.0.0-cdh4.3.0 has
Following up on this, how exactly does one *install* the jar(s) for an
auxiliary service? Can it be shipped out with the LocalResources of an AM?
MapReduce's aux-service is presumably installed with Hadoop and is just sitting
there in the right place, but if one wanted to make a whole new
Auxiliary services are essentially administrator-configured services, so they
have to be set up at install time - before the NM is started.
+Vinod
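For reference, this is how MapReduce's own shuffle aux-service is wired into
the NM via yarn-site.xml in this era's releases; a new aux-service would follow
the same pattern, with its jar on the NM's classpath:

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>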
On Thu, Aug 22, 2013 at 1:38 PM, John Lilley john.lil...@redpoint.net wrote:
Following up on this, how exactly does one *install* the jar(s) for
Hi all,
I use CDH4.3 YARN; its default scheduler is the Capacity Scheduler, and
I want to switch to the Fair Scheduler. But I see the doc says: *NOTE:* The
Fair Scheduler implementation is currently under development and should be
considered experimental. I do not know if it's the time to use it in
Hi, I have a question about the Fair Scheduler.
The doc says: When there is a single app running, that app uses the entire
cluster. When other apps are submitted, resources that free up are assigned
to the new apps, so that each app gets roughly the same amount of
resources.
Suppose I have only a big app
Here is the link to the doc:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
- yarn.scheduler.fair.minimum-allocation-mb
  The smallest container size the scheduler can allocate, in MB of
  memory.
- yarn.scheduler.fair.minimum-allocation-mb
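(The duplicated name above is presumably the doc bug reported later in this
thread; assuming the second entry was meant to be the maximum, a yarn-site.xml
sketch with hypothetical values:)

  <property>
    <name>yarn.scheduler.fair.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.maximum-allocation-mb</name>
    <value>8192</value>
  </property>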
Hi,
Here is a draft of the paper describing our running the Hadoop NN on
persistent memory. Two promises are: (1) it survives power failure, (2) it has
no limitation on memory size. Your critique is welcome.
Moving to cdh-user,
Hi,
The Fair Scheduler in 4.3 is stable and is recommended by Cloudera.
-Sandy
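For the record, switching the RM to it is a one-property change in
yarn-site.xml:

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>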
On Aug 22, 2013, at 6:20 PM, ch huang justlo...@gmail.com wrote:
Hi all,
I use CDH4.3 YARN; its default scheduler is the Capacity Scheduler, and I
want to switch to the Fair Scheduler, but I
Hi,
It should be fixed in the new version (2.1.0-beta); please refer to:
https://issues.apache.org/jira/browse/YARN-646
Thanks,
Junping
- Original Message -
From: ch huang justlo...@gmail.com
To: user@hadoop.apache.org
Sent: Friday, August 23, 2013 10:30:42 AM
Subject: find a doc bug
Hi,
You can definitely run the Driver (ClassWithMain) against a remote Hadoop
cluster from, say, Eclipse by following the steps below:
a) Have the jar (Some.jar) in the classpath of your project in Eclipse.
b) Ensure you have set both the Namenode and Job Tracker information either
in core-site.xml
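A minimal sketch of setting the same two things programmatically (hostnames
and ports are hypothetical; the property names are the 1.x ones):

  import org.apache.hadoop.conf.Configuration;

  Configuration conf = new Configuration();
  conf.set("fs.default.name", "hdfs://namenode-host:9000");   // NameNode
  conf.set("mapred.job.tracker", "jobtracker-host:9001");     // JobTracker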