rack awareness unexpected behaviour

2013-08-22 Thread Marc Sturlese
Hey there, I've set up rack awareness on my Hadoop cluster with replication factor 3. I have 2 racks, each containing 50% of the nodes. I can see that the blocks are spread across the 2 racks; the problem is that the nodes in one rack store 2 replicas each while the nodes in the other rack store just one. If I

Re: rack awareness unexpected behaviour

2013-08-22 Thread Marc Sturlese
Jobs run on the whole cluster. After rebalancing, everything is properly allocated. Then I start running jobs using all the slots of the 2 racks, and the problem starts to happen. Maybe I'm missing something: when using rack awareness, do you have to specify that the jobs should run in slots from both

Re: rack awareness unexpected behaviour

2013-08-22 Thread Nicolas Liochon
When you rebalance, the block is fully written, so writer locality does not have to be taken into account (there is no writer anymore); hence it can rebalance across the racks. That's why job asymmetry was the easy guess. What's your Hadoop version, by the way? I remember a bug around rack

Re: rack awareness unexpected behaviour

2013-08-22 Thread Marc Sturlese
I'm on cdh3u4 (0.20.2); gonna try to read a bit on this bug. -- View this message in context: http://lucene.472066.n3.nabble.com/rack-awareness-unexpected-behaviour-tp4086029p4086049.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

Re: Submit RHadoop job using Oozie in Cloudera Manager

2013-08-22 Thread guydou
Hi Rohit, did you succeed in running an R script from an Oozie action? If so, can you share your action configuration? I am trying to figure out how to run an R script from Oozie.

Re: rack awareness unexpected behaviour

2013-08-22 Thread Harsh J
I'm not aware of a bug in 0.20.2 that would not honor rack awareness, but have you done the two checks below as well? 1. Ensuring the JT has the same rack awareness script and configuration so it can use them for scheduling, and 2. Checking whether the map and reduce tasks are being evenly spread

Re: rack awareness unexpected behaviour

2013-08-22 Thread Michel Segel
Rack awareness is an artificial concept, meaning you can define where a node is regardless of its real position in the rack. Going from memory, and it's probably been changed in later versions of the code... isn't the replication: copy on node 1, copy on the same rack, third copy on a different rack?

Re: rack awareness unexpected behaviour

2013-08-22 Thread Jun Ping Du
For 3 replicas, the replication sequence is: 1st on the local node of the writer, 2nd on a node in a remote rack, 3rd on a node in the same rack as the 2nd replica. There could be some special cases, like the disk being full on the 1st node or no node being available in the 2nd replica's rack, and Hadoop already takes care of these well.
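That sequence can be sketched as a toy simulation; this is an illustration with invented rack and node names, not HDFS source, and it ignores the special cases mentioned above:

```python
import random

def place_replicas(racks, writer_node, writer_rack):
    """Toy model of default 3-replica placement: 1st on the writer's node,
    2nd on a node in a different rack, 3rd on another node in that same rack.
    racks: dict mapping rack id -> list of node ids."""
    first = (writer_rack, writer_node)
    remote_rack = random.choice([r for r in racks if r != writer_rack])
    second, third = random.sample(racks[remote_rack], 2)
    return [first, (remote_rack, second), (remote_rack, third)]

racks = {"rackA": ["n1", "n2", "n3"], "rackB": ["n4", "n5", "n6"]}
placement = place_replicas(racks, "n1", "rackA")
```

Under this rule every block ends up with two replicas on one rack and one on the other, which matches the 2/1 split observed at the top of the thread.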

Hadoop - impersonation doubts/issues while accessing from remote machine

2013-08-22 Thread Omkar Joshi
For readability, I haven't posted the code, output, etc. in this mail - please check the thread below: http://stackoverflow.com/questions/18354664/spring-data-hadoop-connectivity I'm trying to connect to a remote Hadoop (1.1.2) cluster from my local Windows machine via Spring Data (later,

running map tasks in remote node

2013-08-22 Thread rab ra
Hello, here is the newbie question of the day. For one of my use cases, I want to use Hadoop MapReduce without HDFS. Here, I will have a text file containing a list of file names to process. Assume that I have 10 lines (10 files to process) in the input text file and I wish to generate 10 map

Hadoop 2.0.0-cdh4.3.0 has no hadoop-env.sh

2013-08-22 Thread ch huang
hi, all: Hadoop 2.0.0-cdh4.3.0 has no hadoop-env.sh; where can I tune JVM options?

hadoop-env.sh file does not exist in Hadoop 2.0 anymore, and where can I tune the JVM HEAPSIZE?

2013-08-22 Thread ch huang
hi, all: I have a big problem!! I tried two clusters. One cluster was upgraded from CDH3 to CDH4; I changed hadoop-env.sh and restarted the node, and the heap size changed. But on another, newly installed CDH4.3 cluster, I find hadoop-env.sh has no effect. Why? How can I change the heap size on the new cluster?

Re: hadoop-env.sh file does not exist in Hadoop 2.0 anymore, and where can I tune the JVM HEAPSIZE?

2013-08-22 Thread bharath vissapragada
Create a new file. It's a bug and has been resolved in 2.0.2-alpha: https://issues.apache.org/jira/browse/HADOOP-8287 On Thu, Aug 22, 2013 at 2:44 PM, ch huang justlo...@gmail.com wrote:
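Until then, creating the file by hand works; a minimal sketch, assuming the default conf layout (the values below are examples, not recommendations):

```sh
# etc/hadoop/hadoop-env.sh -- create it alongside the other conf files
# if your install shipped without one (HADOOP-8287).
export HADOOP_HEAPSIZE=2000                          # daemon heap, in MB
export HADOOP_NAMENODE_OPTS="-Xmx2g ${HADOOP_NAMENODE_OPTS}"
```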

Fwd: Create a file in local file system in map method

2013-08-22 Thread rab ra
-- Forwarded message -- From: rab ra rab...@gmail.com Date: 22 Aug 2013 15:14 Subject: Create a file in local file system in map method To: us...@hadoop.apache.org Hi, I am not able to create a file in my local file system from my map method. Is there a way

Re: Create a file in local file system in map method

2013-08-22 Thread Harsh J
Can you share what error you run into in trying to write to a local filesystem location from within a map task? Note that the map tasks will run as the same user as the TaskTracker daemon in insecure environments, or as the job submitting user in secure environments. The location you're writing

RE: running map tasks in remote node

2013-08-22 Thread java8964 java8964
If you don't plan to use HDFS, what kind of shared file system are you going to use between the cluster nodes? NFS? For what you want to do, even though it doesn't make much sense, you first need to solve that shared-file-system problem. Second, if you want to process the files file by file,
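One standard way to get one map task per input line (hence per file name) is NLineInputFormat; a rough Python model of how it splits the listing file (file names below are invented):

```python
def line_splits(lines, lines_per_split=1):
    """Toy model of NLineInputFormat: each split carries a fixed number of
    input lines, so one file name per line yields one map task per file."""
    return [lines[i:i + lines_per_split]
            for i in range(0, len(lines), lines_per_split)]

listing = ["part1.txt", "part2.txt", "part3.txt"]
splits = line_splits(listing)  # one split, hence one map task, per file name
```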

Re: Create a file in local file system in map method

2013-08-22 Thread Balachandar R.A.
Yes, it was a permission issue. I could fix it now. Thanks. On 22 Aug 2013 15:49, Harsh J ha...@cloudera.com wrote:

Re: bz2 decompress in place

2013-08-22 Thread Zac Shepherd
Just because I always appreciate it when someone posts the answer to their own question: we have some Java that does BZip2Codec bz2 = new BZip2Codec(); CompressionOutputStream cout = bz2.createOutputStream(out); for compression. We just wrote another version that does
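The same round trip can be seen in Python's stdlib bz2 module; this is an analogue for illustration, not the poster's actual Hadoop code:

```python
import bz2

data = b"some text worth archiving " * 64
compressed = bz2.compress(data)        # the codec's output-stream side
restored = bz2.decompress(compressed)  # the decompress-in-place side
assert restored == data
```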

How WebHDFS works

2013-08-22 Thread Visioner Sadak
Friends, does anyone know how WebHDFS works internally, or how it uses the Jetty server within Hadoop?
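In short, WebHDFS is a REST API served over plain HTTP by the NameNode's (and DataNodes') embedded web server, which in Hadoop of this era was Jetty. A sketch of how a request URL is shaped (the host name is a placeholder; 50070 was the default NameNode HTTP port at the time):

```python
from urllib.parse import urlencode

def webhdfs_url(host, port, path, op, **params):
    # WebHDFS requests are ordinary HTTP calls against /webhdfs/v1/<path>
    # with the operation passed as the 'op' query parameter.
    return "http://%s:%d/webhdfs/v1%s?%s" % (
        host, port, path, urlencode(dict(op=op, **params)))

url = webhdfs_url("namenode.example.com", 50070, "/user/alice/data.txt", "OPEN")
```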

Re: Hadoop 2.0.0-cdh4.3.0 has no hadoop-env.sh

2013-08-22 Thread Arun C Murthy
Pls ask CDH lists. Thanks. On Aug 22, 2013, at 1:39 AM, ch huang justlo...@gmail.com wrote:

Re: Hadoop 2.0.0-cdh4.3.0 has no hadoop-env.sh

2013-08-22 Thread Kim Chew
I think you have to use Cloudera Manager: select the Hadoop service and set it there. Kim On Thu, Aug 22, 2013 at 11:28 AM, Arun C Murthy a...@hortonworks.com wrote:

RE: yarn-site.xml and aux-services

2013-08-22 Thread John Lilley
Following up on this, how exactly does one *install* the jar(s) for an auxiliary service? Can they be shipped out with the LocalResources of an AM? MapReduce's aux-service is presumably installed with Hadoop and is just sitting there in the right place, but if one wanted to make a whole new

Re: yarn-site.xml and aux-services

2013-08-22 Thread Vinod Kumar Vavilapalli
Auxiliary services are essentially administrator-configured services, so they have to be set up at install time, before the NM is started. +Vinod On Thu, Aug 22, 2013 at 1:38 PM, John Lilley john.lil...@redpoint.net wrote:
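For reference, this is what the stock MapReduce shuffle registration looks like in yarn-site.xml; a custom service would add its own name and class the same way before the NM starts (note the service key was spelled mapreduce.shuffle in some 2.0.x releases and mapreduce_shuffle later):

```xml
<!-- yarn-site.xml: registering an auxiliary service on the NodeManager.
     The service's jar must already be on the NM classpath. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```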

Is fair scheduler still experimental?

2013-08-22 Thread ch huang
hi, all: I use CDH4.3 YARN; its default scheduler is the Capacity Scheduler, and I want to switch to the Fair Scheduler, but I see the doc says: *NOTE:* The Fair Scheduler implementation is currently under development and should be considered experimental. I do not know if it's time to use it in

question about fair scheduler

2013-08-22 Thread ch huang
Hi, I have a question about the Fair Scheduler. The doc says: When there is a single app running, that app uses the entire cluster. When other apps are submitted, resources that free up are assigned to the new apps, so that each app gets roughly the same amount of resources. Suppose I have only a big app
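The quoted behaviour reduces to an equal split of whatever is running; a toy model that ignores queues, weights, and minimum shares:

```python
def instantaneous_fair_share(cluster_slots, running_apps):
    # Equal-weight fair share: a lone app owns the whole cluster;
    # each additional app shrinks every share as containers free up.
    return cluster_slots // running_apps

assert instantaneous_fair_share(100, 1) == 100  # the big app running alone
assert instantaneous_fair_share(100, 2) == 50   # after a second app arrives
```

So the big app is not taken down outright; it converges toward its share as its containers finish and the freed resources go to the newcomer.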

find a doc bug in description of fair-scheduler in yarn

2013-08-22 Thread ch huang
Here is a link to the doc: http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html - yarn.scheduler.fair.minimum-allocation-mb - The smallest container size the scheduler can allocate, in MB of memory. - yarn.scheduler.fair.minimum-allocation-mb

NameNode on persistent memory

2013-08-22 Thread Mark Kerzner
Hi, here is a draft of the paper describing our running of the Hadoop NN on persistent memory. The two promises are: (1) it survives power failure, and (2) it has no limitation on memory size. Your critique is welcome.

Re: Is fair scheduler still experimental?

2013-08-22 Thread sandy . ryza
Moving to cdh-user. Hi, the Fair Scheduler in 4.3 is stable and is recommended by Cloudera. -Sandy On Aug 22, 2013, at 6:20 PM, ch huang justlo...@gmail.com wrote:

Re: find a doc bug in description of fair-scheduler in yarn

2013-08-22 Thread Jun Ping Du
Hi, it should be fixed in the new version (2.1.0-beta); please refer to: https://issues.apache.org/jira/browse/YARN-646 Thanks, Junping - Original Message - From: ch huang justlo...@gmail.com To: user@hadoop.apache.org Sent: Friday, August 23, 2013 10:30:42 AM Subject: find a doc bug

Re: is it possible to run an executable jar with the ClientAPI?

2013-08-22 Thread Ravi Kiran
Hi, you can definitely run the Driver (ClassWithMain) against a remote Hadoop cluster from, say, Eclipse by following these steps: a) Have the jar (Some.jar) in the classpath of your project in Eclipse. b) Ensure you have set both the NameNode and JobTracker information either in core-site.xml
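A minimal client-side configuration for that step b), assuming a Hadoop 1.x cluster (host names and ports below are placeholders):

```xml
<!-- core-site.xml (client side) -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode-host:8020</value>
</property>

<!-- mapred-site.xml (client side) -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker-host:8021</value>
</property>
```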