Re: mapred queue -list

2013-06-14 Thread Arun C Murthy
: Capacity: 100.0, MaximumCapacity: 1.0, CurrentCapacity: 0.0 > > What is the difference between Capacity and MaximumCapacity fields? > > > > -- > Best regards, -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: Save configuration data in job configuration file.

2013-01-19 Thread Arun C Murthy
jobConf.set(String, String)? On Jan 19, 2013, at 7:31 AM, Pedro Sá da Costa wrote: > Hi > > I want to save some configuration data in the configuration files that > belongs to the job. How can I do it? > > -- > Best regards, -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: Java Heap memory error : Limit to 2 Gb of ShuffleRamManager ?

2012-12-06 Thread Arun C Murthy
>> >> I will decrease >> mapred.job.shuffle.input.buffer.percent to limit the errors, but I am not >> fully confident for the scalability of the process. >> >> Any help would be welcomed >> >> once again, many thanks >> Olivier >> >> >> P.S: sorry if I misunderstood the code, any explanation would be really >> welcomed >> >> -- >> >> >> >> >> > -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: [ANNOUNCE] - New user@ mailing list for hadoop users in-lieu of (common,hdfs,mapreduce)-user@

2012-08-07 Thread Arun C Murthy
Apologies (again) for the cross-post, I've filed https://issues.apache.org/jira/browse/INFRA-5123 to close down (common, hdfs, mapreduce)-user@ since user@ is functional now. thanks, Arun On Aug 4, 2012, at 9:59 PM, Arun C Murthy wrote: > All, > > Given our recent di

[ANNOUNCE] - New user@ mailing list for hadoop users in-lieu of (common,hdfs,mapreduce)-user@

2012-08-04 Thread Arun C Murthy
All, Given our recent discussion (http://s.apache.org/hv), the new u...@hadoop.apache.org mailing list has been created and all existing users in (common,hdfs,mapreduce)-user@ have been migrated over. I'm in the process of changing the website to reflect this (HADOOP-8652). Henceforth, ple

Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Arun C Murthy
'hadoop jar ...'. > > Thanks, > > stan > > On Fri, Aug 3, 2012 at 2:39 AM, Arun C Murthy wrote: >> Stan, >> >> You can ask TT to create a symlink to your jar shipped via DistCache: >> >> http://hadoop.apache.org/common/docs/r1.0.3/map

Re: task jvm bootstrapping via distributed cache

2012-08-02 Thread Arun C Murthy
distributed cache? If not, is the > use case appealing enough to open a jira ticket? > > Thanks, > > stan -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: What are the basic Hadoop Java Classes?

2012-07-30 Thread Arun C Murthy
ential in running programs in Hadoop. > > Thanks, > > Andrew Botelho > EMC Corporation > 55 Constitution Blvd., Franklin, MA > andrew.bote...@emc.com > Mobile: 508-813-2026 > -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: Reducer - getMapOutput

2012-07-30 Thread Arun C Murthy
null but seems it always remains in > the loop :(. > > Probably is more a HttpServlet related problem but I am not very familiar > with that. > > Do you have any idea how can I do it ? > > Thanks, > Robert -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: Hadoop compile - delete conf files

2012-07-17 Thread Arun C Murthy
files content (hadoop-env.sh, > core-site.xml, mapred-site.xml, hdfs-site.xml) > So I have to recover from backup these files all the time. > > Does anybody face similar issues ? > > Thanks, > Robert > > > > > -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: MRv2 jobs fail when run with more than one slave

2012-07-17 Thread Arun C Murthy
ve running > the AM. The Node Manager logs on both AM and non-AM slaves appear fairly > similar, and I don't see any errors in the non-AM logs. > > Another strange data point: These failures occur running the slaves on ARM > systems. Running the slaves on x86 with the same configuration works. I'm > using the same tarball on both, which means that the native-hadoop library > isn't loaded on ARM. The master/client is the same x86 system in both > scenarios. All nodes are running Ubuntu 12.04. > > Thanks for any guidance, > Trevor > > > > -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: MRv2 jobs fail when run with more than one slave

2012-07-17 Thread Arun C Murthy
logs. > > Another strange data point: These failures occur running the slaves on ARM > systems. Running the slaves on x86 with the same configuration works. I'm > using the same tarball on both, which means that the native-hadoop library > isn't loaded on ARM. The master/client is the same x86 system in both > scenarios. All nodes are running Ubuntu 12.04. > > Thanks for any guidance, > Trevor > -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: What exactly does the "CPU time spent (ms)" measure?

2012-07-17 Thread Arun C Murthy
e case, I observe that it is longer than the real running time. > What exactly does this counter measure? Thanks. > > Zhu, Guojun > Modeling Sr Graduate > 571-3824370 > guojun_...@freddiemac.com > Financial Engineering > Freddie Mac -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: Mapper basic question

2012-07-11 Thread Arun C Murthy
limit the > no of mappers size without increasing the HDFS block size? > > Thanks in advance. > > Cheers! > Manoj. > -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: Basic question on how reducer works

2012-07-09 Thread Arun C Murthy
' and the actual 'key' in the map-output as the 'secondary key'. hth, Arun > Thanks, > Robert > > From: Arun C Murthy > To: mapreduce-user@hadoop.apache.org > Sent: Monday, July 9, 2012 9:24 AM > Subject: Re: Basic question on how reducer works
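The 'primary key' / 'secondary key' split described in this reply is the usual secondary-sort pattern. Below is a minimal plain-Java sketch of it, not the Hadoop API: the class and the composite key are hypothetical. Partitioning on the primary part with the same hash formula as Hadoop's default HashPartitioner is an assumption of the sketch. Sorting on (primary, secondary) and then grouping on primary alone means each reduce group sees its values already ordered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortSketch {
    // Hypothetical composite key: 'primary' drives partitioning and grouping,
    // 'secondary' only affects the order of values inside a group.
    static final class CompositeKey {
        final String primary;
        final int secondary;

        CompositeKey(String primary, int secondary) {
            this.primary = primary;
            this.secondary = secondary;
        }
    }

    // Partition on the primary part only, so every record sharing it reaches the same reduce.
    static int partition(CompositeKey k, int numReduces) {
        return (k.primary.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Sort on (primary, secondary).
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing((CompositeKey k) -> k.primary).thenComparingInt(k -> k.secondary);

    // Group consecutive sorted records by primary key, as a grouping comparator would.
    static Map<String, List<Integer>> groupSorted(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : sorted) {
            groups.computeIfAbsent(k.primary, p -> new ArrayList<>()).add(k.secondary);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = List.of(
            new CompositeKey("b", 3), new CompositeKey("a", 2),
            new CompositeKey("b", 1), new CompositeKey("a", 5));
        System.out.println(groupSorted(keys)); // {a=[2, 5], b=[1, 3]}
    }
}
```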

Re: Basic question on how reducer works

2012-07-09 Thread Arun C Murthy
output > what portion it has to retrieve only ? To add to Harsh's comment. Essentially the TT *knows* where the output of a given map-id/reduce-id pair is present via an output-file/index-file combination. Arun -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/
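The output-file/index-file combination mentioned here can be pictured with a toy model. A map writes all of its reduce partitions back-to-back into one file, and an index records each partition's offset and length. Serving a (map-id, reduce-id) request then comes down to an index lookup followed by a ranged read. This is a simplified in-memory sketch with hypothetical names. It does not model the real on-disk format (for example checksums, or raw versus compressed lengths).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MapOutputIndexSketch {
    // One index entry per reduce partition: where its bytes start in the
    // map's single output file and how many bytes it spans.
    static final class IndexEntry {
        final long offset;
        final long length;

        IndexEntry(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    final byte[] outputFile;  // all partitions of one map, written back-to-back
    final IndexEntry[] index; // index[r] locates the partition for reduce r

    MapOutputIndexSketch(byte[][] partitions) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        index = new IndexEntry[partitions.length];
        for (int r = 0; r < partitions.length; r++) {
            index[r] = new IndexEntry(out.size(), partitions[r].length);
            out.write(partitions[r], 0, partitions[r].length);
        }
        outputFile = out.toByteArray();
    }

    // Serve one reduce's share: look up its index entry and copy just that byte range.
    byte[] segmentFor(int reduceId) {
        IndexEntry e = index[reduceId];
        return Arrays.copyOfRange(outputFile, (int) e.offset, (int) (e.offset + e.length));
    }

    public static void main(String[] args) {
        byte[][] parts = {
            "k1\tv1\n".getBytes(StandardCharsets.UTF_8),
            new byte[0], // this map produced nothing for reduce 1
            "k7\tv7\nk9\tv9\n".getBytes(StandardCharsets.UTF_8)
        };
        MapOutputIndexSketch m = new MapOutputIndexSketch(parts);
        System.out.print(new String(m.segmentFor(2), StandardCharsets.UTF_8));
    }
}
```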

Re: MR job runs on CDH3 but not on CDH4

2012-07-04 Thread Arun C Murthy
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:208) > > > Alan -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: Out of memory (heap space) errors on job tracker

2012-06-10 Thread Arun C Murthy
users). > > Try it out and let us know! > > On Sat, Jun 9, 2012 at 12:37 AM, David Rosenstrauch wrote: >> We're running 0.20.2 (Cloudera cdh3u4). >> >> What configs are you referring to? >> >> Thanks, >> >> DR

Re: Out of memory (heap space) errors on job tracker

2012-06-08 Thread Arun C Murthy
Anyone have any thoughts on the matter? > > Thanks, > > DR -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: Idle nodes with terasort and MRv2/YARN (0.23.1)

2012-06-05 Thread Arun C Murthy
ed systems, which have > 4GB RAM and generally fewer, smaller (2.5" form factor) disks per > node. It sounds like the smaller RAM will force better distribution, > but the disk capacity/utilization situation will be more severe. > Right, smaller RAM should force better distribution.

Re: Idle nodes with terasort and MRv2/YARN (0.23.1)

2012-05-29 Thread Arun C Murthy
> > In case it's significant, I've scripted the cluster setup and terasort > jobs, so everything runs back-to-back instantly, except that I poll to > ensure that HDFS is up and has active data nodes before running > teragen. I've also tried adding delays, but they didn

Re: Good learning resources for 0.23?

2012-05-23 Thread Arun C Murthy
wiley.com > music.keithwiley.com > > "And what if we picked the wrong religion? Every week, we're just making God > madder and madder!" > -- Homer Simpson > > -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: API to get info for deprecated key

2012-05-22 Thread Arun C Murthy
vides a way to get the corresponding new > key. > > Cheers, > Subroto Sanyal -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Hadoop YARN/MapReduce Meetup during Hadoop Summit 2012

2012-05-14 Thread Arun C Murthy
g new YARN frameworks # MapReduce futures - What next for Hadoop MR framework? If you are interested, please sign up at: http://www.meetup.com/Hadoop-Contributors/events/64747342/ I look forward to a fun (technical) conversation and to put faces to names! thanks, Arun -- Arun C. Murthy Hortonworks

Terasort

2012-05-10 Thread Arun C Murthy
k for either MR1 or MR2). Arun > Jeff > > From: Arun C Murthy [mailto:a...@hortonworks.com] > Sent: Thursday, May 10, 2012 1:27 PM > To: mapreduce-user@hadoop.apache.org > Subject: Re: max 1 mapper per node > > For terasort you want to fill up your entire cluster with

Re: max 1 mapper per node

2012-05-10 Thread Arun C Murthy
mber of reducers per node > > maximum percentage of non data local tasks > maximum percentage of rack local tasks > > and set this in job properties. > > -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: max 1 mapper per node

2012-05-09 Thread Arun C Murthy
for AM is overkill, something simpler can be made like: > > maximum number of mappers per node > maximum number of reducers per node > > maximum percentage of non data local tasks > maximum percentage of rack local tasks > > and set this in job properties. -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: Cleanup after a Job

2012-04-29 Thread Arun C Murthy
happen in a afterJob( ) method while is available for each > Job.How do i make sure that afterJob () method is called for each Job added > to the controller before running the jobs that are depending on it. > > > Thanks -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: JVM reuse option in MRV2

2012-04-11 Thread Arun C Murthy
-- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: CompressionCodec in MapReduce

2012-04-11 Thread Arun C Murthy
rk with MapReduce jobs, >>> but I haven't found a way to inject it during the reading of input data, or >>> during the write of the job results. >>> Am I missing something, or is there no support for compressed files in the >>> filesystem? >>> >>> I am well aware of how to set it up to work during the intermediate phases >>> of the MapReduce operation, but I just can't find a way to apply it BEFORE >>> the job takes place... >>> Is there any other way except simply uncompressing the files I need prior >>> to scheduling a job? >>> >>> Huge thanks for any help you can give me! >>> -- >>> Greg >>> >> >> > -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: Question on adding new MR application into hadoop.examples.jar

2012-03-19 Thread Arun C Murthy
development in eclipse. -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: MR output to a file instead of directory?

2012-03-04 Thread Arun C Murthy
e any way to configure an OutputFormat to write all > data into a file? > > Thanks, > James -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: reducers output

2012-02-04 Thread Arun C Murthy
On Feb 3, 2012, at 11:46 PM, Alieh Saeedi wrote: > Hi > 1- How does Hadoop decide where to save file blocks (I mean all files include > files written by reducers)? Could you please give me a reference link? > > -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: Best practices for hadoop shuffling/tunning ?

2012-01-31 Thread Arun C Murthy
above tuning parameters, and suggest any > further improvements ? > My mappers are running fine. Shuffling and reducing part is comparatively > slower, than expected for normal jobs. Wanted to know what I am doing > wrong/missing. > > Thanks, > Praveenesh -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: hadoop-1.0.0 and errors with log.index

2012-01-31 Thread Arun C Murthy
41_0003_r_00_1/log.index (No such file or directory) >>> >>>at java.io.FileInputStream.open(Native Method) >>> >>> These errors seem related to this two problems: >>> >>> http://grokbase.com/t/hadoop.apache.org/mapreduce-user/2012/01/error-read >>> ing-task-output-and-log-filenotfoundexceptions/03mjwctewcnxlgp2jkcrhvsgep >>> 4e >>> >>> https://issues.apache.org/jira/browse/MAPREDUCE-2846 >>> >>> But I've looked into the source code and the fix from MAPREDUCE-2846 is >>> there. Perhaps there is some other reason? >>> >>> Regards >>> Marcin >> >> -- >> Arun C. Murthy >> Hortonworks Inc. >> http://hortonworks.com/ -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: hadoop-1.0.0 and errors with log.index

2012-01-31 Thread Arun C Murthy
d to this two problems: > > http://grokbase.com/t/hadoop.apache.org/mapreduce-user/2012/01/error-reading-task-output-and-log-filenotfoundexceptions/03mjwctewcnxlgp2jkcrhvsgep4e > > https://issues.apache.org/jira/browse/MAPREDUCE-2846 > > But I've looked into the source cod

Re: reducers outputs

2012-01-29 Thread Arun C Murthy
p like NameNode and make JobTracker to > > consult with it too (I mean I want to make JobTracker to consult with > > NameNode AND myNewComponent both)? > > > > -- > Joseph Echeverria > Cloudera, Inc. > 443.305.9434 > > > > > > > -- Arun C. Murthy Hortonworks Inc. http://hortonworks.com/

Re: Reduce node in 0.23

2012-01-19 Thread Arun C Murthy
Currently it's a random node. You might be interested in https://issues.apache.org/jira/browse/MAPREDUCE-199. Arun On Jan 19, 2012, at 10:10 AM, Ann Pal wrote: > Hi, > How is the reduce node chosen in 0.23? What parameters determine choosing > the reduce node. Does it depend on map node place

Re: Container size

2012-01-17 Thread Arun C Murthy
Sorry, wasn't clear - default is 1024 for both schedulers. Use those configs to tune them. Arun On Jan 17, 2012, at 11:56 PM, Arun C Murthy wrote: > The default is 1000MB. > > Try bumping down yarn.scheduler.fifo.minimum-allocation-mb to 100 (default is > 1024) for
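To see why the minimum allocation matters even for small requests, here is a sketch that assumes the scheduler rounds each memory request up to a whole multiple of the minimum allocation (yarn.scheduler.fifo.minimum-allocation-mb, default 1024 per the reply). That rounding rule is an assumption of this sketch, not something stated in the thread. The class name and the node size used below are hypothetical.

```java
class ContainerSizeSketch {
    // Assumption: a memory request is rounded up to a whole multiple of the
    // scheduler's minimum allocation, with at least one unit granted.
    static int normalizeMb(int requestedMb, int minAllocationMb) {
        int units = Math.max(1, (requestedMb + minAllocationMb - 1) / minAllocationMb);
        return units * minAllocationMb;
    }

    // How many containers of the given request fit in nodeMb of node memory.
    static int containersPerNode(int nodeMb, int requestedMb, int minAllocationMb) {
        return nodeMb / normalizeMb(requestedMb, minAllocationMb);
    }

    public static void main(String[] args) {
        System.out.println(normalizeMb(100, 1024));             // 1024: the minimum dominates
        System.out.println(normalizeMb(100, 100));              // 100 once the minimum is lowered
        System.out.println(containersPerNode(3500, 100, 1024)); // 3
        System.out.println(containersPerNode(3500, 100, 100));  // 35
    }
}
```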

Re: Container size

2012-01-17 Thread Arun C Murthy
g... > > On Wed, Jan 18, 2012 at 1:01 PM, Arun C Murthy wrote: > Removing common-user@, please do not cross-post. > > On Jan 17, 2012, at 11:24 PM, raghavendhra rahul wrote: > > > Hi, > > > > What is the minimum size of the container in hadoop yarn

Re: Container size

2012-01-17 Thread Arun C Murthy
Removing common-user@, please do not cross-post. On Jan 17, 2012, at 11:24 PM, raghavendhra rahul wrote: > Hi, > > What is the minimum size of the container in hadoop yarn. >capability.setmemory(xx); The AM gets this information from RM via the return value for AMRMProtocol.reg

Re: Yarn Container Memory

2012-01-12 Thread Arun C Murthy
What scheduler are you using? On Jan 11, 2012, at 11:48 PM, raghavendhra rahul wrote: > min container size 100mb > AM size is 1000mb > > On Thu, Jan 12, 2012 at 1:06 PM, Arun C Murthy wrote: > What is your min container size? > > How much did you allocate to AM itself?

Re: Yarn Container Memory

2012-01-11 Thread Arun C Murthy
What is your min container size? How much did you allocate to AM itself? On Jan 11, 2012, at 9:51 PM, raghavendhra rahul wrote: > Any suggestions.. > > On Wed, Jan 11, 2012 at 2:09 PM, raghavendhra rahul > wrote: > Hi, > I formed a hadoop cluster with 3 nodes of 3500mb allotted fo

Re: Application start stop

2012-01-11 Thread Arun C Murthy
You'll have to implement it yourself for your AM. The necessary apis are present in the protocol to do so. On Jan 11, 2012, at 3:33 AM, raghavendhra rahul wrote: > Hi > Is there a specific way to stop the application master other than timeout > option in the client. > Is there a command like >

Re: Queries on next gen MR architecture

2012-01-07 Thread Arun C Murthy
On Jan 7, 2012, at 6:47 PM, Praveen Sripati wrote: > Thanks for the response. > > I was just thinking why some of the design decisions were made with MRv2. > > > No, the OR condition is implied by the hierarchy of requests (node, rack, > > *). > > If InputSplit1 is on Node11 and Node12 and In

Re: Queries on next gen MR architecture

2012-01-07 Thread Arun C Murthy
On Jan 5, 2012, at 8:29 AM, Praveen Sripati wrote: > Hi, > > I had been going through the MRv2 documentation and have the following queries > > 1) Let's say that an InputSplit is on Node1 and Node2. > > Can the ApplicationMaster ask the ResourceManager for a container either on > Node1 or Nod

Re: Yarn related questions:

2012-01-06 Thread Arun C Murthy
Responses inline: On Jan 6, 2012, at 9:34 AM, Ann Pal wrote: > Thanks for your reply. Some additional questions: > [1] How does the application master determine the size (memory requirement) > of the container ? Can the container viewed as a JVM with CPU, memory? Pretty much. It's related to t

Re: Yarn related questions:

2012-01-06 Thread Arun C Murthy
Please don't hijack threads, start a new one. Thanks. On Jan 6, 2012, at 10:41 AM, Arun C Murthy wrote: > You're probably hitting MAPREDUCE-3537. Try using the hadoop-0.23.1-SNAPSHOT > or build it yourself from branch-0.23 on ASF svn. > > Arun > > On Jan 5, 201

Re: Yarn related questions:

2012-01-06 Thread Arun C Murthy
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:636) > > Any ideas. > > > On Fri, Jan 6, 2012 at 12:41 AM, Arun C Murthy wrote: > Are you writing your own application i.e. custom ApplicationMaster? &g

Re: Yarn related questions:

2012-01-05 Thread Arun C Murthy
Are you writing your own application, i.e. a custom ApplicationMaster? You need to pass a ResourceRequest (RR) with a valid hostname along with (optionally) an RR with the rack and also a mandatory RR with * as the resource-name. Arun On Jan 4, 2012, at 8:04 PM, raghavendhra rahul wrote: > Hi, > > I tried
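The host / rack / `*` hierarchy described above can be modeled as per-location container counts. Below is a hypothetical plain-Java sketch, not the YARN ResourceRequest API, whose exact shape varies between versions. Each task adds one container to each of its preferred hosts, one to each distinct rack of those hosts, and one to the mandatory `*` entry. The result is an implicit "node1 OR node2 OR their rack OR anywhere" for that task.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ResourceAskSketch {
    // Build container counts keyed by location name (host, rack, or "*").
    static Map<String, Integer> buildAsks(List<List<String>> preferredHostsPerTask,
                                          Map<String, String> hostToRack) {
        Map<String, Integer> asks = new LinkedHashMap<>();
        for (List<String> hosts : preferredHostsPerTask) {
            Set<String> racks = new LinkedHashSet<>();
            for (String host : hosts) {
                asks.merge(host, 1, Integer::sum); // node-level ask
                String rack = hostToRack.get(host);
                if (rack != null) {
                    racks.add(rack);
                }
            }
            for (String rack : racks) {
                asks.merge(rack, 1, Integer::sum); // rack-level ask, once per task
            }
            asks.merge("*", 1, Integer::sum);      // mandatory "anywhere" ask
        }
        return asks;
    }

    public static void main(String[] args) {
        Map<String, String> racks = Map.of("node1", "/rack1", "node2", "/rack1", "node3", "/rack2");
        // Task 0's split lives on node1 and node2; task 1's split lives on node3.
        System.out.println(buildAsks(List.of(List.of("node1", "node2"), List.of("node3")), racks));
    }
}
```

Note that the `*` count equals the total number of containers wanted, while the host and rack entries only express preferences.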

Re: Exception from Yarn Launch Container

2012-01-03 Thread Arun C Murthy
Bing, Are you using the released version of hadoop-0.23? If so, you might want to upgrade to latest build off branch-0.23 (i.e. hadoop-0.23.1-SNAPSHOT) which has the fix for MAPREDUCE-3537. Arun On Dec 29, 2011, at 12:27 AM, Bing Jiang wrote: > Hi, I use Yarn as resource management to deploy

Re: Information regarding completed MapTask

2011-12-27 Thread Arun C Murthy
The reduces get it from the JobTracker. Take a look at TaskCompletionEvents.java. Arun On Dec 27, 2011, at 1:26 AM, hadoop anis wrote: > > > Friends, > > I want to know where does information regarding completed MapTask get stored? > i.e. how reduce task know about completed map output data
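In the flow described here, the master keeps a growing list of completion events and each reduce asks only for the events it has not yet seen. A toy sketch of that cursor-based polling follows. The class and method names are hypothetical, and the sketch does not reproduce Hadoop's real protocol or the event contents.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CompletionEventLogSketch {
    // Append-only log of "map finished, output is on host X" events kept by the master.
    private final List<String> events = new ArrayList<>();

    void mapCompleted(String event) {
        events.add(event);
    }

    // Return at most maxEvents events starting at index fromEventId.
    List<String> eventsFrom(int fromEventId, int maxEvents) {
        int end = Math.min(events.size(), fromEventId + maxEvents);
        if (fromEventId >= end) {
            return Collections.emptyList();
        }
        return new ArrayList<>(events.subList(fromEventId, end));
    }

    // Reduce-side cursor: each poll advances past the events already received.
    static final class ReducePoller {
        private int nextEventId = 0;

        List<String> poll(CompletionEventLogSketch log, int maxEvents) {
            List<String> batch = log.eventsFrom(nextEventId, maxEvents);
            nextEventId += batch.size();
            return batch;
        }
    }

    public static void main(String[] args) {
        CompletionEventLogSketch log = new CompletionEventLogSketch();
        ReducePoller reducer = new ReducePoller();
        log.mapCompleted("map_0 on host1");
        log.mapCompleted("map_1 on host2");
        System.out.println(reducer.poll(log, 10)); // both events
        log.mapCompleted("map_2 on host1");
        System.out.println(reducer.poll(log, 10)); // only the new event
    }
}
```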

Re: how does Hadoop Yarn support different programming models?

2011-12-27 Thread Arun C Murthy
Yep! Take a look at the link Mahadev sent on how to get your application to work inside YARN. > http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html > http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html Arun On

Re: A new map reduce framework for iterative/pipelined jobs.

2011-12-27 Thread Arun C Murthy
On Dec 26, 2011, at 10:30 PM, Kevin Burton wrote: > One key point I wanted to mention for Hadoop developers (but then check out > the announcement). > > I implemented a version of sysstat (iostat, vmstat, etc) in Peregrine and > would be more than happy to move it out and put it in another ded

Re: AlreadyExistsException for log file on 0.20.205.0

2011-12-26 Thread Arun C Murthy
Is this with jvm reuse turned on? On Dec 26, 2011, at 9:38 AM, Markus Jelsma wrote: > Hi, > > We're sometimes seeing this exception if a map task already failed before due > to, for example, an OOM error. Any ideas on how to address this issue? > > org.apache.hadoop.io.SecureIOUtils$AlreadyExi

Re: Reducers fail without messages on 20.205.0

2011-12-26 Thread Arun C Murthy
I wouldn't use jvm reuse at this point. It's had a number of issues over time and I've consistently switched it off for a long while now. Arun On Dec 26, 2011, at 2:50 PM, Markus Jelsma wrote: > Hi, > >> Markus, >> >> Good to know you fixed it now :) >> >> Also consider raising reduce slowst

Re: Performance of direct vs indirect shuffling

2011-12-20 Thread Arun C Murthy
On Dec 20, 2011, at 3:55 PM, Kevin Burton wrote: > The current hadoop implementation shuffles directly to disk and then those > disk files are eventually requested by the target nodes which are responsible > for doing the reduce() on the intermediate data. > > However, this requires more 2x IO

Re: Application Master:

2011-12-20 Thread Arun C Murthy
On Dec 20, 2011, at 4:05 PM, Ann Pal wrote: > Hi, > I had the following questions related to Yarn: > [1] How does the Application Master know where the data is, to give a list to > Resource Manager? Is it talking to the Name Node? Yes, it's the responsibility of the AM to talk to the NN to figu

Re: One task per Tasktracker

2011-12-20 Thread Arun C Murthy
Just use multiple slots for each map. See: http://hadoop.apache.org/common/docs/stable/capacity_scheduler.html#Resource+based+scheduling Arun On Dec 20, 2011, at 3:46 AM, Nitin Khandelwal wrote: > Hey, > > We use capacity scheduler and divide our map slots among queues. For a > particular ki

Re: Map and Reduce process hang out at 0%

2011-12-20 Thread Arun C Murthy
Can you look at the /nodes web-page to see how many nodes you have? Also, do you see any exceptions in the ResourceManager logs on dn5? Arun On Dec 20, 2011, at 5:14 AM, Jingui Lee wrote: > Hi,all > > I am running hadoop 0.23 on 5 nodes. > > I could run any YARN application or Mapreduce Job o

Re: Variable mapreduce.tasktracker.*.tasks.maximum per job

2011-12-19 Thread Arun C Murthy
Markus, The CapacityScheduler in 0.20.205 (in fact since 0.20.203) supports the notion of 'high memory jobs' with which you can specify, for each job, the number of 'slots' for each map/reduce. For example, you can say for job1 that each map needs 2 slots and so on. Unfortunately, I don't know how
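The "each map needs 2 slots" idea reduces to simple arithmetic. The sketch assumes a per-slot memory size (mapred.cluster.map.memory.mb) and a per-task memory request (mapred.job.map.memory.mb), as in the 0.20.20x memory-based configuration as far as I know. Rounding up to whole slots is an assumption of the sketch. The class name and numbers are hypothetical.

```java
class HighMemorySlotsSketch {
    // Slots one task occupies: its memory request divided by the memory each slot
    // represents, rounded up (assumed rounding).
    static int slotsPerTask(int taskMemoryMb, int slotMemoryMb) {
        return (taskMemoryMb + slotMemoryMb - 1) / slotMemoryMb;
    }

    // How many such tasks can run at once on a TaskTracker with the given slot count.
    static int concurrentTasks(int slotsOnTracker, int taskMemoryMb, int slotMemoryMb) {
        return slotsOnTracker / slotsPerTask(taskMemoryMb, slotMemoryMb);
    }

    public static void main(String[] args) {
        // A job asking for 2048 MB per map on 1024 MB slots takes 2 slots per map,
        // so a tracker with 8 map slots runs 4 of its maps concurrently.
        System.out.println(slotsPerTask(2048, 1024));       // 2
        System.out.println(concurrentTasks(8, 2048, 1024)); // 4
    }
}
```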

Re: Tasktracker Task Attempts Stuck (mapreduce.task.timeout not working)

2011-12-15 Thread Arun C Murthy
Hi John, It's hard for folks on this list to diagnose CDH (you might have to ask their lists). However, I haven't seen similar issues with hadoop-0.20.2xx in a while. One thing to check would be to grab a stack trace (jstack) on the tasks to see what they are up to. Next, try to get a tcpdump to

Re: Where JobTracker stores Task's information

2011-12-14 Thread Arun C Murthy
Take a look at JobInProgress.java. There is one object per job. Arun On Dec 14, 2011, at 1:14 AM, hadoop anis wrote: > > > > Hi Friends, > I want to know, where JobTracker stores Task's Information, > i.e. which task is being executed on which tasktracker, and h

Re: How JobTracker stores TaskTracker's information

2011-12-13 Thread Arun C Murthy
Moving to mapreduce-user@, bcc common-user@. Please use project specific lists. Take a look at JobTracker.heartbeat -> *Scheduler.assignTasks. After the scheduler 'assigns' tasks, the JT sends the corresponding 'LaunchTaskAction' to the TaskTracker. hth, Arun On Dec 13, 2011, at 12:59 AM, had

Re: Pause and Resume Hadoop map reduce job

2011-12-12 Thread Arun C Murthy
The CapacityScheduler (hadoop-0.20.203 onwards) allows you to stop a queue and start it again. That will give you the behavior you described. Arun On Dec 12, 2011, at 5:50 AM, Dino Kečo wrote: > Hi Hadoop users, > > In my company we have been using Hadoop for 2 years and we have need to pause

Re: Running YARN on top of legacy HDFS (i.e. 0.20)

2011-12-09 Thread Arun C Murthy
tibilites (i.e. UserGroupInformation, etc.). Yuck. > > Avery > > On 12/6/11 3:50 PM, Arun C Murthy wrote: >> Avery, >> >> If you could take a look at what it would take, I'd be grateful. I'm hoping >> it isn't very much effort.

Re: Are the values available in job.xml the actual values used for job

2011-12-09 Thread Arun C Murthy
'final' is meant for admins to ensure certain values aren't overridable. However, in the example you gave, you'll see 15 (since it's 'final'). Arun On Dec 8, 2011, at 10:44 PM, Bejoy Ks wrote: > Hi experts > I have a query with the job.xml file in map reduce.I set some > value in
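The 'final' behavior can be modeled as an ordered application of properties in which a key, once marked final, ignores later overrides. This is a hypothetical plain-Java model, not Hadoop's Configuration class. The property name and the values 15 and 30 are illustrative only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class FinalConfigSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finalKeys = new HashSet<>();

    // Apply one property. Resources are applied in order (site files first, job
    // settings later); a later value replaces an earlier one unless an earlier
    // resource marked the key final.
    void apply(String key, String value, boolean markFinal) {
        if (finalKeys.contains(key)) {
            return; // the final value wins and the override is ignored
        }
        values.put(key, value);
        if (markFinal) {
            finalKeys.add(key);
        }
    }

    String get(String key) {
        return values.get(key);
    }

    public static void main(String[] args) {
        FinalConfigSketch conf = new FinalConfigSketch();
        conf.apply("example.property", "15", true);       // admin's site file, marked final
        conf.apply("example.property", "30", false);      // job tries to override
        System.out.println(conf.get("example.property")); // 15
    }
}
```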

Re: OOM Error Map output copy.

2011-12-09 Thread Arun C Murthy
Moving to mapreduce-user@, bcc common-user@. Please use project specific lists. Niranjan, If you average 0.5G of output per map, that's 5000 maps * 0.5G = 2.5TB over 12 reduces, i.e. roughly 210G per reduce - compressed! If you think you have 4:1 compression you are doing nearly a Terabyte per re
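The shuffle-volume estimate in this reply is worth checking explicitly. The sketch uses decimal units (1 TB = 1000 GB) and assumes keys spread evenly across reduces. Under those assumptions, 5000 maps at 0.5 GB each come to 2500 GB. Split over 12 reduces, that is about 208 GB per reduce compressed, or about 833 GB at 4:1 compression.

```java
class ShuffleVolumeSketch {
    // Total map output in GB for the given number of maps and average output per map.
    static double totalGb(int maps, double gbPerMap) {
        return maps * gbPerMap;
    }

    // Average data each reduce must fetch, assuming keys spread evenly across reduces.
    static double perReduceGb(int maps, double gbPerMap, int reduces) {
        return totalGb(maps, gbPerMap) / reduces;
    }

    public static void main(String[] args) {
        double compressed = perReduceGb(5000, 0.5, 12);
        // Prints: total 2500 GB, per reduce 208.3 GB compressed, 833.3 GB at 4:1
        System.out.printf("total %.0f GB, per reduce %.1f GB compressed, %.1f GB at 4:1%n",
                totalGb(5000, 0.5), compressed, compressed * 4);
    }
}
```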

Re: Not able to post a job in Hadoop 0.23.0

2011-12-08 Thread Arun C Murthy
Moving to mapreduce-user@, bcc common-user@. Can you see any errors in the logs? Typically this happens when you have no NodeManagers. Check the 'nodes' link and then RM logs. Arun On Nov 29, 2011, at 8:36 PM, Nitin Khandelwal wrote: > HI , > > I have successfully setup Hadoop 0.23.0 in a si

Re: Running YARN on top of legacy HDFS (i.e. 0.20)

2011-12-06 Thread Arun C Murthy
ll be slow to upgrade HDFS with all their important data on > it. I could also go that route I guess. > > Avery > > On 12/6/11 8:51 AM, Arun C Murthy wrote: >> Avery, >> >> They aren't 'api changes'. HDFS just has a new set of apis in hadoop-0.

Re: Running YARN on top of legacy HDFS (i.e. 0.20)

2011-12-06 Thread Arun C Murthy
Avery, They aren't 'api changes'. HDFS just has a new set of apis in hadoop-0.23 (aka FileContext apis). Both the old (FileSystem apis) and new are supported in hadoop-0.23. We have used the new HDFS apis in YARN in some places. hth, Arun On Dec 5, 2011, at 10:59 PM, Avery Ching wrote: >

Re: is there a way to just abandon a map task?

2011-11-20 Thread Arun C Murthy
job -fail-task ' 4 times to abandon the task. If you use '-kill-task' it will continue to be re-run. Arun > Cheers, > Mat > > On 20 November 2011 16:43, Arun C Murthy wrote: >> Mat, >> >> Take a look at mapred.max.(map|reduce).failures.percent. >&

Re: is there a way to just abandon a map task?

2011-11-20 Thread Arun C Murthy
Mat, Take a look at mapred.max.(map|reduce).failures.percent. See: http://hadoop.apache.org/common/docs/r0.20.205.0/api/org/apache/hadoop/mapred/JobConf.html#setMaxMapTaskFailuresPercent(int) http://hadoop.apache.org/common/docs/r0.20.205.0/api/org/apache/hadoop/mapred/JobConf.html#setMa
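The two knobs discussed in this thread can be modeled in a few lines. First, a task is abandoned only after its failed attempts reach the attempt limit; mapred.map.max.attempts defaults to 4, which matches the advice to use -fail-task four times. Killed attempts are assumed not to count. Second, the job tolerates abandoned tasks up to the configured failure percentage. This is a plain-Java sketch, and the exact boundary comparison Hadoop uses is an assumption here.

```java
class FailureToleranceSketch {
    // A task is given up on once its failed attempts reach the limit
    // (mapred.map.max.attempts, default 4). Killed attempts are not counted here.
    static boolean taskAbandoned(int failedAttempts, int maxAttempts) {
        return failedAttempts >= maxAttempts;
    }

    // Assumed check: the job keeps going while abandoned tasks stay within
    // maxFailuresPercent of all tasks (mapred.max.map.failures.percent).
    static boolean jobSurvives(int abandonedTasks, int totalTasks, int maxFailuresPercent) {
        return abandonedTasks * 100L <= (long) maxFailuresPercent * totalTasks;
    }

    public static void main(String[] args) {
        System.out.println(taskAbandoned(4, 4));      // true
        System.out.println(jobSurvives(3, 1000, 1));  // true: 0.3% is within 1%
        System.out.println(jobSurvives(20, 1000, 1)); // false: 2% exceeds 1%
    }
}
```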

Re: Business logic in cleanup?

2011-11-18 Thread Arun C Murthy
On Nov 18, 2011, at 10:44 AM, Harsh J wrote: > > If you could follow up on that patch, and see it through, its wish granted > for a lot of us as well, as we move ahead with the newer APIs in the future > Hadoop releases ;-) > The plan is to support both mapred and mapreduce MR apis for the fo

Re: Dropping 0.20.203 capacity scheduler into 0.20.2

2011-10-26 Thread Arun C Murthy
Sorry. This mostly won't work... we have significant changes in the interface between the JobTracker and schedulers (FS/CS) b/w 20.2 and 20.203 (performance, better limits etc.). Your best bet might be to provision Hadoop yourself on EC2 with 0.20.203+. Good luck! Arun On Oct 26, 2011, at 2:5

Re: Streaming jar creates only 1 reducer

2011-10-21 Thread Arun C Murthy
You can also use -numReduceTasks <#reduces> option to streaming. On Oct 21, 2011, at 10:22 PM, Mapred Learn wrote: > Thanks Harsh ! > This is exactly what I thought ! > > And don't know what you mean by cross-post ? I just posted to mapred and HDFS > mailing lists ? What's your point about cros

Re: output from one map reduce job as the input to another map reduce job?

2011-09-27 Thread Arun C Murthy
On Sep 27, 2011, at 12:09 PM, Kevin Burton wrote: > Is it possible to connect the output of one map reduce job so that it is the > input to another map reduce job. > > Basically… then reduce() outputs a key, that will be passed to another map() > function without having to store intermediate d

Re: quotas for size of intermediate map/reduce output?

2011-09-21 Thread Arun C Murthy
We do track intermediate output used and if a job is using too much and can't be scheduled anywhere on a cluster the CS/JT will fail it. You'll need hadoop-0.20.204 for this though. Also, with MRv2 we are in the process of adding limits on disk usage for intermediate outputs, logs etc. hth, Ar

Re: Capacity scheduler uses all slots from the same tasktracker

2011-08-23 Thread Arun C Murthy
Were they all 'data-local' or 'rack-local' tasks? If so, it's expected. Arun On Aug 23, 2011, at 3:51 PM, Sulabh Choudhury wrote: > Hi, > > So I just started using capacity scheduler for M/R jobs. I have 4 task > trackers each with 4 map/reduce slots. > Configured a queue so that it uses 25% (

Re: Task re-scheduling in hadoop

2011-08-23 Thread Arun C Murthy
Moving to mapreduce-user@, bcc common-user@ On Aug 23, 2011, at 2:31 AM, Vaibhav Pol wrote: > Hi All, > I have some query regarding task re-scheduling.Can it possible > to make Job tracker wait for some time before re-scheduling of failed > tracker's tasks. > Why would you want to

Re: Question regarding Capacity Scheduler

2011-08-17 Thread Arun C Murthy
Moving to mapreduce-user@, bcc common-user@ On Aug 17, 2011, at 10:53 AM, Matt Davies wrote: > Hello, > > I'm playing around with the Capacity Scheduler (coming from the Fair > Scheduler), and it appears that a queue with jobs submitted by the same user > are treated as FIFO. So, for example,

Re: Can I use the cores of each CPU to be the datanodes instead of CPU?

2011-08-08 Thread Arun C Murthy
Jun, On Aug 8, 2011, at 2:19 AM, 谭军 wrote: > 2 computers with 2 CPUs. > Each CPU has 2 cores. > Now I have 2 physical datanodes. > Can I get 4 physical datanodes? > I don't know whether I make my point clear? Running multiple datanodes on the same machine really doesn't buy you anything - the und

Re: Performance of mappers

2011-08-05 Thread Arun C Murthy
_0035_m_00_0 1.0% > 2011-08-05 14:33:06,625 INFO org.apache.hadoop.mapred.TaskTracker: Task > attempt_201108041814_0035_m_00_0 is done. > > Thanks > Iman > > > > From: Arun C Murthy > To: mapreduce-user@hadoop.apache.org; Iman E > Sent: Friday, August

Re: Performance of mappers

2011-08-05 Thread Arun C Murthy
Which release of Hadoop are you running? What do the logs on the TaskTracker tell you during the time the slow tasks are getting launched? hadoop-0.20.203 has a ton of bug fixes since hadoop-0.20.2 which help fix issues with slow launches - you might want to upgrade. Arun On Aug 5, 2011, at 1

Re: Reducer Run on Which Machine?

2011-08-04 Thread Arun C Murthy
Nope, currently we don't do any smart scheduling for reduces since they need to fetch map outputs from many nodes anyway. Arun On Aug 4, 2011, at 10:24 PM, Suhendry Effendy wrote: > I understand that we can decide which task run by which reducer in Hadoop by > using custom partitioner, but is
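The partitioner decides which reduce task receives a key. It does not decide which machine that reduce task runs on, which is the scheduler's job. For reference, the sketch below applies the formula used by Hadoop's default HashPartitioner to plain Java objects. The wrapper class is hypothetical.

```java
class HashPartitionSketch {
    // The formula used by Hadoop's default HashPartitioner: clear the sign bit of the
    // key's hash, then take it modulo the number of reduces. The result is a reduce
    // index in [0, numReduceTasks), not a host.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> reduce " + getPartition(key, 4));
        }
    }
}
```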

Re: MapReduce jobs hanging or failing near completion

2011-08-03 Thread Arun C Murthy
> I'm currently using the fair scheduler, but it doesn't look like I've > specified any allocations. Perhaps I'll dig into this further with the > Cloudera team to see if there is indeed a problem with the job tracker or > scheduler. Otherwise, I'll give 0.20.

Re: MapReduce jobs hanging or failing near completion

2011-08-01 Thread Arun C Murthy
. Good luck. Arun > > Is there something in particular I should be looking for on my local disks? > Hadoop fsck shows all clear, but I'll have to wait until morning to take > individual nodes offline to check their disks. Any further details you might > have would be ve

Re: Chaining Map Jobs

2011-07-29 Thread Arun C Murthy
Moving to mapreduce-user@, bcc common-user@. Use JobControl: http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Job+Control Arun On Jul 29, 2011, at 4:24 PM, Roger Chen wrote: > Has anyone had experience with chaining map jobs in Hadoop framework 0.20.2? > Thanks. > > -- > Rog
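Conceptually, chaining jobs means running each job only after the jobs it depends on have finished. The sketch below computes such an order with a topological sort in plain Java. It is a hypothetical model of the dependency handling, not JobControl's API. The real JobControl also tracks job states and submits independent jobs concurrently, and neither is modeled here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JobChainSketch {
    // dependsOn maps every job name to the jobs it must wait for; every job,
    // including ones with no dependencies, must appear as a key.
    static List<String> runOrder(Map<String, List<String>> dependsOn) {
        Map<String, Integer> pending = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String dep : e.getValue()) {
                dependents.computeIfAbsent(dep, d -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((job, n) -> {
            if (n == 0) {
                ready.add(job); // no dependencies: can start immediately
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            order.add(job);
            // Finishing this job may unblock the jobs that depend on it.
            for (String next : dependents.getOrDefault(job, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalArgumentException("dependency cycle or unknown job");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("map-job-1", List.of());
        deps.put("map-job-2", List.of("map-job-1"));
        System.out.println(runOrder(deps)); // [map-job-1, map-job-2]
    }
}
```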

Re: what happen in my hadoop cluster?

2011-07-27 Thread Arun C Murthy
Moving to hdfs-user@, bcc mapreduce-user@. Your NameNode isn't coming out of safemode since not all of the datanodes have rejoined the cluster... > The ratio of reported blocks 0.2915 has not reached the threshold 0.9990. > Safe mode will be turned off automatically. Can you check why your datanod
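The log line compares the fraction of blocks reported by DataNodes against a threshold. The sketch below shows only that ratio check, assuming the 0.999 threshold from the message (dfs.safemode.threshold.pct). The real NameNode also waits for an extension period (dfs.safemode.extension), which is not modeled. The 10000-block total is a hypothetical figure chosen to reproduce the 0.2915 ratio.

```java
class SafeModeSketch {
    // True once the reported/total block ratio reaches the threshold.
    static boolean thresholdReached(long reportedBlocks, long totalBlocks, double threshold) {
        if (totalBlocks == 0) {
            return true; // nothing to wait for
        }
        return (double) reportedBlocks / totalBlocks >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(thresholdReached(2915, 10000, 0.999)); // false: too few DataNodes reported
        System.out.println(thresholdReached(9995, 10000, 0.999)); // true
    }
}
```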

Re: Merge Reducers Outputs

2011-07-26 Thread Arun C Murthy
No. Either your data is small enough that it can all go to a single reducer, or you can set up a (sampling) partitioner so that the partitions are sorted and you get globally sorted output from multiple reduces - take a look at the TeraSort example for this. Arun On Jul 26, 2011, at 3
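
The TeraSort approach works like this. Sample the input keys, choose R - 1 sorted split points for R reduces, and send each key to the partition found by binary search over those points. Partition numbers then rise with the keys, so concatenating the sorted outputs of reduces 0..R-1 gives globally sorted output. The Python sketch below illustrates that technique. It is not TeraSort's actual code, which uses a trie over the split points, and choose_split_points and total_order_partition are hypothetical names.

```python
import bisect
from typing import List, Sequence


def choose_split_points(sample: Sequence[str], num_reduces: int) -> List[str]:
    """Pick num_reduces - 1 evenly spaced cut points from a sample of keys."""
    if not 0 < num_reduces <= len(sample):
        raise ValueError("need at least one sampled key per reduce")
    ordered = sorted(sample)
    step = len(ordered) / num_reduces
    return [ordered[int(step * i)] for i in range(1, num_reduces)]


def total_order_partition(key: str, split_points: List[str]) -> int:
    """Partition i receives keys k with split_points[i-1] <= k < split_points[i].

    bisect_right makes the partition number non-decreasing in the key, which is
    what lets the reduce outputs be concatenated into one sorted result.
    """
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    keys = [f"{i:05d}" for i in range(1000)]
    splits = choose_split_points(keys[::10], 4)  # sample every 10th key
    print(splits)
```

The quality of the sample only affects how evenly work is spread across the reduces. Correct ordering follows from the monotone partition function alone.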

Re: Job tracker error

2011-07-24 Thread Arun C Murthy
On Jul 24, 2011, at 2:34 AM, Joey Echeverria wrote: > You're running out of memory trying to generate the splits. You need to set a > bigger heap for your driver program. Assuming you're using the hadoop jar > command to launch your job, you can do this by setting HADOOP_HEAPSIZE to a > larger

Re: MapReduce jobs hanging or failing near completion

2011-07-19 Thread Arun C Murthy
Is this reproducible? If so, I'd urge you to check your local disks... Arun On Jul 19, 2011, at 12:41 PM, Kai Ju Liu wrote: > Hi Marcos. The issue appears to be the following. A reduce task is unable to > fetch results from a map task on HDFS. The map task is re-run, but the map > task is now

Re: Too many fetch-failures

2011-07-18 Thread Arun C Murthy
On Jul 18, 2011, at 3:02 PM, Geoffry Roberts wrote: > All, > > I am getting the following errors during my MR jobs (see below). Ultimately > the jobs finish well enough, but these errors do slow things down. I've done > some reading and I understand that this is all caused by failures in my

Re: New to hadoop, trying to write a customary file split

2011-07-18 Thread Arun C Murthy
Hey Steve, Want to contribute it as an example to MR? Would love to help. thanks, Arun On Jul 11, 2011, at 12:11 PM, Steve Lewis wrote: > Look at this sample > = > package org.systemsbiology.hadoop; > > > > import org.apache.hadoop.conf.*; > impo

Re: Lack of data locality in Hadoop-0.20.2

2011-07-12 Thread Arun C Murthy
't really have any meaning. Obviously > everything will run in the same rack. I am concerned about data-local maps. I > assumed that Hadoop would do a much better job at ensuring data-local maps > but it doesnt seem to be the case here. > > -Virajith > > On Tue, Jul

Re: Lack of data locality in Hadoop-0.20.2

2011-07-12 Thread Arun C Murthy
Why are you running with replication factor of 1? Also, it depends on the scheduler you are using. The CapacityScheduler in 0.20.203 (not 0.20.2) has much better locality for jobs, similarly with FairScheduler. IAC, running on a single rack with replication of 1 implies rack-locality for all t

Re: Distribute native library within a job jar

2011-07-10 Thread Arun C Murthy
into the DC only once and re-used. I'd highly recommend that. hth, Arun > Thanks, > Jarod > > On Sat, Jul 9, 2011 at 3:20 PM, Arun C Murthy wrote: >> Jarod, >> >> On Jul 9, 2011, at 12:08 PM, Donghan (Jarod) Wang wrote: >> >>> Hey all, >>

Re: About the combiner execution

2011-07-10 Thread Arun C Murthy
(Moving to mapreduce-user@, bcc hdfs-user@. Please use appropriate project lists - thanks) On Jul 10, 2011, at 4:42 AM, Florin P wrote: > Hello! > I've read on > http://www.fromdev.com/2010/12/interview-questions-hadoop-mapreduce.html > (cite): > "The execution of combiner is not guaranteed,
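
"Not guaranteed" has a practical consequence. The framework may run the combiner zero, one or several times on arbitrary subsets of a map's output, so the final result must not depend on it. That holds for an associative, commutative operation such as sum, and fails for a naive mean. The Python simulation below works under those assumptions. map_side is a hypothetical stand-in for the framework's spill and merge behavior.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def combine_sums(pairs: Iterable[Pair]) -> List[Pair]:
    """Sum 'combiner': pre-aggregate the values for each key on the map side."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def reduce_sums(pairs: Iterable[Pair]) -> Dict[str, int]:
    """The reducer applies the same associative, commutative sum."""
    return dict(combine_sums(pairs))


def map_side(pairs: List[Pair], rng: random.Random) -> List[Pair]:
    """Hypothetical stand-in for the framework: run the combiner 0, 1 or 2
    times, each time on an arbitrary chunk of the (shuffled) map output."""
    out = list(pairs)
    for _ in range(rng.randint(0, 2)):
        rng.shuffle(out)
        cut = rng.randint(0, len(out))
        out = combine_sums(out[:cut]) + out[cut:]
    return out


def naive_mean(values: List[float]) -> float:
    """Not safe as a combiner: a mean of partial means is not the overall mean."""
    return sum(values) / len(values)


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog the end".split()
    pairs = [(w, 1) for w in words]
    expected = reduce_sums(pairs)
    for seed in range(5):
        assert reduce_sums(map_side(pairs, random.Random(seed))) == expected
    print(expected["the"], naive_mean([naive_mean([1, 2, 3]), 4]), naive_mean([1, 2, 3, 4]))
```

For the same reason, a combiner should emit the same key/value types it consumes. Its output may be fed to the combiner again or go straight to the reducer.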

Re: Distribute native library within a job jar

2011-07-09 Thread Arun C Murthy
Jarod, On Jul 9, 2011, at 12:08 PM, Donghan (Jarod) Wang wrote: > Hey all, > > I'm working on a project that uses a native c library. Although I can > use DistributedCache as a way to distribute the c library, I'd like to > use the jar to do the job. What I mean is packing the c library into > t
