: Capacity: 100.0, MaximumCapacity: 1.0, CurrentCapacity: 0.0
>
> What is the difference between Capacity and MaximumCapacity fields?
>
>
>
> --
> Best regards,
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
jobConf.set(String, String)?
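For illustration, a minimal sketch with the old API (the key name, value and MyJob class are placeholders):

// Driver side: stash custom data in the job's configuration
JobConf jobConf = new JobConf(MyJob.class);
jobConf.set("my.app.custom.key", "some-value");

// Task side: read it back in configure(JobConf)
public void configure(JobConf conf) {
  String value = conf.get("my.app.custom.key", "default-value");
}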
On Jan 19, 2013, at 7:31 AM, Pedro Sá da Costa wrote:
> Hi
>
> I want to save some configuration data in the configuration files that
> belongs to the job. How can I do it?
>
> --
> Best regards,
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
>>
>> I will decrease
>> mapred.job.shuffle.input.buffer.percent to limit the errors, but I am not
>> fully confident for the scalability of the process.
>>
>> Any help would be welcomed
>>
>> once again, many thanks
>> Olivier
>>
>>
>> P.S: sorry if I misunderstood the code, any explanation would be really
>> welcomed
>>
>> --
>>
>>
>>
>>
>>
>
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Apologies (again) for the cross-post, I've filed
https://issues.apache.org/jira/browse/INFRA-5123 to close down (common, hdfs,
mapreduce)-user@ since user@ is functional now.
thanks,
Arun
On Aug 4, 2012, at 9:59 PM, Arun C Murthy wrote:
> All,
>
> Given our recent discussion (http://s.apache.org/hv), the new
All,
Given our recent discussion (http://s.apache.org/hv), the new
u...@hadoop.apache.org mailing list has been created and all existing users in
(common,hdfs,mapreduce)-user@ have been migrated over.
I'm in the process of changing the website to reflect this (HADOOP-8652).
Henceforth, please use u...@hadoop.apache.org.
'hadoop jar ...'.
>
> Thanks,
>
> stan
>
> On Fri, Aug 3, 2012 at 2:39 AM, Arun C Murthy wrote:
>> Stan,
>>
>> You can ask TT to create a symlink to your jar shipped via DistCache:
>>
>> http://hadoop.apache.org/common/docs/r1.0.3/map
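A rough sketch of that approach (the path and link name are illustrative):

// The "#mylib.jar" fragment asks the TaskTracker to create a symlink
// named mylib.jar in the task's working directory.
Configuration conf = new Configuration();
DistributedCache.addCacheFile(new java.net.URI("/apps/lib/mylib.jar#mylib.jar"), conf);
DistributedCache.createSymlink(conf);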
distributed cache? If not, is the
> use case appealing enough to open a jira ticket?
>
> Thanks,
>
> stan
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
essential in running programs in Hadoop.
>
> Thanks,
>
> Andrew Botelho
> EMC Corporation
> 55 Constitution Blvd., Franklin, MA
> andrew.bote...@emc.com
> Mobile: 508-813-2026
>
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
null but seems it always remains in
> the loop :(.
>
> Probably is more a HttpServlet related problem but I am not very familiar
> with that.
>
> Do you have any idea how can I do it ?
>
> Thanks,
> Robert
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
files content (hadoop-env.sh,
> core-site.xml, mapred-site.xml, hdfs-site.xml)
> So I have to recover from backup these files all the time.
>
> Does anybody face similar issues ?
>
> Thanks,
> Robert
>
>
>
>
>
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
slave running
> the AM. The Node Manager logs on both AM and non-AM slaves appear fairly
> similar, and I don't see any errors in the non-AM logs.
>
> Another strange data point: These failures occur running the slaves on ARM
> systems. Running the slaves on x86 with the same configuration works. I'm
> using the same tarball on both, which means that the native-hadoop library
> isn't loaded on ARM. The master/client is the same x86 system in both
> scenarios. All nodes are running Ubuntu 12.04.
>
> Thanks for any guidance,
> Trevor
>
>
>
>
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
e case, I observe that it is longer than the real running time.
> What exactly does this counter measure? Thanks.
>
> Zhu, Guojun
> Modeling Sr Graduate
> 571-3824370
> guojun_...@freddiemac.com
> Financial Engineering
> Freddie Mac
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
limit the
> number of mappers without increasing the HDFS block size?
>
> Thanks in advance.
>
> Cheers!
> Manoj.
>
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
key' and the
actual 'key' in the map-output as the 'secondary key'.
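A hedged sketch of wiring this up with the old API (the composite key, partitioner and comparator classes are illustrative, not from this thread):

JobConf conf = new JobConf(MyJob.class);
conf.setMapOutputKeyClass(CompositeKey.class);          // {natural key, secondary key}
conf.setPartitionerClass(NaturalKeyPartitioner.class);  // partition on the natural key only
conf.setOutputValueGroupingComparator(NaturalKeyGroupingComparator.class); // group on the natural key
conf.setOutputKeyComparatorClass(CompositeKeyComparator.class);            // sort on both parts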
hth,
Arun
> Thanks,
> Robert
>
> From: Arun C Murthy
> To: mapreduce-user@hadoop.apache.org
> Sent: Monday, July 9, 2012 9:24 AM
> Subject: Re: Basic question on how reducer works
output
> what portion it has to retrieve only ?
To add to Harsh's comment. Essentially the TT *knows* where the output of a
given map-id/reduce-id pair is present via an output-file/index-file
combination.
Arun
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>
>
> Alan
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
users).
>
> Try it out and let us know!
>
> On Sat, Jun 9, 2012 at 12:37 AM, David Rosenstrauch wrote:
>> We're running 0.20.2 (Cloudera cdh3u4).
>>
>> What configs are you referring to?
>>
>> Thanks,
>>
>> DR
>>
>>
>>
Anyone have any thoughts on the matter?
>
> Thanks,
>
> DR
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
ed systems, which have
> 4GB RAM and generally fewer, smaller (2.5" form factor) disks per
> node. It sounds like the smaller RAM will force better distribution,
> but the disk capacity/utilization situation will be more severe.
>
Right, smaller RAM should force better distribution.
>
> In case it's significant, I've scripted the cluster setup and terasort
> jobs, so everything runs back-to-back instantly, except that I poll to
> ensure that HDFS is up and has active data nodes before running
> teragen. I've also tried adding delays, but they didn't
wiley.com
> music.keithwiley.com
>
> "And what if we picked the wrong religion? Every week, we're just making God
> madder and madder!"
> -- Homer Simpson
>
>
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
vides a way to get the corresponding new
> key.
>
> Cheers,
> Subroto Sanyal
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
g new YARN frameworks
# MapReduce futures
- What next for Hadoop MR framework?
If you are interested, please sign up at:
http://www.meetup.com/Hadoop-Contributors/events/64747342/
I look forward to a fun (technical) conversation and to put faces to names!
thanks,
Arun
--
Arun C. Murthy
Hortonworks
k for either
MR1 or MR2).
Arun
> Jeff
>
> From: Arun C Murthy [mailto:a...@hortonworks.com]
> Sent: Thursday, May 10, 2012 1:27 PM
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: max 1 mapper per node
>
> For terasort you want to fill up your entire cluster with
maximum number of reducers per node
>
> maximum percentage of non data local tasks
> maximum percentage of rack local tasks
>
> and set this in job properties.
>
>
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
for AM is overkill; something simpler could be provided, like:
>
> maximum number of mappers per node
> maximum number of reducers per node
>
> maximum percentage of non data local tasks
> maximum percentage of rack local tasks
>
> and set this in job properties.
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
happen in an afterJob() method which is available for each
> Job. How do I make sure that the afterJob() method is called for each Job added
> to the controller before running the jobs that depend on it?
>
>
> Thanks
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
luding, but
> not limited to, total or partial disclosure, reproduction) by persons other than
> the intended recipient(s) is prohibited. If you receive this e-mail in error,
> please notify the sender by phone or email immediately and delete it!
>
>
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
rk with MapReduce jobs,
>>> but I haven't found a way to inject it during the reading of input data, or
>>> during the write of the job results.
>>> Am I missing something, or is there no support for compressed files in the
>>> filesystem?
>>>
>>> I am well aware of how to set it up to work during the intermitent phases
>>> of the MapReduce operation, but I just can't find a way to apply it BEFORE
>>> the job takes place...
>>> Is there any other way except simply uncompressing the files I need prior
>>> to scheduling a job?
>>>
>>> Huge thanks for any help you can give me!
>>> --
>>> Greg
>>>
>>
>>
>
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
development in eclipse.
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
e any way to configure an OutputFormat to write all
> data into a file?
>
> Thanks,
> James
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
On Feb 3, 2012, at 11:46 PM, Alieh Saeedi wrote:
> Hi
> 1- How does Hadoop decide where to save file blocks (I mean all files include
> files written by reducers)? Could you please give me a reference link?
>
>
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
above tuning parameters, and suggest any
> further improvements ?
> My mappers are running fine. Shuffling and reducing part is comparatively
> slower, than expected for normal jobs. Wanted to know what I am doing
> wrong/missing.
>
> Thanks,
> Praveenesh
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
41_0003_r_00_1/log.index (No such file or directory)
>>>
>>>at java.io.FileInputStream.open(Native Method)
>>>
>>> These errors seem related to this two problems:
>>>
>>> http://grokbase.com/t/hadoop.apache.org/mapreduce-user/2012/01/error-reading-task-output-and-log-filenotfoundexceptions/03mjwctewcnxlgp2jkcrhvsgep4e
>>>
>>> https://issues.apache.org/jira/browse/MAPREDUCE-2846
>>>
>>> But I've looked into the source code and the fix from MAPREDUCE-2846 is
>>> there. Perhaps there is some other reason?
>>>
>>> Regards
>>> Marcin
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
d to this two problems:
>
> http://grokbase.com/t/hadoop.apache.org/mapreduce-user/2012/01/error-reading-task-output-and-log-filenotfoundexceptions/03mjwctewcnxlgp2jkcrhvsgep4e
>
> https://issues.apache.org/jira/browse/MAPREDUCE-2846
>
> But I've looked into the source cod
p like NameNode and make the JobTracker
> > consult with it too (I mean I want the JobTracker to consult with
> > both the NameNode AND myNewComponent)?
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
>
>
>
>
>
>
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Currently it's a random node.
You might be interested in https://issues.apache.org/jira/browse/MAPREDUCE-199.
Arun
On Jan 19, 2012, at 10:10 AM, Ann Pal wrote:
> Hi,
> How is the reduce node chosen in 0.23? What parameters determine choosing
> the reduce node? Does it depend on map node place
Sorry, wasn't clear - default is 1024 for both schedulers. Use those configs to
tune them.
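For reference, a sketch of that knob in the ResourceManager's yarn-site.xml (property name as above; a sensible value depends on your workload):

<property>
  <name>yarn.scheduler.fifo.minimum-allocation-mb</name>
  <value>100</value>
</property>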
Arun
On Jan 17, 2012, at 11:56 PM, Arun C Murthy wrote:
> The default is 1000MB.
>
> Try bumping down yarn.scheduler.fifo.minimum-allocation-mb to 100 (default is
> 1024) for
g...
>
> On Wed, Jan 18, 2012 at 1:01 PM, Arun C Murthy wrote:
> Removing common-user@, please do not cross-post.
>
> On Jan 17, 2012, at 11:24 PM, raghavendhra rahul wrote:
>
> > Hi,
> >
> > What is the minimum size of the container in hadoop yarn
Removing common-user@, please do not cross-post.
On Jan 17, 2012, at 11:24 PM, raghavendhra rahul wrote:
> Hi,
>
> What is the minimum size of the container in hadoop yarn.
> capability.setMemory(xx);
The AM gets this information from RM via the return value for
AMRMProtocol.reg
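Roughly, assuming the 0.23-era records API (a sketch, not a complete AM):

RegisterApplicationMasterResponse resp =
    amRmProtocol.registerApplicationMaster(request);  // request built beforehand
Resource min = resp.getMinimumResourceCapability();   // smallest container the RM grants
Resource max = resp.getMaximumResourceCapability();   // largest container the RM grants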
What scheduler are you using?
On Jan 11, 2012, at 11:48 PM, raghavendhra rahul wrote:
> min container size 100mb
> AM size is 1000mb
>
> On Thu, Jan 12, 2012 at 1:06 PM, Arun C Murthy wrote:
> What is your min container size?
>
> How much did you allocate to AM itself?
What is your min container size?
How much did you allocate to AM itself?
On Jan 11, 2012, at 9:51 PM, raghavendhra rahul wrote:
> Any suggestions..
>
> On Wed, Jan 11, 2012 at 2:09 PM, raghavendhra rahul
> wrote:
> Hi,
> I formed a hadoop cluster with 3 nodes of 3500mb alloted fo
You'll have to implement it yourself for your AM.
The necessary apis are present in the protocol to do so.
On Jan 11, 2012, at 3:33 AM, raghavendhra rahul wrote:
> Hi
> Is there a specific way to stop the application master other than timeout
> option in the client.
> Is there a command like
>
On Jan 7, 2012, at 6:47 PM, Praveen Sripati wrote:
> Thanks for the response.
>
> I was just thinking why some of the design decisions were made with MRv2.
>
> > No, the OR condition is implied by the hierarchy of requests (node, rack,
> > *).
>
> If InputSplit1 is on Node11 and Node12 and In
On Jan 5, 2012, at 8:29 AM, Praveen Sripati wrote:
> Hi,
>
> I had been going through the MRv2 documentation and have the following queries
>
> 1) Let's say that an InputSplit is on Node1 and Node2.
>
> Can the ApplicationMaster ask the ResourceManager for a container either on
> Node1 or Nod
Responses inline:
On Jan 6, 2012, at 9:34 AM, Ann Pal wrote:
> Thanks for your reply. Some additional questions:
> [1] How does the application master determine the size (memory requirement)
> of the container ? Can the container viewed as a JVM with CPU, memory?
Pretty much. It's related to t
Please don't hijack threads, start a new one. Thanks.
On Jan 6, 2012, at 10:41 AM, Arun C Murthy wrote:
> You're probably hitting MAPREDUCE-3537. Try using the hadoop-0.23.1-SNAPSHOT
> or build it yourself from branch-0.23 on ASF svn.
>
> Arun
>
> On Jan 5, 201
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:636)
>
> Any ideas.
>
>
> On Fri, Jan 6, 2012 at 12:41 AM, Arun C Murthy wrote:
> Are you writing your own application i.e. custom ApplicationMaster?
>
Are you writing your own application i.e. custom ApplicationMaster?
You need to pass a ResourceRequest (RR) with a valid hostname, along with
(optionally) an RR with the rack, and also a mandatory RR with * as the resource-name.
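A rough sketch against the 0.23-era records API (hostname and rack are illustrative):

Resource capability = Records.newRecord(Resource.class);
capability.setMemory(1024);

ResourceRequest nodeRR = Records.newRecord(ResourceRequest.class);
nodeRR.setHostName("host1.example.com");   // node-level request
nodeRR.setCapability(capability);
nodeRR.setNumContainers(1);

ResourceRequest rackRR = Records.newRecord(ResourceRequest.class);
rackRR.setHostName("/rack1");              // optional rack-level request

ResourceRequest anyRR = Records.newRecord(ResourceRequest.class);
anyRR.setHostName("*");                    // mandatory catch-all request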
Arun
On Jan 4, 2012, at 8:04 PM, raghavendhra rahul wrote:
> Hi,
>
> I tried
Bing,
Are you using the released version of hadoop-0.23? If so, you might want to
upgrade to latest build off branch-0.23 (i.e. hadoop-0.23.1-SNAPSHOT) which has
the fix for MAPREDUCE-3537.
Arun
On Dec 29, 2011, at 12:27 AM, Bing Jiang wrote:
> Hi, I use Yarn as resource management to deploy
The reduces get it from the JobTracker. Take a look at
TaskCompletionEvents.java.
Arun
On Dec 27, 2011, at 1:26 AM, hadoop anis wrote:
>
>
> Friends,
>
> I want to know where information regarding a completed MapTask gets stored,
> i.e. how does the reduce task know about completed map output data?
Yep!
Take a look at the link Mahadev sent on how to get your application to work
inside YARN.
> http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html
> http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
Arun
On
On Dec 26, 2011, at 10:30 PM, Kevin Burton wrote:
> One key point I wanted to mention for Hadoop developers (but then check out
> the announcement).
>
> I implemented a version of sysstat (iostat, vmstat, etc) in Peregrine and
> would be more than happy to move it out and put it in another ded
Is this with jvm reuse turned on?
On Dec 26, 2011, at 9:38 AM, Markus Jelsma wrote:
> Hi,
>
> We're sometimes seeing this exception if a map task already failed before due
> to, for example, an OOM error. Any ideas on how to address this issue?
>
> org.apache.hadoop.io.SecureIOUtils$AlreadyExi
I wouldn't use jvm reuse at this point. It's had a number of issues over time
and I've consistently switched it off for a long while now.
Arun
On Dec 26, 2011, at 2:50 PM, Markus Jelsma wrote:
> Hi,
>
>> Markus,
>>
>> Good to know you fixed it now :)
>>
>> Also consider raising reduce slowst
On Dec 20, 2011, at 3:55 PM, Kevin Burton wrote:
> The current hadoop implementation shuffles directly to disk and then those
> disk files are eventually requested by the target nodes which are responsible
> for doing the reduce() on the intermediate data.
>
> However, this requires more 2x IO
On Dec 20, 2011, at 4:05 PM, Ann Pal wrote:
> Hi,
> I had the following questions related to Yarn:
> [1] How does the Application Master know where the data is, to give a list to
> Resource Manager? Is it talking to the Name Node?
Yes, it's the responsibility of the AM to talk to the NN to figu
Just use multiple slots per each map.
See:
http://hadoop.apache.org/common/docs/stable/capacity_scheduler.html#Resource+based+scheduling
Arun
On Dec 20, 2011, at 3:46 AM, Nitin Khandelwal wrote:
> Hey,
>
> We use capacity scheduler and divide our map slots among queues. For a
> particular ki
Can you look at the /nodes web-page to see how many nodes you have?
Also, do you see any exceptions in the ResourceManager logs on dn5?
Arun
On Dec 20, 2011, at 5:14 AM, Jingui Lee wrote:
> Hi,all
>
> I am running hadoop 0.23 on 5 nodes.
>
> I could run any YARN application or Mapreduce Job o
Markus,
The CapacityScheduler in 0.20.205 (in fact since 0.20.203) supports the notion
of 'high memory jobs' with which you can specify, for each job, the number of
'slots' for each map/reduce. E.g., you can say for job1 that each map needs
2 slots and so on.
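For example, a hedged sketch with the 0.20.20x-era property names (slot sizes are cluster-side config; the values here are illustrative):

// With mapred.cluster.map.memory.mb=1024 set cluster-side, this job's
// maps would each occupy 2 slots:
JobConf conf = new JobConf(MyJob.class);
conf.setLong("mapred.job.map.memory.mb", 2048);
conf.setLong("mapred.job.reduce.memory.mb", 2048);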
Unfortunately, I don't know how
Hi John,
It's hard for folks on this list to diagnose CDH (you might have to ask their
lists). However, I haven't seen similar issues with hadoop-0.20.2xx in a while.
One thing to check would be to grab a stack trace (jstack) on the tasks to see
what they are upto. Next, try get a tcpdump to
Take a look at JobInProgress.java. There is one object per job.
Arun
On Dec 14, 2011, at 1:14 AM, hadoop anis wrote:
>
>
>
> Hi Friends,
> I want to know, where JobTracker stores Task's Information,
> i.e. which task is being executed on which tasktracker, and h
Moving to mapreduce-user@, bcc common-user@. Please use project specific lists.
Take a look at JobTracker.heartbeat -> *Scheduler.assignTasks.
After the scheduler 'assigns' tasks, the JT sends the corresponding
'LaunchTaskAction' to the TaskTracker.
hth,
Arun
On Dec 13, 2011, at 12:59 AM, had
The CapacityScheduler (hadoop-0.20.203 onwards) allows you to stop a queue and
start it again.
That will give you the behavior you described.
Arun
On Dec 12, 2011, at 5:50 AM, Dino Kečo wrote:
> Hi Hadoop users,
>
> In my company we have been using Hadoop for 2 years and we have need to pause
tibilities (i.e. UserGroupInformation, etc.). Yuck.
>
> Avery
>
> On 12/6/11 3:50 PM, Arun C Murthy wrote:
>> Avery,
>>
>> If you could take a look at what it would take, I'd be grateful. I'm hoping
>> it isn't very much effort.
>>
>>
'final' is meant for admins to ensure certain values aren't overridable.
However, in the example you gave, you'll see 15 (since it's 'final').
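For illustration, the shape of a 'final' entry in a *-site.xml (the property is just an example):

<property>
  <name>mapred.reduce.tasks</name>
  <value>15</value>
  <final>true</final>
</property>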
Arun
On Dec 8, 2011, at 10:44 PM, Bejoy Ks wrote:
> Hi experts
> I have a query with the job.xml file in map reduce.I set some
> value in
Moving to mapreduce-user@, bcc common-user@. Please use project specific lists.
Niranjan,
If you average 0.5G of output per map, it's 5000 maps * 0.5G -> 2.5TB over 12
reduces, i.e. roughly 200G per reduce - compressed!
If you think you have 4:1 compression you are doing nearly a terabyte per
reduce.
Moving to mapreduce-user@, bcc common-user@.
Can you see any errors in the logs? Typically this happens when you have no
NodeManagers.
Check the 'nodes' link and then RM logs.
Arun
On Nov 29, 2011, at 8:36 PM, Nitin Khandelwal wrote:
> HI ,
>
> I have successfully setup Hadoop 0.23.0 in a si
ll be slow to upgrade HDFS with all their important data on
> it. I could also go that route I guess.
>
> Avery
>
> On 12/6/11 8:51 AM, Arun C Murthy wrote:
>> Avery,
>>
>> They aren't 'api changes'. HDFS just has a new set of apis in hadoop-0.
Avery,
They aren't 'api changes'. HDFS just has a new set of apis in hadoop-0.23 (aka
FileContext apis). Both the old (FileSystem apis) and new are supported in
hadoop-0.23.
We have used the new HDFS apis in YARN in some places.
hth,
Arun
On Dec 5, 2011, at 10:59 PM, Avery Ching wrote:
>
job -fail-task ' 4 times to
abandon the task. If you use '-kill-task' it will continue to be re-run.
Arun
> Cheers,
> Mat
>
> On 20 November 2011 16:43, Arun C Murthy wrote:
>> Mat,
>>
>> Take a look at mapred.max.(map|reduce).failures.percent.
>&
Mat,
Take a look at mapred.max.(map|reduce).failures.percent.
See:
http://hadoop.apache.org/common/docs/r0.20.205.0/api/org/apache/hadoop/mapred/JobConf.html#setMaxMapTaskFailuresPercent(int)
http://hadoop.apache.org/common/docs/r0.20.205.0/api/org/apache/hadoop/mapred/JobConf.html#setMaxReduceTaskFailuresPercent(int)
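In code, a minimal sketch (the 10% thresholds are illustrative):

JobConf conf = new JobConf(MyJob.class);
conf.setMaxMapTaskFailuresPercent(10);     // job still succeeds if <= 10% of maps fail
conf.setMaxReduceTaskFailuresPercent(10);  // likewise for reduces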
On Nov 18, 2011, at 10:44 AM, Harsh J wrote:
>
> If you could follow up on that patch, and see it through, its wish granted
> for a lot of us as well, as we move ahead with the newer APIs in the future
> Hadoop releases ;-)
>
The plan is to support both mapred and mapreduce MR apis for the fo
Sorry. This mostly won't work... we have significant changes in the interface
between the JobTracker and schedulers (FS/CS) b/w 20.2 and 20.203 (performance,
better limits etc.).
Your best bet might be to provision Hadoop yourself on EC2 with 0.20.203+.
Good luck!
Arun
On Oct 26, 2011, at 2:5
You can also use -numReduceTasks <#reduces> option to streaming.
On Oct 21, 2011, at 10:22 PM, Mapred Learn wrote:
> Thanks Harsh !
> This is exactly what I thought !
>
> And don't know what you mean by cross-post ? I just posted to mapred and HDFS
> mailing lists ? What's your point about cros
On Sep 27, 2011, at 12:09 PM, Kevin Burton wrote:
> Is it possible to connect the output of one map reduce job so that it is the
> input to another map reduce job.
>
> Basically… then reduce() outputs a key, that will be passed to another map()
> function without having to store intermediate d
We do track intermediate output used and if a job is using too much and can't
be scheduled anywhere on a cluster the CS/JT will fail it. You'll need
hadoop-0.20.204 for this though.
Also, with MRv2 we are in the process of adding limits on disk usage for
intermediate outputs, logs etc.
hth,
Ar
Were they all 'data-local' or 'rack-local' tasks? If so, it's expected.
Arun
On Aug 23, 2011, at 3:51 PM, Sulabh Choudhury wrote:
> Hi,
>
> So I just started using capacity scheduler for M/R jobs. I have 4 task
> trackers each with 4 map/reduce slots.
> Configured a queue so that it uses 25% (
Moving to mapreduce-user@, bcc common-user@
On Aug 23, 2011, at 2:31 AM, Vaibhav Pol wrote:
> Hi All,
> I have a query regarding task re-scheduling. Is it possible
> to make the JobTracker wait for some time before re-scheduling a failed
> tracker's tasks?
>
Why would you want to
Moving to mapreduce-user@, bcc common-user@
On Aug 17, 2011, at 10:53 AM, Matt Davies wrote:
> Hello,
>
> I'm playing around with the Capacity Scheduler (coming from the Fair
> Scheduler), and it appears that a queue with jobs submitted by the same user
> are treated as FIFO. So, for example,
Jun,
On Aug 8, 2011, at 2:19 AM, 谭军 wrote:
> 2 computers with 2 CPUs.
> Each CPU has 2 cores.
> Now I have 2 physical datanodes.
> Can I get 4 physical datanodes?
> I don't know whether I make my point clear?
Running multiple datanodes on the same machine really doesn't buy you anything
- the und
_0035_m_00_0 1.0%
> 2011-08-05 14:33:06,625 INFO org.apache.hadoop.mapred.TaskTracker: Task
> attempt_201108041814_0035_m_00_0 is done.
>
> Thanks
> Iman
>
>
>
> From: Arun C Murthy
> To: mapreduce-user@hadoop.apache.org; Iman E
> Sent: Friday, August
Which release of Hadoop are you running?
What do the logs on the TaskTracker tell you during the time the slow tasks are
getting launched?
hadoop-0.20.203 has a ton of bug fixes since hadoop-0.20.2 which help fix
issues with slow launches - you might want to upgrade.
Arun
On Aug 5, 2011, at 1
Nope, currently we don't do any smart scheduling for reduces since they need to
fetch map outputs from many nodes anyway.
Arun
On Aug 4, 2011, at 10:24 PM, Suhendry Effendy wrote:
> I understand that we can decide which task is run by which reducer in Hadoop by
> using custom partitioner, but is
t; I'm currently using the fair scheduler, but it doesn't look like I've
> specified any allocations. Perhaps I'll dig into this further with the
> Cloudera team to see if there is indeed a problem with the job tracker or
> scheduler. Otherwise, I'll give 0.20.
.
Good luck.
Arun
>
> Is there something in particular I should be looking for on my local disks?
> Hadoop fsck shows all clear, but I'll have to wait until morning to take
> individual nodes offline to check their disks. Any further details you might
> have would be ve
Moving to mapreduce-user@, bcc common-user@.
Use JobControl:
http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Job+Control
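A hedged sketch with the old jobcontrol API (assumes two JobConfs, conf1 and conf2, built elsewhere; exception handling omitted):

import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

Job step1 = new Job(conf1);
Job step2 = new Job(conf2);
step2.addDependingJob(step1);      // step2 starts only after step1 succeeds

JobControl control = new JobControl("chain");
control.addJob(step1);
control.addJob(step2);

new Thread(control).start();       // JobControl is a Runnable
while (!control.allFinished()) {
  Thread.sleep(1000);
}
control.stop();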
Arun
On Jul 29, 2011, at 4:24 PM, Roger Chen wrote:
> Has anyone had experience with chaining map jobs in Hadoop framework 0.20.2?
> Thanks.
>
> --
> Rog
Moving to hdfs-user@, bcc mapreduce-user@.
Your NameNode isn't coming out of safemode since all the datanodes haven't
rejoined the cluster...
> The ratio of reported blocks 0.2915 has not reached the threshold 0.9990.
> Safe mode will be turned off automatically.
Can you check why your datanod
No, you either have small enough data that you can have all go to a single
reducer or you can setup a (sampling) partitioner so that the partitions are
sorted and you can get globally sorted output from multiple reduces - take a
look at the TeraSort example for this.
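A rough sketch of that route with the old API (the partition-file path and Text key/value types are illustrative):

JobConf conf = new JobConf(MyJob.class);
conf.setPartitionerClass(TotalOrderPartitioner.class);
TotalOrderPartitioner.setPartitionFile(conf, new Path("/tmp/_partitions"));

// Sample ~10% of keys, up to 10000 samples, from at most 10 splits
InputSampler.RandomSampler<Text, Text> sampler =
    new InputSampler.RandomSampler<Text, Text>(0.1, 10000, 10);
InputSampler.writePartitionFile(conf, sampler);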
Arun
On Jul 26, 2011, at 3
On Jul 24, 2011, at 2:34 AM, Joey Echeverria wrote:
> You're running out of memory trying to generate the splits. You need to set a
> bigger heap for your driver program. Assuming you're using the hadoop jar
> command to launch your job, you can do this by setting HADOOP_HEAPSIZE to a
> larger
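For example (the value is in MB, assuming the stock 'hadoop' launcher script; jar and class names are placeholders):

export HADOOP_HEAPSIZE=2048
hadoop jar myjob.jar MyDriver ...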
Is this reproducible? If so, I'd urge you to check your local disks...
Arun
On Jul 19, 2011, at 12:41 PM, Kai Ju Liu wrote:
> Hi Marcos. The issue appears to be the following. A reduce task is unable to
> fetch results from a map task on HDFS. The map task is re-run, but the map
> task is now
On Jul 18, 2011, at 3:02 PM, Geoffry Roberts wrote:
> All,
>
> I am getting the following errors during my MR jobs (see below). Ultimately
> the jobs finish well enough, but these errors do slow things down. I've done
> some reading and I understand that this is all caused by failures in my
Hey Steve,
Want to contribute it as an example to MR? Would love to help.
thanks,
Arun
On Jul 11, 2011, at 12:11 PM, Steve Lewis wrote:
> Look at this sample
> =
> package org.systemsbiology.hadoop;
>
>
>
> import org.apache.hadoop.conf.*;
> impo
don't really have any meaning. Obviously
> everything will run in the same rack. I am concerned about data-local maps. I
> assumed that Hadoop would do a much better job at ensuring data-local maps
> but it doesnt seem to be the case here.
>
> -Virajith
>
> On Tue, Jul
Why are you running with replication factor of 1?
Also, it depends on the scheduler you are using. The CapacityScheduler in
0.20.203 (not 0.20.2) has much better locality for jobs, similarly with
FairScheduler.
IAC, running on a single rack with replication of 1 implies rack-locality for
all t
into the DC only once and re-used. I'd highly
recommend that.
hth,
Arun
> Thanks,
> Jarod
>
> On Sat, Jul 9, 2011 at 3:20 PM, Arun C Murthy wrote:
>> Jarod,
>>
>> On Jul 9, 2011, at 12:08 PM, Donghan (Jarod) Wang wrote:
>>
>>> Hey all,
>>
(Moving to mapreduce-user@, bcc hdfs-user@. Please use appropriate project
lists - thanks)
On Jul 10, 2011, at 4:42 AM, Florin P wrote:
> Hello!
> I've read on
> http://www.fromdev.com/2010/12/interview-questions-hadoop-mapreduce.html
> (cite):
> "The execution of combiner is not guaranteed,
Jarod,
On Jul 9, 2011, at 12:08 PM, Donghan (Jarod) Wang wrote:
> Hey all,
>
> I'm working on a project that uses a native c library. Although I can
> use DistributedCache as a way to distribute the c library, I'd like to
> use the jar to do the job. What I mean is packing the c library into
> t