You can chain job submissions at the client. Also, you can run more than one
job in parallel (if you have enough task slots). An example of chaining jobs
is in src/examples/org/apache/hadoop/examples/Grep.java, where the jobs
grep-search and grep-sort are chained.
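Client-side chaining amounts to running the jobs back to back, since JobClient.runJob blocks until a job finishes. A minimal sketch in the old mapred API, along the lines of what Grep.java does (the mapper/reducer setup is elided and the temp-path name is an assumption, not the actual example code):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ChainedJobs {
  public static void main(String[] args) throws Exception {
    Path input = new Path(args[0]);
    Path temp = new Path("chain-temp");   // intermediate output of job 1 (assumed name)
    Path output = new Path(args[1]);

    JobConf searchJob = new JobConf(ChainedJobs.class);
    searchJob.setJobName("grep-search");
    FileInputFormat.setInputPaths(searchJob, input);
    FileOutputFormat.setOutputPath(searchJob, temp);
    // ... set mapper/reducer/types for the search phase ...
    JobClient.runJob(searchJob);          // blocks until job 1 completes

    JobConf sortJob = new JobConf(ChainedJobs.class);
    sortJob.setJobName("grep-sort");
    FileInputFormat.setInputPaths(sortJob, temp);
    FileOutputFormat.setOutputPath(sortJob, output);
    // ... set mapper/reducer/types for the sort phase ...
    JobClient.runJob(sortJob);            // runs only after job 1 succeeds
  }
}
```

Because runJob throws on failure, the second job never launches if the first one fails.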
On 1/18/09 9:58 AM,
On 1/6/09 9:47 AM, Meng Mao meng...@gmail.com wrote:
Unfortunately, my team is on 0.15 :(. We are looking to upgrade to 0.18 as
soon as we upgrade our hardware (long story).
From comparing the 0.15 and 0.19 mapreduce tutorials, and looking at the
4545 patch, I don't see anything that seems
IIRC, enabling symlink creation for your files should solve the problem.
Call DistributedCache.createSymLink(); before submitting your job.
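A hedged sketch of the driver-side calls (the HDFS URI and file names here are made up; in the releases I've seen the method is spelled createSymlink):

```java
import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf();
// The fragment after '#' names the symlink created in the task's working directory.
DistributedCache.addCacheFile(
    new URI("hdfs://namenode:9000/data/lookup.dat#lookup.dat"), conf);
DistributedCache.createSymlink(conf);  // enables symlink creation for cached files
// ... submit the job; tasks can then open "lookup.dat" relative to their cwd ...
```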
On 12/25/08 10:40 AM, Sean Shanny ssha...@tripadvisor.com wrote:
To all,
Version: hadoop-0.17.2.1-core.jar
I created a MapFile on a local node.
on slow machines).
The other thing to note is that faster machines will execute more tasks than
the slower machines when there are lots of tasks to execute, since machines
pull tasks from the JobTracker when they are done running the current tasks.
- Aaron
On Wed, Dec 24, 2008 at 1:12 AM, Devaraj
You can enable speculative execution for your jobs.
On 12/24/08 10:25 AM, Jeremy Chow coderp...@gmail.com wrote:
Hi list,
I've come up against a scenario like this: to finish the same task, one of my
Hadoop clusters needs only 5 seconds, while another needs more than 2
minutes.
It's a
Hi Christian, there is no notable change to the merge algorithm except that
it uses IFile instead of SequenceFile for the input and output.
Is your application running with intermediate compression on? What's the
value configured for fs.inmemory.size.mb? What is the typical map output
size (if you
On 12/7/08 11:32 PM, Andy Sautins [EMAIL PROTECTED] wrote:
I'm having trouble finding a way to do what I want, so I'm wondering
if I'm just not looking at the right place or if I'm thinking about the
problem in the wrong way. Any insight would be appreciated.
Let's say I
-
From: Devaraj Das [mailto:[EMAIL PROTECTED]
Sent: Sunday, December 07, 2008 12:11 PM
To: core-user@hadoop.apache.org
Subject: Re: Can mapper get access to filename being processed?
On 12/7/08 11:32 PM, Andy Sautins [EMAIL PROTECTED] wrote:
I'm having trouble finding a way
On 12/6/08 2:42 PM, deng chao [EMAIL PROTECTED] wrote:
Hi,
we have met a case that needs your help.
The case: in the Mapper class, named MapperA, we define a map() function,
and in this map() function we want to submit another new job, named jobB.
Does Hadoop support this case?
Although you
attempt) would launch the
second job again and this may not be what you want...
I am a novice, but it looks like the slaves know about the Master
NameNode and JobTracker (in the masters file), so I think it is
worth trying.
Cheers,
Tim
On Sat, Dec 6, 2008 at 5:17 PM, Devaraj Das [EMAIL
On 10/30/08 3:13 AM, Aaron Kimball [EMAIL PROTECTED] wrote:
The system load and memory consumption on the JT are both very close to
idle states -- it's not overworked, I don't think
I may have an idea of the problem, though. Digging back up a ways into the
JT logs, I see this:
a
stack trace of the JobTracker threads (without your patch) when the TTs are
unable to talk to it. Access the url http://jt-host:jt-info-port/stacks
That will tell us what the handlers are up to.
- Aaron
Devaraj Das wrote:
On 10/30/08 3:13 AM, Aaron Kimball [EMAIL PROTECTED] wrote
Quick question (I haven't looked at your comparator code yet) - is this
reproducible/consistent?
On 10/28/08 11:52 PM, Deepika Khera [EMAIL PROTECTED] wrote:
I am getting a similar exception too with Hadoop 0.18.1 (see stack trace
below), though it's an EOFException. Does anyone have any idea
No that is not possible today. However, you might want to look at the
TaskScheduler to see if you can implement a scheduler to provide this kind
of task scheduling.
In the current hadoop, one point regarding computationally intensive tasks is
that if the machine is not able to keep up with the
of
free memory (so it's not resource starvation).
Espen
On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das [EMAIL PROTECTED] wrote:
I started a profile of the reduce-task. I've attached the profiling output.
It seems from the samples that ramManager.waitForDataToMerge() doesn't
actually wait.
Has
Hadoop doesn't support this natively. So if you need this kind of a
functionality, you'd need to code your application in such a way. But I am
worried about the race conditions in determining which task should first
create the ramfs and load the data.
If you can provide atomicity in determining
I started a profile of the reduce-task. I've attached the profiling output.
It seems from the samples that ramManager.waitForDataToMerge() doesn't
actually wait.
Has anybody seen this behavior?
This has been fixed in HADOOP-3940
On 9/4/08 6:36 PM, Espen Amble Kolstad [EMAIL PROTECTED]
to HDFS, and back, since I work with many small files (10kb) and
Hadoop seems to behave poorly with them.
Perhaps HBase is another option. Is anyone using it in production mode?
And do I really need to downgrade to 0.17.x to install it?
-Original Message-
From: Devaraj Das [mailto
and uploading it. (Small files lower
transfer speed from 40-70MB/s to hundreds of kbps :(
-Original Message-
From: Devaraj Das [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 03, 2008 4:00 AM
To: core-user@hadoop.apache.org
Subject: Re: har/unhar utility
You could create a har archive
Could you try to kill the tasktracker hosting the task the next time when it
happens? I just want to isolate the problem - whether it is a problem in the
TT-JT communication or in the Task-TT communication. From your description
it looks like the problem is between the JT-TT communication. But pls
On 7/25/08 12:09 AM, Andreas Kostyrka [EMAIL PROTECTED] wrote:
On Thursday 24 July 2008 15:19:22 Devaraj Das wrote:
Could you try to kill the tasktracker hosting the task the next time when
it happens? I just want to isolate the problem - whether it is a problem in
the TT-JT communication
This is strange. If you don't mind, pls send the script to me.
-Original Message-
From: Yunhong Gu1 [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 03, 2008 9:49 AM
To: core-user@hadoop.apache.org
Subject: topology.script.file.name
Hello,
I have been trying to figure out
It should be out within a couple of days. As of now voting is on and will
end on the 23rd.
-Original Message-
From: Joman Chu [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 19, 2008 4:48 PM
To: core-user@hadoop.apache.org
Subject: Release Date of Hadoop 0.17.1
Hello, I was wondering
Hadoop does provide a ulimit-based way to control the memory consumption by
the tasks it spawns via the config mapred.child.ulimit. Look at
http://hadoop.apache.org/core/docs/r0.17.0/mapred_tutorial.html#Task+Execution+%26+Environment
However, what is lacking is a way to get the cumulative
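For reference, a hedged example of setting that limit in the job configuration (the value is virtual memory in kilobytes; 1572864 here is an assumed 1.5 GB cap, not a recommendation):

```xml
<property>
  <name>mapred.child.ulimit</name>
  <!-- Maximum virtual memory, in KB, for each spawned child task -->
  <value>1572864</value>
</property>
```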
at 9:53 PM, Devaraj Das
[EMAIL PROTECTED] wrote:
Hi Iver,
The implementation of the script depends on your setup. The
main thing
is that it should be able to accept a bunch of IP addresses and DNS
names and be able to give back the rackIDs for each. It is a
one-to-one correspondence
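As a sketch, a rack-awareness script might look like the following (the subnet-to-rack mapping is entirely made up; adapt it to your network). Hadoop invokes the file named in topology.script.file.name with a batch of hosts/IPs as arguments and expects one rack ID per argument, in order:

```shell
#!/bin/sh
# Print one rack ID per host/IP argument, in the order given.
rack_map() {
  for host in "$@"; do
    case "$host" in
      10.1.*) echo "/dc1/rack1" ;;     # hypothetical subnet for rack 1
      10.2.*) echo "/dc1/rack2" ;;     # hypothetical subnet for rack 2
      *)      echo "/default-rack" ;;  # Hadoop's own fallback rack ID
    esac
  done
}
rack_map "$@"
```

Unresolvable names simply fall through to /default-rack, which matches what Hadoop assumes when no script is configured.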
No, the PID is not logged. So is it the framework-side Java tasks not getting
killed or is it the Streaming children? By the way, the handling of process
groups should be handled better when we have HADOOP-1380.
-Original Message-
From: Andreas Kostyrka [mailto:[EMAIL PROTECTED]
Sent:
Hi Andreas,
Here is what I did:
bin/hadoop jar build/hadoop-0.18.0-dev-examples.jar randomtextwriter
-Dtest.randomtextwrite.min_words_key=40
-Dtest.randomtextwrite.max_words_key=50
-Dtest.randomtextwrite.maps_per_host=1 textinput
(this would generate 1GB of text data with pretty long sentences.)
Hi, do you have a testcase that we can run to reproduce this? Thanks!
-Original Message-
From: jkupferman [mailto:[EMAIL PROTECTED]
Sent: Monday, June 02, 2008 9:22 AM
To: core-user@hadoop.apache.org
Subject: Stack Overflow When Running Job
Hi everyone,
I have a job running
-Original Message-
From: Taeho Kang [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 22, 2008 3:41 PM
To: core-user@hadoop.apache.org
Subject: Re: Questions on how to use DistributedCache
Thanks for your reply.
Just one more thing to ask..
From what I see from the source
, for a
given task tracker, under Non-running tasks, there are at
least 200 or 300 COMMIT_PENDING tasks. It appears they are stuck too.
Thanks a lot for your help!
Lili
On Wed, Apr 30, 2008 at 2:14 PM, Devaraj Das
[EMAIL PROTECTED] wrote:
Hi Lili, the jobconf memory consumption seems
Long term we need to see how we can minimize the memory consumption by
objects corresponding to completed tasks in the tasktracker.
-Original Message-
From: Devaraj Das [mailto:[EMAIL PROTECTED]
Sent: Friday, May 02, 2008 1:29 AM
To: 'core-user@hadoop.apache.org'
Subject: RE: OOM
Hi Lili, the jobconf memory consumption seems quite high. Could you please
let us know if you pass anything in the jobconf of jobs that you run? I
think you are seeing the 572 objects since a job is running and the
TaskInProgress objects for tasks of the running job are kept in memory (but
I need
Will your requirement be addressed if, from within the map method, you
create a sequence file using the SequenceFile.createWriter API, write a
key/value using the writer's append(key,value) API, and then close the
file? You can do this for every key/value.
Pls have a look at createWriter APIs and
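A hedged sketch of what that could look like inside a map method (the output path scheme is a placeholder, and key/value stand for the map arguments):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Inside map(Text key, Text value, ...): one sequence file per key/value.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path out = new Path("/output/key-" + key.toString());  // placeholder naming scheme
SequenceFile.Writer writer =
    SequenceFile.createWriter(fs, conf, out, Text.class, Text.class);
writer.append(key, value);  // write the single record
writer.close();             // close immediately so the file is complete
```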
Jason, didn't get that. The jvm should exit naturally even without calling
System.exit. Where exactly did you insert the System.exit? Please clarify.
Thanks!
-Original Message-
From: Jason Venner [mailto:[EMAIL PROTECTED]
Sent: Friday, April 18, 2008 6:48 PM
To:
as the other mapper is doing.
Devaraj Das [EMAIL PROTECTED] wrote: Will your requirement
be addressed if, from within the map method, you create a
sequence file using SequenceFile.createWriter api, write a
key/value using the writer's append(key,value) API and then
close the file? You
written to file system
Yes, but Kayla is likely misguided in this respect.
(my apologies for sounding doctrinaire)
On 4/18/08 11:08 AM, Devaraj Das [EMAIL PROTECTED] wrote:
Ted, note that Kayla wants one file per output key/value.
-Original Message-
From: Ted Dunning
on
it?
kind regards,
ud
Devaraj Das [EMAIL PROTECTED]
04/16/2008 01:18 PM
To: core-user@hadoop.apache.org
Subject: RE: Counters giving double values
Pls file a jira for the counter updates part. It will be excellent
Hi Grant, could you please copy-paste the exact command you used to run the
program? Also, the associated config files, etc., will help
-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 15, 2008 6:03 PM
To: core-user@hadoop.apache.org
Subject:
Hi Bradford,
Could you please check what your mapred.local.dir is set to?
Devaraj.
-Original Message-
From: Bradford Stephens [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 29, 2008 1:54 AM
To: core-user@hadoop.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: hadoop 0.15.3 r612257
It might have something to do with your application itself. By any chance
are you doing a lot of huge object allocation (directly or indirectly)
within the map method? Which version of hadoop are you on?
-Original Message-
From: Stefan Groschupf [mailto:[EMAIL PROTECTED]
Sent:
the wordcount
example many other users report the same problem:
See:
http://markmail.org/search/?q=org.apache.hadoop.mapred.MapTask%24MapOutputBuffer.collect+order%3Adate-backward
Thanks for your help!
Stefan
On Mar 15, 2008, at 11:02 PM, Devaraj Das wrote:
It might have something
Pipes won't work in local mode. It assumes support from HDFS. You should be
able to run it in a single node pseudo-distributed setup.
Devaraj
-Original Message-
From: Cox Wood [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 23, 2008 1:41 PM
To: [EMAIL PROTECTED]
Subject: Does the
no purpose).
Perhaps I am totally off - would like to learn about other
people's experience.
-Original Message-
From: Devaraj Das [mailto:[EMAIL PROTECTED]
Sent: Tue 1/22/2008 8:22 PM
To: core-user@hadoop.apache.org
Subject: RE: speculative task execution and writing side-effect
1. In what situation would speculative task execution kick
in if it's enabled?
It would be based on tasks' progress. A speculative instance of a running
task is launched if the task in question is lagging behind the others in
terms of the progress it has made. It also depends on whether there are
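For reference, speculative execution is controlled per job by two boolean properties (these are the pre-0.20-era names; they were on by default in many releases):

```xml
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>true</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>true</value>
</property>
```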