Re: Calling a mapreduce job from inside another

2009-01-18 Thread Devaraj Das
You can chain job submissions at the client. Also, you can run more than one job in parallel (if you have enough task slots). An example of chaining jobs is in src/examples/org/apache/hadoop/examples/Grep.java, where the grep-search and grep-sort jobs are chained. On 1/18/09 9:58 AM,
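
A minimal sketch of what Grep.java-style chaining looks like, using the old org.apache.hadoop.mapred API of that era; the class names and paths here are placeholders, not the actual example code:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ChainedJobs {
      public static void main(String[] args) throws Exception {
        JobConf searchJob = new JobConf(ChainedJobs.class);
        searchJob.setJobName("search");
        FileInputFormat.setInputPaths(searchJob, new Path(args[0]));
        Path tempDir = new Path("chain-temp");           // intermediate data
        FileOutputFormat.setOutputPath(searchJob, tempDir);
        JobClient.runJob(searchJob);                     // blocks until job 1 is done

        JobConf sortJob = new JobConf(ChainedJobs.class);
        sortJob.setJobName("sort");
        FileInputFormat.setInputPaths(sortJob, tempDir); // consume job 1's output
        FileOutputFormat.setOutputPath(sortJob, new Path(args[1]));
        JobClient.runJob(sortJob);                       // runs only after job 1
      }
    }

Since JobClient.runJob() blocks, submitting from separate client threads is one way to get the parallel execution mentioned above.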

Re: correct pattern for using setOutputValueGroupingComparator?

2009-01-06 Thread Devaraj Das
On 1/6/09 9:47 AM, Meng Mao meng...@gmail.com wrote: Unfortunately, my team is on 0.15 :(. We are looking to upgrade to 0.18 as soon as we upgrade our hardware (long story). From comparing the 0.15 and 0.19 mapreduce tutorials, and looking at the 4545 patch, I don't see anything that seems
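
For reference, a hypothetical fragment of the pattern under discussion, assuming the 0.17+ JobConf API (so not available on 0.15); FullKeyComparator and PrimaryKeyGroupingComparator stand in for your own RawComparator implementations:

    JobConf conf = new JobConf(MyJob.class);
    // Sort comparator: orders map output by the full (primary, secondary) key.
    conf.setOutputKeyComparatorClass(FullKeyComparator.class);
    // Grouping comparator: keys that compare equal on the primary part
    // are presented to a single reduce() call as one value group.
    conf.setOutputValueGroupingComparator(PrimaryKeyGroupingComparator.class);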

Re: Having trouble accessing MapFiles in the DistributedCache

2008-12-25 Thread Devaraj Das
IIRC, enabling symlink creation for your files should solve the problem. Call DistributedCache.createSymlink(conf) before submitting your job. On 12/25/08 10:40 AM, Sean Shanny ssha...@tripadvisor.com wrote: To all, Version: hadoop-0.17.2.1-core.jar I created a MapFile on a local node.
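
A minimal sketch of the suggestion, assuming the 0.17-era DistributedCache API; the path and link name are placeholders:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    // During job setup, before submitting the job:
    JobConf conf = new JobConf(MyJob.class);
    DistributedCache.createSymlink(conf);  // enable symlinks in task working dirs
    // The fragment after '#' becomes the symlink name the task sees,
    // so the task can open "lookup.map" as a plain relative path.
    DistributedCache.addCacheFile(new URI("/user/me/lookup.map#lookup.map"), conf);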

Re: How to coordinate nodes of different computing powers in a same cluster?

2008-12-24 Thread Devaraj Das
on slow machines). The other thing to note is that faster machines will execute more tasks than the slower machines when there are lots of tasks to execute, since machines pull tasks from the JobTracker when they are done running the current tasks. - Aaron On Wed, Dec 24, 2008 at 1:12 AM, Devaraj

Re: How to coordinate nodes of different computing powers in a same cluster?

2008-12-23 Thread Devaraj Das
You can enable speculative execution for your jobs. On 12/24/08 10:25 AM, Jeremy Chow coderp...@gmail.com wrote: Hi list, I've come up against a scenario like this: to finish the same task, one node of my hadoop cluster needs only 5 seconds, while another needs more than 2 minutes. It's a
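
A short sketch of enabling it, assuming the JobConf API of that era:

    JobConf conf = new JobConf(MyJob.class);
    conf.setSpeculativeExecution(true);      // maps and reduces together
    // Or, per phase:
    conf.setMapSpeculativeExecution(true);
    conf.setReduceSpeculativeExecution(true);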

Re: How are records with equal key sorted in hadoop-0.18?

2008-12-08 Thread Devaraj Das
Hi Christian, there is no notable change to the merge algorithm except that it uses IFile instead of SequenceFile for the input and output. Is your application running with intermediate compression on? What's the value configured for fs.inmemory.size.mb? What is the typical map output size (if you

Re: Can mapper get access to filename being processed?

2008-12-07 Thread Devaraj Das
On 12/7/08 11:32 PM, Andy Sautins [EMAIL PROTECTED] wrote: I'm having trouble finding a way to do what I want, so I'm wondering if I'm just not looking at the right place or if I'm thinking about the problem in the wrong way. Any insight would be appreciated. Let's say I
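
The commonly cited answer in this era is the map.input.file job property, which the framework sets for each map task reading a file split; a minimal sketch, assuming the old mapred API:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class FileNameAwareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
      private String inputFile;

      public void configure(JobConf job) {
        inputFile = job.get("map.input.file");   // path of the split being read
      }

      public void map(LongWritable key, Text value,
          OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        output.collect(new Text(inputFile), value); // tag record with source file
      }
    }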

Re: Can mapper get access to filename being processed?

2008-12-07 Thread Devaraj Das
- From: Devaraj Das [mailto:[EMAIL PROTECTED] Sent: Sunday, December 07, 2008 12:11 PM To: core-user@hadoop.apache.org Subject: Re: Can mapper get access to filename being processed? On 12/7/08 11:32 PM, Andy Sautins [EMAIL PROTECTED] wrote: I'm having trouble finding a way

Re: does hadoop support submit a new different job in map function?

2008-12-06 Thread Devaraj Das
On 12/6/08 2:42 PM, deng chao [EMAIL PROTECTED] wrote: Hi, we have met a case that needs your help. The case: in the Mapper class, named MapperA, we define a map() function, and in this map() function we want to submit another new job, named jobB. Does hadoop support this case? Although you
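
For concreteness, a hedged sketch of what such a submission could look like: it is nothing more than an ordinary client-side submission performed inside map(), with placeholder paths and class names. Note the caveat in the follow-up below: a failed or speculative re-attempt of this map task would submit jobB a second time.

    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      JobConf jobB = new JobConf(MapperA.class);
      jobB.setJobName("jobB");
      FileInputFormat.setInputPaths(jobB, new Path("/placeholder/inputB"));
      FileOutputFormat.setOutputPath(jobB, new Path("/placeholder/outputB"));
      JobClient.runJob(jobB);  // blocks this map task until jobB completes
    }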

Re: does hadoop support submit a new different job in map function?

2008-12-06 Thread Devaraj Das
attempt) would launch the second job again and this may not be what you want... I am a novice, but it looks like the slaves know about the master NameNode and JobTracker (in the masters file), so I think it is worth trying. Cheers, Tim On Sat, Dec 6, 2008 at 5:17 PM, Devaraj Das [EMAIL

Re: TaskTrackers disengaging from JobTracker

2008-10-29 Thread Devaraj Das
On 10/30/08 3:13 AM, Aaron Kimball [EMAIL PROTECTED] wrote: The system load and memory consumption on the JT are both very close to idle states -- it's not overworked, I don't think. I may have an idea of the problem, though. Digging back up a ways into the JT logs, I see this:

Re: TaskTrackers disengaging from JobTracker

2008-10-29 Thread Devaraj Das
a stack trace of the JobTracker threads (without your patch) when the TTs are unable to talk to it. Access the URL http://jt-host:jt-info-port/stacks and that will tell us what the handlers are up to. - Aaron Devaraj Das wrote: On 10/30/08 3:13 AM, Aaron Kimball [EMAIL PROTECTED] wrote

Re: Merge of the inmemory files threw an exception and diffs between 0.17.2 and 0.18.1

2008-10-28 Thread Devaraj Das
Quick question (I haven't looked at your comparator code yet) - is this reproducible/consistent? On 10/28/08 11:52 PM, Deepika Khera [EMAIL PROTECTED] wrote: I am getting a similar exception too with Hadoop 0.18.1(See stacktrace below), though its an EOFException. Does anyone have any idea

Re: task assignment managemens.

2008-09-07 Thread Devaraj Das
No, that is not possible today. However, you might want to look at the TaskScheduler to see if you can implement a scheduler that provides this kind of task scheduling. In the current Hadoop, one point regarding computationally intensive tasks is that if the machine is not able to keep up with the
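
A speculative skeleton of such a scheduler, assuming the 0.19-era TaskScheduler contract (signatures quoted from memory; several of the types involved are package-private, which is why custom schedulers of that era lived in the org.apache.hadoop.mapred package):

    package org.apache.hadoop.mapred;

    import java.io.IOException;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.List;

    public class MachineAwareScheduler extends TaskScheduler {
      public List<Task> assignTasks(TaskTrackerStatus tracker) throws IOException {
        // Inspect the tracker (e.g. its host name) and hand out only the
        // kinds of tasks this machine is suited to run.
        return Collections.emptyList();          // placeholder policy
      }

      public Collection<JobInProgress> getJobs(String queueName) {
        return Collections.emptyList();          // placeholder
      }
    }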

Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

2008-09-06 Thread Devaraj Das
of free memory (so it's not resource starvation). Espen On Thu, Sep 4, 2008 at 3:33 PM, Devaraj Das [EMAIL PROTECTED] wrote: I started a profile of the reduce-task. I've attached the profiling output. It seems from the samples that ramManager.waitForDataToMerge() doesn't actually wait. Has

Re: Sharing Memory across Map task [multiple cores] runing in same machine

2008-09-05 Thread Devaraj Das
Hadoop doesn't support this natively. So if you need this kind of functionality, you'd need to build it into your application yourself. But I am worried about the race conditions in determining which task should first create the ramfs and load the data. If you can provide atomicity in determining
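
Not a Hadoop facility, but a generic sketch of the kind of atomicity being asked for: File.createNewFile() is atomic, so exactly one of the co-located tasks wins the right to create and populate the shared ramfs.

    import java.io.File;
    import java.io.IOException;

    public class SharedDataLoader {
      // Returns true in exactly one process per machine; that task loads
      // the data, the others wait for a "ready" marker to appear.
      public static boolean tryBecomeLoader(String lockDir) throws IOException {
        return new File(lockDir, "ramfs-loader.lock").createNewFile();
      }
    }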

Re: Could not obtain block: blk_-2634319951074439134_1129 file=/user/root/crawl_debug/segments/20080825053518/content/part-00002/data

2008-09-04 Thread Devaraj Das
I started a profile of the reduce-task. I've attached the profiling output. It seems from the samples that ramManager.waitForDataToMerge() doesn't actually wait. Has anybody seen this behavior? This has been fixed in HADOOP-3940. On 9/4/08 6:36 PM, Espen Amble Kolstad [EMAIL PROTECTED]

Re: har/unhar utility

2008-09-03 Thread Devaraj Das
to HDFS, and back, since I work with many small files (10kb) and hadoop seems to behave poorly with them. Perhaps HBase is another option. Is anyone using it in production mode? And do I really need to downgrade to 17.x to install it? -Original Message- From: Devaraj Das [mailto

Re: har/unhar utility

2008-09-03 Thread Devaraj Das
and uploading it. (small files lower the transfer speed from 40-70MB/s to hundreds of kbps :( -Original Message- From: Devaraj Das [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 03, 2008 4:00 AM To: core-user@hadoop.apache.org Subject: Re: har/unhar utility You could create a har archive

Re: hadoop 0.17.1 reducer not fetching map output problem

2008-07-24 Thread Devaraj Das
Could you try to kill the tasktracker hosting the task the next time it happens? I just want to isolate the problem - whether it is a problem in the TT-JT communication or in the Task-TT communication. From your description it looks like the problem is in the JT-TT communication. But pls

Re: hadoop 0.17.1 reducer not fetching map output problem

2008-07-24 Thread Devaraj Das
On 7/25/08 12:09 AM, Andreas Kostyrka [EMAIL PROTECTED] wrote: On Thursday 24 July 2008 15:19:22 Devaraj Das wrote: Could you try to kill the tasktracker hosting the task the next time when it happens? I just want to isolate the problem - whether it is a problem in the TT-JT communication

RE: topology.script.file.name

2008-07-03 Thread Devaraj Das
This is strange. If you don't mind, pls send the script to me. -Original Message- From: Yunhong Gu1 [mailto:[EMAIL PROTECTED] Sent: Thursday, July 03, 2008 9:49 AM To: core-user@hadoop.apache.org Subject: topology.script.file.name Hello, I have been trying to figure out

RE: Release Date of Hadoop 0.17.1

2008-06-19 Thread Devaraj Das
It should be out within a couple of days. As of now, voting is on and will end on the 23rd. -Original Message- From: Joman Chu [mailto:[EMAIL PROTECTED] Sent: Thursday, June 19, 2008 4:48 PM To: core-user@hadoop.apache.org Subject: Release Date of Hadoop 0.17.1 Hello, I was wondering

RE: Question on HadoopStreaming and Memory Usage

2008-06-15 Thread Devaraj Das
Hadoop does provide a ulimit-based way to control the memory consumption of the tasks it spawns, via the config mapred.child.ulimit. Look at http://hadoop.apache.org/core/docs/r0.17.0/mapred_tutorial.html#Task+Execution+%26+Environment However, what is lacking is a way to get the cumulative
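
A one-line sketch of setting it, assuming the meaning documented in that era (a virtual-memory limit in kilobytes, applied via ulimit to the spawned child and its descendants):

    JobConf conf = new JobConf(MyJob.class);
    conf.set("mapred.child.ulimit", "1048576");  // ~1 GB; the value is in KB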

RE: Hadoop topology.script.file.name Form

2008-06-09 Thread Devaraj Das
at 9:53 PM, Devaraj Das [EMAIL PROTECTED] wrote: Hi Iver, The implementation of the script depends on your setup. The main thing is that it should be able to accept a bunch of IP addresses and DNS names and be able to give back the rackIDs for each. It is a one-to-one correspondence

RE: problem with streaming map jobs not getting killed

2008-06-09 Thread Devaraj Das
No, the PID is not logged. So is it the framework-side java tasks not getting killed, or is it the Streaming children? By the way, the handling of process groups should be handled better when we have HADOOP-1380. -Original Message- From: Andreas Kostyrka [mailto:[EMAIL PROTECTED] Sent:

RE: Stackoverflow

2008-06-04 Thread Devaraj Das
Hi Andreas, Here is what I did: bin/hadoop jar build/hadoop-0.18.0-dev-examples.jar randomtextwriter -Dtest.randomtextwrite.min_words_key=40 -Dtest.randomtextwrite.max_words_key=50 -Dtest.randomtextwrite.maps_per_host=1 textinput (this would generate 1GB of text data with pretty long sentences.

RE: Stack Overflow When Running Job

2008-06-02 Thread Devaraj Das
Hi, do you have a testcase that we can run to reproduce this? Thanks! -Original Message- From: jkupferman [mailto:[EMAIL PROTECTED] Sent: Monday, June 02, 2008 9:22 AM To: core-user@hadoop.apache.org Subject: Stack Overflow When Running Job Hi everyone, I have a job running

RE: Questions on how to use DistributedCache

2008-05-22 Thread Devaraj Das
-Original Message- From: Taeho Kang [mailto:[EMAIL PROTECTED] Sent: Thursday, May 22, 2008 3:41 PM To: core-user@hadoop.apache.org Subject: Re: Questions on how to use DistributedCache Thanks for your reply. Just one more thing to ask. From what I see in the source
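
For the retrieval side of DistributedCache, a hedged fragment assuming the API of that era: a task can look up the tasktracker-local copies of the cached files during configure().

    public void configure(JobConf job) {
      try {
        // One local Path per file registered with addCacheFile() at submit time.
        Path[] localFiles = DistributedCache.getLocalCacheFiles(job);
        if (localFiles != null) {
          // open localFiles[i] with ordinary local file I/O
        }
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }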

RE: OOM error with large # of map tasks

2008-05-01 Thread Devaraj Das
, for a given task tracker, under Non-running tasks, there are at least 200 or 300 COMMIT_PENDING tasks. It appears they are stuck too. Thanks a lot for your help! Lili On Wed, Apr 30, 2008 at 2:14 PM, Devaraj Das [EMAIL PROTECTED] wrote: Hi Lili, the jobconf memory consumption seems

RE: OOM error with large # of map tasks

2008-05-01 Thread Devaraj Das
Long term we need to see how we can minimize the memory consumption by objects corresponding to completed tasks in the tasktracker. -Original Message- From: Devaraj Das [mailto:[EMAIL PROTECTED] Sent: Friday, May 02, 2008 1:29 AM To: 'core-user@hadoop.apache.org' Subject: RE: OOM

RE: OOM error with large # of map tasks

2008-04-30 Thread Devaraj Das
Hi Lili, the jobconf memory consumption seems quite high. Could you please let us know if you pass anything in the jobconf of jobs that you run? I think you are seeing the 572 objects since a job is running and the TaskInProgress objects for tasks of the running job are kept in memory (but I need

RE: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Devaraj Das
Will your requirement be addressed if, from within the map method, you create a sequence file using the SequenceFile.createWriter API, write a key/value using the writer's append(key,value) API, and then close the file? You can do this for every key/value. Pls have a look at the createWriter APIs and
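
A small sketch of that suggestion; the output path and the Text key/value types are placeholders:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    void writePair(Configuration conf, Path file, Text key, Text value)
        throws IOException {
      FileSystem fs = FileSystem.get(conf);
      SequenceFile.Writer writer =
          SequenceFile.createWriter(fs, conf, file, Text.class, Text.class);
      try {
        writer.append(key, value);   // one key/value per file, as described
      } finally {
        writer.close();
      }
    }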

RE: Reusing jobs

2008-04-18 Thread Devaraj Das
Jason, I didn't get that. The JVM should exit naturally even without calling System.exit. Where exactly did you insert the System.exit? Please clarify. Thanks! -Original Message- From: Jason Venner [mailto:[EMAIL PROTECTED] Sent: Friday, April 18, 2008 6:48 PM To:

RE: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Devaraj Das
as the other mapper is doing. Devaraj Das [EMAIL PROTECTED] wrote: Will your requirement be addressed if, from within the map method, you create a sequence file using SequenceFile.createWriter api, write a key/value using the writer's append(key,value) API and then close the file ? You

RE: Map Intermediate key/value pairs written to file system

2008-04-18 Thread Devaraj Das
written to file system Yes, but Kayla is likely misguided in this respect. (my apologies for sounding doctrinaire) On 4/18/08 11:08 AM, Devaraj Das [EMAIL PROTECTED] wrote: Ted, note that Kayla wants one file per output key/value. -Original Message- From: Ted Dunning

RE: Counters giving double values

2008-04-17 Thread Devaraj Das
on it? Kind regards, ud. Devaraj Das [EMAIL PROTECTED] 04/16/2008 01:18 PM, to core-user@hadoop.apache.org, Subject: RE: Counters giving double values: Pls file a jira for the counter updates part. It will be excellent

RE: _temporary doesn't exist

2008-04-15 Thread Devaraj Das
Hi Grant, could you please copy-paste the exact command you used to run the program. Also the associated config files, etc. will help -Original Message- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 15, 2008 6:03 PM To: core-user@hadoop.apache.org Subject:

RE: hadoop 0.15.3 r612257 freezes on reduce task

2008-03-28 Thread Devaraj Das
Hi Bradford, Could you please check what your mapred.local.dir is set to? Devaraj. -Original Message- From: Bradford Stephens [mailto:[EMAIL PROTECTED] Sent: Saturday, March 29, 2008 1:54 AM To: core-user@hadoop.apache.org Cc: [EMAIL PROTECTED] Subject: Re: hadoop 0.15.3 r612257

RE: [memory leak?] Re: MapReduce failure

2008-03-16 Thread Devaraj Das
It might have something to do with your application itself. By any chance are you doing a lot of huge object allocation (directly or indirectly) within the map method? Which version of hadoop are you on? -Original Message- From: Stefan Groschupf [mailto:[EMAIL PROTECTED] Sent:

RE: [memory leak?] Re: MapReduce failure

2008-03-16 Thread Devaraj Das
the wordcount example many other users report the same problem: See: http://markmail.org/search/?q=org.apache.hadoop.mapred.MapTask %24MapOutputBuffer.collect+order%3Adate-backward Thanks for your help! Stefan On Mar 15, 2008, at 11:02 PM, Devaraj Das wrote: It might have something

RE: Does the local mode of hadoop support pipes?

2008-01-23 Thread Devaraj Das
Pipes won't work in local mode. It assumes support from HDFS. You should be able to run it in a single node pseudo-distributed setup. Devaraj -Original Message- From: Cox Wood [mailto:[EMAIL PROTECTED] Sent: Wednesday, January 23, 2008 1:41 PM To: [EMAIL PROTECTED] Subject: Does the

RE: speculative task execution and writing side-effect files

2008-01-23 Thread Devaraj Das
no purpose). Perhaps I am totally off - I would like to learn about other people's experience. -Original Message- From: Devaraj Das [mailto:[EMAIL PROTECTED] Sent: Tue 1/22/2008 8:22 PM To: core-user@hadoop.apache.org Subject: RE: speculative task execution and writing side-effect

RE: speculative task execution and writing side-effect files

2008-01-22 Thread Devaraj Das
1. In what situation would speculative task execution kick in if it's enabled? It would be based on tasks' progress. A speculative instance of a running task is launched if the task in question is lagging behind the others in terms of the progress it has made. It also depends on whether there are
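
On the side-effect-files half of the question, the usual guard of that era (a hedged sketch; see the tutorial's Task Execution & Environment section) is to write such files under the task attempt's work output path, since the framework promotes only the committed attempt's files:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    Path sideEffectFile(JobConf job, String name) throws java.io.IOException {
      // ${mapred.work.output.dir}/name: per-attempt, promoted on commit,
      // so a losing speculative attempt leaves no stray files behind.
      return new Path(FileOutputFormat.getWorkOutputPath(job), name);
    }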