Map job hangs indefinitely

2011-06-21 Thread Sudharsan Sampath
Hi, I am starting a job from the map of another job. The following is a quick mock-up of the code snippets I use. But the 2nd job hangs indefinitely after the 1st task attempt fails. There is not even a 2nd attempt. This runs fine on a cluster with one node but fails on a two-node cluster. Can someo

Re: Map job hangs indefinitely

2011-06-22 Thread Sudharsan Sampath
002_m_00_0' from 'TASK_TRACKER1' Thanks Sudhan S On Wed, Jun 22, 2011 at 12:13 PM, Devaraj K wrote: > With this info it is difficult to find out where the problem is coming from. > Can you check the job tracker and task tracker logs related to these jobs?

Re: controlling no. of mapper tasks

2011-06-22 Thread Sudharsan Sampath
Hi Allen, The number of map tasks is driven by the number of splits of the input provided. The configuration for 'number of map tasks' is only a hint and will be honored only if the value is more than the number of input splits. If it's less, then the latter takes higher precedence. But as a hack/w
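
A minimal sketch of this in the old 0.20 driver API, assuming a hypothetical driver class: setNumMapTasks is only a hint, while raising the minimum split size (an assumed 256 MB here) is one workaround that actually reduces the number of splits and hence map tasks.

import org.apache.hadoop.mapred.JobConf;

public class MapCountHintDriver {                      // hypothetical driver class
  public static void main(String[] args) {
    JobConf conf = new JobConf(MapCountHintDriver.class);
    // A hint to the InputFormat; the real task count is the number of input splits.
    conf.setNumMapTasks(100);
    // Workaround to force fewer splits: raise the minimum split size.
    conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);
  }
}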

MapReduce output could not be written

2011-07-05 Thread Sudharsan Sampath
Hi, In one of my jobs I am getting the following error. java.io.IOException: File X could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1282) at org.apache.hadoop.hdfs.server.namenode.NameNod

Re: MapReduce output could not be written

2011-07-05 Thread Sudharsan Sampath
eur. Just asking out of curiosity. > > > On Tue, Jul 5, 2011 at 6:13 PM, Sudharsan Sampath wrote: > >> Hi, >> >> In one of my jobs I am getting the following error. >> >> java.io.IOException: File X could only be r

Hadoop online upgrade

2011-07-07 Thread Sudharsan Sampath
Hi, Is it possible to upgrade to a newer version of Hadoop without bringing the cluster down? To my understanding it's not, but just wondering. Thanks Sudharsan S

Re: "java.lang.Throwable: Child Error " And " Task process exit with nonzero status of 1."

2011-07-11 Thread Sudharsan Sampath
Hi, The issue could be attributed to many causes, a few of which are: 1) unable to create logs due to insufficient space in the logs directory or a permissions issue; 2) a ulimit threshold that causes insufficient allocation of memory; 3) OOM on the child, or being unable to allocate the configured memory while
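
For cause (3), a hedged example of explicitly capping the child task JVM heap via mapred.child.java.opts; the class name and the 512 MB figure are only illustrations.

import org.apache.hadoop.mapred.JobConf;

public class ChildHeapConfig {                         // hypothetical driver class
  public static void main(String[] args) {
    JobConf conf = new JobConf(ChildHeapConfig.class);
    // JVM options passed to every spawned task; keep -Xmx within what the node can actually allocate.
    conf.set("mapred.child.java.opts", "-Xmx512m");
  }
}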

Re: Lack of data locality in Hadoop-0.20.2

2011-07-12 Thread Sudharsan Sampath
What's the map task capacity of each node? On Tue, Jul 12, 2011 at 6:15 PM, Virajith Jalaparti wrote: > Hi, > > I was trying to run the Sort example in Hadoop-0.20.2 over 200GB of input > data using a 20-node cluster. HDFS is configured to use a 128MB block > size (so 1600 maps are created

Re: Can I define a datastructure that all Mappers share?

2011-08-15 Thread Sudharsan Sampath
Hi, To my knowledge it's not possible with plain map-reduce, but you can try using a distributed cache on top of it. To name a few, try Hazelcast (if your programming language is Java) or GigaSpaces. Just a note: why would you want to share data across mappers? It defeats the basic assumption of map-reduce th

Task JVM reuse

2011-08-22 Thread Sudharsan Sampath
Hi, If the task JVM is set to be re-used with the -1 option, when do the JVMs exit? From the JvmManager class, it looks like it's done only when the job completes. Is that right? Thanks Sudharsan S
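
For reference, a minimal sketch of how the -1 reuse setting in question is applied in the old API (the class name is hypothetical).

import org.apache.hadoop.mapred.JobConf;

public class JvmReuseConfig {                          // hypothetical driver class
  public static void main(String[] args) {
    JobConf conf = new JobConf(JvmReuseConfig.class);
    // Equivalent to mapred.job.reuse.jvm.num.tasks = -1: the task JVM is reused for
    // an unlimited number of tasks of the same job.
    conf.setNumTasksToExecutePerJvm(-1);
  }
}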

No of reduce tasks per heartbeat

2011-08-24 Thread Sudharsan Sampath
Hi, I see in the code that while we may assign several map tasks, we assign only one reduce task per tasktracker during the heartbeat. Is there a brief somewhere on why this design decision was made? Thanks Sudhan S

Re: Very slow MapReduce Job

2011-08-29 Thread Sudharsan Sampath
Hi, Is it slow compared to your vanilla version that processes serially? Generally, pseudo-distributed setups should be used only to verify the correctness of the program logic; for performance statistics you should run on a real cluster, where you can achieve parallelism and thus its benefits. Thanks Su

Re: FileOutputFormat (getWorkOutputPath)

2011-09-01 Thread Sudharsan Sampath
Hi, Also move the line 'dir = FileOutputFormat.getWorkOutputPath(conf).toString();' to the configure method, since map() is called for every input line. Thanks Sudhan S On Thu, Sep 1, 2011 at 9:11 AM, Kadu canGica Eduardo wrote: > Thanks a lot Harsh J. > Now it is working fine! > > > 2011/8/31
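
A sketch of that suggestion, assuming the old 0.20 Mapper API; the class and field names are illustrative.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WorkPathMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String dir;

  @Override
  public void configure(JobConf conf) {
    try {
      // Resolve the work output path once per task instead of once per record.
      dir = FileOutputFormat.getWorkOutputPath(conf).toString();
    } catch (IOException e) {
      throw new RuntimeException("Could not resolve work output path", e);
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
    // dir is already set here; use it without recomputing it for every input line.
    out.collect(new Text(dir), value);
  }
}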

Re: I keep getting multiple values for unique reduce keys

2011-09-04 Thread Sudharsan Sampath
Hi, I suspect it's something to do with your custom Writable. Do you have a clear method on your container? If so, it should be called before the object is populated each time, to avoid retaining previous values due to object reuse during the ser-de process. Thanks Sudhan S On Mon, Sep 5, 2011 at 6
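
A hedged sketch of the idea: a container Writable that clears its internal list at the top of readFields(), so a reused instance never carries values from the previous record. The class and field names are made up.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Writable;

public class NamesWritable implements Writable {

  private final List<String> names = new ArrayList<String>();

  public void add(String name) {
    names.add(name);
  }

  public void write(DataOutput out) throws IOException {
    out.writeInt(names.size());
    for (String name : names) {
      out.writeUTF(name);
    }
  }

  public void readFields(DataInput in) throws IOException {
    // Hadoop reuses Writable instances, so drop anything left over from the last record.
    names.clear();
    int size = in.readInt();
    for (int i = 0; i < size; i++) {
      names.add(in.readUTF());
    }
  }
}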

Re: I keep getting multiple values for unique reduce keys

2011-09-05 Thread Sudharsan Sampath
g just has a couple of ArrayLists to gather up Name > and Type objects. > > I suspect I need to extend ArrayWritable instead. I'll try that next. > > Cheers. > > R > > On Sep 4, 2011, at 9:37 PM, Sudharsan Sampath wrote: > > Hi, > > I suspect it

Re: No Mapper but Reducer

2011-09-07 Thread Sudharsan Sampath
This is true and it took us by surprise in the recent past. Also, it had quite some impact on our job cycles, where the size of the input is totally random and could also be zero at times. In one of our cycles, we run a lot of jobs. Say we configure X as the number of reducers for a job which does not hav

Re: Multiple Mappers and One Reducer

2011-09-07 Thread Sudharsan Sampath
Hi, It's possible by setting the number of reduce tasks to 1. Based on your example, it looks like you need to group your records based on "Date, counter1 and counter2", so that should go into the logic of building the key for your map output. Thanks Sudhan S On Wed, Sep 7, 2011 at 3:02 PM, Sahana Bhat
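
A rough sketch of the mapper side, assuming a CSV layout of date,counter1,counter2,...; the driver would additionally call conf.setNumReduceTasks(1) so everything lands on a single reducer.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class GroupingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
    // Assumed record layout: date,counter1,counter2,...
    String[] fields = line.toString().split(",");
    // Group by "date, counter1, counter2" by making them the map output key.
    String key = fields[0] + "\t" + fields[1] + "\t" + fields[2];
    out.collect(new Text(key), line);
  }
}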

Re: Does anyone have sample code for forcing a custom InputFormat to use a small split

2011-09-12 Thread Sudharsan Sampath
Hi, Which version of Hadoop are you using? With v0.21, Hadoop supports splitting bzip2-compressed files (HADOOP-4012), so you don't even have to read from beginning to end. This patch is also available in the CDH3 distribution, which I would recommend since 0.21 is not declared suitable for production. Also the

Re: How to sort key,value pair by value(In ascending)

2011-09-13 Thread Sudharsan Sampath
One way is to reverse the output in the mapper to emit <1, 10050> and, in the reducer, use a TreeSet to order your values, then output each value. With this, the output will be sorted as per your needs within each reducer. If you need a totally sorted output, you can use a single reducer or design your part
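
A hedged reducer sketch of the TreeSet approach (types and names are illustrative). Note that a TreeSet collapses duplicate values; use a list plus Collections.sort if duplicates must be preserved.

import java.io.IOException;
import java.util.Iterator;
import java.util.TreeSet;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class ValueSortingReducer extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, LongWritable> {

  public void reduce(Text key, Iterator<LongWritable> values,
                     OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    TreeSet<Long> sorted = new TreeSet<Long>();
    while (values.hasNext()) {
      // Copy the primitive; Hadoop reuses the LongWritable instance between calls.
      sorted.add(values.next().get());
    }
    // Emit the values for this key in ascending order.
    for (Long value : sorted) {
      out.collect(key, new LongWritable(value));
    }
  }
}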

Avro 1.5.x version compatibility

2011-09-29 Thread Sudharsan Sampath
Hi, We are looking to upgrade from Avro 1.4.1 to the Avro 1.5.x version. Does anyone know if this can cause any incompatibility with the Hadoop CDH3 distro? Thanks Sudhan S

Re: hanging map reduce processes

2011-10-27 Thread Sudharsan Sampath
Hi Henning, I feel it's the non-daemon thread that's causing the issue. A JVM will not exit until all its non-daemon threads have finished. Is there a reason why you want this thread to be non-daemon? If that's unavoidable, can you stop this thread when the reducer's work is completed? Thanks Sudhan
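
A minimal illustration of the daemon suggestion; the thread and its work are hypothetical, not taken from the original job.

public class DaemonPollerExample {
  public static void main(String[] args) {
    // Hypothetical background thread of the kind a task might start.
    Thread poller = new Thread(new Runnable() {
      public void run() {
        // background work that should not keep the task JVM alive
      }
    });
    // With the daemon flag set, the JVM can exit even while this thread is still running.
    poller.setDaemon(true);
    poller.start();
  }
}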

Re: HDFS error : Could not Complete file

2011-11-04 Thread Sudharsan Sampath
eatly appreciated. Thanks Sudhan S On Fri, Nov 4, 2011 at 2:47 PM, Sudharsan Sampath wrote: > Hi, > > I have a simple map-reduce program [map only :)] that reads the input and > emits the same to n outputs on a single-node cluster, with max map tasks set > to 10 on a 16-core processor machin

Re: Never ending reduce jobs, error Error reading task outputConnection refused

2011-11-08 Thread Sudharsan Sampath
Hi, Also, please make it a point to use only hostnames in your configuration; Hadoop relies entirely on hostname-based configuration. Thanks Sudhan S On Fri, Nov 4, 2011 at 9:39 PM, Russell Brown wrote: > Done so, working, Awesome and many many thanks! > > Cheers > > Russell > On 4 Nov 2011, at

Re: Issues with Distributed Caching

2011-11-08 Thread Sudharsan Sampath
Hi, We read something similar but do not use the FileSystem API. Path[] cacheFiles = DistributedCache.getLocalCacheFiles(jobConf); if (cacheFiles != null) { for (Path cacheFile : cacheFiles) { FileInputStream fis = new FileInputStream(cacheFile.toString()); //L
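
The snippet above is cut off; here is a hedged completion of the same idea, reading each locally cached file with plain java.io from a mapper's configure() method (class name and the loading logic are assumptions).

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

// A base class; a real mapper would also implement the Mapper interface.
public class CacheReadingMapperBase extends MapReduceBase {

  @Override
  public void configure(JobConf jobConf) {
    try {
      Path[] cacheFiles = DistributedCache.getLocalCacheFiles(jobConf);
      if (cacheFiles != null) {
        for (Path cacheFile : cacheFiles) {
          // The cache file is already on local disk, so plain java.io is enough.
          BufferedReader reader = new BufferedReader(new FileReader(cacheFile.toString()));
          try {
            String line;
            while ((line = reader.readLine()) != null) {
              // Load the line into an in-memory lookup structure, etc.
            }
          } finally {
            reader.close();
          }
        }
      }
    } catch (IOException e) {
      throw new RuntimeException("Could not read distributed cache files", e);
    }
  }
}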

Fwd: HDFS error : Could not Complete file

2011-11-08 Thread Sudharsan Sampath
Hi, I am really stuck with this issue. If I decrease the number of max map tasks to something like 4, then it runs fine. Does anyone have a clue about the issue? Thanks Sudhan S -- Forwarded message -- From: Sudharsan Sampath Date: Fri, Nov 4, 2011 at 5:10 PM Subject: Re: HDFS error

Re: how to implement error thresholds in a map-reduce job ?

2011-11-17 Thread Sudharsan Sampath
Hi, If you mirror the logic of checking the error condition in both the mapper and the reducer (from the counters), you have a higher probability that the job will fail as early as possible. The mappers are not guaranteed to get the last updated value of a counter from all the mappers, and if it slips thru
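
A hedged sketch of a per-task version of such a check: the mapper counts bad records and fails fast once its own count passes a limit. The counter enum, the threshold, and the validation are all assumptions; a job-wide check would normally be done by the driver polling the aggregated job counters.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ThresholdMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public enum ErrorCounter { BAD_RECORDS }

  private static final long MAX_BAD_RECORDS_PER_TASK = 100; // assumed limit
  private long badRecords = 0;

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
    if (line.getLength() == 0) {                 // placeholder "bad record" check
      reporter.incrCounter(ErrorCounter.BAD_RECORDS, 1);
      if (++badRecords > MAX_BAD_RECORDS_PER_TASK) {
        // Failing the task early surfaces the breach without waiting for the whole job to finish.
        throw new IOException("Too many bad records in this task: " + badRecords);
      }
      return;
    }
    out.collect(line, new Text(""));
  }
}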