Hi,
I am starting a job from the map of another job. Following is a quick mock of
the code snippets that I use. But the 2nd job hangs indefinitely after the
1st task attempt fails; there is not even a 2nd attempt. This runs fine on a
cluster with one node but fails on a two-node cluster.
Can someo
002_m_00_0' from 'TASK_TRACKER1'
Thanks
Sudhan S
On Wed, Jun 22, 2011 at 12:13 PM, Devaraj K wrote:
> With this info it is difficult to find out where the problem is coming from.
> Can you check the job tracker and task tracker logs related to these jobs?
>
>
Hi Allen,
The number of map tasks is driven by the number of splits of the input
provided. The configuration for 'number of map tasks' is only a hint and
will be honored only if the value is more than the number of input splits.
If it is less, the number of input splits takes precedence.
But as a hack/w
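(For illustration, this is roughly how the hint is passed with the old mapred API; the driver class name and input path below are placeholders, not from the original mail.)

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class MapTaskHintExample {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MapTaskHintExample.class);
    FileInputFormat.setInputPaths(conf, new Path("/input"));   // placeholder path
    // Only a hint (mapred.map.tasks): at least one map is created per input
    // split, so the value matters mainly when it exceeds the split count.
    conf.setNumMapTasks(100);
  }
}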
Hi,
In one of my jobs I am getting the following error.
java.io.IOException: File X could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1282)
    at org.apache.hadoop.hdfs.server.namenode.NameNod
eur. Just asking out of curiosity.
>
>
> On Tue, Jul 5, 2011 at 6:13 PM, Sudharsan Sampath wrote:
>
>> Hi,
>>
>> In one of my jobs I am getting the following error.
>>
>> java.io.IOException: File X could only be r
Hi,
Is it possible to upgrade to a newer version of Hadoop without bringing the
cluster down? To my understanding it's not, but just wondering...
Thanks
Sudharsan S
Hi,
The issue could be attributed to many causes, a few of which are:
1) Unable to create logs due to insufficient space in the logs directory, or a
permissions issue.
2) A ulimit threshold that causes insufficient allocation of memory.
3) OOM on the child, or unable to allocate the configured memory while
What is the map task capacity of each node?
On Tue, Jul 12, 2011 at 6:15 PM, Virajith Jalaparti wrote:
> Hi,
>
> I was trying to run the Sort example in Hadoop-0.20.2 over 200GB of input
> data using a 20-node cluster. HDFS is configured to use a 128MB block
> size (so 1600 maps are created
Hi,
To my knowledge, it's not possible with plain map-reduce. But you can try
using a distributed cache on top of it. To quote a few, try Hazelcast (if your
programming language is Java) or GigaSpaces.
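(If the data you want to share is read-only, Hadoop's own DistributedCache can serve a similar purpose; a rough sketch of the driver side, with the file path and class name as placeholders:)

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class SharedDataDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SharedDataDriver.class);
    // Ship a read-only lookup file to every task; each mapper can then read
    // it locally via DistributedCache.getLocalCacheFiles(conf) in configure().
    DistributedCache.addCacheFile(new Path("/shared/lookup.dat").toUri(), conf);
  }
}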
Just a note: why would you want to share data across mappers? It defeats the
basic assumption of map-reduce th
Hi,
If the task JVM is set to be re-used with a -1 option, when do the JVMs
exit?
From the JVM Manager class, it looks like it's done only when the job
completes. Is that right?
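For reference, the setting I am referring to looks roughly like this with the old mapred API (the class name is a placeholder):

import org.apache.hadoop.mapred.JobConf;

public class JvmReuseExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf(JvmReuseExample.class);
    // -1 means the task JVM may be reused for an unlimited number of tasks
    // of the same job (property: mapred.job.reuse.jvm.num.tasks)
    conf.setNumTasksToExecutePerJvm(-1);
  }
}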
Thanks
Sudharsan S
Hi,
I see in the code that while we assign a number of map tasks, we assign only
one reduce task per tasktracker during the heartbeat.
Is there a brief somewhere on why this design decision was made?
Thanks
Sudhan S
Hi,
Is it slow compared to your vanilla version that processes serially?
Generally, pseudo-distributed setups should only be used to verify the
correctness of the program logic; for performance statistics you should run on
a real cluster where you can achieve parallelism and thus its benefits.
Thanks
Su
Hi,
Also, move the line 'dir = FileOutputFormat.getWorkOutputPath(conf).toString();'
to the configure method, as map() is called for every input line.
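Something along these lines; a rough sketch with the old mapred API, where the mapper name and key/value types are assumptions:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WorkDirMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String dir;   // resolved once per task, not once per record

  @Override
  public void configure(JobConf conf) {
    try {
      dir = FileOutputFormat.getWorkOutputPath(conf).toString();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // use 'dir' here; map() runs for every input line, so the lookup
    // above is deliberately kept out of this method
  }
}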
Thanks
Sudhan S
On Thu, Sep 1, 2011 at 9:11 AM, Kadu canGica Eduardo wrote:
> Thanks a lot Harsh J.
> Now it is working fine!
>
>
> 2011/8/31
Hi,
I suspect it's something to do with your custom Writable. Do you have a
clear method on your container? If so, it should be called each time before
the object is populated, to avoid retaining previous values due to object
reuse during the ser-de process.
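For illustration, a container Writable that resets its lists at the start of readFields; every name here is hypothetical:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Writable;

public class NameTypeWritable implements Writable {

  private final List<String> names = new ArrayList<String>();
  private final List<String> types = new ArrayList<String>();

  public void add(String name, String type) {
    names.add(name);
    types.add(type);
  }

  public void write(DataOutput out) throws IOException {
    out.writeInt(names.size());
    for (int i = 0; i < names.size(); i++) {
      out.writeUTF(names.get(i));
      out.writeUTF(types.get(i));
    }
  }

  public void readFields(DataInput in) throws IOException {
    // Hadoop reuses the same object across records during deserialization,
    // so clear the previous contents before reading the next record.
    names.clear();
    types.clear();
    int n = in.readInt();
    for (int i = 0; i < n; i++) {
      names.add(in.readUTF());
      types.add(in.readUTF());
    }
  }
}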
Thanks
Sudhan S
On Mon, Sep 5, 2011 at 6
g just has a couple of ArrayLists to gather up Name
> and Type objects.
>
> I suspect I need to extend ArrayWritable instead. I'll try that next.
>
> Cheers.
>
> R
>
> On Sep 4, 2011, at 9:37 PM, Sudharsan Sampath wrote:
>
> Hi,
>
> I suspect it
This is true, and it took us by surprise in the recent past. It also had
quite some impact on our job cycles, where the size of the input is totally
random and could also be zero at times.
In one of our cycles, we run a lot of jobs. Say we configure X as the number of
reducers for a job which does not hav
Hi,
It's possible by setting the number of reduce tasks to 1. Based on your
example, it looks like you need to group your records based on "Date, counter1
and counter2", so that should go into the logic of building the key for your
map output.
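A rough sketch of the map side (old mapred API); the tab delimiter and field positions are assumptions about your input layout:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class GroupingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    String[] fields = value.toString().split("\t");
    // Composite key: Date, counter1, counter2 (assumed to be the first
    // three fields); records sharing this key reach the same reduce call.
    String compositeKey = fields[0] + "|" + fields[1] + "|" + fields[2];
    output.collect(new Text(compositeKey), value);
  }
}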
Thanks
Sudhan S
On Wed, Sep 7, 2011 at 3:02 PM, Sahana Bhat
Hi,
Which version of Hadoop are you using? As of v0.21, Hadoop supports splitting
bzip2-compressed files (HADOOP-4012), so you don't even have to read from
beginning to end.
This patch is also available in the CDH3 distribution, which I would recommend,
as 0.21 is not declared suitable for production.
Also the
One way is to reverse the output in the mapper to emit <1,
10050> and, in the reducer, use a TreeSet to order your values, then output
each value from the reducer.
With this, the output will be sorted as per your needs within each reducer. If
you need a totally sorted output, you can use a single reducer or design your part
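A rough sketch of the reducer-side TreeSet idea (old mapred API; the IntWritable types are an assumption):

import java.io.IOException;
import java.util.Iterator;
import java.util.SortedSet;
import java.util.TreeSet;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SortingReducer extends MapReduceBase
    implements Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

  public void reduce(IntWritable key, Iterator<IntWritable> values,
                     OutputCollector<IntWritable, IntWritable> output,
                     Reporter reporter) throws IOException {
    // Copy the primitive out of each Writable: the framework reuses objects.
    SortedSet<Integer> sorted = new TreeSet<Integer>();
    while (values.hasNext()) {
      sorted.add(values.next().get());
    }
    for (Integer v : sorted) {
      output.collect(key, new IntWritable(v));
    }
  }
}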
Hi,
We are looking to upgrade from Avro 1.4.1 to an Avro 1.5.x version. Does anyone
know if this can cause any incompatibility with the Hadoop CDH3 distro?
Thanks
Sudhan S
Hi Henning,
I feel it's the non-daemon thread that's causing the issue. A JVM will not
exit until all its non-daemon threads have finished. Is there a reason why
you want this thread to be non-daemon? If unavoidable, then can you exit
this thread when the reducer's job is completed?
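A minimal illustration of the difference in plain Java; the worker body is a placeholder:

public class DaemonThreadExample {
  public static void main(String[] args) {
    Thread worker = new Thread(new Runnable() {
      public void run() {
        // background work goes here
      }
    });
    // A daemon thread does not keep the JVM alive; a non-daemon thread
    // (the default when created from main) forces the JVM to wait for it.
    worker.setDaemon(true);
    worker.start();
  }
}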
Thanks
Sudhan
eatly appreciated.
Thanks
Sudhan S
On Fri, Nov 4, 2011 at 2:47 PM, Sudharsan Sampath wrote:
> Hi,
>
> I have a simple map-reduce program [map only :)] that reads the input and
> emits the same to n outputs, on a single-node cluster with max map tasks set
> to 10 on a 16-core processor machin
Hi,
Also, please make it a point to use only hostnames in your configuration.
Hadoop works entirely on hostname configurations.
Thanks
Sudhan S
On Fri, Nov 4, 2011 at 9:39 PM, Russell Brown wrote:
> Done so, working, Awesome and many many thanks!
>
> Cheers
>
> Russell
> On 4 Nov 2011, at
Hi,
We read something similar, but we do not use the FileSystem API.

Path[] cacheFiles = DistributedCache.getLocalCacheFiles(jobConf);
if (cacheFiles != null)
{
    for (Path cacheFile : cacheFiles)
    {
        FileInputStream fis = new FileInputStream(cacheFile.toString());
        //L
Hi,
I am really stuck with this issue. If I decrease the number of max map tasks
to something like 4, it runs fine. Does anyone have a clue about the
issue?
Thanks
Sudhan S
-- Forwarded message --
From: Sudharsan Sampath
Date: Fri, Nov 4, 2011 at 5:10 PM
Subject: Re: HDFS error
Hi,
If you mirror the logic of checking the error condition in both the mapper and
the reducer (from the counters), you have a higher probability that the job
will fail as early as possible. The mappers are not guaranteed to get the
last updated value of a counter from all the mappers, and if it slips through
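A complementary way to get a similar fail-fast effect from the driver side is to poll the job's counters; a rough sketch, where the error counter enum and polling interval are hypothetical:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class FailFastDriver {

  // Hypothetical counter incremented by the tasks when they hit bad input
  public enum ErrorCounter { BAD_RECORDS }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(FailFastDriver.class);
    // ... job setup elided ...
    RunningJob running = new JobClient(conf).submitJob(conf);
    while (!running.isComplete()) {
      long bad = running.getCounters().getCounter(ErrorCounter.BAD_RECORDS);
      if (bad > 0) {
        running.killJob();   // stop early instead of waiting for the job to finish
        break;
      }
      Thread.sleep(5000);    // polling interval is arbitrary
    }
  }
}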