Hi Owen O'Malley,
Thank you for the instant reply. It's working now. Could you explain in a
little more detail what you mean by the input to the reducer being reused?
On Tue, Mar 20, 2012 at 11:28 AM, Owen O'Malley omal...@apache.org wrote:
On Mon, Mar 19, 2012 at 10:52 PM, madhu phatak phatak@gmail.com
On Mon, Mar 19, 2012 at 11:05 PM, madhu phatak phatak@gmail.com wrote:
Hi Owen O'Malley,
Thank you for the instant reply. It's working now. Could you explain in a
little more detail what you mean by the input to the reducer being reused?
Each time the statement Text value = values.next(); is executed
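[Editor's note: the behavior Owen is describing is the well-known Writable reuse in the reducer: the framework hands back the same Text instance on every call to next() and only overwrites its contents. A minimal sketch of the safe pattern against the old mapred API follows; the class name and the buffering use case are illustrative, not from the thread.]

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Illustrative reducer: the framework reuses one Text object for every value,
// so buffering the reference itself would leave the list full of copies of the
// last value. Copying with new Text(value) avoids that.
public class CollectingReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  @Override
  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    List<Text> buffered = new ArrayList<Text>();
    while (values.hasNext()) {
      Text value = values.next();    // same object each time, new contents
      buffered.add(new Text(value)); // defensive copy before storing it
    }
    for (Text v : buffered) {
      output.collect(key, v);
    }
  }
}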
On 3/19/12 6:39 PM, Mathias Herberts wrote:
does it work under user hdfs?
Yes, running the command as user hdfs works fine.
On Mar 19, 2012 6:32 PM, Olivier Sallou olivier.sal...@irisa.fr wrote:
Hi,
I have installed Hadoop 1.0 using the .deb package.
I tried to configure superuser groups, but it
On 3/19/12 8:00 PM, Harsh J wrote:
The right property for your version of Hadoop is
dfs.permissions.supergroup. Change the property name, restart NN,
and your 'root' user should behave as a superuser afterwards.
It works, thanks!
The wiki is wrong, however; I'm going to file a bug to fix the
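[Editor's note: for reference, a minimal hdfs-site.xml excerpt for the property Harsh named. The group value "root" is only an example matching this thread, and the NameNode must be restarted for it to take effect.]

<!-- hdfs-site.xml (excerpt): the group whose members are HDFS superusers.
     "root" is just the example group from this thread. -->
<property>
  <name>dfs.permissions.supergroup</name>
  <value>root</value>
</property>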
Hi all,
we have a cluster with 32 machines and are running a C# version of the
wordcount program on it.
The Map phase is spread across different machines, but the Reduce phase is
done by only one machine. Our data is around 7 GB of text, and with one
machine handling the Reduce phase the job runs very slowly.
Is there any
Hi Masoud
Set -D mapred.reduce.tasks=n, i.e. set n to any higher value.
Sent from BlackBerry® on Airtel
-Original Message-
From: Masoud mas...@agape.hanyang.ac.kr
Date: Tue, 20 Mar 2012 17:52:58
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: Increasing
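[Editor's note: a hedged sketch of the programmatic equivalent of the -D flag Bejoy mentions, using the old JobConf API; the value 8 and the class name are placeholders, not from the thread.]

import org.apache.hadoop.mapred.JobConf;

public class ReducerCountExample {
  public static void main(String[] args) {
    // Sketch only: same effect as passing "-D mapred.reduce.tasks=8" on the
    // command line. 8 is an arbitrary placeholder; choose it based on cluster
    // capacity.
    JobConf conf = new JobConf();
    conf.setNumReduceTasks(8);
    // ... set mapper/reducer classes and input/output paths, then submit
    //     with JobClient.runJob(conf)
  }
}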
I have a 10 node cluster (around 24 CPUs, 48 GB RAM, 1 TB HDD, 10 Gb
Ethernet connection).
After triggering any MR job, it takes about 3-5 seconds to launch (I mean
the time until I can see any MR job completion % on the screen).
I know that internally it is trying to launch the job, initialize mappers,
Thanks for the reply,
as you know, this way we will have n final results too.
Is there any way to increase the number of Reducers for faster computation
but still have only one final result?
B.S
Masoud
On 03/20/2012 07:02 PM, bejoy.had...@gmail.com wrote:
Hi Masoud
Set -D mapred.reduce.tasks=n, i.e. set n to
Hi,
First, it sounds like you have 2 six-core CPUs, giving you 12 cores, not 24.
Even though the OS reports 24 cores, that's hyper-threading and not real
cores.
This becomes an issue with respect to tuning.
To answer your question ...
You have a single 1TB HD. That's going to be a major
Hi Masoud
One reducer will emit exactly one output file. If you are looking for just
one file as your final result on the local filesystem, then once the MR job is
done use hadoop fs -getmerge .
Sent from BlackBerry® on Airtel
-Original Message-
From: Masoud mas...@agape.hanyang.ac.kr
Thanks Bejoy, that makes sense.
If I want to know each record's original file, I need to
put an extra file id into the mapper's output value and then read it in the
reducer.
Do you have any other ideas?
Thanks!
On Tue, Mar 20, 2012 at 6:09 PM, Bejoy Ks
Yes, if you have more than 2 files to be compared, then the
file name/id is required from the mapper. If it is just two files and you
only want to know which lines are not unique, then just the line number would
be enough, but if you are looking for more granular info, like the exact changes in
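[Editor's note: a common way to tag each record with its source file, sketched against the old mapred API; the class name is illustrative, and reading the split through Reporter only works for file-based input formats.]

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Illustrative mapper: tags every line with the name of the file it came from,
// so the reducer can tell which input file(s) each line appeared in.
public class FileTaggingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  @Override
  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // The current input split tells us which file this record came from.
    String fileName =
        ((FileSplit) reporter.getInputSplit()).getPath().getName();
    // Key = the line itself, value = the originating file's name.
    output.collect(line, new Text(fileName));
  }
}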
Thanks a lot!
On Tue, Mar 20, 2012 at 7:13, Bejoy Ks bejoy.had...@gmail.com wrote:
Yes, if you have more than 2 files to be compared, then the
file name/id is required from the mapper. If it is just two files and you
only want to know which lines are not unique, then just the line
Thanks Chris!
On Tue, Mar 20, 2012 at 6:30 AM, Chris White chriswhite...@gmail.com wrote:
Setting sortComparatorClass will allow you to configure a
RawComparator implementation (allowing you to do more efficient
comparisons at the byte level). If you don't set it then Hadoop uses
the
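[Editor's note: to make the byte-level idea concrete, here is a hedged sketch of a raw comparator for Text keys, modeled on the usual WritableComparator pattern. Text already ships a registered raw comparator, so this is purely illustrative; in the new API it would be registered with job.setSortComparatorClass.]

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.io.WritableUtils;

// Illustrative raw comparator for Text keys: compares the serialized bytes
// directly, so keys never have to be deserialized during the sort.
public class RawTextComparator extends WritableComparator {

  public RawTextComparator() {
    super(Text.class);
  }

  @Override
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    // Text is serialized as a vint length followed by UTF-8 bytes;
    // skip the length prefix and compare the string bytes lexicographically.
    int n1 = WritableUtils.decodeVIntSize(b1[s1]);
    int n2 = WritableUtils.decodeVIntSize(b2[s2]);
    return compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);
  }
}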
Hi, Gayatri
On 03/20/2012 11:59 AM, Gayatri Rao wrote:
Hi all,
I am running a MapReduce job on EC2 instances and it seems to be very
slow. It takes hours for a simple projection and aggregation of the
data.
What filesystem are you using for data storage: HDFS in EC2 or Amazon S3?
Which
Hi All,
We are using DistributedCache.addFileToClassPath to have jars as well as a
property file available in our classpath.
For some reason, the property file cannot be found in our classpath, but
the jars are found.
Is there something specific to the implementation of addFileToClassPath
that
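[Editor's note: for context, a hedged sketch of the usage pattern being described. The HDFS paths and the property file name are made-up placeholders, not the poster's values, and whether a non-jar file is then resolvable by name on the task classpath is exactly the question being asked.]

import java.io.InputStream;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class ClasspathCacheSketch {

  // At job submission: put a jar and a properties file (placeholder HDFS
  // paths) onto the tasks' classpath via the distributed cache.
  public static void addToClassPath(Configuration conf) throws Exception {
    DistributedCache.addFileToClassPath(new Path("/cache/mylib.jar"), conf);
    DistributedCache.addFileToClassPath(new Path("/cache/app.properties"), conf);
  }

  // Inside a task: attempt to load the properties file from the classpath.
  public static Properties loadProps() throws Exception {
    Properties props = new Properties();
    InputStream in = Thread.currentThread().getContextClassLoader()
        .getResourceAsStream("app.properties");
    if (in != null) {
      props.load(in);
      in.close();
    }
    return props;
  }
}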
Unless something has changed recently it won't automatically relocate
the blocks. When I did something similar I had a script that walked
through the whole set of files that were misreplicated and increased
the replication factor then dropped it back down. This triggered
relocation of blocks to
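[Editor's note: a hedged sketch of the kind of script John describes, using the FileSystem API. The path is a placeholder, the walk is not recursive for brevity, and a real script would likely wait for the extra replicas to be created before dropping the factor back down.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: walk a directory, temporarily raise each file's
// replication factor, then lower it again to trigger block re-placement.
public class ReplicationNudge {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path root = new Path(args.length > 0 ? args[0] : "/data"); // placeholder
    for (FileStatus status : fs.listStatus(root)) {
      if (status.isDir()) {
        continue; // a real script would recurse; kept flat for brevity
      }
      short current = status.getReplication();
      fs.setReplication(status.getPath(), (short) (current + 1)); // raise
      fs.setReplication(status.getPath(), current);               // drop back
    }
  }
}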
Thanks for your reply and script. Hopefully it still applies to 0.20.203.
As far as I can tell from playing with the test cluster, the balancer takes
care of replica placement.
I just don't want to fall into the situation where HDFS sits in safemode
for hours and users can't use Hadoop and start yelping.
John has already addressed your concern. I'd only like to add that
fixing of replication violations does not require your NN to be in
safe mode and it won't be. Your worry can hence be voided :)
On Wed, Mar 21, 2012 at 2:08 AM, Patai Sangbutsarakum
patai.sangbutsara...@turn.com wrote:
Thanks for
Thank you all.
On Tue, Mar 20, 2012 at 2:44 PM, Harsh J ha...@cloudera.com wrote:
John has already addressed your concern. I'd only like to add that
fixing of replication violations does not require your NN to be in
safe mode and it won't be. Your worry can hence be voided :)
On Wed, Mar
Hi all, I noticed something strange in my Fair Share Scheduler monitor
GUI: the sum of the Fair Share values is always about 30 even when only one
M/R job is running, so I don't know whether the value is a usage
percentage; if it were a percentage, that would explain why all
You are right, Dieter. The Linux diff regards a file as a list, but I
only want to treat it as a set. Sorry I didn't make that clear at the beginning.
On Tue, Mar 20, 2012 at 7:33 PM, Dieter Plaetinck die...@plaetinck.be
wrote:
the diff command on Linux (i.e. GNU diffutils) is way more involved than
Hi,
When I added a new Hadoop location for the first time, I could not see the
hadoop.job.ugi parameter in the Advanced tab. I still continued to add
the location, and it appeared in the Map/Reduce Locations panel. I
could see the DFS Locations tree in the Project Explorer. When I