This basically happens while running a MapReduce job. When a MapReduce job
is triggered, the job files are put in HDFS with high replication
(replication is controlled by 'mapred.submit.replication'; the default value
is 10).
The job files are cleaned up after the job is completed and hence that
Hi Henry
You can change the secondary name node storage location by overriding the
property 'fs.checkpoint.dir' in your core-site.xml
On Wed, Apr 17, 2013 at 2:35 PM, Henry Hung ythu...@winbond.com wrote:
Hi All,
What is the property name of Hadoop 1.0.4 to change secondary
Hi Marcos,
You need to consider the slots based on the available memory:
Available Memory = Total RAM - (Memory for OS + Memory for Hadoop daemons
like DN, TT + Memory for other services, if any, running on that node)
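For illustration, with made-up numbers: a node with 48 GB RAM, reserving 4 GB
for the OS and 4 GB for the DN and TT daemons, leaves 40 GB available; at
roughly 2 GB per task JVM that gives about 20 map/reduce slots on that node.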
Now you need to consider the generic MR jobs planned on your cluster. Say
if
Hi Amit
Are you seeing any errors or warnings in the JT logs?
Regards
Bejoy KS
Hi Rahul
If you look at larger clusters and jobs that involve larger input data sets,
the data would be spread across the whole cluster, and a single node might
have various blocks of that entire data set. Imagine you have a cluster
with 100 map slots and your job has 500 map tasks, now in that
+1 for Hadoop Operations
On Tue, Apr 16, 2013 at 3:57 PM, MARCOS MEDRADO RUBINELLI
marc...@buscapecompany.com wrote:
Tadas,
Hadoop Operations has pretty useful, up-to-date information. The chapter
on hardware selection is available here:
more mappers and fewer mapper slots.
Regards,
Rahul
On Tue, Apr 16, 2013 at 2:40 PM, Bejoy Ks bejoy.had...@gmail.com wrote:
Hi Rahul
If you look at larger clusters and jobs that involve larger input data
sets, the data would be spread across the whole cluster
then you need lesser volume of data per
reducer for better performance results.
In general it is better to have the number of reduce tasks slightly less than
the number of available reduce slots in the cluster.
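For example, with hypothetical numbers: if the cluster has 100 reduce slots,
around 95 reduce tasks lets all reducers run in a single wave while leaving a
few slots free for failed or speculative task attempts.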
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
.
You can round this value and use it to set the number of reducers in the conf
programmatically.
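A minimal sketch of that with the new API Job object (the input size and
per-reducer volume here are made up):
// derive the reducer count from the data volume, then round
double inputSizeGb = 500.0;   // hypothetical total input size
double gbPerReducer = 10.0;   // hypothetical target volume per reducer
int numReducers = (int) Math.ceil(inputSizeGb / gbPerReducer);
job.setNumReduceTasks(numReducers);   // same effect as setting mapred.reduce.tasks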
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Manoj Babu manoj...@gmail.com
Date: Wed, 21 Nov 2012 23:28:00
To: user@hadoop.apache.org
Cc: bejoy.had
it.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: jamal sasha jamalsha...@gmail.com
Date: Wed, 21 Nov 2012 14:50:51
To: user@hadoop.apache.org <user@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: fundamental doubt
Hi..
I guess I am asking a lot
Hi Pankaj
AFAIK You can do the same. Just provide the properties like mapper class,
reducer class, input format, output format etc using -D option at run time.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Pankaj Gupta pan...@brightroll.com
Date
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Mark Kerzner mark.kerz...@shmsoft.com
Date: Wed, 14 Nov 2012 17:05:20
To: Hadoop User <user@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: Strange error in Hive
Hi,
I am trying to insert a table in hive
.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Manoj Babu manoj...@gmail.com
Date: Thu, 15 Nov 2012 10:03:24
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: Setting up a edge node to submit jobs
Hi,
How to set up an edge node
code can be
efficient as yours would be very specific to your app, but the MR in hive and pig
may be more generic.
To just write your custom mapreduce functions, basic knowledge of Java is
good. As you are better with Java you can understand the internals better.
Regards
Bejoy KS
Sent from
.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Sigurd Spieckermann sigurd.spieckerm...@gmail.com
Date: Mon, 22 Oct 2012 22:29:15
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: Data locality of map-side join
Hi guys,
I've been trying
was not available in the new mapreduce API at that point.
Now the mapreduce API is pretty good and you can go ahead with that for
development. AFAIK the mapreduce API is the future.
Let's wait for a committer to officially comment on this.
Regards
Bejoy KS
Sent from handheld, please excuse typos
Hi Manoj
You can get the file in a readable format using
hadoop fs -text fileName
Provided you have the lzo codec within the property 'io.compression.codecs' in
core-site.xml
A 'hadoop fs -ls' command would itself display the file size.
Regards
Bejoy KS
Sent from handheld, please excuse typos
Hi Jay
Counters are reported to the JT at the end of a task. So if a task fails, the
counters from that task are not sent to the JT and hence won't be included in the
final value of counters for that job.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Jay
)
Regards
Bejoy KS
--
View this message in context:
http://hadoop-common.472056.n3.nabble.com/Hadoop-installation-on-mac-tp3999520p3999535.html
Sent from the Users mailing list archive at Nabble.com.
Hi Murthy
Hadoop - The definitive Guide by Tom White has the details on file write
anatomy.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: murthy nvvs murthy_n1...@yahoo.com
Date: Wed, 10 Oct 2012 04:27:58
To: user@hadoop.apache.orguser
Hi Nisha
The current stable versions are the 1.0.x releases. These are well suited for
production environments.
The 0.23.x/2.x.x releases are of alpha quality and hence not recommended for
production.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From
tasks when the number of
input splits/map tasks is large, which is quite common.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: centerqi hu cente...@gmail.com
Date: Sun, 7 Oct 2012 23:28:55
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
aggregated sum and count for each key.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: iwannaplay games funnlearnfork...@gmail.com
Date: Fri, 5 Oct 2012 12:32:28
To: user <u...@hbase.apache.org>; u...@hadoop.apache.org;
hdfs-user <hdfs-user@hadoop.apache.org>
Hi Sadak
AFAIK HADOOP_HEAPSIZE determines the JVM size of the daemons like NN, JT, TT, DN,
etc.
mapred.child.java.opts and mapred.child.ulimit are used to set the JVM heap for
the child JVMs launched for each map/reduce task.
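For example, the child task heap can be set at job level (a sketch; the 512m
figure is arbitrary):
conf.set("mapred.child.java.opts", "-Xmx512m");   // heap for each map/reduce child JVM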
Regards
Bejoy KS
Sent from handheld, please excuse typos
be accessible from this client.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Kartashov, Andy andy.kartas...@mpac.ca
Date: Thu, 4 Oct 2012 16:51:35
To: user@hadoop.apache.org <user@hadoop.apache.org>
Reply-To: user@hadoop.apache.org
Subject: RE: copyFromLocal
I
Hi
You need to alter the value of mapred.max.split size to a value larger than
your block size to have fewer map tasks than the default.
On Tue, Oct 2, 2012 at 10:04 PM, Shing Hing Man mat...@yahoo.com wrote:
I am running Hadoop 1.0.3 in Pseudo distributed mode.
When I submit a
Sorry for the typo, the property name is mapred.max.split.size
Also just for changing the number of map tasks you don't need to modify the
hdfs block size.
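Programmatically that could look like the following sketch (the 256 MB figure
is arbitrary):
conf.setLong("mapred.max.split.size", 256L * 1024 * 1024);   // larger splits => fewer map tasks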
On Tue, Oct 2, 2012 at 10:31 PM, Bejoy Ks bejoy.had...@gmail.com wrote:
Hi
You need to alter the value of mapred.max.split size
Hi Shing
Is your input a single file or a set of small files? If the latter, you need to use
CombineFileInputFormat.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Shing Hing Man mat...@yahoo.com
Date: Tue, 2 Oct 2012 10:38:59
To: user
to distributed cache
Sent: Oct 2, 2012 05:44
Hi all
How do you add a small file to the distributed cache in an MR program
Regards
Abhi
Sent from my iPhone
Regards
Bejoy KS
Sent from handheld, please excuse typos.
Hi Anna
If you want to increase the block size of existing files, you can use an
identity mapper with no reducer. Set the min and max split sizes to your
requirement (512 MB). Use SequenceFileInputFormat and SequenceFileOutputFormat
for your job.
Your job should be done.
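A minimal sketch of that job setup with the new API (the job name is made up;
the 512 MB figure follows the thread):
Job job = new Job(conf, "increase-block-size");
job.setMapperClass(Mapper.class);   // the base Mapper class is an identity mapper
job.setNumReduceTasks(0);           // map-only job, no reducer
job.setInputFormatClass(SequenceFileInputFormat.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);
FileInputFormat.setMinInputSplitSize(job, 512L * 1024 * 1024);
FileInputFormat.setMaxInputSplitSize(job, 512L * 1024 * 1024);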
Regards
Bejoy KS
Hi Oliver
I have scribbled a small post on reduce side joins;
the implementation matches your requirement:
http://kickstarthadoop.blogspot.in/2011/09/joins-with-plain-map-reduce.html
Regards
Bejoy KS
Hi Ravi
You can take a look at mockito
http://books.google.co.in/books?id=Nff49D7vnJcC&pg=PA138&lpg=PA138&dq=mockito+%2B+hadoop&source=bl&ots=IifyVu7yXp&sig=Q1LoxqAKO0nqRquus8jOW5CBiWY&hl=en&sa=X&ei=b2pjULHSOIPJrAeGsIHwAg&ved=0CC0Q6AEwAg#v=onepage&q=mockito%20%2B%20hadoop&f=false
On Thu, Sep 27, 2012 at
Hi
If you don't want either key or value in the output, just make the
corresponding data type NullWritable.
Since you just need to filter out a few records/items from your logs, the
reduce phase is not mandatory; a mapper alone would suffice. From
your mapper just output the records
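A sketch of such a map-only filter (the class name and matching condition are
made up):
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LogFilterMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (value.toString().contains("ERROR")) {     // hypothetical filter condition
      context.write(value, NullWritable.get());   // emit the record, suppress the value
    }
  }
}
// and in the driver: job.setNumReduceTasks(0);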
Hi Peter
AFAIK oozie has a mechanism to achieve this. You can trigger your jobs as
soon as the files are written to a certain hdfs directory.
On Tue, Sep 25, 2012 at 10:23 PM, Peter Sheridan
psheri...@millennialmedia.com wrote:
These are log files being deposited by other processes, which
that run DNs. You can verify the current value using
'ulimit -n' and then try increasing the same to a much higher value.
Regards
Bejoy KS
')
LOCATION '/user/myuser/MapReduceOutput/2012/09/11';
Like this you need to register each of the partitions. After this your query
should work as desired.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Nataraj Rashmi - rnatar rashmi.nata...@acxiom.com
Date: Thu
Hi Natraj
Create a partitioned table and add the sub dirs as partitions. You need to have
some logic in place for determining the partitions. Say, if the sub dirs denote
data based on a date, then make date the partition.
Regards
Bejoy KS
Sent from handheld, please excuse typos
Hi Lin
The default value for the number of reducers is 1:
<name>mapred.reduce.tasks</name>
<value>1</value>
It is not determined by data volume. You need to specify the number of
reducers for your mapreduce jobs as per your data volume.
Regards
Bejoy KS
On Tue, Sep 11, 2012 at 4:53 PM, Jason Yang
Hi Lin
The default values for all the properties are in
core-default.xml
hdfs-default.xml and
mapred-default.xml
Regards
Bejoy KS
On Tue, Sep 11, 2012 at 5:06 PM, Jason Yang lin.yang.ja...@gmail.com wrote:
Hi, Bejoy
Thanks for your reply.
where could I find the default value
for that db.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Yaron Gonen yaron.go...@gmail.com
Date: Tue, 11 Sep 2012 15:41:26
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: Some general questions about DBInputFormat
Hi,
After
Hi Yogesh
The detailed steps are available in the hadoop wiki on the FAQ page:
http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F
Regards
Bejoy KS
On Wed, Sep 12, 2012 at 12:14 AM, yogesh dhari yogeshdh
Hi Manoj
From my limited knowledge on file appends in hdfs, I have seen more
recommendations to use sync() in the latest releases than to use append().
Let us wait for some committer to authoritatively comment on 'the production
readiness of append()'. :)
Regards
Bejoy KS
On Mon, Sep 10, 2012
Hi Manoj
You can load daily logs into individual directories in hdfs and process them
daily. Keep those results in hdfs or hbase or dbs etc. Every day, do the
processing, get the results and aggregate them with the previously
aggregated results till date.
Regards
Bejoy KS
Sent from
Hi Prashant
Welcome to Hadoop Community. :)
Hadoop is meant for processing large data volumes. That said, for your
custom requirements you should write your own mapper and reducer that
contain your business logic for processing the input data. Also you can
have a look at hive and pig, which
Hi
You can change the replication factor of an existing directory using
'-setrep'
http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html#setrep
The below command will recursively set the replication factor to 1 for all
files within the given directory '/user':
hadoop fs -setrep -w 1 -R /user
Hi Uddipan
As Harsh mentioned, the replication factor is a client side property. So you
need to update the value of 'dfs.replication' in hdfs-site.xml as per your
requirement on your edge nodes or on the machines you are copying files
to hdfs from. If you are using some of the existing DN's for this
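For instance, from a client program (the factor shown is arbitrary):
Configuration conf = new Configuration();
conf.setInt("dfs.replication", 2);   // applies to files written through this client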
)
You need to have the exact configuration files and hadoop jars from the
cluster machines on this tomcat environment as well. I mean on the classpath of
your application.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Visioner Sadak visioner.sa
Hi Udayani
By default hadoop works well on linux and linux-based OSes. Since you are on
Windows, you need to install and configure ssh using cygwin before you start
the hadoop daemons.
On Tue, Sep 4, 2012 at 6:16 PM, Udayini Pendyala udayini_pendy...@yahoo.com
wrote:
Hi,
Following is a
Hi Francesco
TextInputFormat reads line by line based on '\n' by default, where the key
and value are the position offset and the line contents respectively. But in
your case it is just a sequence of integers, and it is binary. Also you
require the offset for each integer value and not offset by
Hi Abhay
The TaskTrackers on which the reduce tasks are triggered are chosen at
random based on reduce slot availability. So if you don't want the
reduce tasks to be scheduled on some particular nodes, you need to set
'mapred.tasktracker.reduce.tasks.maximum' on those nodes to 0. The
bottleneck
, Bejoy Ks bejoy.had...@gmail.com wrote:
Hi Abhay
The TaskTrackers on which the reduce tasks are triggered are chosen at
random based on reduce slot availability. So if you don't want the
reduce tasks to be scheduled on some particular nodes, you need to set
Hi Gaurav
You can get the information on the number of map tasks in the job from the JT web
UI itself.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Gaurav Dasgupta gdsay...@gmail.com
Date: Wed, 29 Aug 2012 13:14:11
To: user@hadoop.apache.org
Reply
.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Abhay Ratnaparkhi abhay.ratnapar...@gmail.com
Date: Tue, 28 Aug 2012 19:40:58
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: one reducer is hanged in reduce- copy phase
Hello,
I have a MR
Hi Abhay
What is the value of hadoop.tmp.dir or dfs.name.dir? If it is set to /tmp,
the contents would be deleted on an OS restart. You need to change this location
before you start your NN.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Abhay
/joins-with-plain-map-reduce.html
Regards
Bejoy KS
for
number of splits won't hold. If small files are there, then definitely the
number of map tasks will be more.
Also, did you change the split sizes as well along with the block size?
Regards
Bejoy KS
/ -files to distribute jars or files
Regards
Bejoy KS
to the map task
node.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Matthias Kricke matthias.mk.kri...@gmail.com
Sender: matthias.zeng...@gmail.com
Date: Mon, 13 Aug 2012 16:33:06
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: Re
/HBaseIntegration
Regards
Bejoy KS
To understand more details on the working, I have scribbled something
a while back; maybe it can help you start off:
http://kickstarthadoop.blogspot.in/2011/04/word-count-hadoop-map-reduce-example.html
Regards
Bejoy KS
Hi Andy
Is your hadoop.tmp.dir or dfs.name.dir configured to /tmp? If so, it can
happen, as the /tmp dir gets wiped out on OS restarts.
Regards
Bejoy KS
That is a good pointer Harsh.
Thanks a lot.
But if IdentityMapper is being used, shouldn't the job.xml reflect that? Yet
job.xml always shows the mapper as our CustomMapper.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Harsh J ha...@cloudera.com
Date
OK, got it now. That is a good piece of information.
Thank You :)
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Harsh J ha...@cloudera.com
Date: Fri, 3 Aug 2012 16:28:27
To: mapreduce-user@hadoop.apache.org; bejoy.had...@gmail.com
Cc: Mohammad
on that as well.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Mohammad Tariq donta...@gmail.com
Date: Thu, 2 Aug 2012 15:48:42
To: mapreduce-user@hadoop.apache.org
Reply-To: mapreduce-user@hadoop.apache.org
Subject: Re: Reading fields from a Text line
!..
Regards
Bejoy KS
reduce
tasks there is no guarantee that one task will be scheduled on each node.
It can be like 2 on one node and 1 on another.
Regards
Bejoy KS
Hi Nathan
Alternatively you can have a look at Sqoop, which offers efficient data
transfers between rdbms and hdfs.
Regards
Bejoy KS
the framework triggers Identity Mapper instead of the custom
mapper provided with the configuration.
This seems like a bug to me . Filed a jira to track this issue
https://issues.apache.org/jira/browse/MAPREDUCE-4507
Regards
Bejoy KS
//setting the map output data type classes
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
//setting the final reduce output data type classes
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
Regards
Bejoy KS
.
With these two steps you can ensure that a task is attempted only once.
These properties can be set in mapred-site.xml or at job level.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Marco Gallotta ma...@gallotta.co.za
Date: Thu, 2 Aug 2012 16:52:00
To: common
Hi
Why not use 'hadoop fs -getmerge <outputFolderInHdfs> <targetFileNameInLfs>'
while copying files out of hdfs for the end users to consume? This will merge
all the files in 'outputFolderInHdfs' into one file and put it in the lfs.
Regards
Bejoy KS
Sent from handheld, please excuse typos
that runs
mapreduce jobs; for a non-security-enabled cluster it is mapred.
You need to increase this to a large value using
mapred soft nproc 1
mapred hard nproc 1
If you are running on a security enabled cluster, this value should be
raised for the user who submits the job.
Regards
Bejoy KS
Hi Dinesh
Try using $HADOOP_HOME/bin/start-all.sh. It starts all the hadoop
daemons, including the TT and DN.
Regards
Bejoy KS
Hi Keith
Your NameNode is still not up. What do the NN logs say?
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: anil gupta anilgupt...@gmail.com
Date: Fri, 27 Jul 2012 11:30:57
To: common-user@hadoop.apache.org
Reply-To: common-user
Hi Tariq
KeyValueTextInputFormat is available from hadoop version 1.0.1
onwards for the new mapreduce API:
http://hadoop.apache.org/common/docs/r1.0.1/api/org/apache/hadoop/mapreduce/lib/input/KeyValueTextInputFormat.html
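Minimal usage with the new API would be along these lines (a sketch):
job.setInputFormatClass(KeyValueTextInputFormat.class);
// the mapper then receives a Text key and Text value, split at the first tab by default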
Regards
Bejoy KS
On Wed, Jul 25, 2012 at 8:07 PM, Mohammad Tariq donta
Hi Oleg
From the job tracker page, you can get to the failed tasks and see
which file split was processed by each task. The split information
is available under the status column for each task.
The file split information is not available on job history.
Regards
Bejoy KS
On Tue, Jul 24
Hi Jay
Did you try
hadoop job -kill-task <task-id>? And is that not working as desired?
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: jay vyas jayunit...@gmail.com
Date: Fri, 20 Jul 2012 17:17:58
To: common-user@hadoop.apache.orgcommon-user
Hi Yogesh
Is your dfs.name.dir pointing to the /tmp dir? If so, try changing that to any other
dir. The contents of /tmp may get wiped out on OS restarts.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: yogesh.kuma...@wipro.com
Date: Fri, 20 Jul 2012 06
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Yuvrajsinh Chauhan yuvraj.chau...@elitecore.com
Date: Thu, 19 Jul 2012 15:16:24
To: hdfs-user@hadoop.apache.org
Reply-To: hdfs-user@hadoop.apache.org
Subject: RE: Hadoop filesystem directories not visible
Dear
not visible
Thanks Bejoy!!
On Thu, Jul 19, 2012 at 3:22 PM, Bejoy KS bejoy.had...@gmail.com wrote:
Hi Saniya
In hdfs the directory exists only as metadata in the name node. There is no
real hierarchical existence like in a normal file system. It is the data in the
files that is stored as hdfs
Hi Prabhjot
Yes, just use the filesystem commands:
hadoop fs -copyFromLocal <src fs path> <destn hdfs path>
Regards
Bejoy KS
On Thu, Jul 19, 2012 at 3:49 PM, iwannaplay games
funnlearnfork...@gmail.com wrote:
Hi,
I am unable to use sqoop and want to load data into hdfs for testing. Is
there any
.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Robert Dyer psyb...@gmail.com
Date: Thu, 12 Jul 2012 23:03:02
To: mapreduce-user@hadoop.apache.org
Reply-To: mapreduce-user@hadoop.apache.org
Subject: Jobs randomly not starting
I'm using Hadoop 1.0.3
Hi Sanchita
Try your code after commenting out the following line of code:
//conf.setInputFormat(TextInputFormat.class);
AFAIK this explicitly sets the input format to TextInputFormat instead
of MultipleInputs, and hence the error stating 'no
input path specified'.
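A sketch of wiring the two inputs with the old API instead (paths and mapper
classes are made up):
MultipleInputs.addInputPath(conf, new Path("/input/a"), TextInputFormat.class, MapperA.class);
MultipleInputs.addInputPath(conf, new Path("/input/b"), KeyValueTextInputFormat.class, MapperB.class);
// note: no conf.setInputFormat(...) call alongside these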
Regards
Bejoy
Regards
Bejoy KS
Sent from handheld, please excuse typos.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
Hi Anurag,
To add on, you can also change the replication of existing files by
hadoop fs -setrep
http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html#setrep
On Tue, Jun 26, 2012 at 7:42 PM, Bejoy KS bejoy.had...@gmail.com wrote:
Hi Anurag,
The easiest option would be, in your map
Regards
Bejoy KS
Sent from handheld, please excuse typos.
Hi Pedro
In simple terms, the Streaming API is used in hadoop when your mapper or
reducer is in any language other than Java, say Ruby or Python.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Pedro Costa psdc1...@gmail.com
Date: Sat, 16 Jun
very small input size (kB), but the processing to produce some output
takes several minutes. Is there a way to say: the file has 100 lines, I
need 10 mappers, where each mapper node has to process 10 lines of the input
file?
Thanks for advice.
Ondrej Klimpera
Regards
Bejoy KS
Sent from handheld
You can follow the documents for 0.20.x. It is almost the same for 1.0.x as well.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Alpha Bagus Sunggono bagusa...@gmail.com
Date: Thu, 14 Jun 2012 17:15:16
To: common-user@hadoop.apache.org
Reply
? Is it required for the Map Reduce to
execute on the machines which have the data stored (DFS)?
Bejoy: The MR framework takes care of this. Map tasks consider data locality.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Girish Ravi giri...@srmtech.com
Date: Tue
Hi Girish
You can achieve this using reduce side joins. Use MultipleInputs for
parsing the two different sets of log files.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Girish Ravi giri...@srmtech.com
Date: Tue, 12 Jun 2012 12:59:32
To add on, have a look at hive and pig. Those are a perfect fit for similar use
cases.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: Bejoy KS bejoy.had...@gmail.com
Date: Tue, 12 Jun 2012 13:04:33
To: mapreduce-user@hadoop.apache.org
Reply
Hi
If your intention is to control the number of attempts every task makes, then
the property to be tweaked is
mapred.map.max.attempts
The default value is 4; for no map task re-attempts, make it 1.
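At job level that would be the following sketch (the reduce-side property is
added here for completeness):
conf.setInt("mapred.map.max.attempts", 1);      // no re-attempts for map tasks
conf.setInt("mapred.reduce.max.attempts", 1);   // likewise for reduce tasks, if needed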
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From
Hi Subbu,
The file/split processed by a mapper can be obtained from the
WebUI as soon as the job is executed. However, this detail can't be
obtained once the job is moved to JT history.
Regards
Bejoy
On Thu, May 3, 2012 at 6:25 PM, Kasi Subrahmanyam
kasisubbu...@gmail.com wrote:
Hi,
Hi Sumadhur,
The easier approach is to make the hostname of the new NN the same as the old one;
otherwise you'll have to update the new one in the config files across the cluster.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: sumadhur sumadhur_i...@yahoo.com
Date
Hi Mete
A custom Partitioner class can control the flow of keys to the desired reducer.
It gives you more control over which key goes to which reducer.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: mete efk...@gmail.com
Date: Fri, 27 Apr 2012 09:19:21
IdentityReducer is being
triggered.
Regards
Bejoy KS
Sent from handheld, please excuse typos.
-Original Message-
From: kasi subrahmanyam kasisubbu...@gmail.com
Date: Tue, 17 Apr 2012 19:10:33
To: mapreduce-user@hadoop.apache.org
Reply-To: mapreduce-user@hadoop.apache.org
Subject: Re
(theClass)
//set final/reduce output key value types
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
If both map output and reduce output key value types are the same you
just need to specify the final output types.
Regards
Bejoy KS
On Tue, Apr 17