Hi Rahul,
Can you please be more specific? Do you want to control mappers running
simultaneously for your job ( I guess ) or the cluster as a whole?
If for your job, and you want to control it on a per-node basis, one way is to
allocate more memory to each of your mappers so it occupies more than
Hi,
I see you are using the new APIs, so this should be relevant for you
https://issues.apache.org/jira/browse/MAPREDUCE-118
As you have noticed, in the old APIs the JobClient could be queried using the
JobID, which was returned when the job was submitted. There was a thread in
hadoop-dev to
Hi,
I tried testing my odbc build with isql, but I get the following error:
[ISQL]ERROR: Could not SQLAllocEnv
I tried,
dltest /usr/local/lib/libodbchive.so SQLAllocEnv which succeeds, so I guess the
entry point should be found.
Any suggestions anyone?
Amogh
Hi,
Incidentally, I was looking into a similar thing. The Hive server is not
thread-safe; see
https://issues.apache.org/jira/browse/HIVE-187?focusedCommentId=12738494&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12738494
for more.
Amogh
On 6/25/10 7:16 PM, Omer,
Hi,
I'm referring to https://issues.apache.org/jira/browse/HIVE-187 , which has
Linux 32 and 64 bit thrift libs. I noticed that the 64 bit lib doesn't contain
the fb303 module, unlike the 32 bit compilation. I'm trying to build one for my
own use, but if you have it handy it will be of great help to
Do I need to remove and re-create the whole file?
Simply put, as of now, yes. Append functionality, which lets users add to the
end of a file, is being made available though :)
Amogh
On 6/22/10 1:56 PM, elton sky eltonsky9...@gmail.com wrote:
hello everyone,
I noticed there are 6 operations in HDFS:
Since the scale of input data and operations of each reduce task is the same,
what may cause the execution times of reduce tasks to differ?
You should consider looking at the copy, shuffle and reduce times separately
from the JT UI to get better info. Many (dynamic) considerations like network
Hi,
Depending on what hadoop version ( 0.18.3??? ) EC2 uses, you can try one of the
following
1. Compile the streaming jar files with your own custom classes and run on ec2
using this custom jar ( should work for 18.3 . Make sure you pick compatible
streaming classes )
2. Jar up your classes
Hi,
Quick couple of questions,
Is the namenode formatted and the daemon started?
Can you ssh w/o password?
Amogh
On 6/2/10 5:03 PM, Khaled BEN BAHRI khaled.ben_ba...@it-sudparis.eu wrote:
Hi :)
I installed hadoop and I tried to store data in hdfs
but any command I want to execute like fs
warning.
WARN mapred.JobClient: No job jar file set. User classes may not be
found. See JobConf(Class) or JobConf#setJar(String).
So the results were incorrect.
Thanks,
Mo
On Wed, Jun 2, 2010 at 4:56 AM, Amogh Vasekar am...@yahoo-inc.com wrote:
Hi,
Depending on what hadoop version ( 0.18.3
)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Thanks,
Mo
On Wed, Jun 2, 2010 at 8:40 AM, Amogh Vasekar am...@yahoo-inc.com wrote:
Hi,
You might need to add
-Dstream.shipped.hadoopstreaming=path_to_your_custom_streaming_jar
Amogh
On 6/2/10 5:10
Hi,
The default partitioner is hashcode(key) MODULO number_of_reducers, so it's
quite possible.
Can I change this hash function in any way?
Sure, any custom partitioner can be plugged in. Check o.a.h.mapreduce.partition
or the secondary sort example in the mapred tutorial for more.
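As an illustrative sketch (new API; the first-character routing is a made-up
example, not from the tutorial):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch: route keys by their first character instead of hashcode(key).
public class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (key.getLength() == 0) return 0;
    return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
  }
}
// Plug it in with: job.setPartitionerClass(FirstCharPartitioner.class);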
On a side
Hi All,
Is there a documentation I can refer to while attempting to connect MSTR v8.* /
v9 to Hive, probably some FAQs or cookbook or even a blog ;) ?
Any inputs appreciated.
Thanks,
Amogh
Hi,
InetAddress.getLocalHost() should give you the hostname for each mapper/reducer
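For example, a minimal sketch in the new API (assuming you just want to log the
host):

// Inside your Mapper (or Reducer) subclass:
@Override
protected void setup(Context context)
    throws java.io.IOException, InterruptedException {
  String host = java.net.InetAddress.getLocalHost().getHostName();
  System.out.println("Task " + context.getTaskAttemptID() + " runs on " + host);
}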
Amogh
On 5/6/10 8:39 PM, Alan Miller alan.mil...@synopsys.com wrote:
Not sure if this is the right list for this question, but...
Is it possible to determine which host actually processed my MR job?
Regards,
Hi,
Pass the -D property on the command line, e.g.:
hadoop fs -Ddfs.block.size=<multiple of checksum size> ...
You can check whether it's actually set the way you needed with hadoop fs -stat
%o file
HTH,
Amogh
On 4/14/10 9:01 AM, Andrew Nguyen andrew-lists-had...@ucsfcti.org wrote:
I thought I saw a way to
Hi,
The file system object will contain the scheme, authority etc. for the given
uri or path. The conf object acts as a reference ( unable to get a better
terminology ) to this info.
Looking at the MapFileOutputFormat should help provide a better understanding
as to how writers and readers are
Hi,
(#maxMapTasksPerTaskTracker + #maxReduceTasksPerTaskTracker) * JVMHeapSize <=
PhysicalMemoryOnNode
The tasktracker and datanode daemons also take up memory, 1GB each by default I
think. Is that accounted for?
Could there be an issue with HDFS data or metadata taking up memory?
Is the
Hi,
Piggybacking on Gang's reply: to add files / dirs recursively you can use
FileStatus and listStatus to determine if it's a file or a dir and add as
needed ( check the FileStatus API for this ). There is a patch which does this
for FileInputFormat
Hi,
AFAIK, it is a hint. Depending on the block size, minimum split size and this
hint, the exact number of splits is computed. So if you have total_size/hint <
block size but greater than min split size, you should see the exact number.
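In code form, this is roughly what FileInputFormat does (paraphrased from
memory, so treat as a sketch):

// numSplits is the hint passed to getSplits(job, numSplits).
long goalSize  = totalSize / (numSplits == 0 ? 1 : numSplits);
long minSize   = job.getLong("mapred.min.split.size", 1);
long blockSize = file.getBlockSize();
long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));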
This is how I understand it, please let me know if I'm
Hi Gang,
Yes, the time to distribute files is counted as part of the job's running time
( more specifically the set-up time ). The time is essentially for the TT to
copy the files specified in the distributed cache to its local FS, generally
from HDFS unless you have a separate FS for the JT. So in general
Hi,
The mapred.jobtracker.completeuserjobs.maximum property specifies the number
of jobs to be kept on the JT page at any time. After this they are available
under the history page. Probably setting this to 0 will do the trick?
Amogh
On 3/17/10 10:09 PM, Raymond Jennings III
Hi,
http://hadoop.apache.org/common/docs/current/native_libraries.html
Should answer your questions.
Amogh
On 3/18/10 10:48 PM, jiang licht licht_ji...@yahoo.com wrote:
I got the following error when I tried to do gzip compression on map output,
using hadoop-0.20.1.
settings in
Hi,
Not sure if this can be done.
Here's a relevant snippet of code:
{
super(inputCounter, conf, reporter);
combinerClass = cls;
keyClass = (Class<K>) job.getMapOutputKeyClass();
valueClass = (Class<V>) job.getMapOutputValueClass();
comparator = (RawComparator<K>)
map-reduce
completes). Does the name node need to store the metadata of each individual
file during the unpacking for this case?
-Michael
On Feb 25, 2010, at 10:31 PM, Amogh Vasekar wrote:
Hi,
The number of mappers initialized depends largely on your input format ( the
getSplits of your
Hi,
The number of mappers initialized depends largely on your input format ( the
getSplits of your input format) , (almost all) input formats available in
hadoop derive from fileinputformat, hence the 1 mapper per file block notion (
this actually is 1 mapper per split ).
You say that you have
file writing besides
the context.write() for the intermediate records.
Thanks, Tim
Am 24.02.2010 05:28, schrieb Amogh Vasekar:
Hi,
Can you let us know what is the value for :
Map input records
Map spilled records
Map output bytes
Is there any side effect file written?
Thanks,
Amogh
Hi,
Can you let us know what is the value for :
Map input records
Map spilled records
Map output bytes
Is there any side effect file written?
Thanks,
Amogh
On 2/23/10 8:57 PM, Tim Kiefer tim-kie...@gmx.de wrote:
No... 900GB is in the map column. Reduce adds another ~70GB of
FILE_BYTES_WRITTEN
Hi,
Can you please let us know what platform you are running on your hadoop
machines?
For gzip and lzo to work, you need supported hadoop native libraries ( I
remember reading on this somewhere in hadoop wiki :) )
Amogh
On 2/23/10 8:16 AM, jiang licht licht_ji...@yahoo.com wrote:
I have a
--- On Mon, 2/22/10, Amogh Vasekar am...@yahoo-inc.com wrote:
From: Amogh Vasekar am...@yahoo-inc.com
Subject: Re: java.io.IOException: Spill failed when using w/ GzipCodec for Map
output
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Monday, February 22, 2010, 11:27 PM
Hi,
Can
So, considering this situation of loading mixed good and corrupted .gz
files, how to still get expected results?
Try manipulating the value of mapred.max.map.failures.percent to the % of
files you expect to be corrupted / an acceptable data skip percentage.
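For example (old API; 10 is just an illustrative threshold):

// Tolerate up to 10% of map tasks failing without failing the whole job.
jobConf.setMaxMapTaskFailuresPercent(10);
// Equivalent to setting mapred.max.map.failures.percent to 10.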
Amogh
On 2/21/10 7:17 AM, jiang licht
Hi,
The hadoop meet last year had some very interesting business solutions
discussed:
http://www.cloudera.com/company/press-center/hadoop-world-nyc/
Most of the companies in there have shared their methodology on their blogs /
on slideshare.
One I have handy is:
Hi Ankit,
however the issue that I am facing is that I was expecting all the maps to
finish before any reduce starts.
This is exactly how it happens; reducers poll map tasks for data and begin user
code only after all maps complete.
when is closed function called after every map or after all
Hi,
Yes, the same location is populated with different values ( returned by
iter.next() ) for optimization reasons. There is a new patch which will allow
you to mark() and reset() the iterator so that you can buffer required values
( equivalently you can do that yourself, it's anyway in-mem for the
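A sketch of the buffer-it-yourself approach (new API; assumes Text values that
fit in memory):

// Inside reduce(Text key, Iterable<Text> values, Context context):
java.util.List<Text> buffered = new java.util.ArrayList<Text>();
for (Text val : values) {
  buffered.add(new Text(val)); // clone: the framework reuses the val object
}
// buffered now holds stable copies you can iterate over more than once.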
redistribution in this case? If that is the
case, can a custom scheduler be written -- will it be any easy task?
Regards,
Raghava.
On Thu, Feb 4, 2010 at 2:52 AM, Amogh Vasekar am...@yahoo-inc.com wrote:
Hi,
Will there be a re-assignment of Map Reduce nodes by the Master?
In general using available
has to be defined before the job is started, right?
But because I don't know the value of K beforehand,
I want the chain to continue forever until some counter in reduce task is zero.
Felix Halim
On Thu, Feb 4, 2010 at 3:53 PM, Amogh Vasekar am...@yahoo-inc.com wrote:
However, from ri to m(i+1
Hi,
A shot in the dark, is the conf file in your classpath? If yes, are the
parameters you are trying to override marked final?
Amogh
On 2/4/10 3:18 AM, Gang Luo lgpub...@yahoo.com.cn wrote:
Hi,
I am writing script to run whole bunch of jobs automatically. But the
configuration file doesn't
(jobConf); Do something to check
termination condition}
If I write something like that in the code, would not the Map node run on the
same data chunk it has each time? Will there be a re-assignment of Map Reduce
nodes by the Master?
Regards,
Raghava.
On Wed, Feb 3, 2010 at 9:59 AM, Amogh
However, from r(i) to m(i+1) there is an unnecessary barrier. m(i+1) should
not need to wait for all reducers r(i) to finish, right?
Yes, but r(i+1) can't be in the same job, since that requires another sort and
shuffle phase ( barrier ). So you would end up doing job(i) : m(i) -> r(i) ->
m(i+1).
Hi,
For global line numbers, you would need to know the ordering within each split
generated from the input file. The standard input formats provide offsets in
splits, so if the records are of equal length you can compute some kind of
numbering.
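A sketch for the fixed-length case (RECORD_LEN is an assumed constant; with
TextInputFormat the map key is the byte offset of the line within the file):

private static final long RECORD_LEN = 80; // assumed record length, newline included

@Override
public void map(LongWritable key, Text value, Context context)
    throws java.io.IOException, InterruptedException {
  long lineNumber = key.get() / RECORD_LEN; // 0-based line number in this file
  // ... use lineNumber ...
}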
I remember someone had implemented sequential
-program.html.
You particular solution won't work, because I need to do additional processing
between the two passes.
--gordon
On Wed, Nov 25, 2009 at 1:50 AM, Amogh Vasekar am...@yahoo-inc.com wrote:
Amogh
On 1/28/10 4:03 PM, Ravi ravindra.babu.rav...@gmail.com wrote:
Thank you Amogh.
On Thu, Jan 28
Hi Gang,
Yes PathFilters work only on file paths. I meant you can include such type of
logic at split level.
The input format's getSplits() method is responsible for computing and adding
splits to a list container, for which JT initializes mapper tasks. You can
override the getSplits() method
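A minimal sketch of such an override (new API; the predicate wanted() is a
placeholder for your own split-level logic):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class FilteringInputFormat extends TextInputFormat {
  @Override
  public List<InputSplit> getSplits(JobContext job) throws IOException {
    List<InputSplit> kept = new ArrayList<InputSplit>();
    for (InputSplit split : super.getSplits(job)) {
      if (wanted((FileSplit) split)) {
        kept.add(split);
      }
    }
    return kept;
  }
  // Placeholder: e.g. keep only the first split of every file.
  private boolean wanted(FileSplit split) {
    return split.getStart() == 0;
  }
}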
Hi,
In general, a file split may break records; it's the responsibility of the
record reader to present each record as a whole. If you use the standard
available InputFormats, the framework will make sure complete records are
presented as key,value.
Amogh
On 1/29/10 9:04 AM, Udaya Lakshmi
Yes, the parameter is mapred.task.timeout, in milliseconds.
You can also update status / output to stdout every so often to avoid this :)
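For example (old API sketch; workItems and doExpensiveWork() are hypothetical):

// Inside a long-running map(): ping the TT so it doesn't declare the task dead.
for (int i = 0; i < workItems; i++) {
  doExpensiveWork(i);
  if (i % 1000 == 0) {
    reporter.progress();                  // resets the timeout clock
    reporter.setStatus("processed " + i); // optional, visible in the web UI
  }
}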
Amogh
On 1/28/10 10:52 AM, prasenjit mukherjee pmukher...@quattrowireless.com
wrote:
Now I see. The tasks are failing with the following error message :
*Task
this property only on
master's hadoop-site.xml will do or I need to do it on all the slaves as
well ?
Any way I can do this from PIG ( or I guess I am asking too much here :) )
On Thu, Jan 28, 2010 at 10:57 AM, Amogh Vasekar am...@yahoo-inc.com wrote:
Yes, the parameter is mapred.task.timeout, in milliseconds
Hi,
To elaborate a little on Gang's point, the buffer threshold is limited by
io.sort.spill.percent, during which spills are created. If the number of spills
is more than min.num.spills.for.combine, combiner gets invoked on the spills
created before writing to disk.
I'm not sure what exactly
Can I tell hadoop to save the map outputs per reducer to be able to inspect
what's in them
You can set keep.task.files.pattern, which will save mapper output; set this
regex to match your job/task as need be. But this will eat up a lot of local
disk space.
The problem most likely is your data ( or
Hi,
Can you elaborate on your case a little?
If you need sort and shuffle ( ie outputs of different reducer tasks of R1 to
be aggregated in some way ) , you have to write another map-red job. If you
need to process only local reducer data ( ie your reducer output key is same as
input key ),
HDFS.
-Thanks for the pointer.
Prasen
On Tue, Jan 19, 2010 at 10:47 AM, Amogh Vasekar am...@yahoo-inc.com wrote:
Hi,
When NN is in safe mode, you get a read-only view of the hadoop file
system. ( since NN is reconstructing its image of FS )
Use hadoop dfsadmin -safemode get to check
Hi,
Do your steps qualify as separate MR jobs? Then using JobClient APIs should
be more than sufficient for such dependencies.
You can add the whole output directory as input to another one to read all
files, and provide PathFilter to ignore any files you don't want to be
processed, like side
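A sketch of such a filter (assuming the files to skip are the usual
underscore-prefixed side files):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class SkipSideFiles implements PathFilter {
  @Override
  public boolean accept(Path path) {
    String name = path.getName();
    // Skip _logs, _SUCCESS, hidden files; keep everything else.
    return !name.startsWith("_") && !name.startsWith(".");
  }
}
// Register with: FileInputFormat.setInputPathFilter(job, SkipSideFiles.class);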
Hi,
When NN is in safe mode, you get a read-only view of the hadoop file system. (
since NN is reconstructing its image of FS )
Use hadoop dfsadmin -safemode get to check if in safe mode.
hadoop dfsadmin -safemode leave to leave safe mode forcefully. Or use hadoop
dfsadmin -safemode wait to
and the new APIs. I was digging for that answer for awhile. Thanks.
--- On Tue, 1/12/10, Amogh Vasekar am...@yahoo-inc.com wrote:
From: Amogh Vasekar am...@yahoo-inc.com
Subject: Re: Is it possible to share a key across maps?
To: common-user@hadoop.apache.org common-user@hadoop.apache.org,
raymondj
Hi,
I ran into a very similar situation quite some time back and had then
encountered this : http://issues.apache.org/jira/browse/HADOOP-475
After speaking to a few Hadoop folks, they had said complete cloning was not a
straightforward option for some optimization reasons.
There were a few
Hi,
Can you please let us know your system configuration running hadoop?
The error you see is when the reducer is copying its respective map output into
memory. The parameter mapred.job.shuffle.input.buffer.percent can be
manipulated for this ( a bunch of others will also help you optimize sort
Hi,
I believe you need to add the partition file to distributed cache so that all
tasks have it.
The terasort code uses this sampler, you can refer to that if needed.
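Roughly, from memory (old API; the /tmp path and sampler parameters are
illustrative, not taken from terasort):

// Sample the input, write the partition file, and ship it to every task.
InputSampler.Sampler<Text, Text> sampler =
    new InputSampler.RandomSampler<Text, Text>(0.01, 1000, 10);
TotalOrderPartitioner.setPartitionFile(jobConf, new Path("/tmp/_partitions"));
InputSampler.writePartitionFile(jobConf, sampler);
DistributedCache.addCacheFile(new URI("/tmp/_partitions#_partitions"), jobConf);
DistributedCache.createSymlink(jobConf);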
Amogh
On 12/15/09 5:06 PM, afarsek adji...@gmail.com wrote:
Hi,
I'm using the InputSampler.RandomSampler to perform a
.
I further assume I need only apply the latest patch, which is 5.
Am I correct?
On Wed, Dec 9, 2009 at 7:30 AM, Amogh Vasekar am...@yahoo-inc.com wrote:
http://issues.apache.org/jira/browse/MAPREDUCE-370
You'll have to work around for now / try to apply patch.
Amogh
On 12/9/09 8:54 PM
#.
I didn't use SkipBadRecords class. I think by default the feature is disabled.
So, it should have nothing to do with this.
I do my tests using tables from TPC-DS. If I run my job on some 'toy tables' I
make, the statistics are correct.
-Gang
----- Original Message -----
From: Amogh Vasekar am...@yahoo
Hi,
The counters are updated as the records are *consumed*, for both mapper and
reducer. Can you confirm if all the values returned by your iterators are
consumed on reduce side? Also, are you having feature of skipping bad records
switched on?
Amogh
On 12/11/09 4:32 AM, Gang Luo
http://issues.apache.org/jira/browse/MAPREDUCE-370
You'll have to work around for now / try to apply patch.
Amogh
On 12/9/09 8:54 PM, Geoffry Roberts geoffry.robe...@gmail.com wrote:
Aaron,
I am using 0.20.1 and I'm not finding org.apache.hadoop.mapreduce.
lib.output.MultipleOutputs. I'm
Hi,
If the file doesn't exist, Java will error out.
For partial skips, the o.a.h.mapreduce.Mapper class provides a method run(),
which determines if the end of the split is reached and if not, calls map() on
your k,v pair. You may override this method to include flag checks too and if
that fails, the
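A sketch of that override (new API; stopRequested() stands in for whatever your
flag check is):

@Override
public void run(Context context)
    throws java.io.IOException, InterruptedException {
  setup(context);
  // Same loop as the stock Mapper.run(), plus an early-exit flag check.
  while (context.nextKeyValue() && !stopRequested()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context);
}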
Hi,
Please try removing the combiner and running.
I know that if you use multiple outputs from within a mapper, those k,v pairs
are not a part of the sort and shuffle phase. Your combiner is the same as your
reducer, which uses mos, and that might be an issue on the map side. If I'm to
take a guess, mos writes to a
Hi,
What are your intermediate output K,V class formats? The “Text” format is
inherently UTF-8 encoded. If you want end-to-end processing to be via GBK
encoding, you may have to write a custom writable type.
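A bare-bones sketch of such a type (raw GBK bytes on the wire; comparator and
error handling elided):

import java.io.*;
import org.apache.hadoop.io.Writable;

public class GbkWritable implements Writable {
  private byte[] bytes = new byte[0];

  public void set(String s) throws UnsupportedEncodingException {
    bytes = s.getBytes("GBK"); // store GBK bytes, not UTF-8
  }
  public void write(DataOutput out) throws IOException {
    out.writeInt(bytes.length);
    out.write(bytes);
  }
  public void readFields(DataInput in) throws IOException {
    bytes = new byte[in.readInt()];
    in.readFully(bytes);
  }
  @Override
  public String toString() {
    try { return new String(bytes, "GBK"); }
    catch (UnsupportedEncodingException e) { throw new RuntimeException(e); }
  }
}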
Amogh
On 11/30/09 7:09 PM, 郭鹏 gpcus...@gmail.com wrote:
I know the default output coder
Hi,
Task slots reuse the JVM over the course of an entire job, right? Specifically,
I would like to point to:
http://issues.apache.org/jira/browse/MAPREDUCE-453?focusedCommentId=12619492&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12619492
Thanks,
Amogh
On 11/30/09 5:44
();
System.out.println("mapred.input.file=" + cfg.get("mapred.input.file"));
displays null, so maybe this fell out by mistake in the API change?
Regards
Saptarshi
On Thu, Nov 26, 2009 at 2:13 AM, Saptarshi Guha
saptarshi.g...@gmail.com wrote:
Thank you.
Regards
Saptarshi
On Thu, Nov 26, 2009 at 2:10 AM, Amogh
Hi,
I'm not sure if this will apply to your case since I'm not aware of the common
part of job2:mapper and job3:mapper, but would like to give it a shot.
The whole process can be combined into a single mapred job. The mapper will
read a record and process till the saved data part , then for each
Hi,
For near real time performance you may try Hbase. I had read about Streamy
doing this, and their hadoop-world-nyc ppt is available on their blog:
http://devblog.streamy.com/2009/07/24/streamy-hadoop-summit-hbase-goes-realtime/
Amogh
On 11/25/09 1:31 AM, onur ascigil
Hi,
keep.task.files.pattern is what you need; as the name suggests, it's a pattern
match on intermediate outputs generated.
Wrt copying map data to hdfs, your mapper's close() method should help you
achieve this, but might slow up your tasks.
Amogh
On 11/23/09 8:08 AM, Jeff Zhang
Hi,
This is the time for all three phases of the reducer, right?
I think it's due to the constant spilling for a single key to disk, since the
map partitions couldn't be held in-mem due to the buffer limit. Did the other
reducer have numerous keys with low numbers of values ( ie smaller partitions )?
MultipleOutputFormat and MOS are to be merged :
http://issues.apache.org/jira/browse/MAPREDUCE-370
Amogh
On 11/18/09 12:03 PM, Y G gymi...@gmail.com wrote:
in the old MR API, there is a MultipleOutputFormat class which I can use to
customize the reduce output file name.
it's very useful for me.
but I
I would like the connection management to live separately
from the mapper instances per node.
The JVM reuse option in Hadoop might be helpful for you in this case.
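A sketch of the pattern (mapred.job.reuse.jvm.num.tasks = -1 means unlimited
reuse; Connection and openConnection() are hypothetical stand-ins for your
connection-management code):

// Driver side: let one JVM run many tasks of this job.
conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);

// Mapper side: a static handle survives across tasks in the reused JVM.
private static Connection conn;

@Override
protected void setup(Context context) {
  if (conn == null) {
    conn = openConnection(); // created once per JVM, shared by later tasks
  }
}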
Amogh
On 11/16/09 6:22 AM, yz5od2 woods5242-outdo...@yahoo.com wrote:
Hi,
a) I have a Mapper ONLY job, the job reads in records,
Hi Mark,
A future release of Hadoop will have a MultipleInputs class, akin to
MultipleOutputs. This would allow you to have a different inputformat, mapper
depending on the path you are getting the split from. It uses special
Delegating[mapper/input] classes to resolve this. I understand
Hi,
Quick questions...
Are you creating too many small files?
Are there any task side files being created?
Does the heap for the NN have enough space for its metadata? Any details on its
general health will probably be helpful to people on the list.
Amogh
On 11/2/09 2:02 PM, Zhang Bingjun (Eddy)
Mark,
Set-up for a mapred job consumes a considerable amount of time and resources,
so if possible a single job is preferred.
You can add multiple paths to your job, and if you need different processing
logic depending on the input being consumed, you can use the parameter
map.input.file in
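For example (old API sketch; the /logs/ check is illustrative):

// Old-API mapper: choose processing logic per input file.
private boolean isLogFile;

public void configure(JobConf job) {
  String inputFile = job.get("map.input.file"); // path of this split's file
  isLogFile = inputFile != null && inputFile.contains("/logs/");
}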
Hi,
Rebalancer should help you : http://issues.apache.org/jira/browse/HADOOP-1652
Amogh
On 10/28/09 2:54 PM, Vibhooti Verma verma.vibho...@gmail.com wrote:
Hi All,
We are facing an issue with distribution of data in a cluster where nodes
have different storage capacities.
We have 4 nodes with
Hi Bhushan,
If splitting input files is an option, why don't you let hadoop do this for
you? If need be you may use a custom input format and
SequenceFile*OutputFormat.
Amogh
On 10/27/09 7:55 PM, bhushan_mahale bhushan_mah...@persistent.co.in wrote:
Hi Jason,
Thanks for the reply.
The string
Hi,
Many options are available here. You can use jobconf (0.18) / context.conf
(0.20) to pass these lines across all tasks ( assuming the size isn't
relatively large ) and use configure / setup to retrieve them. Or use the
distributed cache to read a file containing these lines ( possibly with jvm
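A sketch of the distributed-cache route (old API; the file name is
illustrative):

// Driver side: ship the file.
DistributedCache.addCacheFile(new URI("/user/me/lines.txt"), jobConf);

// Task side, in configure()/setup(): read it back from the local FS.
Path[] cached = DistributedCache.getLocalCacheFiles(conf);
BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
String line;
while ((line = reader.readLine()) != null) {
  // stash each line for use in map() / reduce()
}
reader.close();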
For skipping failed tasks try : mapred.max.map.failures.percent
Amogh
On 10/21/09 8:58 AM, 梁景明 futur...@gmail.com wrote:
hi, I use hadoop 0.20 and 8 nodes. There is a job that has 130 maps to run;
128 maps completed but 2 maps failed. Their failure is acceptable in my case,
but the job fails
Yahoo! had an Everest MPP framework based on columnar storage; I don't know
how popular it was, but it required pretty high-end machines. Zebra, I guess,
partially aims at getting that into Hadoop using the t-file implementation,
and its source is available in contrib.
Amogh
On 10/19/09 10:18 AM,
Hi,
Check the distributed cache APIs, it provides various functionalities to
distribute and add jars to classpath on compute machines.
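For example (sketch; the jar path is illustrative and must already be on HDFS):

// Driver side: tasks will get this jar on their classpath.
DistributedCache.addFileToClassPath(new Path("/user/me/lib/mylib.jar"), conf);
// Archives work too:
// DistributedCache.addArchiveToClassPath(new Path("/user/me/lib/deps.zip"), conf);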
Amogh
On 10/19/09 3:38 AM, yz5od2 woods5242-outdo...@yahoo.com wrote:
Hi,
What is the preferred method to distribute the classes (in various
Jars) to my
Hi,
It would be more helpful if you provide the exact error here.
Also, hadoop uses the local FS to store intermediate data, along with HDFS for
final output.
If your job is memory intensive, try limiting the number of tasks you are
running in parallel on a machine.
Amogh
On 10/19/09 8:27 AM,
Nguyen Dinh munt...@gmail.com wrote:
Thanks Amogh. For my application, I want each map task reports to me
where it's running. However, I have no idea how to use Java
Inetaddress APIs to get that info. Could you explain more?
Van
On Wed, Oct 14, 2009 at 2:16 PM, Amogh Vasekar am...@yahoo-inc.com
Hi,
I guess configure is now setup(), and using toolrunner you can create a
configuration / context to mimic the required behavior.
Thanks,
Amogh
-Original Message-
From: Amandeep Khurana [mailto:ama...@gmail.com]
Sent: Tuesday, October 06, 2009 5:43 AM
To: common-user@hadoop.apache.org
Hi Huang,
Haven't worked with Hbase, but in general:
If you want to have control over what data split goes as a whole to a mapper,
the easiest way is to compress that split into a single file, making as many
split files as needed. If you need to know what file is currently being
processed, you can use
Along with the partitioner, try to plug in a combiner. It would provide
significant performance gains. Not sure about the algo you use, but you might
have to tweak it a little to facilitate a combiner.
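For example, when the reduce is associative and commutative (like a sum), the
reducer itself can serve as the combiner (sketch; MySumReducer is a placeholder
for your existing reducer class):

job.setCombinerClass(MySumReducer.class);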
Thanks,
Amogh
-Original Message-
From: Chandraprakash Bhagtani
I believe the framework checks timestamps on HDFS to mark an already available
copy of the file as valid or invalid, since the archived files are not cleaned
up till a certain du limit is reached, and no APIs for cleanup are available.
There was a thread on this some time back on the list.
Amogh
Hi,
Please check the namenode heap usage. Your cluster may have too many files to
handle / too little free space. It is generally available in the UI. This is
one of the causes I have seen for the Timeout.
Amogh
-Original Message-
From: Kunsheng Chen [mailto:ke...@yahoo.com]
Sent:
Hi All,
Regarding the JVM reuse feature incorporated, it says reuse is generally
recommended for streaming and pipes jobs. I'm a little unclear on this and any
pointers will be appreciated.
Also, in what scenarios will this feature be helpful for java mapred jobs?
Thanks,
Amogh
Hi,
Funny enough was looking at it just yesterday.
http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Task+JVM+Reuse
Thanks,
Amogh
-Original Message-
From: Zhimin [mailto:wan...@cs.umb.edu]
Sent: Tuesday, September 15, 2009 10:53 PM
To: core-u...@hadoop.apache.org
: DistributedCache purgeCache()
Thanks for your swift response.
But where can I find deletecache()?
Thanks.
-Original Message-
From: Amogh Vasekar [mailto:am...@yahoo-inc.com]
Sent: Thu 9/3/2009 2:44 PM
To: common-user@hadoop.apache.org
Subject: RE: DistributedCache purgeCache()
AFAIK
Before setting the task limits, do take into account the memory considerations
( many archive posts on this can be found ).
Also, your tasktracker and datanode daemons will run on that machine as well,
so you might want to set aside some processing power for that.
Cheers!
Amogh
-Original
Have a look at jobclient, it should suffice.
Cheers!
Amogh
-Original Message-
From: bharath vissapragada [mailto:bharathvissapragada1...@gmail.com]
Sent: Friday, September 04, 2009 9:15 PM
To: common-user@hadoop.apache.org
Subject: Re: Some issues!
Hey ,
I have one more doubt ,
Hi,
Mapper is used to process the K,V pair passed to it; MapRunnable is an
interface which, when implemented, is responsible for generating a conforming
K,V pair and passing it to the Mapper.
Cheers!
Amogh
-Original Message-
From: Rakhi Khatwani [mailto:rkhatw...@gmail.com]
Sent: Thursday, August
boxes.
Do you have any suggestion? I am thinking about JVM re-use feature of Hadoop or
I can set up a chain of two map-reduce pairs.
Best regards.
Fang.
On Mon, Aug 24, 2009 at 1:25 PM, Amogh Vasekar
am...@yahoo-inc.commailto:am...@yahoo-inc.com wrote:
No, but if you want a reducer like
Hadoop will make sure that every k,v pair with the same key will land up in
the same reducer and be consumed in a single reduce instance.
-Original Message-
From: Nipun Saggar [mailto:nipun.sag...@gmail.com]
Sent: Tuesday, August 25, 2009 10:41 AM
To: common-user@hadoop.apache.org
Subject: Re:
I'm not sure that is the case with Hadoop. I think it assigns a reduce task to
an available tasktracker at any instant, since a reducer polls the JT for
completed maps. And if it were as you said, a reducer wouldn't be initialized
until all maps had completed, after which the copy phase would
PM
To: common-user@hadoop.apache.org
Subject: Re: MR job scheduler
Amogh
I think the Reduce phase starts only when all the map phases are completed,
because it needs all the values corresponding to a particular key!
2009/8/21 Amogh Vasekar am...@yahoo-inc.com
I'm not sure that is the case
across the network ( because many values for that key are already on the
machine where the map phase completed ).
2009/8/21 Amogh Vasekar am...@yahoo-inc.com
Yes, but the copy phase starts with the initialization of a reducer, after
which it keeps polling for completed map tasks to fetch
Hi,
GenericOptionsParser is customized only for Hadoop-specific params:
* <code>GenericOptionsParser</code> recognizes several standard command
* line arguments, enabling applications to easily specify a namenode, a
* jobtracker, additional configuration resources etc.
Ideally, all params
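The usual pattern to get those parsed is ToolRunner (sketch; MyJob is a
placeholder class name):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    // getConf() already has -D key=value, -files, -libjars etc. applied;
    // args holds only what GenericOptionsParser did not consume.
    return 0;
  }
  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyJob(), args));
  }
}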
While setting mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum, please consider the memory usage your
application might have since all tasks will be competing for the same and might
reduce overall performance.
Thanks,
Amogh
-Original Message-
From:
10 mins reminds me of the parameter mapred.task.timeout. This is configurable.
Alternatively you might just do a sysout to let the tracker know of the task's
existence ( not an ideal solution though )
Thanks,
Amogh
-Original Message-
From: Mathias De Maré [mailto:mathias.dem...@gmail.com]
Sent:
Maybe I'm missing the point, but in terms of execution performance benefit,
what does copying to dfs and then compressing to be fed to a map/reduce job
provide? Isn't it better to compress offline / outside the latency window and
make the data available on dfs?
Also, your mapreduce program will launch one