Resending:
> Hi,
> I have a few input splits that are only a few MB in size.
> I want to feed about 1 GB of input to every mapper. Does anyone know how I
> can do it?
> Currently each mapper gets one input split, which results in many small
> map-output files.
>
> I tried setting -Dmapred.map.min.split.size ...
The input split size is determined by mapred.min.split.size, dfs.block.size,
and mapred.map.tasks:
goalSize = totalSize / mapred.map.tasks
minSize = max(mapred.min.split.size, minSplitSize)
splitSize = max(minSize, min(goalSize, dfs.block.size))
minSplitSize is determined by each InputFormat (FileInputFormat defaults it
to 1; some formats, e.g. SequenceFileInputFormat, raise it).
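A minimal sketch of how these values combine, mirroring
FileInputFormat.computeSplitSize() in 0.20.x (the example numbers are taken
from later in this thread and are only illustrative):

public class SplitSizeDemo {
    // Mirrors FileInputFormat.computeSplitSize() in Hadoop 0.20.x.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long goalSize = (10 * 233 * mb) / 2; // totalSize / mapred.map.tasks
        long minSize = 1;                    // max(mapred.min.split.size, minSplitSize)
        long blockSize = 64 * mb;            // dfs.block.size
        System.out.println(computeSplitSize(goalSize, minSize, blockSize) / mb + " MB");
        // Prints 64 MB: unless minSize is raised above the block size, each
        // split stays at one block. Note also that FileInputFormat never
        // combines multiple files into a single split.
    }
}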
The following applies to Hadoop 0.20.2.
2011/5/25 Juwei Shi
> The input split size is determined by mapred.min.split.size, dfs.block.size,
> and mapred.map.tasks.
>
> goalSize = totalSize / mapred.map.tasks
> minSize = max(mapred.min.split.size, minSplitSize)
> splitSize = max(minSize, min(goalSize, dfs.block.size))
Thanks, Juwei!
I will go through this.
Sent from my iPhone
On May 25, 2011, at 7:51 AM, Juwei Shi wrote:
> The following applies to Hadoop 0.20.2.
>
> 2011/5/25 Juwei Shi
> The input split size is determined by mapred.min.split.size, dfs.block.size,
> and mapred.map.tasks.
>
> goalSize = totalSize / mapred.map.tasks
Hello guys,
I have written an application that downloads metadata from 3 Flickr groups,
and I implemented a map/reduce task so that the metadata is processed by 3
different mappers (each corresponds to one group...). My app runs in
standalone mode, but when I try to run it in pseudo-distributed mode it
fails with the stacktrace below.
Based on your stacktrace, the 'Task' did begin alright. (This is
post-configuration/setup.)
You're getting an NPE at
metaFlickrPro.PhotosDownload$MapClass.map(PhotosDownload.java:124)
It's not possible for us to tell why, since the point where it was thrown
is in your custom code - and we do not have that code.
Alright, I'll send you the code (it's an amateur application). Any help is
appreciated! (Don't bother with the Flickrj API)... And something else: how
do you debug a map/reduce app so as to be sure what happens? I use Eclipse
and Hadoop's plugin for Eclipse (Galileo). Thanks a lot!
MetaFlickrPro
I haven't gone through the whole thing, but getting the Configuration
object via a static member "conf" set only during submission (main())
will not work - and is probably why there's an NPE.
Use the Context object in the map() call to get a Configuration
instance. That is the only right way I know of.
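A minimal sketch of that pattern (the class, property, and field names here
are made up for illustration):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PhotoMapper extends Mapper<LongWritable, Text, Text, Text> {
    private String apiKey; // hypothetical per-job setting

    @Override
    protected void setup(Context context) {
        // Read the Configuration from the Context handed to the task,
        // never from a static field that was only set in main().
        Configuration conf = context.getConfiguration();
        apiKey = conf.get("flickr.api.key"); // set via conf.set(...) at submit time
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // apiKey is valid here on every worker, because it traveled with
        // the serialized job configuration rather than with the JVM that
        // submitted the job.
        context.write(new Text(apiKey == null ? "missing" : "ok"), value);
    }
}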
Thanks a lot! I'll try it...
I set mapred.min.split.size to 1 GB; each input file is 233 MB
and the block size is 64 MB.
With these values I thought my split size would work out so that 4 input
files would be combined into one 1 GB input split, but somehow this does not
happen and I get 10 mappers, each corresponding to one 233 MB file.
Does anyone know how to save and retrieve an instance of a class
using the Configuration class?
Configuration is basically serialized to an XML file and shipped to
the worker machines on submission of a job. What are you looking to do
exactly, and why can't you instantiate the class again in the tasks?
On Wed, May 25, 2011 at 11:30 PM, Michael Giannakopoulos wrote:
> Does anyone know how to save and retrieve an instance of a class...
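The usual pattern is not to ship the instance itself but the pieces needed
to rebuild it. A sketch, with made-up names ("my.codec.class", MyCodec):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

public class ConfigShipping {
    public interface Codec { void configure(Configuration conf); }

    public static class MyCodec implements Codec {
        public void configure(Configuration conf) { /* read settings here */ }
    }

    public static void main(String[] args) {
        // At submission time: record the class name plus any simple settings.
        Configuration conf = new Configuration();
        conf.setClass("my.codec.class", MyCodec.class, Codec.class);
        conf.set("my.codec.level", "3");

        // In the task (e.g. Mapper.setup()): rebuild an equivalent instance.
        Class<? extends Codec> cls =
            conf.getClass("my.codec.class", MyCodec.class, Codec.class);
        Codec codec = ReflectionUtils.newInstance(cls, conf);
        codec.configure(conf);
    }
}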
Yes, that's a good idea!!! You've got a point!
Hi Harsh,
I just implemented a CombineFileInputFormat and its record reader for my
case.
Now my input has 10 files, each of 233 MB, and by using this my job runs
just 1 mapper that processes all of them.
How can I control it by split size, i.e. if I say make every split 1 GB,
run 3 mappers for these files?
Thanks a lot! Your help was invaluable! Guys like you, who answer
everyone, are heroes! Thanks mate! Hope to talk again! :D
Hi,
> There are lots of SequenceFiles in HDFS; how can I merge them into one
> SequenceFile?
The simplest way to do that is to create a job with:
- input format = sequence file
- map = identity mapper
- reduce = identity reducer
- output format = sequence file
and
job.setNumReduceTasks(1)
However, I think ...
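A minimal sketch of that job, assuming Text keys and values (substitute
your files' actual types) and placeholder paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class MergeSequenceFiles {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "merge-seqfiles");
        job.setJarByClass(MergeSequenceFiles.class);

        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        // No setMapperClass/setReducerClass: the defaults pass records through.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setNumReduceTasks(1); // a single reducer yields a single output file

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}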
Sorry, it is working - I was not giving the right value with
-Dmapred.max.split.size.
Thanks for your help!
On Wed, May 25, 2011 at 11:34 AM, Mapred Learn wrote:
> Hi Harsh,
> I just implemented a CombineFileInputFormat and its record reader for my
> case.
>
> Now my input has 10 files, each of 233 MB...
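In code, the equivalent of -Dmapred.max.split.size=1073741824 looks like
the sketch below (0.20-era property name; later releases renamed it
mapreduce.input.fileinputformat.split.maxsize):

import org.apache.hadoop.conf.Configuration;

public class MaxSplitSizeDemo {
    public static void main(String[] args) {
        // CombineFileInputFormat packs files into splits no larger than this
        // cap, so 10 x 233 MB of input yields roughly three 1 GB splits.
        Configuration conf = new Configuration();
        conf.setLong("mapred.max.split.size", 1024L * 1024L * 1024L);
        System.out.println(conf.get("mapred.max.split.size"));
    }
}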
Hi,
We have a MapReduce program which writes data to a MySQL database using
DBOutputFormat.
Our program has one reducer.
I understand that all the inserts happen during the close() operation of
the reducer.
Is it guaranteed that this operation is atomic? I.e., what happens if
the writes fail partway through?
On 05/25/2011 04:27 PM, Giridhar Addepalli wrote:
> Hi,
> We have a MapReduce program which writes data to a MySQL database using
> DBOutputFormat.
> Our program has one reducer.
> I understand that all the inserts happen during the close() operation
> of the reducer.
> Is it guaranteed that this operation...
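For reference, a sketch of the setup being discussed (driver, URL, table,
and column names are placeholders). DBOutputFormat's RecordWriter batches
the INSERTs and only executes and commits them in close(); the commit is a
single transaction (assuming a transactional storage engine), but re-run or
speculative reducers can still insert duplicates unless the job is made
idempotent:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBOutputFormat;

public class DbSinkJob {
    public static void main(String[] args) {
        JobConf job = new JobConf(DbSinkJob.class);
        DBConfiguration.configureDB(job, "com.mysql.jdbc.Driver",
                "jdbc:mysql://dbhost/mydb", "user", "password");
        // Rows go to my_table(id, name, count); the reducer's output key
        // must implement DBWritable so its write(PreparedStatement) can
        // fill these columns.
        DBOutputFormat.setOutput(job, "my_table", "id", "name", "count");
        job.setNumReduceTasks(1);
        // ... set mapper/reducer/input as usual, then JobClient.runJob(job).
    }
}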