Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
Resending

> Hi,
> I have a few input splits that are a few MB in size.
> I want to submit 1 GB of input to every mapper. Does anyone know how I can do
> it?
> Currently each mapper gets one input split, which results in many small
> map-output files.
>
> I tried setting -Dmapred.map.min.spli

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Juwei Shi
The input split size is determined by mapred.min.split.size, dfs.block.size and mapred.map.tasks:

goalSize = totalSize / mapred.map.tasks
minSize = max(mapred.min.split.size, minSplitSize)
splitSize = max(minSize, min(goalSize, dfs.block.size))

minSplitSize is determined by each InputFormat such as
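The arithmetic above can be checked with a small stand-alone sketch. It re-implements the three formulas in plain Java (no Hadoop classes are used); the 10 × 233 MB input, the 64 MB block size and the value of 2 map tasks are illustrative assumptions, not settings taken from the thread.

```java
// Sketch of the split-size formulas above, using plain longs (bytes).
// No Hadoop classes are involved; all sizes in main() are illustrative.
public class SplitSize {

    /** goalSize = totalSize / mapred.map.tasks (0 tasks treated as 1). */
    static long goalSize(long totalSize, int mapTasks) {
        return totalSize / (mapTasks == 0 ? 1 : mapTasks);
    }

    /** minSize = max(mapred.min.split.size, minSplitSize of the InputFormat). */
    static long minSize(long minSplitSizeConf, long formatMinSplitSize) {
        return Math.max(minSplitSizeConf, formatMinSplitSize);
    }

    /** splitSize = max(minSize, min(goalSize, blockSize)). */
    static long splitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long blockSize = 64 * MB;
        long goal = goalSize(10 * 233 * MB, 2);

        // Minimum of 1 byte: min(goalSize, blockSize) wins, i.e. the block size.
        System.out.println(splitSize(goal, minSize(1, 1), blockSize) / MB);        // 64
        // Minimum of 1 GB: minSize dominates, so the split size becomes 1 GB.
        System.out.println(splitSize(goal, minSize(1024 * MB, 1), blockSize) / MB); // 1024
    }
}
```

Note that raising mapred.min.split.size only makes the computed split size larger; as the later messages in this thread show, that alone does not merge separate files into one split.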

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Juwei Shi
The following applies to Hadoop 0.20.2.

2011/5/25 Juwei Shi
> The input split size is determined by mapred.min.split.size, dfs.block.size and
> mapred.map.tasks.
>
> goalSize = totalSize / mapred.map.tasks
> minSize = max(mapred.min.split.size, minSplitSize)
> splitSize = max(minSize, min(goa

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
Thanks Juwei! I will go through this.

On May 25, 2011, at 7:51 AM, Juwei Shi wrote:
> The following applies to Hadoop 0.20.2.
>
> 2011/5/25 Juwei Shi
> The input split size is determined by mapred.min.split.size, dfs.block.size and
> mapred.map.tasks.
>
> goalSize

Map Reduce when downloading files...

2011-05-25 Thread Michael Giannakopoulos
Hello guys, I have written an application that downloads metadata from 3 Flickr groups, and I implemented a map/reduce job so that the metadata is processed by 3 different mappers (one per group). My app runs in standalone (local) mode, but when I try to run it in pseudo-distributed mode had

Re: Map Reduce when downloading files...

2011-05-25 Thread Harsh J
Based on your stack trace, the task did start correctly (this is after configuration/setup). You're getting an NPE at metaFlickrPro.PhotosDownload$MapClass.map(PhotosDownload.java:124). It's not possible for us to tell why, since the exception is thrown from your custom code, and we do not hav

Re: Map Reduce when downloading files...

2011-05-25 Thread Michael Giannakopoulos
Alright, I'll send you the code (it's an amateur application). Any help is appreciated! (Don't bother with the Flickrj API.) Also, how do you debug a map/reduce app to be sure what is happening? I use Eclipse and Hadoop's plugin for Eclipse (Galileo). Thanks a lot!

MetaFlickrPro

Re: Map Reduce when downloading files...

2011-05-25 Thread Harsh J
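I haven't gone through the whole thing, but getting the Configuration object via a static member "conf" that is set only during submission (in main()) will not work, and is probably why there's an NPE: main() runs in the client JVM, while in pseudo-distributed mode the map tasks run in separate child JVMs where that static field is never set (in standalone mode everything runs in one JVM, which is why it worked there). Use the Context object passed to map() (context.getConfiguration()) to get a configuration instance. That is the only right way I kno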

Re: Map Reduce when downloading files...

2011-05-25 Thread Michael Giannakopoulos
Thanks a lot! I'll try it...

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
I set mapred.min.size=10L, i.e. 1 GB; each input file is 233 MB and the block size is 64 MB. With these values, I thought my split size would take effect and 4 input files would be combined into a 1 GB input split, but this does not happen, and I get 10 mappers, each corresponding to 233
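One way to see why the files are not combined: the old-API FileInputFormat computes splits file by file, so a split never spans two files. Even when the split size works out to 1 GB, a 233 MB file can only yield a 233 MB split. The sketch below is a simplified model, not Hadoop's code: it ignores empty and non-splittable files, and the value of 2 map tasks is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model of per-file splitting in the old-API FileInputFormat.
// Empty and non-splittable files are ignored here.
public class PerFileSplits {
    // The last split of a file may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    static long splitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Returns the split lengths produced for the given file lengths. */
    static List<Long> splits(List<Long> fileLengths, int mapTasks, long minSize, long blockSize) {
        long total = 0;
        for (long len : fileLengths) total += len;
        long goalSize = total / Math.max(mapTasks, 1);
        long splitSize = splitSize(goalSize, minSize, blockSize);

        List<Long> result = new ArrayList<>();
        for (long len : fileLengths) {          // each file is handled on its own
            long remaining = len;
            while ((double) remaining / splitSize > SPLIT_SLOP) {
                result.add(splitSize);
                remaining -= splitSize;
            }
            if (remaining != 0) result.add(remaining); // tail of this file only
        }
        return result;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        List<Long> files = Collections.nCopies(10, 233 * MB);
        // splitSize is 1 GB here, yet every 233 MB file still gets its own split.
        System.out.println(splits(files, 2, 1024 * MB, 64 * MB).size()); // 10
    }
}
```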

How to store an instance of a class in the Configuration?

2011-05-25 Thread Michael Giannakopoulos
Does anyone know how to store and retrieve an instance of a class using the Configuration class?

Re: How to store an instance of a class in the Configuration?

2011-05-25 Thread Harsh J
Configuration is basically serialized to an XML file and shipped to the worker machines on submission of a job. What are you looking to do exactly, and why can't you instantiate the class again in the tasks? On Wed, May 25, 2011 at 11:30 PM, Michael Giannakopoulos wrote: > Does anyone knows how t
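Following that suggestion, a common pattern is to put only plain strings into the configuration and rebuild the object inside the task. The sketch below uses java.util.Properties as a stand-in for Hadoop's Configuration (both are string key/value maps that can be written out as XML); the GroupSpec class and the example.group.* keys are hypothetical. In a real job the equivalents would be conf.set(...) before submission and context.getConfiguration().get(...) in the task.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

// Properties stands in for Configuration; GroupSpec and the
// "example.group.*" keys are hypothetical, for illustration only.
public class ConfRoundTrip {

    /** Hypothetical object the tasks need. */
    static final class GroupSpec {
        final String groupId;
        final int maxPhotos;

        GroupSpec(String groupId, int maxPhotos) {
            this.groupId = groupId;
            this.maxPhotos = maxPhotos;
        }
    }

    // Client side: store only the fields needed to rebuild the object.
    static void store(Properties conf, GroupSpec spec) {
        conf.setProperty("example.group.id", spec.groupId);
        conf.setProperty("example.group.maxPhotos", Integer.toString(spec.maxPhotos));
    }

    // Task side: instantiate the class again from the shipped strings.
    static GroupSpec load(Properties conf) {
        return new GroupSpec(conf.getProperty("example.group.id"),
                Integer.parseInt(conf.getProperty("example.group.maxPhotos")));
    }

    /** Simulates shipping the configuration as XML to another machine. */
    static Properties ship(Properties conf) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        conf.storeToXML(out, "job configuration");
        Properties received = new Properties();
        received.loadFromXML(new ByteArrayInputStream(out.toByteArray()));
        return received;
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        store(conf, new GroupSpec("group-a", 500));
        GroupSpec inTask = load(ship(conf));
        System.out.println(inTask.groupId + " " + inTask.maxPhotos); // group-a 500
    }
}
```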

Re: How to store an instance of a class in the Configuration?

2011-05-25 Thread Michael Giannakopoulos
Yes, that's a good idea! You've got a point!

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
Hi Harsh, I just implemented a CombineFileInputFormat and its record reader for my case. My input has 10 files of 233 MB each, and with this input format my job runs just 1 mapper that processes all of them. How can I control this by split size, i.e. if I make every split 1 GB, so that 3 mappers run for these

Re: How to store an instance of a class in the Configuration?

2011-05-25 Thread Michael Giannakopoulos
Thanks a lot! Your help was invaluable! People like you, who answer everyone's questions, are heroes! Thanks mate! Hope to talk again! :D

Re: How to merge several SequenceFile into one?

2011-05-25 Thread Niels Basjes
Hi,

> There are lots of SequenceFiles in HDFS; how can I merge them into one
> SequenceFile?

The simplest way to do that is to create a job with:
- input format = sequence file
- map = identity mapper
- reduce = identity reducer
- output format = sequence file
- job.setNumReduceTasks(1)

However: I think
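To make the effect of that job concrete, here is a plain-Java model of its output (no Hadoop classes): the identity map passes every record through, and the single reducer receives all keys in sorted order, so the merged file is sorted by key rather than being a concatenation of the inputs. The records are made up for illustration, and Hadoop does not guarantee the order of values within one key; this model merely keeps input order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of an identity-map / identity-reduce job with one reducer.
public class MergeModel {

    static List<Map.Entry<String, String>> merge(List<List<Map.Entry<String, String>>> inputs) {
        List<Map.Entry<String, String>> all = new ArrayList<>();
        for (List<Map.Entry<String, String>> file : inputs) {
            all.addAll(file); // identity map: records pass through unchanged
        }
        // Single reducer: sees every key, in sorted order. List.sort is stable,
        // but Hadoop itself makes no promise about value order within a key.
        all.sort(Map.Entry.<String, String>comparingByKey());
        return all;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, String>>> inputs = List.of(
                List.of(Map.entry("b", "1"), Map.entry("d", "2")),
                List.of(Map.entry("a", "3"), Map.entry("c", "4")));
        for (Map.Entry<String, String> e : merge(inputs)) {
            System.out.println(e.getKey() + "\t" + e.getValue()); // a, b, c, d
        }
    }
}
```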

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
Sorry, it is working; I was not giving the right value with -Dmapred.max.split.size. Thanks for your help!

On Wed, May 25, 2011 at 11:34 AM, Mapred Learn wrote:
> Hi Harsh,
> I just implemented a CombineFileInputFormat and its record reader for my
> case.
>
> Now my input has 10 files each of 233
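For reference, mapred.max.split.size takes a value in bytes (1 GB = 1073741824). The sketch below is a simplified, single-node model of how a CombineFileInputFormat-based format groups blocks: blocks are accumulated into the current split until it reaches the maximum size, and a maximum of 0 means no limit. It ignores node/rack locality and is not Hadoop's implementation, but with the numbers from this thread (10 files of 233 MB, 64 MB blocks) it gives 1 split with no maximum and 3 splits with a 1 GB maximum.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified single-node model of CombineFileInputFormat-style grouping.
// Locality (node/rack grouping) is ignored; maxSplitSize == 0 means no limit.
public class CombineModel {

    /** Cuts each file into blocks of at most blockSize bytes. */
    static List<Long> blocks(List<Long> fileLengths, long blockSize) {
        List<Long> out = new ArrayList<>();
        for (long len : fileLengths) {
            for (long off = 0; off < len; off += blockSize) {
                out.add(Math.min(blockSize, len - off));
            }
        }
        return out;
    }

    /** Returns the size of each combined split. */
    static List<Long> combine(List<Long> blocks, long maxSplitSize) {
        List<Long> splits = new ArrayList<>();
        long current = 0;
        for (long b : blocks) {
            current += b;
            // Close the split once it has reached the maximum size.
            if (maxSplitSize != 0 && current >= maxSplitSize) {
                splits.add(current);
                current = 0;
            }
        }
        if (current > 0) splits.add(current); // leftover blocks form the last split
        return splits;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        List<Long> bl = blocks(Collections.nCopies(10, 233 * MB), 64 * MB);
        System.out.println(combine(bl, 0).size());         // 1: no maximum, one mapper
        System.out.println(combine(bl, 1024 * MB).size()); // 3
    }
}
```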

DBOutputFormat with one reducer

2011-05-25 Thread Giridhar Addepalli
Hi, We have a MapReduce program which writes data to a MySQL database using DBOutputFormat. Our program has one reducer. I understand that all the inserts happen during the close() operation of the reducer. Is it guaranteed that this operation is atomic? I.e., what happens if the writes fail in

Re: DBOutputFormat with one reducer

2011-05-25 Thread Marcos Ortiz
On 05/25/2011 04:27 PM, Giridhar Addepalli wrote:
> Hi, We have a MapReduce program which writes data to a MySQL database using DBOutputFormat. Our program has one reducer. I understand that all the inserts happen during the close() operation of the reducer. Is it guaranteed that this operatio