Ted and lhztop, here is a gist of my code: http://pastebin.com/mxY4AqBA
Can you suggest a few ways of optimizing it? I know I am re-initializing the
conf object in the map function every time it's called; I'll change that.

Anil Gupta, it's a 6-node cluster, so 6 Region Servers. I am basically trying
to do a partial join across 3 tables, perform some computation on the result,
and dump it into another table. The first table is somewhere around 19M rows,
the 2nd one 1M rows, and the 3rd table 2.5M rows. I know we could use Hive/Pig
for this, but I am required to write this as a MapReduce application.

For the first table, I created a smaller subset of 100,000 rows and ran it.
The result was my first message in this thread: it completed in one hour. For
19M rows, I cannot imagine it finishing in a reasonable time. Please suggest
something.

On Mon, Aug 26, 2013 at 12:03 PM, Pavan Sudheendra <[email protected]> wrote:

> Jens, can I set a smaller value in my application? Is this valid?
>
> conf.setInt("mapred.max.split.size", 50);
>
> This is our mapred-site.xml:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <configuration>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>ip-10-10-100170.eu-east-1.compute.internal:8021</value>
>   </property>
>   <property>
>     <name>mapred.job.tracker.http.address</name>
>     <value>0.0.0.0:50030</value>
>   </property>
>   <property>
>     <name>mapreduce.job.counters.max</name>
>     <value>120</value>
>   </property>
>   <property>
>     <name>mapred.output.compress</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>mapred.output.compression.type</name>
>     <value>BLOCK</value>
>   </property>
>   <property>
>     <name>mapred.output.compression.codec</name>
>     <value>org.apache.hadoop.io.compress.DefaultCodec</value>
>   </property>
>   <property>
>     <name>mapred.map.output.compression.codec</name>
>     <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>   </property>
>   <property>
>     <name>mapred.compress.map.output</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>zlib.compress.level</name>
>     <value>DEFAULT_COMPRESSION</value>
>   </property>
>   <property>
>     <name>io.sort.factor</name>
>     <value>64</value>
>   </property>
>   <property>
>     <name>io.sort.record.percent</name>
>     <value>0.05</value>
>   </property>
>   <property>
>     <name>io.sort.spill.percent</name>
>     <value>0.8</value>
>   </property>
>   <property>
>     <name>mapred.reduce.parallel.copies</name>
>     <value>10</value>
>   </property>
>   <property>
>     <name>mapred.submit.replication</name>
>     <value>2</value>
>   </property>
>   <property>
>     <name>mapred.reduce.tasks</name>
>     <value>6</value>
>   </property>
>   <property>
>     <name>mapred.userlog.retain.hours</name>
>     <value>24</value>
>   </property>
>   <property>
>     <name>io.sort.mb</name>
>     <value>112</value>
>   </property>
>   <property>
>     <name>mapred.child.java.opts</name>
>     <value> -Xmx471075479</value>
>   </property>
>   <property>
>     <name>mapred.job.reuse.jvm.num.tasks</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>mapred.map.tasks.speculative.execution</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>mapred.reduce.tasks.speculative.execution</name>
>     <value>false</value>
>   </property>
>   <property>
>     <name>mapred.reduce.slowstart.completed.maps</name>
>     <value>0.8</value>
>   </property>
> </configuration>
>
> Suggest ways to override the default value, please.
>
> On Mon, Aug 26, 2013 at 9:38 AM, anil gupta <[email protected]> wrote:
>
>> Hi Pavan,
>>
>> Standalone cluster? How many RS are you running? What are you trying to
>> achieve in MR? Have you tried increasing scanner caching?
>> "Slow" is very subjective unless we know some more details of your setup.
>>
>> ~Anil
>>
>> On Sun, Aug 25, 2013 at 5:52 PM, 李洪忠 <[email protected]> wrote:
>>
>>> You need to post your map code here so we can analyze the question.
>>> Generally, when you run MapReduce over HBase, a scanner with filter(s) is
>>> used, so the mapper count is the region count of your HBase table.
>>> As for why your reduce is so slow, my guess is that you have a
>>> disastrous join across the three tables, which produces far too many rows.
>>> On 2013/8/26 4:36, Pavan Sudheendra wrote:
>>>
>>>> Another question: why does it indicate the number of mappers as 1? Can I
>>>> change it so that multiple mappers perform the computation?
>>>
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>
> --
> Regards-
> Pavan

--
Regards-
Pavan
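[Editor's note] The thread's concrete suggestions so far are (a) stop re-creating the conf object and table handles inside map(), and (b) try increasing scanner caching. Below is a minimal sketch of how both typically look in an HBase TableMapper job. This is an illustration under stated assumptions, not Pavan's actual code: the class name JoinJobSketch, the table names first_table/second_table/third_table, and the caching value 500 are all hypothetical, and it assumes the 0.94-era HBase client API (HTable, TableMapReduceUtil) that matches the property names used in this thread.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class JoinJobSketch {

  public static class JoinMapper extends TableMapper<Text, Text> {
    // Open the side tables once per task in setup(), not once per map() call.
    private HTable secondTable;
    private HTable thirdTable;

    @Override
    protected void setup(Context context) throws IOException {
      Configuration conf = context.getConfiguration();
      secondTable = new HTable(conf, "second_table"); // hypothetical name
      thirdTable  = new HTable(conf, "third_table");  // hypothetical name
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      // ... look up the matching rows in secondTable/thirdTable,
      // compute, and emit key/value pairs for the reducer ...
    }

    @Override
    protected void cleanup(Context context) throws IOException {
      secondTable.close();
      thirdTable.close();
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "partial-join");
    job.setJarByClass(JoinJobSketch.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fetch 500 rows per RPC instead of the default
    scan.setCacheBlocks(false);  // don't churn the RS block cache on a full scan

    TableMapReduceUtil.initTableMapperJob(
        "first_table",           // hypothetical name for the 19M-row source table
        scan, JoinMapper.class, Text.class, Text.class, job);
    job.setNumReduceTasks(6);    // one per node, matching mapred-site.xml
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

On the "why only 1 mapper" question: a TableInputFormat job generally gets one map task per region, so a table with a single region yields a single mapper regardless of mapred.max.split.size (that property applies to file-based input formats). Pre-splitting the source table into more regions is the usual way to raise map-side parallelism here.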
