I am using Hive 0.8.1 in a Hadoop job on Amazon EMR and have run into some problems with mapjoin:

1) The map join exceeds memory and fails with the error shown below. I then removed mapjoin from the query and set hive.auto.convert.join=true instead, thinking that Hive would decide when a map join is suitable. The job ran much further, but then hit a similar error towards the end.

2) I then tried the original query with mapjoin again and set hive.mapjoin.localtask.max.memory.usage=3, and got exactly the same error. Are there any other settings I can use to increase the mapjoin memory or hashtable size? Or are there any better options?

    2013-05-25 11:37:39 Starting to launch local task to process map join; maximum memory = 932118528
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/.versions/hive-0.8.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    2013-05-25 11:37:57 Processing rows: 200000 Hashtable size: 199999 Memory usage: 776687416 rate: 0.833
    2013-05-25 11:38:00 Processing rows: 215031 Hashtable size: 215031 Memory usage: 813018320 rate: 0.872
    2013-05-25 11:38:00 Dump the hashtable into file: file:/tmp/hadoop/hive_2013-05-25_23-37-37_320_2027014861824847272/-local-10006/HashTable-Stage-6/MapJoin-bu-21--.hashtable
    Execution failed with exit status: 2
    Obtaining error information
    Task failed!
    Task ID: Stage-10
    Logs:

3) Please look at the four example errors below. My other question is about a pattern across all the runs that hit this mapjoin error. The map join is done on bu, which is produced by a job that runs before the error, and every failure happens when the number of rows processed is exactly 4 short of the number of rows loaded into bu. This seems like too much of a coincidence. Can someone please offer some insight?

#1:

    215035 Rows loaded to hdfs://10.190.182.26:9000/mnt/var/lib/hive_081/tmp/scratch/hive_2013-05-25_23-35-49_100_9059150281675034748/-ext-10000
    MapReduce Jobs Launched:
    Job 0: Map: 18 Reduce: 13 Accumulative CPU: 139.31 sec HDFS Read: 1226414348 HDFS Write: 2179 SUCCESS
    Job 1: Map: 9 Accumulative CPU: 54.1 sec HDFS Read: 687306237 HDFS Write: 695722636 SUCCESS
    Job 2: Map: 16 Accumulative CPU: 89.09 sec HDFS Read: 695838641 HDFS Write: 703096594 SUCCESS
    Total MapReduce CPU Time Spent: 4 minutes 42 seconds 500 msec
    OK
    Time taken: 108.206 seconds
    OK
    Time taken: 0.013 seconds
    Total MapReduce jobs = 3
    Execution log at: /tmp/hadoop/hadoop_20130525233737_8911fdca-6536-45bb-aac3-19b92b3de99c.log
    2013-05-25 11:37:39 Starting to launch local task to process map join; maximum memory = 932118528
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/.versions/hive-0.8.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    2013-05-25 11:37:57 Processing rows: 200000 Hashtable size: 199999 Memory usage: 776687416 rate: 0.833
    2013-05-25 11:38:00 Processing rows: 215031 Hashtable size: 215031 Memory usage: 813018320 rate: 0.872
    2013-05-25 11:38:00 Dump the hashtable into file: file:/tmp/hadoop/hive_2013-05-25_23-37-37_320_2027014861824847272/-local-10006/HashTable-Stage-6/MapJoin-bu-21--.hashtable
    Execution failed with exit status: 2
    Obtaining error information
    Task failed!
    Task ID: Stage-10
    Logs:

#2:

    Table default.badurls stats: [num_partitions: 0, num_files: 18, num_rows: 0, total_size: 701922144, raw_data_size: 0]
    214618 Rows loaded to hdfs://10.46.205.55:9000/mnt/var/lib/hive_081/tmp/scratch/hive_2013-05-25_23-12-54_513_8781300101638774300/-ext-10000
    MapReduce Jobs Launched:
    Job 0: Map: 21 Reduce: 13 Accumulative CPU: 142.11 sec HDFS Read: 1225164183 HDFS Write: 2179 SUCCESS
    Job 1: Map: 9 Accumulative CPU: 53.25 sec HDFS Read: 686157231 HDFS Write: 694562725 SUCCESS
    Job 2: Map: 18 Accumulative CPU: 92.26 sec HDFS Read: 694650326 HDFS Write: 701922144 SUCCESS
    Total MapReduce CPU Time Spent: 4 minutes 47 seconds 620 msec
    OK
    Time taken: 104.744 seconds
    OK
    Time taken: 0.013 seconds
    Total MapReduce jobs = 3
    Execution log at: /tmp/hadoop/hadoop_20130525231414_3cc50fdd-7e7a-4bcf-baab-b465804c6e49.log
    2013-05-25 11:14:41 Starting to launch local task to process map join; maximum memory = 932118528
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/.versions/hive-0.8.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    2013-05-25 11:15:04 Processing rows: 200000 Hashtable size: 199999 Memory usage: 776903128 rate: 0.833
    2013-05-25 11:15:09 Processing rows: 214614 Hashtable size: 214614 Memory usage: 811592328 rate: 0.871
    2013-05-25 11:15:09 Dump the hashtable into file: file:/tmp/hadoop/hive_2013-05-25_23-14-39_270_9001068894438685980/-local-10006/HashTable-Stage-6/MapJoin-bu-21--.hashtable
    Execution failed with exit status: 2
    Obtaining error information
    Task failed!
    Task ID: Stage-10
    Logs:

#3:

    212590 Rows loaded to hdfs://10.96.162.172:9000/mnt/var/lib/hive_081/tmp/scratch/hive_2013-05-25_22-48-32_821_7905326830391538266/-ext-10000
    MapReduce Jobs Launched:
    Job 0: Map: 19 Reduce: 13 Accumulative CPU: 139.04 sec HDFS Read: 1224727949 HDFS Write: 2179 SUCCESS
    Job 1: Map: 9 Accumulative CPU: 53.03 sec HDFS Read: 685799930 HDFS Write: 694158841 SUCCESS
    Job 2: Map: 17 Accumulative CPU: 90.31 sec HDFS Read: 694245117 HDFS Write: 701443582 SUCCESS
    Total MapReduce CPU Time Spent: 4 minutes 42 seconds 380 msec
    OK
    Time taken: 105.029 seconds
    OK
    Time taken: 0.012 seconds
    Total MapReduce jobs = 3
    Execution log at: /tmp/hadoop/hadoop_20130525225050_e3192035-d205-4ba3-9ac1-442c072d725d.log
    2013-05-25 10:50:20 Starting to launch local task to process map join; maximum memory = 932118528
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/.versions/hive-0.8.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    2013-05-25 10:50:47 Processing rows: 200000 Hashtable size: 199999 Memory usage: 775795904 rate: 0.832
    2013-05-25 10:50:50 Processing rows: 212586 Hashtable size: 212586 Memory usage: 813714632 rate: 0.873
    2013-05-25 10:50:50 Dump the hashtable into file: file:/tmp/hadoop/hive_2013-05-25_22-50-17_863_2904729913146511194/-local-10006/HashTable-Stage-6/MapJoin-bu-21--.hashtable
    Execution failed with exit status: 2
    Obtaining error information

#4:

    212271 Rows loaded to hdfs://10.4.26.233:9000/mnt/var/lib/hive_081/tmp/scratch/hive_2013-05-25_22-19-40_060_2510727650477507447/-ext-10000
    MapReduce Jobs Launched:
    Job 0: Map: 21 Reduce: 13 Accumulative CPU: 142.28 sec HDFS Read: 1224302437 HDFS Write: 2179 SUCCESS
    Job 1: Map: 9 Accumulative CPU: 52.82 sec HDFS Read: 685493907 HDFS Write: 693841827 SUCCESS
    Job 2: Map: 17 Accumulative CPU: 90.52 sec HDFS Read: 693926843 HDFS Write: 701115104 SUCCESS
    Total MapReduce CPU Time Spent: 4 minutes 45 seconds 620 msec
    OK
    Time taken: 117.335 seconds
    OK
    Time taken: 0.011 seconds
    Total MapReduce jobs = 3
    Execution log at: /tmp/hadoop/hadoop_20130525222121_d91f5176-958e-4b10-896d-ffd427e1a12c.log
    2013-05-25 10:21:39 Starting to launch local task to process map join; maximum memory = 932118528
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/.versions/hive-0.8.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    2013-05-25 10:21:57 Processing rows: 200000 Hashtable size: 199999 Memory usage: 782325272 rate: 0.839
    2013-05-25 10:22:00 Processing rows: 212267 Hashtable size: 212267 Memory usage: 809524488 rate: 0.868
    2013-05-25 10:22:00 Dump the hashtable into file: file:/tmp/hadoop/hive_2013-05-25_22-21-37_408_6569501641432754678/-local-10006/HashTable-Stage-6/MapJoin-bu-31--.hashtable
    Execution failed with exit status: 2
    Obtaining error information
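
To make the pattern easier to check, here is a small Python sketch that uses only the figures copied from the four logs. It assumes that the logged "rate" is simply memory usage divided by the reported maximum memory; that is my reading of the numbers, not something I have confirmed in the Hive source. The names `RUNS` and `summarize` are made up for this illustration.

```python
# Figures copied from the four failed runs in the logs.
MAX_MEMORY = 932118528  # "maximum memory" reported when the local task starts

# (rows loaded into bu, rows processed at the last log line,
#  memory usage in bytes at that line, rate printed in the log)
RUNS = [
    (215035, 215031, 813018320, 0.872),
    (214618, 214614, 811592328, 0.871),
    (212590, 212586, 813714632, 0.873),
    (212271, 212267, 809524488, 0.868),
]


def summarize(runs, max_memory):
    """Return (row gap, computed rate, logged rate) for each run.

    Assumption: rate = memory usage / maximum memory, rounded to 3 places.
    """
    summary = []
    for loaded, processed, usage, logged_rate in runs:
        gap = loaded - processed
        computed_rate = round(usage / max_memory, 3)
        summary.append((gap, computed_rate, logged_rate))
    return summary


if __name__ == "__main__":
    for gap, computed, logged in summarize(RUNS, MAX_MEMORY):
        print(f"row gap: {gap}  computed rate: {computed}  logged rate: {logged}")
```

In all four runs the gap between rows loaded and rows processed is 4, and the computed ratio agrees with the logged rate, so under that assumption the failure happens at roughly 87% of the reported maximum memory each time.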