[
https://issues.apache.org/jira/browse/MAPREDUCE-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sandy Ryza resolved MAPREDUCE-5450.
---
Resolution: Fixed
Hadoop Flags: Reviewed
Unnecessary Configuration instantiation in IFileInputStream slows down merge - Port to branch-1
---
Key: MAPREDUCE-5450
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5450
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1
Affects Versions: 1.1.0
Reporter: Stanislav Barton
Assignee: Stanislav Barton
Priority: Blocker
Fix For: 1.3.0
Attachments: MAPREDUCE-5450-1.1.0.txt, mapreduce-5450.txt
We are using hadoop-2.0.0+1357-1.cdh4.3.0.p0.21 with MRv1. After upgrading from
4.1.2 to 4.3.0, I noticed a performance deterioration in the reduce phase of our
MR job. The job usually has 10 000 map tasks (10 000 input files of about 100 MB
each) and 6 000 reducers (one reducer per table region). I tried to figure out
at which phase the slowdown appears (at first I suspected that the slow
gathering of the map output files was the culprit) and found that the problem is
not reading the map output (the shuffle) but the sort/merge phase that follows;
the last, actual reduce phase is fast. I tried raising io.sort.factor because I
thought lots of small files were being merged on disk, but raising it to 1000
made no difference. I then printed the stack trace and found that the problem is
the initialization of org.apache.hadoop.mapred.IFileInputStream, namely the
creation of a new Configuration object instead of propagating the one from the
earlier context; see the stack trace:
Thread 13332: (state = IN_NATIVE)
 - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame; information may be imprecise)
 - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame)
 - java.io.File.exists() @bci=20, line=733 (Compiled frame)
 - sun.misc.URLClassPath$FileLoader.getResource(java.lang.String, boolean) @bci=136, line=999 (Compiled frame)
 - sun.misc.URLClassPath$FileLoader.findResource(java.lang.String, boolean) @bci=3, line=966 (Compiled frame)
 - sun.misc.URLClassPath.findResource(java.lang.String, boolean) @bci=17, line=146 (Compiled frame)
 - java.net.URLClassLoader$2.run() @bci=12, line=385 (Compiled frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
 - java.net.URLClassLoader.findResource(java.lang.String) @bci=13, line=382 (Compiled frame)
 - java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=1002 (Compiled frame)
 - java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2, line=1192 (Compiled frame)
 - javax.xml.parsers.SecuritySupport$4.run() @bci=26, line=96 (Compiled frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedAction) @bci=0 (Compiled frame)
 - javax.xml.parsers.SecuritySupport.getResourceAsStream(java.lang.ClassLoader, java.lang.String) @bci=10, line=89 (Compiled frame)
 - javax.xml.parsers.FactoryFinder.findJarServiceProvider(java.lang.String) @bci=38, line=250 (Interpreted frame)
 - javax.xml.parsers.FactoryFinder.find(java.lang.String, java.lang.String) @bci=273, line=223 (Interpreted frame)
 - javax.xml.parsers.DocumentBuilderFactory.newInstance() @bci=4, line=123 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.loadResource(java.util.Properties, org.apache.hadoop.conf.Configuration$Resource, boolean) @bci=16, line=1890 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.loadResources(java.util.Properties, java.util.ArrayList, boolean) @bci=49, line=1867 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.getProps() @bci=43, line=1785 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.get(java.lang.String) @bci=35, line=712 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.getTrimmed(java.lang.String) @bci=2, line=731 (Compiled frame)
 - org.apache.hadoop.conf.Configuration.getBoolean(java.lang.String, boolean) @bci=2, line=1047 (Interpreted frame)
 - org.apache.hadoop.mapred.IFileInputStream.init(java.io.InputStream, long, org.apache.hadoop.conf.Configuration) @bci=111, line=93 (Interpreted frame)
 - org.apache.hadoop.mapred.IFile$Reader.init(org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FSDataInputStream, long, org.apache.hadoop.io.compress.CompressionCodec, org.apache.hadoop.mapred.Counters$Counter) @bci=60, line=303 (Interpreted frame)
 -
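The trace shows why per-stream Configuration construction is costly: each `new Configuration()` walks the classpath and re-parses the XML resource files via DocumentBuilderFactory, and with thousands of merge segments this happens thousands of times. The sketch below illustrates the pattern of the fix with stand-in classes (`SharedConf`, `SlowStream`, `FastStream` are hypothetical names, not the actual Hadoop classes): the caller builds one configuration object and propagates it down to every stream instead of letting each stream construct its own.

```java
import java.util.Properties;

public class ConfReuseSketch {
    // Stand-in for org.apache.hadoop.conf.Configuration: construction is
    // expensive because it (re)loads and parses resource files.
    static class SharedConf {
        static int loads = 0;                  // counts simulated resource loads
        final Properties props = new Properties();
        SharedConf() {
            loads++;                           // stands in for XML parsing cost
            props.setProperty("ifile.readahead", "true");
        }
        boolean getBoolean(String key, boolean dflt) {
            return Boolean.parseBoolean(props.getProperty(key, String.valueOf(dflt)));
        }
    }

    // Anti-pattern (pre-fix): each stream builds its own configuration.
    static class SlowStream {
        final boolean readahead;
        SlowStream() {
            readahead = new SharedConf().getBoolean("ifile.readahead", false);
        }
    }

    // Fixed pattern: the merge code propagates one configuration down.
    static class FastStream {
        final boolean readahead;
        FastStream(SharedConf conf) {
            readahead = conf.getBoolean("ifile.readahead", false);
        }
    }

    public static void main(String[] args) {
        SharedConf conf = new SharedConf();    // built once per task
        for (int i = 0; i < 10_000; i++) {     // e.g. 10 000 merge segments
            new FastStream(conf);              // no extra loads here
        }
        System.out.println("loads=" + SharedConf.loads); // prints loads=1
    }
}
```

With the anti-pattern, the same loop would trigger 10 000 resource loads; with the propagated configuration, exactly one. This mirrors the fix here: IFileInputStream receives the Configuration already held by the merge code rather than instantiating its own.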