[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668539#comment-15668539 ]

Ravi Prakash commented on MAPREDUCE-2841:
-----------------------------------------

Found it. From https://access.redhat.com/documentation/en-US/Red_Hat_Developer_Toolset/3/html/User_Guide/sect-Changes_in_Version_3.0-GCC.html
bq. This applies to any string literal followed without white space by some macro. To fix this, add some white space between the string literal and the macro name.

I've filed: https://issues.apache.org/jira/browse/MAPREDUCE-6810

> Task level native optimization
> ------------------------------
>
>                 Key: MAPREDUCE-2841
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>         Environment: x86-64 Linux/Unix
>            Reporter: Binglin Chang
>            Assignee: Sean Zhong
>             Fix For: 3.0.0-alpha1
>
>         Attachments: DESIGN.html, MAPREDUCE-2841.v1.patch,
> MAPREDUCE-2841.v2.patch, MR-2841benchmarks.pdf, dualpivot-0.patch,
> dualpivotv20-0.patch, fb-shuffle.patch,
> hadoop-3.0-mapreduce-2841-2014-7-17.patch, micro-benchmark.txt,
> mr-2841-merge-2.txt, mr-2841-merge-3.patch, mr-2841-merge-4.patch,
> mr-2841-merge.txt
>
>
> I've recently been working on native optimization for MapTask based on JNI.
> The basic idea is to add a NativeMapOutputCollector to handle k/v pairs
> emitted by the mapper, so that sort, spill, and IFile serialization can all
> be done in native code. A preliminary test (on Xeon E5410, jdk6u24) showed
> promising results:
> 1. Sort is about 3x-10x as fast as Java (only binary string compare is
> supported)
> 2. IFile serialization speed is about 3x that of Java, about 500MB/s; if
> hardware CRC32C is used, things can get much faster(1G/
> 3. Merge code is not completed yet, so the test uses enough io.sort.mb to
> prevent mid-spill
> This leads to a total speedup of 2x~3x for the whole MapTask, if
> IdentityMapper (mapper does nothing) is used.
> There are limitations, of course: currently only Text and BytesWritable are
> supported, and I have not thought through many things yet, such as how to
> support map-side combine. I had some discussion with somebody familiar with
> Hive, and it seems that these limitations won't be much of a problem for Hive
> to benefit from those optimizations, at least. Advice or discussion about
> improving compatibility is most welcome :)
> Currently NativeMapOutputCollector has a static method called canEnable(),
> which checks whether the key/value type, comparator type, and combiner are
> all compatible; MapTask can then choose to enable NativeMapOutputCollector.
> This is only a preliminary test; more work needs to be done. I expect better
> final results, and I believe similar optimization can be adapted to reduce
> task and shuffle too.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
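Item 1 of the issue description says that only binary string comparison is supported for sorting. As a rough illustration of what such a comparison could look like, here is a minimal C++ sketch. It assumes "binary string compare" means byte-wise lexicographic ordering in the style of memcmp; the names bytes_less and sort_keys are hypothetical and are not taken from the nativetask code, which is not shown in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: orders two keys by their raw bytes.
// memcmp compares bytes as unsigned values; when one key is a prefix of the
// other, the shorter key sorts first.
bool bytes_less(const std::string& a, const std::string& b) {
  const std::size_t n = std::min(a.size(), b.size());
  const int c = std::memcmp(a.data(), b.data(), n);
  if (c != 0) {
    return c < 0;
  }
  return a.size() < b.size();
}

// Hypothetical helper: returns the keys sorted in byte order.
std::vector<std::string> sort_keys(std::vector<std::string> keys) {
  std::sort(keys.begin(), keys.end(), bytes_less);
  return keys;
}
```

A comparator of this kind never has to deserialize the key, which may be part of why a native sort can be faster, but that is an inference rather than something stated in the issue.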
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665587#comment-15665587 ]

Ravi Prakash commented on MAPREDUCE-2841:
-----------------------------------------

I recently upgraded from Fedora 22 to Fedora 25 (I'm assuming this means the latest and greatest compilers, cmake, etc.). My trunk build failed with this error:
{code}
[WARNING] /home/raviprak/Code/hadoop/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/Log.h:35:67: error: unable to find string literal operator ‘operator""_fmt_’ with ‘const char [37]’, ‘long unsigned int’ arguments
[WARNING] fprintf(LOG_DEVICE, "%02d/%02d/%02d %02d:%02d:%02d INFO "_fmt_"\n", \
{code}
Without understanding what is wrong with the macro, I had to remove {{\_fmt\_}} from the fprintf to make the build proceed. Anyone else seeing this?
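The compiler error quoted in this comment is consistent with how C++11 lexes a string literal that is immediately followed by an identifier: the identifier is treated as a user-defined literal suffix, so the macro parameter is never substituted. The sketch below shows the whitespace fix described in the GCC note quoted in the follow-up comment. FORMAT_INFO and format_info are hypothetical names used only for illustration; the real macro in Log.h writes to LOG_DEVICE with a timestamp and is not reproduced here.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the logging macro in Log.h; the real macro differs.
//
// Spelling that C++11 compilers reject:
//   "INFO "_fmt_"\n"     (lexed as one literal carrying the suffix _fmt_)
// Spelling with whitespace, which the preprocessor can expand:
//   "INFO " _fmt_ "\n"   (three tokens; adjacent literals are concatenated)
#define FORMAT_INFO(buf, size, _fmt_, ...) \
  std::snprintf((buf), (size), "INFO " _fmt_ "\n", __VA_ARGS__)

// Formats one log line into a string so the result is easy to inspect.
std::string format_info(int value) {
  char buf[64];
  FORMAT_INFO(buf, sizeof(buf), "value=%d", value);
  return std::string(buf);
}
```

With the spaces in place, the call in format_info expands to the concatenated format string "INFO value=%d\n". Removing the spaces around _fmt_ in the macro body should reproduce an error of the kind shown above when compiling as C++11 or later.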
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175842#comment-14175842 ]

Todd Lipcon commented on MAPREDUCE-2841:
----------------------------------------

Hey Nathan. I'm not averse to putting it in branch-2. As of earlier this week we are also now shipping this feature to our customers as an experimental option, which should hopefully get some baking done. If the community is on board, I'm happy to backport to branch-2 as well. Perhaps we can label it as experimental in our next release, and stable in the release following if we get some good feedback? Would be great to hear some results from testing at Y!.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172967#comment-14172967 ]

Nathan Roberts commented on MAPREDUCE-2841:
-------------------------------------------

{quote}
Let's let this bake in trunk for a little while and consider a backport to branch-2 down the road if there is demand. Marking the issue as resolved for now.
{quote}
Nice work! Not sure how much baking really happens on trunk ;) Looking forward to this getting onto branch-2.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132651#comment-14132651 ]

Hudson commented on MAPREDUCE-2841:
-----------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #679 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/679/])
Import initial code for MAPREDUCE-2841 (native output collector) (todd: rev b2551c06a09fb80a9e69adbc01c4c34b93ad0139)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/buffer/BufferType.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/FileSystem.h
* hadoop-dist/pom.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/Merge.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/NativeRuntimeJniImpl.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test.sh
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/resources/native_conf.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/codec/SnappyCodec.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/Streams.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/MapOutputCollector.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/util/NativeTaskOutput.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/buffer/InputBuffer.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/BufferStream.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/Log.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/Streams.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/sdk/example/CustomModule/src/main/java/org/apache/hadoop/nativetask/platform/custom/CustomPlatform.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/java/system/function/org/apache/hadoop/mapred/nativetask/kvtest/TestInputFile.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/BufferStream.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/handlers/IDataLoader.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/TaskContext.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/Constants.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/java/system/function/org/apache/hadoop/mapred/nativetask/kvtest/KVTest.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/codec/Lz4Codec.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/java/org/apache/hadoop/mapred/nativetask/TestTaskContext.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/NativeRuntime.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/handler/BatchHandler.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/pom.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/NativeBatchProcessor.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/config.h.cmake
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/java/system/function/org/apache/hadoop/mapred/nativetask/nonsorttest/NonSortTest.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/resources/test-nonsort-conf.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/buffer/DataInputStream.java
*
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132709#comment-14132709 ]

Hudson commented on MAPREDUCE-2841:
-----------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1895 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1895/])
Import initial code for MAPREDUCE-2841 (native output collector) (todd: rev b2551c06a09fb80a9e69adbc01c4c34b93ad0139)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/TaskContext.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/BufferStream.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/NativeObjectFactory.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/test_commons.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/lib/TestMemoryPool.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/sdk/example/CustomModule/README.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/NativeTask.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/Compressions.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/codec/SnappyCodec.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/handler/CombineHandler.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/TestFileSystem.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/Compressions.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/MapOutputSpec.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/lib/TestReadBuffer.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/Constants.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/jniutils.cc
* hadoop-dist/pom.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/lib/TestComparatorForDualPivotQuickSort.cc
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/Text.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/java/org/apache/hadoop/mapred/nativetask/handlers/TestNativeCollectorOnlyHandler.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/buffer/ByteBufferDataWriter.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/Combiner.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/TestSort.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/TaskCounters.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/java/org/apache/hadoop/mapred/nativetask/utils/TestBytesUtil.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/handler/BatchHandler.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/codec/GzipCodec.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/lib/TestIterator.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/gtest/gtest_main.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/handler/org_apache_hadoop_mapred_nativetask_NativeBatchProcessor.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/serde/NativeSerialization.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/java/system/function/org/apache/hadoop/mapred/nativetask/testutil/MockValueClass.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/lib/TestFixSizeContainer.cc
*
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132748#comment-14132748 ]

Hudson commented on MAPREDUCE-2841:
-----------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1870 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1870/])
Import initial code for MAPREDUCE-2841 (native output collector) (todd: rev b2551c06a09fb80a9e69adbc01c4c34b93ad0139)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/serde/ByteWritableSerializer.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/TestPrimitives.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/buffer/DataInputStream.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/handler/AbstractMapHandler.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/resources/normal_conf.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/Buffers.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/resources/test-lz4-compress-conf.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/buffer/DirectBufferPool.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/codec/GzipCodec.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/WritableUtils.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/resources/test-bzip2-compress-conf.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/util/SnappyUtil.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/TestConfig.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/java/org/apache/hadoop/mapred/nativetask/handlers/TestCombineHandler.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/ICombineHandler.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/codec/BlockCodec.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/serde/VLongWritableSerializer.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/jniutils.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib/primitives.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/pom.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/serde/DefaultSerializer.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/java/system/function/org/apache/hadoop/mapred/nativetask/combinertest/LargeKVCombinerTest.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/lib/TestIterator.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/test/java/org/apache/hadoop/mapred/nativetask/testutil/TestInput.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/serde/LongWritableSerializer.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/Platforms.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Hash.cc
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/Text.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Hash.h
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/util/TestHash.cc
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/java/org/apache/hadoop/mapred/nativetask/serde/NativeSerialization.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/codec/snappy-c.h
*
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133079#comment-14133079 ]

Sean Zhong commented on MAPREDUCE-2841:
---------------------------------------

Thanks, everyone!
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127197#comment-14127197 ]

Todd Lipcon commented on MAPREDUCE-2841:
----------------------------------------

bq. -1 javac. The applied patch generated 1265 javac compiler warnings (more than the trunk's current 1264 warnings).

This is due to needing to import the deprecated UTF8 class to provide support for that type.

Aside from that, seems like Jenkins is happy with the patch. The merge vote is already started on mapreduce-dev and is set to close 9/12 EOD PST.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124381#comment-14124381 ] Hadoop QA commented on MAPREDUCE-2841: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666986/mr-2841-merge-3.patch against trunk revision e6420fe. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 79 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1265 javac compiler warnings (more than the trunk's current 1264 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-assemblies hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice: org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4860//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4860//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4860//console This message is automatically generated. 
Task level native optimization

Key: MAPREDUCE-2841
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: task
Environment: x86-64 Linux/Unix
Reporter: Binglin Chang
Assignee: Sean Zhong
Attachments: DESIGN.html, MAPREDUCE-2841.v1.patch, MAPREDUCE-2841.v2.patch, MR-2841benchmarks.pdf, dualpivot-0.patch, dualpivotv20-0.patch, fb-shuffle.patch, hadoop-3.0-mapreduce-2841-2014-7-17.patch, micro-benchmark.txt, mr-2841-merge-2.txt, mr-2841-merge-3.patch, mr-2841-merge.txt

I have recently been working on a native optimization for MapTask based on JNI. The basic idea is to add a NativeMapOutputCollector to handle the k/v pairs emitted by the mapper, so that sort, spill, and IFile serialization can all be done in native code. A preliminary test (on Xeon E5410, jdk6u24) showed promising results:

1. Sort is about 3x-10x as fast as Java (only binary string compare is supported).
2. IFile serialization speed is about 3x that of Java, about 500MB/s; if hardware CRC32C is used, things can get much faster(1G/
3. Merge code is not completed yet, so the test uses a large enough io.sort.mb to prevent mid-spill.

This leads to a total speedup of 2x~3x for the whole MapTask, if IdentityMapper (mapper does nothing) is used.

There are limitations, of course: currently only Text and BytesWritable are supported, and I have not thought through many things yet, such as how to support map-side combine. I had some discussion with somebody familiar with Hive, and it seems that these limitations won't be much of a problem for Hive to benefit from those optimizations, at least. Advice or discussion about improving compatibility is most welcome :)

Currently NativeMapOutputCollector has a static method called canEnable(), which checks whether the key/value type, comparator type, and combiner are all compatible; MapTask can then choose to enable NativeMapOutputCollector.

This is only a preliminary test; more work needs to be done. I expect better final results, and I believe similar optimizations can be adopted for reduce task and shuffle too.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
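For readers skimming the thread, the canEnable() gate described in the issue text could look roughly like the sketch below. This is not the code from the patch: the class name NativeCollectorCheck, the string-based signature, and the rules "a custom comparator disables the native path" and "any combiner disables the native path" are assumptions made for illustration. Only the two supported types (Text and BytesWritable) come from the description.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of a canEnable()-style check; not the class from the patch. */
final class NativeCollectorCheck {
    // The issue description names Text and BytesWritable as the only supported types.
    private static final Set<String> SUPPORTED_TYPES = new HashSet<>(Arrays.asList(
            "org.apache.hadoop.io.Text",
            "org.apache.hadoop.io.BytesWritable"));

    /**
     * Returns true only if every job setting is one the native collector could handle.
     * Assumption: a null comparatorClass means "use the key type's default byte-wise
     * comparator", and a null combinerClass means "no combiner configured".
     */
    static boolean canEnable(String keyClass, String valueClass,
                             String comparatorClass, String combinerClass) {
        if (!SUPPORTED_TYPES.contains(keyClass) || !SUPPORTED_TYPES.contains(valueClass)) {
            return false; // unsupported key or value type
        }
        if (comparatorClass != null) {
            return false; // assumption: custom comparators cannot run natively
        }
        // Assumption: map-side combine is not supported, so any combiner falls back to Java.
        return combinerClass == null;
    }
}
```

The point of a gate like this is that MapTask can fall back to the existing Java collector whenever any single check fails, so unsupported jobs keep working unchanged.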
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124766#comment-14124766 ]

Hadoop QA commented on MAPREDUCE-2841:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12666986/mr-2841-merge-3.patch
  against trunk revision d1fa582.

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4861//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124786#comment-14124786 ]

Hadoop QA commented on MAPREDUCE-2841:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12667067/mr-2841-merge-4.patch
  against trunk revision d1fa582.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 71 new or modified test files.
    {color:red}-1 javac{color}. The applied patch generated 1265 javac compiler warnings (more than the trunk's current 1264 warnings).
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in .
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4862//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4862//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4862//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124789#comment-14124789 ]

Hadoop QA commented on MAPREDUCE-2841:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12667067/mr-2841-merge-4.patch
  against trunk revision d1fa582.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 71 new or modified test files.
    {color:red}-1 javac{color}. The applied patch generated 1265 javac compiler warnings (more than the trunk's current 1264 warnings).
    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4863//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4863//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4863//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123332#comment-14123332 ]

Hadoop QA commented on MAPREDUCE-2841:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12666832/mr-2841-merge.txt
  against trunk revision 9e941d9.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 71 new or modified test files.
    {color:red}-1 javac{color}. The applied patch generated 1304 javac compiler warnings (more than the trunk's current 1264 warnings).
    {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. See https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4855//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:red}-1 findbugs{color}. The patch appears to cause Findbugs (version 2.0.3) to fail.
    {color:red}-1 release audit{color}. The applied patch generated 8 release audit warnings.
    {color:red}-1 core tests{color}. The test build failed in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/sdk/example/CustomModule
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4855//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4855//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4855//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4855//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123496#comment-14123496 ]

Todd Lipcon commented on MAPREDUCE-2841:

It looks like there are some issues where test-patch is trying to build/test the sample SDK module. There are also some other warnings to work through, though it seems like something's up with the Jenkins job such that it isn't archiving artifacts.

[~mauzhang], [~clockfly], can you guys look into fixing the CustomModule SDK example? Also, is the entirety of that example still relevant given that we're not supporting user-defined native classes at this point?
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124270#comment-14124270 ]

Sean Zhong commented on MAPREDUCE-2841:

Hi Todd,

In that case, we can remove CustomModule SDK example.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124279#comment-14124279 ]

Todd Lipcon commented on MAPREDUCE-2841:

Hey Sean. Did you file a separate JIRA with the patch to remove CustomModule? The merge patches posted here should be generated by diffing the branch vs trunk.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124284#comment-14124284 ]

Hadoop QA commented on MAPREDUCE-2841:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12666979/mr-2841-merge-2.txt
  against trunk revision e6420fe.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 71 new or modified test files.
    {color:red}-1 javac{color}. The applied patch generated 1304 javac compiler warnings (more than the trunk's current 1264 warnings).
    {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. See https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4858//artifact/trunk/patchprocess/diffJavadocWarnings.txt for details.
    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:red}-1 release audit{color}. The applied patch generated 8 release audit warnings.
    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-assemblies hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask.
    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4858//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4858//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4858//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4858//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119174#comment-14119174 ]

Todd Lipcon commented on MAPREDUCE-2841:

FYI, I'm running terasort validations/perf tests on a cluster now. If there are any specific configurations that you'd like to see for terasort on a 5-node cluster, please let me know (e.g. HDFS block sizes, MR task memory configurations, etc.). I did my best to pick some reasonable settings that kept the machine resources saturated, but want to make sure everyone's input is heard.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119200#comment-14119200 ]

Sean Zhong commented on MAPREDUCE-2841:

Hi Todd,

We typically choose a block size of 512MB, tune io.sort.mb so that each task spills only once, and use 10GbE for testing. The other settings are typical.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115900#comment-14115900 ] Todd Lipcon commented on MAPREDUCE-2841:
Hey Joy. Nice to hear from you, and glad to hear the benchmark was useful. A couple of interesting points:

bq. Took the average of 3 runs after one warmup run (all in same JVM)

Do you typically enable JVM reuse? How many runs do you typically get within the same JVM in typical Qubole applications? I found that, if I increase the number of runs within a JVM to 30 or 40, then the existing collector becomes nearly as efficient as the native one. But, it really takes this many runs for the JIT to fully kick in. So, one of the main advantages of the native collector isn't that C++ code is so much faster than JITted Java code, but rather that, in the context of a map task, we rarely have a process living long enough to get the full available performance of the JIT.

I ran some benchmarks with -XX:+PrintCompilation and found that the JIT was indeed kicking in on the first run. But, after many runs, some key functions got re-jitted and became much faster. Given that most people I know do not enable JVM reuse, and even if they do, typically do not manage to run 30-40 tasks within a JVM, I think there is a significant boost to running precompiled code for this hot part of the code.

bq. Old Collector: 20.3s
bq. New Collector: 7.48s

This is comparing the MR2 collector vs the FB collector (BMOB?) Did you also try the native collector? It's interesting that your old collector runtimes are so slow. Did you tweak anything about the benchmark? On my system, the current MR2 collector pretty quickly gets down to 10sec.

bq. I think query latency is absolutely the wrong benchmark for measuring the utility of these optimizations. The problem is Hive runtime (for example) is dominated by startup and launch overheads for these types of queries. But in a CPU/throughput bound cluster - the improvements would matter much more than straight line query latency improvements would indicate.

Agreed. That's why the benchmark also reports total CPU time. The native collector is single-threaded whereas the existing MR2 collector is multi-threaded. So even though the wall time of a single task may not improve that much, it's using significantly less CPU to do the same work (meaning in a real job you'll get better overall throughput and cluster utilization).
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114834#comment-14114834 ] Joydeep Sen Sarma commented on MAPREDUCE-2841:
chiming in, I tried Todd's benchmark on the FB blockoutputbuffer - from an internal email:

i ran a benchmark that Todd Lipcon had posted which sorts 2.5M records of 100 bytes each (10 byte key, 90 byte value) distributed evenly across 100 partitions. Took the average of 3 runs after one warmup run (all in same JVM).
- Old Collector: 20.3s
- New Collector: 7.48s

very interested in this work. We are going to enable FB's output collector by default in Qubole. I have done some tests on TPCH queries. It doesn't make a difference in every query, but in some it helps significantly. Sample queries from BMOB:

Query  Regular  BMOB
q05    544      484
q01    -- no change -- (94)
q02    175      166
q03    -- no change -- (too much variance but approx 256)

One thing - I think query latency is absolutely the wrong benchmark for measuring the utility of these optimizations. The problem is Hive runtime (for example) is dominated by startup and launch overheads for these types of queries. But in a CPU/throughput bound cluster - the improvements would matter much more than straight line query latency improvements would indicate.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14113114#comment-14113114 ] Todd Lipcon commented on MAPREDUCE-2841:
I've uploaded a benchmark tool to github here: https://github.com/toddlipcon/mr-collector-benchmark

As a quick summary of the results, the current output collector typically takes 6.6sec of CPU and 9 seconds of wall time to sort 250MB of terasort-style input data. The native implementation takes 1.7sec of CPU time and 3-5 seconds of wall time. So, as originally discussed at the top of this JIRA, there is a big CPU advantage to native sorting - across a terasort we can expect to save many thousands of seconds of CPU time, which should translate either to faster wall clock for the job, or to better concurrency on the cluster.

I'll write up a more thorough summary including results for Facebook's collector implementation, and some explanation of _why_ it's faster. I'm also planning to deploy this on a cluster soon and get some actual results on a terasort.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069945#comment-14069945 ] Binglin Chang commented on MAPREDUCE-2841:
Hi Sean, the test succeeds on Mac OS X but fails on Ubuntu 12. I updated the test a little in MAPREDUCE-5985.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068275#comment-14068275 ] Binglin Chang commented on MAPREDUCE-2841:
bq. I have some issue compiling the code on MACOSX

After I downgraded cmake from 3.0 to 2.8 and unset JAVA_HOME, cmake succeeds. I have not found the root cause yet. There are other compile errors; I have already created a sub-task for them.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068306#comment-14068306 ] Binglin Chang commented on MAPREDUCE-2841:
Hi [~clockfly], when I run the tests, one test fails with the output below. Is this expected?
{code}
[ RUN ] IFile.TestGlibCBug
14/07/21 15:55:30 INFO TestGlibCBug ./testData/testGlibCBugSpill.out
/home/decster/projects/hadoop-trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test/TestIFile.cc:186: Failure
Value of: realKey
Actual: 1127504685
Expected: expect[index]
Which is: 4102672832
[ FAILED ] IFile.TestGlibCBug (0 ms)
[--] 2 tests from IFile (240 ms total)
{code}
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068323#comment-14068323 ] Binglin Chang commented on MAPREDUCE-2841:
I see the comments in the test code, but they don't help much. My env is ubuntu12, glibc: 2.15-0ubuntu10.3
{code}
Expected: expect[index]
Which is: 4102672832

uint32_t expect[5] = {-1538241715, -1288088794, -192294464, 563552421, 1661521654};
while(NULL != (key = reader->nextKey(length))) {
  int realKey = bswap(*(uint32_t *)(key));
  ASSERT_EQ(expect[index], realKey);
  index++;
}
{code}
The expected value does not appear in the expect array as written. Maybe the uint32 to int32 conversion is buggy? Or is there an array index out of range bug in the code?
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069754#comment-14069754 ] Sean Zhong commented on MAPREDUCE-2841:
Hi Binglin, The TestGlibCBug unit test failure is not expected. I am investigating why, and I will open a subtask for it.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068171#comment-14068171 ] Binglin Chang commented on MAPREDUCE-2841:
Thanks Sean, patch looks good. I have some issue compiling the code on MACOSX. I see the cmake file is mostly copied from hadoop-common (or other sub projects). hadoop-common compiles successfully in my env, but nativetask fails, so there may be some issue in the CMake file:
{code}
[copy] Copying 1 file to /Volumes/SSD/projects/hadoop-trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native/test/testData
[exec] CMake Error at /usr/local/Cellar/cmake/3.0.0/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:136 (message):
[exec] Could NOT find JNI (missing: JAVA_AWT_LIBRARY JAVA_JVM_LIBRARY
[exec] JAVA_INCLUDE_PATH JAVA_INCLUDE_PATH2 JAVA_AWT_INCLUDE_PATH)
[exec] Call Stack (most recent call first):
[exec] /usr/local/Cellar/cmake/3.0.0/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:343 (_FPHSA_FAILURE_MESSAGE)
[exec] /usr/local/Cellar/cmake/3.0.0/share/cmake/Modules/FindJNI.cmake:286 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
[exec] JNIFlags.cmake:117 (find_package)
[exec] CMakeLists.txt:24 (include)
[exec] -- Configuring incomplete, errors occurred!
[exec] See also /Volumes/SSD/projects/hadoop-trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native/CMakeFiles/CMakeOutput.log.
{code}
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068226#comment-14068226 ] Todd Lipcon commented on MAPREDUCE-2841:
Hey Binglin. Mind filing a subtask to fix the compilation on OSX? Since you and Sean are both branch committers, maybe you can work together to resolve this and commit a fix on the branch?
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064878#comment-14064878 ] Sean Zhong commented on MAPREDUCE-2841:
Hi Todd, The patch is uploaded to: https://raw.githubusercontent.com/intel-hadoop/nativetask/native_output_collector/patch/hadoop-3.0-mapreduce-2841-2014-7-17.patch (It is too big to be uploaded here.) It is a patch against hadoop3.0 trunk.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065042#comment-14065042 ]

Todd Lipcon commented on MAPREDUCE-2841:

Hey Sean. Something seems to be wrong with that patch file -- some of the files seem to be present 11 times in it:

{code}
todd@todd-ThinkPad-T540p:~$ grep '+++.*TextSerializer' hadoop-3.0-mapreduce-2841-2014-7-17.patch | less -S | wc -l
11
{code}

That might also explain why the file is too large to upload here as an attachment. Could you try to regenerate the patch?
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065064#comment-14065064 ]

Sean Zhong commented on MAPREDUCE-2841:

Ah, thanks for pointing this out. I am not sure why this happened. I just uploaded the patch to this JIRA: https://issues.apache.org/jira/secure/attachment/12656288/hadoop-3.0-mapreduce-2841-2014-7-17.patch

Updates:
1. Removed the HBase/Hive/Mahout/Pig related code; that code will be posted in another JIRA or hosted on GitHub.
2. Use ServiceLoader to discover custom platforms (to support custom key types).
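The ServiceLoader-based discovery mentioned in Sean's update could look roughly like the sketch below. This is only an illustration of the general `java.util.ServiceLoader` pattern, not the code in the patch: the `KeyTypePlatform` interface, the `DefaultPlatform` class, and the `PlatformRegistry` helper are hypothetical names invented for this example. Only the two built-in key types (Text and BytesWritable) come from the issue description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical extension point; the interface in the actual patch may differ. */
interface KeyTypePlatform {
    String name();
    boolean supportsKey(String keyClassName);
}

/** Hypothetical built-in platform covering the key types named in the issue. */
final class DefaultPlatform implements KeyTypePlatform {
    public String name() { return "default"; }

    public boolean supportsKey(String keyClassName) {
        return keyClassName.equals("org.apache.hadoop.io.Text")
            || keyClassName.equals("org.apache.hadoop.io.BytesWritable");
    }
}

/** Hypothetical registry: built-in platform first, then any discovered ones. */
final class PlatformRegistry {
    static List<KeyTypePlatform> discover() {
        List<KeyTypePlatform> platforms = new ArrayList<>();
        platforms.add(new DefaultPlatform());
        // ServiceLoader reads META-INF/services/<interface name> entries from
        // every jar on the classpath; with no such entries the loop is a no-op.
        for (KeyTypePlatform p : ServiceLoader.load(KeyTypePlatform.class)) {
            platforms.add(p);
        }
        return platforms;
    }

    static boolean isKeySupported(String keyClassName) {
        for (KeyTypePlatform p : discover()) {
            if (p.supportsKey(keyClassName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isKeySupported("org.apache.hadoop.io.Text"));
    }
}
```

Under this pattern, a project such as Hive or Pig would ship its own jar containing an implementation plus a `META-INF/services` entry, so the core collector would not need to reference that project's classes.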
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065207#comment-14065207 ]

Todd Lipcon commented on MAPREDUCE-2841:

Thanks Sean. This patch looks better. I committed it as the initial import onto the new feature branch (MR-2841). I had some issues building on my Ubuntu 13.10 system, but one of the purposes of the feature branch is to be able to iterate on it more collaboratively. I'll file a couple of subtasks for the issues I'm running into on my box.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053848#comment-14053848 ]

Todd Lipcon commented on MAPREDUCE-2841:

Hey Sean. Would you mind creating a patch file that can be applied to MR trunk and that adds the native collector code to the Hadoop tree? It seems like we should probably create a new Maven project inside hadoop-mapreduce-project/hadoop-mapreduce-client/ (e.g. something like hadoop-mapreduce-native-collector).

When a patch file is ready, I'll create a development branch and we can work from there to address the remaining issues mentioned above around pluggability.

I also noticed that the current code uses autoconf, whereas Hadoop has generally standardized on CMake for native builds. Let's add that to our list of things to do on the development branch.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047771#comment-14047771 ]

Todd Lipcon commented on MAPREDUCE-2841:

At the risk of sounding like a broken record, the key points towards encouraging this contribution are:

# it is fully transparent, and thus revertible without impacting users in case it is later abandoned or otherwise problematic to maintain
# the code has a multi-year history of contribution from several different developers, which eliminates risk of abandonment
# the code is already in use in several production clusters, which indicates that end users find the improvement useful and stable in real-world applications
# benchmarks show a substantial performance improvement across a variety of workloads
# the new feature is 100% optional, with no changes to existing code paths, which eliminates any risk from users or vendors who prefer not to enable it

Since I think Arun's questions have mostly been addressed, I'd like to continue making forward progress on this issue. Arun -- could you clarify whether you are vetoing (-1) or just expressing some healthy skepticism (-0)?
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047804#comment-14047804 ]

Chris Douglas commented on MAPREDUCE-2841:

If [~clockfly] is close to a patch, that would make the scope concrete. It sounds like there are more than zero changes to the framework (i.e., the MAPREDUCE-2454 API is insufficient), but fewer than a full replacement of the {{Task}} code with C++. Would it be difficult to produce and post a patch to ground the discussion?
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047843#comment-14047843 ]

Todd Lipcon commented on MAPREDUCE-2841:

The patch required for the output collector is just this one: https://github.com/intel-hadoop/nativetask/blob/native_output_collector/patch/hadoop-2.patch

In fact, this just provides the automatic fallback functionality. That functionality is probably useful for all pluggable output collectors -- happy to break it out to be distinct from the JIRA.

The only other diff in that patch is a trivial addition to the Text writable implementation to allow setting a Text more easily from different serialization formats. I don't think it makes sense to break it out to a separate JIRA, but happy to do so if that makes things easier.
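The automatic fallback Todd describes can be pictured as trying each candidate collector in order and moving on when one declines the job. The sketch below is a hedged illustration of that idea only, not the contents of hadoop-2.patch: the `Collector` interface, the two collector classes, and `CollectorSelector` are hypothetical names, and the supported-key check is a stand-in for whatever compatibility test the real collector performs.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified stand-in for a pluggable map output collector. */
interface Collector {
    String name();
    /** Throws if this collector cannot handle the job's key type. */
    void init(String keyClassName) throws Exception;
}

/** Hypothetical native collector that only accepts the key types it knows. */
final class NativeCollector implements Collector {
    public String name() { return "native"; }

    public void init(String keyClassName) {
        boolean supported = keyClassName.equals("org.apache.hadoop.io.Text")
            || keyClassName.equals("org.apache.hadoop.io.BytesWritable");
        if (!supported) {
            throw new UnsupportedOperationException("unsupported key: " + keyClassName);
        }
    }
}

/** Hypothetical default (pure Java) collector that accepts everything. */
final class JavaCollector implements Collector {
    public String name() { return "java"; }
    public void init(String keyClassName) { /* always succeeds */ }
}

final class CollectorSelector {
    /** Returns the first candidate whose init() succeeds. */
    static Collector select(List<Collector> candidates, String keyClassName) {
        Exception last = null;
        for (Collector c : candidates) {
            try {
                c.init(keyClassName);
                return c;
            } catch (Exception e) {
                last = e; // remember the failure and fall back to the next one
            }
        }
        throw new IllegalStateException("no usable collector", last);
    }

    public static void main(String[] args) {
        List<Collector> order = Arrays.asList(new NativeCollector(), new JavaCollector());
        System.out.println(select(order, "org.apache.hadoop.io.Text").name());
        System.out.println(select(order, "com.example.CustomKey").name());
    }
}
```

Keeping the fallback in the selection step rather than inside any one collector is what makes it reusable for other pluggable collectors, which matches the point made in the comment above.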
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045189#comment-14045189 ]

Todd Lipcon commented on MAPREDUCE-2841:

Hey Arun. I think Sean did a good job answering many of your questions, but here are a few more responses to specifics from your earlier comment.

bq. I'm confused. There already exists large amounts of code on the github for the full task runtime. Is that abandoned? Are you saying there is no intention to contribute that to Hadoop, ever? Why would that be? Would that be a separate project?

As Sean pointed out, there is a branch on github which is just the native collector. I won't speak for Sean, but my own opinion is the same as yours with regard to the runtime. If we want to make a full native MR framework, it's a larger project that should probably be in the incubator and build on APIs exposed by YARN and MR. A strict accelerator of the existing MR, though, doesn't seem to make as much sense as a separate project.

bq. C++ still is a major problem w.r.t different compiler versions

So long as you avoid C++11, C++ can be very portable. AFAIK there is no usage of C++11 features in this contribution, and I would agree with you that we should avoid them and stick to a proper subset of C++. Personally, I am currently working on another project which uses C++ with the Google style guidelines and we have no problem building on a wide variety of operating systems (despite having orders of magnitude more code and complexity).

bq. Furthermore, there are considerably more security issues which open up in C++ land such as buffer overflow etc.

I'm not sure how this is a concern, since the new code only runs in the context of tasks, not daemons. C certainly has the same issues (and in my experience, buffer overflows and memory leaks are more common in C vs C++ due to lack of safe containers and smart pointers, but that's a separate discussion). It might be a stability concern, but that's easy to address with extensive testing, which Sean and team have already been doing for the past year or more.

bq. I'm sure we both would take 2x on Pig/Hive anyday... :)

Well, it's not quite 2x, but the performance benchmarks referenced on the wiki show hive aggregation having a 50% improvement. So, while I agree that terasort is not representative of many workloads, Sean and his team have done a good job showing that this optimization benefits a large class of diverse workloads, with no change required to the upper-level framework.

bq. Furthermore, this jira was opened nearly 3 years ago and only has sporadic bursts of activity - not a good sign for long-term maintainability.

I'm not sure that the past is indicative of the future here. Many times we open JIRAs and don't have time to fully push them to fruition until the future -- e.g. YARN sat around on JIRA with sporadic activity for many years until your team at Yahoo really got started on it. Even then, if I recall correctly, a lot of the development happened in a separate repository before there was an initial code drop to a branch at Apache. The same is true of this project (though of course on a much smaller scale) -- the project idea was a few years back, and then it was developed in Intel's repository until it is now being proposed to be integrated.

bq. Finally, what is the concern you see with starting this as an incubator project and allowing folks to develop a community around it? We can certainly help on our end by making it easy for them to plug in via interfaces etc.

The main concern is that it would be difficult for users to install/plug in. Speaking with my Apache hat on, I think this benefits all MR users and it would be great to say "Upgrade to 2.5, and your jobs will go 50% faster in many cases!" With my vendor hat on, it might actually be beneficial for this to live elsewhere -- we could tout it as a unique feature of our distro :) But, I'm trying to do the right thing here for the community at large, and also encourage a new group of developers to make contributions to our project.

bq. discussion of line counts, etc

My metrics were using the 'sloccount' program which counts non-comment non-empty lines. Sean already gave a good breakdown of the code. But, I think it's unimportant to squabble over details - my main point there was just that the contribution is meaty but not massive. It's also relatively simple code (e.g. entirely single-threaded) which is confined to the task (no concerns of daemon stability) and entirely optional (users can switch on a per-job basis whether to use this collector). I'd assume that in our first release we would leave the feature off by default and only make it on-by-default after we observe that many users have enabled it with good results. In the very worst case, since it is a fully transparent optimization,
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044206#comment-14044206 ]

Sean Zhong commented on MAPREDUCE-2841:

First, Arun and Todd, thank you both for your honest opinions! You are both respected! I believe the differences will narrow once we are looking at the same facts, so I'd like to state the facts and clarify some confusion:

1. How many lines of code are there, exactly?

Here is a breakdown for branch https://github.com/intel-hadoop/nativetask/tree/native_output_collector:

java code (*.java): 122 files, 8057 lines
    nativetask: 62 files, 4080 lines
    nativetask Unit Test: 14 files, 1222 lines
    other platform pig/mahout/hbase/hive: 25 files, 477 lines
    scenario test: 21 files, 2278 lines
native code (*.h, *.cc): 128 files, 47048 lines
    nativetask: 85 files, 11713 lines
    nativetask Unit Test: 33 files, 4911 lines
    otherPlatform pig/mahout/hbase/hive: 2 files, 1083 lines
    thirdparty gtest lib header files: 3 files, 28699 lines
    thirdparty lz4/snappy/cityhash: 5 files, 642 lines

(Note: license header lines in each source file are not counted; blank lines and other comments are counted.)

If we measure the LOC in the sense of code complexity, then:
Third-party code such as the google test header files should not be counted; the gtest headers alone have 28699 lines of code.
Pig/mahout/hbase/hive code will be removed from the code repository eventually, and should not be counted.
Scenario test code may not be included, as you can always write new scenario tests.

So after the deduction, the effective code contains:
NativeTask source code (java + native C++): 15793 lines
NativeTask unit test code (java + native C++): 6133 lines

2. Is this patch used as an alternate implementation of the MapReduce runtime, like TEZ?

No, the whole purpose of this patch submission is to act as a Map Output Collector, which transparently improves MapReduce performance, NOT as a new MR engine. The code is posted at branch https://github.com/intel-hadoop/nativetask/tree/native_output_collector; it only includes code for the map output collector.

3. Why is there Pig/Mahout/HBase/Hive code in the native task source code?

We are working on removing platform (Hive/Pig/HBase/Mahout) code from the native task source code, as I commented above, and providing it as standalone jars. We rushed to post the link without a full cleanup so that we could get some early feedback from the community.

4. Is the full native runtime included?

No, the full native runtime is not included in this patch, and related code is stripped. Repo https://github.com/intel-hadoop/nativetask/tree/native_output_collector only contains code for the transparent collector.

5. Is there an intention to contribute the full native runtime mode to Hadoop, or to run it as a separate project?

It is not the purpose of this patch to support full native runtime mode; the goal of this patch is to make existing MR jobs run better on modern CPUs with the native map output collector. Full native runtime mode is another topic, and there is a long way to go before that is ready for submission. We don't want to consider that now.

6. Are there interface compatibility issues?

This patch is not about full native runtime mode, which supports native mapper and native reducer. This patch is only about a custom map output collector in transparent mode. We are using existing java interfaces, and people are still running their java version mapper/reducer without re-compilation. Users can make a small configuration change to enable this nativetask collector. When there is a case that nativetask doesn't support, it will simply fall back to the default MR implementation.

7. Are there C++ ABI issues?

The concern makes sense. Regarding ABI, if the user doesn't need a custom key comparator, he will never need to implement a native comparator on nativetask.so, so there is no ABI issue. If the user does want to write a native comparator, the nativetask native interface involved is very limited, only:

typedef int (*ComparatorPtr)(const char * src, uint32_t srcLength, const char * dest, uint32_t destLength);

However, the current code assumes the user includes the whole NativeTask.h, which contains more stuff than the typedef above. We will work on this to make sure that NativeTask.h only exposes the necessary minimum API. After we do this, there should be no big ABI issue.

8. How can you make sure of the quality of this code?

The code has been actively developed for more than 1 year. It has been used and tested in production for a very long time, and there is also a full set of unit tests and scenario tests for coverage.

9. Can this be worked on TEZ instead?

We believe it is good for MapReduce, we know people are still
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041754#comment-14041754 ]

Arun C Murthy commented on MAPREDUCE-2841:

I also noticed that the github has a bunch of code related to Pig, Hive etc. - I think we'd all agree that they need to be in respective projects eventually.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041752#comment-14041752 ]

Arun C Murthy commented on MAPREDUCE-2841:

{quote}
If the MR developer community generally agrees this belongs in the core, I'd like to start a feature branch for it in order to import the current code, sort out the build/integration issues, and take care of the remaining items that Sean mentioned above.
{quote}

[~tlipcon] Thanks for starting this discussion. I have a few thoughts I'd like to run by you.

I think the eventual goal of this (looking at https://github.com/intel-hadoop/nativetask/blob/master/README.md) is a full-native runtime for MapReduce including sort, shuffle, merge etc. Hence, it does look like we will end up with a compatible, but alternate, implementation of the MapReduce runtime. In that sense it is similar to other alternate runtimes for MapReduce such as Apache Tez.

Furthermore, this is implemented in C++ - which is, frankly, a concern given the poor job C++ has done with ABI. I'm glad to see that it doesn't rely on boost - the worst offender. This is the same reason the native Hadoop client (HADOOP-10388) is being done purely in C. Also, the MR development community is predominantly Java, which is something to keep in mind. This is a big concern for me.

In all, it seems to me we could consider having this not in Apache Hadoop, but as an incubator project to develop a native, MR-compatible runtime. This will allow it to develop a like-minded community (C++ skills etc.) and not be bogged down by *all* of Hadoop's requirements such as security (how/when will this allow for secure shuffle or encrypted shuffle etc.), compatibility with several OSes (flavours of Linux, MacOSX, Windows) etc. It will also allow them to ship independently and get user feedback more quickly.

Similarly, I am wary of importing a nearly 75K LOC codebase into a stable project, and of its impact on our releases through breakage - particularly given the difference in skills of the community, i.e. Java v/s C++ etc.

What do you think, Todd, Sean? I'm more than happy to help with the incubator process if required.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041757#comment-14041757 ]

Arun C Murthy commented on MAPREDUCE-2841:

On a related thought to Pig/Hive etc. - I see Hadoop MapReduce fading away fast, particularly since projects using MR such as Pig, Hive, Cascading etc. are re-vectoring onto other projects like Apache Tez or Apache Spark.

For example:
# Hive-on-Tez (https://issues.apache.org/jira/browse/HIVE-4660) - The hive community has already moved its major investments away from MR to Tez.
# Pig-on-Tez (https://issues.apache.org/jira/browse/PIG-3446) - The pig community is very close to shipping this in pig-0.14 and again is investing heavily in Tez.

Given that, Sean/Todd, would it be useful to discuss contributing this to Tez instead?

This way the work here would continue to stay relevant in the context of the majority of MapReduce users, who use Pig, Hive, Cascading etc.

Of course, I'm sure another option is Apache Spark, but given that Tez is much closer (code-base wise) to MR, it would be much easier to contribute to Tez. Happy to help if that makes sense too. Thanks.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041775#comment-14041775 ]

Gopal V commented on MAPREDUCE-2841:

From my previous experience with MAPREDUCE-4755, I found adding a new Sort buffer impl to Tez to be far simpler.

The 4755 patch lives on as the multi-core PipelinedSorter in Tez, in case you need a reference hook for alt-sorter impls there.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041781#comment-14041781 ]

Todd Lipcon commented on MAPREDUCE-2841:

Hey Arun.

I agree that building a completely parallel C++ MR runtime is a much larger project that should not be part of Hadoop. However, per my above comment, the current goal for contribution is _only the MapOutputCollector_. That is to say, there is no ABI concern, because the interface exposed to application developers is really quite small (unlike something like pipes or a full task SDK).

I realize it's confusing because the github repo does mention these larger goals of the project, but I agree with you that they are not worth the cost-benefit within MR itself.

bq. I also noticed that the github has a bunch of code related to Pig, Hive etc. - I think we'd all agree that they need to be in respective projects eventually.

Fully agree. I raised the same concern offline to Sean and he is working on making the platform support use a ServiceLoader instead of explicitly registering all of the other frameworks inside the core code. (His comment "Extract support for Hive/Pig/HBase/Mahout platforms to standalone jars, and decouple the dependency with native task source code" is referring to this.)

bq. Similarly, I am wary of importing a nearly 75K LOC codebase into a stable project

I think the 75k you're counting may include the auto-generated shell scripts. By my count, the non-test Java code is 3k lines, some of which is boilerplate around the different platform implementations (which would move into those projects). If you exclude the thirdparty dependencies which are bundled into the repo, the non-test C++ source is 10kloc. So, it's not a tiny import by any means, but for a 2x improvement on terasort wallclock, my opinion is that the maintenance burden is worth it.

As for importing to Tez, I don't think the community has generally agreed to EOL MapReduce :) In other words, if you feel that Tez is the future, it makes sense for you to not work on any further MR optimizations, but that shouldn't preclude others from doing so. There are lots of production apps on MR today that do not want to switch to a new framework (regardless of compatibility layers), so I think this would be valuable for the community.
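For readers who have not used it, the ServiceLoader approach Todd mentions can be sketched roughly as follows. This is only an illustration in plain JDK Java: `PlatformSupport`, `PlatformRegistry`, and their methods are hypothetical names made up for this sketch, not the interfaces used by nativetask. The idea is that each framework (Hive, Pig, and so on) would ship its own jar containing an implementation plus a `META-INF/services` entry, so the core code never names those frameworks explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical extension point; NOT the actual nativetask interface.
interface PlatformSupport {
    String name();
    boolean supportsKeyClass(String keyClassName);
}

final class PlatformRegistry {
    private PlatformRegistry() {}

    // Collects every PlatformSupport implementation that a jar on the classpath
    // advertises in a META-INF/services file named after the interface.
    // With no such jar present, the returned list is simply empty.
    static List<PlatformSupport> discover() {
        List<PlatformSupport> found = new ArrayList<>();
        for (PlatformSupport p : ServiceLoader.load(PlatformSupport.class)) {
            found.add(p);
        }
        return found;
    }

    // Returns true if any discovered platform claims the given key class.
    static boolean isKeySupported(List<PlatformSupport> platforms, String keyClassName) {
        for (PlatformSupport p : platforms) {
            if (p.supportsKeyClass(keyClassName)) {
                return true;
            }
        }
        return false;
    }
}
```

The design point is that the core only depends on the small interface; adding or removing a framework becomes a matter of which jars are on the classpath.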
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041816#comment-14041816 ]

Arun C Murthy commented on MAPREDUCE-2841:

Todd,

bq. I agree that building a completely parallel C++ MR runtime is a much larger project that should not be part of Hadoop.

I'm confused. There already exists a large amount of code on the github for the full task runtime. Is that abandoned? Are you saying there is no intention to contribute that to Hadoop, ever? Why would that be? Would that be a separate project?

With or without ABI, C++ still is a major problem w.r.t. different compiler versions, the different platforms we support etc. That is precisely why HADOOP-10388 chose to use pure C only. A similar switch would make me *much* more comfortable, aside from the disparity in skills in the Hadoop community.

Furthermore, there are considerably more security issues which open up in C++ land, such as buffer overflows etc.

bq. I think the 75k you're counting may include the auto-generated shell scripts.

From the github:
{noformat}
$ find . -name *.java | xargs wc -l
   11988 total
$ find . -name *.h | xargs wc -l
   27269 total
$ find . -name *.cc | xargs wc -l
   26276 total
{noformat}

Whether it's test or non-test, we are still importing a *lot* of code - code that the Hadoop community would need to maintain.

bq. So, it's not a tiny import by any means, but for 2x improvement on terasort wallclock, my opinion is that the maintenance burden is worth it.

Todd, as we both know, there are many, many ways to get a 2x improvement on terasort... nor is it worth a lot in the real world outside of benchmarks. I'm sure we both would take 2x on Pig/Hive any day... *smile*

bq. As for importing to Tez, I don't think the community has generally agreed to EOL MapReduce

Regardless of whether or not we pull this into MR, it would be useful to pull it into Tez too - if Sean wants to do it. Let's not discourage them. I'm sure we both agree, and want to see real-world workloads improve, and that Hive/Pig/Cascading etc. represent that.

IAC, hopefully we can stop this meme that I'm trying to *preclude* you from doing anything regardless of my religious beliefs. IAC, we both realize MR is reasonably stable and won't get a lot of investment, and so do our employers:
http://vision.cloudera.com/mapreduce-spark/
http://hortonworks.com/hadoop/tez/

So, I'd appreciate it if we don't misinterpret each other's technical opinions and concerns during this discussion. Thanks.

FTR: I'll restate my concerns about C++, the roadmap for the C++ runtime, maintainability, and support for all of Hadoop (security, platforms etc.). Furthermore, this jira was opened nearly 3 years ago and only has sporadic bursts of activity - not a good sign for long-term maintainability. I've stated my concerns, let's try to get through them by focusing on those aspects.

Finally, what is the concern you see with starting this as an incubator project and allowing folks to develop a community around it? We can certainly help on our end by making it easy for them to plug in via interfaces etc.

Thanks.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040691#comment-14040691 ]

Sean Zhong commented on MAPREDUCE-2841:

The latest native task code is posted at https://github.com/intel-hadoop/nativetask/tree/native_output_collector for easy review. Currently the code is patched against Hadoop 2.2.

Some feature highlights:
1. Full performance test coverage: https://github.com/intel-hadoop/nativetask/tree/native_output_collector#what-is-the-benefit
2. Supports all value types which extend Writable.
3. Supports all key types in hadoop.io, and most key types in the hive, pig, mahout, and hbase projects. For a list of supported key types, please check https://github.com/intel-hadoop/nativetask/wiki#supported-key-types
4. Fully supports the java combiner.
5. Supports large keys and values.
6. A full test suite for key/value combinations.
7. Supports GZIP, LZ4, and Snappy.

Items we are still working on:
1. Extract support for Hive/Pig/HBase/Mahout platforms to standalone jars, and decouple the dependency with native task source code.
2. More documents describing the api.

For design, test, and doc, please check
https://github.com/intel-hadoop/nativetask/tree/native_output_collector
https://github.com/intel-hadoop/nativetask/wiki
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041027#comment-14041027 ]

Todd Lipcon commented on MAPREDUCE-2841:

Hey Sean. Thanks for posting the updated code. I've been following the progress via Github for a couple of months and also had some in-person discussion with the authors a while back to learn about this project.

It seems like this would be a good addition to Hadoop -- it offers some substantial performance improvements for many jobs, both in latency and total cluster throughput. While it would be possible to distribute this as a separate downloadable (e.g. on github or a separate incubator project), it seems like it would be better for the project to be part of the core.

If the MR developer community generally agrees this belongs in the core, I'd like to start a feature branch for it in order to import the current code, sort out the build/integration issues, and take care of the remaining items that Sean mentioned above. I'll volunteer to be a committer shepherd for the branch and help ensure that all the code is properly reviewed and up to our usual contribution standards around licensing, testing, etc. I think a feature branch is better than trying to sort out the remaining tasks over in github.

What do other folks think?
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041031#comment-14041031 ]

Todd Lipcon commented on MAPREDUCE-2841:

Also, just to clarify -- the current goal is to contribute the transparent map output collector improvement. The wiki pages also mention a full task framework which includes an ability for users to author tasks fully in C++. That's an interesting extension to pursue in the future, but as I understand it, the current state of the github repository is just the transparent output collector improvements, which require no changes in user code so long as they are using standard writables, and only some pretty simple coding (akin to writing a RawComparator) if they are using custom types.
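As a rough idea of how little code "akin to writing a RawComparator" implies: a raw comparator orders keys by looking at their serialized bytes directly, without deserializing them. The sketch below is a standalone plain-Java class whose method has the same parameter shape as Hadoop's `RawComparator.compare`; the class name `RawBytesComparator` is made up for illustration, and this is not the hook that nativetask itself exposes for custom types.

```java
// Illustrative only: a byte-wise comparator over serialized keys.
final class RawBytesComparator {
    private RawBytesComparator() {}

    // Compares the l1 bytes of b1 starting at s1 with the l2 bytes of b2
    // starting at s2, lexicographically, treating each byte as unsigned.
    // A key that is a strict prefix of another sorts first.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int common = Math.min(l1, l2);
        for (int i = 0; i < common; i++) {
            int a = b1[s1 + i] & 0xff; // mask to get the unsigned value
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }
}
```

The unsigned masking matters: without it, a byte such as 0x80 would compare as negative and sort before 0x7f.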
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041043#comment-14041043 ]

Todd Lipcon commented on MAPREDUCE-2841:

BTW, one quick question: I noticed in your README that you support CRC32C checksums on IFile. However, the Java code currently hard-codes CRC32. Do we need a JIRA to make this configurable (or just switch Java over to CRC32C since it's faster in Java too)? Seems like we could start that work on trunk in parallel with importing the native code.
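To make the "configurable checksum" question concrete, here is a minimal sketch of selecting CRC32 or CRC32C by name. It deliberately uses only JDK classes (`java.util.zip.CRC32C` needs Java 9 or later) rather than Hadoop's own checksum code, and the class name `ChecksumChoice` and the string-based selector are assumptions made for this example, not an existing Hadoop configuration mechanism.

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

// Illustrative only: pick a checksum implementation by name.
final class ChecksumChoice {
    private ChecksumChoice() {}

    // Returns a fresh Checksum for the requested algorithm name.
    static Checksum create(String algorithm) {
        switch (algorithm) {
            case "CRC32":
                return new CRC32();
            case "CRC32C":
                return new CRC32C();
            default:
                throw new IllegalArgumentException("unknown checksum: " + algorithm);
        }
    }

    // Computes the checksum of the whole buffer with the chosen algorithm.
    static long checksum(String algorithm, byte[] data) {
        Checksum c = create(algorithm);
        c.update(data, 0, data.length);
        return c.getValue();
    }
}
```

The two algorithms use different polynomials, so their values differ for the same input; a writer and reader of the same file therefore have to agree on which one is in use.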
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001286#comment-14001286 ]

Sean Zhong commented on MAPREDUCE-2841:

Updates on this: https://github.com/intel-hadoop/nativetask
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961266#comment-13961266 ]

Hadoop QA commented on MAPREDUCE-2841:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12638882/fb-shuffle.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
-1 javac. The applied patch generated 1491 javac compiler warnings (more than the trunk's current 1483 warnings).
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.
-1 release audit. The applied patch generated 1 release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:
    org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4488//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4488//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4488//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4488//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4488//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198900#comment-13198900 ]

Hadoop QA commented on MAPREDUCE-2841:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12492208/MAPREDUCE-2841.v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1749//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199485#comment-13199485 ]

He Yongqiang commented on MAPREDUCE-2841:

Really cool stuff! In terms of CPU performance, can you also do some comparison with the new Java output collector that Todd mentioned (https://github.com/facebook/hadoop-20/blob/master/src/mapred/org/apache/hadoop/mapred/BlockMapOutputBuffer.java)?

In Facebook's internal tests, we have seen a big improvement (8x-10x) in CPU spent in sort (e.g., example 1: 240M-30M, example 2: 9M-1.8M), and 2x in total mapper CPU (example 1: 707M-440M, example 2: 40M-19M). These are CPU numbers, not latency numbers. BlockMapOutputBuffer.java uses only one thread, while the original collector uses 2 threads, but the latency is still improved by a lot (like 30%).

Some analysis of the performance differences would really help in understanding the bottlenecks and the difference that language brings.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199498#comment-13199498 ]

Binglin Chang commented on MAPREDUCE-2841:

@Yongqiang It seems BlockMapOutputBuffer only supports BytesWritable currently? So Wordcount and Terasort can't run directly.

From the test results (Map Avg time and sort time), I would say sort takes about 40-50% of the time of the whole map task; because sort is CPU intensive, its share of CPU time should be more, about 50%-60% maybe. If my compiler assumption is right, total speedup for Wordcount mapper should be 10x, sort speedup should be 10x-12x, and the rest (reader, mapper, merge, spill combined) should be (10-0.5*12)/(1-0.5)=8.

I must say using quicksort to sort small buffers that fit into cache and then merging them is a good idea; I should make this optimization too.

NativeTask currently uses a single thread, but I think all partition-based collectors (including BlockMapOutputCollector) can take advantage of the parallel sort and spill I mentioned in the design doc. This needs code changes to other parts (TaskTracker, IndexCache maybe), and changing the map output file to a directory.
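The speedup arithmetic in the comment above can be checked with a small sketch. It assumes that phase times simply add (an Amdahl-style model), which is an assumption of this example and not something stated in the thread; the class and method names are hypothetical. Under that model the implied speedup of the non-sort phases comes out slightly different from the linear estimate used in the comment, so both are printed side by side.

```java
import java.util.Locale;

public class SpeedupEstimate {
    /**
     * Amdahl-style estimate: if sort takes sortFraction of the original time and is
     * sped up by sortSpeedup, and the whole task is sped up by totalSpeedup, return
     * the speedup the remaining phases must have had.
     */
    public static double restSpeedup(double sortFraction, double totalSpeedup, double sortSpeedup) {
        // Time left for the non-sort phases after optimization (original total = 1.0).
        double restTimeAfter = 1.0 / totalSpeedup - sortFraction / sortSpeedup;
        if (restTimeAfter <= 0) {
            throw new IllegalArgumentException("inputs leave no time for the non-sort phases");
        }
        return (1.0 - sortFraction) / restTimeAfter;
    }

    /** The linear weighting written in the comment: (total - f * sort) / (1 - f). */
    public static double linearEstimate(double sortFraction, double totalSpeedup, double sortSpeedup) {
        return (totalSpeedup - sortFraction * sortSpeedup) / (1.0 - sortFraction);
    }

    public static void main(String[] args) {
        // Figures quoted in the comment: sort is ~50% of CPU time, total 10x, sort 12x.
        System.out.println(String.format(Locale.ROOT, "linear estimate: %.2f",
                linearEstimate(0.5, 10, 12)));
        System.out.println(String.format(Locale.ROOT, "Amdahl-style estimate: %.2f",
                restSpeedup(0.5, 10, 12)));
    }
}
```

With the quoted inputs the linear form gives 8 and the additive-time model gives roughly 8.6, so the two agree on the order of magnitude.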
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176107#comment-13176107 ]

Dong Yang commented on MAPREDUCE-2841:

Beautiful work beyond HCE! Contrib to Binglin~
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169992#comment-13169992 ]

Todd Lipcon commented on MAPREDUCE-2841:

I chatted recently with the MR guys over at Facebook. They have another implementation here which they've been working on that gives similar gains, while staying all in Java. The approach is something like the following:
- map output collector collects into small buffers, each sized something close to L3 cache
- when any buffer is full, sort it but don't spill it
- when enough buffers are collected to fill io.sort.mb, merge them from memory to disk

This fixes the cache locality issues that everyone has identified, but doesn't require native code. It's up on their github here: https://github.com/facebook/hadoop-20/blob/master/src/mapred/org/apache/hadoop/mapred/BlockMapOutputBuffer.java

Maybe Yongqiang can comment more on this approach? Dmytro has given permission offline for us to work from the code on their github and contribute it on trunk (they may not have time to contribute it in the near term).

I think a short-term goal we could attack in this area would be to make the map output collector implementation pluggable. Then people can experiment more freely with different collector implementations. I don't have time for it - just throwing it out there as a thought.
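The three steps described in the comment above can be sketched in a few lines of Java. This is only an illustration of the idea, not the code in BlockMapOutputBuffer.java: the class name, the use of String keys, the record-count limits standing in for "close to L3 cache" and io.sort.mb, and the in-memory list standing in for a spill file are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class BlockCollectorSketch {
    private final int blockSize;   // records per block (stand-in for a cache-sized buffer)
    private final int maxBlocks;   // blocks held in memory (stand-in for io.sort.mb)
    private final List<String[]> sortedBlocks = new ArrayList<>();
    private final List<List<String>> spills = new ArrayList<>();  // stand-in for spill files
    private final String[] current;
    private int used;

    public BlockCollectorSketch(int blockSize, int maxBlocks) {
        this.blockSize = blockSize;
        this.maxBlocks = maxBlocks;
        this.current = new String[blockSize];
    }

    /** Step 1: collect into a small buffer. */
    public void collect(String key) {
        current[used++] = key;
        if (used == blockSize) {
            sealBlock();
        }
    }

    /** Step 2: when a buffer is full, sort it but keep it in memory. */
    private void sealBlock() {
        if (used == 0) {
            return;
        }
        String[] block = Arrays.copyOf(current, used);
        Arrays.sort(block);
        sortedBlocks.add(block);
        used = 0;
        if (sortedBlocks.size() == maxBlocks) {
            mergeToSpill();
        }
    }

    /** Step 3: k-way merge of the sorted blocks into one sorted spill. */
    private void mergeToSpill() {
        if (sortedBlocks.isEmpty()) {
            return;
        }
        // Heap entries are {blockIndex, offset}, ordered by the key they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> sortedBlocks.get(a[0])[a[1]].compareTo(sortedBlocks.get(b[0])[b[1]]));
        for (int i = 0; i < sortedBlocks.size(); i++) {
            heap.add(new int[] {i, 0});
        }
        List<String> spill = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            String[] block = sortedBlocks.get(top[0]);
            spill.add(block[top[1]]);
            if (top[1] + 1 < block.length) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        spills.add(spill);
        sortedBlocks.clear();
    }

    /** Flush the partial block and any remaining sorted blocks. */
    public List<List<String>> close() {
        sealBlock();
        mergeToSpill();
        return spills;
    }

    public static void main(String[] args) {
        BlockCollectorSketch collector = new BlockCollectorSketch(2, 2);
        for (String key : new String[] {"d", "b", "a", "c", "f", "e"}) {
            collector.collect(key);
        }
        System.out.println(collector.close());  // prints [[a, b, c, d], [e, f]]
    }
}
```

Each block is sorted while it is still small enough to stay cache-resident, and the merge step touches every block sequentially, which is where the cache-locality benefit mentioned in the comment would come from.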
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093507#comment-13093507 ]

Binglin Chang commented on MAPREDUCE-2841:

Hi, Yongqiang

bq. I should make myself more clear: we are trying to evaluate and compare the c++ impl in HCE (and also this jira) and doing a pure java re-impl.

I think a pure Java re-impl is possible. There are some tricks to solve memory fragmentation issues, maybe not thoroughly, for example letting many adjacent buckets share one MemoryBlock if the partition number is too large, which is what I will do in the native implementation. And again, it's hard to support stream-like key/value serialization semantics, so the Java re-impl has the same limitations as the native impl. But the low-level unaligned memcmp and memcpy are hard to implement in Java.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093534#comment-13093534 ]

Binglin Chang commented on MAPREDUCE-2841:

Update some test results.

1. Terasort 10G input, 40 map, 40 reduce on a 9-node cluster, 7 map/7 reduce slots per node, io.sort.mb 500MB.
Results on jobhistory:

|| || Total || AverageMap || AverageShuffle || AverageReduce ||
|| java | 54s | 14s | 14s | 10s |
|| native | 39s | 7s | 15s | 9s |
|| java-snappy | 36s | 15s | 9s | 8s |
|| native-snappy | 27s | 7s | 7s | 8s |

speedup-without-compression: 1.38
speedup-with-compression: 1.33

2. I did another test on a bigger data set: Terasort 100G, 400 map, 400 reduce on a 9-node cluster, 7 map/7 reduce slots per node.

|| || Total || AverageMap || AverageShuffle || AverageReduce ||
|| java-snappy | 277s | 17s | 28s | 10s |
|| native-snappy | 234s | 10s | 22s | 10s |

speedup: 1.18

When the cluster is under heavy workload, the bottleneck shows up in page cache and shuffle, so optimizations in sort and spill do not play a big role.

3. I tested the dual pivot quicksort patch provided by Chris, using the same test as test No.1. There are no observable differences compared to the old QuickSort; average map task time for java-snappy is the same as before (15s). Perhaps the data set is too small, or the bottleneck is dominated by other factors, like memory random access.
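The speedup figures reported above follow directly from the Total column. A minimal sketch that recomputes them from the jobhistory timings (the class and method names are hypothetical; only the timings come from the comment):

```java
import java.util.Locale;

public class TerasortSpeedups {
    /** Speedup is the baseline time divided by the improved time. */
    public static double speedup(double baselineSeconds, double improvedSeconds) {
        return baselineSeconds / improvedSeconds;
    }

    public static void main(String[] args) {
        // Total job time, 10G Terasort: java vs native.
        System.out.println(String.format(Locale.ROOT,
                "10G without compression: %.2f", speedup(54, 39)));   // 1.38
        // Total job time, 10G Terasort: java-snappy vs native-snappy.
        System.out.println(String.format(Locale.ROOT,
                "10G with compression:    %.2f", speedup(36, 27)));   // 1.33
        // Total job time, 100G Terasort: java-snappy vs native-snappy.
        System.out.println(String.format(Locale.ROOT,
                "100G with compression:   %.2f", speedup(277, 234))); // 1.18
        // Average map task time, 10G without compression.
        System.out.println(String.format(Locale.ROOT,
                "10G average map task:    %.2f", speedup(14, 7)));    // 2.00
    }
}
```

The last line shows that the map phase alone improves by about 2x in the 10G run, while the whole job improves less because shuffle and reduce times are essentially unchanged.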
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093777#comment-13093777 ]

Todd Lipcon commented on MAPREDUCE-2841:

One interesting C++ vs Java factor here is that this hot code is running in short-lived map tasks, the majority of the time. So there is less time for the JIT to kick in and actually compile the sorting code. In the past I've added -Xprof to mapred.child.java.opts, or looked at oprofile-jit output, and seen a fair amount of time spent in interpreted code.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094154#comment-13094154 ]

Chris Douglas commented on MAPREDUCE-2841:

bq. we are trying to evaluate and compare the c++ impl in HCE (and also this jira) and doing a pure java re-impl. So the thing that we mostly cared about is whether there is something that the c++ impl can do and a java re-impl can not. And if there is, we need to find out how much that difference is. And from there we can have a better understanding of each approach and decide which approach to go with.

Sorry, that's what I was trying to answer. A system matching your description existed in 0.16 and tests of the current collector show it to be faster for non-degenerate cases and far more predictable. The bucketed model inherently has some internal fragmentation which can only be eliminated by using expensive buffer copies and compactions or by using per-record byte arrays, where the 8 byte object overhead exceeds the cost of tracking the partition, requiring only 4 bytes. Eliminating that overhead is impractical, but even mitigating it (e.g. allowing partitions to share slabs) requires that one implement an allocation and memory management system across Java byte arrays or ByteBuffers, themselves allocated by the JVM.

I would expect that system to be easier to write and maintain than even the current impl, but not trivial if it supports all of the existing use cases and semantics. Unlike the C++ impl (and like the current one), abstractions will likely be sacrificed to avoid the overheads.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094214#comment-13094214 ]

Scott Carey commented on MAPREDUCE-2841:

{quote} There are no observable differences compared to old QuickSort {quote}

Dual-pivot quicksort causes the exact same number of comparisons as ordinary quicksort. However, it should have fewer swaps (0.80 times as many). If the cost of comparison is high (larger records, object comparison) the effect will be minimal. If the cost of comparison is low (values, very simple objects) the performance difference can be larger, up to about 2.5x as fast for sorting an array of ints.

http://mail.openjdk.java.net/pipermail/core-libs-dev/2010-August/004687.html (initial message, many more if you search).
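For readers who have not seen the algorithm, here is a minimal sketch of dual-pivot partitioning. It is an illustration only, not the code in dualpivot-0.patch or in the JDK; the function name `dual_pivot_quicksort` is hypothetical, and the sketch omits the insertion-sort cutoff and pivot sampling that a production implementation would normally add.

```python
def dual_pivot_quicksort(a, lo=0, hi=None):
    """Sort list `a` in place using two pivots per partitioning step."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Use the two end elements as pivots and make sure p <= q.
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]

    # Invariant while scanning:
    #   a[lo+1 .. lt-1] <  p
    #   a[lt   .. i-1 ] in [p, q]
    #   a[gt+1 .. hi-1] >  q
    lt, gt, i = lo + 1, hi - 1, lo + 1
    while i <= gt:
        if a[i] < p:
            a[i], a[lt] = a[lt], a[i]
            lt += 1
            i += 1
        elif a[i] > q:
            # The element swapped in from the right is unexamined,
            # so i is not advanced here.
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1

    # Move both pivots into their final positions.
    lt -= 1
    gt += 1
    a[lo], a[lt] = a[lt], a[lo]
    a[hi], a[gt] = a[gt], a[hi]

    # Recurse on the three ranges around the pivots.
    dual_pivot_quicksort(a, lo, lt - 1)
    dual_pivot_quicksort(a, lt + 1, gt - 1)
    dual_pivot_quicksort(a, gt + 1, hi)
```

Each pass splits the range into three parts instead of two, which is where the reduction in swaps discussed above is expected to come from. The sketch recurses without a depth limit, so it is meant for small inputs.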
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092972#comment-13092972 ]

Binglin Chang commented on MAPREDUCE-2841:

bq. It might make sense to commit that subset as optional functionality first, then iterate based on feedback.

I agree. How should this be contributed to Hadoop? Add a new subdirectory in contrib like streaming, merge it into native, or keep it in the current c++/libnativetask? It contains both C++ and Java code, and will likely add client tools like streaming and a dev SDK.

A random memory config gives the Resource Scheduler more information, so it may yield better scheduling algorithms. As for OOM, there is already a flexible layer for memory control: the page cache. In typical slave node memory configurations and real cases, the page cache takes (or should take) a considerable proportion of total memory (20%-50%). So, for example, tasks can be configured to use 60% of memory but be allowed some variance within a 20% range, and the variance becomes relatively small when multiple tasks are combined at the node level or whole-job level.

One of my colleagues is working on a shuffle service, which delegates all reduce shuffle work to a per-node service. It is similar in one respect: for a single task, the variance of the memory footprint is a problem, but it becomes much more stable across the many tasks running on a node.
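The claim that per-task variance matters less at the node level can be made concrete under one simplifying assumption that is not stated in the thread: the memory footprints $X_1, \dots, X_n$ of the $n$ tasks on a node are independent, with a common mean $\mu$ and standard deviation $\sigma$. The symbols are introduced here only for illustration.

```latex
\begin{aligned}
M &= \sum_{i=1}^{n} X_i, \qquad
\mathbb{E}[M] = n\mu, \qquad
\operatorname{Var}(M) = n\sigma^{2}, \\
\frac{\sqrt{\operatorname{Var}(M)}}{\mathbb{E}[M]}
  &= \frac{\sigma\sqrt{n}}{n\mu}
   = \frac{1}{\sqrt{n}} \cdot \frac{\sigma}{\mu}.
\end{aligned}
```

For example, with a per-task relative spread of $\sigma/\mu = 20\%$ and $n = 16$ tasks on a node, the relative spread of the node total is $20\% / \sqrt{16} = 5\%$. If the tasks are positively correlated (for instance, many tasks of the same job hitting the same skewed keys), the reduction is smaller than this.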
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093007#comment-13093007 ]

He Yongqiang commented on MAPREDUCE-2841:

We are also evaluating the approach of optimizing the existing Hadoop Java map-side sort algorithms (playing the same set of tricks used in this C++ impl: bucket sort, prefix key comparison, a better crc32, etc.). The main question we are interested in is how big the memory problem is for the Java impl. It would also be very useful to define an open benchmark here.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093034#comment-13093034 ]

Allen Wittenauer commented on MAPREDUCE-2841:

Sure are a lot of header files with full-blown functions in them.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093121#comment-13093121 ]

Chris Douglas commented on MAPREDUCE-2841:

{quote}I agree. How should this be contributed to Hadoop? Add a new subdirectory in contrib like streaming, merge it into native, or keep it in the current c++/libnativetask? It contains both C++ and Java code, and will likely add client tools like streaming and a dev SDK.{quote}

To pair the java/c++ code, a contrib module could make sense. Client tools and dev libraries are distant goals, though. Contributing it to the 0.20 branch is admissible, but suboptimal. Most of the releases generated for that series are sustaining releases. While it's possible to propose a new release branch with these improvements, releasing it would be difficult. Targeting trunk would be the best approach, if you can port your code.

{quote}We are also evaluating the approach of optimizing the existing Hadoop Java map-side sort algorithms (playing the same set of tricks used in this C++ impl: bucket sort, prefix key comparison, a better crc32, etc.). The main question we are interested in is how big the memory problem is for the Java impl.{quote}

Memory _is_ the problem. The bucketed sort used from 0.10(?) to 0.16 had more internal fragmentation and a less predictable memory footprint (particularly for jobs with lots of reducers). Subsequent implementations focused on reducing the number of spills for each task, because the cost of spilling dominated the cost of the sort. Even with a significant speedup in the sort step, avoiding a merge by managing memory more carefully usually yields faster task times. Merging from fewer files also decreases the chance of failure and reduces seeks across all drives (by spreading output over fewer disks). A precise memory footprint also helped application authors calculate the framework overhead (both memory and number of spills) from the map output size without considering the number of reducers.

That said, jobs matching particular profiles admit far more aggressive optimization, particularly if some of the use cases are ignored. Records larger than the sort buffer, user-defined comparators (particularly on deserialized objects), the combiner, and the intermediate data format restrict the solution space and complicate implementations. There's certainly fat to be trimmed from the general implementation, but restricting the problem will admit far more streamlined solutions than identifying and branching on all the special cases.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093174#comment-13093174 ]

He Yongqiang commented on MAPREDUCE-2841:

bq. The bucketed sort used from 0.10 to 0.16 had more internal fragmentation and a less predictable memory footprint (particularly for jobs with lots of reducers).

If the Java impl uses a similar design to the C++ one here, the only difference will be the language, right?

Sorry, can you explain more about how the C++ impl can do a better job here on predictable memory footprint? In the current Java impl, all records (no matter which reducer they are going to) are stored in a central byte array. In the C++ impl, on one mapper task, each reducer has one corresponding partition bucket which maintains its own memory buffer. From what I understand, one partition bucket is for one reducer, and all records going to that reducer from the current MapTask are stored there, and are sorted and spilled from there. The benefit for the sort is that it saves comparisons, since the original sort needs to compare records from different reducers. The C++ impl also has the trick of doing prefix comparison, which reduces the number of CPU ops (8-byte compare -> one long cmp op).

bq. Subsequent implementations focused on reducing the number of spills for each task, because the cost of spilling dominated the cost of the sort. Even with a significant speedup in the sort step, avoiding a merge by managing memory more carefully usually yields faster task times.

I totally agree that spilling will be the dominant factor when it happens. So the question is how much more memory the Java impl needs compared to the C++ one: 20%, 50%, or 100%? With that we can calculate the chance of avoiding spills by using the C++ impl. (Note: based on our analysis of jobs run during the past month, most jobs need to shuffle less than 700MB of data per mapper.)
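The prefix-comparison trick mentioned above can be sketched as follows. This is not the code from the patch: `key_prefix` and `compare_keys` are hypothetical names, and the sketch assumes keys are ordered as raw unsigned bytes (the "binary string compare" case from the issue description). The first 8 bytes of each key are packed into one big-endian integer, so most comparisons are decided by a single integer compare, and only ties fall back to a full bytewise comparison.

```python
from functools import cmp_to_key


def key_prefix(key: bytes) -> int:
    """First 8 bytes of `key` as a big-endian unsigned integer.

    Shorter keys are zero-padded on the right, which preserves
    lexicographic byte order whenever two prefixes differ.
    """
    return int.from_bytes(key[:8].ljust(8, b"\x00"), "big")


def compare_keys(a: bytes, b: bytes) -> int:
    """Three-way comparison: negative, zero, or positive."""
    pa, pb = key_prefix(a), key_prefix(b)
    if pa != pb:
        # Common case: decided by one integer comparison.
        return -1 if pa < pb else 1
    # Prefixes tie (equal first 8 bytes, or padding made them equal):
    # fall back to the full bytewise comparison.
    return (a > b) - (a < b)


def sort_keys(keys):
    """Sort keys using the prefix-first comparator."""
    return sorted(keys, key=cmp_to_key(compare_keys))
```

In a real collector the prefix would be computed once per record and stored next to the record offset, so the sort does not have to touch the key bytes for most comparisons. The sketch recomputes it on every call to stay short.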
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093291#comment-13093291 ]

Chris Douglas commented on MAPREDUCE-2841:

bq. If the Java impl uses a similar design to the C++ one here, the only difference will be the language, right?

Yes, but the language difference includes other overheads (more below).

bq. Sorry, can you explain more about how the C++ impl can do a better job here on predictable memory footprint? In the current Java impl, all records (no matter which reducer they are going to) are stored in a central byte array. In the C++ impl, on one mapper task, each reducer has one corresponding partition bucket which maintains its own memory buffer. From what I understand, one partition bucket is for one reducer, and all records going to that reducer from the current MapTask are stored there, and are sorted and spilled from there.

Each partition bucket maintains its own memory buffer, so the memory consumed by the collection framework includes the unused space in all the partition buffers. I'm calling that, possibly imprecisely, internal fragmentation. The {{RawComparator}} interface also requires that keys be contiguous, introducing other waste if the partition's collection buffer were not copied whenever it is expanded (as in 0.16; the expansion/copying overhead also harms performance and makes memory usage hard to predict because both src and dst buffers exist simultaneously), i.e. a key partially serialized at the end of a slab must be realigned in a new slab. This happens at the end of the circular buffer in the current implementation, but would happen on the boundary of every partition collector chunk.

That internal fragmentation creates unused buffer space that prematurely triggers a spill to reclaim the memory. Allocating smaller slabs decreases internal fragmentation, but also adds an ~8 byte object tracking overhead and GC cycles. In contrast, large allocations (like the single collection buffer) are placed directly in the old (tenured) generation. The 4 byte overhead per record to track the partition is a space savings over slabs exactly matching each record size, requiring at least 8 bytes per record if naively implemented.

The current implementation is oriented toward stuffing the most records into a precisely fixed amount of memory, and adopts a few assumptions:
1) one should spill as little as possible
2) if spilling is required, at least don't block the mapper
3) packing the most records into each spill favors MapTasks with combiners.

If there are cases (we all acknowledge that there are) where spilling more often but _faster_ can compensate for that difference, then it's worth reexamining those assumptions.
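To make the fragmentation argument concrete, here is a toy accounting model. It rests on assumptions that come from neither the patch nor Hadoop: every partition fills fixed-size slabs, only the last slab of a partition has free space, and both function names are hypothetical. The only figure taken from the comment is the 4 bytes per record used to track the partition in the single-buffer design.

```python
def bucketed_unused_bytes(partition_bytes, slab_size):
    """Free space left in the last slab of each partition.

    `partition_bytes` lists the bytes collected per partition.
    A partition that has collected nothing holds no slab.
    """
    unused = 0
    for used in partition_bytes:
        slabs = -(-used // slab_size)  # ceiling division
        unused += slabs * slab_size - used
    return unused


def single_buffer_overhead(num_records, bytes_per_record=4):
    """Partition-tracking bytes when all records share one buffer."""
    return num_records * bytes_per_record
```

With many reducers and little data per reducer, the first number grows with the partition count, while the second grows only with the record count. That is one way to read the remark that bucketed memory use is hard to predict for jobs with lots of reducers.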
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093324#comment-13093324 ]

He Yongqiang commented on MAPREDUCE-2841:

Sorry, I am kind of confused. I should make myself clearer: we are trying to evaluate and compare the C++ impl in HCE (and also this jira) against a pure Java re-impl. What we mostly care about is whether there is something the C++ impl can do that a Java re-impl cannot. If there is, we need to find out how big that difference is. From there we can better understand each approach and decide which way to go.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092522#comment-13092522 ]

Binglin Chang commented on MAPREDUCE-2841:

bq. The goal of the last few implementations was a predictable memory footprint.

A predictable memory footprint is very important, I strongly agree. Observations of our cluster's memory status show that most OOM or high-memory-consumption cases are caused by big Key/Value in InputReader, Key/Value writable and merge. Sortspill is much more stable. I think in many cases predictable memory control is enough, rather than precise memory control, since precise control is impractical. We can use some dynamic memory if it is in a predictable range, for example +/-20%, +/-30%, etc.

Just an idea, what if memory related configurations can be a random variable, with mean and variance? Can this lead to better resource utilization? A fixed memory bound always means applications will request more memory than they really need.
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092524#comment-13092524 ] Binglin Chang commented on MAPREDUCE-2841: -- bq. Can you summarize how the memory management works in the current patch? KeyValue Buffer memory management in the current patch is very simple, it has three parts: MemoryPool Hold the buffer of size io.sort.mb, and track current buffer usage notice that this buffer will only occupy virtual memory not RSS(memory really used) if the memory is not actually accessed, this is better than java because java initialize arrays. Memory lazy allocation is a beautiful feature :) MemoryBlock Small chunk of memory block backed by MemoryPool, used by PartitionBucket the default size of MemoryBlock = ceil(io.sort.mb / partition / 4 / MIN_BLOCK_SIZE) / MIN_BLOCK_SIZE currently MIN_BLOCK_SIZE == 32K, it should be dynamically tuned according to partition number io.sort.mb The purpose of MemoryBlock is to reduce CPU cache miss. When sorting large indirect addressed KV pairs, I guess the sort time will be dominated by RAM random reads, so MemoryBlock is used to let each bucket get relatively continous memory. 
PartitionBucket Store KV pairs for a partition, it has two arrays: vectorMemoryBlock * blocks blocks used by this bucket vectoruint32_t offsets KV pair start offset in MemoryPool this vector is not under memory control(in io.sort.mb) yet, a bug needs to be fixed (use memory of MemoryPool, use MemoryBlock directly or move backward from buffer end) it uses less memory(1/3) than java kvindices, and use 1/2 of io.sort.mb memory at most (when all k/v are empty), so it won't be much problem currently Limitations of this approach: Large partition number leads to small MemoryBlock Large Key/Value can cause memory holes in small MemoryBlock It's difficult to determine block size, since it relates to K/V size(like the old io.sort.record.percent), 200MB memory can only hold 12800 16K MemoryBlocks, so if average K/V size is a little bigger than 8K, half of the memory will likely be wasted. This approach will not work well when partition number Key/Value size is large, but this is rare case, and it can be improved, just for example, we can use MemoryPool directly (disable MemoryBlock) if io.sort.mb/partiion number is too small The other thing related to this is this approach only support simple synchronized collect/spill, I think this will not harm performance very much. Asynchronized collect/spill needs tuning of io.sort.spill.percent, and we can make sortspill really fast so parallel collect spill is not so important as before, we can also let the original mapper thread to do sortspill by enabling parallel sortspill. Task level native optimization -- Key: MAPREDUCE-2841 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task Environment: x86-64 Linux Reporter: Binglin Chang Assignee: Binglin Chang Attachments: MAPREDUCE-2841.v1.patch, dualpivot-0.patch, dualpivotv20-0.patch I'm recently working on native optimization for MapTask based on JNI. 
The basic idea is to add a NativeMapOutputCollector to handle the k/v pairs emitted by the mapper, so that sort, spill, and IFile serialization can all be done in native code. A preliminary test (on Xeon E5410, jdk6u24) showed promising results:
1. Sort is about 3x-10x as fast as Java (only binary string compare is supported).
2. IFile serialization speed is about 3x that of Java, about 500MB/s. If hardware CRC32C is used, things can get much faster (1G/s).
3. Merge code is not completed yet, so the test uses a large enough io.sort.mb to prevent mid-spill.
This leads to a total speedup of 2x~3x for the whole MapTask if IdentityMapper (a mapper that does nothing) is used.
There are limitations, of course: currently only Text and BytesWritable are supported, and I have not thought through many things yet, such as how to support map-side combine. I had some discussion with somebody familiar with Hive, and it seems that these limitations won't be much of a problem for Hive to benefit from these optimizations, at least. Advice or discussion about improving compatibility is most welcome :)
Currently NativeMapOutputCollector has a static method called canEnable(), which checks whether the key/value type, comparator type, and combiner are all compatible. MapTask can then choose to enable NativeMapOutputCollector.
This is only a preliminary test; more work needs to be done. I expect better final results, and I believe similar optimizations can be adopted for the reduce task and shuffle too.
-- This message is automatically generated by JIRA. For more information on JIRA, see:
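The MemoryBlock sizing rule and the 16K-block example from the memory management comment above can be checked with a little arithmetic. The following is only a sketch under assumptions: it reads the rule as rounding io.sort.mb / partition / 4 up to a multiple of MIN_BLOCK_SIZE, it clamps the result to at least one minimum block, and the helper names defaultBlockSize and recordsPerBlock are made up here rather than taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Value quoted in the comment: MIN_BLOCK_SIZE == 32K.
const uint64_t MIN_BLOCK_SIZE = 32 * 1024;

// Hypothetical helper: round (poolBytes / partitions / 4) up to a
// multiple of MIN_BLOCK_SIZE. Integer division truncates the nested
// quotient first, so this only approximates ceil() on the real value.
uint64_t defaultBlockSize(uint64_t poolBytes, uint64_t partitions) {
  uint64_t perPartitionQuarter = poolBytes / partitions / 4;
  uint64_t units = (perPartitionQuarter + MIN_BLOCK_SIZE - 1) / MIN_BLOCK_SIZE;
  if (units == 0) units = 1;  // assumption: never below one minimum block
  return units * MIN_BLOCK_SIZE;
}

// Hypothetical helper: how many whole records of recordBytes fit in one
// block when a record is never split across blocks.
uint64_t recordsPerBlock(uint64_t blockBytes, uint64_t recordBytes) {
  return blockBytes / recordBytes;
}
```

With a 16K block and records just over 8K, recordsPerBlock returns 1, so nearly half of every block stays unused. That is the waste the comment describes for 200MB split into 12800 blocks of 16K.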
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092596#comment-13092596 ]

Chris Douglas commented on MAPREDUCE-2841:

{quote}Just an idea, what if memory related configurations can be a random variable, with mean & variance? Can this lead to better resource utilization? A fixed memory bound always means applications will request more memory than they really need. I think in many cases predictable memory control is enough, rather than precise memory control, since it's impractical. We can use some dynamic memory if it is in a predictable range, for example +/-20%, +/-30%, etc.{quote}

The fixed memory bound definitely causes resource waste. Not only will users ask for more memory than they need (particularly since most applications are not tightly tuned), but in our clusters, users will just as often request far too little. Because tasks' memory management is uniformly specified within a job, there isn't even an opportunity for the framework to adapt to skew.

The random memory config is an interesting idea, but failed tasks are regrettable and expensive waste. For pipelines with SLAs, random failures will probably motivate users to jack up their memory requirements to match the range (which, if configurable, seems to encode the same contract). The precise specification was avoiding OOMs; because the collection is across a JNI boundary, a relaxed predictable memory footprint could be easier to deploy, assuming a hard limit in the native code to avoid swapping.

Thanks for the detail on the collection data structures. That makes it much easier to orient oneself in the code.

A few quick notes on your [earlier|https://issues.apache.org/jira/browse/MAPREDUCE-2841?focusedCommentId=13086973&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13086973] comment:

Adding the partition to the record, again, was to make the memory more predictable. The overhead in Java tracking thousands of per-partition buckets (many going unused) was worse than the per-record overhead, particularly in large jobs. Further, user comparators are often horribly inefficient, so the partition comparison and related hit to its performance was in the noise. The cache miss is real, but hard to reason about without leaving the JVM.

The decorator-based stream is/was? required by the serialization interface. While the current patch only supports records with a known serialized length, the contract for other types is more general. Probably too general, but users with occasional several-hundred MB records (written in chunks) exist. Supporting that in this implementation is not a critical use case, since they can just use the existing collector. Tuning this to handle memcmp types could also put the burden of user comparators on the serialization frameworks, which is probably the best strategy. Which is to say: obsoleting the existing collection framework doesn't require that this support all of its use cases, if some of those can be worked around more competently elsewhere. If its principal focus is performance, it may make sense not to support inherently slow semantics.

Which brings up a point: what is the scope of this JIRA? A full, native task runtime is a formidable job. Even if it only supported memcmp key types, no map-side combiner, no user-defined comparators, and records smaller than its intermediate buffer, such an improvement would still cover a lot of user jobs. It might make sense to commit that subset as optional functionality first, then iterate based on feedback.
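The restricted subset named at the end of this comment (memcmp key types, no map-side combiner, no user-defined comparators, records smaller than the intermediate buffer) amounts to a simple eligibility test. A rough sketch follows. The JobTraits struct and the canUseNativeCollector function are hypothetical names used for illustration only; this is not the patch's canEnable() implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of the job properties that matter for the check.
struct JobTraits {
  bool keyIsMemcmpComparable;  // e.g. Text or BytesWritable keys
  bool hasCombiner;            // a map-side combiner is configured
  bool hasUserComparator;      // a user-defined sort comparator is configured
  std::size_t maxRecordBytes;  // largest serialized record expected
};

// Returns true when the job fits the restricted subset. Otherwise the
// caller would fall back to the existing Java collector.
bool canUseNativeCollector(const JobTraits& job, std::size_t bufferBytes) {
  return job.keyIsMemcmpComparable &&
         !job.hasCombiner &&
         !job.hasUserComparator &&
         job.maxRecordBytes < bufferBytes;
}
```

Keeping the test this narrow matches the suggestion to ship the subset as optional functionality first: any job that fails it simply keeps using the existing collector.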
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092303#comment-13092303 ]

Binglin Chang commented on MAPREDUCE-2841:

Hi, Scott

bq. there is quite a bit of juice to squeeze from the Java implementation.

I agree with that
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092322#comment-13092322 ]

Binglin Chang commented on MAPREDUCE-2841:

Has anyone already benchmarked Hadoop running on Java 7 vs. Java 6 and shared the results? I'm afraid I don't have enough resources to do a standard benchmark. I can do some simple tests, but they may not be convincing.

Hadoop uses its own QuickSort & HeapSort implementation and interface. If dual-pivot quicksort & Timsort are much faster, I think we should do some tests and add them to Hadoop (this does not require Java 7).

The current implementation is very naive and still has a long way to go in optimization. For example, the sort just uses std::sort.

The (very) long-term goal for this work is to provide an independent task-level native runtime and API. Users can use the native API to develop applications, but Java applications also get part of the performance benefit. It opens up further optimization possibilities, in both the framework and the application layer.
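To make "the sort just uses std::sort" concrete, here is a minimal sketch of sorting record offsets with std::sort and a binary string (memcmp) comparison, leaving the records themselves in place. The record layout, the KeyRef type, and the function names keyAt, keyLess, sortOffsets, and appendRecord are all assumptions for illustration; none of them are taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Assumed record layout (not taken from the patch):
//   [uint32 keyLen][key bytes][uint32 valueLen][value bytes]
struct KeyRef {
  const char* data;
  uint32_t len;
};

// Locate the key of the record that starts at `offset` in `buf`.
KeyRef keyAt(const std::vector<char>& buf, uint32_t offset) {
  uint32_t len;
  std::memcpy(&len, buf.data() + offset, sizeof(len));
  return KeyRef{buf.data() + offset + sizeof(len), len};
}

// Binary string compare: memcmp on the common prefix, shorter key first.
bool keyLess(const KeyRef& a, const KeyRef& b) {
  int c = std::memcmp(a.data, b.data, std::min(a.len, b.len));
  return c != 0 ? c < 0 : a.len < b.len;
}

// Sort record offsets by key; the records themselves are not moved.
void sortOffsets(const std::vector<char>& buf, std::vector<uint32_t>& offsets) {
  std::sort(offsets.begin(), offsets.end(),
            [&buf](uint32_t x, uint32_t y) {
              return keyLess(keyAt(buf, x), keyAt(buf, y));
            });
}

// Helper for trying the sketch: append one record, return its start offset.
uint32_t appendRecord(std::vector<char>& buf, const std::string& key,
                      const std::string& value) {
  uint32_t start = static_cast<uint32_t>(buf.size());
  uint32_t klen = static_cast<uint32_t>(key.size());
  uint32_t vlen = static_cast<uint32_t>(value.size());
  const char* kp = reinterpret_cast<const char*>(&klen);
  const char* vp = reinterpret_cast<const char*>(&vlen);
  buf.insert(buf.end(), kp, kp + sizeof(klen));
  buf.insert(buf.end(), key.begin(), key.end());
  buf.insert(buf.end(), vp, vp + sizeof(vlen));
  buf.insert(buf.end(), value.begin(), value.end());
  return start;
}
```

Only the 4-byte offsets are swapped during the sort, which is why key lookups turn into random reads over the buffer. That is the access pattern the MemoryBlock design discussed earlier in the thread tries to keep cache-friendly.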
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092160#comment-13092160 ]

Scott Carey commented on MAPREDUCE-2841:

I think there is quite a bit of juice to squeeze from the Java implementation. For example, Java 7 uses a different sort algorithm that is often 2x as fast as Java 6 for primitive arrays (dual pivot quicksort) and a faster mergesort implementation too for object arrays (TimSort). Java can (and likely will in the 0.23 branch) also benefit from hardware CRC. After that, I wonder how much faster a native implementation would actually be.
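For reference on the hardware CRC point in Scott Carey's comment, the checksum named in the issue description is CRC32C (Castagnoli). The bitwise software version below is only a baseline sketch for comparison. It is not code from Hadoop or from the patch; a hardware path on x86-64 would replace the inner loop with the SSE4.2 CRC32 instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78,
// initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF.
uint32_t crc32c(const void* data, std::size_t len) {
  const unsigned char* p = static_cast<const unsigned char*>(data);
  uint32_t crc = 0xFFFFFFFFu;
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= p[i];
    for (int bit = 0; bit < 8; ++bit) {
      uint32_t mask = 0u - (crc & 1u);  // all ones if the low bit is set
      crc = (crc >> 1) ^ (0x82F63B78u & mask);
    }
  }
  return ~crc;
}
```

Processing one bit at a time like this is far slower than a table-driven or hardware implementation, which is why the choice of CRC path shows up in the serialization throughput figures discussed in this thread.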
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091491#comment-13091491 ]

MengWang commented on MAPREDUCE-2841:

Good job!
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086977#comment-13086977 ]

Binglin Chang commented on MAPREDUCE-2841:

This patch is for 0.20 branch
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086978#comment-13086978 ]

Binglin Chang commented on MAPREDUCE-2841:

I think this work can help improve Hadoop in many ways:
# Reduce total resource consumption, mainly CPU
# Speed up job execution, giving better response time
# More precise memory control, an important feature in production systems
# More programming interfaces: C/C++, Python, etc.
# Open up further optimization possibilities