GitHub user aarondav opened a pull request: https://github.com/apache/spark/pull/2753
[SPARK-3453] Netty-based BlockTransferService, extracted from Spark core

This PR encapsulates #2330, which is itself a continuation of #2240. The first goal of this PR is to provide an alternate, simpler implementation of the ConnectionManager which is based on Netty.

In addition to this goal, however, we want to resolve [SPARK-3796](https://issues.apache.org/jira/browse/SPARK-3796), which calls for a standalone shuffle service which can be integrated into the YARN NodeManager or the Standalone Worker, or run on its own. This PR makes the first step in this direction by ensuring that the actual Netty service is as small as possible and extracted from Spark core. Given this, we should be able to construct a standalone jar which can be included in other JVMs without incurring significant dependency or runtime issues. The actual work to ensure that such a standalone shuffle service would work in Spark will be left for a future PR, however.

In order to minimize dependencies and allow for the service to be long-running (possibly much longer-running than Spark, and possibly having to support multiple versions of Spark simultaneously), the entire service has been ported to Java, where we have full control over the binary compatibility of the components and do not depend on the Scala runtime or version.

These issues have been addressed by folding in #2330:

- SPARK-3453: Refactor Netty module to use BlockTransferService interface
- SPARK-3018: Release all buffers upon task completion/failure
- SPARK-3002: Create a connection pool and reuse clients across different threads
- SPARK-3017: Integration tests and unit tests for connection failures
- SPARK-3049: Make sure client doesn't block when server/connection has error(s)
- SPARK-3502: SO_RCVBUF and SO_SNDBUF should be bootstrap childOption, not option
- SPARK-3503: Disable thread local cache in PooledByteBufAllocator

TODO before mergeable:

- [ ] Implement uploadBlock()
- [ ] Unit tests for RPC side of code
- [ ] Performance testing
- [ ] Turn OFF by default (currently on for unit testing)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aarondav/spark netty

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2753.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2753

----
commit 165eab1518f5184ef9609f26d374c5ccefd05472
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-09T07:29:33Z

    [SPARK-3453] Refactor Netty module to use BlockTransferService. Also includes some partial support for uploading blocks.

commit 1760d3292ecf262e4c77c9e3b28bfd2900d25840
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-09T07:42:37Z

    Use Epoll.isAvailable in BlockServer as well.

commit 2b44cf1b7547919bbe7386e954fe2f56be046790
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-09T21:36:31Z

    Added more documentation.

commit 064747b50a591acb132b2c750957e79f54dfa88f
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-10T06:38:38Z

    Reference count buffers and clean them up properly.

commit b5c8d1fca6d3cf5c2b95395310200c8149a7eb16
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-10T08:09:44Z

    Fixed ShuffleBlockFetcherIteratorSuite.

commit 108c9edaed06c5e046a21c9a8e54c50390da9a0b
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-10T08:10:04Z

    Forgot to add TestSerializer to the commit list.

commit 1be4e8ee7d932821c789cb974310e5d59df4ff84
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-10T08:11:40Z

    Shorten NioManagedBuffer and NettyManagedBuffer class names.

commit cb589ec7b6d3758498249b63b395634efb83d8ba
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-11T02:01:23Z

    Added more test cases covering cleanup when fault happens in ShuffleBlockFetcherIteratorSuite

commit 5cd33d7798ae742e76107bb976d8478ab9476ae7
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-11T02:55:54Z

    Fixed style violation.

commit 9e0cb8736be6d38e3f30766271d28875ceca1ae8
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-11T04:04:56Z

    Fixed BlockClientHandlerSuite

commit d23ed7bfd912770ace7eed7cd0dff2db6ac826e3
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-12T01:28:45Z

    Incorporated feedback from Norman:
    - use same pool for boss and worker
    - remove ioratio
    - disable caching of byte buf allocator
    - childoption sendbuf/receivebuf
    - fire exception through pipeline

    In addition:
    - fire failure handler BlockFetchingListener at least once per block.
    - enabled a bunch of ignored tests

commit b2f3281d0de540d38ea5b4c7bf576b775405d56d
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-12T05:12:08Z

    Added connection pooling.

commit 14323a55ebfa7ccc684c2ae78eac299a4426b353
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-12T05:13:02Z

    Removed BlockManager.getLocalShuffleFromDisk.

commit f0a16e9ec7d5c811dff3cd5219548e05077099c8
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-12T07:40:53Z

    Fixed test hanging.

commit 519d64dcb7768b3657438a4cfc85ee8065f56c2a
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-12T21:18:58Z

    Mark private package visibility and MimaExcludes.

commit c066309afbb0e248a8b2b808d997e6b37a2bff1e
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-13T05:42:32Z

    Implement java.io.Closeable interface.

commit 6afc435037a0448d6eb243bd18411ef25e3a2cf7
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-17T05:51:11Z

    Added logging.

commit f63fb4c1976e503238b7d7151f8f45f40ced36e9
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-29T18:13:44Z

    Add more debug message.

commit d68f3286a4a9795dfb61a8a63b8a20b3eafb4821
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-29T18:30:13Z

    Logging close() in case close() fails.

commit 1bdd7eec5d9ddb5a9eb33c9733878aea3ca26ba6
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-29T19:07:53Z

    Fixed tests.

commit bec4ea2b54659cfed6f54e527aa878dfbff829c7
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-29T19:22:01Z

    Removed OIO and added num threads settings.

commit 4b18db29edcdb87577fd033835275fd1c2957dcd
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-29T22:45:05Z

    Copy the buffer in fetchBlockSync.

commit a0518c766f0f4eba24459ffac61dce789fc14092
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-30T02:22:34Z

    Implemented block uploads.

commit 407e59afd3cb7385af9f63dc2263a40c7c21d783
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-30T02:37:28Z

    Fix style violation.

commit f6c220df8406be14fbdb7270682727e1085518a4
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-30T06:30:17Z

    Merge with latest master.

commit 5d98ce3de1deeeb7fbdc26b9303a591c46f1892b
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-30T07:56:32Z

    Flip buffer.

commit f7e7568414692989215d97abce9dda2fe172abb4
Author: Reynold Xin <r...@apache.org>
Date:   2014-09-30T19:28:21Z

    Fixed spark.shuffle.io.receiveBuffer setting.

commit c0cd242f375e939e1422e30d4b230a8a78b13b88
Author: Aaron Davidson <aa...@databricks.com>
Date:   2014-10-06T00:58:43Z

    [SPARK-3453] Netty-based BlockTransferService, extracted from Spark core

    This PR encapsulates #2330, which is itself a continuation of #2240. The first goal of this PR is to provide an alternate, simpler implementation of the ConnectionManager which is based on Netty.

    In addition to this goal, however, we want to resolve [SPARK-3796](https://issues.apache.org/jira/browse/SPARK-3796), which calls for a standalone shuffle service which can be integrated into the YARN NodeManager, Standalone Worker, or on its own. This PR makes the first step in this direction by ensuring that the actual Netty service is as small as possible and extracted from Spark core. Given this, we should be able to construct this standalone jar which can be included in other JVMs without incurring significant dependency or runtime issues. The actual work to ensure that such a standalone shuffle service would work in Spark will be left for a future PR, however.

    In order to minimize dependencies and allow for the service to be long-running (possibly much longer-running than Spark, and possibly having to support multiple versions of Spark simultaneously), the entire service has been ported to Java, where we have full control over the binary compatibility of the components and do not depend on the Scala runtime or version.

    These PRs have been addressed by folding in #2330:

    - SPARK-3453: Refactor Netty module to use BlockTransferService interface
    - SPARK-3018: Release all buffers upon task completion/failure
    - SPARK-3002: Create a connection pool and reuse clients across different threads
    - SPARK-3017: Integration tests and unit tests for connection failures
    - SPARK-3049: Make sure client doesn't block when server/connection has error(s)
    - SPARK-3502: SO_RCVBUF and SO_SNDBUF should be bootstrap childOption, not option
    - SPARK-3503: Disable thread local cache in PooledByteBufAllocator

    TODO before mergeable:

    - [ ] Implement uploadBlock()
    - [ ] Unit tests for RPC side of code
    - [ ] Performance testing
    - [ ] Turn OFF by default (currently on for unit testing)

----
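
For readers unfamiliar with the idea behind SPARK-3018 and the commit "Reference count buffers and clean them up properly", the following is a minimal sketch of reference counting with release on both success and failure. It uses only the JDK and is not the code from this PR: the names `RefCountedBuffer`, `RefCountDemo`, and `runTask` are hypothetical and chosen purely for illustration.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only; not the buffer class introduced by this PR. */
final class RefCountedBuffer {
    // The creator holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile ByteBuffer data;

    RefCountedBuffer(ByteBuffer data) {
        this.data = data;
    }

    /** Takes an additional reference; fails if the buffer was already freed. */
    RefCountedBuffer retain() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("buffer already released");
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return this;
            }
        }
    }

    /** Drops one reference; returns true if this call freed the buffer. */
    boolean release() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("buffer already released");
            }
            if (refCount.compareAndSet(current, current - 1)) {
                if (current == 1) {
                    data = null; // last reference gone: drop the underlying memory
                    return true;
                }
                return false;
            }
        }
    }

    int refCount() {
        return refCount.get();
    }

    boolean isReleased() {
        return refCount.get() == 0;
    }
}

final class RefCountDemo {
    /** Simulates a task that may fail; its reference is released either way. */
    static boolean runTask(RefCountedBuffer buf, boolean fail) {
        buf.retain();
        try {
            if (fail) {
                throw new RuntimeException("simulated task failure");
            }
            return true;
        } catch (RuntimeException e) {
            return false;
        } finally {
            buf.release(); // runs on completion and on failure
        }
    }

    public static void main(String[] args) {
        RefCountedBuffer buf = new RefCountedBuffer(ByteBuffer.allocate(16));
        System.out.println("task succeeded: " + runTask(buf, false));
        System.out.println("task succeeded: " + runTask(buf, true));
        System.out.println("freed by owner: " + buf.release());
    }
}
```

The `finally` block is the point of the sketch: whichever way the task ends, the reference it took is returned, so the owner's final `release()` is the one that frees the buffer.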