cloud-fan opened a new pull request #23510: [SPARK-26590][SQL][CORE] make fetch-block-to-disk backward compatible
URL: https://github.com/apache/spark/pull/23510

## What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/16989

The fetch-block-to-disk feature is disabled by default because it is not compatible with external shuffle services prior to Spark 2.2. The client sends a stream request to fetch block chunks, and an old shuffle service can't support it.

This PR proposes a new approach:
1. Extend `ChunkFetchRequest` with an optional `fetchAsStream` boolean flag. The flag is only encoded into the message when it is true. A `ChunkFetchRequest` from an old client does not carry this flag, so the flag is false for such requests.
2. The server side handles the new flag in `ChunkFetchRequest`. If the flag is true, it returns a new chunk stream response; otherwise it returns a normal chunk fetch response.
3. When the client side sends a `ChunkFetchRequest` with `fetchAsStream=true`, it sets up two callbacks: one for the new chunk stream response and one for the normal chunk fetch response. This is necessary because the server side may be an old version that ignores the `fetchAsStream` flag.

This is fully compatible:
1. new client <-> new server: Definitely fine.
2. old client <-> new server: The `ChunkFetchRequest` message doesn't have the `fetchAsStream` flag, so the server treats it as a normal fetch request and returns a normal chunk fetch response.
3. new client <-> old server: The `ChunkFetchRequest` message contains the `fetchAsStream` flag, but the old server doesn't know about it and stops reading the message right before the `fetchAsStream` part. The old server then returns a normal chunk fetch response, and the new client accepts it.

TODO: set up different versions of the shuffle service and test it.

## How was this patch tested?

Existing tests.
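
To illustrate the compatibility argument in the proposal, here is a minimal, self-contained sketch of an optional trailing flag. It is not the code in this PR: the class `FetchAsStreamSketch`, its nested `Request` type, the `decodeOld`/`decodeNew` helpers, and the simplified layout (an 8-byte stream id, a 4-byte chunk index, then an optional 1-byte flag) are all assumptions made for illustration, and the real message framing in Spark carries additional header fields.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of an optional trailing flag; not Spark's actual classes.
final class FetchAsStreamSketch {

  // Simplified stand-in for a chunk fetch request.
  static final class Request {
    final long streamId;
    final int chunkIndex;
    final boolean fetchAsStream;

    Request(long streamId, int chunkIndex, boolean fetchAsStream) {
      this.streamId = streamId;
      this.chunkIndex = chunkIndex;
      this.fetchAsStream = fetchAsStream;
    }
  }

  // The flag byte is appended only when it is true, so a request with
  // fetchAsStream=false is byte-for-byte identical to the old format.
  static ByteBuffer encode(Request r) {
    int size = Long.BYTES + Integer.BYTES + (r.fetchAsStream ? 1 : 0);
    ByteBuffer buf = ByteBuffer.allocate(size);
    buf.putLong(r.streamId);
    buf.putInt(r.chunkIndex);
    if (r.fetchAsStream) {
      buf.put((byte) 1);
    }
    buf.flip();
    return buf;
  }

  // An "old server" decoder: reads the two known fields and stops,
  // never looking at any trailing byte.
  static Request decodeOld(ByteBuffer buf) {
    long streamId = buf.getLong();
    int chunkIndex = buf.getInt();
    return new Request(streamId, chunkIndex, false);
  }

  // A "new server" decoder: a missing trailing byte means the flag is false.
  static Request decodeNew(ByteBuffer buf) {
    long streamId = buf.getLong();
    int chunkIndex = buf.getInt();
    boolean fetchAsStream = buf.hasRemaining() && buf.get() != 0;
    return new Request(streamId, chunkIndex, fetchAsStream);
  }

  // The response kind a server would pick for a decoded request.
  static String responseKind(Request r) {
    return r.fetchAsStream ? "chunk-stream" : "chunk-fetch";
  }

  public static void main(String[] args) {
    Request newClient = new Request(42L, 0, true);
    Request oldClient = new Request(42L, 0, false);

    // new client <-> new server
    System.out.println(responseKind(decodeNew(encode(newClient))));
    // old client <-> new server
    System.out.println(responseKind(decodeNew(encode(oldClient))));
    // new client <-> old server
    System.out.println(responseKind(decodeOld(encode(newClient))));
  }
}
```

In this sketch the third case falls back to the normal response because the old decoder never reads the extra byte, which is why the new client has to be ready to handle either kind of response.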
