[jira] [Created] (SPARK-47702) Shuffle service endpoint is not removed from the locations list when RDD block is removed from a node.
mahesh kumar behera created SPARK-47702:
---
Summary: Shuffle service endpoint is not removed from the locations list when RDD block is removed from a node.
Key: SPARK-47702
URL: https://issues.apache.org/jira/browse/SPARK-47702
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.5.1
Reporter: mahesh kumar behera

If SHUFFLE_SERVICE_FETCH_RDD_ENABLED is set to true, the driver stores both the executor endpoint and the external shuffle service endpoint for an RDD block. When the RDD block is migrated, the location info is updated: the endpoint corresponding to the new location is added and the old endpoints should be removed. Currently, however, only the executor endpoint is removed; the shuffle service endpoint is left behind. This causes a failure during RDD read if the stale shuffle service endpoint is chosen due to task locality.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
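The bookkeeping bug described above can be sketched with a simplified model of the driver's location tracking. BlockTracker, the string-encoded endpoints, and the method names are illustrative inventions, not Spark's actual BlockManagerMasterEndpoint API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of the driver's block-location tracking.
public class BlockTracker {
    // blockId -> set of endpoints, encoded as "executor:host" or "shuffle-service:host"
    private final Map<String, Set<String>> locations = new HashMap<>();

    public void addBlock(String blockId, String host, boolean fetchRddFromShuffleService) {
        Set<String> eps = locations.computeIfAbsent(blockId, k -> new HashSet<>());
        eps.add("executor:" + host);
        if (fetchRddFromShuffleService) {
            // With SHUFFLE_SERVICE_FETCH_RDD_ENABLED, the shuffle service on the
            // same host is registered as an additional location for the RDD block.
            eps.add("shuffle-service:" + host);
        }
    }

    // The fix the issue calls for: when the block leaves a node, remove *both*
    // endpoints for that host, not only the executor one.
    public void removeBlockFromHost(String blockId, String host) {
        Set<String> eps = locations.get(blockId);
        if (eps != null) {
            eps.remove("executor:" + host);
            eps.remove("shuffle-service:" + host); // the removal that was missing
        }
    }

    public Set<String> getLocations(String blockId) {
        return locations.getOrDefault(blockId, Set.of());
    }
}
```

Without the second `remove`, a migrated block keeps a stale "shuffle-service:host" entry, which is exactly the location a locality-aware task may then try to read from.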
[jira] [Created] (SPARK-47521) Fix file handle leakage during shuffle data read from external storage.
mahesh kumar behera created SPARK-47521:
---
Summary: Fix file handle leakage during shuffle data read from external storage.
Key: SPARK-47521
URL: https://issues.apache.org/jira/browse/SPARK-47521
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 3.3.4, 3.5.0, 3.4.1
Reporter: mahesh kumar behera

In the FallbackStorage.read method, the file handle is not closed if a failure occurs during the read operation.
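The standard fix for this class of leak is to scope the handle to a try-with-resources block, so it is closed even when the read throws. This is a minimal sketch; readFully is a hypothetical helper, not FallbackStorage's real signature:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeRead {
    // Reads a whole file from (fallback) storage without leaking the handle.
    public static byte[] readFully(Path file) throws IOException {
        // try-with-resources closes the stream on both the success and the
        // failure path, which is what the unpatched code failed to do.
        try (InputStream in = Files.newInputStream(file)) {
            return in.readAllBytes();
        }
    }
}
```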
[jira] [Updated] (SPARK-47141) Support enabling migration of shuffle data directly to external storage using config parameter
[ https://issues.apache.org/jira/browse/SPARK-47141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mahesh kumar behera updated SPARK-47141:
Summary: Support enabling migration of shuffle data directly to external storage using config parameter (was: Support shuffle migration to external storage)

> Support enabling migration of shuffle data directly to external storage using
> config parameter
> --
>
> Key: SPARK-47141
> URL: https://issues.apache.org/jira/browse/SPARK-47141
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: mahesh kumar behera
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Currently, Spark supports migration of shuffle data to peer nodes during node
> decommissioning. If peer nodes are not accessible, Spark falls back to
> external storage; the user needs to provide the storage location path. There
> are scenarios where a user may want to migrate to external storage instead of
> peer nodes, for example because of unstable nodes or the need for aggressive
> scale-down. So the user should be able to configure Spark to migrate the
> shuffle data directly to external storage if the use case permits.
[jira] [Created] (SPARK-47141) Support shuffle migration to external storage
mahesh kumar behera created SPARK-47141:
---
Summary: Support shuffle migration to external storage
Key: SPARK-47141
URL: https://issues.apache.org/jira/browse/SPARK-47141
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 4.0.0
Reporter: mahesh kumar behera
Fix For: 4.0.0

Currently, Spark supports migration of shuffle data to peer nodes during node decommissioning. If peer nodes are not accessible, Spark falls back to external storage; the user needs to provide the storage location path. There are scenarios where a user may want to migrate to external storage instead of peer nodes, for example because of unstable nodes or the need for aggressive scale-down. So the user should be able to configure Spark to migrate the shuffle data directly to external storage if the use case permits.
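The proposed behavior can be sketched as a small decision function: a single config flag that forces shuffle blocks straight to external (fallback) storage instead of trying peers first. The config key name below is illustrative, not an actual Spark property:

```java
import java.util.Map;

public class MigrationTarget {
    // Hypothetical flag name; the real property name would be decided in the PR.
    static final String DIRECT_KEY =
        "spark.storage.decommission.shuffleBlocks.migrateToFallbackOnly";

    // Chooses where a decommissioning node should migrate a shuffle block.
    public static String choose(Map<String, String> conf, boolean peersAvailable) {
        boolean direct = Boolean.parseBoolean(conf.getOrDefault(DIRECT_KEY, "false"));
        if (direct) {
            return "fallback";            // proposed behavior: skip peers entirely
        }
        // Existing behavior: prefer peers, fall back to external storage only
        // when no peer is reachable.
        return peersAvailable ? "peer" : "fallback";
    }
}
```

Keeping the flag default-off preserves the current peer-first behavior for users who do not opt in.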
[jira] [Commented] (SPARK-33545) Support Fallback Storage during Worker decommission
[ https://issues.apache.org/jira/browse/SPARK-33545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17807166#comment-17807166 ]

mahesh kumar behera commented on SPARK-33545:
-

[~dongjoon] As per this PR, the shuffle read from fallback storage is done along with the read from the local block. Is there a reason for this? The local reads are done in a single thread; do you see any obvious issue with reading shuffle data from fallback storage in multiple threads?

> Support Fallback Storage during Worker decommission
> ---
>
> Key: SPARK-33545
> URL: https://issues.apache.org/jira/browse/SPARK-33545
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.1.0
[jira] [Created] (SPARK-44307) Bloom filter is not added for left outer join if the left side table is smaller than broadcast threshold.
mahesh kumar behera created SPARK-44307:
---
Summary: Bloom filter is not added for left outer join if the left side table is smaller than broadcast threshold.
Key: SPARK-44307
URL: https://issues.apache.org/jira/browse/SPARK-44307
Project: Spark
Issue Type: Bug
Components: Optimizer
Affects Versions: 3.4.1
Reporter: mahesh kumar behera
Fix For: 3.5.0

In the case of a left outer join, a shuffle join is used even if the left side table is small enough to be broadcast. This follows from the semantics of left outer join: broadcasting the left side would produce wrong results. But the bloom filter injection does not take this into account. While injecting the bloom filter, if the left side is smaller than the broadcast threshold, the bloom filter is not added; the code assumes the left side will be broadcast and no bloom filter is needed. This causes the bloom filter optimization to be missed for a left outer join with a small left side and a huge right-side table.
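The flawed check and its correction can be sketched as follows. The class and method names are illustrative; Spark's actual logic lives in the optimizer's runtime-filter injection rule:

```java
public class BloomFilterDecision {
    // Flawed check as described in the issue: skip the bloom filter whenever
    // the left side is small, on the assumption that a broadcast join will
    // make the filter unnecessary.
    public static boolean injectBuggy(long leftSizeBytes, long broadcastThreshold) {
        return leftSizeBytes > broadcastThreshold;
    }

    // Corrected check (simplified): for a LEFT OUTER join the left side can
    // never be broadcast (that would drop unmatched left rows), so a small
    // left side should still yield a bloom filter to prune the huge right side.
    public static boolean injectFixed(long leftSizeBytes, long broadcastThreshold,
                                      boolean leftOuter) {
        if (leftOuter) {
            return true; // broadcast is not an option, so the filter stays useful
        }
        return leftSizeBytes > broadcastThreshold;
    }
}
```

With a 1 KB left table and a 10 MB threshold, the buggy check skips injection while the fixed one still injects for a left outer join, which is the scenario the issue reports.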