[jira] [Commented] (SPARK-27665) Split fetch shuffle blocks protocol from OpenBlocks
[ https://issues.apache.org/jira/browse/SPARK-27665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061337#comment-17061337 ]

Dongjoon Hyun commented on SPARK-27665:
---

Thank you for the confirmation!

> Split fetch shuffle blocks protocol from OpenBlocks
> ---
>
> Key: SPARK-27665
> URL: https://issues.apache.org/jira/browse/SPARK-27665
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: Yuanjian Li
> Assignee: Yuanjian Li
> Priority: Major
> Fix For: 3.0.0
>
>
> In the current approach in OneForOneBlockFetcher, we reuse the OpenBlocks
> protocol to describe the fetch request for shuffle blocks, which makes
> extension work for shuffle fetching, such as SPARK-9853 and SPARK-25341,
> very awkward. We need a new protocol dedicated to fetching shuffle blocks.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
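To make the motivation in the issue description concrete, here is a minimal sketch of the difference between describing shuffle blocks as opaque string IDs and describing them with typed fields. The class names `StringIdRequest` and `ShuffleRequest` are hypothetical stand-ins invented for illustration; they are not Spark's actual message classes, and the real wire format may differ. The only thing taken from the thread is the block naming pattern visible in file names such as shuffle_0_0_0.index, assumed here to mean shuffle_<shuffleId>_<mapId>_<reduceId>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShuffleFetchSketch {

    // Hypothetical generic request: every block is just an opaque string ID.
    static final class StringIdRequest {
        final List<String> blockIds;

        StringIdRequest(List<String> blockIds) {
            this.blockIds = blockIds;
        }
    }

    // Hypothetical shuffle-only request: typed fields instead of strings.
    static final class ShuffleRequest {
        final int shuffleId;
        // mapId -> reduce IDs requested from that map output, in request order.
        final Map<Long, List<Integer>> reduceIdsByMapId = new LinkedHashMap<>();

        ShuffleRequest(int shuffleId) {
            this.shuffleId = shuffleId;
        }
    }

    // Parses IDs assumed to look like shuffle_<shuffleId>_<mapId>_<reduceId>
    // and groups them into one typed request.
    static ShuffleRequest toShuffleRequest(StringIdRequest generic) {
        ShuffleRequest typed = null;
        for (String blockId : generic.blockIds) {
            String[] parts = blockId.split("_");
            if (parts.length != 4 || !parts[0].equals("shuffle")) {
                throw new IllegalArgumentException("Not a shuffle block ID: " + blockId);
            }
            int shuffleId = Integer.parseInt(parts[1]);
            long mapId = Long.parseLong(parts[2]);
            int reduceId = Integer.parseInt(parts[3]);
            if (typed == null) {
                typed = new ShuffleRequest(shuffleId);
            } else if (typed.shuffleId != shuffleId) {
                throw new IllegalArgumentException("Mixed shuffle IDs in one request: " + blockId);
            }
            typed.reduceIdsByMapId.computeIfAbsent(mapId, k -> new ArrayList<>()).add(reduceId);
        }
        if (typed == null) {
            throw new IllegalArgumentException("Empty request");
        }
        return typed;
    }

    public static void main(String[] args) {
        StringIdRequest generic = new StringIdRequest(
            Arrays.asList("shuffle_0_0_0", "shuffle_0_0_1", "shuffle_0_3_2"));
        ShuffleRequest typed = toShuffleRequest(generic);
        // prints: shuffleId=0 blocks={0=[0, 1], 3=[2]}
        System.out.println("shuffleId=" + typed.shuffleId + " blocks=" + typed.reduceIdsByMapId);
    }
}
```

With string IDs, any shuffle-specific extension has to be encoded into and parsed back out of the ID text. With a typed request, a new field can be added to the message itself, which is presumably the kind of extension work the description calls awkward under the shared protocol.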
[jira] [Commented] (SPARK-27665) Split fetch shuffle blocks protocol from OpenBlocks
[ https://issues.apache.org/jira/browse/SPARK-27665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060870#comment-17060870 ]

Wenchen Fan commented on SPARK-27665:
-

I believe it's fixed by SPARK-29435
[jira] [Commented] (SPARK-27665) Split fetch shuffle blocks protocol from OpenBlocks
[ https://issues.apache.org/jira/browse/SPARK-27665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060767#comment-17060767 ]

Dongjoon Hyun commented on SPARK-27665:
---

[~koert], do you still hit the same issue with 3.0.0-preview2?
[jira] [Commented] (SPARK-27665) Split fetch shuffle blocks protocol from OpenBlocks
[ https://issues.apache.org/jira/browse/SPARK-27665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948979#comment-16948979 ]

koert kuipers commented on SPARK-27665:
---

I tried using spark.shuffle.useOldFetchProtocol=true while launching a job with Spark 3 (master), with the Spark 2.4.1 shuffle service running in YARN. I cannot get it to work. For example, on one cluster I saw:

{code}
Error occurred while fetching local blocks
java.nio.file.NoSuchFileException: /mnt1/yarn/usercache/hadoop/appcache/application_1570697024032_0058/blockmgr-d1d009b1-1c95-4e2a-9a71-0ff20078b9a8/38/shuffle_0_0_0.index
{code}

On another:

{code}
org.apache.spark.shuffle.FetchFailedException: /data/9/hadoop/nm/usercache/koert/appcache/application_1568061697664_8250/blockmgr-57f28014-cdf2-431e-8e11-447ba5c2b2f2/0b/shuffle_0_0_0.index
  at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:596)
  at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:511)
  at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:67)
  at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
  at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
  at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
  at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
  at scala.collection.Iterator$SliceIterator.hasNext(Iterator.scala:266)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:337)
  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:850)
  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:850)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:291)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:127)
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:455)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:458)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.NoSuchFileException: /data/9/hadoop/nm/usercache/koert/appcache/application_1568061697664_8250/blockmgr-57f28014-cdf2-431e-8e11-447ba5c2b2f2/0b/shuffle_0_0_0.index
  at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
  at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
  at java.nio.file.Files.newByteChannel(Files.java:361)
  at java.nio.file.Files.newByteChannel(Files.java:407)
  at org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:204)
  at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:551)
  at org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchLocalBlocks(ShuffleBlockFetcherIterator.scala:349)
  at org.apache.spark.storage.ShuffleBlockFetcherIterator.initialize(ShuffleBlockFetcherIterator.scala:391)
  at org.apache.spark.storage.ShuffleBlockFetcherIterator.<init>(ShuffleBlockFetcherIterator.scala:161)
  at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:60)
  at org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:172)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:291)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:327)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:291)
  ... 11 more
{code}
[jira] [Commented] (SPARK-27665) Split fetch shuffle blocks protocol from OpenBlocks
[ https://issues.apache.org/jira/browse/SPARK-27665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932833#comment-16932833 ]

koert kuipers commented on SPARK-27665:
---

Oh wait, I didn't realize there is a setting spark.shuffle.useOldFetchProtocol. Never mind! I will try that.
[jira] [Commented] (SPARK-27665) Split fetch shuffle blocks protocol from OpenBlocks
[ https://issues.apache.org/jira/browse/SPARK-27665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16932823#comment-16932823 ]

koert kuipers commented on SPARK-27665:
---

I am a little nervous that this got merged into master without resolving the blocker SPARK-27780. Currently this means Spark 3.x will not be able to support dynamic allocation at all on YARN clusters that have Spark 2 shuffle managers installed, which is pretty much all of our client clusters.