Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/1481#issuecomment-51158962
Okay, here's the deal: this patch is causing some type of non-deterministic
failure that seems related to the shuffle write path. It looks like the test
is hanging, but upon further investigation, the issue is that on Jenkins I
see a single shuffle repeated over and over again, failing indefinitely.
The failure is that the shuffle file is not found, even though no apparent
error message is printed when writing outputs.
@sryza, it's worth determining whether you can reproduce this locally. Here
is an excerpt:
```
14/08/04 22:38:10.741 INFO TaskSetManager: Starting task 0.0 in stage 2.2
(TID 15, localhost, PROCESS_LOCAL, 1185 bytes)
14/08/04 22:38:10.741 INFO Executor: Running task 0.0 in stage 2.2 (TID 15)
14/08/04 22:38:10.742 INFO BlockManager: Found block broadcast_5 locally
14/08/04 22:38:10.743 INFO ShuffleBlockManager: Removed existing shuffle
file /tmp/spark-local-20140804223809-dc37/0d/shuffle_1_0_0
14/08/04 22:38:10.743 INFO ShuffleBlockManager: Removed existing shuffle
file /tmp/spark-local-20140804223809-dc37/0e/shuffle_1_0_1
14/08/04 22:38:10.743 INFO ShuffleBlockManager: Removed existing shuffle
file /tmp/spark-local-20140804223809-dc37/0f/shuffle_1_0_2
14/08/04 22:38:10.743 INFO BlockManager: Found block rdd_8_0 locally
14/08/04 22:38:10.743 INFO BlockManager: Found block rdd_11_0 locally
14/08/04 22:38:10.749 INFO Executor: Finished task 0.0 in stage 2.2 (TID
15). 1856 bytes result sent to driver
14/08/04 22:38:10.749 INFO TaskSetManager: Starting task 1.0 in stage 2.2
(TID 16, localhost, PROCESS_LOCAL, 1185 bytes)
14/08/04 22:38:10.749 INFO Executor: Running task 1.0 in stage 2.2 (TID 16)
14/08/04 22:38:10.750 INFO BlockManager: Found block broadcast_5 locally
14/08/04 22:38:10.751 INFO ShuffleBlockManager: Removed existing shuffle
file /tmp/spark-local-20140804223809-dc37/0e/shuffle_1_1_0
14/08/04 22:38:10.751 INFO TaskSetManager: Finished task 0.0 in stage 2.2
(TID 15) in 8 ms on localhost (1/3)
14/08/04 22:38:10.751 INFO ShuffleBlockManager: Removed existing shuffle
file /tmp/spark-local-20140804223809-dc37/0f/shuffle_1_1_1
14/08/04 22:38:10.751 INFO ShuffleBlockManager: Removed existing shuffle
file /tmp/spark-local-20140804223809-dc37/10/shuffle_1_1_2
14/08/04 22:38:10.751 INFO CacheManager: Partition rdd_8_1 not found,
computing it
14/08/04 22:38:10.751 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/08/04 22:38:10.751 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
Getting 3 non-empty blocks out of 3 blocks
14/08/04 22:38:10.751 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
Started 0 remote fetches in 0 ms
14/08/04 22:38:10.751 ERROR BlockFetcherIterator$BasicBlockFetcherIterator:
Error occurred while fetching local blocks
java.io.FileNotFoundException:
/tmp/spark-local-20140804223809-dc37/10/shuffle_2_1_1 (No such file or
directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:94)
```
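When trying to reproduce this locally, it may help to scan the test log for the same pattern. The following is only a rough sketch: `summarize_shuffle_log` is a hypothetical helper, not part of Spark, and its regular expressions assume the message wording seen in the excerpt above (including the hard line wraps).

```python
import re

# Hypothetical helper, not part of Spark. The patterns below assume the
# message wording shown in the log excerpt; \s+ tolerates wrapped lines.
REMOVED = re.compile(r"Removed existing shuffle\s+file\s+(\S*shuffle_(\d+)_\d+_\d+)")
MISSING = re.compile(r"FileNotFoundException:\s+(\S*shuffle_(\d+)_\d+_\d+)")


def summarize_shuffle_log(text):
    """Collect shuffle files that were removed and shuffle files reported missing."""
    removed = {m.group(1) for m in REMOVED.finditer(text)}
    missing = {m.group(1) for m in MISSING.finditer(text)}
    return {
        "removed": sorted(removed),
        "missing": sorted(missing),
        # Files that were explicitly removed and later reported as not found.
        "missing_after_removal": sorted(removed & missing),
        # Shuffle IDs (first number in shuffle_<id>_<map>_<reduce>) on each side.
        "removed_shuffle_ids": sorted({int(m.group(2)) for m in REMOVED.finditer(text)}),
        "missing_shuffle_ids": sorted({int(m.group(2)) for m in MISSING.finditer(text)}),
    }
```

Calling `summarize_shuffle_log(open(path).read())` on a saved log (where `path` is wherever your local run writes its log) returns the removed and missing file paths, which makes it easier to see whether the file that could not be found belongs to the same shuffle as the files that were removed.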