Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1481#issuecomment-51158962
  
    Okay, here's the deal: this patch is causing some kind of non-deterministic
    failure that seems related to the shuffle write path. The test appears to
    hang, but on further investigation the issue is that on Jenkins I see a
    single shuffle re-run over and over again, failing every time. The failure
    is that the shuffle file is not found, even though no error message is
    printed when the outputs are written. Here is an excerpt. @sryza, it's worth
    checking whether you can reproduce this locally:
    
    ```
    14/08/04 22:38:10.741 INFO TaskSetManager: Starting task 0.0 in stage 2.2 (TID 15, localhost, PROCESS_LOCAL, 1185 bytes)
    14/08/04 22:38:10.741 INFO Executor: Running task 0.0 in stage 2.2 (TID 15)
    14/08/04 22:38:10.742 INFO BlockManager: Found block broadcast_5 locally
    14/08/04 22:38:10.743 INFO ShuffleBlockManager: Removed existing shuffle file /tmp/spark-local-20140804223809-dc37/0d/shuffle_1_0_0
    14/08/04 22:38:10.743 INFO ShuffleBlockManager: Removed existing shuffle file /tmp/spark-local-20140804223809-dc37/0e/shuffle_1_0_1
    14/08/04 22:38:10.743 INFO ShuffleBlockManager: Removed existing shuffle file /tmp/spark-local-20140804223809-dc37/0f/shuffle_1_0_2
    14/08/04 22:38:10.743 INFO BlockManager: Found block rdd_8_0 locally
    14/08/04 22:38:10.743 INFO BlockManager: Found block rdd_11_0 locally
    14/08/04 22:38:10.749 INFO Executor: Finished task 0.0 in stage 2.2 (TID 15). 1856 bytes result sent to driver
    14/08/04 22:38:10.749 INFO TaskSetManager: Starting task 1.0 in stage 2.2 (TID 16, localhost, PROCESS_LOCAL, 1185 bytes)
    14/08/04 22:38:10.749 INFO Executor: Running task 1.0 in stage 2.2 (TID 16)
    14/08/04 22:38:10.750 INFO BlockManager: Found block broadcast_5 locally
    14/08/04 22:38:10.751 INFO ShuffleBlockManager: Removed existing shuffle file /tmp/spark-local-20140804223809-dc37/0e/shuffle_1_1_0
    14/08/04 22:38:10.751 INFO TaskSetManager: Finished task 0.0 in stage 2.2 (TID 15) in 8 ms on localhost (1/3)
    14/08/04 22:38:10.751 INFO ShuffleBlockManager: Removed existing shuffle file /tmp/spark-local-20140804223809-dc37/0f/shuffle_1_1_1
    14/08/04 22:38:10.751 INFO ShuffleBlockManager: Removed existing shuffle file /tmp/spark-local-20140804223809-dc37/10/shuffle_1_1_2
    14/08/04 22:38:10.751 INFO CacheManager: Partition rdd_8_1 not found, computing it
    14/08/04 22:38:10.751 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
    14/08/04 22:38:10.751 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 3 non-empty blocks out of 3 blocks
    14/08/04 22:38:10.751 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 0 ms
    14/08/04 22:38:10.751 ERROR BlockFetcherIterator$BasicBlockFetcherIterator: Error occurred while fetching local blocks
    java.io.FileNotFoundException: /tmp/spark-local-20140804223809-dc37/10/shuffle_2_1_1 (No such file or directory)
            at java.io.RandomAccessFile.open(Native Method)
            at java.io.RandomAccessFile.<init>(RandomAccessFile.java:241)
            at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:94)
    ```
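One way to check whether a local run falls into the same loop is to scan the log for the highest attempt number of each stage and for shuffle blocks that could not be read. The sketch below is a hypothetical helper, not part of Spark. It assumes the Spark 1.x log format shown in the excerpt, reading `stage 2.2` as stage 2, attempt 2, and shuffle files named `shuffle_<shuffleId>_<mapId>_<reduceId>`; the function and regex names are illustrative only.

```python
import re

# Hypothetical diagnostic helper, not part of Spark. It assumes the Spark 1.x
# log format from the excerpt above:
#   "Starting task <index>.<attempt> in stage <stageId>.<stageAttempt> ..."
#   "java.io.FileNotFoundException: .../shuffle_<shuffleId>_<mapId>_<reduceId> ..."
STAGE_RE = re.compile(r"Starting task \S+ in stage (\d+)\.(\d+)")
MISSING_RE = re.compile(r"FileNotFoundException: \S*?shuffle_(\d+)_(\d+)_(\d+)")


def summarize(lines):
    """Return (highest attempt seen per stage id, missing shuffle block ids)."""
    attempts = {}
    missing = []
    for line in lines:
        stage = STAGE_RE.search(line)
        if stage:
            stage_id, attempt = int(stage.group(1)), int(stage.group(2))
            # Keep only the highest attempt number seen for each stage.
            attempts[stage_id] = max(attempts.get(stage_id, 0), attempt)
        miss = MISSING_RE.search(line)
        if miss:
            # (shuffleId, mapId, reduceId) of the block that could not be read.
            missing.append(tuple(int(g) for g in miss.groups()))
    return attempts, missing


if __name__ == "__main__":
    # Lines copied from the excerpt above.
    sample = [
        "14/08/04 22:38:10.741 INFO TaskSetManager: Starting task 0.0 in stage 2.2 "
        "(TID 15, localhost, PROCESS_LOCAL, 1185 bytes)",
        "14/08/04 22:38:10.749 INFO TaskSetManager: Starting task 1.0 in stage 2.2 "
        "(TID 16, localhost, PROCESS_LOCAL, 1185 bytes)",
        "java.io.FileNotFoundException: "
        "/tmp/spark-local-20140804223809-dc37/10/shuffle_2_1_1 (No such file or directory)",
    ]
    attempts, missing = summarize(sample)
    print(attempts)  # {2: 2}
    print(missing)   # [(2, 1, 1)]
```

Under these assumptions, an attempt number that keeps climbing for one stage, alongside the same missing block appearing again and again, would match the loop seen on Jenkins; a run where every stage stays at attempt 0 and nothing is missing would suggest the failure did not reproduce.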
    


