Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/5324#issuecomment-88746568
  
    After testing for a while, it seems that the retry does not work, but the timeout on the client side can help:
    ```
    15/04/01 22:42:14 WARN PythonRDD: Timed out after 4 seconds, retry once
    15/04/01 22:42:14 ERROR PythonRDD: Error while sending iterator
    java.net.SocketTimeoutException: Accept timed out
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        at java.net.ServerSocket.implAccept(ServerSocket.java:530)
        at java.net.ServerSocket.accept(ServerSocket.java:498)
        at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:624)
    **********************************************************************
    File "/Users/davies/work/spark/python/pyspark/rdd.py", line 1090, in 
__main__.RDD.variance
    Failed example:
        sc.parallelize([1, 2, 3]).variance()
    Exception raised:
        Traceback (most recent call last):
          File "//anaconda/lib/python2.7/doctest.py", line 1315, in __run
            compileflags, 1) in test.globs
          File "<doctest __main__.RDD.variance[0]>", line 1, in <module>
            sc.parallelize([1, 2, 3]).variance()
          File "/Users/davies/work/spark/python/pyspark/rdd.py", line 1093, in 
variance
            return self.stats().variance()
          File "/Users/davies/work/spark/python/pyspark/rdd.py", line 948, in 
stats
            return self.mapPartitions(lambda i: [StatCounter(i)]).reduce(redFunc)
          File "/Users/davies/work/spark/python/pyspark/rdd.py", line 745, in 
reduce
            vals = self.mapPartitions(func).collect()
          File "/Users/davies/work/spark/python/pyspark/rdd.py", line 720, in 
collect
            return list(_load_from_socket(port, self._jrdd_deserializer))
          File "/Users/davies/work/spark/python/pyspark/rdd.py", line 120, in 
_load_from_socket
            for item in serializer.load_stream(rf):
          File "/Users/davies/work/spark/python/pyspark/serializers.py", line 
131, in load_stream
            yield self._read_with_length(stream)
          File "/Users/davies/work/spark/python/pyspark/serializers.py", line 
148, in _read_with_length
            length = read_int(stream)
          File "/Users/davies/work/spark/python/pyspark/serializers.py", line 
526, in read_int
            length = stream.read(4)
          File "//anaconda/lib/python2.7/socket.py", line 380, in read
            data = self._sock.recv(left)
        timeout: timed out
    **********************************************************************
    ```
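
    For reference, the client-side timeout mentioned above means setting a socket timeout in `_load_from_socket` on the Python side, so the blocking read in `read_int` fails fast instead of hanging when the JVM never serves the iterator. A rough sketch of the idea (not the exact change in this PR; the 3-second value and the connection details are illustrative):
    ```
    import socket

    def _load_from_socket(port, serializer):
        # Connect to the local port served by the JVM and stream the
        # serialized partitions back. A client-side timeout keeps the
        # Python process from blocking forever if the JVM side never
        # writes anything.
        sock = socket.socket()
        sock.settimeout(3)  # seconds; value is illustrative
        try:
            sock.connect(("localhost", port))
            rf = sock.makefile("rb", 65536)
            for item in serializer.load_stream(rf):
                yield item
        finally:
            sock.close()
    ```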

