Github user davies commented on the pull request:
https://github.com/apache/spark/pull/5324#issuecomment-88746568
After testing for a while, it seems that the retry does not work, but the
timeout on the client side does help:
```
15/04/01 22:42:14 WARN PythonRDD: Timed out after 4 seconds, retry once
15/04/01 22:42:14 ERROR PythonRDD: Error while sending iterator
java.net.SocketTimeoutException: Accept timed out
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
    at java.net.ServerSocket.implAccept(ServerSocket.java:530)
    at java.net.ServerSocket.accept(ServerSocket.java:498)
    at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:624)
**********************************************************************
File "/Users/davies/work/spark/python/pyspark/rdd.py", line 1090, in __main__.RDD.variance
Failed example:
    sc.parallelize([1, 2, 3]).variance()
Exception raised:
    Traceback (most recent call last):
      File "//anaconda/lib/python2.7/doctest.py", line 1315, in __run
        compileflags, 1) in test.globs
      File "<doctest __main__.RDD.variance[0]>", line 1, in <module>
        sc.parallelize([1, 2, 3]).variance()
      File "/Users/davies/work/spark/python/pyspark/rdd.py", line 1093, in variance
        return self.stats().variance()
      File "/Users/davies/work/spark/python/pyspark/rdd.py", line 948, in stats
        return self.mapPartitions(lambda i: [StatCounter(i)]).reduce(redFunc)
      File "/Users/davies/work/spark/python/pyspark/rdd.py", line 745, in reduce
        vals = self.mapPartitions(func).collect()
      File "/Users/davies/work/spark/python/pyspark/rdd.py", line 720, in collect
        return list(_load_from_socket(port, self._jrdd_deserializer))
      File "/Users/davies/work/spark/python/pyspark/rdd.py", line 120, in _load_from_socket
        for item in serializer.load_stream(rf):
      File "/Users/davies/work/spark/python/pyspark/serializers.py", line 131, in load_stream
        yield self._read_with_length(stream)
      File "/Users/davies/work/spark/python/pyspark/serializers.py", line 148, in _read_with_length
        length = read_int(stream)
      File "/Users/davies/work/spark/python/pyspark/serializers.py", line 526, in read_int
        length = stream.read(4)
      File "//anaconda/lib/python2.7/socket.py", line 380, in read
        data = self._sock.recv(left)
    timeout: timed out
**********************************************************************
```
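For reference, a minimal sketch of what the client-side timeout could look like in `_load_from_socket`, based on the frames in the traceback above. The helper name, the `timeout` parameter, and the 3-second default are illustrative assumptions, not the actual patch; `serializer` is assumed to be a PySpark serializer exposing `load_stream`:

```python
import socket

def _load_from_socket_with_timeout(port, serializer, timeout=3.0):
    # Illustrative sketch only: connect to the local server socket that
    # the JVM side opened for this collect().
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A client-side timeout bounds every blocking read on this socket, so
    # if the JVM side never sends data, stream.read() raises socket.timeout
    # (the "timeout: timed out" in the log above) instead of hanging forever.
    sock.settimeout(timeout)
    try:
        sock.connect(("localhost", port))
        rf = sock.makefile("rb", 65536)
        # `serializer` is assumed to be a PySpark serializer with load_stream().
        for item in serializer.load_stream(rf):
            yield item
    finally:
        sock.close()
```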