xuanyuanking commented on a change in pull request #23470: 
[SPARK-26549][PySpark] Fix for python worker reuse take no effect for Python3
URL: https://github.com/apache/spark/pull/23470#discussion_r245604250
 
 

 ##########
 File path: python/pyspark/worker.py
 ##########
 @@ -446,7 +446,12 @@ def process():
         pickleSer._write_with_length((aid, accum._value), outfile)
 
     # check end of stream
-    if read_int(infile) == SpecialLengths.END_OF_STREAM:
+    res = read_int(infile)
+    if sys.version >= '3' and res == SpecialLengths.END_OF_DATA_SECTION:
 
 Review comment:
   Sorry for the mess @viirya, this has nothing to do with the specific Python version; 
it is just a bug in `parallelize(xrange(x))`. `xrange` and `range` take different 
code paths in `parallelize`: with `range`, `FramedSerializer` handles 
END_OF_DATA_SECTION, while with `xrange` the worker never enters `FramedSerializer`. 
And `range` is `xrange` in Python 3...
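
A minimal, self-contained sketch of the effect described above. It is not the actual PySpark code: the marker values are assumed to mirror `SpecialLengths`, and `load_framed`, `build_input`, and `worker_can_be_reused` are hypothetical stand-ins for the framed read loop and for the end-of-stream check in `worker.py`. The point is only that if nothing iterates the framed input, the END_OF_DATA_SECTION marker stays in the stream and is what the final check reads.

```python
import io
import struct

# Assumed to mirror pyspark's SpecialLengths; illustrative values only.
END_OF_DATA_SECTION = -1
END_OF_STREAM = -4


def write_int(value, stream):
    """Write a 4-byte big-endian signed int."""
    stream.write(struct.pack("!i", value))


def read_int(stream):
    """Read a 4-byte big-endian signed int, raising EOFError on a short read."""
    data = stream.read(4)
    if len(data) < 4:
        raise EOFError
    return struct.unpack("!i", data)[0]


def load_framed(stream):
    """Hypothetical stand-in for a framed deserializer: yield length-prefixed
    records and consume the END_OF_DATA_SECTION marker that terminates them."""
    while True:
        length = read_int(stream)
        if length == END_OF_DATA_SECTION:
            return
        yield stream.read(length)


def build_input(records):
    """Build the bytes a worker would see: records, then both markers."""
    buf = io.BytesIO()
    for record in records:
        write_int(len(record), buf)
        buf.write(record)
    write_int(END_OF_DATA_SECTION, buf)
    write_int(END_OF_STREAM, buf)
    buf.seek(0)
    return buf


def worker_can_be_reused(stream, consume_input):
    """Model of the end-of-stream check: reuse is possible only if the next
    int is END_OF_STREAM. If the input iterator is never consumed, the
    END_OF_DATA_SECTION marker is still pending and the check fails."""
    if consume_input:
        for _ in load_framed(stream):
            pass
    return read_int(stream) == END_OF_STREAM


# Path where the framed input is iterated: the marker is consumed.
print(worker_can_be_reused(build_input([b"a", b"bc"]), consume_input=True))   # True
# Path where the input iterator is ignored: the marker is left in the stream.
print(worker_can_be_reused(build_input([]), consume_input=False))             # False
```

In this model the second call fails the check because it reads END_OF_DATA_SECTION where END_OF_STREAM is expected, which is the situation the `res == SpecialLengths.END_OF_DATA_SECTION` branch in the diff is reacting to.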

