Ruslan Dautkhanov created SPARK-26019:
-----------------------------------------
             Summary: pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()
                 Key: SPARK-26019
                 URL: https://issues.apache.org/jira/browse/SPARK-26019
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.0, 2.3.2
            Reporter: Ruslan Dautkhanov


This started happening after the 2.3.1 -> 2.3.2 upgrade.

{code:python}
Exception happened during processing of request from ('127.0.0.1', 43418)
----------------------------------------
Traceback (most recent call last):
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 318, in process_request
    self.finish_request(request, client_address)
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 652, in __init__
    self.handle()
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 263, in handle
    poll(authenticate_and_accum_updates)
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 238, in poll
    if func():
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 251, in authenticate_and_accum_updates
    received_token = self.rfile.read(len(auth_token))
TypeError: object of type 'NoneType' has no len()
{code}

The error happens here:
https://github.com/apache/spark/blob/cb90617f894fd51a092710271823ec7d1cd3a668/python/pyspark/accumulators.py#L254

The PySpark code was just running a simple pipeline of

    binary_rdd = sc.binaryRecords(full_file_path, record_length).map(lambda .. )

and then converting it to a dataframe and running a count on it.

The error seems to be flaky: on the next rerun it didn't happen.
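
For anyone reading along, the failing line can be reproduced outside Spark. This is a minimal sketch, assuming auth_token is None by the time the handler runs (the traceback suggests this, but the report does not establish why). The names read_token and read_token_guarded are hypothetical and are not PySpark functions; the guard is only an illustration of failing with a clearer message, not the actual fix.

```python
import io


def read_token(rfile, auth_token):
    # Mirrors the failing line from the traceback:
    # received_token = self.rfile.read(len(auth_token))
    return rfile.read(len(auth_token))


def read_token_guarded(rfile, auth_token):
    # Hypothetical guard (an assumption, not Spark's code): report a missing
    # token explicitly instead of letting len(None) raise a TypeError.
    if auth_token is None:
        raise ValueError("auth_token is None; no token to compare against")
    return rfile.read(len(auth_token))


# With a None token, the unguarded version fails the same way as in the report.
try:
    read_token(io.BytesIO(b"some-bytes-from-the-socket"), None)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

With a bytes token, both functions simply read that many bytes from the stream; only the None case differs.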