Saurabh Bajaj created ARROW-6150:
------------------------------------

             Summary: Intermittent Pyarrow HDFS IO error
                 Key: ARROW-6150
                 URL: https://issues.apache.org/jira/browse/ARROW-6150
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.14.1
            Reporter: Saurabh Bajaj


I'm running a Dask-YARN job that dumps a results dictionary into HDFS (code 
shown in traceback below) using PyArrow's HDFS IO library. The job fails 
intermittently with the error shown below: some runs succeed and others do not. 
I have been unable to determine the root cause.

 

{{
  File "/extractor.py", line 87, in __call__
    json.dump(results_dict, fp=_UTF8Encoder(f), indent=4)
  File "pyarrow/io.pxi", line 72, in pyarrow.lib.NativeFile.__exit__
  File "pyarrow/io.pxi", line 130, in pyarrow.lib.NativeFile.close
  File "pyarrow/error.pxi", line 87, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: HDFS CloseFile failed, errno: 255 (Unknown error 255) Please check that you are connecting to the correct HDFS RPC port
}}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
