GitHub user dipk-mish7 created a discussion: pd.read_parquet("gs://...") fails
with CURL error 56 when HTTPS_PROXY is set, pyarrow 23 C++ GCS client ignores
NO_PROXY
We recently upgraded from pyarrow=12 to pyarrow=23 and started seeing this
error when reading parquet files from GCS in environments where HTTPS_PROXY is
set:
OSError: google::cloud::Status(UNAVAILABLE: Retry policy exhausted ...
PerformWork() - CURL error [56]=Failure when receiving data from the peer)
After investigating, we found that passing an empty storage_options dict lets
us read the file:
df = pd.read_parquet("gs://bucket/file.parquet", storage_options={})
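A rough sketch of how this workaround could be applied only when a proxy is configured. The helper name gcs_read_kwargs and the list of variables it checks are hypothetical. It also rests on an assumption we have not confirmed: that a non-None storage_options makes pandas open gs:// paths through fsspec/gcsfs instead of pyarrow's native GCS filesystem.

```python
import os

# Proxy variables checked by this sketch (an assumption; adjust as needed).
PROXY_VARS = ("HTTPS_PROXY", "https_proxy")


def gcs_read_kwargs(env=None):
    """Return extra keyword arguments for pd.read_parquet (hypothetical helper).

    Assumption: a non-None storage_options makes pandas open gs:// paths
    through fsspec/gcsfs rather than pyarrow's native GCS filesystem.
    """
    env = os.environ if env is None else env
    # Treat an unset or empty variable as "no proxy configured".
    if any(env.get(name) for name in PROXY_VARS):
        return {"storage_options": {}}
    return {}


# Usage (needs pandas and pyarrow, plus gcsfs if the assumption holds):
# df = pd.read_parquet("gs://bucket/file.parquet", **gcs_read_kwargs())
```

Without a proxy variable set, the helper returns no extra arguments, so the call behaves exactly like the plain pd.read_parquet("gs://...") form.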
Questions:
1. Is it a known limitation that the C++ GCS client ignores NO_PROXY?
2. Is there any way to make pd.read_parquet("gs://...") honour NO_PROXY without
requiring a code change (e.g. an env var or config option)?
3. Should storage_options={} be considered the recommended pattern going
forward on pyarrow 13+?
GitHub link: https://github.com/apache/arrow/discussions/49979