This is an automated email from the ASF dual-hosted git repository.

westonpace pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git
The following commit(s) were added to refs/heads/main by this push:
     new 9750a64  Adding anonymous flag to s3 (#70)
9750a64 is described below

commit 9750a6402436f0379a9a7bde4184076c615f5a93
Author: Tomek Drabas <draba...@gmail.com>
AuthorDate: Fri Sep 10 16:46:18 2021 -0700

    Adding anonymous flag to s3 (#70)

    * Adding anonymous flag to s3

    * Fixing missing comma

    * Info about s3 credentials
---
 python/source/io.rst | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/python/source/io.rst b/python/source/io.rst
old mode 100644
new mode 100755
index 2c1fd82..db03d74
--- a/python/source/io.rst
+++ b/python/source/io.rst
@@ -394,7 +394,10 @@ partitioned data coming from remote sources like S3 or HDFS.
     from pyarrow import fs

     # List content of s3://ursa-labs-taxi-data/2011
-    s3 = fs.SubTreeFileSystem("ursa-labs-taxi-data", fs.S3FileSystem(region="us-east-2"))
+    s3 = fs.SubTreeFileSystem(
+        "ursa-labs-taxi-data",
+        fs.S3FileSystem(region="us-east-2", anonymous=True)
+    )
     for entry in s3.get_file_info(fs.FileSelector("2011", recursive=True)):
         if entry.type == fs.FileType.File:
             print(entry.path)
@@ -419,7 +422,7 @@ by ``month`` using

 .. testcode::

-    dataset = ds.dataset("s3://ursa-labs-taxi-data/2011",
+    dataset = ds.dataset("s3://ursa-labs-taxi-data/2011",
                          partitioning=["month"])
     for f in dataset.files[:10]:
         print(f)
@@ -447,6 +450,27 @@ or :meth:`pyarrow.dataset.Dataset.to_batches` like you would for a local one.
     It is possible to load partitioned data also in the ipc arrow
     format or in feather format.

+.. warning::
+
+    If the above code throws an error most likely the reason is your
+    AWS credentials are not set. Follow these instructions to get
+    ``AWS Access Key Id`` and ``AWS Secret Access Key``:
+    `AWS Credentials <https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html>`_.
+
+    The credentials are normally stored in ``~/.aws/credentials`` (on Mac or Linux)
+    or in ``C:\Users\<USERNAME>\.aws\credentials`` (on Windows) file.
+    You will need to either create or update this file in the appropriate location.
+
+    The contents of the file should look like this:
+
+    .. code-block:: bash
+
+        [default]
+        aws_access_key_id=<YOUR_AWS_ACCESS_KEY_ID>
+        aws_secret_access_key=<YOUR_AWS_SECRET_ACCESS_KEY>
+
+
+
 Write a Feather file
 ====================
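
The credentials file described in the warning above uses an INI-style layout, so its structure can be checked before retrying a failing S3 call. The following is only a minimal sketch, not part of the commit: the helper name `missing_credential_keys` and the `SAMPLE` text are hypothetical, and the check assumes the two keys shown in the warning are the ones that matter. It uses only the Python standard library and does not contact AWS.

```python
import configparser

# Keys that the warning in the diff says the credentials file should contain.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credential_keys(text, profile="default"):
    """Return the required keys that are absent or empty in `profile`.

    `text` is the content of a credentials file. The function name and
    the `profile` parameter are illustrative assumptions.
    """
    # Interpolation is disabled so that a '%' in a value is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [
        key
        for key in REQUIRED_KEYS
        if not parser.get(profile, key, fallback="").strip()
    ]


# Placeholder content mirroring the layout shown in the warning.
SAMPLE = """\
[default]
aws_access_key_id=<YOUR_AWS_ACCESS_KEY_ID>
aws_secret_access_key=<YOUR_AWS_SECRET_ACCESS_KEY>
"""

print(missing_credential_keys(SAMPLE))  # prints []
```

To check a real file, the text could be read with `pathlib.Path.home() / ".aws" / "credentials"` and passed to the same function. This only inspects the file layout. Credentials can also be supplied through other means, such as environment variables, and the `anonymous=True` flag added in this commit avoids the need for credentials when reading a public bucket.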