Hi,

According to the docs
https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html
it's possible to define metadata for files in S3 object storage using the
'default_metadata' argument of pyarrow.fs.S3FileSystem().

Experimentally this works only for certain keys:

Some standard keys like 'Content-Type', 'Content-Language' or 'Expires' do 
work, but others like 'ACL', 'foo' or 'x-amz-meta-thud' don't.

It seems that Arrow only passes a small set of keys on to S3 and silently
drops everything else.
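
For reference, here is roughly what I'm doing (bucket name, region and the
metadata values are placeholders):

    import pyarrow.fs

    s3 = pyarrow.fs.S3FileSystem(
        region="eu-central-1",
        default_metadata={
            "Content-Type": "text/plain",      # shows up on the object
            "x-amz-meta-thud": "some value",   # silently dropped
        },
    )

    with s3.open_output_stream("my-bucket/test.txt") as f:
        f.write(b"hello")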

The docs of open_append_stream() say 'Unsupported metadata keys will be
ignored.'

What does 'unsupported' mean here - unsupported by PyArrow or unsupported by
the S3 implementation?

How can Arrow know which keys are supported by the implementation (and, yes,
there are other S3 implementations besides the original AWS one)?
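
The same question applies when passing metadata per file instead of via
'default_metadata', e.g. (same placeholder bucket as above):

    with s3.open_output_stream(
        "my-bucket/test.txt",
        metadata={
            "Content-Language": "en",          # standard key
            "x-amz-meta-thud": "some value",   # user-defined key
        },
    ) as f:
        f.write(b"hello")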

According to the AWS docs 
https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html it 
should be possible to use 'User-defined object metadata' with keys that start 
with 'x-amz-meta-'.

It would be nice to be able to set such user-defined object metadata using
PyArrow.

(Yes, with other tools user-defined object metadata works like a charm for
me, so it's not a limitation of S3 itself.)
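
For example, with boto3 (same placeholder bucket and key) the user-defined
metadata round-trips without any problem:

    import boto3

    s3_client = boto3.client("s3")
    s3_client.put_object(
        Bucket="my-bucket",
        Key="test.txt",
        Body=b"hello",
        Metadata={"thud": "some value"},   # sent as the x-amz-meta-thud header
    )
    print(s3_client.head_object(Bucket="my-bucket", Key="test.txt")["Metadata"])
    # -> {'thud': 'some value'}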

Am I missing anything?

Regards,

elveshoern32