Hello Luke,

this is only partly implemented. You can do this, and I have already done it, but sadly it is not in a perfect state. boto3 itself seems to lack a proper file-like class. You can get the contents of a file in S3 as a StreamingBody (https://botocore.amazonaws.com/v1/documentation/api/latest/reference/response.html#botocore.response.StreamingBody), but that class sadly seems to be missing a seek method.

In my case, I accessed Parquet files on S3 with per-column access using the simplekv project. It implements a small file-like class on top of boto (but not boto3): https://github.com/mbr/simplekv/blob/master/simplekv/net/botostore.py#L93 . This is what you are looking for, just for the wrong boto package. Also, as far as I know, this implementation sadly leaks HTTP connections, so when you access too many files (even serially), your network will suffer.

Cheers
Uwe
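For illustration, here is a minimal sketch of the kind of seekable file-like class described above, written against boto3 instead of boto. It is not the simplekv code and not part of boto3 or pyarrow. `S3RangeReader` is a hypothetical name. The sketch assumes `client` is a boto3 S3 client (or any object whose `head_object` and `get_object` accept the same keyword arguments). Each `read()` turns into one ranged GET, so only the requested bytes leave S3. Each response body is closed after reading so that its HTTP connection is not left dangling.

```python
import io


class S3RangeReader(io.RawIOBase):
    """Hypothetical read-only, seekable view of a single S3 object.

    Assumes `client` behaves like a boto3 S3 client: head_object(Bucket, Key)
    returns {"ContentLength": ...} and get_object(Bucket, Key, Range) returns
    {"Body": <object with read() and close()>}.
    """

    def __init__(self, client, bucket, key):
        super().__init__()
        self._client = client
        self._bucket = bucket
        self._key = key
        self._pos = 0
        # One HEAD request up front so that seeking relative to the end works.
        self._size = client.head_object(Bucket=bucket, Key=key)["ContentLength"]

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def read(self, size=-1):
        # Negative or None size means "read to the end of the object".
        if size is None or size < 0:
            end = self._size
        else:
            end = min(self._pos + size, self._size)
        if self._pos >= end:
            return b""
        # HTTP byte ranges are inclusive on both ends.
        resp = self._client.get_object(
            Bucket=self._bucket,
            Key=self._key,
            Range=f"bytes={self._pos}-{end - 1}",
        )
        body = resp["Body"]
        try:
            data = body.read()
        finally:
            # Close every response body so the connection is not kept open.
            body.close()
        self._pos += len(data)
        return data

    def readinto(self, b):
        data = self.read(len(b))
        n = len(data)
        b[:n] = data
        return n
```

Untested usage idea: pass an instance to pyarrow, e.g. `pyarrow.parquet.ParquetFile(S3RangeReader(boto3.client("s3"), bucket, key)).read(columns=["a", "b"]).to_pandas()`, where `bucket`, `key` and the column names are placeholders. The Parquet reader should then fetch only the footer and the column chunks it needs. Because every small read becomes a separate request, wrapping the reader in `io.BufferedReader(..., buffer_size=...)` may reduce the number of round trips, at the cost of over-fetching a little.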
On Thu, Oct 11, 2018, at 8:01 PM, Luke wrote:
> I have parquet files (each self contained) in S3 and I want to read
> certain columns into a pandas dataframe without reading the entire
> object out of S3.
>
> Is this implemented? boto3 in python supports reading from offsets in
> an S3 object but I wasn't sure anyone has made that work with a
> parquet file corresponding to certain columns?
>
> thanks,
> Luke
