Sorry for the slow response, everyone. Thanks for the discussion and the help. 
I think I have tracked down the issue, and it most likely has nothing to do 
with PyArrow. 

Recently I had to implement a workaround due to a bug introduced in fsspec: 
https://github.com/fsspec/gcsfs/issues/404

This involved using a custom class to fix a directory creation issue, and this 
correlates with when I started having problems. This post on the GitHub issue 
appears to be relevant: 

“Note that GCS does not have any directories below buckets. The online console 
and gsutils emulate buckets by using zero-length files, but they are not really 
directories. On the other hand, you can create any key without first making 
directories, and the intervening implied directories will be implicitly 
inferred.”

So, given this info, I believe that the workaround I used is creating a 
zero-length file in the root “directory” called “/”. That said, calling 
“isfile” on “/” still returns False, but nonetheless I have a hunch that 
it’s related to this workaround. 
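
As a rough illustration of how such placeholder objects could be spotted, here 
is a minimal sketch. It assumes listing entries shaped like the dicts fsspec 
uses for detailed listings ("name", "size", "type"). The function name, the 
bucket name, and the sample entries are hypothetical and made up for this 
example; only the "hires-sonde/" key comes from this thread.

```python
def find_placeholders(entries):
    """Return names of zero-length objects that look like emulated directories.

    `entries` is an iterable of dicts with "name", "size" and "type" keys.
    That shape is an assumption here; check what your gcsfs version returns.
    """
    suspects = []
    for info in entries:
        name = info.get("name", "")
        is_empty_file = info.get("type") == "file" and info.get("size") == 0
        # A zero-length object whose key ends in "/" (or is only "/") is the
        # usual convention for a mock directory.
        if is_empty_file and name.endswith("/"):
            suspects.append(name)
    return suspects


# Hypothetical listing, invented for illustration only.
listing = [
    {"name": "my-bucket/hires-sonde/", "size": 0, "type": "file"},
    {"name": "my-bucket/hires-sonde/year=2021/part-0.parquet",
     "size": 1024, "type": "file"},
    {"name": "my-bucket/hires-sonde/year=2021", "size": 0, "type": "directory"},
]
print(find_placeholders(listing))
```

Against a real bucket, the entries could presumably come from something like 
the values of fs.find(path, detail=True) on a gcsfs filesystem, though gcsfs 
may normalize trailing slashes in the names it reports, so that would need 
checking before relying on it.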

I’m not exactly sure how to remedy this. I’d prefer to not have to re-upload 
and process the dataset, so I’m going to look into manually fixing the bucket. 
I’m also happy to hear any thoughts or suggestions on how to fix this.

Kelton.

> On Feb 23, 2022, at 4:20 PM, Micah Kornfield <[email protected]> wrote:
> 
> 
> > You might also try the GCS filesystem (released with 7.0.0) instead of
> > going through fsspec.
> 
> I don't think the native GCS filesystem support is complete in 7.0.0, but 
> if you are willing to compile from the latest commit in the repo, it might 
> be usable.
> 
>> On Wed, Feb 23, 2022 at 11:41 AM Weston Pace <[email protected]> wrote:
>> I'm pretty sure GCS is similar to S3 in that there is no such thing as
>> a "directory".  Instead a directory is often emulated by an empty
>> file.  Note that the single file being detected is hires-sonde/ (with
>> a trailing slash).  I'm pretty sure this is the convention for
>> creating mock directories.  I'm guessing, if there were multiple
>> files, we would work ok because we just skip the empty files.
>> 
>> So perhaps this is a problem unique to gcsfs/fsspec and trying to read
>> an "empty directory".
>> 
>> You might also try the GCS filesystem (released with 7.0.0) instead of
>> going through fsspec.
>> 
>> On Wed, Feb 23, 2022 at 2:23 AM Joris Van den Bossche
>> <[email protected]> wrote:
>> >
>> >
>> > On Mon, 21 Feb 2022 at 00:04, Kelton Halbert <[email protected]> wrote:
>> >>
>> >> Hello,
>> >>
>> >> I’ve been learning and working with PyArrow recently for a project to 
>> >> store some atmospheric science data as part of a partitioned dataset, and 
>> >> recently the dataset class with the fsspec/gcsfs filesystem has started 
>> >> producing a new error.
>> >
>> >
>> > Hi Kelton,
>> >
>> > One more question: you say that this started producing a new error, so I 
>> > suppose this worked a while ago? Do you know if you updated some packages 
>> > (eg gcsfs or fsspec) since then? Or something else that might have changed?
>> >
>> > Joris
>> >
