Thanks, Micah. I was about to build from the `apache-arrow-7.0.0` tag
and flip ARROW_GCS=ON, but if there's additional logic that your PR
has to make things actually functional, I'll switch to testing your
PR. :-)

-dv


On Wed, Apr 13, 2022 at 12:52 PM Micah Kornfield <[email protected]> wrote:
>
> GCS wasn't hooked up properly in general in 7.0, but having it off by default
> would prevent this.
>
> https://github.com/apache/arrow/pull/12763 should fix this and align the GCS 
> filesystem closer to how others work in Arrow.
>
> On Wed, Apr 13, 2022 at 9:05 AM Dave Voutila <[email protected]> wrote:
>>
>> Hi user@,
>>
>> I'm working on a project using the 7.0.0 Java arrow-dataset library
>> and noticing that if I try to create a Dataset from a URI pointing to
>> a file in Google Cloud Storage (e.g. "gs://my-bucket/my-file.parquet"),
>> I get the following abbreviated stack trace (with my
>> bucket/object names redacted):
>>
>> java.lang.RuntimeException: Unrecognized filesystem type in URI: gs://<bucketname>/<filename>.parquet
>>     at org.apache.arrow.dataset.file.JniWrapper.makeFileSystemDatasetFactory(Native Method)
>>     at org.apache.arrow.dataset.file.FileSystemDatasetFactory.createNative(FileSystemDatasetFactory.java:35)
>>     at org.apache.arrow.dataset.file.FileSystemDatasetFactory.<init>(FileSystemDatasetFactory.java:31)
>>     ...
>>
>> Looking at the Arrow cpp source, the most likely culprit seems to be
>> that the libarrow_dataset_jni shared library included in
>> arrow-dataset-7.0.0.jar was built without GCS support.
>>
>> Is this a mistake or a known issue? Anyone know?
>>
>> Thanks,
>> Dave Voutila
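
For anyone following along, the build mentioned at the top of the thread (the `apache-arrow-7.0.0` tag with ARROW_GCS=ON) might look roughly like the sketch below. Only the tag name and the ARROW_GCS option come from this thread; the other CMake options, the shallow clone, and the directory layout are assumptions about what a dataset JNI build needs, and per Micah's note a 7.0.0 build may still not handle gs:// URIs without the changes in the linked PR.

```shell
# Sketch only: fetch the 7.0.0 release tag (shallow clone is an assumption).
git clone --branch apache-arrow-7.0.0 --depth 1 https://github.com/apache/arrow.git

# Configure the C++ build with GCS enabled. ARROW_GCS=ON is the flag from the
# thread; the dataset, Parquet, and JNI options are assumed to be needed for
# the library that backs the Java arrow-dataset module.
cmake -S arrow/cpp -B arrow/cpp/build \
  -DARROW_GCS=ON \
  -DARROW_DATASET=ON \
  -DARROW_PARQUET=ON \
  -DARROW_JNI=ON

# Build the configured targets.
cmake --build arrow/cpp/build
```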
