GCS wasn't hooked up properly in general in 7.0, but having it off by
default in the build would also prevent this from working.

https://github.com/apache/arrow/pull/12763 should fix this and align the
GCS filesystem closer to how others work in Arrow.
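
As a stopgap, one option is to check the URI scheme before handing it to
FileSystemDatasetFactory, so an unsupported scheme fails with a clearer
message than the JNI error. This is a minimal sketch using only the JDK:
the class name, the method names, and the list of schemes are hypothetical,
and the schemes a given build accepts depend on how libarrow_dataset_jni
was compiled.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

public class UriSchemeCheck {
    // Assumption: illustrative list only. Adjust it to match the filesystems
    // your native Arrow build was actually compiled with.
    static final Set<String> ASSUMED_SUPPORTED = Set.of("file", "hdfs", "s3");

    // Returns the lower-cased scheme of the URI, or "" if it has none.
    static String schemeOf(String uri) {
        String scheme = URI.create(uri).getScheme();
        return scheme == null ? "" : scheme.toLowerCase(Locale.ROOT);
    }

    // True if the scheme is in the assumed-supported list above.
    static boolean isLikelySupported(String uri) {
        return ASSUMED_SUPPORTED.contains(schemeOf(uri));
    }

    public static void main(String[] args) {
        String uri = "gs://my-bucket/my-file.parquet";
        if (!isLikelySupported(uri)) {
            System.out.println("Scheme '" + schemeOf(uri)
                    + "' is probably not supported by this build: " + uri);
        }
    }
}
```

Calling isLikelySupported(uri) before constructing the factory lets the
application report the problem itself instead of surfacing the native
RuntimeException.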

On Wed, Apr 13, 2022 at 9:05 AM Dave Voutila <[email protected]> wrote:

> Hi user@,
>
> I'm working on a project using the 7.0.0 Java arrow-dataset library
> and noticing that if I try to create a Dataset from a URI pointing to
> a file in Google Cloud Storage (e.g. "gs://my-bucket/my-file.parquet")
> I'm getting the following abbreviated stacktrace (with my
> bucket/object names redacted):
>
> java.lang.RuntimeException: Unrecognized filesystem type in URI: gs://<bucketname>/<filename>.parquet
>     at org.apache.arrow.dataset.file.JniWrapper.makeFileSystemDatasetFactory(Native Method)
>     at org.apache.arrow.dataset.file.FileSystemDatasetFactory.createNative(FileSystemDatasetFactory.java:35)
>     at org.apache.arrow.dataset.file.FileSystemDatasetFactory.<init>(FileSystemDatasetFactory.java:31)
>     ...
>
> Looking at the Arrow C++ source, the most likely culprit seems to be
> that the libarrow_dataset_jni shared library included in
> arrow-dataset-7.0.0.jar was built without GCS support.
>
> Is this a mistake or a known issue? Anyone know?
>
> Thanks,
> Dave Voutila
>
