GCS wasn't hooked up properly in general in 7.0, but having it off by default in the build would also prevent this from working.
https://github.com/apache/arrow/pull/12763 should fix this and align the GCS filesystem closer to how others work in Arrow.

On Wed, Apr 13, 2022 at 9:05 AM Dave Voutila <[email protected]> wrote:

> Hi user@,
>
> I'm working on a project using the 7.0.0 Java arrow-dataset library
> and noticing that if I try to create a DataSet from a uri pointing to
> a file in Google Cloud Storage (e.g. "gs://my-bucket/my-file.parquet")
> I'm getting the following abbreviated stacktrace (with my
> bucket/object names redacted):
>
> java.lang.RuntimeException: Unrecognized filesystem type in URI: gs://<bucketname>/<filename>.parquet
>     at org.apache.arrow.dataset.file.JniWrapper.makeFileSystemDatasetFactory(Native Method)
>     at org.apache.arrow.dataset.file.FileSystemDatasetFactory.createNative(FileSystemDatasetFactory.java:35)
>     at org.apache.arrow.dataset.file.FileSystemDatasetFactory.<init>(FileSystemDatasetFactory.java:31)
>     ...
>
> Looking at the Arrow cpp source, it seems the most likely culprit is
> the included libarrow_dataset_jni shared library in the
> arrow-dataset-7.0.0.jar was built without GCS support.
>
> Is this a mistake or a known issue? Anyone know?
>
> Thanks,
> Dave Voutila
