Thanks, Micah. I was about to build from the `apache-arrow-7.0.0` tag and flip `ARROW_GCS=ON`, but if your PR adds logic needed to make GCS actually work, I'll switch to testing your PR instead. :-)
-dv

On Wed, Apr 13, 2022 at 12:52 PM Micah Kornfield wrote:
>
> GCS wasn't hooked up properly in general in 7.0 but having it off by default
> would prevent this.
>
> https://github.com/apache/arrow/pull/12763 should fix this and align the GCS
> filesystem closer to how others work in Arrow.
>
> On Wed, Apr 13, 2022 at 9:05 AM Dave Voutila wrote:
>>
>> Hi user@,
>>
>> I'm working on a project using the 7.0.0 Java arrow-dataset library
>> and noticing that if I try to create a DataSet from a uri pointing to
>> a file in Google Cloud Storage (e.g. "gs://my-bucket/my-file.parquet")
>> I'm getting the following abbreviated stacktrace (with my
>> bucket/object names redacted):
>>
>> java.lang.RuntimeException: Unrecognized filesystem type in URI: gs://<bucketname>/<filename>.parquet
>>   at org.apache.arrow.dataset.file.JniWrapper.makeFileSystemDatasetFactory(Native Method)
>>   at org.apache.arrow.dataset.file.FileSystemDatasetFactory.createNative(FileSystemDatasetFactory.java:35)
>>   at org.apache.arrow.dataset.file.FileSystemDatasetFactory.<init>(FileSystemDatasetFactory.java:31)
>>   ...
>>
>> Looking at the Arrow cpp source, it seems the most likely culprit is
>> the included libarrow_dataset_jni shared library in the
>> arrow-dataset-7.0.0.jar was built without GCS support.
>>
>> Is this a mistake or a known issue? Anyone know?
>>
>> Thanks,
>> Dave Voutila
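For anyone else hitting this error: the message is raised on the native side (inside `JniWrapper.makeFileSystemDatasetFactory`), which chooses a filesystem implementation based on the URI scheme. The Java sketch below is a simplified, hypothetical model of that dispatch, not Arrow's actual code. The class name `SchemeDispatchSketch`, the method `resolveFileSystem`, and the set of compiled-in schemes are all assumptions made for illustration. It only shows why a native library built without GCS support rejects a `gs://` URI with this kind of error.

```java
import java.net.URI;
import java.util.Set;

// Hypothetical model of scheme-based filesystem dispatch; NOT Arrow's real code.
public class SchemeDispatchSketch {
    // Assumed set of schemes a native build was compiled with. In a build
    // without GCS support, "gs" is simply missing from this set.
    static final Set<String> COMPILED_IN_SCHEMES = Set.of("file", "s3", "hdfs");

    // Extracts the URI scheme and rejects any scheme the build doesn't know,
    // producing an error shaped like the one in the stack trace above.
    static String resolveFileSystem(String uriString) {
        String scheme = URI.create(uriString).getScheme();
        if (scheme == null || !COMPILED_IN_SCHEMES.contains(scheme)) {
            throw new RuntimeException("Unrecognized filesystem type in URI: " + uriString);
        }
        return scheme;
    }

    public static void main(String[] args) {
        // A local file URI resolves because "file" is compiled in.
        System.out.println(resolveFileSystem("file:///tmp/data.parquet"));
        try {
            // A GCS URI fails because "gs" is absent from the assumed set.
            resolveFileSystem("gs://my-bucket/my-file.parquet");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, rebuilding with GCS enabled amounts to adding `gs` to the set of known schemes. Per Micah's reply, though, 7.0 also lacked the wiring behind that scheme, so flipping the build flag alone may not be enough.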
