Hi there,

The pyarrow dataset API is marked experimental, so I'm curious whether y'all have made any decisions on it for upcoming releases. Specifically, any thoughts on making the Scanner and things like FileSystemDataset part of the "public API" (i.e., putting declarations in the _dataset.pxd)?

It would make it a lot easier for new data formats to be built on top of the Arrow platform. For example, Lance supports efficient partial reads from S3 for limit/offset (via additional ScanOptions), but currently it's difficult to expose the scanner to the rest of Arrow. Instead, we subclass Dataset and return a custom scanner we created. Our Dataset subclass *should* really be a FileSystemDataset subclass, but FileSystemDataset is not "public API", and so on.

Happy to discuss additional details. For reference: github.com/eto-ai/lance

Thanks!
Chang
