[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API
jorisvandenbossche commented on a change in pull request #7688:
URL: https://github.com/apache/arrow/pull/7688#discussion_r452158369

##
File path: python/pyarrow/fs.py
##
@@ -63,6 +63,32 @@ def __getattr__(name):
     )
 
 
+def _ensure_filesystem(filesystem):
+    if isinstance(filesystem, FileSystem):
+        return filesystem
+
+    # handle fsspec-compatible filesystems
+    try:
+        import fsspec
+    except ImportError:
+        pass
+    else:
+        if isinstance(filesystem, fsspec.AbstractFileSystem):
+            if type(filesystem).__name__ == 'LocalFileSystem':
+                # In case its a simple LocalFileSystem, use native arrow one
+                return LocalFileSystem()
+            return PyFileSystem(FSSpecHandler(filesystem))
+
+    # map old filesystems to new ones
+    from pyarrow.filesystem import LocalFileSystem as LegacyLocalFileSystem
+
+    if isinstance(filesystem, LegacyLocalFileSystem):
+        return LocalFileSystem()
+    # TODO handle HDFS?

Review comment:
@pitrou Do you have any idea about this one? The "legacy" one defined in `io-hdfs.pxi` is backed by the C++ `arrow::io::HadoopFileSystem`, and the new filesystem is backed by `arrow::fs::HadoopFileSystem`, which itself holds an `arrow::io::HadoopFileSystem` as "client". So naively I would say (with some additional methods), one should be able to make a new HadoopFileSystem by wrapping the C++ "client" of the legacy filesystem?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
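The wrapping idea in the comment above can be pictured with a small pure-Python sketch. Everything here is a hypothetical stand-in: `HdfsClient` plays the role of the C++ `arrow::io::HadoopFileSystem`, the two filesystem classes mimic the legacy and new Python wrappers, and `from_legacy` is an invented name for the "additional method" the comment alludes to. None of these exist in pyarrow under these names.

```python
class HdfsClient:
    """Hypothetical stand-in for the C++ arrow::io::HadoopFileSystem client."""

    def __init__(self, host, port):
        self.host = host
        self.port = port


class LegacyHadoopFileSystem:
    """Stand-in for the legacy filesystem, backed directly by the client."""

    def __init__(self, host, port):
        self.client = HdfsClient(host, port)


class NewHadoopFileSystem:
    """Stand-in for the new filesystem, which holds a client internally."""

    def __init__(self, client):
        self.client = client

    @classmethod
    def from_legacy(cls, legacy_fs):
        # Assumption: the legacy object can expose its client, so the new
        # filesystem reuses it instead of opening a second connection.
        return cls(legacy_fs.client)


legacy = LegacyHadoopFileSystem("namenode", 8020)
new = NewHadoopFileSystem.from_legacy(legacy)
```

In this sketch both objects share one client, which is the property that would let `_ensure_filesystem` map a legacy HDFS filesystem to a new one without reconnecting.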
[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API
jorisvandenbossche commented on a change in pull request #7688:
URL: https://github.com/apache/arrow/pull/7688#discussion_r452154974

##
File path: python/pyarrow/fs.py
##
@@ -63,6 +63,31 @@ def __getattr__(name):
     )
 
 
+def _ensure_filesystem(filesystem):
+    if isinstance(filesystem, FileSystem):
+        return filesystem
+
+    # handle fsspec-compatible filesystems
+    try:
+        import fsspec
+        if isinstance(filesystem, fsspec.AbstractFileSystem):
+            if type(filesystem).__name__ == 'LocalFileSystem':
+                # In case its a simple LocalFileSystem, use native arrow one
+                return LocalFileSystem()
+            return PyFileSystem(FSSpecHandler(filesystem))
+    except ImportError:
+        pass

Review comment:
Ah, yes, that's better
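For context, the difference between this hunk and the later revision of the same function is where the `except ImportError` sits. With `try`/`except`/`else`, only the `import fsspec` statement is guarded, so an `ImportError` raised while building the wrapper is not silently swallowed. Below is a minimal sketch of that pattern. The classes are hypothetical stand-ins for the pyarrow ones (`WrappedFileSystem` stands in for `PyFileSystem(FSSpecHandler(...))`), and the final `TypeError` is an assumption, since the hunk does not show what happens to unrecognized inputs.

```python
class FileSystem:
    """Hypothetical stand-in for pyarrow.fs.FileSystem."""


class LocalFileSystem(FileSystem):
    """Hypothetical stand-in for pyarrow.fs.LocalFileSystem."""


class WrappedFileSystem(FileSystem):
    """Hypothetical stand-in for PyFileSystem(FSSpecHandler(fs))."""

    def __init__(self, inner):
        self.inner = inner


def ensure_filesystem(filesystem):
    if isinstance(filesystem, FileSystem):
        return filesystem

    try:
        import fsspec  # optional dependency
    except ImportError:
        # fsspec is not installed, so the input cannot be an fsspec filesystem
        pass
    else:
        # Runs only if the import succeeded; errors raised here propagate
        if isinstance(filesystem, fsspec.AbstractFileSystem):
            if type(filesystem).__name__ == "LocalFileSystem":
                return LocalFileSystem()
            return WrappedFileSystem(filesystem)

    # Assumption: unrecognized inputs are rejected with TypeError
    raise TypeError(
        "Unrecognized filesystem: {!r}".format(type(filesystem))
    )
```

The sketch behaves the same for native filesystems and for unrecognized objects whether or not fsspec is installed; only the fsspec branch depends on the import.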
[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API
jorisvandenbossche commented on a change in pull request #7688:
URL: https://github.com/apache/arrow/pull/7688#discussion_r452154905

##
File path: python/pyarrow/dataset.py
##
@@ -239,15 +240,18 @@ def _ensure_filesystem(fs_or_uri):
                 )
             filesystem = SubTreeFileSystem(prefix, filesystem)
         return filesystem, is_local
-    elif isinstance(fs_or_uri, (LocalFileSystem, _MockFileSystem)):
-        return fs_or_uri, True
-    elif isinstance(fs_or_uri, FileSystem):
-        return fs_or_uri, False
-    else:
+
+    try:
+        filesystem = _ensure_filesystem(fs_or_uri)

Review comment:
Renamed the one in dataset.py