[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API

2020-07-09 Thread GitBox


jorisvandenbossche commented on a change in pull request #7688:
URL: https://github.com/apache/arrow/pull/7688#discussion_r452158369



##
File path: python/pyarrow/fs.py
##
@@ -63,6 +63,32 @@ def __getattr__(name):
 )
 
 
+def _ensure_filesystem(filesystem):
+if isinstance(filesystem, FileSystem):
+return filesystem
+
+# handle fsspec-compatible filesystems
+try:
+import fsspec
+except ImportError:
+pass
+else:
+if isinstance(filesystem, fsspec.AbstractFileSystem):
+if type(filesystem).__name__ == 'LocalFileSystem':
+# In case its a simple LocalFileSystem, use native arrow one
+return LocalFileSystem()
+return PyFileSystem(FSSpecHandler(filesystem))
+
+# map old filesystems to new ones
+from pyarrow.filesystem import LocalFileSystem as LegacyLocalFileSystem
+
+if isinstance(filesystem, LegacyLocalFileSystem):
+return LocalFileSystem()
+# TODO handle HDFS?

Review comment:
   Do you have any idea about this one? The "legacy" one defined in 
`io-hdfs.pxi` is backed by the C++ `arrow::io::HadoopFileSystem`, and the new 
filesystem is backed by `arrow::fs::HadoopFileSystem` which itself holds a 
`arrow::io::HadoopFileSystem` as "client". 
   So naively I would say (with some additional methods), one should be able to 
make a new HadoopFileSystem by wrapping the C++ "client" of the legacy 
filesystem? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API

2020-07-09 Thread GitBox


jorisvandenbossche commented on a change in pull request #7688:
URL: https://github.com/apache/arrow/pull/7688#discussion_r452158369



##
File path: python/pyarrow/fs.py
##
@@ -63,6 +63,32 @@ def __getattr__(name):
 )
 
 
+def _ensure_filesystem(filesystem):
+if isinstance(filesystem, FileSystem):
+return filesystem
+
+# handle fsspec-compatible filesystems
+try:
+import fsspec
+except ImportError:
+pass
+else:
+if isinstance(filesystem, fsspec.AbstractFileSystem):
+if type(filesystem).__name__ == 'LocalFileSystem':
+# In case its a simple LocalFileSystem, use native arrow one
+return LocalFileSystem()
+return PyFileSystem(FSSpecHandler(filesystem))
+
+# map old filesystems to new ones
+from pyarrow.filesystem import LocalFileSystem as LegacyLocalFileSystem
+
+if isinstance(filesystem, LegacyLocalFileSystem):
+return LocalFileSystem()
+# TODO handle HDFS?

Review comment:
   @pitrou Do you have any idea about this one? The "legacy" one defined in 
`io-hdfs.pxi` is backed by the C++ `arrow::io::HadoopFileSystem`, and the new 
filesystem is backed by `arrow::fs::HadoopFileSystem` which itself holds a 
`arrow::io::HadoopFileSystem` as "client". 
   So naively I would say (with some additional methods), one should be able to 
make a new HadoopFileSystem by wrapping the C++ "client" of the legacy 
filesystem? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API

2020-07-09 Thread GitBox


jorisvandenbossche commented on a change in pull request #7688:
URL: https://github.com/apache/arrow/pull/7688#discussion_r452154974



##
File path: python/pyarrow/fs.py
##
@@ -63,6 +63,31 @@ def __getattr__(name):
 )
 
 
+def _ensure_filesystem(filesystem):
+if isinstance(filesystem, FileSystem):
+return filesystem
+
+# handle fsspec-compatible filesystems
+try:
+import fsspec
+if isinstance(filesystem, fsspec.AbstractFileSystem):
+if type(filesystem).__name__ == 'LocalFileSystem':
+# In case its a simple LocalFileSystem, use native arrow one
+return LocalFileSystem()
+return PyFileSystem(FSSpecHandler(filesystem))
+except ImportError:
+pass

Review comment:
   Ah, yes, that's better





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7688: ARROW-9383: [Python] Support fsspec filesystems in Dataset API

2020-07-09 Thread GitBox


jorisvandenbossche commented on a change in pull request #7688:
URL: https://github.com/apache/arrow/pull/7688#discussion_r452154905



##
File path: python/pyarrow/dataset.py
##
@@ -239,15 +240,18 @@ def _ensure_filesystem(fs_or_uri):
 )
 filesystem = SubTreeFileSystem(prefix, filesystem)
 return filesystem, is_local
-elif isinstance(fs_or_uri, (LocalFileSystem, _MockFileSystem)):
-return fs_or_uri, True
-elif isinstance(fs_or_uri, FileSystem):
-return fs_or_uri, False
-else:
+
+try:
+filesystem = _ensure_filesystem(fs_or_uri)

Review comment:
   Renamed the one in dataset.py





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org