[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16222196#comment-16222196 ]

ASF GitHub Bot commented on ARROW-1555:
---

benjigoldberg commented on issue #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#issuecomment-339946849

@wesm my username on JIRA is `benjigoldberg`

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> [Python] write_to_dataset on s3
> ---
>
>              Key: ARROW-1555
>              URL: https://issues.apache.org/jira/browse/ARROW-1555
>          Project: Apache Arrow
>       Issue Type: Bug
> Affects Versions: 0.7.0
>         Reporter: Young-Jun Ko
>         Assignee: Florian Jetter
>         Priority: Trivial
>           Labels: pull-request-available
>          Fix For: 0.8.0
>
>
> When writing an Arrow table to S3, I get a NotImplementedError.
> The root cause is in _ensure_filesystem and can be reproduced as follows:
>
>     import pyarrow
>     import pyarrow.parquet as pqa
>     import s3fs
>
>     s3 = s3fs.S3FileSystem()
>     pqa._ensure_filesystem(s3).exists("anything")
>
> It appears that the S3FSWrapper that is instantiated in _ensure_filesystem
> does not expose the exists method of s3.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
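The reported failure can be reduced to a runnable sketch that needs no S3 credentials: a base filesystem class whose `exists()` raises `NotImplementedError`, and a wrapper subclass that forgets to override it. The class names below are illustrative stand-ins, not pyarrow's actual classes.

```python
# Reduced sketch of the failure mode in the report (hypothetical names):
# the wrapper inherits the base class's exists(), which only raises.

class FileSystem:
    def exists(self, path):
        raise NotImplementedError

class IncompleteWrapper(FileSystem):
    pass  # no exists() override, like the S3FSWrapper in the report

try:
    IncompleteWrapper().exists("anything")
    outcome = "no error"
except NotImplementedError:
    outcome = "NotImplementedError"

print(outcome)  # NotImplementedError
```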
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221637#comment-16221637 ]

ASF GitHub Bot commented on ARROW-1555:
---

wesm commented on issue #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#issuecomment-339859299

@benjigoldberg thanks for your contribution. Could you let me know your JIRA id (or create an ID if you don't have one) so I can assign this issue to you on https://issues.apache.org/jira/browse/ARROW-1555?
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220999#comment-16220999 ]

ASF GitHub Bot commented on ARROW-1555:
---

wesm commented on issue #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#issuecomment-339761765

Rebased and fixed flake8 warnings
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220597#comment-16220597 ]

ASF GitHub Bot commented on ARROW-1555:
---

benjigoldberg commented on a change in pull request #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r147173669

## File path: python/pyarrow/filesystem.py

@@ -135,6 +135,12 @@ def isfile(self, path):
         """
         raise NotImplementedError

+    def isfilestore(self):

Review comment: updated
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219913#comment-16219913 ]

ASF GitHub Bot commented on ARROW-1555:
---

wesm commented on issue #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#issuecomment-339542169

Build looks ok, the failure is unrelated (I restarted the failing job anyway so we can get a green build)
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219911#comment-16219911 ]

ASF GitHub Bot commented on ARROW-1555:
---

wesm commented on a change in pull request #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r147039732

## File path: python/pyarrow/filesystem.py

@@ -135,6 +135,12 @@ def isfile(self, path):
         """
         raise NotImplementedError

+    def isfilestore(self):

Review comment: Can you make this a private API (`_isfilestore`)? Unclear if normal users would need this
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215670#comment-16215670 ]

ASF GitHub Bot commented on ARROW-1555:
---

benjigoldberg commented on a change in pull request #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146362868

## File path: python/pyarrow/parquet.py

@@ -985,7 +985,7 @@ def write_to_dataset(table, root_path, partition_cols=None,
     else:
         fs = _ensure_filesystem(filesystem)

-    if not fs.exists(root_path):
+    if not fs.exists(root_path) and fs.isfilestore():

Review comment: Yes, great point! Updated.
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215665#comment-16215665 ]

ASF GitHub Bot commented on ARROW-1555:
---

johnjiang commented on a change in pull request #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146362289

## File path: python/pyarrow/parquet.py

@@ -985,7 +985,7 @@ def write_to_dataset(table, root_path, partition_cols=None,
     else:
         fs = _ensure_filesystem(filesystem)

-    if not fs.exists(root_path):
+    if not fs.exists(root_path) and fs.isfilestore():

Review comment: Would be better to switch the checks around:

```
if fs.isfilestore() and not fs.exists(root_path):
```

That way we don't do an unnecessary `exists()` check if it's not a file store.
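The reordering suggested above relies on Python's `and` short-circuiting left to right: with the cheap capability check first, the remote `exists()` call (a network round-trip on real S3) is never made for object stores. A minimal sketch, using a hypothetical `FakeS3FS` stand-in rather than pyarrow's wrapper:

```python
# Record which methods run when the cheap check comes first.
calls = []

class FakeS3FS:
    def isfilestore(self):
        calls.append("isfilestore")
        return False  # object store: no real directories

    def exists(self, path):
        calls.append("exists")  # would hit the network on real S3
        return False

fs = FakeS3FS()
if fs.isfilestore() and not fs.exists("bucket/dataset"):
    pass  # a local filesystem would mkdir here

print(calls)  # exists() was short-circuited away
```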
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215465#comment-16215465 ]

ASF GitHub Bot commented on ARROW-1555:
---

benjigoldberg commented on a change in pull request #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146332399

## File path: python/pyarrow/filesystem.py

@@ -251,6 +251,10 @@ def isfile(self, path):

     def delete(self, path, recursive=False):
         return self.fs.rm(path, recursive=recursive)

+    @implements(FileSystem.exists)
+    def exists(self, path):
+        return os.path.exists(path)

Review comment: Oh that's interesting @johnjiang. I will test this with S3 as @wesm suggested and see if I can come up with a solution that is satisfactory.
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215413#comment-16215413 ]

ASF GitHub Bot commented on ARROW-1555:
---

wesm commented on a change in pull request #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146323847

## File path: python/pyarrow/filesystem.py

@@ -251,6 +251,10 @@ def isfile(self, path):

     def delete(self, path, recursive=False):
         return self.fs.rm(path, recursive=recursive)

+    @implements(FileSystem.exists)
+    def exists(self, path):
+        return os.path.exists(path)

Review comment: Yeah, I did not implement exists because of this issue. This needs to be tested against S3 before being merged
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215401#comment-16215401 ]

ASF GitHub Bot commented on ARROW-1555:
---

johnjiang commented on a change in pull request #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146322137

## File path: python/pyarrow/filesystem.py

@@ -251,6 +251,10 @@ def isfile(self, path):

     def delete(self, path, recursive=False):
         return self.fs.rm(path, recursive=recursive)

+    @implements(FileSystem.exists)
+    def exists(self, path):
+        return os.path.exists(path)

Review comment: I wrote my own wrapper to get around this issue and I realised that pyarrow would try and create a directory if `exists` returns False. This is an issue on s3 since there's no concept of directories. So what ends up happening is that there's a duplicate file with the same name as the directory. You almost need to remove the `exists()` check for the s3 filesystem.
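The duplicate-file symptom described above can be illustrated with a toy flat key/value store standing in for an S3 bucket (all names hypothetical): keys have no hierarchy, so the only thing "mkdir" can mean is writing a zero-byte object whose key then sits beside the dataset's real objects.

```python
# Toy object store: a flat mapping of key -> bytes, like an S3 bucket.
store = {}

def mkdir(key):
    store[key] = b""  # the only possible "directory" on a flat store

def write(key, data):
    store[key] = data

# Mimic the unguarded exists()/mkdir path from the unpatched code:
if "bucket/dataset" not in store:
    mkdir("bucket/dataset")
write("bucket/dataset/part-0.parquet", b"data")

# The spurious zero-byte "directory" object sits beside the data file:
print(sorted(store))
```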
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215371#comment-16215371 ]

ASF GitHub Bot commented on ARROW-1555:
---

benjigoldberg commented on a change in pull request #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146318772

## File path: python/pyarrow/filesystem.py

@@ -251,6 +251,10 @@ def isfile(self, path):

     def delete(self, path, recursive=False):
         return self.fs.rm(path, recursive=recursive)

+    @implements(FileSystem.exists)
+    def exists(self, path):
+        return os.path.exists(path)

Review comment: @wesm yes 🤦‍♂️
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215362#comment-16215362 ]

ASF GitHub Bot commented on ARROW-1555:
---

wesm commented on a change in pull request #1240: ARROW-1555 [Python] Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146317602

## File path: python/pyarrow/filesystem.py

@@ -251,6 +251,10 @@ def isfile(self, path):

     def delete(self, path, recursive=False):
         return self.fs.rm(path, recursive=recursive)

+    @implements(FileSystem.exists)
+    def exists(self, path):
+        return os.path.exists(path)

Review comment: Does this need to be `self.fs.exists(path)`?
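The distinction behind this review question: `os.path.exists()` consults the local disk, so a wrapper around a remote filesystem must delegate to the wrapped object's `exists()` instead. A minimal sketch, where `Wrapper` and `FakeRemoteFS` are illustrative stand-ins rather than pyarrow classes:

```python
import os

class FakeRemoteFS:
    """Pretends to be a remote store that knows about one key."""
    def __init__(self, keys):
        self.keys = set(keys)

    def exists(self, path):
        return path in self.keys

class Wrapper:
    def __init__(self, fs):
        self.fs = fs

    def exists(self, path):
        # Delegate to the wrapped filesystem; os.path.exists(path)
        # would check the local disk and be wrong for remote paths.
        return self.fs.exists(path)

w = Wrapper(FakeRemoteFS({"bucket/key"}))
print(w.exists("bucket/key"))        # True: the remote store has it
print(os.path.exists("bucket/key"))  # False: no such local path
```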
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215275#comment-16215275 ]

Benjamin Goldberg commented on ARROW-1555:
---

I created a PR to resolve this issue: https://github.com/apache/arrow/pull/1240
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172932#comment-16172932 ]

Young-Jun Ko commented on ARROW-1555:
---

I think the simplest way to fix this would be to just expose the fs functions implemented by `s3fs`, `exists` being one of them. I suppose that's what Florian had in mind. Thanks guys for looking into this!
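One minimal way to "expose the fs functions implemented by s3fs" would be attribute delegation via `__getattr__`, sketched below with a hypothetical `DelegatingWrapper` and `FakeS3FS`. This only illustrates the idea; the PR on this thread instead added explicit `@implements`-decorated wrapper methods.

```python
class DelegatingWrapper:
    def __init__(self, fs):
        self._fs = fs

    def __getattr__(self, name):
        # Fall through to the wrapped filesystem for anything the
        # wrapper itself does not define (exists, ls, rm, ...).
        return getattr(self._fs, name)

class FakeS3FS:
    """Stand-in for s3fs.S3FileSystem with one known key."""
    def exists(self, path):
        return path == "bucket/present"

w = DelegatingWrapper(FakeS3FS())
print(w.exists("bucket/present"))  # True
print(w.exists("bucket/absent"))   # False
```

The trade-off is discoverability: delegation forwards everything implicitly, while explicit wrapper methods keep the supported surface visible and documentable.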