[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222196#comment-16222196
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

benjigoldberg commented on issue #1240: ARROW-1555 [Python] Implement Dask 
exists function
URL: https://github.com/apache/arrow/pull/1240#issuecomment-339946849
 
 
   @wesm my username on JIRA is `benjigoldberg`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] write_to_dataset on s3
> ---
>
> Key: ARROW-1555
> URL: https://issues.apache.org/jira/browse/ARROW-1555
> Project: Apache Arrow
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Young-Jun Ko
> Assignee: Florian Jetter
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 0.8.0
>
>
> When writing an Arrow table to S3, I get a NotImplementedError.
> The root cause is in _ensure_filesystem and can be reproduced as follows:
> import pyarrow
> import pyarrow.parquet as pqa
> import s3fs
> s3 = s3fs.S3FileSystem()
> pqa._ensure_filesystem(s3).exists("anything")
> It appears that the S3FSWrapper that is instantiated in _ensure_filesystem
> does not expose the exists method of s3.
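For reference, the reproduction above as a standalone script. This is a sketch that assumes s3fs is installed; the NotImplementedError comes from the base FileSystem stub the wrapper falls back to (the same kind of `raise NotImplementedError` stub visible in the review diffs below).

```python
import s3fs
import pyarrow.parquet as pqa

# Reproduction of the report above (pyarrow 0.7.0): the wrapper returned by
# _ensure_filesystem does not forward exists() to s3fs, so the call falls
# through to the base-class stub and raises.
s3 = s3fs.S3FileSystem()
wrapped = pqa._ensure_filesystem(s3)

try:
    wrapped.exists("anything")
except NotImplementedError:
    print("exists() is not implemented on the S3 wrapper")
```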



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221637#comment-16221637
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

wesm commented on issue #1240: ARROW-1555 [Python] Implement Dask exists 
function
URL: https://github.com/apache/arrow/pull/1240#issuecomment-339859299
 
 
   @benjigoldberg thanks for your contribution. Could you let me know your JIRA 
id (or create an ID if you don't have one) so I can assign this issue to you on 
https://issues.apache.org/jira/browse/ARROW-1555?




[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220999#comment-16220999
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

wesm commented on issue #1240: ARROW-1555 [Python] Implement Dask exists 
function
URL: https://github.com/apache/arrow/pull/1240#issuecomment-339761765
 
 
   Rebased and fixed flake8 warnings




[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220597#comment-16220597
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

benjigoldberg commented on a change in pull request #1240: ARROW-1555 [Python] 
Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r147173669
 
 

 ##
 File path: python/pyarrow/filesystem.py
 ##
 @@ -135,6 +135,12 @@ def isfile(self, path):
         """
         raise NotImplementedError
 
+    def isfilestore(self):
 
 Review comment:
     updated




[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219913#comment-16219913
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

wesm commented on issue #1240: ARROW-1555 [Python] Implement Dask exists 
function
URL: https://github.com/apache/arrow/pull/1240#issuecomment-339542169
 
 
   Build looks ok, the failure is unrelated (I restarted the failing job anyway 
so we can get a green build)




[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219911#comment-16219911
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

wesm commented on a change in pull request #1240: ARROW-1555 [Python] Implement 
Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r147039732
 
 

 ##
 File path: python/pyarrow/filesystem.py
 ##
 @@ -135,6 +135,12 @@ def isfile(self, path):
         """
         raise NotImplementedError
 
+    def isfilestore(self):
 
 Review comment:
   Can you make this a private API (`_isfilestore`)? Unclear if normal users 
would need this
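A minimal sketch of the private flag wesm is asking for, assuming the base-class-stub-plus-override convention used elsewhere in filesystem.py; illustrative only, not necessarily the exact shape the PR ends up with.

```python
class FileSystem:
    def _isfilestore(self):
        """
        Return True when the filesystem is a plain file store (e.g. local
        disk), False for key/value object stores such as S3.
        """
        raise NotImplementedError


class LocalFileSystem(FileSystem):
    def _isfilestore(self):
        # Local paths really are files and directories.
        return True


class DaskFileSystem(FileSystem):
    def _isfilestore(self):
        # Object-store wrappers (e.g. the s3fs-backed S3FSWrapper) are not.
        return False
```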




[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215670#comment-16215670
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

benjigoldberg commented on a change in pull request #1240: ARROW-1555 [Python] 
Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146362868
 
 

 ##
 File path: python/pyarrow/parquet.py
 ##
 @@ -985,7 +985,7 @@ def write_to_dataset(table, root_path, partition_cols=None,
     else:
         fs = _ensure_filesystem(filesystem)
 
-    if not fs.exists(root_path):
+    if not fs.exists(root_path) and fs.isfilestore():
 
 Review comment:
   Yes, great point! Updated.




[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215665#comment-16215665
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

johnjiang commented on a change in pull request #1240: ARROW-1555 [Python] 
Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146362289
 
 

 ##
 File path: python/pyarrow/parquet.py
 ##
 @@ -985,7 +985,7 @@ def write_to_dataset(table, root_path, partition_cols=None,
     else:
         fs = _ensure_filesystem(filesystem)
 
-    if not fs.exists(root_path):
+    if not fs.exists(root_path) and fs.isfilestore():
 
 Review comment:
   Would be better to switch the checks around:
   
   ```
   if fs.isfilestore() and not fs.exists(root_path):
   ```
   
   That way we don't do an unnecessary `exists()` check if it's not a file store.
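In context, the reordered guard would look roughly like this. `_ensure_dataset_root` is a hypothetical helper name used only for illustration, the `fs.mkdir(root_path)` fallback is assumed from the surrounding write_to_dataset code, and the method may end up spelled `_isfilestore` per the review comment above.

```python
def _ensure_dataset_root(fs, root_path):
    # isfilestore() is a cheap local check, while exists() can be a network
    # round-trip, so putting it first lets `and` short-circuit for object
    # stores like S3 and skip both the existence check and the mkdir.
    if fs.isfilestore() and not fs.exists(root_path):
        fs.mkdir(root_path)
```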




[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215465#comment-16215465
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

benjigoldberg commented on a change in pull request #1240: ARROW-1555 [Python] 
Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146332399
 
 

 ##
 File path: python/pyarrow/filesystem.py
 ##
 @@ -251,6 +251,10 @@ def isfile(self, path):
     def delete(self, path, recursive=False):
         return self.fs.rm(path, recursive=recursive)
 
+    @implements(FileSystem.exists)
+    def exists(self, path):
+        return os.path.exists(path)
 
 Review comment:
   Oh, that's interesting @johnjiang 
   
   I will test this with S3 as @wesm suggested and see if I can come up with a 
solution that is satisfactory.




[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215413#comment-16215413
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

wesm commented on a change in pull request #1240: ARROW-1555 [Python] Implement 
Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146323847
 
 

 ##
 File path: python/pyarrow/filesystem.py
 ##
 @@ -251,6 +251,10 @@ def isfile(self, path):
     def delete(self, path, recursive=False):
         return self.fs.rm(path, recursive=recursive)
 
+    @implements(FileSystem.exists)
+    def exists(self, path):
+        return os.path.exists(path)
 
 Review comment:
   Yeah, I did not implement exists because of this issue. This needs to be 
tested against S3 before being merged




[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215401#comment-16215401
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

johnjiang commented on a change in pull request #1240: ARROW-1555 [Python] 
Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146322137
 
 

 ##
 File path: python/pyarrow/filesystem.py
 ##
 @@ -251,6 +251,10 @@ def isfile(self, path):
     def delete(self, path, recursive=False):
         return self.fs.rm(path, recursive=recursive)
 
+    @implements(FileSystem.exists)
+    def exists(self, path):
+        return os.path.exists(path)
 
 Review comment:
   I wrote my own wrapper to get around this issue and realised that pyarrow 
would try to create a directory if `exists` returns False. This is an issue on 
s3 since there's no concept of directories, so what ends up happening is that 
there's a duplicate file with the same name as the directory.
   You almost need to remove the `exists()` check for the s3 filesystem.
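To make the failure mode concrete, here is a sketch with s3fs directly. The bucket and key names are made up, and the snippet assumes s3fs plus configured AWS credentials.

```python
import s3fs

fs = s3fs.S3FileSystem()

# On S3 a "directory" is just a key prefix: nothing needs to be created
# before writing an object under it.
with fs.open("my-bucket/dataset/part-0.parquet", "wb") as f:
    f.write(b"...")

# Creating the prefix explicitly -- roughly what the exists()/mkdir fallback
# can end up doing on S3 -- stores an extra object whose key collides with
# the prefix, i.e. the duplicate "file with the same name as the directory".
fs.touch("my-bucket/dataset")
```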




[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215371#comment-16215371
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

benjigoldberg commented on a change in pull request #1240: ARROW-1555 [Python] 
Implement Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146318772
 
 

 ##
 File path: python/pyarrow/filesystem.py
 ##
 @@ -251,6 +251,10 @@ def isfile(self, path):
     def delete(self, path, recursive=False):
         return self.fs.rm(path, recursive=recursive)
 
+    @implements(FileSystem.exists)
+    def exists(self, path):
+        return os.path.exists(path)
 
 Review comment:
   @wesm yes 🤦‍♂️ 




[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215362#comment-16215362
 ] 

ASF GitHub Bot commented on ARROW-1555:
---

wesm commented on a change in pull request #1240: ARROW-1555 [Python] Implement 
Dask exists function
URL: https://github.com/apache/arrow/pull/1240#discussion_r146317602
 
 

 ##
 File path: python/pyarrow/filesystem.py
 ##
 @@ -251,6 +251,10 @@ def isfile(self, path):
     def delete(self, path, recursive=False):
         return self.fs.rm(path, recursive=recursive)
 
+    @implements(FileSystem.exists)
+    def exists(self, path):
+        return os.path.exists(path)
 
 Review comment:
   Does this need to be `self.fs.exists(path)`?
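For reference, the delegation wesm is pointing at, as a standalone toy sketch. The real change would keep the `@implements(FileSystem.exists)` decorator from the diff; `self.fs` is the wrapped s3fs filesystem, as the `self.fs.rm` call above suggests.

```python
class S3WrapperSketch:
    """Toy stand-in for the wrapper under review, not the pyarrow class."""

    def __init__(self, fs):
        self.fs = fs  # the underlying s3fs.S3FileSystem

    def exists(self, path):
        # Forward to the wrapped filesystem so the check runs against the
        # remote store rather than the local os.path.
        return self.fs.exists(path)
```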




[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-10-23 Thread Benjamin Goldberg (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215275#comment-16215275
 ] 

Benjamin Goldberg commented on ARROW-1555:
--

I created a PR to resolve this issue:
https://github.com/apache/arrow/pull/1240



[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3

2017-09-20 Thread Young-Jun Ko (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172932#comment-16172932
 ] 

Young-Jun Ko commented on ARROW-1555:
-

I think the simplest way to fix this would be to just expose the fs functions 
implemented by `s3fs`, `exists` being one of them. I suppose that's what 
Florian had in mind.

Thanks guys for looking into this!
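One way to read "just expose the fs functions implemented by s3fs" is a generic passthrough; the toy sketch below illustrates the idea. The PR instead adds explicit methods to the wrapper, which keeps the documented FileSystem interface in control.

```python
class PassthroughS3Wrapper:
    """Toy illustration only; pyarrow's S3FSWrapper adds explicit methods."""

    def __init__(self, fs):
        self.fs = fs  # an s3fs.S3FileSystem

    def __getattr__(self, name):
        # Consulted only when the attribute is missing on the wrapper itself,
        # so explicit overrides still win; exists, ls, rm, ... fall through
        # to s3fs unchanged.
        return getattr(self.fs, name)
```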

