steveloughran commented on a change in pull request #30714:
URL: https://github.com/apache/spark/pull/30714#discussion_r567444807
##########
File path: docs/cloud-integration.md
##########
@@ -49,7 +49,6 @@ They cannot be used as a direct replacement for a cluster
filesystem such as HDF
Key differences are:
-* Changes to stored objects may not be immediately visible, both in directory
listings and actual data access.
Review comment:
I worry about OpenStack Swift. The original version was woefully
inconsistent. I haven't been near it recently enough to know what's changed. With Swift I
could consistently replicate an inconsistency in a few steps:
1. write a file of length, say, 1KB
2. overwrite with a shorter dataset, e.g. 512 bytes
3. open file
4. read from 0-128: get the new data
5. read from 768-1024: get the old data
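A rough sketch of those steps as a probe, in case it helps anyone test a store. This is not code from any connector: `probe_overwrite_consistency`, `local_write` and `local_read_range` are hypothetical names, the local filesystem is only a stand-in for an object store client, and I'm assuming the byte ranges are half-open `[start, end)`.

```python
import os
import tempfile


def probe_overwrite_consistency(path, write_bytes, read_range):
    """Return True if reads after an overwrite only ever see the new data.

    write_bytes(path, data) and read_range(path, start, end) are
    hypothetical callables wrapping whatever store is being probed.
    """
    old = b"A" * 1024                    # step 1: write a 1 KB object
    new = b"B" * 512                     # step 2: overwrite with 512 bytes
    write_bytes(path, old)
    write_bytes(path, new)
    head = read_range(path, 0, 128)      # step 4: should be new data
    tail = read_range(path, 768, 1024)   # step 5: past the new end of file
    # A consistent store returns new data for the head and nothing for
    # the tail, because the replacement object is only 512 bytes long.
    return head == new[:128] and tail == b""


def local_write(path, data):
    # Stand-in for an object store PUT: replaces the whole file.
    with open(path, "wb") as f:
        f.write(data)


def local_read_range(path, start, end):
    # Stand-in for a ranged GET; reading past EOF yields b"".
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "probe.bin")
        ok = probe_overwrite_consistency(target, local_write, local_read_range)
        print("consistent" if ok else "inconsistent")
```

To probe a real store you would swap the two local helpers for calls into that store's client; a store showing the behaviour I described would return old bytes for the tail read and the probe would report it as inconsistent.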
No S3 implementation that I know of (i.e. the open source or commercial
ones) is inconsistent, nor has any of them ever had the 404 caching issue. But
people should still be aware of the risk.
Oh, and I haven't played with any of the Chinese cloud stores for which
connectors now exist, so I can't make statements there. All I can say is that the
"big three outside China" are consistent.