steveloughran commented on a change in pull request #30714:
URL: https://github.com/apache/spark/pull/30714#discussion_r567444807



##########
File path: docs/cloud-integration.md
##########
@@ -49,7 +49,6 @@ They cannot be used as a direct replacement for a cluster 
filesystem such as HDF
 
 Key differences are:
 
-* Changes to stored objects may not be immediately visible, both in directory listings and actual data access.

Review comment:
       I worry about OpenStack Swift. The original version was woefully 
inconsistent. I haven't been near it for a long time, so I don't know what has 
changed. With Swift I could consistently replicate an inconsistency in a few 
lines:
   
   1. write a file of length, say, 1KB
   2. overwrite with a shorter dataset, e.g. 512 bytes
   3. open file
   4. read from 0-128: get the new data
   5. read from 768-1024: get the old data
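
   The steps above can be sketched as a small probe. This is only an
   illustration, not the original test: `probe_overwrite_consistency`,
   `ConsistentStore` and `StaleTailStore` are hypothetical names, the stores
   are in-memory toys rather than a real Swift or S3 client, and the sketch
   assumes a ranged read past the end of the object returns empty bytes
   (a real store may instead reject the range).

```python
def probe_overwrite_consistency(put, get_range, key="probe.bin"):
    """Run the overwrite-then-read steps; return the byte ranges that look stale."""
    old = b"A" * 1024   # step 1: write a 1KB object
    new = b"B" * 512    # step 2: overwrite it with a shorter one
    put(key, old)
    put(key, new)
    stale = []
    # steps 3-5: read one range inside the new data and one beyond its end
    for start, end in [(0, 128), (768, 1024)]:
        expected = new[start:end]  # empty for the range past the new length
        if get_range(key, start, end) != expected:
            stale.append((start, end))
    return stale


class ConsistentStore:
    """Toy in-memory store where an overwrite fully replaces the object."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get_range(self, key, start, end):
        return self.objects[key][start:end]


class StaleTailStore(ConsistentStore):
    """Toy model of the failure: the tail of the old object stays readable."""

    def put(self, key, data):
        old = self.objects.get(key, b"")
        self.objects[key] = data + old[len(data):]


if __name__ == "__main__":
    good = ConsistentStore()
    bad = StaleTailStore()
    print(probe_overwrite_consistency(good.put, good.get_range))
    print(probe_overwrite_consistency(bad.put, bad.get_range))
```

   To try this against a real store, `put` and `get_range` would be replaced
   by that store's client calls. The probe itself only depends on those two
   callables.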
   
   No S3 implementation that I know of (i.e. the open source or commercial 
ones) is inconsistent, nor has any of them ever had the 404 caching issue. But 
people should still be aware of the risk.
   
   Oh, and I haven't played with any of the Chinese cloud stores for which 
connectors now exist, so I can't make statements there. All I can say is that 
the "big three outside China" are consistent.






