steveloughran commented on PR #34864: URL: https://github.com/apache/spark/pull/34864#issuecomment-2059907838
@michaelbilow Hadoop S3A is on the v2 SDK; the com.amazonaws classes are not on the classpath, and Amazon is slowly winding down support for them. You cannot, for example, use the lower-latency S3 Express stores with the v1 SDK.

Like I say: I think you would be better off using the Hadoop file system APIs to talk to S3. If there are aspects of S3 storage which aren't available through the API, or are available only very inefficiently because of the effort to preserve the POSIX metaphor, then let's fix the API so that other stores can offer the same features and other apps can pick them up.

For example, here's our ongoing delete API for Iceberg and other manifest-based tables: https://github.com/apache/hadoop/pull/6726
It maps to S3 bulk delete calls, but there's scope to add it to other stores (we now actually want to add it as a page-size == 1 option for all filesystems, as it simplifies Iceberg integration).
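
To illustrate the "page-size == 1 option for all filesystems" idea, here is a minimal Python sketch of a paged delete. It is not the Hadoop API from the linked PR: `PagedBulkDelete`, `make_store_deleter` and `run_demo` are hypothetical names, and the in-memory set only stands in for an object store. The one external fact assumed is that S3's multi-object delete accepts up to 1000 keys per request.

```python
from typing import Callable, Iterable, List, Set, Tuple

Failure = Tuple[str, str]  # (path, error message)


class PagedBulkDelete:
    """Hypothetical helper: splits paths into pages and deletes page by page.

    A store with a native bulk call (e.g. S3) would use a large page size;
    any other filesystem can take part with page_size == 1.
    """

    def __init__(self, page_size: int,
                 delete_page: Callable[[List[str]], List[Failure]]):
        if page_size < 1:
            raise ValueError("page_size must be >= 1")
        self.page_size = page_size
        self._delete_page = delete_page

    def delete(self, paths: Iterable[str]) -> List[Failure]:
        """Delete all paths; return the failures reported by the store."""
        failures: List[Failure] = []
        page: List[str] = []
        for path in paths:
            page.append(path)
            if len(page) == self.page_size:
                failures.extend(self._delete_page(page))
                page = []
        if page:  # flush the final, partially filled page
            failures.extend(self._delete_page(page))
        return failures


def make_store_deleter(store: Set[str], calls: List[int]):
    """Build a page deleter over an in-memory 'store', recording page sizes."""
    def delete_page(page: List[str]) -> List[Failure]:
        calls.append(len(page))  # one entry per simulated store request
        for path in page:
            store.discard(path)  # a missing path is not treated as an error
        return []
    return delete_page


def run_demo(page_size: int, n_paths: int):
    """Return (requests made, paths left in store, failures)."""
    store = {f"table/data/file-{i}.parquet" for i in range(n_paths)}
    calls: List[int] = []
    deleter = PagedBulkDelete(page_size, make_store_deleter(store, calls))
    failures = deleter.delete(sorted(store))  # sorted() copies the set
    return len(calls), len(store), failures
```

A caller such as a table-format cleanup job would write the same loop against either kind of store: `run_demo(1000, 2500)` issues three simulated requests, while `run_demo(1, 2500)` issues one per path, which is the trade-off the page-size == 1 fallback accepts in exchange for a single code path.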
