GitHub user steveloughran commented on the issue:
https://github.com/apache/spark/pull/12004
@nchammas the AWS SDK you get will be in sync with hadoop-aws; you have to
keep them in sync.
What is more brittle is the transitive dependencies: httpclient, joda-time,
jackson, etc., which recent patches in hadoop-aws have been trying to lock down
for Hadoop consistency. That doesn't help Spark, as its choices are different.
Hence the explicit declaration of the versions of things in the aws module:
this forces exclusion of the copies the AWS dependencies bring in, because the
Spark ones are declared closer to the root of the tree. Oddly enough, some of
the Hadoop explicit version declarations make things worse, as they raise the
declaration of some artifacts higher up the tree, and with Maven's
closest-version-wins policy, that breaks other things. The fault there lies
with Maven's conflict resolution policy of closeness over newness, for better
or worse.
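
To make the closeness-over-newness point concrete, here is a minimal sketch of
nearest-wins mediation. It is a simplification, not Maven's actual resolver: it
ignores dependencyManagement, scopes and exclusions, and every artifact name
and version in it (`app`, `lib-a`, `lib-b`, `lib-c`, `x`) is hypothetical.

```python
from collections import deque


def resolve_nearest_wins(root, graph):
    """Pick one version per artifact using a nearest-wins rule.

    `graph` maps "artifact:version" to an ordered list of its direct
    dependencies, also written as "artifact:version". Breadth-first
    traversal visits shallower declarations first; at equal depth the
    first one declared wins. Dependencies of a losing node are never
    visited.
    """
    chosen = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        artifact, version = node.rsplit(":", 1)
        if artifact in chosen:
            continue  # a nearer (or earlier-declared) version already won
        chosen[artifact] = version
        queue.extend(graph.get(node, []))
    return chosen


# Hypothetical tree: x:1.0 sits at depth 2, the newer x:2.0 at depth 3.
graph = {
    "app:1.0": ["lib-a:1.0", "lib-b:1.0"],
    "lib-a:1.0": ["x:1.0"],
    "lib-b:1.0": ["lib-c:1.0"],
    "lib-c:1.0": ["x:2.0"],
}
resolved = resolve_nearest_wins("app:1.0", graph)
print(resolved["x"])  # the older but closer version is selected
```

In this sketch, adding an explicit `x` declaration directly under `app` would
override both transitive versions, which is the same lever the explicit
declarations in the aws module pull.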
A particular issue is {{jackson-dataformat-cbor}}, which is a jackson
artifact not used/declared by the rest of Spark. Because it's not used
elsewhere, there's no eviction of the one coming from the AWS SDK, so packaging
works, but linking fails at runtime. This patch declares the JAR, using
Spark's jackson version, to fix it in place. Without this, you will see stack
traces against some versions of hadoop-aws/aws-sdk-s3.
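
One rough way to spot this kind of skew before it shows up as a runtime link
failure is to scan the packaged JAR names for jackson artifacts at differing
versions. The following is only a sketch: the helper name, the filename
pattern and the version numbers are all assumptions for illustration, not
anything this patch ships.

```python
import re
from collections import defaultdict

# Assumes the conventional "<artifact>-<version>.jar" naming.
JAR_PATTERN = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.]*)\.jar$")


def find_version_skew(jar_names, prefix="jackson-"):
    """Group artifacts starting with `prefix` by version.

    Returns {version: {artifacts}} when more than one version is
    present, and an empty dict when the versions all agree.
    """
    by_version = defaultdict(set)
    for name in jar_names:
        match = JAR_PATTERN.match(name)
        if match and match.group("artifact").startswith(prefix):
            by_version[match.group("version")].add(match.group("artifact"))
    return dict(by_version) if len(by_version) > 1 else {}


# Illustrative classpath; the version numbers are made up.
jars = [
    "jackson-core-2.6.5.jar",
    "jackson-databind-2.6.5.jar",
    "jackson-dataformat-cbor-2.5.3.jar",
    "joda-time-2.9.3.jar",
]
skew = find_version_skew(jars)
for version, artifacts in sorted(skew.items()):
    print(version, sorted(artifacts))
```

A non-empty result here would correspond to the situation described above:
the cbor JAR left at the SDK's jackson version while the rest follow Spark's.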
Those problems aren't checked in this module; grab
https://github.com/steveloughran/spark-cloud-examples/tree/master for that.
Actually, joda-time is only correctly picked up if you grab spark-hive.
I've added a declaration of it here so that if someone pulls in spark-cloud
without spark-hive they don't get auth errors against S3 caused by
misformatting of timestamps on HTTP requests. Dependency management is an
eternal conflict.