JoshRosen commented on a change in pull request #34100:
URL: https://github.com/apache/spark/pull/34100#discussion_r715946127



##########
File path: pom.xml
##########
@@ -3273,7 +3273,7 @@
         <curator.version>2.7.1</curator.version>
         <commons-io.version>2.4</commons-io.version>
         <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
-        
<hadoop-client-runtime.artifact>hadoop-client</hadoop-client-runtime.artifact>
+        
<hadoop-client-runtime.artifact>hadoop-yarn-api</hadoop-client-runtime.artifact>

Review comment:
       Ahhh, this is a clever fix:
   
   Instead of the `hadoop-2.7` profile resulting in a duplicate direct 
dependency on `hadoop-client`, we now just declare an explicit dependency on 
one of `hadoop-client`'s transitive dependencies (`hadoop-yarn-api` in this 
case). Anything which depends on `hadoop-client-runtime.artifact` must also 
depend on `hadoop-client-api.artifact`, so this doesn't end up changing the set 
of dependencies pulled in.
   
   It looks like we didn't need to do that for 
`hadoop-client-minicluster.artifact` because that's only used in the 
`resource-managers/yarn` POM and that's already using Maven profiles to control 
the dependency selection (so the other workaround is fairly non-invasive in 
that context). In principle, though, I guess we could have changed that to some 
other transitive dep.
   
   ---
   
   Could you maybe add a one or two line comment above these Hadoop 2.7 lines 
to explain what's going on? And maybe edit the comment at 
https://github.com/apache/spark/blob/d73562ed3635bb3454ac67029ca6541b30ae0c02/pom.xml#L251-L255
 to reflect this change? This fix is clever but a little subtle, so I think a 
comment calling it out (and maybe mentioning SPARK-36835 might help future 
readers.
   
   **Edit:** could you also update the PR description to reflect this final 
fix? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to