dongjoon-hyun edited a comment on pull request #28897: URL: https://github.com/apache/spark/pull/28897#issuecomment-648007941
For PySpark, if we need to keep it in the `Hadoop 2.7` distribution, that's very easy: I can remove the following one-line change from this PR. Since PyPI uploading is a manual process, we can keep PySpark built with Hadoop 2.7 on PyPI.

```
- BINARY_PKGS_EXTRA["hadoop2.7"]="withpip,withr"
+ BINARY_PKGS_EXTRA["hadoop3.2"]="withpip,withr"
```

I didn't see specific complaints about the following. Instead, I've seen many complaints about the Hadoop 2.7.4 dependency for a long time.

> Will the PySpark users hit the migration issue if they upgrade from Spark 3.0 to 3.1 due to this PR?

Did I miss anything in the mailing thread? It would be great if you answered my question, too. Do you have a specific issue?

> We need to answer this before doing any further action.

In short, let's focus on the `non-PySpark` scope, because I provided a workaround with `BINARY_PKGS_EXTRA`. Do we need to stick with Hadoop 2.7 in Apache Spark 3.1.0 when we have both Hadoop 2.7 and Hadoop 3.2 distributions and PySpark stays the same?
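
For context, a minimal sketch of how a per-profile map like `BINARY_PKGS_EXTRA` can drive which binary distribution also produces the pip (PyPI) and R artifacts. This is an illustration only, not the actual contents of the Spark release script; the loop and echo statements are assumptions, while the map name and the `withpip`/`withr` flags come from the diff quoted above.

```bash
#!/usr/bin/env bash
# Hedged sketch: per-Hadoop-profile extras deciding which build also ships pip/R packages.
declare -A BINARY_PKGS_EXTRA
BINARY_PKGS_EXTRA["hadoop2.7"]="withpip,withr"   # keep PySpark/SparkR on the Hadoop 2.7 build
# BINARY_PKGS_EXTRA["hadoop3.2"]="withpip,withr" # the one-line change discussed in this PR

for profile in hadoop2.7 hadoop3.2; do
  extra="${BINARY_PKGS_EXTRA[$profile]:-}"
  echo "Building binary distribution for $profile (extras: ${extra:-none})"
  if [[ "$extra" == *withpip* ]]; then
    echo "  -> also producing the pip-installable PySpark package from this profile"
  fi
  if [[ "$extra" == *withr* ]]; then
    echo "  -> also producing the SparkR package from this profile"
  fi
done
```

Under this reading, reverting the one-line change keeps the PyPI artifact tied to the Hadoop 2.7 profile even if the default downloadable distribution moves to Hadoop 3.2.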
