dongjoon-hyun edited a comment on pull request #28897: URL: https://github.com/apache/spark/pull/28897#issuecomment-648007941
For PySpark, if we need to keep it in the `Hadoop 2.7` distribution, that's very easy: I can remove the following one-line change from this PR. Since PyPI uploading is a manual process, we can keep PySpark built with Hadoop 2.7 on PyPI.

```
- BINARY_PKGS_EXTRA["hadoop2.7"]="withpip,withr"
+ BINARY_PKGS_EXTRA["hadoop3.2"]="withpip,withr"
```

I didn't see specific complaints about the following. Instead, I've seen many complaints about the Hadoop 2.7.4 dependency for a long time.

> Will the PySpark users hit the migration issue if they upgrade from Spark 3.0 to 3.1 due to this PR?

Did I miss anything in the mailing thread? It would be great if you answered my question, too. Do you have a specific issue?

> We need to answer this before doing any further action.

In short, let's focus on the `non-PySpark` scope, because I provided a workaround with `BINARY_PKGS_EXTRA`. Do we need to stick with Hadoop 2.7 in Apache Spark 3.1.0 when we have both Hadoop 2.7 and Hadoop 3.2 distributions and PySpark stays the same?
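
For context, a minimal sketch of how a per-profile map like `BINARY_PKGS_EXTRA` can drive which binary distribution also produces the pip (PyPI) and R artifacts. This is an illustration only, not the actual contents of the Spark release script; the loop and echo statements are assumptions, while the map name and the `withpip`/`withr` flags come from the diff quoted above.

```bash
#!/usr/bin/env bash
# Hedged sketch: per-Hadoop-profile extras deciding which build also ships pip/R packages.
declare -A BINARY_PKGS_EXTRA
BINARY_PKGS_EXTRA["hadoop2.7"]="withpip,withr"   # keep PySpark/SparkR on the Hadoop 2.7 build
# BINARY_PKGS_EXTRA["hadoop3.2"]="withpip,withr" # the one-line change discussed in this PR

for profile in hadoop2.7 hadoop3.2; do
  extra="${BINARY_PKGS_EXTRA[$profile]:-}"
  echo "Building binary distribution for $profile (extras: ${extra:-none})"
  if [[ "$extra" == *withpip* ]]; then
    echo "  -> also producing the pip-installable PySpark package from this profile"
  fi
  if [[ "$extra" == *withr* ]]; then
    echo "  -> also producing the SparkR package from this profile"
  fi
done
```

Under this reading, reverting the one-line change keeps the PyPI artifact tied to the Hadoop 2.7 profile even if the default downloadable distribution moves to Hadoop 3.2.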
