dongjoon-hyun edited a comment on pull request #28897:
URL: https://github.com/apache/spark/pull/28897#issuecomment-648007941


   For PySpark, if we need to stay on the `Hadoop 2.7` distribution, that's very 
easy: I can remove the following one-line change from this PR. Since PyPI 
uploading is a manual process, we can keep publishing PySpark with Hadoop 2.7 on PyPI.
   ```
   - BINARY_PKGS_EXTRA["hadoop2.7"]="withpip,withr"
   + BINARY_PKGS_EXTRA["hadoop3.2"]="withpip,withr"
   ```
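   
   For reference, here is a minimal, hypothetical sketch of how a release script could consume `BINARY_PKGS_EXTRA`; the loop and messages below are my own illustration and not the actual release script code. The point is that only the profile tagged `withpip` feeds the pip-installable artifact, so this single key decides which Hadoop version the PyPI package bundles by default.
   ```
   # Illustrative sketch only (bash); assumes a packaging loop like this exists,
   # which is not taken verbatim from the real release script.
   declare -A BINARY_PKGS_EXTRA
   BINARY_PKGS_EXTRA["hadoop3.2"]="withpip,withr"

   for profile in "${!BINARY_PKGS_EXTRA[@]}"; do
     extras="${BINARY_PKGS_EXTRA[$profile]}"
     # The profile tagged "withpip" drives the pip-installable PySpark artifact,
     # so moving the key from hadoop2.7 to hadoop3.2 changes the default Hadoop
     # version bundled in the PyPI upload.
     if [[ "$extras" == *withpip* ]]; then
       echo "Build pip-installable PySpark against the $profile profile"
     fi
     if [[ "$extras" == *withr* ]]; then
       echo "Build the SparkR package against the $profile profile"
     fi
   done
   ```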
   
   I haven't seen specific complaints about the following. Instead, I've seen 
many complaints about the Hadoop 2.7.4 dependency for a long time.
   > Will the PySpark users hit the migration issue if they upgrade from Spark 
3.0 to 3.1 due to this PR?
   
   I'm wondering if I missed anything in the mailing thread. It would be great 
if you could answer my question, too. Do you have a specific issue? Could you 
share it with the community, ideally on the dev mailing list? Then we can 
try to fix it together and move forward.
   > We need to answer this before doing any further action.
   
   In short, let's focus on the `non-PyPI` scope, because I've provided a workaround 
via `BINARY_PKGS_EXTRA`. Do we need to stick with Hadoop 2.7 in Apache Spark 
3.1.0 when we ship both Hadoop 2.7 and Hadoop 3.2 distributions and PySpark can 
stay the same as in Spark 3.0.0?

