Yikun commented on a change in pull request #35977:
URL: https://github.com/apache/spark/pull/35977#discussion_r836004084



##########
File path: python/pyspark/ml/image.py
##########
@@ -28,7 +28,7 @@
 from typing import Any, Dict, List, NoReturn, Optional, cast
 
 import numpy as np
-from distutils.version import LooseVersion
+from packaging.version import Version

Review comment:
       > Is this 3rd party library?
   
   https://pypi.org/project/packaging/ , yes, it is a third-party library. And the standard library does not seem to offer a replacement going forward, since `distutils` is deprecated and scheduled for removal in Python 3.12.
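   
   For example, the switch would look roughly like this (a minimal sketch; the numpy version check below is just illustrative, not the exact call site in `image.py`):
   
   ```python
   # Hypothetical example of the replacement: packaging.version.Version
   # supports the same ">=" style comparisons that LooseVersion was used for.
   import numpy as np
   from packaging.version import Version
   
   if Version(np.__version__) < Version("1.9"):
       # illustrative error message only
       raise RuntimeError("numpy >= 1.9 is required")
   ```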
   
   > Adding a new dep is problematic
   
   Looks like we have a couple of ways to solve this:
   - Maintain a copy of the `distutils`/`packaging` version code in PySpark, just like [cloudpickle](https://github.com/apache/spark/tree/master/python/pyspark/cloudpickle), or at least a simple version-comparison implementation (a rough sketch follows this list).
   - Introduce `packaging` as a third-party dependency, adding it to setup and the [docs](https://spark.apache.org/docs/latest/api/python/getting_started/install.html#dependencies).
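   
   For the first option, a minimal in-tree implementation could be as small as the sketch below (hypothetical helper name, not an actual PySpark module; only enough for the simple ">=" checks PySpark does today):
   
   ```python
   # Hypothetical minimal version parser: turn "1.21.0"-style strings into
   # comparable tuples, tolerating suffixes like "3.0.0rc1".
   def _parse_version(v: str) -> tuple:
       parts = []
       for p in v.split("."):
           # keep only the leading numeric run of each component
           digits = ""
           for ch in p:
               if ch.isdigit():
                   digits += ch
               else:
                   break
           parts.append(int(digits) if digits else 0)
       return tuple(parts)
   
   assert _parse_version("1.21.0") >= _parse_version("1.9")
   ```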
   
   BTW, I don't think always vendoring every third-party dependency in PySpark is an ideal approach. Have we considered making the extra third-party installations a required/optional step when installing PySpark (especially when installing from a downloaded distribution)? For example, require users to install the deps before PySpark starts up, or install them in [`bin/pyspark`](https://github.com/apache/spark/blob/master/bin/pyspark) automatically (this may require additional network access).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


