Re: [PR] [SPARK-49907][ML][CONNECT] Support spark.ml on Connect [spark]

via GitHub Mon, 25 Nov 2024 23:41:21 -0800


grundprinzip commented on PR #48791:
URL: https://github.com/apache/spark/pull/48791#issuecomment-2499883456


   Hi @wbo4958 ,
   
   I just applied the following patch to your PR:
   
   ```
   diff --git a/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Estimator b/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Estimator
   index f3fd21ad4c3..fe690cb8eeb 100644
   --- a/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Estimator
   +++ b/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Estimator
   @@ -17,4 +17,13 @@
    
    # Spark Connect ML uses ServiceLoader to find out the supported Spark Ml estimators.
    # So register the supported estimator here if you're trying to add a new one.
   -org.apache.spark.ml.classification.LogisticRegression
   \ No newline at end of file
   +org.apache.spark.ml.classification.LogisticRegression
   +org.apache.spark.ml.feature.QuantileDiscretizer
   +org.apache.spark.ml.feature.StringIndexer
   +org.apache.spark.ml.feature.OneHotEncoder
   +org.apache.spark.ml.feature.PCA
   +org.apache.spark.ml.feature.StandardScaler
   +org.apache.spark.ml.feature.MaxAbsScaler
   +org.apache.spark.ml.feature.MinMaxScaler
   +org.apache.spark.ml.feature.VectorIndexer
   +org.apache.spark.ml.feature.RobustScaler
   \ No newline at end of file
   diff --git a/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Transformer b/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Transformer
   index 24b1133cb5b..da5db5834e7 100644
   --- a/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Transformer
   +++ b/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Transformer
   @@ -17,4 +17,17 @@
    
    # Spark Connect ML uses ServiceLoader to find out the supported Spark Ml non-model transformer.
    # So register the supported transformer here if you're trying to add a new one.
   -org.apache.spark.ml.feature.VectorAssembler
   \ No newline at end of file
   +org.apache.spark.ml.feature.VectorAssembler
   +org.apache.spark.ml.feature.Bucketizer
   +org.apache.spark.ml.feature.Tokenizer
   +org.apache.spark.ml.feature.RegexTokenizer
   +org.apache.spark.ml.feature.StopWordsRemover
   +org.apache.spark.ml.feature.NGram
   +org.apache.spark.ml.feature.Binarizer
   +org.apache.spark.ml.feature.Normalizer
   +org.apache.spark.ml.feature.PolynomialExpansion
   +org.apache.spark.ml.feature.DCT
   +org.apache.spark.ml.feature.Interaction
   +org.apache.spark.ml.feature.ElementwiseProduct
   +org.apache.spark.ml.feature.SQLTransformer
   +org.apache.spark.ml.feature.VectorSizeHint
   ```
   
   This enables all of the feature engineering algorithms, and all but a few of them worked out of the box:
   
   * StopWordsRemover tries to detect the locale based on a Java String, which obviously doesn't work on Connect.
   * StringIndexer fails with:
     `pyspark.errors.exceptions.connect.SparkConnectGrpcException: (java.lang.NullPointerException) Cannot invoke "org.apache.spark.ml.Model.copy(org.apache.spark.ml.param.ParamMap)" because the return value of "org.apache.spark.sql.connect.ml.ModelAttributeHelper.model()" is null`
   * Normalizer tries to copy a Java object:
     `File "/Users/martin.grund/Development/spark/python/pyspark/ml/wrapper.py", line 357, in copy`
     `that._java_obj = self._java_obj.copy(self._empty_java_param_map())`
   * QuantileDiscretizer tries to call another Java method:
     `File "/Users/martin.grund/Development/spark/python/pyspark/ml/feature.py", line 3756, in _create_model`
     `splits=list(java_model.getSplits()),`
   
   
   Otherwise, I could run all of the examples from the Spark homepage. Awesome!
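   For context on the registration mechanism in the patch: each file under `META-INF/services` is a Java `ServiceLoader` provider-configuration file. Per the `java.util.ServiceLoader` documentation, it lists one fully qualified class name per line, everything after the first `#` on a line is a comment, surrounding spaces/tabs and blank lines are ignored, duplicate names are ignored, and the file must be UTF-8. The Python sketch below mimics only those parsing rules so the patched files can be sanity-checked offline; the helper names `parse_provider_config` and `read_provider_config` are hypothetical, and this is not how Spark Connect itself loads the providers.

```python
from pathlib import Path


def parse_provider_config(text: str) -> list[str]:
    """Return provider class names listed in a ServiceLoader config file.

    Hypothetical helper mirroring the documented java.util.ServiceLoader
    rules: one fully qualified class name per line, '#' starts a comment,
    surrounding spaces/tabs and blank lines are ignored, duplicates dropped.
    """
    names: list[str] = []
    seen: set[str] = set()
    # splitlines() also yields a final line that lacks a trailing newline,
    # so "\ No newline at end of file" does not lose the last entry.
    for raw_line in text.splitlines():
        # Drop everything after the first comment character, then trim.
        name = raw_line.split("#", 1)[0].strip(" \t")
        if not name or name in seen:
            continue
        seen.add(name)
        names.append(name)
    return names


def read_provider_config(path: Path) -> list[str]:
    # Provider-configuration files must be encoded in UTF-8.
    return parse_provider_config(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    sample = (
        "# Spark Connect ML uses ServiceLoader ...\n"
        "org.apache.spark.ml.feature.PCA\n"
        "\n"
        "  org.apache.spark.ml.feature.StandardScaler  # trailing comment\n"
        "org.apache.spark.ml.feature.PCA"
    )
    print(parse_provider_config(sample))
```

   Pointing `read_provider_config` at `mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Estimator` from a Spark checkout should list the classes registered in the patch above, in file order.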


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

