grundprinzip commented on PR #48791:
URL: https://github.com/apache/spark/pull/48791#issuecomment-2499883456
Hi @wbo4958 ,
I just applied the following patch on your PR:
```
diff --git a/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Estimator b/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Estimator
index f3fd21ad4c3..fe690cb8eeb 100644
--- a/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Estimator
+++ b/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Estimator
@@ -17,4 +17,13 @@
# Spark Connect ML uses ServiceLoader to find out the supported Spark Ml estimators.
# So register the supported estimator here if you're trying to add a new one.
-org.apache.spark.ml.classification.LogisticRegression
\ No newline at end of file
+org.apache.spark.ml.classification.LogisticRegression
+org.apache.spark.ml.feature.QuantileDiscretizer
+org.apache.spark.ml.feature.StringIndexer
+org.apache.spark.ml.feature.OneHotEncoder
+org.apache.spark.ml.feature.PCA
+org.apache.spark.ml.feature.StandardScaler
+org.apache.spark.ml.feature.MaxAbsScaler
+org.apache.spark.ml.feature.MinMaxScaler
+org.apache.spark.ml.feature.VectorIndexer
+org.apache.spark.ml.feature.RobustScaler
\ No newline at end of file
diff --git a/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Transformer b/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Transformer
index 24b1133cb5b..da5db5834e7 100644
--- a/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Transformer
+++ b/mllib/src/main/resources/META-INF/services/org.apache.spark.ml.Transformer
@@ -17,4 +17,17 @@
# Spark Connect ML uses ServiceLoader to find out the supported Spark Ml non-model transformer.
# So register the supported transformer here if you're trying to add a new one.
-org.apache.spark.ml.feature.VectorAssembler
\ No newline at end of file
+org.apache.spark.ml.feature.VectorAssembler
+org.apache.spark.ml.feature.Bucketizer
+org.apache.spark.ml.feature.Tokenizer
+org.apache.spark.ml.feature.RegexTokenizer
+org.apache.spark.ml.feature.StopWordsRemover
+org.apache.spark.ml.feature.NGram
+org.apache.spark.ml.feature.Binarizer
+org.apache.spark.ml.feature.Normalizer
+org.apache.spark.ml.feature.PolynomialExpansion
+org.apache.spark.ml.feature.DCT
+org.apache.spark.ml.feature.Interaction
+org.apache.spark.ml.feature.ElementwiseProduct
+org.apache.spark.ml.feature.SQLTransformer
+org.apache.spark.ml.feature.VectorSizeHint
```
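For readers unfamiliar with the file format being patched: a `META-INF/services` provider file lists one fully qualified class name per line, with `#` starting a comment and blank lines ignored. The sketch below is only an illustration of that format in Python; it is not how Spark Connect reads these files (that goes through Java's `ServiceLoader`), and `parse_service_file` and `is_registered` are hypothetical helper names.

```python
from typing import List


def parse_service_file(text: str) -> List[str]:
    """Return the provider class names listed in a META-INF/services-style file."""
    providers: List[str] = []
    for raw_line in text.splitlines():
        # Everything after '#' is a comment; surrounding whitespace is ignored.
        line = raw_line.split("#", 1)[0].strip()
        # Skip blank lines and duplicate entries.
        if line and line not in providers:
            providers.append(line)
    return providers


# Shortened sample modeled on the Estimator file from the patch above.
SAMPLE = """\
# Spark Connect ML uses ServiceLoader to find out the supported Spark Ml estimators.
# So register the supported estimator here if you're trying to add a new one.

org.apache.spark.ml.classification.LogisticRegression
org.apache.spark.ml.feature.StringIndexer
org.apache.spark.ml.feature.PCA
"""

registered = parse_service_file(SAMPLE)


def is_registered(class_name: str) -> bool:
    """Check whether a class name appears in the parsed provider list."""
    return class_name in registered
```

With this sample, a class that is not listed (for example `org.apache.spark.ml.feature.Word2Vec`) would be reported as unregistered, which mirrors why each estimator or transformer has to be added to the file before it can be used.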
This enables all of the feature engineering algorithms, and it all worked out
of the box, with very few exceptions:
* StopWordsRemover tries to detect the locale based on a Java String, which
obviously doesn't work
* StringIndexer fails with:
`pyspark.errors.exceptions.connect.SparkConnectGrpcException:
(java.lang.NullPointerException) Cannot invoke
"org.apache.spark.ml.Model.copy(org.apache.spark.ml.param.ParamMap)" because
the return value of
"org.apache.spark.sql.connect.ml.ModelAttributeHelper.model()" is null`
* Normalizer tries to copy a Java object: ` File
"/Users/martin.grund/Development/spark/python/pyspark/ml/wrapper.py", line 357,
in copy
that._java_obj = self._java_obj.copy(self._empty_java_param_map())`
* QuantileDiscretizer tries to call another Java method -> ` File
"/Users/martin.grund/Development/spark/python/pyspark/ml/feature.py", line
3756, in _create_model
splits=list(java_model.getSplits()),`
Otherwise I could run all of the examples from the Spark Homepage! Awesome!