[ https://issues.apache.org/jira/browse/SPARK-28902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Junichi Koizumi updated SPARK-28902:
--------------------------------------
    Comment: was deleted

(was: Since versions aren't the main concern here, should I create a PR?)

> Spark ML Pipeline with nested Pipelines fails to load when saved from Python
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-28902
>                 URL: https://issues.apache.org/jira/browse/SPARK-28902
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.3
>            Reporter: Saif Addin
>            Priority: Minor
>
> Hi, this error is affecting a bunch of our nested use cases.
> A *PipelineModel* that has another *PipelineModel* as one of its stages
> fails to load from Scala when it was saved from Python.
>
> *Python side:*
>
> {code:java}
> from pyspark.ml import Pipeline
> from pyspark.ml.feature import Tokenizer
> t = Tokenizer()
> p = Pipeline().setStages([t])
> d = spark.createDataFrame([["Hello Peter Parker"]])
> pm = p.fit(d)
> np = Pipeline().setStages([pm])
> npm = np.fit(d)
> npm.write().save('./npm_test')
> {code}
>
> *Scala side:*
>
> {code:java}
> scala> import org.apache.spark.ml.PipelineModel
> scala> val pp = PipelineModel.load("./npm_test")
> java.lang.IllegalArgumentException: requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.PipelineModel but found class name pyspark.ml.pipeline.PipelineModel
>   at scala.Predef$.require(Predef.scala:224)
>   at org.apache.spark.ml.util.DefaultParamsReader$.parseMetadata(ReadWrite.scala:638)
>   at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:616)
>   at org.apache.spark.ml.Pipeline$SharedReadWrite$.load(Pipeline.scala:267)
>   at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:348)
>   at org.apache.spark.ml.PipelineModel$PipelineModelReader.load(Pipeline.scala:342)
>   at org.apache.spark.ml.util.MLReadable$class.load(ReadWrite.scala:380)
>   at org.apache.spark.ml.PipelineModel$.load(Pipeline.scala:332)
>   ... 50 elided
> {code}
>

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
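
The exception in the report complains about the class name recorded in the saved metadata. One way to see which class name each part of the saved directory carries is to read the metadata files directly. The following is a minimal sketch and not part of the original report. It assumes the layout in which every saved stage has a `metadata/part-*` file whose first line is a JSON object with a `class` field; the helper name `saved_class_names` is hypothetical.

```python
import json
from pathlib import Path


def saved_class_names(model_dir):
    """Return {relative stage path: recorded class name} for a saved model.

    Assumption: each saved stage directory contains metadata/part-* files
    whose first line is a JSON object with a "class" field.
    """
    found = {}
    for meta_dir in sorted(Path(model_dir).rglob("metadata")):
        if not meta_dir.is_dir():
            continue
        # "part-*" skips hidden checksum files such as ".part-00000.crc".
        for part in sorted(meta_dir.glob("part-*")):
            text = part.read_text().strip()
            if not text:
                continue
            record = json.loads(text.splitlines()[0])
            key = str(meta_dir.parent.relative_to(model_dir))
            found[key] = record["class"]
            break  # one metadata record per stage is enough
    return found
```

Calling `saved_class_names('./npm_test')` on the directory written by the Python snippet in the report would list the recorded class for the outer model (key `.`) and for every nested stage, which makes it easy to spot any entry that names a `pyspark.` class where the Scala loader expects an `org.apache.spark.` one.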