LuciferYang commented on PR #50665: URL: https://github.com/apache/spark/pull/50665#issuecomment-2838503692
It appears that merging this PR caused test failures in `org.apache.spark.sql.connect.ml.MLSuite` in the `connect` module:

- https://github.com/apache/spark/actions/runs/14728199705/job/41335911530

Here is how I reproduced it locally:

```
git reset --hard 6f9bf73c345d70c3d27ea2e1ebadaa03a275fb3c   // this one
build/sbt clean "connect/testOnly org.apache.spark.sql.connect.ml.MLSuite"
```

```
[info] - LogisticRegression works *** FAILED *** (8 seconds, 2 milliseconds)
[info] org.apache.spark.SparkRuntimeException: [EXPRESSION_DECODING_FAILED] Failed to decode a row to a value of the expressions: newInstance(class org.apache.spark.ml.classification.LogisticRegressionModel$Data). SQLSTATE: 42846
[info] at org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1364)
[info] at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:95)
[info] at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:80)
[info] at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
[info] at org.apache.spark.sql.classic.Dataset.collectFromPlan(Dataset.scala:2244)
[info] at org.apache.spark.sql.classic.Dataset.$anonfun$head$1(Dataset.scala:1381)
[info] at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$2(Dataset.scala:2234)
[info] at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:642)
[info] at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$1(Dataset.scala:2232)
[info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$8(SQLExecution.scala:162)
[info] at org.apache.spark.sql.execution.SQLExecution$.withSessionTagsApplied(SQLExecution.scala:268)
[info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$7(SQLExecution.scala:124)
[info] at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
[info] at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
[info] at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:106)
[info] at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
[info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:124)
[info] at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:291)
[info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:123)
[info] at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
[info] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:77)
[info] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:233)
[info] at org.apache.spark.sql.classic.Dataset.withAction(Dataset.scala:2232)
[info] at org.apache.spark.sql.classic.Dataset.head(Dataset.scala:1381)
[info] at org.apache.spark.sql.Dataset.head(Dataset.scala:2683)
[info] at org.apache.spark.ml.util.ReadWriteUtils$.loadObject(ReadWrite.scala:881)
[info] at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1375)
[info] at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1350)
[info] at org.apache.spark.ml.util.MLReadable.load(ReadWrite.scala:385)
[info] at org.apache.spark.ml.util.MLReadable.load$(ReadWrite.scala:385)
[info] at org.apache.spark.ml.classification.LogisticRegressionModel$.load(LogisticRegression.scala:1332)
[info] at org.apache.spark.ml.classification.LogisticRegressionModel.load(LogisticRegression.scala)
...
[info] Cause: java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 8: Failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 8: Private member cannot be accessed from type "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection".
[info] at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:604)
[info] at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:559)
[info] at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:114)
[info] at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:247)
[info] at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2349)
[info] at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2317)
[info] at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2190)
[info] at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2080)
[info] at com.google.common.cache.LocalCache.get(LocalCache.java:4017)
[info] at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4040)
[info] at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4989)
[info] at org.apache.spark.util.NonFateSharingLoadingCache.$anonfun$get$2(NonFateSharingCache.scala:108)
[info] at org.apache.spark.util.KeyLock.withLock(KeyLock.scala:64)
[info] at org.apache.spark.util.NonFateSharingLoadingCache.get(NonFateSharingCache.scala:108)
[info] at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1490)
[info] at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:205)
[info] at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
[info] at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1415)
[info] at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:172)
[info] at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:169)
[info] at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:45)
[info] at org.apache.spark.sql.catalyst.expressions.SafeProjection$.create(Projection.scala:195)
[info] at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:87)
[info] at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:80)
[info] at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
[info] at org.apache.spark.sql.classic.Dataset.collectFromPlan(Dataset.scala:2244)
[info] at org.apache.spark.sql.classic.Dataset.$anonfun$head$1(Dataset.scala:1381)
[info] at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$2(Dataset.scala:2234)
[info] at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:642)
[info] at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$1(Dataset.scala:2232)
[info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$8(SQLExecution.scala:162)
[info] at org.apache.spark.sql.execution.SQLExecution$.withSessionTagsApplied(SQLExecution.scala:268)
[info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$7(SQLExecution.scala:124)
[info] at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
[info] at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
[info] at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:106)
[info] at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
[info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:124)
[info] at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:291)
[info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:123)
[info] at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
[info] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:77)
[info] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:233)
[info] at org.apache.spark.sql.classic.Dataset.withAction(Dataset.scala:2232)
[info] at org.apache.spark.sql.classic.Dataset.head(Dataset.scala:1381)
[info] at org.apache.spark.sql.Dataset.head(Dataset.scala:2683)
[info] at org.apache.spark.ml.util.ReadWriteUtils$.loadObject(ReadWrite.scala:881)
[info] at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1375)
[info] at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1350)
[info] at org.apache.spark.ml.util.MLReadable.load(ReadWrite.scala:385)
[info] at org.apache.spark.ml.util.MLReadable.load$(ReadWrite.scala:385)
[info] at org.apache.spark.ml.classification.LogisticRegressionModel$.load(LogisticRegression.scala:1332)
[info] at org.apache.spark.ml.classification.LogisticRegressionModel.load(LogisticRegression.scala)
[info] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
[info] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info] at java.base/java.lang.reflect.Method.invoke(Method.java:569)
[info] at org.apache.spark.sql.connect.ml.MLUtils$.loadOperator(MLUtils.scala:422)
[info] at org.apache.spark.sql.connect.ml.MLUtils$.loadTransformer(MLUtils.scala:447)
[info] at org.apache.spark.sql.connect.ml.MLHandler$.handleMlCommand(MLHandler.scala:262)
[info] at org.apache.spark.sql.connect.ml.MLHelper.readWrite(MLHelper.scala:227)
[info] at org.apache.spark.sql.connect.ml.MLHelper.readWrite$(MLHelper.scala:196)
[info] at org.apache.spark.sql.connect.ml.MLSuite.readWrite(MLSuite.scala:69)
[info] at org.apache.spark.sql.connect.ml.MLSuite.$anonfun$new$2(MLSuite.scala:236)
...
[info] Cause: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 8: Failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 8: Private member cannot be accessed from type "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection".
[info] at org.apache.spark.sql.errors.QueryExecutionErrors$.compilerError(QueryExecutionErrors.scala:688)
[info] at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.doCompile(CodeGenerator.scala:1557)
[info] at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.$anonfun$cache$1(CodeGenerator.scala:1636)
[info] at org.apache.spark.util.NonFateSharingCache$$anon$1.load(NonFateSharingCache.scala:68)
[info] at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3574)
[info] at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2316)
[info] at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2190)
[info] at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2080)
[info] at com.google.common.cache.LocalCache.get(LocalCache.java:4017)
[info] at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4040)
[info] at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4989)
[info] at org.apache.spark.util.NonFateSharingLoadingCache.$anonfun$get$2(NonFateSharingCache.scala:108)
[info] at org.apache.spark.util.KeyLock.withLock(KeyLock.scala:64)
[info] at org.apache.spark.util.NonFateSharingLoadingCache.get(NonFateSharingCache.scala:108)
[info] at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1490)
[info] at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:205)
[info] at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
[info] at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1415)
[info] at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:172)
[info] at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:169)
[info] at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:45)
[info] at org.apache.spark.sql.catalyst.expressions.SafeProjection$.create(Projection.scala:195)
[info] at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:87)
[info] at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:80)
[info] at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
[info] at org.apache.spark.sql.classic.Dataset.collectFromPlan(Dataset.scala:2244)
[info] at org.apache.spark.sql.classic.Dataset.$anonfun$head$1(Dataset.scala:1381)
[info] at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$2(Dataset.scala:2234)
[info] at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:642)
[info] at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$1(Dataset.scala:2232)
[info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$8(SQLExecution.scala:162)
[info] at org.apache.spark.sql.execution.SQLExecution$.withSessionTagsApplied(SQLExecution.scala:268)
[info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$7(SQLExecution.scala:124)
[info] at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
[info] at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
[info] at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:106)
[info] at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
[info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:124)
[info] at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:291)
[info] at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:123)
[info] at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
[info] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:77)
[info] at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:233)
[info] at org.apache.spark.sql.classic.Dataset.withAction(Dataset.scala:2232)
[info] at org.apache.spark.sql.classic.Dataset.head(Dataset.scala:1381)
[info] at org.apache.spark.sql.Dataset.head(Dataset.scala:2683)
[info] at org.apache.spark.ml.util.ReadWriteUtils$.loadObject(ReadWrite.scala:881)
[info] at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1375)
[info] at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1350)
[info] at org.apache.spark.ml.util.MLReadable.load(ReadWrite.scala:385)
[info] at org.apache.spark.ml.util.MLReadable.load$(ReadWrite.scala:385)
[info] at org.apache.spark.ml.classification.LogisticRegressionModel$.load(LogisticRegression.scala:1332)
[info] at org.apache.spark.ml.classification.LogisticRegressionModel.load(LogisticRegression.scala)
[info] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info] at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
[info] at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info] at java.base/java.lang.reflect.Method.invoke(Method.java:569)
[info] at org.apache.spark.sql.connect.ml.MLUtils$.loadOperator(MLUtils.scala:422)
[info] at org.apache.spark.sql.connect.ml.MLUtils$.loadTransformer(MLUtils.scala:447)
[info] at org.apache.spark.sql.connect.ml.MLHandler$.handleMlCommand(MLHandler.scala:262)
[info] at org.apache.spark.sql.connect.ml.MLHelper.readWrite(MLHelper.scala:227)
[info] at org.apache.spark.sql.connect.ml.MLHelper.readWrite$(MLHelper.scala:196)
[info] at org.apache.spark.sql.connect.ml.MLSuite.readWrite(MLSuite.scala:69)
[info] at org.apache.spark.sql.connect.ml.MLSuite.$anonfun$new$2(MLSuite.scala:236)
...
```

```
git reset --hard 86bf4c84805e89354d139ab72b298d3d4155fd0d   // before this one
build/sbt clean "connect/testOnly org.apache.spark.sql.connect.ml.MLSuite"
```

```
[info] MLSuite:
[info] - reconcileParam (141 milliseconds)
[info] - LogisticRegression works (5 seconds, 808 milliseconds)
[info] - Exception: Unsupported ML operator (15 milliseconds)
[info] - Exception: cannot retrieve object (246 milliseconds)
[info] - access the attribute which is not in allowed list (205 milliseconds)
[info] - Model must be registered into ServiceLoader when loading (1 millisecond)
[info] - RegressionEvaluator works (164 milliseconds)
[info] - VectorAssembler works (178 milliseconds)
[info] - Memory limitation of MLCache works (951 milliseconds)
[info] Run completed in 10 seconds, 668 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

The reason this issue was not detected by this PR's GitHub Actions run is that changes in the `mllib` module no longer trigger the tests for the `connect` module.
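One way to close this kind of gap is to select test modules from changed paths via a transitive dependency closure, so that downstream modules run whenever an upstream module changes. A minimal, hypothetical sketch of that idea (the dependency map and function names here are illustrative, not Spark's actual `dev/sparktestsupport` logic):

```python
# Hypothetical downstream map: each module points to the modules that
# depend on it. In this illustration, `connect` depends on `mllib`.
DOWNSTREAM = {
    "mllib": {"connect"},
    "connect": set(),
}

def modules_to_test(changed_modules):
    """Return the changed modules plus all transitive downstream modules."""
    selected = set()
    stack = list(changed_modules)
    while stack:
        module = stack.pop()
        if module not in selected:
            selected.add(module)
            # Also schedule every module that depends on this one.
            stack.extend(DOWNSTREAM.get(module, ()))
    return selected

print(sorted(modules_to_test({"mllib"})))  # ['connect', 'mllib']
```

With a map like this, a change touching only `mllib` would still schedule the `connect` test suites, and the `MLSuite` regression above would have surfaced in the PR's own CI run.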