[jira] [Commented] (SPARK-23734) InvalidSchemaException While Saving ALSModel
[ https://issues.apache.org/jira/browse/SPARK-23734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713403#comment-16713403 ]

Stanley Poon commented on SPARK-23734:
--------------------------------------
Just confirmed the problem is fixed in Spark 2.3.1. The test environment uses Scala 2.11.11, and there are no other dependencies. I will close the case.

> InvalidSchemaException While Saving ALSModel
>
>              Key: SPARK-23734
>              URL: https://issues.apache.org/jira/browse/SPARK-23734
>          Project: Spark
>       Issue Type: Bug
>       Components: ML
> Affects Versions: 2.3.0
>      Environment: macOS 10.13.2, Scala 2.11.8, Spark 2.3.0 v2.3.0-rc5 (Feb 22 2018)
>         Reporter: Stanley Poon
>         Priority: Major
>           Labels: ALS, parquet, persistence
>
> After fitting an ALSModel, I get the following error while saving the model:
>
>   Caused by: org.apache.parquet.schema.InvalidSchemaException: A group type can not be empty.
>   Parquet does not support empty group without leaves. Empty group: spark_schema
>
> Exactly the same code ran fine on 2.2.1. The same issue also occurs on other ALSModels we have.
>
> h2. To reproduce
> Get ALSExample:
> https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/ALSExample.scala
> and add the following line to save the model right before "spark.stop":
> {quote} model.write.overwrite().save("SparkExampleALSModel") {quote}
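For convenience, here is a self-contained sketch of the reproduction described above, adapted from the linked ALSExample. The object name, the toy ratings data, and the (userId, movieId, rating) column names are illustrative, not taken from the original example; the save call at the end is the line the report says triggers the exception on 2.3.0.

```scala
// Reproduction sketch for SPARK-23734 (assumes Spark 2.3.0 on the classpath).
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

object ALSSaveRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("ALSSaveRepro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Tiny illustrative ratings set; any (user, item, rating) data will do.
    val ratings = Seq(
      (0, 0, 4.0f), (0, 1, 2.0f), (1, 1, 3.0f), (1, 2, 1.0f), (2, 0, 5.0f)
    ).toDF("userId", "movieId", "rating")

    val als = new ALS()
      .setMaxIter(5)
      .setRegParam(0.01)
      .setUserCol("userId")
      .setItemCol("movieId")
      .setRatingCol("rating")
    val model = als.fit(ratings)

    // Per the report: throws InvalidSchemaException on 2.3.0,
    // succeeds on 2.2.1 and (per the comment above) on 2.3.1.
    model.write.overwrite().save("SparkExampleALSModel")
    spark.stop()
  }
}
```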
> h2. Stack Trace
>
Exception in thread "main" java.lang.ExceptionInInitializerError
  at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$setSchema$2.apply(ParquetWriteSupport.scala:444)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$$anonfun$setSchema$2.apply(ParquetWriteSupport.scala:444)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport$.setSchema(ParquetWriteSupport.scala:444)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.prepareWrite(ParquetFileFormat.scala:112)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:140)
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:154)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:654)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
  at org.apache.spark.ml.recommendation.ALSModel$ALSModelWriter.saveImpl(ALS.scala:510)
  at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:103)
  at com.vitalmove.model.ALSExample$.main(ALSExample.scala:83)
  at com.vitalmove.model.ALSExample.main(ALSExample.scala)
Caused by: org.apache.parquet.schema.InvalidSchemaException: A group type can not be empty. Parquet does not support empty group without leaves. Empty group: spark_schema
  at org.apache.parquet.schema.GroupType.<init>(GroupType.java:92)
  at org.apache.parquet.schema.GroupType.<init>(GroupType.java:48)
  at org.apache.parquet.schema.MessageType.<init>(MessageType.java:50)
  at org.apache.parquet.schema.Types$MessageTypeBuilder.named(Types.java:1256)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.<init>(ParquetSchemaConverter.scala:567)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.<clinit>(ParquetSchemaConverter.scala)
[ https://issues.apache.org/jira/browse/SPARK-23734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434219#comment-16434219 ]

Stanley Poon commented on SPARK-23734:
--------------------------------------
[~viirya] Thank you for checking into this. I have added the Spark release details where this is reproducible, and will verify that it is fixed in the next release.
[ https://issues.apache.org/jira/browse/SPARK-23734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410914#comment-16410914 ]

Liang-Chi Hsieh commented on SPARK-23734:
-----------------------------------------
I used the latest master branch and can't reproduce the reported issue.