[ https://issues.apache.org/jira/browse/SPARK-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352016#comment-15352016 ]
Davies Liu commented on SPARK-15700: ------------------------------------ My guess is that the SQL metrics required more memory than before. cc [~cloud_fan] , could you test a join with these many partitions to measure the memory used by SQL metrics? > Spark 2.0 dataframes using more memory (reading/writing parquet) > ---------------------------------------------------------------- > > Key: SPARK-15700 > URL: https://issues.apache.org/jira/browse/SPARK-15700 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 2.0.0 > Reporter: Thomas Graves > > I was running a large 15TB join job with 100000 map tasks, 20000 reducers > that I frequently have run on Spark 1.6 successfully (with very little GC) > and it failed with an out of heap memory on the driver. Driver had 10GB heap > with 3GB overhead. > 16/05/31 22:47:44 ERROR InsertIntoHadoopFsRelationCommand: Aborting job. > java.lang.OutOfMemoryError: Java heap space > at java.util.Arrays.copyOfRange(Arrays.java:3520) > at > org.apache.parquet.io.api.Binary$ByteArrayBackedBinary.getBytes(Binary.java:262) > at > org.apache.parquet.column.statistics.BinaryStatistics.getMinBytes(BinaryStatistics.java:67) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetStatistics(ParquetMetadataConverter.java:242) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.addRowGroup(ParquetMetadataConverter.java:184) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.toParquetMetadata(ParquetMetadataConverter.java:95) > at > org.apache.parquet.hadoop.ParquetFileWriter.serializeFooter(ParquetFileWriter.java:472) > at > org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:500) > at > org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:490) > at > org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:63) > at > org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:221) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85) > at > org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:479) > at > org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:252) > at > org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:234) > at > org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:626) > I haven't had a chance to look into this further yet just reporting it for > now. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org