[ https://issues.apache.org/jira/browse/SPARK-24841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552231#comment-16552231 ]
Kazuaki Ishizaki commented on SPARK-24841:
------------------------------------------

Thank you for reporting an issue with heap profiling. Would it be possible to post a standalone program that reproduces this problem?

> Memory leak in converting Spark DataFrame to pandas DataFrame
> -------------------------------------------------------------
>
>                 Key: SPARK-24841
>                 URL: https://issues.apache.org/jira/browse/SPARK-24841
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>         Environment: Running PySpark in standalone mode
>            Reporter: Piyush Seth
>            Priority: Minor
>
> I am running a long-running application using PySpark. One of its operations converts a PySpark DataFrame to a pandas DataFrame using the toPandas API on the PySpark driver. After running for a while, the application fails with "java.lang.OutOfMemoryError: GC overhead limit exceeded".
>
> I ran the conversion in a loop and could see the driver heap memory increasing continuously. When I ran jmap for the first time, the top rows were (the *[C* row is the one that grows):
>
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:          1757      411477568  [J
>  *2:        124188      266323152  [C*
>    3:        167219       46821320  org.apache.spark.status.TaskDataWrapper
>    4:         69683       27159536  [B
>    5:        359278        8622672  java.lang.Long
>    6:        221808        7097856  java.util.concurrent.ConcurrentHashMap$Node
>    7:        283771        6810504  scala.collection.immutable.$colon$colon
>
> After running several more iterations:
>
>  num     #instances         #bytes  class name
> ----------------------------------------------
>  *1:        110760     3439887928  [C*
>    2:           698      411429088  [J
>    3:        238096       66666880  org.apache.spark.status.TaskDataWrapper
>    4:         68819       24050520  [B
>    5:        498308       11959392  java.lang.Long
>    6:        292741        9367712  java.util.concurrent.ConcurrentHashMap$Node
>    7:        282878        6789072  scala.collection.immutable.$colon$colon

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
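[Editor's note: the comment above asks for a standalone reproducer. Below is a minimal sketch of what such a program might look like, assuming a local PySpark (>= 2.3) installation; the function name, row counts, and iteration counts are illustrative and are not taken from the original report.]

```python
def run_repro(iterations=100, rows=100_000):
    """Sketch of a standalone SPARK-24841 reproducer: call toPandas() in a
    loop and watch the driver heap (e.g. with jmap -histo <driver-pid>)."""
    # Import inside the function so the sketch can be loaded without Spark.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[2]")
             .appName("SPARK-24841-toPandas-repro")
             .getOrCreate())
    try:
        df = spark.range(rows).selectExpr("id", "id * 2 AS doubled")
        for _ in range(iterations):
            # Each toPandas() collects the whole frame to the driver; the
            # report suggests char-array ([C) usage grows across calls.
            pdf = df.toPandas()
            del pdf
    finally:
        spark.stop()


if __name__ == "__main__":
    run_repro()
```

Running this while sampling the driver with `jmap -histo` at intervals should show whether the `[C` byte count climbs as in the histograms above.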