Hi,

*In a reduce operation I am trying to accumulate a list of SparseVectors. The code is given below:*

val wNode = trainingData.reduce { (node1: Node, node2: Node) =>
  val merged = new Node(num1, num2)
  merged.WList ++= node1.WList
  merged.WList ++= node2.WList
  merged
}
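(For comparison, a minimal sketch of the same merge done with treeReduce, which performs the combine in multiple stages on the executors instead of shipping every partial result to one JVM at once. This assumes the same hypothetical Node class with a WList buffer and the num1/num2 constructor arguments from the snippet above; treeReduce is available on RDD in later Spark releases, and via mllib's RDDFunctions in 1.x.)

```scala
// Sketch only, assuming Node and num1/num2 as defined elsewhere in the program.
// A larger depth means more intermediate stages and smaller per-stage results.
val wNode = trainingData.treeReduce({ (node1: Node, node2: Node) =>
  val merged = new Node(num1, num2)
  merged.WList ++= node1.WList
  merged.WList ++= node2.WList
  merged
}, depth = 3)
```

This only changes how the partial lists are combined; if a single merged Node is itself too large, increasing executor/driver heap may still be needed.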
where WList is a list of SparseVectors. The average size of a SparseVector is 21000, and the approximate number of elements in the final list at the end of the reduce operation varies between 20 and 100. *However, at run time I am getting the following error messages from some of the executor machines:*

14/10/20 22:38:41 INFO BlockManagerInfo: Added taskresult_30 in memory on cse-hadoop-113:34602 (size: 789.0 MB, free: 22.2 GB)
14/10/20 22:38:41 INFO TaskSetManager: Starting task 1.0:12 as TID 34 on executor 6: cse-hadoop-113 (PROCESS_LOCAL)
14/10/20 22:38:41 INFO TaskSetManager: Serialized task 1.0:12 as 2170 bytes in 2 ms
14/10/20 22:38:41 INFO SendingConnection: Initiating connection to [cse-hadoop-113/192.168.0.113:34602]
14/10/20 22:38:41 INFO SendingConnection: Connected to [cse-hadoop-113/192.168.0.113:34602], 1 messages pending
14/10/20 22:38:41 INFO ConnectionManager: Accepted connection from [cse-hadoop-113/192.168.0.113]
Exception in thread "pool-5-thread-3" java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
        at org.apache.spark.network.Message$.create(Message.scala:88)
        at org.apache.spark.network.ReceivingConnection$Inbox.org$apache$spark$network$ReceivingConnection$Inbox$$createNewMessage$1(Connection.scala:438)
        at org.apache.spark.network.ReceivingConnection$Inbox$$anonfun$1.apply(Connection.scala:448)
        at org.apache.spark.network.ReceivingConnection$Inbox$$anonfun$1.apply(Connection.scala:448)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
        at org.apache.spark.network.ReceivingConnection$Inbox.getChunk(Connection.scala:448)
        at org.apache.spark.network.ReceivingConnection.read(Connection.scala:525)
        at org.apache.spark.network.ConnectionManager$$anon$6.run(ConnectionManager.scala:176)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

*Please help.*

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-OutOfMemoryError-Java-heap-space-during-reduce-operation-tp16835.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.