Re: set spark.storage.memoryFraction to 0 when no cached RDD and memory area for broadcast value?

2015-04-14 Thread Akhil Das
You could try leaving all the configuration values at their defaults and running
your application to see if you still hit the heap issue. If so, try
adding swap space to the machines, which should help. Another way
would be to set the heap size manually (export _JAVA_OPTIONS=-Xmx5g).

Thanks
Best Regards

On Wed, Apr 8, 2015 at 12:45 AM, Shuai Zheng szheng.c...@gmail.com wrote:

 Hi All,



 I am a bit confused about spark.storage.memoryFraction. This setting controls
 the memory area reserved for RDD storage; does that mean only cached and
 persisted RDDs? So if my program has no cached RDDs at all (i.e. I never call
 .cache() or .persist() on any RDD), can I set
 spark.storage.memoryFraction to a very small number, or even zero?



 I am writing a program that consumes a lot of memory (broadcast values,
 runtime, etc.), but it has no cached RDDs. So should I just set
 spark.storage.memoryFraction to 0 (which would help me improve
 performance)?



 And I have another issue with broadcast: when I try to read a broadcast
 value, it throws an out-of-memory error. Which part of memory should I
 allocate more to (given that I can't increase my overall memory size)?



 java.lang.OutOfMemoryError: Java heap space
         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$DoubleArraySerializer.read(DefaultArraySerializers.java:218)
         at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$DoubleArraySerializer.read(DefaultArraySerializers.java:200)
         at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
         at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:138)
         at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
         at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
         at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:248)
         at org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:136)
         at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:549)
         at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:431)
         at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:167)
         at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1152)
         at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
         at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
         at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
         at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)





 Regards,



 Shuai



Re: set spark.storage.memoryFraction to 0 when no cached RDD and memory area for broadcast value?

2015-04-14 Thread twinkle sachdeva
Hi,

In one of the applications we built, which had no caching, we
set spark.storage.memoryFraction very low, and yes, that
gave us performance benefits.
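
For context, here is a rough sketch of how the Spark 1.x static memory model carves up the executor heap. The 0.6 and 0.9 defaults are the documented values of spark.storage.memoryFraction and spark.storage.safetyFraction; the 5 GB heap is just an illustrative figure matching the -Xmx5g suggestion above:

```python
# Sketch of the Spark 1.x static memory model (pre-unified memory management).
# Defaults: spark.storage.memoryFraction = 0.6, spark.storage.safetyFraction = 0.9.

def storage_memory(heap_bytes, memory_fraction=0.6, safety_fraction=0.9):
    """Heap reserved for storing blocks (cached/persisted RDDs, broadcast blocks)."""
    return heap_bytes * memory_fraction * safety_fraction

heap = 5 * 1024**3  # a 5 GB executor heap, as in -Xmx5g

print(storage_memory(heap) / 1024**3)                       # ~2.7 GB at defaults
print(storage_memory(heap, memory_fraction=0.1) / 1024**3)  # ~0.45 GB when lowered
```

Note that in the stack trace above, the broadcast read goes through MemoryStore (TorrentBroadcast.readBroadcastBlock -> BlockManager -> MemoryStore.putIterator), i.e. through the storage region, so setting the fraction all the way to 0 may affect broadcast handling too, not just RDD caching.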

Regarding the OOM issue, you should also look at the data you are trying to
broadcast; sometimes creating that data structure as a singleton on the
executors themselves helps.
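
A minimal sketch of that per-executor singleton idea (the names and the loader here are hypothetical, not from the thread). The point is that each executor process builds the big structure once, locally and lazily, instead of receiving it as a broadcast from the driver:

```python
# Lazy per-process singleton: built on first use inside each executor,
# instead of being serialized on the driver and broadcast to every node.
_lookup_table = None  # module-level cache; one copy per executor process


def load_table_from_shared_storage():
    # Placeholder for the real load (e.g. reading from HDFS/S3);
    # here just a small dict so the sketch is self-contained.
    return {"key": 1.0}


def get_lookup_table():
    """Return the shared structure, building it locally on first access."""
    global _lookup_table
    if _lookup_table is None:
        _lookup_table = load_table_from_shared_storage()
    return _lookup_table
```

Tasks would then call get_lookup_table() inside their closures, e.g. rdd.map(lambda x: get_lookup_table().get(x, 0.0)), so every task scheduled on the same executor reuses the one local copy.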

Thanks,


On Tue, Apr 14, 2015 at 12:23 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 You could try leaving all the configuration values at their defaults and running
 your application to see if you still hit the heap issue. If so, try
 adding swap space to the machines, which should help. Another way
 would be to set the heap size manually (export _JAVA_OPTIONS=-Xmx5g).

 Thanks
 Best Regards
