java.lang.OutOfMemoryError: Java heap space - Spark driver.

2018-08-29 Thread Guillermo Ortiz Fernández
I got this error from the Spark driver; it seems I should increase the driver
memory, although it is already 5g (with 4 cores). It seems weird to me because
I'm not using Kryo or broadcast explicitly in this process, yet the log
contains references to both.
How can I figure out the cause of this OutOfMemoryError? Is it normal to see
references to Kryo and broadcasting when I'm not using them?
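
A note for context: every Spark stage internally broadcasts its serialized
task binary -- the RDD lineage plus the closure -- and that broadcast is
chunked by TorrentBroadcast, compressed with LZ4 and, depending on
configuration, written with Kryo. That is why Kryo and broadcast frames show
up in the driver stack even when the application never calls broadcast()
itself. One common way this exhausts the driver heap is a closure that
captures a large driver-side object. The PySpark sketch below is purely
illustrative; the names and sizes are invented, and the original job is a
streaming application, not PySpark:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("closure-size-demo").getOrCreate()
sc = spark.sparkContext

big_lookup = {i: str(i) * 10 for i in range(1000000)}   # lives on the driver

# Anti-pattern: big_lookup is captured by the lambda, so it is serialized into
# the task binary that the DAGScheduler broadcasts for every stage.
rdd = sc.parallelize(range(100))
bad = rdd.map(lambda x: big_lookup.get(x, "")).count()

# Safer: broadcast the object once explicitly; the closure then only captures
# the lightweight broadcast handle.
bc = sc.broadcast(big_lookup)
good = rdd.map(lambda x: bc.value.get(x, "")).count()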

05:11:19.110 [streaming-job-executor-0] WARN
c.datastax.driver.core.CodecRegistry - Ignoring codec DateRangeCodec
['org.apache.cassandra.db.marshal.DateRangeType' <->
com.datastax.driver.dse.search.DateRange] because it collides with
previously registered codec DateRangeCodec
['org.apache.cassandra.db.marshal.DateRangeType' <->
com.datastax.driver.dse.search.DateRange]
05:11:26.806 [dag-scheduler-event-loop] WARN  org.apache.spark.util.Utils -
Suppressing exception in finally: Java heap space
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
~[na:1.8.0_162]
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_162]
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:231)
~[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:231)
~[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
~[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
~[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:205)
~[lz4-1.3.0.jar:na]
at
net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:158)
~[lz4-1.3.0.jar:na]
at com.esotericsoftware.kryo.io.Output.flush(Output.java:181)
~[kryo-3.0.3.jar:na]
at com.esotericsoftware.kryo.io.Output.close(Output.java:191)
~[kryo-3.0.3.jar:na]
at
org.apache.spark.serializer.KryoSerializationStream.close(KryoSerializer.scala:209)
~[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$blockifyObject$1.apply$mcV$sp(TorrentBroadcast.scala:238)
~[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1319)
~[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:237)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:107)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.broadcast.TorrentBroadcast.(TorrentBroadcast.scala:86)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:56)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1387)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1012)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:933)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:936)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:935)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at scala.collection.immutable.List.foreach(List.scala:392)
[scala-library-2.11.11.jar:na]
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:935)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:873)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1630)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
[spark-core_2.11-2.0.2.15.jar:2.0.2.15]
05:40:53.535 [dse-app-client-thread-pool-0] WARN
c.datastax.driver.core.CodecRegistry - Ignoring codec DateRangeCodec
['org.apache.cassandra.db.marshal.D

Re: Spark Mlib - java.lang.OutOfMemoryError: Java heap space

2017-04-24 Thread Selvam Raman
This is where the job goes out of memory:

17/04/24 10:09:22 INFO TaskSetManager: Finished task 122.0 in stage 1.0
(TID 356) in 4260 ms on ip-...-45.dev (124/234)
17/04/24 10:09:26 INFO BlockManagerInfo: Removed taskresult_361 on
ip-10...-185.dev:36974 in memory (size: 5.2 MB, free: 8.5 GB)
17/04/24 10:09:26 INFO BlockManagerInfo: Removed taskresult_362 on
ip-...-45.dev:40963 in memory (size: 5.2 MB, free: 8.9 GB)
17/04/24 10:09:26 INFO TaskSetManager: Finished task 125.0 in stage 1.0
(TID 359) in 4383 ms on ip-...-45.dev (125/234)
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 15090"...
Killed

Node-45.dev shows 8.9GB free, yet the job still goes out of memory. Can anyone
please help me understand the issue?
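
A note on reading those log lines: the "free: 8.9 GB" reported by
BlockManagerInfo is the block manager's storage pool (memory available for
cached blocks), not the whole executor heap, so an executor can still be
killed by the -XX:OnOutOfMemoryError handler while those messages show plenty
of free storage memory. For MLlib's Word2Vec, heap use grows roughly with
vocabulary size times vector size, so shrinking the vocabulary and the
vectors, or spreading training over more partitions, are the usual first
levers. The sketch below only illustrates those knobs; the values are guesses,
not a recommendation, and the S3 paths are reused from the original post:

from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec

sc = SparkContext(appName="Word2VecExampleTuned")

inp = (sc.textFile("s3://word2vec/data/word2vec_word_data.txt/")
         .map(lambda row: row.split(" ")))

word2vec = (Word2Vec()
            .setVectorSize(100)     # smaller vectors -> smaller model arrays
            .setMinCount(10)        # drop rare words to shrink the vocabulary
            .setNumPartitions(8))   # spread training across more tasks

model = word2vec.fit(inp)
model.save(sc, "s3://pysparkml/word2vecresult2/")
sc.stop()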

On Mon, Apr 24, 2017 at 11:22 AM, Selvam Raman <sel...@gmail.com> wrote:

> Hi,
>
> I have 1 master and 4 slave node. Input data size is 14GB.
> Slave Node config : 32GB Ram,16 core
>
>
> I am trying to train word embedding model using spark. It is going out of
> memory. To train 14GB of data how much memory do i require?.
>
>
> I have givem 20gb per executor but below shows it is using 11.8GB out of
> 20 GB.
> BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-.-.-.dev:35035
> (size: 4.6 KB, free: 11.8 GB)
>
>
> This is the code
> if __name__ == "__main__":
> sc = SparkContext(appName="Word2VecExample")  # SparkContext
>
> # $example on$
> inp = sc.textFile("s3://word2vec/data/word2vec_word_data.txt/").map(lambda
> row: row.split(" "))
>
> word2vec = Word2Vec()
> model = word2vec.fit(inp)
>
> model.save(sc, "s3://pysparkml/word2vecresult2/")
> sc.stop()
>
>
> Spark-submit Command:
> spark-submit --master yarn --conf 
> 'spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/mnt/tmp -XX:+UseG1GC -XX:+UseG1GC -XX:+PrintFlagsFinal
> -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy
> -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark' --num-executors 4
> --executor-cores 2 --executor-memory 20g Word2VecExample.py
>
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>



-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Spark Mlib - java.lang.OutOfMemoryError: Java heap space

2017-04-24 Thread Selvam Raman
Hi,

I have 1 master and 4 slave nodes. Input data size is 14GB.
Slave node config: 32GB RAM, 16 cores.


I am trying to train a word-embedding model using Spark, and it is going out
of memory. How much memory do I need to train 14GB of data?


I have given 20GB per executor, but the log line below shows it using 11.8GB
out of 20GB.
BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-.-.-.dev:35035
(size: 4.6 KB, free: 11.8 GB)


This is the code (with the imports it needs):

from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec

if __name__ == "__main__":
    sc = SparkContext(appName="Word2VecExample")  # SparkContext

    # $example on$
    inp = sc.textFile("s3://word2vec/data/word2vec_word_data.txt/") \
        .map(lambda row: row.split(" "))

    word2vec = Word2Vec()
    model = word2vec.fit(inp)

    model.save(sc, "s3://pysparkml/word2vecresult2/")
    sc.stop()


Spark-submit Command:
spark-submit --master yarn --conf
'spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/mnt/tmp -XX:+UseG1GC -XX:+UseG1GC -XX:+PrintFlagsFinal
-XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy
-XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark' --num-executors 4
--executor-cores 2 --executor-memory 20g Word2VecExample.py


-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Spark Sql - "broadcast-exchange-1" java.lang.OutOfMemoryError: Java heap space

2016-10-25 Thread Selvam Raman
Hi,

Need help figuring out and solving a heap space problem.

I have a query which joins 15+ tables, and when I try to print out the
result (just 23 rows) it throws a heap space error.

I tried the following command in standalone mode
(my Mac has 8 cores and 15GB RAM):

spark.conf().set("spark.sql.shuffle.partitions", 20);

./spark-submit --master spark://selva:7077 --executor-memory 2g
--total-executor-cores 4 --class MemIssue --conf
'spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+UseG1GC
-XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy
-XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark'
/Users/rs/Desktop/test.jar

This is my query:

select concat(sf1.scode, ''-'', m.mcode, ''-'', rf.rnum) , sf1.scode ,
concat(p.lname,'', '',ci.pyear), at.atext Alias, m.mcode Method, mt.mcode,
v.vname, nd.vmeas " +
" from  result r " +
"  join  var v on v.vnum = r.vnum " +
"  join  numa nd on nd.rnum = r.num " +
"  join  feat  fa on fa.fnum = r.fnum " +
"  join  samp  sf1 on sf1.snum = fa.snum " +
"  join  spe  sp on sf1.snum = sp.snum and sp.mnum not in (1,2)" +
"  join  act  a on a.anum = fa.anum " +
"  join  met  m on m.mnum = a.mnum " +
"  join  sampl  sfa on sfa.snum = sf1.snum " +
"  join  ann  at on at.anum = sfa.anum AND at.atypenum = 11 " +
"  join  data  dr on r.rnum = dr.rnum " +
"  join  cit  cd on dr.dnum = cd.dnum " +
"  join  cit  on cd.cnum = ci.cnum " +
"  join  aut  al on ci.cnum = al.cnum and al.aorder = 1 " +
"  join  per  p on al.pnum = p.pnum " +
"  left join  rel  rf on sf1.snum = rf.snum " +
"  left join  samp sf2 on rf.rnum = sf2.snum " +
"  left join  spe  s on s.snum = sf1.snum " +
"  left join  mat  mt on mt.mnum = s.mnum " +
" where sf1.sampling_feature_code = '1234test''" +
" order by 1,2


spark.sql(query).show


When I checked the whole-stage codegen plan, it first reads all the data from
the tables. Why is it reading all the data and doing a sort-merge join for 3
or 4 of the tables? Why is it not applying any filter?


Though I have given the executor a large amount of memory, it still throws
the same error. When Spark SQL performs the joins, how does it use memory and
cores?

Any guidelines would be greatly welcome.
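
For a query like this it usually helps to inspect the physical plan before
adding memory: check whether the sampling_feature_code filter is pushed below
the joins and which joins are planned as broadcast joins, since every
broadcast join builds an in-memory hashed relation. The sketch below is only
an illustration in PySpark (the original code is Java, and query stands for
the statement above); it disables automatic broadcasts while debugging and
prints the plan:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("join-plan-inspection")
         # Fewer shuffle partitions for a small result set, as in the thread.
         .config("spark.sql.shuffle.partitions", "20")
         # Disable automatic broadcast joins while debugging (default is 10 MB
         # per table), so Spark does not build several hash tables at once.
         .config("spark.sql.autoBroadcastJoinThreshold", "-1")
         .getOrCreate())

df = spark.sql(query)   # query holds the multi-join SQL shown above
df.explain(True)        # verify the filter is pushed down and see the join types
df.show()
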
-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Re: Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError: Java heap space

2016-07-22 Thread Andy Davidson
Hi Ted

In general I want this application to use all available resources. I just
bumped the driver memory to 2G. I also bumped the executor memory up to 2G.

It will take a couple of hours before I know whether this made a difference
or not.

I am not sure that setting executor memory is a good idea; I am concerned
that it will reduce concurrency.

Thanks

Andy
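
One caveat about bumping driver memory for a notebook-based driver:
spark.driver.memory only takes effect if it is supplied before the driver JVM
starts (spark-defaults.conf, --driver-memory, or the notebook's submit
arguments); setting it from code after the context exists has no effect. A
minimal PySpark sketch, assuming a recent Spark with SparkSession and a
notebook launched through PYSPARK_SUBMIT_ARGS, for checking what actually got
applied:

# Set before the notebook / driver process starts, for example:
#   export PYSPARK_SUBMIT_ARGS="--driver-memory 2g --executor-memory 2g pyspark-shell"
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
conf = spark.sparkContext.getConf()
print(conf.get("spark.driver.memory", "not set"))
print(conf.get("spark.executor.memory", "not set"))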

From:  Ted Yu <yuzhih...@gmail.com>
Date:  Friday, July 22, 2016 at 2:54 PM
To:  Andrew Davidson <a...@santacruzintegration.com>
Cc:  "user @spark" <user@spark.apache.org>
Subject:  Re: Exception in thread "dispatcher-event-loop-1"
java.lang.OutOfMemoryError: Java heap space

> How much heap memory do you give the driver ?
> 
> On Fri, Jul 22, 2016 at 2:17 PM, Andy Davidson <a...@santacruzintegration.com>
> wrote:
>> Given I get a stack trace in my python notebook I am guessing the driver is
>> running out of memory?
>> 
>> My app is simple it creates a list of dataFrames from s3://, and counts each
>> one. I would not think this would take a lot of driver memory
>> 
>> I am not running my code locally. Its using 12 cores. Each node has 6G.
>> 
>> Any suggestions would be greatly appreciated
>> 
>> Andy
>> 
>> def work():
>> 
>> constituentDFS = getDataFrames(constituentDataSets)
>> 
>> results = ["{} {}".format(name, constituentDFS[name].count()) for name in
>> constituentDFS]
>> 
>> print(results)
>> 
>> return results
>> 
>> 
>> 
>> %timeit -n 1 -r 1 results = work()
>> 
>> 
>>  in (.0)  1 def work():  2
>> constituentDFS = getDataFrames(constituentDataSets)> 3 results = ["{}
>> {}".format(name, constituentDFS[name].count()) for name in constituentDFS]
>> 4 print(results)  5 return results
>> 
>> 16/07/22 17:54:38 WARN TaskSetManager: Stage 146 contains a task of very
>> large size (145 KB). The maximum recommended task size is 100 KB.
>> 
>> 16/07/22 18:39:47 WARN HeartbeatReceiver: Removing executor 2 with no recent
>> heartbeats: 153037 ms exceeds timeout 12 ms
>> 
>> Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError:
>> Java heap space
>> 
>> at java.util.jar.Manifest$FastInputStream.(Manifest.java:332)
>> 
>> at java.util.jar.Manifest$FastInputStream.(Manifest.java:327)
>> 
>> at java.util.jar.Manifest.read(Manifest.java:195)
>> 
>> at java.util.jar.Manifest.(Manifest.java:69)
>> 
>> at java.util.jar.JarFile.getManifestFromReference(JarFile.java:199)
>> 
>> at java.util.jar.JarFile.getManifest(JarFile.java:180)
>> 
>> at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:944)
>> 
>> at java.net.URLClassLoader.defineClass(URLClassLoader.java:450)
>> 
>> at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>> 
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>> 
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>> 
>> at java.security.AccessController.doPrivileged(Native Method)
>> 
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>> 
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> 
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>> 
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> 
>> at 
>> org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImp
>> l.scala:510)
>> 
>> at 
>> org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.s
>> cala:473)
>> 
>> at 
>> org.apache.spark.HeartbeatReceiver$$anonfun$org$apache$spark$HeartbeatReceive
>> r$$expireDeadHosts$3.apply(HeartbeatReceiver.scala:199)
>> 
>> at 
>> org.apache.spark.HeartbeatReceiver$$anonfun$org$apache$spark$HeartbeatReceive
>> r$$expireDeadHosts$3.apply(HeartbeatReceiver.scala:195)
>> 
>> at 
>> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(Traversa
>> bleLike.scala:772)
>> 
>> at 
>> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>> 
>> at 
>> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>> 
>> at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>> 
>> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>> 
>> at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>> 
>> at 
>> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>> 
>> at org.apache.spark.HeartbeatReceiver.org
>> $apache$spark$HeartbeatReceiver$$expireDeadHosts(HeartbeatReceiver.scala:195)
>> 
>> at 
>> org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1.applyOrElse(Hea
>> rtbeatReceiver.scala:118)
>> 
>> at 
>> org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:
>> 104)
>> 
>> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
>> 
>> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>> 
>> 16/07/22 19:08:29 WARN NettyRpcEnv: Ignored message: true
>> 
>> 
> 




Re: Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError: Java heap space

2016-07-22 Thread Ted Yu
How much heap memory do you give the driver ?

On Fri, Jul 22, 2016 at 2:17 PM, Andy Davidson <
a...@santacruzintegration.com> wrote:

> Given I get a stack trace in my python notebook I am guessing the driver
> is running out of memory?
>
> My app is simple it creates a list of dataFrames from s3://, and counts
> each one. I would not think this would take a lot of driver memory
>
> I am not running my code locally. Its using 12 cores. Each node has 6G.
>
> Any suggestions would be greatly appreciated
>
> Andy
>
> def work():
>
> constituentDFS = getDataFrames(constituentDataSets)
>
> results = ["{} {}".format(name, constituentDFS[name].count()) for name
> in constituentDFS]
>
> print(results)
>
> return results
>
>
> %timeit -n 1 -r 1 results = work()
>
>
>  in (.0)  1 def work():  2
>  constituentDFS = getDataFrames(constituentDataSets)> 3 results = 
> ["{} {}".format(name, constituentDFS[name].count()) for name in 
> constituentDFS]  4 print(results)  5 return results
>
>
> 16/07/22 17:54:38 WARN TaskSetManager: Stage 146 contains a task of very
> large size (145 KB). The maximum recommended task size is 100 KB.
>
> 16/07/22 18:39:47 WARN HeartbeatReceiver: Removing executor 2 with no
> recent heartbeats: 153037 ms exceeds timeout 12 ms
>
> Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError:
> Java heap space
>
> at java.util.jar.Manifest$FastInputStream.(Manifest.java:332)
>
> at java.util.jar.Manifest$FastInputStream.(Manifest.java:327)
>
> at java.util.jar.Manifest.read(Manifest.java:195)
>
> at java.util.jar.Manifest.(Manifest.java:69)
>
> at java.util.jar.JarFile.getManifestFromReference(JarFile.java:199)
>
> at java.util.jar.JarFile.getManifest(JarFile.java:180)
>
> at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:944)
>
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:450)
>
> at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>
> at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>
> at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>
> at
> org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:510)
>
> at
> org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:473)
>
> at
> org.apache.spark.HeartbeatReceiver$$anonfun$org$apache$spark$HeartbeatReceiver$$expireDeadHosts$3.apply(HeartbeatReceiver.scala:199)
>
> at
> org.apache.spark.HeartbeatReceiver$$anonfun$org$apache$spark$HeartbeatReceiver$$expireDeadHosts$3.apply(HeartbeatReceiver.scala:195)
>
> at
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>
> at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>
> at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>
> at
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>
> at
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>
> at org.apache.spark.HeartbeatReceiver.org
> $apache$spark$HeartbeatReceiver$$expireDeadHosts(HeartbeatReceiver.scala:195)
>
> at
> org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1.applyOrElse(HeartbeatReceiver.scala:118)
>
> at
> org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:104)
>
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
>
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>
> 16/07/22 19:08:29 WARN NettyRpcEnv: Ignored message: true
>
>
>


Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError: Java heap space

2016-07-22 Thread Andy Davidson
Given that I get a stack trace in my Python notebook, I am guessing the
driver is running out of memory?

My app is simple: it creates a list of DataFrames from s3:// and counts each
one. I would not think this would take a lot of driver memory.

I am not running my code locally. It's using 12 cores, and each node has 6G.

Any suggestions would be greatly appreciated

Andy

def work():
    constituentDFS = getDataFrames(constituentDataSets)
    results = ["{} {}".format(name, constituentDFS[name].count()) for name
               in constituentDFS]
    print(results)
    return results


%timeit -n 1 -r 1 results = work()


 in (.0)
      1 def work():
      2     constituentDFS = getDataFrames(constituentDataSets)
----> 3     results = ["{} {}".format(name, constituentDFS[name].count()) for name in constituentDFS]
      4     print(results)
      5     return results

16/07/22 17:54:38 WARN TaskSetManager: Stage 146 contains a task of very
large size (145 KB). The maximum recommended task size is 100 KB.

16/07/22 18:39:47 WARN HeartbeatReceiver: Removing executor 2 with no recent
heartbeats: 153037 ms exceeds timeout 12 ms

Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError:
Java heap space

at java.util.jar.Manifest$FastInputStream.(Manifest.java:332)

at java.util.jar.Manifest$FastInputStream.(Manifest.java:327)

at java.util.jar.Manifest.read(Manifest.java:195)

at java.util.jar.Manifest.(Manifest.java:69)

at java.util.jar.JarFile.getManifestFromReference(JarFile.java:199)

at java.util.jar.JarFile.getManifest(JarFile.java:180)

at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:944)

at java.net.URLClassLoader.defineClass(URLClassLoader.java:450)

at java.net.URLClassLoader.access$100(URLClassLoader.java:73)

at java.net.URLClassLoader$1.run(URLClassLoader.java:368)

at java.net.URLClassLoader$1.run(URLClassLoader.java:362)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:361)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

at org.apache.spark.scheduler.TaskSchedulerImpl.logExecutorLoss(TaskSchedulerImpl.scala:510)

at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:473)

at org.apache.spark.HeartbeatReceiver$$anonfun$org$apache$spark$HeartbeatReceiver$$expireDeadHosts$3.apply(HeartbeatReceiver.scala:199)

at org.apache.spark.HeartbeatReceiver$$anonfun$org$apache$spark$HeartbeatReceiver$$expireDeadHosts$3.apply(HeartbeatReceiver.scala:195)

at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)

at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)

at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)

at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)

at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)

at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)

at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)

at org.apache.spark.HeartbeatReceiver.org$apache$spark$HeartbeatReceiver$$expireDeadHosts(HeartbeatReceiver.scala:195)

at org.apache.spark.HeartbeatReceiver$$anonfun$receiveAndReply$1.applyOrElse(HeartbeatReceiver.scala:118)

at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:104)

at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)

at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)

16/07/22 19:08:29 WARN NettyRpcEnv: Ignored message: true






Re: Memory issue java.lang.OutOfMemoryError: Java heap space

2016-07-13 Thread Chanh Le
Can you show me the Spark UI -> Executors tab and Storage tab?
It will show us how many executors ran and how much memory is being used for
caching.


> On Jul 14, 2016, at 9:49 AM, Jean Georges Perrin <j...@jgp.net> wrote:
> 
> I use it as a standalone cluster.
> 
> I run it through start-master, then start-slave. I only have one slave now, 
> but I will probably have a few soon.
> 
> The "application" is run on a separate box.
> 
> When everything was running on my mac, i was in local mode, but i never setup 
> anything in local mode. Going "production" was a little more complex that I 
> thought.
> 
>> On Jul 13, 2016, at 10:35 PM, Chanh Le <giaosu...@gmail.com 
>> <mailto:giaosu...@gmail.com>> wrote:
>> 
>> Hi Jean,
>> How do you run your Spark Application? Local Mode, Cluster Mode? 
>> If you run in local mode did you use —driver-memory and —executor-memory 
>> because in local mode your setting about executor and driver didn’t work 
>> that you expected.
>> 
>> 
>> 
>> 
>>> On Jul 14, 2016, at 8:43 AM, Jean Georges Perrin <j...@jgp.net 
>>> <mailto:j...@jgp.net>> wrote:
>>> 
>>> Looks like replacing the setExecutorEnv() by set() did the trick... let's 
>>> see how fast it'll process my 50x 10ˆ15 data points...
>>> 
>>>> On Jul 13, 2016, at 9:24 PM, Jean Georges Perrin <j...@jgp.net 
>>>> <mailto:j...@jgp.net>> wrote:
>>>> 
>>>> I have added:
>>>> 
>>>>SparkConf conf = new 
>>>> SparkConf().setAppName("app").setExecutorEnv("spark.executor.memory", "8g")
>>>>.setMaster("spark://10.0.100.120:7077 
>>>> ");
>>>> 
>>>> but it did not change a thing
>>>> 
>>>>> On Jul 13, 2016, at 9:14 PM, Jean Georges Perrin <j...@jgp.net 
>>>>> <mailto:j...@jgp.net>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I have a Java memory issue with Spark. The same application working on my 
>>>>> 8GB Mac crashes on my 72GB Ubuntu server...
>>>>> 
>>>>> I have changed things in the conf file, but it looks like Spark does not 
>>>>> care, so I wonder if my issues are with the driver or executor.
>>>>> 
>>>>> I set:
>>>>> 
>>>>> spark.driver.memory 20g
>>>>> spark.executor.memory   20g
>>>>> And, whatever I do, the crash is always at the same spot in the app, 
>>>>> which makes me think that it is a driver problem.
>>>>> 
>>>>> The exception I get is:
>>>>> 
>>>>> 16/07/13 20:36:30 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 
>>>>> 208, micha.nc.rr.com): java.lang.OutOfMemoryError: Java heap space
>>>>> at java.nio.HeapCharBuffer.(HeapCharBuffer.java:57)
>>>>> at java.nio.CharBuffer.allocate(CharBuffer.java:335)
>>>>> at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:810)
>>>>> at org.apache.hadoop.io.Text.decode(Text.java:412)
>>>>> at org.apache.hadoop.io.Text.decode(Text.java:389)
>>>>> at org.apache.hadoop.io.Text.toString(Text.java:280)
>>>>> at 
>>>>> org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
>>>>> at 
>>>>> org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
>>>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>> at 
>>>>> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
>>>>> at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
>>>>> at 
>>>>> scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
>>>>> at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
>>>>> at 
>>&

Re: Memory issue java.lang.OutOfMemoryError: Java heap space

2016-07-13 Thread Jean Georges Perrin
I use it as a standalone cluster.

I run it through start-master, then start-slave. I only have one slave now, but
I will probably have a few soon.

The "application" is run on a separate box.

When everything was running on my Mac, I was in local mode, but I never set up
anything in local mode. Going "production" was a little more complex than I
thought.

> On Jul 13, 2016, at 10:35 PM, Chanh Le <giaosu...@gmail.com> wrote:
> 
> Hi Jean,
> How do you run your Spark Application? Local Mode, Cluster Mode? 
> If you run in local mode did you use —driver-memory and —executor-memory 
> because in local mode your setting about executor and driver didn’t work that 
> you expected.
> 
> 
> 
> 
>> On Jul 14, 2016, at 8:43 AM, Jean Georges Perrin <j...@jgp.net 
>> <mailto:j...@jgp.net>> wrote:
>> 
>> Looks like replacing the setExecutorEnv() by set() did the trick... let's 
>> see how fast it'll process my 50x 10ˆ15 data points...
>> 
>>> On Jul 13, 2016, at 9:24 PM, Jean Georges Perrin <j...@jgp.net 
>>> <mailto:j...@jgp.net>> wrote:
>>> 
>>> I have added:
>>> 
>>> SparkConf conf = new 
>>> SparkConf().setAppName("app").setExecutorEnv("spark.executor.memory", "8g")
>>> .setMaster("spark://10.0.100.120:7077 
>>> ");
>>> 
>>> but it did not change a thing
>>> 
>>>> On Jul 13, 2016, at 9:14 PM, Jean Georges Perrin <j...@jgp.net 
>>>> <mailto:j...@jgp.net>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I have a Java memory issue with Spark. The same application working on my 
>>>> 8GB Mac crashes on my 72GB Ubuntu server...
>>>> 
>>>> I have changed things in the conf file, but it looks like Spark does not 
>>>> care, so I wonder if my issues are with the driver or executor.
>>>> 
>>>> I set:
>>>> 
>>>> spark.driver.memory 20g
>>>> spark.executor.memory   20g
>>>> And, whatever I do, the crash is always at the same spot in the app, which 
>>>> makes me think that it is a driver problem.
>>>> 
>>>> The exception I get is:
>>>> 
>>>> 16/07/13 20:36:30 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 
>>>> 208, micha.nc.rr.com): java.lang.OutOfMemoryError: Java heap space
>>>> at java.nio.HeapCharBuffer.(HeapCharBuffer.java:57)
>>>> at java.nio.CharBuffer.allocate(CharBuffer.java:335)
>>>> at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:810)
>>>> at org.apache.hadoop.io.Text.decode(Text.java:412)
>>>> at org.apache.hadoop.io.Text.decode(Text.java:389)
>>>> at org.apache.hadoop.io.Text.toString(Text.java:280)
>>>> at 
>>>> org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
>>>> at 
>>>> org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
>>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>> at 
>>>> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
>>>> at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
>>>> at 
>>>> scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
>>>> at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
>>>> at 
>>>> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
>>>> at 
>>>> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
>>>> at 
>>>> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
>>>> at 
>>>> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
>>>> at 
>>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>>

Re: Memory issue java.lang.OutOfMemoryError: Java heap space

2016-07-13 Thread Chanh Le
Hi Jean,
How do you run your Spark application: local mode or cluster mode?
If you run in local mode, did you use --driver-memory and --executor-memory?
In local mode, the executor and driver settings may not behave the way you
expect.




> On Jul 14, 2016, at 8:43 AM, Jean Georges Perrin <j...@jgp.net> wrote:
> 
> Looks like replacing the setExecutorEnv() by set() did the trick... let's see 
> how fast it'll process my 50x 10ˆ15 data points...
> 
>> On Jul 13, 2016, at 9:24 PM, Jean Georges Perrin <j...@jgp.net 
>> <mailto:j...@jgp.net>> wrote:
>> 
>> I have added:
>> 
>>  SparkConf conf = new 
>> SparkConf().setAppName("app").setExecutorEnv("spark.executor.memory", "8g")
>>  .setMaster("spark://10.0.100.120:7077 
>> ");
>> 
>> but it did not change a thing
>> 
>>> On Jul 13, 2016, at 9:14 PM, Jean Georges Perrin <j...@jgp.net 
>>> <mailto:j...@jgp.net>> wrote:
>>> 
>>> Hi,
>>> 
>>> I have a Java memory issue with Spark. The same application working on my 
>>> 8GB Mac crashes on my 72GB Ubuntu server...
>>> 
>>> I have changed things in the conf file, but it looks like Spark does not 
>>> care, so I wonder if my issues are with the driver or executor.
>>> 
>>> I set:
>>> 
>>> spark.driver.memory 20g
>>> spark.executor.memory   20g
>>> And, whatever I do, the crash is always at the same spot in the app, which 
>>> makes me think that it is a driver problem.
>>> 
>>> The exception I get is:
>>> 
>>> 16/07/13 20:36:30 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 208, 
>>> micha.nc.rr.com): java.lang.OutOfMemoryError: Java heap space
>>> at java.nio.HeapCharBuffer.(HeapCharBuffer.java:57)
>>> at java.nio.CharBuffer.allocate(CharBuffer.java:335)
>>> at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:810)
>>> at org.apache.hadoop.io.Text.decode(Text.java:412)
>>> at org.apache.hadoop.io.Text.decode(Text.java:389)
>>> at org.apache.hadoop.io.Text.toString(Text.java:280)
>>> at 
>>> org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
>>> at 
>>> org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>> at 
>>> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
>>> at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
>>> at 
>>> scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
>>> at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
>>> at 
>>> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
>>> at 
>>> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
>>> at 
>>> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
>>> at 
>>> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
>>> at 
>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>> at 
>>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>>> at 
>>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> at java.lang.Thread.run(Thread.java:745)
>>> 
>>> I have set a small memory "dumper" in my app. At the beginning, it says:
>>> 
>>> **  Free . 1,413,566
>>> **  Allocated  1,705,984
>>> **  Max .. 16,495,104
>>> **> Total free ... 16,202,686
>>> Just before the crash, it says:
>>> 
>>> **  Free . 1,461,633
>>> **  Allocated  1,786,880
>>> **  Max .. 16,495,104
>>> **> Total free ... 16,169,857
>>> 
>>> 
>>> 
>>> 
>> 
> 



Re: Memory issue java.lang.OutOfMemoryError: Java heap space

2016-07-13 Thread Jean Georges Perrin
Looks like replacing setExecutorEnv() with set() did the trick... let's see
how fast it will process my 50 x 10^15 data points...
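
For anyone landing on this thread: setExecutorEnv() only sets operating-system
environment variables on the executors, so a Spark property such as
spark.executor.memory placed there is silently ignored; it has to go through
set() (or --conf on spark-submit) before the context is created. The sketch
below is a PySpark analogue of that fix, assuming the same standalone master
URL as in the thread; it is an illustration, not the original code:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("app")
        .setMaster("spark://10.0.100.120:7077")
        .set("spark.executor.memory", "8g"))   # applied when executors launch

sc = SparkContext(conf=conf)

# Note: spark.driver.memory cannot be raised this way in client mode, because
# the driver JVM is already running; pass --driver-memory to spark-submit (or
# set it in spark-defaults.conf) instead.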

> On Jul 13, 2016, at 9:24 PM, Jean Georges Perrin <j...@jgp.net> wrote:
> 
> I have added:
> 
>   SparkConf conf = new 
> SparkConf().setAppName("app").setExecutorEnv("spark.executor.memory", "8g")
>   .setMaster("spark://10.0.100.120:7077 
> ");
> 
> but it did not change a thing
> 
>> On Jul 13, 2016, at 9:14 PM, Jean Georges Perrin <j...@jgp.net 
>> <mailto:j...@jgp.net>> wrote:
>> 
>> Hi,
>> 
>> I have a Java memory issue with Spark. The same application working on my 
>> 8GB Mac crashes on my 72GB Ubuntu server...
>> 
>> I have changed things in the conf file, but it looks like Spark does not 
>> care, so I wonder if my issues are with the driver or executor.
>> 
>> I set:
>> 
>> spark.driver.memory 20g
>> spark.executor.memory   20g
>> And, whatever I do, the crash is always at the same spot in the app, which 
>> makes me think that it is a driver problem.
>> 
>> The exception I get is:
>> 
>> 16/07/13 20:36:30 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 208, 
>> micha.nc.rr.com): java.lang.OutOfMemoryError: Java heap space
>> at java.nio.HeapCharBuffer.(HeapCharBuffer.java:57)
>> at java.nio.CharBuffer.allocate(CharBuffer.java:335)
>> at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:810)
>> at org.apache.hadoop.io.Text.decode(Text.java:412)
>> at org.apache.hadoop.io.Text.decode(Text.java:389)
>> at org.apache.hadoop.io.Text.toString(Text.java:280)
>> at 
>> org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
>> at 
>> org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>> at 
>> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
>> at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
>> at 
>> scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
>> at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
>> at 
>> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
>> at 
>> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
>> at 
>> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
>> at 
>> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
>> at 
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>> at 
>> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
>> at 
>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>> at org.apache.spark.scheduler.Task.run(Task.scala:89)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> 
>> I have set a small memory "dumper" in my app. At the beginning, it says:
>> 
>> **  Free . 1,413,566
>> **  Allocated  1,705,984
>> **  Max .. 16,495,104
>> **> Total free ... 16,202,686
>> Just before the crash, it says:
>> 
>> **  Free . 1,461,633
>> **  Allocated  1,786,880
>> **  Max .. 16,495,104
>> **> Total free ... 16,169,857
>> 
>> 
>> 
>> 
> 



Re: Memory issue java.lang.OutOfMemoryError: Java heap space

2016-07-13 Thread Jean Georges Perrin
I have added:

SparkConf conf = new SparkConf()
        .setAppName("app")
        .setExecutorEnv("spark.executor.memory", "8g")
        .setMaster("spark://10.0.100.120:7077");

but it did not change a thing

> On Jul 13, 2016, at 9:14 PM, Jean Georges Perrin <j...@jgp.net> wrote:
> 
> Hi,
> 
> I have a Java memory issue with Spark. The same application working on my 8GB 
> Mac crashes on my 72GB Ubuntu server...
> 
> I have changed things in the conf file, but it looks like Spark does not 
> care, so I wonder if my issues are with the driver or executor.
> 
> I set:
> 
> spark.driver.memory 20g
> spark.executor.memory   20g
> And, whatever I do, the crash is always at the same spot in the app, which 
> makes me think that it is a driver problem.
> 
> The exception I get is:
> 
> 16/07/13 20:36:30 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 208, 
> micha.nc.rr.com): java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapCharBuffer.(HeapCharBuffer.java:57)
> at java.nio.CharBuffer.allocate(CharBuffer.java:335)
> at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:810)
> at org.apache.hadoop.io.Text.decode(Text.java:412)
> at org.apache.hadoop.io.Text.decode(Text.java:389)
> at org.apache.hadoop.io.Text.toString(Text.java:280)
> at 
> org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
> at 
> org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at 
> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
> at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
> at 
> scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
> at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
> at 
> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
> at 
> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
> at 
> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
> at 
> org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 
> I have set a small memory "dumper" in my app. At the beginning, it says:
> 
> **  Free . 1,413,566
> **  Allocated  1,705,984
> **  Max .. 16,495,104
> **> Total free ... 16,202,686
> Just before the crash, it says:
> 
> **  Free . 1,461,633
> **  Allocated  1,786,880
> **  Max .. 16,495,104
> **> Total free ... 16,169,857
> 
> 
> 
> 



Memory issue java.lang.OutOfMemoryError: Java heap space

2016-07-13 Thread Jean Georges Perrin
Hi,

I have a Java memory issue with Spark. The same application that works on my
8GB Mac crashes on my 72GB Ubuntu server...

I have changed things in the conf file, but it looks like Spark does not pick
them up, so I wonder whether my issue is with the driver or the executors.

I set:

spark.driver.memory 20g
spark.executor.memory   20g
And, whatever I do, the crash is always at the same spot in the app, which 
makes me think that it is a driver problem.

The exception I get is:

16/07/13 20:36:30 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 208, 
micha.nc.rr.com): java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapCharBuffer.(HeapCharBuffer.java:57)
at java.nio.CharBuffer.allocate(CharBuffer.java:335)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:810)
at org.apache.hadoop.io.Text.decode(Text.java:412)
at org.apache.hadoop.io.Text.decode(Text.java:389)
at org.apache.hadoop.io.Text.toString(Text.java:280)
at 
org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
at 
org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
at 
org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
at 
org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
at 
org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
at 
org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I have set a small memory "dumper" in my app. At the beginning, it says:

**  Free . 1,413,566
**  Allocated  1,705,984
**  Max .. 16,495,104
**> Total free ... 16,202,686
Just before the crash, it says:

**  Free . 1,461,633
**  Allocated  1,786,880
**  Max .. 16,495,104
**> Total free ... 16,169,857






SparkDriver throwing java.lang.OutOfMemoryError: Java heap space

2016-04-04 Thread Nirav Patel
Hi,

We are using Spark 1.5.2 and recently started hitting this issue after our
dataset grew from 140GB to 160GB. The error is thrown during the shuffle fetch
on the reduce side, which should all happen on the executors, and the
executors should report it! However, it gets reported only on the driver. The
SparkContext gets shut down from the driver side after this error occurs.
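
The frames below show the driver's Akka endpoint serializing MapOutputTracker
replies ("Asked to send map output locations for shuffle ..."), and the size
of that reply grows with the number of map and reduce partitions, which would
explain why the failure surfaces on the driver as the dataset grows. A hedged
sketch of the knobs that were commonly tried on Spark 1.x (the values are
illustrative, not a recommendation):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("shuffle-status-tuning")
        .set("spark.akka.frameSize", "256"))   # in MB; Akka RPC exists only up to Spark 1.6

sc = SparkContext(conf=conf)

# The driver heap itself has to be raised when the driver JVM is launched:
#   spark-submit --driver-memory 8g ...
# Using fewer, coarser partitions for the wide shuffle (e.g. rdd.coalesce(2000)
# before the reduce) also shrinks the map-status payload the driver serializes.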

Here's what I see in driver logs.



2016-04-04 03:51:32,889 INFO [sparkDriver-akka.actor.default-dispatcher-17]
org.apache.spark.MapOutputTrackerMasterEndpoint: Asked to send map output
locations for shuffle 3 to hdn3.mycomp:37339
2016-04-04 03:51:32,890 INFO [sparkDriver-akka.actor.default-dispatcher-17]
org.apache.spark.MapOutputTrackerMasterEndpoint: Asked to send map output
locations for shuffle 3 to hdn3.mycomp:57666
2016-04-04 03:51:33,133 INFO [sparkDriver-akka.actor.default-dispatcher-21]
org.apache.spark.storage.BlockManagerInfo: Removed broadcast_12_piece0 on
10.250.70.117:42566 in memory (size: 1939.0 B, free: 232.5 MB)
2016-04-04 03:51:38,432 ERROR
[sparkDriver-akka.actor.default-dispatcher-14]
org.apache.spark.rpc.akka.ErrorMonitor: Uncaught fatal error from thread
[sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down
ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
at
com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
at
akka.remote.transport.AkkaPduProtobufCodec$.constructMessage(AkkaPduCodec.scala:138)
at akka.remote.EndpointWriter.writeSend(Endpoint.scala:740)
at akka.remote.EndpointWriter$$anonfun$2.applyOrElse(Endpoint.scala:718)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:411)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2016-04-04 03:51:38,432 ERROR
[sparkDriver-akka.actor.default-dispatcher-21] akka.actor.ActorSystemImpl:
Uncaught fatal error from thread
[sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down
ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
at
com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
at
akka.remote.transport.AkkaPduProtobufCodec$.constructMessage(AkkaPduCodec.scala:138)
at akka.remote.EndpointWriter.writeSend(Endpoint.scala:740)
at akka.remote.EndpointWriter$$anonfun$2.applyOrElse(Endpoint.scala:718)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:411)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2016-04-04 03:51:40,246 ERROR [sparkDriver-akka.actor.default-dispatcher-4]
akka.actor.ActorSystemImpl: Uncaught fatal error from thread
[sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down
ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
at
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
at
java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
at
java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at
akka.serialization.JavaSerializer$$anonfun$toBinary$1.apply$mcV$sp(Serializer.scala:129)
at
akka.serialization.JavaSerializer$$anonfun$toBinary$1.apply(Serializer.scala:129)
at
akka.serialization.JavaSerializer$$anonfun$toBinary$1.apply(Serializer.scala:129)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at akka.serialization.JavaSerializer.toBinary(Serializer.scala:129)
at akka.remote.MessageSerializer$.serialize(MessageSerializer.scala:36

Re: [Spark 1.5]: Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space -- Work in 1.4, but 1.5 doesn't

2015-12-15 Thread Deenar Toraskar
On 16 December 2015 at 06:19, Deenar Toraskar <
deenar.toras...@thinkreactive.co.uk> wrote:

> Hi
>
> I had the same problem. There is a query with a lot of small tables (5x)
> all below the broadcast threshold and Spark is broadcasting all these
> tables together without checking if there is sufficient memory available.
>
> I got around this issue by reducing the
> *spark.sql.autoBroadcastJoinThreshold* to stop broadcasting the bigger
> tables in the query.
>
> This looks like an issue to me. A fix would be to
> a) ensure that, in addition to the per-table threshold, there is a total
> broadcast size per query, so only data up to that limit is broadcast,
> preventing executors from running out of memory.
>
> Shall I raise a JIRA for this?
>
> Regards
> Deenar
>
>
> On 4 November 2015 at 22:55, Shuai Zheng <szheng.c...@gmail.com> wrote:
>
>> And an update is: this ONLY happen in Spark 1.5, I try to run it under
>> Spark 1.4 and 1.4.1, there are no issue (the program is developed under
>> Spark 1.4 last time, and I just re-test it, it works). So this is proven
>> that there is no issue on the logic and data, it is caused by the new
>> version of Spark.
>>
>>
>>
>> So I want to know any new setup I should set in Spark 1.5 to make it
>> work?
>>
>>
>>
>> Regards,
>>
>>
>>
>> Shuai
>>
>>
>>
>> *From:* Shuai Zheng [mailto:szheng.c...@gmail.com]
>> *Sent:* Wednesday, November 04, 2015 3:22 PM
>> *To:* user@spark.apache.org
>> *Subject:* [Spark 1.5]: Exception in thread "broadcast-hash-join-2"
>> java.lang.OutOfMemoryError: Java heap space
>>
>>
>>
>> Hi All,
>>
>>
>>
>> I have a program which actually run a bit complex business (join) in
>> spark. And I have below exception:
>>
>>
>>
>> I running on Spark 1.5, and with parameter:
>>
>>
>>
>> spark-submit --deploy-mode client --executor-cores=24 --driver-memory=2G
>> --executor-memory=45G –class …
>>
>>
>>
>> Some other setup:
>>
>>
>>
>> sparkConf.set("spark.serializer",
>> "org.apache.spark.serializer.KryoSerializer").set("spark.kryoserializer.buffer.max",
>> "2047m");
>>
>> sparkConf.set("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps").set("spark.sql.autoBroadcastJoinThreshold",
>> "104857600");
>>
>>
>>
>> This is running on AWS c3*8xlarge instance. I am not sure what kind of
>> parameter I should set if I have below OutOfMemoryError exception.
>>
>>
>>
>> #
>>
>> # java.lang.OutOfMemoryError: Java heap space
>>
>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>
>> #   Executing /bin/sh -c "kill -9 10181"...
>>
>> Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError:
>> Java heap space
>>
>> at
>> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>> Source)
>>
>> at
>> org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:380)
>>
>> at
>> org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:123)
>>
>> at
>> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:95)
>>
>> at
>> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:85)
>>
>> at
>> org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:100)
>>
>> at
>> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
>>
>> at
>> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
>>
>> at
>> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>>
>> at
>> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>
>> at java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>> Any hint will be very helpful.
>>
>>
>>
>> Regards,
>>
>>
>>
>> Shuai
>>
>
>
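
A minimal sketch of the workaround described above: lowering (or disabling) the
automatic broadcast threshold so that the bigger tables in the query are no longer
broadcast. The sqlContext name and the 10 MB value are illustrative assumptions,
not the exact settings used in this thread.

  // Spark 1.5-era API; sqlContext is an existing org.apache.spark.sql.SQLContext
  sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", (10 * 1024 * 1024).toString) // ~10 MB
  // or disable automatic broadcast joins entirely while investigating the OOM
  sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "-1")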


Re: [Spark 1.5]: Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space -- Work in 1.4, but 1.5 doesn't

2015-12-15 Thread Deenar Toraskar
Hi

I have created an issue for this
https://issues.apache.org/jira/browse/SPARK-12358

Regards
Deenar

On 16 December 2015 at 06:21, Deenar Toraskar <deenar.toras...@gmail.com>
wrote:

>
>
> On 16 December 2015 at 06:19, Deenar Toraskar <
> deenar.toras...@thinkreactive.co.uk> wrote:
>
>> Hi
>>
>> I had the same problem. There is a query with a lot of small tables (5x)
>> all below the broadcast threshold and Spark is broadcasting all these
>> tables together without checking if there is sufficient memory available.
>>
>> I got around this issue by reducing the
>> *spark.sql.autoBroadcastJoinThreshold* to stop broadcasting the bigger
>> tables in the query.
>>
>> This looks like an issue to me. A fix would be to
>> a) ensure that, in addition to the per-table threshold, there is a total
>> broadcast size per query, so only data up to that limit is broadcast,
>> preventing executors from running out of memory.
>>
>> Shall I raise a JIRA for this?
>>
>> Regards
>> Deenar
>>
>>
>> On 4 November 2015 at 22:55, Shuai Zheng <szheng.c...@gmail.com> wrote:
>>
>>> And an update is: this ONLY happen in Spark 1.5, I try to run it under
>>> Spark 1.4 and 1.4.1, there are no issue (the program is developed under
>>> Spark 1.4 last time, and I just re-test it, it works). So this is proven
>>> that there is no issue on the logic and data, it is caused by the new
>>> version of Spark.
>>>
>>>
>>>
>>> So I want to know any new setup I should set in Spark 1.5 to make it
>>> work?
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> Shuai
>>>
>>>
>>>
>>> *From:* Shuai Zheng [mailto:szheng.c...@gmail.com]
>>> *Sent:* Wednesday, November 04, 2015 3:22 PM
>>> *To:* user@spark.apache.org
>>> *Subject:* [Spark 1.5]: Exception in thread "broadcast-hash-join-2"
>>> java.lang.OutOfMemoryError: Java heap space
>>>
>>>
>>>
>>> Hi All,
>>>
>>>
>>>
>>> I have a program which actually run a bit complex business (join) in
>>> spark. And I have below exception:
>>>
>>>
>>>
>>> I running on Spark 1.5, and with parameter:
>>>
>>>
>>>
>>> spark-submit --deploy-mode client --executor-cores=24 --driver-memory=2G
>>> --executor-memory=45G --class …
>>>
>>>
>>>
>>> Some other setup:
>>>
>>>
>>>
>>> sparkConf.set("spark.serializer",
>>> "org.apache.spark.serializer.KryoSerializer").set("spark.kryoserializer.buffer.max",
>>> "2047m");
>>>
>>> sparkConf.set("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails
>>> -XX:+PrintGCTimeStamps").set("spark.sql.autoBroadcastJoinThreshold",
>>> "104857600");
>>>
>>>
>>>
>>> This is running on AWS c3*8xlarge instance. I am not sure what kind of
>>> parameter I should set if I have below OutOfMemoryError exception.
>>>
>>>
>>>
>>> #
>>>
>>> # java.lang.OutOfMemoryError: Java heap space
>>>
>>> # -XX:OnOutOfMemoryError="kill -9 %p"
>>>
>>> #   Executing /bin/sh -c "kill -9 10181"...
>>>
>>> Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError:
>>> Java heap space
>>>
>>> at
>>> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>>> Source)
>>>
>>> at
>>> org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:380)
>>>
>>> at
>>> org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:123)
>>>
>>> at
>>> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:95)
>>>
>>> at
>>> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:85)
>>>
>>> at
>>> org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:100)
>>>
>>> at
>>> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
>>>
>>> at
>>> org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
>>>
>>> at
>>> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>>>
>>> at
>>> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>>>
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>>>
>>> Any hint will be very helpful.
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> Shuai
>>>
>>
>>
>


RE: [Spark 1.5]: Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space -- Work in 1.4, but 1.5 doesn't

2015-11-04 Thread Shuai Zheng
An update: this ONLY happens in Spark 1.5. I tried running it under Spark
1.4 and 1.4.1 and there is no issue (the program was originally developed under
Spark 1.4, and I just re-tested it; it works). So this proves that there is
no issue with the logic or the data; it is caused by the new version of Spark.

 

So I want to know: is there any new setting I should set in Spark 1.5 to make it work?

 

Regards,

 

Shuai

 

From: Shuai Zheng [mailto:szheng.c...@gmail.com] 
Sent: Wednesday, November 04, 2015 3:22 PM
To: user@spark.apache.org
Subject: [Spark 1.5]: Exception in thread "broadcast-hash-join-2"
java.lang.OutOfMemoryError: Java heap space

 

Hi All,

 

I have a program which runs some fairly complex business logic (joins) in Spark,
and I get the exception below:

 

I am running on Spark 1.5 with the following parameters:

 

spark-submit --deploy-mode client --executor-cores=24 --driver-memory=2G
--executor-memory=45G --class …

 

Some other setup:

 

sparkConf.set("spark.serializer",
"org.apache.spark.serializer.KryoSerializer").set("spark.kryoserializer.buff
er.max", "2047m");

sparkConf.set("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps").set("spark.sql.autoBroadcastJoinThreshold",
"104857600");

 

This is running on an AWS c3.8xlarge instance. I am not sure what parameters I
should set given the OutOfMemoryError exception below.

 

#

# java.lang.OutOfMemoryError: Java heap space

# -XX:OnOutOfMemoryError="kill -9 %p"

#   Executing /bin/sh -c "kill -9 10181"...

Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java
heap space

at
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProje
ction.apply(Unknown Source)

at
org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelat
ion.scala:380)

at
org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.sc
ala:123)

at
org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadca
stFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:95)

at
org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadca
stFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:85)

at
org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.sc
ala:100)

at
org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadca
stFuture$1.apply(BroadcastHashOuterJoin.scala:85)

at
org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadca
stFuture$1.apply(BroadcastHashOuterJoin.scala:85)

at
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.
scala:24)

at
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
45)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
15)

at java.lang.Thread.run(Thread.java:745)

 

Any hint will be very helpful.

 

Regards,

 

Shuai
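
In this stack trace the hashed relation for the broadcast join is built inside the
driver process (the "broadcast-hash-join" threads run on the driver before the result
is broadcast), so one plausible mitigation is to give the driver more heap and/or
lower the broadcast threshold. A hedged sketch only; the 8G value and the placeholder
class and jar names are assumptions, not settings confirmed in this thread.

  spark-submit --deploy-mode client --executor-cores=24 --driver-memory=8G \
    --executor-memory=45G --class com.example.MyJob my-job.jar

  // or, in code, shrink the threshold so a ~100 MB build side is no longer broadcast
  sparkConf.set("spark.sql.autoBroadcastJoinThreshold", "10485760"); // ~10 MB, illustrative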



[Spark 1.5]: Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space

2015-11-04 Thread Shuai Zheng
Hi All,

 

I have a program which runs some fairly complex business logic (joins) in Spark,
and I get the exception below:

 

I am running on Spark 1.5 with the following parameters:

 

spark-submit --deploy-mode client --executor-cores=24 --driver-memory=2G
--executor-memory=45G --class …

 

Some other setup:

 

sparkConf.set("spark.serializer",
"org.apache.spark.serializer.KryoSerializer").set("spark.kryoserializer.buff
er.max", "2047m");

sparkConf.set("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps").set("spark.sql.autoBroadcastJoinThreshold",
"104857600");

 

This is running on an AWS c3.8xlarge instance. I am not sure what parameters I
should set given the OutOfMemoryError exception below.

 

#

# java.lang.OutOfMemoryError: Java heap space

# -XX:OnOutOfMemoryError="kill -9 %p"

#   Executing /bin/sh -c "kill -9 10181"...

Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java
heap space

at
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProje
ction.apply(Unknown Source)

at
org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelat
ion.scala:380)

at
org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.sc
ala:123)

at
org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadca
stFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:95)

at
org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadca
stFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:85)

at
org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.sc
ala:100)

at
org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadca
stFuture$1.apply(BroadcastHashOuterJoin.scala:85)

at
org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadca
stFuture$1.apply(BroadcastHashOuterJoin.scala:85)

at
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.
scala:24)

at
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
45)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
15)

at java.lang.Thread.run(Thread.java:745)

 

Any hint will be very helpful.

 

Regards,

 

Shuai



Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space

2015-06-17 Thread Sanjay Subramanian
ok solved. Looks like breathing the spark-summit SFO air for 3 days helped a lot!
Piping the 7 million records to local disk still runs out of memory. So I piped
the results into another Hive table instead. I can live with that :-)

/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e use aers; create table
unique_aers_demo as select distinct isr,event_dt,age,age_cod,sex,year,quarter
from aers.aers_demo_view  --driver-memory 4G --total-executor-cores 12
--executor-memory 4G

thanks
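
For reference, a sketch of the same CTAS with the resource flags passed before -e and
the SQL quoted as a single shell argument; whether flag placement after -e is why the
earlier options appeared to have no effect is an assumption on my part, not something
confirmed in this thread.

  /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql \
    --driver-memory 4G --total-executor-cores 12 --executor-memory 4G \
    -e "use aers; create table unique_aers_demo as
        select distinct isr,event_dt,age,age_cod,sex,year,quarter
        from aers.aers_demo_view"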

  From: Sanjay Subramanian sanjaysubraman...@yahoo.com.INVALID
 To: user@spark.apache.org user@spark.apache.org 
 Sent: Thursday, June 11, 2015 8:43 AM
 Subject: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java 
heap space
   
hey guys
Using Hive and Impala daily, intensively. Want to transition to spark-sql in
CLI mode.
Currently in my sandbox I am using the Spark (standalone mode) in the CDH
distribution (starving developer version 5.3.3)
3 datanode hadoop cluster, 32GB RAM per node, 8 cores per node

| spark | 1.2.0+cdh5.3.3+371 |

I am testing some stuff on one view and getting memory errors. Possible reason
is that the default memory per executor showing on 18080 is 512M.

These options, when used to start the spark-sql CLI, do not seem to have any
effect: --total-executor-cores 12 --executor-memory 4G



/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e  select distinct 
isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view

aers.aers_demo_view (7 million+ records)
===
isr       bigint   case id
event_dt  bigint   Event date
age       double   age of patient
age_cod   string   days, months, years
sex       string   M or F
year      int
quarter   int

VIEW DEFINITION
CREATE VIEW `aers.aers_demo_view` AS
SELECT `isr` AS `isr`, `event_dt` AS `event_dt`, `age` AS `age`, `age_cod` AS `age_cod`,
       `gndr_cod` AS `sex`, `year` AS `year`, `quarter` AS `quarter`
FROM (
  SELECT `aers_demo_v1`.`isr`, `aers_demo_v1`.`event_dt`, `aers_demo_v1`.`age`,
         `aers_demo_v1`.`age_cod`, `aers_demo_v1`.`gndr_cod`, `aers_demo_v1`.`year`,
         `aers_demo_v1`.`quarter`
  FROM `aers`.`aers_demo_v1`
  UNION ALL
  SELECT `aers_demo_v2`.`isr`, `aers_demo_v2`.`event_dt`, `aers_demo_v2`.`age`,
         `aers_demo_v2`.`age_cod`, `aers_demo_v2`.`gndr_cod`, `aers_demo_v2`.`year`,
         `aers_demo_v2`.`quarter`
  FROM `aers`.`aers_demo_v2`
  UNION ALL
  SELECT `aers_demo_v3`.`isr`, `aers_demo_v3`.`event_dt`, `aers_demo_v3`.`age`,
         `aers_demo_v3`.`age_cod`, `aers_demo_v3`.`gndr_cod`, `aers_demo_v3`.`year`,
         `aers_demo_v3`.`quarter`
  FROM `aers`.`aers_demo_v3`
  UNION ALL
  SELECT `aers_demo_v4`.`isr`, `aers_demo_v4`.`event_dt`, `aers_demo_v4`.`age`,
         `aers_demo_v4`.`age_cod`, `aers_demo_v4`.`gndr_cod`, `aers_demo_v4`.`year`,
         `aers_demo_v4`.`quarter`
  FROM `aers`.`aers_demo_v4`
  UNION ALL
  SELECT `aers_demo_v5`.`primaryid` AS `ISR`, `aers_demo_v5`.`event_dt`, `aers_demo_v5`.`age`,
         `aers_demo_v5`.`age_cod`, `aers_demo_v5`.`gndr_cod`, `aers_demo_v5`.`year`,
         `aers_demo_v5`.`quarter`
  FROM `aers`.`aers_demo_v5`
  UNION ALL
  SELECT `aers_demo_v6`.`primaryid` AS `ISR`, `aers_demo_v6`.`event_dt`, `aers_demo_v6`.`age`,
         `aers_demo_v6`.`age_cod`, `aers_demo_v6`.`sex` AS `GNDR_COD`, `aers_demo_v6`.`year`,
         `aers_demo_v6`.`quarter`
  FROM `aers`.`aers_demo_v6`
) `aers_demo_view`






15/06/11 08:36:36 WARN DefaultChannelPipeline: An exception was thrown by a
user handler while handling an exception event ([id: 0x01b99855,
/10.0.0.19:58117 => /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError:
Java heap space)
java.lang.OutOfMemoryError: Java heap space
        at org.jboss.netty.buffer.HeapChannelBuffer.<init>(HeapChannelBuffer.java:42)
        at org.jboss.netty.buffer.BigEndianHeapChannelBuffer.<init>(BigEndianHeapChannelBuffer.java:34)
        at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
        at org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
        at org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChannelBufferFactory.java:48)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.newCumulationBuffer(FrameDecoder.java:507)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.updateCumulation(FrameDecoder.java:345)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:312)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90

Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space

2015-06-16 Thread Sanjay Subramanian
Hi Josh

It was great meeting u in person at the spark-summit SFO yesterday.
Thanks for discussing potential solutions to the problem.
I verified that 2 hive gateway nodes had not been configured correctly. My bad.
I added hive-site.xml to the spark Conf directories for these 2 additional hive 
gateway nodes. 

Plus I increased the driver-memory parameter to 1gb. That solved the memory 
issue. 

So good news is I can get spark-SQL running in standalone mode (on a CDH 5.3.3 
with spark 1.2 on YARN)

Not so good news is that the following params have no effect

--master yarn   --deployment-mode client

So the spark-SQL query runs with only ONE executor :-(

I am planning on bugging u for 5-10 minutes at the Spark office hours :-) and 
hopefully we can solve this. 

Thanks 
Best regards 
Sanjay 

Sent from my iPhone
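
On YARN the executor count is controlled by --num-executors rather than
--total-executor-cores (which applies to the standalone and Mesos masters), and the
flag is spelled --deploy-mode. A minimal sketch with illustrative numbers for a
3-node, 8-core, 32GB cluster; these values are assumptions, not settings verified in
this thread.

  /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql \
    --master yarn --deploy-mode client \
    --num-executors 6 --executor-cores 4 --executor-memory 4G --driver-memory 2G \
    -e "select distinct isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view"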

 On Jun 13, 2015, at 5:38 PM, Josh Rosen rosenvi...@gmail.com wrote:
 
 Try using Spark 1.4.0 with SQL code generation turned on; this should make a 
 huge difference.
 
 On Sat, Jun 13, 2015 at 5:08 PM, Sanjay Subramanian 
 sanjaysubraman...@yahoo.com wrote:
 hey guys
 
 I tried the following settings as well. No luck
 
 --total-executor-cores 24 --executor-memory 4G
 
 
 BTW on the same cluster , impala absolutely kills it. same query 9 seconds. 
 no memory issues. no issues.
 
 In fact I am pretty disappointed with Spark-SQL.
 I have worked with Hive during the 0.9.x stages and taken projects to 
 production successfully and Hive actually very rarely craps out.
 
 Whether the spark folks like what I say or not, yes my expectations are 
 pretty high of Spark-SQL if I were to change the ways we are doing things at 
 my workplace.
 Until that time, we are going to be hugely dependent on Impala and  
 Hive(with SSD speeding up the shuffle stage , even MR jobs are not that slow 
 now).
 
 I want to clarify for those of u who may be asking - why I am not using 
 spark with Scala and insisting on using spark-sql ?
 
 - I have already pipelined data from enterprise tables to Hive
 - I am using CDH 5.3.3 (Cloudera starving developers version)
 - I have close to 300 tables defined in Hive external tables.
 - Data if on HDFS
 - On an average we have 150 columns per table
 - One an everyday basis , we do crazy amounts of ad-hoc joining of new and 
 old tables in getting datasets ready for supervised ML
 - I thought that quite simply I can point Spark to the Hive meta and do 
 queries as I do - in fact the existing queries would work as is unless I am 
 using some esoteric Hive/Impala function
 
 Anyway, if there are some settings I can use and get spark-sql to run even 
 on standalone mode that will be huge help.
 
 On the pre-production cluster I have spark on YARN but could never get it to 
 run fairly complex queries and I have no answers from this group of the CDH 
 groups.
 
 So my assumption is that its possibly not solved , else I have always got 
 very quick answers and responses :-) to my questions on all CDH groups, 
 Spark, Hive
 
 best regards
 
 sanjay
 
  
 
 From: Josh Rosen rosenvi...@gmail.com
 To: Sanjay Subramanian sanjaysubraman...@yahoo.com 
 Cc: user@spark.apache.org user@spark.apache.org 
 Sent: Friday, June 12, 2015 7:15 AM
 Subject: Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: 
 Java heap space
 
 It sounds like this might be caused by a memory configuration problem.  In 
 addition to looking at the executor memory, I'd also bump up the driver 
 memory, since it appears that your shell is running out of memory when 
 collecting a large query result.
 
 Sent from my phone
 
 
 
 On Jun 11, 2015, at 8:43 AM, Sanjay Subramanian 
 sanjaysubraman...@yahoo.com.INVALID wrote:
 
 hey guys
 
 Using Hive and Impala daily intensively.
 Want to transition to spark-sql in CLI mode
 
 Currently in my sandbox I am using the Spark (standalone mode) in the CDH 
 distribution (starving developer version 5.3.3)
 3 datanode hadoop cluster
 32GB RAM per node
 8 cores per node
 
 spark   
 1.2.0+cdh5.3.3+371
 
 
 I am testing some stuff on one view and getting memory errors
 Possibly reason is default memory per executor showing on 18080 is 
 512M
 
 These options when used to start the spark-sql CLI does not seem to have 
 any effect 
 --total-executor-cores 12 --executor-memory 4G
 
 
 
 /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e  select distinct 
 isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view
 
 aers.aers_demo_view (7 million+ records)
 ===
 isr bigint  case id
 event_dtbigint  Event date
 age double  age of patient
 age_cod string  days,months years
 sex string  M or F
 yearint
 quarter int
 
 
 VIEW DEFINITION
 
 CREATE VIEW `aers.aers_demo_view` AS SELECT `isr` AS `isr`, `event_dt` AS 
 `event_dt`, `age` AS `age`, `age_cod` AS `age_cod`, `gndr_cod` AS `sex`, 
 `year` AS `year`, `quarter` AS `quarter` FROM (SELECT
`aers_demo_v1`.`isr`,
`aers_demo_v1`.`event_dt`,
`aers_demo_v1

Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space

2015-06-13 Thread Sanjay Subramanian
hey guys
I tried the following settings as well. No luck
--total-executor-cores 24 --executor-memory 4G

BTW on the same cluster , impala absolutely kills it. same query 9 seconds. no 
memory issues. no issues.
In fact I am pretty disappointed with Spark-SQL. I have worked with Hive during
the 0.9.x stages and taken projects to production successfully, and Hive
actually very rarely craps out.
Whether the spark folks like what I say or not, yes, my expectations are pretty
high of Spark-SQL if I were to change the ways we are doing things at my
workplace. Until that time, we are going to be hugely dependent on Impala and
Hive (with SSD speeding up the shuffle stage, even MR jobs are not that slow
now).
I want to clarify for those of u who may be asking - why I am not using spark
with Scala and insisting on using spark-sql ?
- I have already pipelined data from enterprise tables to Hive
- I am using CDH 5.3.3 (Cloudera starving developers version)
- I have close to 300 tables defined in Hive external tables.
- Data is on HDFS
- On an average we have 150 columns per table
- On an everyday basis, we do crazy amounts of ad-hoc joining of new and old
  tables in getting datasets ready for supervised ML
- I thought that quite simply I can point Spark to the Hive meta and do queries
  as I do - in fact the existing queries would work as is unless I am using some
  esoteric Hive/Impala function
Anyway, if there are some settings I can use and get spark-sql to run even on 
standalone mode that will be huge help.
On the pre-production cluster I have spark on YARN but could never get it to 
run fairly complex queries and I have no answers from this group of the CDH 
groups.
So my assumption is that its possibly not solved , else I have always got very 
quick answers and responses :-) to my questions on all CDH groups, Spark, Hive
best regards
sanjay
 
  From: Josh Rosen rosenvi...@gmail.com
 To: Sanjay Subramanian sanjaysubraman...@yahoo.com 
Cc: user@spark.apache.org user@spark.apache.org 
 Sent: Friday, June 12, 2015 7:15 AM
 Subject: Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: 
Java heap space
   
It sounds like this might be caused by a memory configuration problem.  In 
addition to looking at the executor memory, I'd also bump up the driver memory, 
since it appears that your shell is running out of memory when collecting a 
large query result.

Sent from my phone


On Jun 11, 2015, at 8:43 AM, Sanjay Subramanian 
sanjaysubraman...@yahoo.com.INVALID wrote:


hey guys
Using Hive and Impala daily, intensively. Want to transition to spark-sql in
CLI mode.
Currently in my sandbox I am using the Spark (standalone mode) in the CDH
distribution (starving developer version 5.3.3)
3 datanode hadoop cluster, 32GB RAM per node, 8 cores per node

| spark | 1.2.0+cdh5.3.3+371 |

I am testing some stuff on one view and getting memory errors. Possible reason
is that the default memory per executor showing on 18080 is 512M.

These options, when used to start the spark-sql CLI, do not seem to have any
effect: --total-executor-cores 12 --executor-memory 4G



/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e  select distinct 
isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view

aers.aers_demo_view (7 million+ records)
===
isr       bigint   case id
event_dt  bigint   Event date
age       double   age of patient
age_cod   string   days, months, years
sex       string   M or F
year      int
quarter   int

VIEW DEFINITION
CREATE VIEW `aers.aers_demo_view` AS
SELECT `isr` AS `isr`, `event_dt` AS `event_dt`, `age` AS `age`, `age_cod` AS `age_cod`,
       `gndr_cod` AS `sex`, `year` AS `year`, `quarter` AS `quarter`
FROM (
  SELECT `aers_demo_v1`.`isr`, `aers_demo_v1`.`event_dt`, `aers_demo_v1`.`age`,
         `aers_demo_v1`.`age_cod`, `aers_demo_v1`.`gndr_cod`, `aers_demo_v1`.`year`,
         `aers_demo_v1`.`quarter`
  FROM `aers`.`aers_demo_v1`
  UNION ALL
  SELECT `aers_demo_v2`.`isr`, `aers_demo_v2`.`event_dt`, `aers_demo_v2`.`age`,
         `aers_demo_v2`.`age_cod`, `aers_demo_v2`.`gndr_cod`, `aers_demo_v2`.`year`,
         `aers_demo_v2`.`quarter`
  FROM `aers`.`aers_demo_v2`
  UNION ALL
  SELECT `aers_demo_v3`.`isr`, `aers_demo_v3`.`event_dt`, `aers_demo_v3`.`age`,
         `aers_demo_v3`.`age_cod`, `aers_demo_v3`.`gndr_cod`, `aers_demo_v3`.`year`,
         `aers_demo_v3`.`quarter`
  FROM `aers`.`aers_demo_v3`
  UNION ALL
  SELECT `aers_demo_v4`.`isr`, `aers_demo_v4`.`event_dt`, `aers_demo_v4`.`age`,
         `aers_demo_v4`.`age_cod`, `aers_demo_v4`.`gndr_cod`, `aers_demo_v4`.`year`,
         `aers_demo_v4`.`quarter`
  FROM `aers`.`aers_demo_v4`
  UNION ALL
  SELECT `aers_demo_v5`.`primaryid` AS `ISR`, `aers_demo_v5`.`event_dt`, `aers_demo_v5`.`age`,
         `aers_demo_v5`.`age_cod`, `aers_demo_v5`.`gndr_cod`, `aers_demo_v5`.`year`,
         `aers_demo_v5`.`quarter`
  FROM `aers`.`aers_demo_v5`
  UNION ALL
  SELECT `aers_demo_v6`.`primaryid` AS `ISR`, `aers_demo_v6`.`event_dt`,
         `aers_demo_v6`.`age`, `aers_demo_v6`.`age_cod

Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space

2015-06-13 Thread Josh Rosen
Try using Spark 1.4.0 with SQL code generation turned on; this should make
a huge difference.
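
In the 1.x releases this was gated by a SQL option that can be set from the spark-sql
CLI before running the query; a minimal sketch (whether it is already on by default in
a given release is not confirmed here, so treat the SET as an assumption):

  SET spark.sql.codegen=true;
  select distinct isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view;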

On Sat, Jun 13, 2015 at 5:08 PM, Sanjay Subramanian 
sanjaysubraman...@yahoo.com wrote:

 hey guys

 I tried the following settings as well. No luck

 --total-executor-cores 24 --executor-memory 4G


 BTW on the same cluster , impala absolutely kills it. same query 9
 seconds. no memory issues. no issues.

 In fact I am pretty disappointed with Spark-SQL.
 I have worked with Hive during the 0.9.x stages and taken projects to
 production successfully and Hive actually very rarely craps out.

 Whether the spark folks like what I say or not, yes my expectations are
 pretty high of Spark-SQL if I were to change the ways we are doing things
 at my workplace.
 Until that time, we are going to be hugely dependent on Impala and
  Hive(with SSD speeding up the shuffle stage , even MR jobs are not that
 slow now).

 I want to clarify for those of u who may be asking - why I am not using
 spark with Scala and insisting on using spark-sql ?

 - I have already pipelined data from enterprise tables to Hive
 - I am using CDH 5.3.3 (Cloudera starving developers version)
 - I have close to 300 tables defined in Hive external tables.
 - Data if on HDFS
 - On an average we have 150 columns per table
 - One an everyday basis , we do crazy amounts of ad-hoc joining of new and
 old tables in getting datasets ready for supervised ML
 - I thought that quite simply I can point Spark to the Hive meta and do
 queries as I do - in fact the existing queries would work as is unless I am
 using some esoteric Hive/Impala function

 Anyway, if there are some settings I can use and get spark-sql to run even
 on standalone mode that will be huge help.

 On the pre-production cluster I have spark on YARN but could never get it
 to run fairly complex queries and I have no answers from this group of the
 CDH groups.

 So my assumption is that its possibly not solved , else I have always got
 very quick answers and responses :-) to my questions on all CDH groups,
 Spark, Hive

 best regards

 sanjay



   --
  *From:* Josh Rosen rosenvi...@gmail.com
 *To:* Sanjay Subramanian sanjaysubraman...@yahoo.com
 *Cc:* user@spark.apache.org user@spark.apache.org
 *Sent:* Friday, June 12, 2015 7:15 AM
 *Subject:* Re: spark-sql from CLI ---EXCEPTION:
 java.lang.OutOfMemoryError: Java heap space

 It sounds like this might be caused by a memory configuration problem.  In
 addition to looking at the executor memory, I'd also bump up the driver
 memory, since it appears that your shell is running out of memory when
 collecting a large query result.

 Sent from my phone



 On Jun 11, 2015, at 8:43 AM, Sanjay Subramanian 
 sanjaysubraman...@yahoo.com.INVALID wrote:

 hey guys

 Using Hive and Impala daily intensively.
 Want to transition to spark-sql in CLI mode

 Currently in my sandbox I am using the Spark (standalone mode) in the CDH
 distribution (starving developer version 5.3.3)
 3 datanode hadoop cluster
 32GB RAM per node
 8 cores per node

 spark
 1.2.0+cdh5.3.3+371


 I am testing some stuff on one view and getting memory errors
 Possibly reason is default memory per executor showing on 18080 is
 512M

 These options when used to start the spark-sql CLI does not seem to have
 any effect
 --total-executor-cores 12 --executor-memory 4G



 /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e  select distinct
 isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view

 aers.aers_demo_view (7 million+ records)
 ===
 isr bigint  case id
 event_dtbigint  Event date
 age double  age of patient
 age_cod string  days,months years
 sex string  M or F
 yearint
 quarter int


 VIEW DEFINITION
 
 CREATE VIEW `aers.aers_demo_view` AS SELECT `isr` AS `isr`, `event_dt` AS
 `event_dt`, `age` AS `age`, `age_cod` AS `age_cod`, `gndr_cod` AS `sex`,
 `year` AS `year`, `quarter` AS `quarter` FROM (SELECT
`aers_demo_v1`.`isr`,
`aers_demo_v1`.`event_dt`,
`aers_demo_v1`.`age`,
`aers_demo_v1`.`age_cod`,
`aers_demo_v1`.`gndr_cod`,
`aers_demo_v1`.`year`,
`aers_demo_v1`.`quarter`
 FROM
   `aers`.`aers_demo_v1`
 UNION ALL
 SELECT
`aers_demo_v2`.`isr`,
`aers_demo_v2`.`event_dt`,
`aers_demo_v2`.`age`,
`aers_demo_v2`.`age_cod`,
`aers_demo_v2`.`gndr_cod`,
`aers_demo_v2`.`year`,
`aers_demo_v2`.`quarter`
 FROM
   `aers`.`aers_demo_v2`
 UNION ALL
 SELECT
`aers_demo_v3`.`isr`,
`aers_demo_v3`.`event_dt`,
`aers_demo_v3`.`age`,
`aers_demo_v3`.`age_cod`,
`aers_demo_v3`.`gndr_cod`,
`aers_demo_v3`.`year`,
`aers_demo_v3`.`quarter`
 FROM
   `aers`.`aers_demo_v3`
 UNION ALL
 SELECT
`aers_demo_v4`.`isr`,
`aers_demo_v4`.`event_dt`,
`aers_demo_v4`.`age`,
`aers_demo_v4`.`age_cod`,
`aers_demo_v4`.`gndr_cod`,
`aers_demo_v4`.`year`,
`aers_demo_v4`.`quarter`
 FROM
   `aers`.`aers_demo_v4`
 UNION ALL
 SELECT

Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space

2015-06-12 Thread Josh Rosen


Sent from my phone

 On Jun 11, 2015, at 8:43 AM, Sanjay Subramanian 
 sanjaysubraman...@yahoo.com.INVALID wrote:
 
 hey guys
 
 Using Hive and Impala daily intensively.
 Want to transition to spark-sql in CLI mode
 
 Currently in my sandbox I am using the Spark (standalone mode) in the CDH 
 distribution (starving developer version 5.3.3)
 3 datanode hadoop cluster
 32GB RAM per node
 8 cores per node
 
 spark 
 1.2.0+cdh5.3.3+371
 
 
 I am testing some stuff on one view and getting memory errors
 Possibly reason is default memory per executor showing on 18080 is 
 512M
 
 These options when used to start the spark-sql CLI does not seem to have any 
 effect 
 --total-executor-cores 12 --executor-memory 4G
 
 
 
 /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e  select distinct 
 isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view
 
 aers.aers_demo_view (7 million+ records)
 ===
 isr bigint  case id
 event_dtbigint  Event date
 age double  age of patient
 age_cod string  days,months years
 sex string  M or F
 yearint
 quarter int
 
 
 VIEW DEFINITION
 
 CREATE VIEW `aers.aers_demo_view` AS SELECT `isr` AS `isr`, `event_dt` AS 
 `event_dt`, `age` AS `age`, `age_cod` AS `age_cod`, `gndr_cod` AS `sex`, 
 `year` AS `year`, `quarter` AS `quarter` FROM (SELECT
`aers_demo_v1`.`isr`,
`aers_demo_v1`.`event_dt`,
`aers_demo_v1`.`age`,
`aers_demo_v1`.`age_cod`,
`aers_demo_v1`.`gndr_cod`,
`aers_demo_v1`.`year`,
`aers_demo_v1`.`quarter`
 FROM
   `aers`.`aers_demo_v1`
 UNION ALL
 SELECT
`aers_demo_v2`.`isr`,
`aers_demo_v2`.`event_dt`,
`aers_demo_v2`.`age`,
`aers_demo_v2`.`age_cod`,
`aers_demo_v2`.`gndr_cod`,
`aers_demo_v2`.`year`,
`aers_demo_v2`.`quarter`
 FROM
   `aers`.`aers_demo_v2`
 UNION ALL
 SELECT
`aers_demo_v3`.`isr`,
`aers_demo_v3`.`event_dt`,
`aers_demo_v3`.`age`,
`aers_demo_v3`.`age_cod`,
`aers_demo_v3`.`gndr_cod`,
`aers_demo_v3`.`year`,
`aers_demo_v3`.`quarter`
 FROM
   `aers`.`aers_demo_v3`
 UNION ALL
 SELECT
`aers_demo_v4`.`isr`,
`aers_demo_v4`.`event_dt`,
`aers_demo_v4`.`age`,
`aers_demo_v4`.`age_cod`,
`aers_demo_v4`.`gndr_cod`,
`aers_demo_v4`.`year`,
`aers_demo_v4`.`quarter`
 FROM
   `aers`.`aers_demo_v4`
 UNION ALL
 SELECT
`aers_demo_v5`.`primaryid` AS `ISR`,
`aers_demo_v5`.`event_dt`,
`aers_demo_v5`.`age`,
`aers_demo_v5`.`age_cod`,
`aers_demo_v5`.`gndr_cod`,
`aers_demo_v5`.`year`,
`aers_demo_v5`.`quarter`
 FROM
   `aers`.`aers_demo_v5`
 UNION ALL
 SELECT
`aers_demo_v6`.`primaryid` AS `ISR`,
`aers_demo_v6`.`event_dt`,
`aers_demo_v6`.`age`,
`aers_demo_v6`.`age_cod`,
`aers_demo_v6`.`sex` AS `GNDR_COD`,
`aers_demo_v6`.`year`,
`aers_demo_v6`.`quarter`
 FROM
   `aers`.`aers_demo_v6`) `aers_demo_view`
 
 
 
 
 
 
 
 15/06/11 08:36:36 WARN DefaultChannelPipeline: An exception was thrown by a 
 user handler while handling an exception event ([id: 0x01b99855, 
 /10.0.0.19:58117 = /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError: 
 Java heap space)
 java.lang.OutOfMemoryError: Java heap space
 at 
 org.jboss.netty.buffer.HeapChannelBuffer.init(HeapChannelBuffer.java:42)
 at 
 org.jboss.netty.buffer.BigEndianHeapChannelBuffer.init(BigEndianHeapChannelBuffer.java:34)
 at 
 org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
 at 
 org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
 at 
 org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChannelBufferFactory.java:48)
 at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.newCumulationBuffer(FrameDecoder.java:507)
 at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.updateCumulation(FrameDecoder.java:345)
 at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:312)
 at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
 at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
 at 
 org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
 at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
 at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
 at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
 at 
 org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 15/06/11 08:36:40 ERROR Utils: Uncaught exception in thread 
 task-result-getter-0
 java.lang.OutOfMemoryError: GC

Re: spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space

2015-06-12 Thread Josh Rosen
It sounds like this might be caused by a memory configuration problem.  In 
addition to looking at the executor memory, I'd also bump up the driver memory, 
since it appears that your shell is running out of memory when collecting a 
large query result.

Sent from my phone

 On Jun 11, 2015, at 8:43 AM, Sanjay Subramanian 
 sanjaysubraman...@yahoo.com.INVALID wrote:
 
 hey guys
 
 Using Hive and Impala daily intensively.
 Want to transition to spark-sql in CLI mode
 
 Currently in my sandbox I am using the Spark (standalone mode) in the CDH 
 distribution (starving developer version 5.3.3)
 3 datanode hadoop cluster
 32GB RAM per node
 8 cores per node
 
 spark 
 1.2.0+cdh5.3.3+371
 
 
 I am testing some stuff on one view and getting memory errors
 Possibly reason is default memory per executor showing on 18080 is 
 512M
 
 These options when used to start the spark-sql CLI does not seem to have any 
 effect 
 --total-executor-cores 12 --executor-memory 4G
 
 
 
 /opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e  select distinct 
 isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view
 
 aers.aers_demo_view (7 million+ records)
 ===
 isr bigint  case id
 event_dtbigint  Event date
 age double  age of patient
 age_cod string  days,months years
 sex string  M or F
 yearint
 quarter int
 
 
 VIEW DEFINITION
 
 CREATE VIEW `aers.aers_demo_view` AS SELECT `isr` AS `isr`, `event_dt` AS 
 `event_dt`, `age` AS `age`, `age_cod` AS `age_cod`, `gndr_cod` AS `sex`, 
 `year` AS `year`, `quarter` AS `quarter` FROM (SELECT
`aers_demo_v1`.`isr`,
`aers_demo_v1`.`event_dt`,
`aers_demo_v1`.`age`,
`aers_demo_v1`.`age_cod`,
`aers_demo_v1`.`gndr_cod`,
`aers_demo_v1`.`year`,
`aers_demo_v1`.`quarter`
 FROM
   `aers`.`aers_demo_v1`
 UNION ALL
 SELECT
`aers_demo_v2`.`isr`,
`aers_demo_v2`.`event_dt`,
`aers_demo_v2`.`age`,
`aers_demo_v2`.`age_cod`,
`aers_demo_v2`.`gndr_cod`,
`aers_demo_v2`.`year`,
`aers_demo_v2`.`quarter`
 FROM
   `aers`.`aers_demo_v2`
 UNION ALL
 SELECT
`aers_demo_v3`.`isr`,
`aers_demo_v3`.`event_dt`,
`aers_demo_v3`.`age`,
`aers_demo_v3`.`age_cod`,
`aers_demo_v3`.`gndr_cod`,
`aers_demo_v3`.`year`,
`aers_demo_v3`.`quarter`
 FROM
   `aers`.`aers_demo_v3`
 UNION ALL
 SELECT
`aers_demo_v4`.`isr`,
`aers_demo_v4`.`event_dt`,
`aers_demo_v4`.`age`,
`aers_demo_v4`.`age_cod`,
`aers_demo_v4`.`gndr_cod`,
`aers_demo_v4`.`year`,
`aers_demo_v4`.`quarter`
 FROM
   `aers`.`aers_demo_v4`
 UNION ALL
 SELECT
`aers_demo_v5`.`primaryid` AS `ISR`,
`aers_demo_v5`.`event_dt`,
`aers_demo_v5`.`age`,
`aers_demo_v5`.`age_cod`,
`aers_demo_v5`.`gndr_cod`,
`aers_demo_v5`.`year`,
`aers_demo_v5`.`quarter`
 FROM
   `aers`.`aers_demo_v5`
 UNION ALL
 SELECT
`aers_demo_v6`.`primaryid` AS `ISR`,
`aers_demo_v6`.`event_dt`,
`aers_demo_v6`.`age`,
`aers_demo_v6`.`age_cod`,
`aers_demo_v6`.`sex` AS `GNDR_COD`,
`aers_demo_v6`.`year`,
`aers_demo_v6`.`quarter`
 FROM
   `aers`.`aers_demo_v6`) `aers_demo_view`
 
 
 
 
 
 
 
 15/06/11 08:36:36 WARN DefaultChannelPipeline: An exception was thrown by a 
 user handler while handling an exception event ([id: 0x01b99855, 
 /10.0.0.19:58117 = /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError: 
 Java heap space)
 java.lang.OutOfMemoryError: Java heap space
 at 
 org.jboss.netty.buffer.HeapChannelBuffer.init(HeapChannelBuffer.java:42)
 at 
 org.jboss.netty.buffer.BigEndianHeapChannelBuffer.init(BigEndianHeapChannelBuffer.java:34)
 at 
 org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
 at 
 org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
 at 
 org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChannelBufferFactory.java:48)
 at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.newCumulationBuffer(FrameDecoder.java:507)
 at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.updateCumulation(FrameDecoder.java:345)
 at 
 org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:312)
 at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
 at 
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
 at 
 org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
 at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
 at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
 at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
 at 
 org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145

spark-sql from CLI ---EXCEPTION: java.lang.OutOfMemoryError: Java heap space

2015-06-11 Thread Sanjay Subramanian
hey guys
Using Hive and Impala daily, intensively. Want to transition to spark-sql in
CLI mode.
Currently in my sandbox I am using the Spark (standalone mode) in the CDH
distribution (starving developer version 5.3.3)
3 datanode hadoop cluster, 32GB RAM per node, 8 cores per node

| spark | 1.2.0+cdh5.3.3+371 |

I am testing some stuff on one view and getting memory errors. Possible reason
is that the default memory per executor showing on 18080 is 512M.

These options, when used to start the spark-sql CLI, do not seem to have any
effect: --total-executor-cores 12 --executor-memory 4G



/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql -e  select distinct 
isr,event_dt,age,age_cod,sex,year,quarter from aers.aers_demo_view

aers.aers_demo_view (7 million+ records)
===
isr       bigint   case id
event_dt  bigint   Event date
age       double   age of patient
age_cod   string   days, months, years
sex       string   M or F
year      int
quarter   int

VIEW DEFINITION
CREATE VIEW `aers.aers_demo_view` AS
SELECT `isr` AS `isr`, `event_dt` AS `event_dt`, `age` AS `age`, `age_cod` AS `age_cod`,
       `gndr_cod` AS `sex`, `year` AS `year`, `quarter` AS `quarter`
FROM (
  SELECT `aers_demo_v1`.`isr`, `aers_demo_v1`.`event_dt`, `aers_demo_v1`.`age`,
         `aers_demo_v1`.`age_cod`, `aers_demo_v1`.`gndr_cod`, `aers_demo_v1`.`year`,
         `aers_demo_v1`.`quarter`
  FROM `aers`.`aers_demo_v1`
  UNION ALL
  SELECT `aers_demo_v2`.`isr`, `aers_demo_v2`.`event_dt`, `aers_demo_v2`.`age`,
         `aers_demo_v2`.`age_cod`, `aers_demo_v2`.`gndr_cod`, `aers_demo_v2`.`year`,
         `aers_demo_v2`.`quarter`
  FROM `aers`.`aers_demo_v2`
  UNION ALL
  SELECT `aers_demo_v3`.`isr`, `aers_demo_v3`.`event_dt`, `aers_demo_v3`.`age`,
         `aers_demo_v3`.`age_cod`, `aers_demo_v3`.`gndr_cod`, `aers_demo_v3`.`year`,
         `aers_demo_v3`.`quarter`
  FROM `aers`.`aers_demo_v3`
  UNION ALL
  SELECT `aers_demo_v4`.`isr`, `aers_demo_v4`.`event_dt`, `aers_demo_v4`.`age`,
         `aers_demo_v4`.`age_cod`, `aers_demo_v4`.`gndr_cod`, `aers_demo_v4`.`year`,
         `aers_demo_v4`.`quarter`
  FROM `aers`.`aers_demo_v4`
  UNION ALL
  SELECT `aers_demo_v5`.`primaryid` AS `ISR`, `aers_demo_v5`.`event_dt`, `aers_demo_v5`.`age`,
         `aers_demo_v5`.`age_cod`, `aers_demo_v5`.`gndr_cod`, `aers_demo_v5`.`year`,
         `aers_demo_v5`.`quarter`
  FROM `aers`.`aers_demo_v5`
  UNION ALL
  SELECT `aers_demo_v6`.`primaryid` AS `ISR`, `aers_demo_v6`.`event_dt`, `aers_demo_v6`.`age`,
         `aers_demo_v6`.`age_cod`, `aers_demo_v6`.`sex` AS `GNDR_COD`, `aers_demo_v6`.`year`,
         `aers_demo_v6`.`quarter`
  FROM `aers`.`aers_demo_v6`
) `aers_demo_view`






15/06/11 08:36:36 WARN DefaultChannelPipeline: An exception was thrown by a
user handler while handling an exception event ([id: 0x01b99855,
/10.0.0.19:58117 => /10.0.0.19:52016] EXCEPTION: java.lang.OutOfMemoryError:
Java heap space)
java.lang.OutOfMemoryError: Java heap space
        at org.jboss.netty.buffer.HeapChannelBuffer.<init>(HeapChannelBuffer.java:42)
        at org.jboss.netty.buffer.BigEndianHeapChannelBuffer.<init>(BigEndianHeapChannelBuffer.java:34)
        at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
        at org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
        at org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChannelBufferFactory.java:48)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.newCumulationBuffer(FrameDecoder.java:507)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.updateCumulation(FrameDecoder.java:345)
        at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:312)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
15/06/11 08:36:40 ERROR Utils: Uncaught exception in thread task-result-getter-0
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.lang.Long.valueOf(Long.java:577)
        at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:113)
        at com.esotericsoftware.kryo.serializers.DefaultSerializers$LongSerializer.read(DefaultSerializers.java:103

MLLib SVMWithSGD : java.lang.OutOfMemoryError: Java heap space

2015-04-16 Thread sarath
Hi,

I'm trying to train an SVM on the KDD2010 dataset (available from libsvm), but
I'm getting a java.lang.OutOfMemoryError: Java heap space error. The dataset
is really sparse and has around 8 million data points and 20 million
features. I'm using a cluster of 8 nodes (each with 8 cores and 64G RAM).

I have used both Spark's SVMWithSGD and Liblinear's Spark implementation and
I'm getting java.lang.OutOfMemoryError: Java heap space error for both.

I have used following settings:
executor-memory - 60G
num-executors - 64
And other default settings

Also I tried increasing the number of partitions. And tried with reduced
dataset of half million data points. But I'm still getting the same error.

Here is the stack trace for Spark's SVMWithSGD:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at
org.apache.spark.mllib.optimization.GradientDescent$.runMiniBatchSGD(GradientDescent.scala:182)
at
org.apache.spark.mllib.optimization.GradientDescent.optimize(GradientDescent.scala:107)
at
org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:263)
at
org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:190)
at 
org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:201)
at 
org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:235)
at org.apache.spark.mllib.classification.SVMWithSGD.train(SVM.scala)
at org.linearsvm.SVMClassifier.main(SVMClassifier.java:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)




And the stack trace for Liblinear's Spark implementation :

java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:329)
at
org.apache.spark.network.BlockTransferService$$anon$1.onBlockFetchSuccess(BlockTransferService.scala:95)
at
org.apache.spark.network.shuffle.RetryingBlockFetcher$RetryingBlockFetchListener.onBlockFetchSuccess(RetryingBlockFetcher.java:206)
at
org.apache.spark.network.shuffle.OneForOneBlockFetcher$ChunkCallback.onSuccess(OneForOneBlockFetcher.java:72)
at
org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:124)
at
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:93)
at
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44)
at
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354

Re: MLLib SVMWithSGD : java.lang.OutOfMemoryError: Java heap space

2015-04-16 Thread Akhil Das
Try increasing your driver memory.

Thanks
Best Regards
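
A sketch of what "increasing driver memory" looks like on the submit command line; the
8g value and the jar name are placeholders (assumptions), only the class name is taken
from the stack trace above.

  spark-submit --class org.linearsvm.SVMClassifier \
    --num-executors 64 --executor-memory 60G \
    --driver-memory 8g \
    svm-classifier.jar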

On Thu, Apr 16, 2015 at 6:09 PM, sarath sarathkrishn...@gmail.com wrote:

 Hi,

 I'm trying to train an SVM on KDD2010 dataset (available from libsvm). But
 I'm getting java.lang.OutOfMemoryError: Java heap space error. The
 dataset
 is really sparse and have around 8 million data points and 20 million
 features. I'm using a cluster of 8 nodes (each with 8 cores and 64G RAM).

 I have used both Spark's SVMWithSGD and Liblinear's Spark implementation
 and
 I'm getting java.lang.OutOfMemoryError: Java heap space error for both.

 I have used following settings:
 executor-memory - 60G
 num-executors - 64
 And other default settings

 Also I tried increasing the number of partitions. And tried with reduced
 dataset of half million data points. But I'm still getting the same error.

 Here is the stack trace for Spark's SVMWithSGD:

 Exception in thread main java.lang.OutOfMemoryError: Java heap space
 at

 org.apache.spark.mllib.optimization.GradientDescent$.runMiniBatchSGD(GradientDescent.scala:182)
 at

 org.apache.spark.mllib.optimization.GradientDescent.optimize(GradientDescent.scala:107)
 at

 org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:263)
 at

 org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:190)
 at
 org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:201)
 at
 org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:235)
 at
 org.apache.spark.mllib.classification.SVMWithSGD.train(SVM.scala)
 at org.linearsvm.SVMClassifier.main(SVMClassifier.java:39)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:622)
 at

 org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
 at
 org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
 at
 org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)




 And the stack trace for Liblinear's Spark implementation :

 java.lang.OutOfMemoryError: Java heap space
 at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:57)
 at java.nio.ByteBuffer.allocate(ByteBuffer.java:329)
 at

 org.apache.spark.network.BlockTransferService$$anon$1.onBlockFetchSuccess(BlockTransferService.scala:95)
 at

 org.apache.spark.network.shuffle.RetryingBlockFetcher$RetryingBlockFetchListener.onBlockFetchSuccess(RetryingBlockFetcher.java:206)
 at

 org.apache.spark.network.shuffle.OneForOneBlockFetcher$ChunkCallback.onSuccess(OneForOneBlockFetcher.java:72)
 at

 org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:124)
 at

 org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:93)
 at

 org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44)
 at

 io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at

 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at

 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 at

 io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
 at

 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at

 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 at

 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
 at

 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
 at

 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
 at

 io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
 at

 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
 at
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
 at

 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
 at

 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382

Re: MLLib /ALS : java.lang.OutOfMemoryError: Java heap space

2014-12-18 Thread Xiangrui Meng
Hi Jay,

Please try increasing executor memory (if the available memory is more
than 2GB) and reduce numBlocks in ALS. The current implementation
stores all subproblems in memory and hence the memory requirement is
significant when k is large. You can also try reducing k and see
whether the problem is still there. I made a PR that improves the ALS
implementation, which generates subproblems one by one. You can try
that as well.

https://github.com/apache/spark/pull/3720

Best,
Xiangrui
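
A minimal Scala sketch of the knobs mentioned above on the MLlib 1.x ALS API: an
explicit (smaller) rank and block count. The file path, the parsing and the concrete
numbers are illustrative assumptions only.

  import org.apache.spark.mllib.recommendation.{ALS, Rating}

  // ratings parsed from a "user,product,rating" text file (placeholder path)
  val ratings = sc.textFile("hdfs:///data/ratings.csv").map { line =>
    val Array(user, product, rating) = line.split(',')
    Rating(user.toInt, product.toInt, rating.toDouble)
  }

  val model = new ALS()
    .setRank(20)       // smaller k keeps each in-memory subproblem small
    .setBlocks(10)     // fewer blocks, as suggested above
    .setIterations(10)
    .run(ratings)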

On Wed, Dec 17, 2014 at 6:57 PM, buring qyqb...@gmail.com wrote:
 I am not sure this can help you. I have 57 million rating,about 4million user
 and 4k items. I used 7-14 total-executor-cores,executal-memory 13g,cluster
 have 4 nodes,each have 4cores,max memory 16g.
 I found set as follows may help avoid this problem:
 conf.set(spark.shuffle.memoryFraction,0.65) //default is 0.2
 conf.set(spark.storage.memoryFraction,0.3)//default is 0.6
 I have to set rank value under 40, otherwise occure this problem.



 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-ALS-java-lang-OutOfMemoryError-Java-heap-space-tp20584p20755.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Re: MLLib /ALS : java.lang.OutOfMemoryError: Java heap space

2014-12-17 Thread buring
I am not sure this can help you. I have 57 million ratings, about 4 million users
and 4k items. I used 7-14 total-executor-cores, executor-memory 13g; the cluster
has 4 nodes, each with 4 cores and max memory 16g.
I found that setting the following may help avoid this problem:
conf.set("spark.shuffle.memoryFraction", "0.65") // default is 0.2
conf.set("spark.storage.memoryFraction", "0.3")  // default is 0.6
I have to set the rank value under 40, otherwise this problem occurs.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-ALS-java-lang-OutOfMemoryError-Java-heap-space-tp20584p20755.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: MLLib /ALS : java.lang.OutOfMemoryError: Java heap space

2014-12-16 Thread Gen
Hi,
How many clients and how many products do you have?
Cheers
Gen
jaykatukuri wrote
 Hi all,
 I am running into an out of memory error while running ALS using MLlib on a
 reasonably small data set consisting of around 6 million ratings. The stack
 trace is below:

 java.lang.OutOfMemoryError: Java heap space
 at org.jblas.DoubleMatrix.<init>(DoubleMatrix.java:323)
 at org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:471)
 at org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:476)
 at org.apache.spark.mllib.recommendation.ALS$$anonfun$21.apply(ALS.scala:465)
 at org.apache.spark.mllib.recommendation.ALS$$anonfun$21.apply(ALS.scala:465)
 at scala.Array$.fill(Array.scala:267)
 at org.apache.spark.mllib.recommendation.ALS.org$apache$spark$mllib$recommendation$ALS$$updateBlock(ALS.scala:465)
 at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:445)
 at org.apache.spark.mllib.recommendation.ALS$$anonfun$org$apache$spark$mllib$recommendation$ALS$$updateFeatures$2.apply(ALS.scala:444)
 at org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
 at org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:31)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
 at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:156)
 at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
 at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:154)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
 at org.apache.spark.scheduler.Task.run(Task.scala:51)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)

 I am using 2GB for executor memory. I tried with 100 executors.
 Can someone please point me in the right direction?
 Thanks,
 Jay





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-ALS-java-lang-OutOfMemoryError-Java-heap-space-tp20584p20714.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: MLLib /ALS : java.lang.OutOfMemoryError: Java heap space

2014-12-10 Thread happyyxw
How many worker nodes do these 100 executors run on?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-ALS-java-lang-OutOfMemoryError-Java-heap-space-tp20584p20610.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



java.lang.OutOfMemoryError: Java heap space during reduce operation

2014-10-20 Thread ayandas84
Hi,

*In a reduce operation I am trying to accumulate a list of SparseVectors.
The code is given below:*
val WNode = trainingData.reduce { (node1: Node, node2: Node) =>
  val wNode = new Node(num1, num2)
  wNode.WList ++= node1.WList
  wNode.WList ++= node2.WList
  wNode
}

where WList is a list of SparseVectors. The average size of a SparseVector is
21000, and the number of elements in the final list at the end of the reduce
operation varies between roughly 20 and 100.
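
A self-contained sketch of the pattern described above; the Node class, field
names, and data are stand-ins for the poster's code, not the original:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{SparseVector, Vectors}
import scala.collection.mutable.ListBuffer

// Stand-in for the poster's Node: it accumulates SparseVectors, so the merged
// object grows with every reduce step and is shipped back as a task result.
class Node(val id: Int) extends Serializable {
  val WList = ListBuffer[SparseVector]()
}

object ReduceSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("ReduceSketch").setMaster("local[2]"))

    val trainingData = sc.parallelize(1 to 100).map { i =>
      val n = new Node(i)
      n.WList += Vectors.sparse(21000, Array(i), Array(1.0)).asInstanceOf[SparseVector]
      n
    }

    val wNode = trainingData.reduce { (node1, node2) =>
      val merged = new Node(0)
      merged.WList ++= node1.WList
      merged.WList ++= node2.WList
      merged
    }

    println(wNode.WList.size) // the final list holds every vector from every partition
    sc.stop()
  }
}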

*However, at run time I am getting the following error messages from some of
the executor machines.*
14/10/20 22:38:41 INFO BlockManagerInfo: Added taskresult_30 in memory on
cse-hadoop-113:34602 (size: 789.0 MB, free: 22.2 GB)
14/10/20 22:38:41 INFO TaskSetManager: Starting task 1.0:12 as TID 34 on
executor 6: cse-hadoop-113 (PROCESS_LOCAL)
14/10/20 22:38:41 INFO TaskSetManager: Serialized task 1.0:12 as 2170 bytes
in 2 ms
14/10/20 22:38:41 INFO SendingConnection: Initiating connection to
[cse-hadoop-113/192.168.0.113:34602]
14/10/20 22:38:41 INFO SendingConnection: Connected to
[cse-hadoop-113/192.168.0.113:34602], 1 messages pending
14/10/20 22:38:41 INFO ConnectionManager: Accepted connection from
[cse-hadoop-113/192.168.0.113]
Exception in thread "pool-5-thread-3" java.lang.OutOfMemoryError: Java heap
space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at org.apache.spark.network.Message$.create(Message.scala:88)
at
org.apache.spark.network.ReceivingConnection$Inbox.org$apache$spark$network$ReceivingConnection$Inbox$$createNewMessage$1(Connection.scala:438)
at
org.apache.spark.network.ReceivingConnection$Inbox$$anonfun$1.apply(Connection.scala:448)
at
org.apache.spark.network.ReceivingConnection$Inbox$$anonfun$1.apply(Connection.scala:448)
at
scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
at
org.apache.spark.network.ReceivingConnection$Inbox.getChunk(Connection.scala:448)
at 
org.apache.spark.network.ReceivingConnection.read(Connection.scala:525)
at
org.apache.spark.network.ConnectionManager$$anon$6.run(ConnectionManager.scala:176)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
*Please help.*



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-OutOfMemoryError-Java-heap-space-during-reduce-operation-tp16835.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



java.lang.OutOfMemoryError: Java heap space when running job via spark-submit

2014-10-09 Thread Jaonary Rabarisoa
Dear all,

I have a spark job with the following configuration

*val conf = new SparkConf()*
* .setAppName("My Job")*
* .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")*
* .set("spark.kryo.registrator", "value.serializer.Registrator")*
* .setMaster("local[4]")*
* .set("spark.executor.memory", "4g")*


that I can run manually with sbt run without any problem.

But, I try to run the same job with spark-submit

*./spark-1.1.0-bin-hadoop2.4/bin/spark-submit \*
* --class value.jobs.MyJob \*
* --master local[4] \*
* --conf spark.executor.memory=4g \*
* --conf spark.driver.memory=2g \*
* target/scala-2.10/my-job_2.10-1.0.jar*


I get the following error :

*Exception in thread stdin writer for List(patch_matching_similarity)
java.lang.OutOfMemoryError: Java heap space*
* at java.util.Arrays.copyOf(Arrays.java:2271)*
* at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)*
* at
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)*
* at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)*
* at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)*
* at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)*
* at com.esotericsoftware.kryo.io.Output.flush(Output.java:155)*
* at com.esotericsoftware.krput.writeString_slow(Output.java:420)*
* at com.esotericsoftware.kryo.io.Output.writeString(Output.java:326)*
* at
com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:153)*
* at
com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146)*
* at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549)*
* at
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:570)*
* at
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)*
* at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)*
* at
org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:119)*
* at
org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:110)*
* at
org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:1047)*
* at
org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1056)*
* at org.apache.spark.storage.MemoryStore.putArray(MemoryStore.scala:93)*
* at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:745)*
* at org.apache.spark.storage.BlockManager.putArray(BlockManager.scala:625)*
* at
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:167)*
* at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)*
* at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)*
* at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)*
* at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)*
* at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)*
* at
org.apache.spark.rdd.CartesianRDD$$anonfun$compute$1.apply(CartesianRDD.scala:75)*
* at
org.apache.spark.rdd.CartesianRDD$$anonfun$compute$1.apply(CartesianRDD.scala:74)*
* at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)*
yo.io.Output.require(Output.java:135)
at com.esotericsoftware.kryo.io.*Out*


I don't understand why, since I set the same amount of memory in both cases.

Any ideas would be helpful. I use Spark 1.1.0.

Cheers,

Jao


Re: java.lang.OutOfMemoryError: Java heap space when running job via spark-submit

2014-10-09 Thread Jaonary Rabarisoa
in fact with --driver-memory 2G I can get it working

On Thu, Oct 9, 2014 at 6:20 PM, Xiangrui Meng men...@gmail.com wrote:

 Please use --driver-memory 2g instead of --conf
 spark.driver.memory=2g. I'm not sure whether this is a bug. -Xiangrui

 On Thu, Oct 9, 2014 at 9:00 AM, Jaonary Rabarisoa jaon...@gmail.com
 wrote:
  Dear all,
 
  I have a spark job with the following configuration
 
  val conf = new SparkConf()
   .setAppName(My Job)
   .set(spark.serializer,
 org.apache.spark.serializer.KryoSerializer)
   .set(spark.kryo.registrator, value.serializer.Registrator)
   .setMaster(local[4])
   .set(spark.executor.memory, 4g)
 
 
  that I can run manually with sbt run without any problem.
 
  But, I try to run the same job with spark-submit
 
  ./spark-1.1.0-bin-hadoop2.4/bin/spark-submit \
   --class value.jobs.MyJob \
   --master local[4] \
   --conf spark.executor.memory=4g \
   --conf spark.driver.memory=2g \
   target/scala-2.10/my-job_2.10-1.0.jar
 
 
  I get the following error :
 
  Exception in thread stdin writer for List(patch_matching_similarity)
  java.lang.OutOfMemoryError: Java heap space
  at java.util.Arrays.copyOf(Arrays.java:2271)
  at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
  at
 
 java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
  at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
  at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
  at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
  at com.esotericsoftware.kryo.io.Output.flush(Output.java:155)
  at com.esotericsoftware.krput.writeString_slow(Output.java:420)
  at com.esotericsoftware.kryo.io.Output.writeString(Output.java:326)
  at
 
 com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:153)
  at
 
 com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:146)
  at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549)
  at
 
 com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:570)
  at
 
 com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
  at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
  at
 
 org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:119)
  at
 
 org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:110)
  at
 
 org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:1047)
  at
 
 org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1056)
  at org.apache.spark.storage.MemoryStore.putArray(MemoryStore.scala:93)
  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:745)
  at org.apache.spark.storage.BlockManager.putArray(BlockManager.scala:625)
  at
 org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:167)
  at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
  at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
  at
 
 org.apache.spark.rdd.CartesianRDD$$anonfun$compute$1.apply(CartesianRDD.scala:75)
  at
 
 org.apache.spark.rdd.CartesianRDD$$anonfun$compute$1.apply(CartesianRDD.scala:74)
  at
 
 scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)yo.io.Output.require(Output.java:135)
  at com.esotericsoftware.kryo.io.Out
 
 
  I don't understand why since I set the same amount of memory in the two
  cases.
 
  Any ideas will be helpfull. I use spark 1.1.0.
 
  Cheers,
 
  Jao



Why I get java.lang.OutOfMemoryError: Java heap space with join ?

2014-09-12 Thread Jaonary Rabarisoa
Dear all,


I'm facing the following problem and I can't figure out how to solve it.

I need to join 2 RDDs in order to find their intersections. The first RDD
represents an image encoded as a base64 string, associated with an image id.
The second RDD represents a set of geometric primitives (rectangles), also
associated with an image id. My goal is to draw these primitives on the
corresponding image, so my first attempt is to join images and primitives by
image id and then do the drawing.
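
A minimal sketch of the join being described (the types, names, and Rect shape
are guesses from the prose, not the poster's code):

import org.apache.spark.SparkContext._ // pair-RDD operations such as join (Spark 1.x)
import org.apache.spark.rdd.RDD

case class Rect(x: Int, y: Int, width: Int, height: Int)

// images: (imageId, base64-encoded image); primitives: (imageId, rectangle).
// The join yields one record per (rectangle, image) pair sharing an id, so
// every rectangle drags a full copy of its image's base64 payload with it.
def joinForDrawing(primitives: RDD[(String, Rect)],
                   images: RDD[(String, String)]): RDD[(String, (Rect, String))] =
  primitives.join(images)

Each joined record carries the whole base64 image string, so an image id with
many rectangles multiplies that payload in the shuffle.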

But, when I do

*primitives.join(images) *


I got the following error :

*java.lang.OutOfMemoryError: Java heap space*
* at java.util.Arrays.copyOf(Arrays.java:2367)*
* at
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)*
* at
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)*
* at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)*
* at java.lang.StringBuilder.append(StringBuilder.java:204)*
* at
java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3143)*
* at
java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3051)*
* at
java.io.ObjectInputStream$BlockDataInputStream.readLongUTF(ObjectInputStream.java:3034)*
* at java.io.ObjectInputStream.readString(ObjectInputStream.java:1642)*
* at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1341)*
* at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)*
* at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)*
* at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)*
* at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)*
* at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)*
* at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)*
* at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)*
* at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)*
* at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)*
* at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)*
* at
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)*
* at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)*
* at
org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1031)*
* at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)*
* at
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)*
* at
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)*
* at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)*
* at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)*
* at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)*
* at scala.collection.Iterator$class.foreach(Iterator.scala:727)*
* at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)*
* at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)*

I notice that sometimes, if I change the partitioning of the images RDD with
coalesce, I can get it working.

What am I doing wrong?

Cheers,

Jaonary


java.lang.OutOfMemoryError: Java heap space

2014-07-31 Thread Sameer Tilak
Hi everyone,
I have the following configuration. I am currently running my app in local mode.

  val conf = new SparkConf().setMaster("local[2]").setAppName("ApproxStrMatch")
    .set("spark.executor.memory", "3g").set("spark.storage.memoryFraction", "0.1")

I am getting the following error. I tried setting spark.executor.memory and the
memory fraction setting, however my UI does not show the increase and I still
get these errors. I am loading a TSV file from HDFS (around 5 GB). Does this
mean I should update these settings and add more memory, or is it something
else? Spark master has 24 GB physical memory and workers have 16 GB, but we are
running other services (CDH 5.1) on these nodes as well.
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 6 ms
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 6 ms
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 1 ms
14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 1 ms
14/07/31 09:48:17 ERROR Executor: Exception in task ID 5
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/07/31 09:48:17 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-3,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/07/31 09:48:17 WARN TaskSetManager: Lost TID 5 (task 1.0:0)
14/07/31 09:48:17 WARN TaskSetManager: Loss was due to java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/07/31 09:48:17 ERROR TaskSetManager: Task 1.0:0 failed 1 times; aborting job
14/07/31 09:48:17 INFO TaskSchedulerImpl: Cancelling stage 1
14/07/31 09:48:17 INFO DAGScheduler: Failed to run collect at ComputeScores.scala:76
14/07/31 09:48:17 INFO Executor: Executor is trying to kill task 6
14/07/31 09:48:17 INFO TaskSchedulerImpl: Stage 1 was cancelled

Re: java.lang.OutOfMemoryError: Java heap space

2014-07-31 Thread Haiyang Fu
Hi,
here are two tips for you:
1. increase the parallelism level
2. increase the driver memory
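
A rough sketch of both tips in Spark 1.x terms (the values and path are
placeholders):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("ApproxStrMatch")
  .set("spark.default.parallelism", "64") // tip 1: raise the parallelism level
// Tip 2: driver memory is a JVM-level setting, so it is normally passed to
// spark-submit (e.g. --driver-memory 4g) rather than set from code after the
// driver JVM has already started.
val sc = new SparkContext(conf)

// Repartitioning a specific RDD is another way to raise its parallelism.
val records = sc.textFile("hdfs:///data/input.tsv").repartition(64)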


On Fri, Aug 1, 2014 at 12:58 AM, Sameer Tilak ssti...@live.com wrote:

 Hi everyone,
 I have the following configuration. I am currently running my app in local
 mode.

   val conf = new
 SparkConf().setMaster(local[2]).setAppName(ApproxStrMatch).set(spark.executor.memory,
 3g).set(spark.storage.memoryFraction, 0.1)

 I am getting the following error. I tried setting up spark.executor.memory
 and memory fraction setting, however my UI does not show the increase and I
 still get these errors. I am loading a TSV file from HDFS (around 5 GB).
 Does this mean, I should update these settings and add more memory or is it
 somethign else? Spark master has 24 GB physical memory and workers have 16
 GB, but we are running other services (CDH 5.1) on these nodes as well.

 14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
 Getting 2 non-empty blocks out of 2 blocks
 14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
 Getting 2 non-empty blocks out of 2 blocks
 14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
 Started 0 remote fetches in 6 ms
 14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
 Started 0 remote fetches in 6 ms
 14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
 maxBytesInFlight: 50331648, targetRequestSize: 10066329
 14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
 maxBytesInFlight: 50331648, targetRequestSize: 10066329
 14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
 Getting 2 non-empty blocks out of 2 blocks
 14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
 Getting 2 non-empty blocks out of 2 blocks
 14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
 Started 0 remote fetches in 1 ms
 14/07/31 09:48:09 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
 Started 0 remote fetches in 1 ms
 14/07/31 09:48:17 ERROR Executor: Exception in task ID 5
 java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2271)
 at
 java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
 at
 org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 14/07/31 09:48:17 ERROR ExecutorUncaughtExceptionHandler: Uncaught
 exception in thread Thread[Executor task launch worker-3,5,main]
 java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2271)
 at
 java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
 at
 org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 14/07/31 09:48:17 WARN TaskSetManager: Lost TID 5 (task 1.0:0)
 14/07/31 09:48:17 WARN TaskSetManager: Loss was due to
 java.lang.OutOfMemoryError
 java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2271)
 at
 java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
 at
 org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 14/07/31 09:48:17 ERROR TaskSetManager: Task 1.0:0 failed 1 times;
 aborting job
 14/07/31 09:48:17 INFO TaskSchedulerImpl: Cancelling stage 1
 14/07/31 09:48:17 INFO DAGScheduler: Failed to run collect at
 ComputeScores.scala:76
 14/07/31 09:48:17 INFO Executor: Executor is trying to kill task 6
 14/07/31 09:48:17 INFO TaskSchedulerImpl: Stage 1 was cancelled



Help: WARN AbstractNioSelector: Unexpected exception in the selector loop. java.lang.OutOfMemoryError: Java heap space

2014-07-02 Thread innowireless TaeYun Kim
Hi,

When running a Spark job, the following warning message is displayed and the job
seems to stop progressing.
(Detailed log messages are at the bottom of this message.)

---
14/07/02 17:00:14 WARN AbstractNioSelector: Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError: Java heap space
at
org.jboss.netty.buffer.HeapChannelBuffer.init(HeapChannelBuffer.java:42)
---

The specifics of the job are as follows:

- It reads 168016 files on HDFS by calling
sc.textFile("hdfs://cluster01/user/data/*/*/*.csv")
- The total size of the files is 164,111,123,686 bytes. (164GB)
- The max size of the files is 8,546,230 bytes. (8.5MB)
- The min size of the files is 3,920 bytes. (3KB)
- The command line options are: --master yarn-client --executor-memory 14G
--executor-cores 7 --num-executors 3

The WARN occurred when processing reduceByKey().
On Spark Stage web page, the number of tasks for reduceByKey() was 168016.
(same as the number of the files)
The WARN occurred when reduceByKey() was progressing about 10%. (not exact)

Actual method call sequence is:
sc.textFile(...).filter().mapToPair().reduceByKey().map().saveAsTextFile().

How can I fix this?
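
Not an answer from the thread, but for illustration, a Scala sketch (the
original job uses the Java API) of one directly related knob: giving
reduceByKey an explicit partition count instead of inheriting one partition per
input file. The key extraction, partition count, and output path are
placeholders.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD operations in Spark 1.x

val sc = new SparkContext(new SparkConf().setAppName("ReduceByKeySketch"))

val lines = sc.textFile("hdfs://cluster01/user/data/*/*/*.csv")
val counts = lines
  .filter(_.nonEmpty)
  .map(line => (line.split(',')(0), 1L)) // placeholder key extraction
  .reduceByKey(_ + _, 200)               // explicit reduce-side partition count instead of 168016
counts.saveAsTextFile("hdfs://cluster01/user/output") // placeholder output path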

Detailed log messages are as follows: (IP and hostname was replaced.)

14/07/02 17:00:14 WARN AbstractNioSelector: Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError: Java heap space
at
org.jboss.netty.buffer.HeapChannelBuffer.init(HeapChannelBuffer.java:42)
at
org.jboss.netty.buffer.BigEndianHeapChannelBuffer.init(BigEndianHeapChanne
lBuffer.java:34)
at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
at
org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferF
actory.java:68)
at
org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChanne
lBufferFactory.java:48)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:80)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWork
er.java:109)
at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelect
or.java:312)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.j
ava:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
45)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
15)
at java.lang.Thread.run(Thread.java:745)
14/07/02 17:00:16 WARN BlockManagerMasterActor: Removing BlockManager
BlockManagerId(driver, DriverHostName, 63548, 0) with no recent heart
beats: 63641ms exceeds 45000ms
14/07/02 17:00:14 WARN AbstractNioSelector: Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError: Java heap space
at
org.jboss.netty.buffer.HeapChannelBuffer.init(HeapChannelBuffer.java:42)
at
org.jboss.netty.buffer.BigEndianHeapChannelBuffer.init(BigEndianHeapChanne
lBuffer.java:34)
at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
at
org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferF
actory.java:68)
at
org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChanne
lBufferFactory.java:48)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:80)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWork
er.java:109)
at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelect
or.java:312)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.j
ava:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
45)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
15)
at java.lang.Thread.run(Thread.java:745)
14/07/02 17:00:14 ERROR Utils: Uncaught exception in thread Result resolver
thread-2
14/07/02 17:00:37 WARN AbstractNioSelector: Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError: Java heap space
at
org.jboss.netty.buffer.HeapChannelBuffer.init(HeapChannelBuffer.java:42)
at
org.jboss.netty.buffer.BigEndianHeapChannelBuffer.init(BigEndianHeapChanne
lBuffer.java:34)
at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
at
org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferF
actory.java:68)
at
org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChanne
lBufferFactory.java:48)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:80)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWork
er.java:109)
at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelect
or.java:312)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.j
ava:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178

RE: Help: WARN AbstractNioSelector: Unexpected exception in the selector loop. java.lang.OutOfMemoryError: Java heap space

2014-07-02 Thread innowireless TaeYun Kim
Also, the machine on which the driver program runs constantly uses about
7~8% of 100Mbps network connection.
Is the driver program involved in the reduceByKey() somehow?
BTW, currently an accumulator is used, but the network usage does not drop
even when accumulator is removed.

Thanks in advance.


-Original Message-
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] 
Sent: Wednesday, July 02, 2014 5:58 PM
To: user@spark.apache.org
Subject: Help: WARN AbstractNioSelector: Unexpected exception in the
selector loop. java.lang.OutOfMemoryError: Java heap space

Hi,

When running a Spark job, the following warning message displays and the job
seems no longer progressing.
(Detailed log message are at the bottom of this message.)

---
14/07/02 17:00:14 WARN AbstractNioSelector: Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError: Java heap space at
org.jboss.netty.buffer.HeapChannelBuffer.init(HeapChannelBuffer.java:42)
---

The specifics of the job is as follows:

- It reads 168016 files on the HDFS, by calling
sc.textFile(hdfs://cluster01/user/data/*/*/*.csv)
- The total size of the files is 164,111,123,686 bytes. (164GB)
- The max size of the files is 8,546,230 bytes. (8.5MB)
- The min size of the files is 3,920 bytes. (3KB)
- The command line options are: --master yarn-client --executor-memory 14G
--executor-cores 7 --num-executors 3

The WARN occurred when processing reduceByKey().
On Spark Stage web page, the number of tasks for reduceByKey() was 168016.
(same as the number of the files)
The WARN occurred when reduceByKey() was progressing about 10%. (not exact)

Actual method call sequence is:
sc.textFile(...).filter().mapToPair().reduceByKey().map().saveAsTextFile().

How can I fix this?

Detailed log messages are as follows: (IP and hostname was replaced.)

14/07/02 17:00:14 WARN AbstractNioSelector: Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError: Java heap space at
org.jboss.netty.buffer.HeapChannelBuffer.init(HeapChannelBuffer.java:42)
at
org.jboss.netty.buffer.BigEndianHeapChannelBuffer.init(BigEndianHeapChanne
lBuffer.java:34)
at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
at
org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferF
actory.java:68)
at
org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChanne
lBufferFactory.java:48)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:80)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWork
er.java:109)
at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelect
or.java:312)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.j
ava:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
45)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
15)
at java.lang.Thread.run(Thread.java:745)
14/07/02 17:00:16 WARN BlockManagerMasterActor: Removing BlockManager
BlockManagerId(driver, DriverHostName, 63548, 0) with no recent heart
beats: 63641ms exceeds 45000ms
14/07/02 17:00:14 WARN AbstractNioSelector: Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError: Java heap space at
org.jboss.netty.buffer.HeapChannelBuffer.init(HeapChannelBuffer.java:42)
at
org.jboss.netty.buffer.BigEndianHeapChannelBuffer.init(BigEndianHeapChanne
lBuffer.java:34)
at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
at
org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferF
actory.java:68)
at
org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChanne
lBufferFactory.java:48)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:80)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWork
er.java:109)
at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelect
or.java:312)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.j
ava:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
45)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
15)
at java.lang.Thread.run(Thread.java:745)
14/07/02 17:00:14 ERROR Utils: Uncaught exception in thread Result resolver
thread-2
14/07/02 17:00:37 WARN AbstractNioSelector: Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError: Java heap space at
org.jboss.netty.buffer.HeapChannelBuffer.init(HeapChannelBuffer.java:42)
at
org.jboss.netty.buffer.BigEndianHeapChannelBuffer.init(BigEndianHeapChanne
lBuffer.java:34)
at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
at
org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer

RE: Help: WARN AbstractNioSelector: Unexpected exception in the selector loop. java.lang.OutOfMemoryError: Java heap space

2014-07-02 Thread innowireless TaeYun Kim
It seems that the driver program runs out of memory.
In Windows Task Manager, the driver program's memory constantly grows until
around 3,434,796, then a Java OutOfMemoryError occurs.
(BTW, the driver program runs on a Windows 7 64-bit machine, and the cluster is
on CentOS.)

Why does the memory of the driver program constantly grow?

-Original Message-
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr] 
Sent: Wednesday, July 02, 2014 6:05 PM
To: user@spark.apache.org
Subject: RE: Help: WARN AbstractNioSelector: Unexpected exception in the
selector loop. java.lang.OutOfMemoryError: Java heap space

Also, the machine on which the driver program runs constantly uses about
7~8% of 100Mbps network connection.
Is the driver program involved in the reduceByKey() somehow?
BTW, currently an accumulator is used, but the network usage does not drop
even when accumulator is removed.

Thanks in advance.


-Original Message-
From: innowireless TaeYun Kim [mailto:taeyun@innowireless.co.kr]
Sent: Wednesday, July 02, 2014 5:58 PM
To: user@spark.apache.org
Subject: Help: WARN AbstractNioSelector: Unexpected exception in the
selector loop. java.lang.OutOfMemoryError: Java heap space

Hi,

When running a Spark job, the following warning message displays and the job
seems no longer progressing.
(Detailed log message are at the bottom of this message.)

---
14/07/02 17:00:14 WARN AbstractNioSelector: Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError: Java heap space at
org.jboss.netty.buffer.HeapChannelBuffer.init(HeapChannelBuffer.java:42)
---

The specifics of the job is as follows:

- It reads 168016 files on the HDFS, by calling
sc.textFile(hdfs://cluster01/user/data/*/*/*.csv)
- The total size of the files is 164,111,123,686 bytes. (164GB)
- The max size of the files is 8,546,230 bytes. (8.5MB)
- The min size of the files is 3,920 bytes. (3KB)
- The command line options are: --master yarn-client --executor-memory 14G
--executor-cores 7 --num-executors 3

The WARN occurred when processing reduceByKey().
On Spark Stage web page, the number of tasks for reduceByKey() was 168016.
(same as the number of the files)
The WARN occurred when reduceByKey() was progressing about 10%. (not exact)

Actual method call sequence is:
sc.textFile(...).filter().mapToPair().reduceByKey().map().saveAsTextFile().

How can I fix this?

Detailed log messages are as follows: (IP and hostname was replaced.)

14/07/02 17:00:14 WARN AbstractNioSelector: Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError: Java heap space at
org.jboss.netty.buffer.HeapChannelBuffer.init(HeapChannelBuffer.java:42)
at
org.jboss.netty.buffer.BigEndianHeapChannelBuffer.init(BigEndianHeapChanne
lBuffer.java:34)
at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
at
org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferF
actory.java:68)
at
org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChanne
lBufferFactory.java:48)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:80)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWork
er.java:109)
at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelect
or.java:312)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.j
ava:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
45)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:6
15)
at java.lang.Thread.run(Thread.java:745)
14/07/02 17:00:16 WARN BlockManagerMasterActor: Removing BlockManager
BlockManagerId(driver, DriverHostName, 63548, 0) with no recent heart
beats: 63641ms exceeds 45000ms
14/07/02 17:00:14 WARN AbstractNioSelector: Unexpected exception in the
selector loop.
java.lang.OutOfMemoryError: Java heap space at
org.jboss.netty.buffer.HeapChannelBuffer.init(HeapChannelBuffer.java:42)
at
org.jboss.netty.buffer.BigEndianHeapChannelBuffer.init(BigEndianHeapChanne
lBuffer.java:34)
at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
at
org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferF
actory.java:68)
at
org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChanne
lBufferFactory.java:48)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:80)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWork
er.java:109)
at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelect
or.java:312)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.j
ava:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:11
45

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-20 Thread Eugen Cepoi
On 20 June 2014 at 01:46, Shivani Rao raoshiv...@gmail.com wrote:

 Hello Andrew,

 I wish I could share the code, but for proprietary reasons I can't. But I
can give some idea of what I am trying to do. The job reads a file and
processes each of its lines; I am not doing anything intense in the
processLogs function

 import argonaut._
 import argonaut.Argonaut._


 /* all of these case classes are created from json strings extracted from
the line in the processLogs() function
 *
 */
 case class struct1…
 case class struct2…
 case class value1(struct1, struct2)

 def processLogs(line: String): Option[(key1, value1)] = {…
 }

 def run(sparkMaster, appName, executorMemory, jarsPath) {
   val sparkConf = new SparkConf()
   sparkConf.setMaster(sparkMaster)
   sparkConf.setAppName(appName)
   sparkConf.set("spark.executor.memory", executorMemory)
   sparkConf.setJars(jarsPath) // This includes all the relevant jars..
   val sc = new SparkContext(sparkConf)
   val rawLogs = sc.textFile("hdfs://my-hadoop-namenode:8020:myfile.txt")

   rawLogs.saveAsTextFile("hdfs://my-hadoop-namenode:8020:writebackForTesting")

   rawLogs.flatMap(processLogs).saveAsTextFile("hdfs://my-hadoop-namenode:8020:outfile.txt")
 }

  If I switch to local mode, the code runs just fine; it is only in cluster
 mode that it fails with the error I pasted above. In cluster mode, even
 writing back the file we just read fails
 (rawLogs.saveAsTextFile("hdfs://my-hadoop-namenode:8020:writebackForTesting"))

  I still believe this is a ClassNotFound error in disguise


Indeed you are right, this can be the reason. I had similar errors when
defining case classes in the shell and trying to use them in the RDDs. Are
you shading argonaut in the fat jar ?

 Thanks
 Shivani



 On Wed, Jun 18, 2014 at 2:49 PM, Andrew Ash and...@andrewash.com wrote:

 Wait, so the file only has four lines and the job running out of heap
space?  Can you share the code you're running that does the processing?
 I'd guess that you're doing some intense processing on every line but just
writing parsed case classes back to disk sounds very lightweight.

 I


 On Wed, Jun 18, 2014 at 5:17 PM, Shivani Rao raoshiv...@gmail.com
wrote:

 I am trying to process a file that contains 4 log lines (not very long)
and then write my parsed out case classes to a destination folder, and I
get the following error:


 java.lang.OutOfMemoryError: Java heap space

 at
org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)

 at
org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2244)

 at
org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)

 at
org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)

 at
org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

 at java.lang.reflect.Method.invoke(Method.java:597)

 at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)

 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)

 at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)

 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)

 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)

 at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)

 at
org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)

 at
org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

 at java.lang.reflect.Method.invoke(Method.java:597)

 at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)

 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)

 at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)

 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)

 at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)

 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)

 at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)

 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)

 at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)

 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)

 at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)


 Sadly, there are several folks that have faced this error while trying
to execute Spark jobs and there are various solutions, none

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-20 Thread Shivani Rao
Hello Abhi, I did try that and it did not work

And Eugene, Yes I am assembling the argonaut libraries in the fat jar. So
how did you overcome this problem?

Shivani


On Fri, Jun 20, 2014 at 1:59 AM, Eugen Cepoi cepoi.eu...@gmail.com wrote:


 On 20 June 2014 at 01:46, Shivani Rao raoshiv...@gmail.com wrote:

 
  Hello Andrew,
 
  i wish I could share the code, but for proprietary reasons I can't. But
 I can give some idea though of what i am trying to do. The job reads a file
 and for each line of that file and processors these lines. I am not doing
 anything intense in the processLogs function
 
  import argonaut._
  import argonaut.Argonaut._
 
 
  /* all of these case classes are created from json strings extracted
 from the line in the processLogs() function
  *
  */
  case class struct1…
  case class struct2…
  case class value1(struct1, struct2)
 
  def processLogs(line:String): Option[(key1, value1)] {…
  }
 
  def run(sparkMaster, appName, executorMemory, jarsPath) {
val sparkConf = new SparkConf()
 sparkConf.setMaster(sparkMaster)
 sparkConf.setAppName(appName)
 sparkConf.set(spark.executor.memory, executorMemory)
  sparkConf.setJars(jarsPath) // This includes all the jars relevant
 jars..
 val sc = new SparkContext(sparkConf)
val rawLogs = sc.textFile(hdfs://my-hadoop-namenode:8020:myfile.txt)
 
 rawLogs.saveAsTextFile(hdfs://my-hadoop-namenode:8020:writebackForTesting)
 
 rawLogs.flatMap(processLogs).saveAsTextFile(hdfs://my-hadoop-namenode:8020:outfile.txt)
  }
 
  If I switch to local mode, the code runs just fine, it fails with the
 error I pasted above. In the cluster mode, even writing back the file we
 just read fails
 (rawLogs.saveAsTextFile(hdfs://my-hadoop-namenode:8020:writebackForTesting)
 
  I still believe this is a classNotFound error in disguise
 

 Indeed you are right, this can be the reason. I had similar errors when
 defining case classes in the shell and trying to use them in the RDDs. Are
 you shading argonaut in the fat jar ?

  Thanks
  Shivani
 
 
 
  On Wed, Jun 18, 2014 at 2:49 PM, Andrew Ash and...@andrewash.com
 wrote:
 
  Wait, so the file only has four lines and the job running out of heap
 space?  Can you share the code you're running that does the processing?
  I'd guess that you're doing some intense processing on every line but just
 writing parsed case classes back to disk sounds very lightweight.
 
  I
 
 
  On Wed, Jun 18, 2014 at 5:17 PM, Shivani Rao raoshiv...@gmail.com
 wrote:
 
  I am trying to process a file that contains 4 log lines (not very
 long) and then write my parsed out case classes to a destination folder,
 and I get the following error:
 
 
  java.lang.OutOfMemoryError: Java heap space
 
  at
 org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)
 
  at
 org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2244)
 
  at
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
 
  at
 org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
 
  at
 org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
 
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
  at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 
  at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 
  at java.lang.reflect.Method.invoke(Method.java:597)
 
  at
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
 
  at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
 
  at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
 
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
 
  at
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
 
  at
 org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)
 
  at
 org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
 
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
  at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 
  at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 
  at java.lang.reflect.Method.invoke(Method.java:597)
 
  at
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
 
  at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
 
  at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
 
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 
  at
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
 
  at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
 
  at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-20 Thread Eugen Cepoi
In my case it was due to a case class I was defining in the spark-shell and
not being available on the workers. So packaging it in a jar and adding it
with ADD_JARS solved the problem. Note that I don't exactly remember if it
was an out of heap space exception or a PermGen space one. Make sure your
jarsPath is correct.

Usually, to debug this kind of problem I use the spark-shell (you can
do the same in your job, but it's more time consuming to repackage, deploy,
run, iterate). Try for example:
1) read the lines (without any processing) and count them
2) apply the processing and count again
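
Those two steps, roughly, as they would look in the spark-shell (the path is a
placeholder and processLogs is the poster's own function):

val rawLogs = sc.textFile("hdfs://my-hadoop-namenode:8020/myfile.txt") // placeholder path
rawLogs.count()                       // 1) count the raw lines, no processing
rawLogs.flatMap(processLogs).count()  // 2) apply the parsing and count again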



2014-06-20 17:15 GMT+02:00 Shivani Rao raoshiv...@gmail.com:

 Hello Abhi, I did try that and it did not work

 And Eugene, Yes I am assembling the argonaut libraries in the fat jar. So
 how did you overcome this problem?

 Shivani


 On Fri, Jun 20, 2014 at 1:59 AM, Eugen Cepoi cepoi.eu...@gmail.com
 wrote:


  On 20 June 2014 at 01:46, Shivani Rao raoshiv...@gmail.com wrote:

 
  Hello Andrew,
 
  i wish I could share the code, but for proprietary reasons I can't. But
 I can give some idea though of what i am trying to do. The job reads a file
 and for each line of that file and processors these lines. I am not doing
 anything intense in the processLogs function
 
  import argonaut._
  import argonaut.Argonaut._
 
 
  /* all of these case classes are created from json strings extracted
 from the line in the processLogs() function
  *
  */
  case class struct1…
  case class struct2…
  case class value1(struct1, struct2)
 
  def processLogs(line:String): Option[(key1, value1)] {…
  }
 
  def run(sparkMaster, appName, executorMemory, jarsPath) {
val sparkConf = new SparkConf()
 sparkConf.setMaster(sparkMaster)
 sparkConf.setAppName(appName)
 sparkConf.set(spark.executor.memory, executorMemory)
  sparkConf.setJars(jarsPath) // This includes all the jars relevant
 jars..
 val sc = new SparkContext(sparkConf)
val rawLogs =
 sc.textFile(hdfs://my-hadoop-namenode:8020:myfile.txt)
 
 rawLogs.saveAsTextFile(hdfs://my-hadoop-namenode:8020:writebackForTesting)
 
 rawLogs.flatMap(processLogs).saveAsTextFile(hdfs://my-hadoop-namenode:8020:outfile.txt)
  }
 
  If I switch to local mode, the code runs just fine, it fails with the
 error I pasted above. In the cluster mode, even writing back the file we
 just read fails
 (rawLogs.saveAsTextFile(hdfs://my-hadoop-namenode:8020:writebackForTesting)
 
  I still believe this is a classNotFound error in disguise
 

 Indeed you are right, this can be the reason. I had similar errors when
 defining case classes in the shell and trying to use them in the RDDs. Are
 you shading argonaut in the fat jar ?

  Thanks
  Shivani
 
 
 
  On Wed, Jun 18, 2014 at 2:49 PM, Andrew Ash and...@andrewash.com
 wrote:
 
  Wait, so the file only has four lines and the job running out of heap
 space?  Can you share the code you're running that does the processing?
  I'd guess that you're doing some intense processing on every line but just
 writing parsed case classes back to disk sounds very lightweight.
 
  I
 
 
  On Wed, Jun 18, 2014 at 5:17 PM, Shivani Rao raoshiv...@gmail.com
 wrote:
 
  I am trying to process a file that contains 4 log lines (not very
 long) and then write my parsed out case classes to a destination folder,
 and I get the following error:
 
 
  java.lang.OutOfMemoryError: Java heap space
 
  at
 org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)
 
  at
 org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2244)
 
  at
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
 
  at
 org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
 
  at
 org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
 
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
  at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 
  at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 
  at java.lang.reflect.Method.invoke(Method.java:597)
 
  at
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
 
  at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
 
  at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
 
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
 
  at
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
 
  at
 org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)
 
  at
 org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
 
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
  at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 
  at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-20 Thread Eugen Cepoi
In short, ADD_JARS will add the jar to your driver classpath and also send
it to the workers (similar to what you are doing when you do sc.addJars).

ex: MASTER=master/url ADD_JARS=/path/to/myJob.jar ./bin/spark-shell


You also have SPARK_CLASSPATH var but it does not distribute the code, it
is only used to compute the driver classpath.
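
The programmatic equivalent, roughly (paths are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("MyJob") // placeholder
  .setJars(Seq("/path/to/myJob.jar")) // ships this jar to the executors, like ADD_JARS
val sc = new SparkContext(conf)
sc.addJar("/path/to/extra-dependency.jar") // can also be added after the context is created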


BTW, you are not supposed to change the compute_classpath.sh script.


2014-06-20 19:45 GMT+02:00 Shivani Rao raoshiv...@gmail.com:

 Hello Eugene,

 You are right about this. I did encounter the PermGen space error in the
 spark shell. Can you tell me a little more about ADD_JARS? In order to ensure
 my spark-shell has all required jars, I added the jars to the $CLASSPATH
 in the compute_classpath.sh script. Is there another way of doing it?

 Shivani


 On Fri, Jun 20, 2014 at 9:47 AM, Eugen Cepoi cepoi.eu...@gmail.com
 wrote:

 In my case it was due to a case class I was defining in the spark-shell
 and not being available on the workers. So packaging it in a jar and adding
 it with ADD_JARS solved the problem. Note that I don't exactly remember if
 it was an out of heap space exception or a PermGen space one. Make sure your
 jarsPath is correct.

 Usually to debug this kind of problem I use the spark-shell (you
 can do the same in your job, but it's more time consuming to repackage,
 deploy, run, iterate). Try for example:
 1) read the lines (without any processing) and count them
 2) apply processing and count
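
 A minimal spark-shell sketch of that two-step check (the HDFS path is a placeholder, and processLogs is assumed to be the parsing function from earlier in this thread):

 // Step 1: read the raw lines with no processing and count them.
 val rawLogs = sc.textFile("hdfs://namenode:8020/myfile.txt")   // placeholder path
 println(s"raw lines: ${rawLogs.count()}")

 // Step 2: apply the parsing and count again; if only this step fails,
 // the problem is in the processing (or in classes missing on the workers).
 val parsed = rawLogs.flatMap(processLogs)
 println(s"parsed records: ${parsed.count()}")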



 2014-06-20 17:15 GMT+02:00 Shivani Rao raoshiv...@gmail.com:

 Hello Abhi, I did try that and it did not work

 And Eugene, Yes I am assembling the argonaut libraries in the fat jar.
 So how did you overcome this problem?

 Shivani


 On Fri, Jun 20, 2014 at 1:59 AM, Eugen Cepoi cepoi.eu...@gmail.com
 wrote:


 On 20 June 2014 at 01:46, Shivani Rao raoshiv...@gmail.com wrote:

 
  Hello Andrew,
 
  i wish I could share the code, but for proprietary reasons I can't.
 But I can give some idea though of what i am trying to do. The job reads a
 file and for each line of that file and processors these lines. I am not
 doing anything intense in the processLogs function
 
  import argonaut._
  import argonaut.Argonaut._
 
 
  /* all of these case classes are created from json strings extracted
 from the line in the processLogs() function
  *
  */
  case class struct1…
  case class struct2…
  case class value1(struct1, struct2)
 
  def processLogs(line:String): Option[(key1, value1)] {…
  }
 
  def run(sparkMaster, appName, executorMemory, jarsPath) {
    val sparkConf = new SparkConf()
    sparkConf.setMaster(sparkMaster)
    sparkConf.setAppName(appName)
    sparkConf.set("spark.executor.memory", executorMemory)
    sparkConf.setJars(jarsPath) // This includes all the relevant jars..
    val sc = new SparkContext(sparkConf)
    val rawLogs = sc.textFile("hdfs://my-hadoop-namenode:8020:myfile.txt")

    rawLogs.saveAsTextFile("hdfs://my-hadoop-namenode:8020:writebackForTesting")

    rawLogs.flatMap(processLogs).saveAsTextFile("hdfs://my-hadoop-namenode:8020:outfile.txt")
  }
 
  If I switch to local mode, the code runs just fine, it fails with
 the error I pasted above. In the cluster mode, even writing back the file
 we just read fails
 (rawLogs.saveAsTextFile("hdfs://my-hadoop-namenode:8020:writebackForTesting"))
 
  I still believe this is a classNotFound error in disguise
 

 Indeed you are right, this can be the reason. I had similar errors when
 defining case classes in the shell and trying to use them in the RDDs. Are
 you shading argonaut in the fat jar ?

  Thanks
  Shivani
 
 
 
  On Wed, Jun 18, 2014 at 2:49 PM, Andrew Ash and...@andrewash.com
 wrote:
 
  Wait, so the file only has four lines and the job running out of
 heap space?  Can you share the code you're running that does the
 processing?  I'd guess that you're doing some intense processing on every
 line but just writing parsed case classes back to disk sounds very
 lightweight.
 
  I
 
 
  On Wed, Jun 18, 2014 at 5:17 PM, Shivani Rao raoshiv...@gmail.com
 wrote:
 
  I am trying to process a file that contains 4 log lines (not very
 long) and then write my parsed out case classes to a destination folder,
 and I get the following error:
 
 
  java.lang.OutOfMemoryError: Java heap space
 
  at
 org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)
 
  at
 org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2244)
 
  at
 org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
 
  at
 org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
 
  at
 org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
 
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
  at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 
  at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 
  at java.lang.reflect.Method.invoke(Method.java:597

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-19 Thread Shivani Rao
Hello Andrew,

I wish I could share the code, but for proprietary reasons I can't. But I
can give some idea of what I am trying to do. The job reads a file and
processes each of its lines. I am not doing anything intense in the
processLogs function.

import argonaut._
import argonaut.Argonaut._


/* all of these case classes are created from json strings extracted from
the line in the processLogs() function
*
*/
case class struct1…
case class struct2…
case class value1(struct1, struct2)

def processLogs(line: String): Option[(key1, value1)] = {…
}

def run(sparkMaster, appName, executorMemory, jarsPath) {
  val sparkConf = new SparkConf()
  sparkConf.setMaster(sparkMaster)
  sparkConf.setAppName(appName)
  sparkConf.set("spark.executor.memory", executorMemory)
  sparkConf.setJars(jarsPath) // This includes all the relevant jars..
  val sc = new SparkContext(sparkConf)
  val rawLogs = sc.textFile("hdfs://my-hadoop-namenode:8020:myfile.txt")

  rawLogs.saveAsTextFile("hdfs://my-hadoop-namenode:8020:writebackForTesting")

  rawLogs.flatMap(processLogs).saveAsTextFile("hdfs://my-hadoop-namenode:8020:outfile.txt")
}

If I switch to local mode, the code runs just fine; in cluster mode, it
fails with the error I pasted above. In cluster mode, even writing back
the file we just read fails
(rawLogs.saveAsTextFile("hdfs://my-hadoop-namenode:8020:writebackForTesting")).

I still believe this is a classNotFound error in disguise

Thanks
Shivani



On Wed, Jun 18, 2014 at 2:49 PM, Andrew Ash and...@andrewash.com wrote:

 Wait, so the file only has four lines and the job running out of heap
 space?  Can you share the code you're running that does the processing?
  I'd guess that you're doing some intense processing on every line but just
 writing parsed case classes back to disk sounds very lightweight.

 I


 On Wed, Jun 18, 2014 at 5:17 PM, Shivani Rao raoshiv...@gmail.com wrote:

 I am trying to process a file that contains 4 log lines (not very long)
 and then write my parsed out case classes to a destination folder, and I
 get the following error:


 java.lang.OutOfMemoryError: Java heap space

 at
 org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)

 at
 org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2244)

 at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)

 at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)

 at
 org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

 at java.lang.reflect.Method.invoke(Method.java:597)

 at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)

 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)

 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)

 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)

 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)

 at
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)

 at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)

 at
 org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

 at java.lang.reflect.Method.invoke(Method.java:597)

 at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)

 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)

 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)

 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)

 at
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)

 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)

 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)

 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)

 at
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)

 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)

 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)


 Sadly, there are several folks that have faced this error while trying to
 execute Spark jobs and there are various solutions, none of which work for
 me


 a) I tried (
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-0-java-lang-outOfMemoryError-Java-Heap-Space-td7735.html#a7736)
 changing the number of partitions in my RDD by using coalesce(8) and the
 error persisted

 b

Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-19 Thread abhiguruvayya
1. Once you have generated the final RDD, and before submitting it to the
reducer, try to repartition it into a known number of partitions using either
coalesce(partitions) or repartition(). 2. As a rule of thumb, create about
(3 * num_executors * cores_per_executor) data partitions.
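
A minimal sketch of that rule of thumb (the executor and core counts are placeholders, and rawLogs/processLogs refer to the code quoted earlier in this thread):

// Rule-of-thumb partition count: 3 * num_executors * cores_per_executor.
val numExecutors = 10        // placeholder
val coresPerExecutor = 4     // placeholder
val numPartitions = 3 * numExecutors * coresPerExecutor

// Full shuffle into a known partition count:
val finalRdd = rawLogs.flatMap(processLogs).repartition(numPartitions)
// Or, to only reduce the partition count without a full shuffle:
// val finalRdd = rawLogs.flatMap(processLogs).coalesce(numPartitions)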



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-0-9-1-java-lang-outOfMemoryError-Java-Heap-Space-tp7861p7970.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-18 Thread Shivani Rao
I am trying to process a file that contains 4 log lines (not very long) and
then write my parsed out case classes to a destination folder, and I get
the following error:


java.lang.OutOfMemoryError: Java heap space

at
org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)

at org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2244)

at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)

at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)

at
org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)

at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)

at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)

at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)

at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)

at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)

at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)

at
org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)

at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)

at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)

at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)

at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)

at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)

at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)

at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)

at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)

at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)

at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)


Sadly, there are several folks that have faced this error while trying to
execute Spark jobs and there are various solutions, none of which work for
me


a) I tried (
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-0-java-lang-outOfMemoryError-Java-Heap-Space-td7735.html#a7736)
changing the number of partitions in my RDD by using coalesce(8) and the
error persisted

b) I tried changing SPARK_WORKER_MEM=2g and SPARK_EXECUTOR_MEMORY=10g, and
neither worked

c) I strongly suspect there is a class path error (
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-set-spark-executor-memory-and-heap-size-td4719.html),
mainly because the call stack is repetitive. Maybe the OOM error is a
disguise?

d) I checked that I am not out of disk space and that I do not have too
many open files (ulimit -u && sudo ls /proc/spark_master_process_id/fd |
wc -l)


I am also noticing multiple reflections happening to find the right class,
I guess, so it could be a ClassNotFound error disguising itself as a
memory error.


Here are other threads that are encountering the same situation, but which
have not been resolved in any way so far:


http://apache-spark-user-list.1001560.n3.nabble.com/no-response-in-spark-web-UI-td4633.html

http://apache-spark-user-list.1001560.n3.nabble.com/Spark-program-thows-OutOfMemoryError-td4268.html


Any help is greatly appreciated. I am especially calling out to the creators
of Spark and the Databricks folks. This seems like a known bug waiting to happen.


Thanks,

Shivani

-- 
Software Engineer
Analytics Engineering Team@ Box
Mountain View, CA


Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space

2014-06-18 Thread Andrew Ash
Wait, so the file only has four lines and the job is running out of heap
space?  Can you share the code you're running that does the processing?
 I'd guess that you're doing some intense processing on every line, but just
writing parsed case classes back to disk sounds very lightweight.

I


On Wed, Jun 18, 2014 at 5:17 PM, Shivani Rao raoshiv...@gmail.com wrote:

 I am trying to process a file that contains 4 log lines (not very long)
 and then write my parsed out case classes to a destination folder, and I
 get the following error:


 java.lang.OutOfMemoryError: Java heap space

 at
 org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)

 at org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2244)

 at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)

 at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)

 at
 org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

 at java.lang.reflect.Method.invoke(Method.java:597)

 at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)

 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)

 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)

 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)

 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)

 at
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)

 at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)

 at
 org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

 at java.lang.reflect.Method.invoke(Method.java:597)

 at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)

 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)

 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)

 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)

 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)

 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)

 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)

 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)

 at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)

 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)

 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)


 Sadly, there are several folks that have faced this error while trying to
 execute Spark jobs and there are various solutions, none of which work for
 me


 a) I tried (
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-0-java-lang-outOfMemoryError-Java-Heap-Space-td7735.html#a7736)
 changing the number of partitions in my RDD by using coalesce(8) and the
 error persisted

 b)  I tried changing SPARK_WORKER_MEM=2g, SPARK_EXECUTOR_MEMORY=10g, and
 both did not work

 c) I strongly suspect there is a class path error (
 http://apache-spark-user-list.1001560.n3.nabble.com/how-to-set-spark-executor-memory-and-heap-size-td4719.html)
 Mainly because the call stack is repetitive. Maybe the OOM error is a
 disguise ?

 d) I checked that i am not out of disk space and that i do not have too
 many open files (ulimit -u && sudo ls /proc/spark_master_process_id/fd |
 wc -l)


 I am also noticing multiple reflections happening to find the right
 class i guess, so it could be class Not Found: error disguising itself
 as a memory error.


 Here are other threads that are encountering same situation .. but have
 not been resolved in any way so far..



 http://apache-spark-user-list.1001560.n3.nabble.com/no-response-in-spark-web-UI-td4633.html


 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-program-thows-OutOfMemoryError-td4268.html


 Any help is greatly appreciated. I am especially calling out on creators
 of Spark and Databrick folks. This seems like a known bug waiting to
 happen.


 Thanks,

 Shivani

 --
 Software Engineer
 Analytics Engineering Team@ Box
 Mountain View, CA



Spark 1.0.0 java.lang.outOfMemoryError: Java Heap Space

2014-06-17 Thread Sguj
I've been trying to figure out how to increase the heap space for my spark
environment in 1.0.0, and all of the things I've found tell me I have to export
something in the Java opts, which is deprecated in 1.0.0, or to increase
spark.executor.memory, which is already at 6G. I'm only trying to process about
400-500 MB of text, but I get this error whenever I try to collect:

14/06/17 11:00:21 INFO MapOutputTrackerMasterActor: Asked to send map output
locations for shuffle 0 to sp...@salinger.ornl.gov:50251
14/06/17 11:00:21 INFO MapOutputTrackerMaster: Size of output statuses for
shuffle 0 is 165 bytes
14/06/17 11:00:35 INFO BlockManagerInfo: Added taskresult_14 in memory on
salinger.ornl.gov:50253 (size: 123.7 MB, free: 465.1 MB)
14/06/17 11:00:35 INFO BlockManagerInfo: Added taskresult_13 in memory on
salinger.ornl.gov:50253 (size: 127.7 MB, free: 337.4 MB)
14/06/17 11:00:36 ERROR Utils: Uncaught exception in thread Result resolver
thread-2
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
at org.apache.spark.storage.BlockMessage.set(BlockMessage.scala:94)
at
org.apache.spark.storage.BlockMessage$.fromByteBuffer(BlockMessage.scala:176)
at
org.apache.spark.storage.BlockMessageArray.set(BlockMessageArray.scala:63)
at
org.apache.spark.storage.BlockMessageArray$.fromBufferMessage(BlockMessageArray.scala:109)
at
org.apache.spark.storage.BlockManagerWorker$.syncGetBlock(BlockManagerWorker.scala:128)
at
org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:489)
at
org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:487)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:487)
at
org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:481)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:53)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
at
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)

Any idea how to fix heap space errors in 1.0.0?
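
For what it's worth, the trace above is on the driver side (TaskResultGetter pulling task results back into the driver JVM), so it is the driver heap, not spark.executor.memory, that overflows when collect() materializes a few hundred MB of results. A minimal sketch of two ways to avoid pulling everything onto the driver (the RDD name and output path are placeholders, not from this thread):

// collect() brings every partition back into the driver heap.
// Lighter alternatives:
val sample = textRdd.take(100)                        // only a bounded prefix reaches the driver
textRdd.saveAsTextFile("hdfs://namenode:8020/output") // keep the full result on the cluster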



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-0-java-lang-outOfMemoryError-Java-Heap-Space-tp7733.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Spark 1.0.0 java.lang.outOfMemoryError: Java Heap Space

2014-06-17 Thread Sguj
I've been trying to figure out how to increase the heap space for my spark
environment in 1.0.0, and all of the things I've found tell me I have to export
something in the Java opts, which is deprecated in 1.0.0, or to increase
spark.executor.memory, which is already at 6G. I'm only trying to process about
400-500 MB of text, but I get this error whenever I try to collect:

java.lang.OutOfMemoryError: Java heap space 
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39) 
at java.nio.ByteBuffer.allocate(ByteBuffer.java:312) 
at org.apache.spark.storage.BlockMessage.set(BlockMessage.scala:94) 
at
org.apache.spark.storage.BlockMessage$.fromByteBuffer(BlockMessage.scala:176) 
at
org.apache.spark.storage.BlockMessageArray.set(BlockMessageArray.scala:63) 
at
org.apache.spark.storage.BlockMessageArray$.fromBufferMessage(BlockMessageArray.scala:109)
 
at
org.apache.spark.storage.BlockManagerWorker$.syncGetBlock(BlockManagerWorker.scala:128)
 
at
org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:489)
 
at
org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:487)
 
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) 
at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
at
org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:487) 
at
org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:481) 
at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:53)
 
at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 
at
org.apache.spark.scheduler.TaskResultGetter$$anon$2$$anonfun$run$1.apply(TaskResultGetter.scala:47)
 
at
org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160) 
at
org.apache.spark.scheduler.TaskResultGetter$$anon$2.run(TaskResultGetter.scala:46)
 
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
 
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) 
at java.lang.Thread.run(Thread.java:695) 

Any idea how to fix heap space errors in 1.0.0?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-0-java-lang-outOfMemoryError-Java-Heap-Space-tp7735.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Spark 1.0.0 java.lang.outOfMemoryError: Java Heap Space

2014-06-17 Thread abhiguruvayya
Try repartitioning the RDD using coalesce(int partitions) before performing
any transforms.
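
A minimal sketch of that suggestion (the path and partition count are placeholders, not values from this thread):

// Repartition right after reading, before any transformations.
// Note: coalesce only reduces the partition count; use repartition(n) to increase it.
val text = sc.textFile("hdfs://namenode:8020/input")   // placeholder path
val partitioned = text.coalesce(64)                    // placeholder partition count
val processed = partitioned.map(_.toUpperCase)         // transforms now run over fewer, larger partitions
processed.saveAsTextFile("hdfs://namenode:8020/output")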



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-0-java-lang-outOfMemoryError-Java-Heap-Space-tp7735p7736.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.