Re: Unable to run Hive program from Spark Programming Guide (OutOfMemoryError)

2015-03-26 Thread ๏̯͡๏
Resolved. The fix (highlighted in bold in the original message) was in the submit command: ./bin/spark-submit -v --master yarn-cluster --jars

OutOfMemoryError when using DataFrame created by Spark SQL

2015-03-25 Thread SLiZn Liu
for this trivial query. Additionally, after restarting the spark-shell and re-running the limit 5 query, the df object is returned and can be printed by df.show(), but other APIs fail with OutOfMemoryError, namely df.count(), df.select(some_field).show(), and so forth. I understand that the RDD can
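
For context, a minimal Spark 1.3 sketch of the access pattern this thread describes, assuming spark-shell (so sc already exists) and a hypothetical registered table and column:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)                    // sc: the shell's SparkContext
    val df = sqlContext.sql("SELECT * FROM logs LIMIT 5")  // "logs" is a made-up table

    df.show()                       // reported to work: prints the five rows
    df.count()                      // reported to throw OutOfMemoryError
    df.select("some_field").show()  // likewise reported to fail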

Re: OutOfMemoryError when using DataFrame created by Spark SQL

2015-03-25 Thread Ted Yu
, the df object is returned and can be printed by df.show(), but other APIs fail with OutOfMemoryError, namely df.count(), df.select(some_field).show(), and so forth. I understand that the RDD can be collected to the master, hence further transformations can be applied, as DataFrame has “richer

Unable to run Hive program from Spark Programming Guide (OutOfMemoryError)

2015-03-25 Thread ๏̯͡๏
http://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables I modified the Hive query but ran into the same error. (http://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables) val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
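
For reference, the Hive-tables example at that link reads as follows (taken from the Spark 1.3.0 guide; it requires a Spark build with Hive support):

    // sc is an existing SparkContext
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

    sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

    // Queries are expressed in HiveQL
    sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)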

Re: OutOfMemoryError when using DataFrame created by Spark SQL

2015-03-25 Thread Michael Armbrust
for this trivial query. Additionally, after restarting the spark-shell and re-running the limit 5 query, the df object is returned and can be printed by df.show(), but other APIs fail with OutOfMemoryError, namely df.count(), df.select(some_field).show(), and so forth. I understand that the RDD can

Re: Unable to run Hive program from Spark Programming Guide (OutOfMemoryError)

2015-03-25 Thread ๏̯͡๏
Can someone please respond to this? On Wed, Mar 25, 2015 at 11:18 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: http://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables I modified the Hive query but ran into the same error. (

OutOfMemoryError during reduce tasks

2015-03-19 Thread Balazs Meszaros
There is always an OutOfMemoryError at the end of the reduce tasks [2] when I use a 1g input, while 100m of data causes no problem. Spark is v1.2.1 (but I have the same problem with v1.3) and it runs on a VM with Ubuntu 14.04, 8G RAM and 4 VCPUs. (If something else is of interest, please ask
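
Not from this thread, but a common first lever for reduce-phase OOMs on Spark 1.2/1.3 is to raise the reduce-side partition count so each task holds a smaller working set; a minimal sketch with an invented input path and key extraction:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._   // pair-RDD functions on Spark < 1.3

    val sc = new SparkContext(new SparkConf().setAppName("reduce-oom-sketch"))
    val pairs = sc.textFile("hdfs:///data/input-1g")   // hypothetical 1g input
      .map(line => (line.split('\t')(0), 1L))
    // An explicit partition count spreads the shuffle over more, smaller tasks.
    val counts = pairs.reduceByKey(_ + _, 200)
    counts.saveAsTextFile("hdfs:///data/counts")       // hypothetical output path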

Re: OutOfMemoryError: Java heap space

2015-02-12 Thread Yifan LI
Thanks, Kelvin :) The error seems to have disappeared after I decreased both spark.storage.memoryFraction and spark.shuffle.memoryFraction to 0.2 and increased the driver memory somewhat. Best, Yifan LI On 10 Feb 2015, at 18:58, Kelvin Chu 2dot7kel...@gmail.com wrote: Since the stacktrace
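
A rough sketch of those settings with the Spark 1.x property names (the fraction values are the ones reported above; driver memory itself has to be raised at launch, e.g. via spark-submit --driver-memory, since the driver JVM is already running when SparkConf is read):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("memory-fraction-sketch")
      .set("spark.storage.memoryFraction", "0.2")   // space for cached RDDs; default was 0.6
      .set("spark.shuffle.memoryFraction", "0.2")   // space for shuffle aggregation buffers
    val sc = new SparkContext(conf)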

Re: OutOfMemoryError with random forest and small training dataset

2015-02-12 Thread didmar
Ok, I would suggest adding SPARK_DRIVER_MEMORY in spark-env.sh, with a larger amount of memory than the default 512m. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-ramdom-forest-and-small-training-dataset-tp21598p21618.html

Re: OutOfMemoryError with random forest and small training dataset

2015-02-12 Thread Sean Owen
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-ramdom-forest-and-small-training-dataset-tp21598p21620.html

Re: OutOfMemoryError with random forest and small training dataset

2015-02-12 Thread poiuytrez
but the memory is not correctly allocated, as we can see on the webui executor page. I am going to file an issue in the bug tracker. Thank you for your help. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-ramdom-forest-and-small

Re: OutOfMemoryError with random forest and small training dataset

2015-02-11 Thread poiuytrez
spark.eventLog.dir gs://databerries-spark/spark-eventlog-base/spark-m spark.executor.memory 83971m spark.yarn.executor.memoryOverhead 83971m I am using spark-submit. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-ramdom-forest-and-small

OutOfMemoryError with random forest and small training dataset

2015-02-11 Thread poiuytrez
ActorSystem [sparkDriver] java.lang.OutOfMemoryError: Java heap space That's very weird. Any idea of what's wrong with my configuration? PS: I am running Spark 1.2 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-ramdom-forest-and-small

Re: OutOfMemoryError: Java heap space

2015-02-10 Thread Kelvin Chu
Since the stacktrace shows Kryo is being used, maybe you could also try increasing spark.kryoserializer.buffer.max.mb. Hope this helps. Kelvin On Tue, Feb 10, 2015 at 1:26 AM, Akhil Das ak...@sigmoidanalytics.com wrote: You could try increasing the driver memory. Also, can you be more specific
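
A sketch of that suggestion; on Spark 1.x the property is given in megabytes, and the 512 below is an assumed example value, not one from the thread:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("kryo-buffer-sketch")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryoserializer.buffer.max.mb", "512")  // default was 64
    val sc = new SparkContext(conf)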

Re: OutOfMemoryError: Java heap space

2015-02-10 Thread Akhil Das
You could try increasing the driver memory. Also, can you be more specific about the data volume? Thanks Best Regards On Mon, Feb 9, 2015 at 3:30 PM, Yifan LI iamyifa...@gmail.com wrote: Hi, I just found the following errors during computation (GraphX), anyone has ideas on this? thanks so

Re: OutOfMemoryError: Java heap space

2015-02-10 Thread Yifan LI
Hi Akhil, Excuse me, I am trying a random-walk algorithm over a not-that-large graph (~1GB raw dataset, with ~5 million vertices and ~60 million edges) on a cluster of 20 machines. The property of each vertex in the graph is a hash map, whose size will increase dramatically

Re: OutOfMemoryError: Java heap space

2015-02-10 Thread Yifan LI
Yes, I have read it, and am trying to find some way to do that… Thanks :) Best, Yifan LI On 10 Feb 2015, at 12:06, Akhil Das ak...@sigmoidanalytics.com wrote: Did you have a chance to look at this doc http://spark.apache.org/docs/1.2.0/tuning.html

Re: OutOfMemoryError: Java heap space

2015-02-10 Thread Akhil Das
Did you have a chance to look at this doc? http://spark.apache.org/docs/1.2.0/tuning.html Thanks Best Regards On Tue, Feb 10, 2015 at 4:13 PM, Yifan LI iamyifa...@gmail.com wrote: Hi Akhil, Excuse me, I am trying a random-walk algorithm over a not-that-large graph (~1GB raw dataset, including

OutOfMemoryError: Java heap space

2015-02-09 Thread Yifan LI
Hi, I just found the following errors during computation (GraphX), anyone has ideas on this? thanks so much! (I think the memory is sufficient: spark.executor.memory 30GB) 15/02/09 00:37:12 ERROR Executor: Exception in task 162.0 in stage 719.0 (TID 7653) java.lang.OutOfMemoryError: Java

Getting OutOfMemoryError and Worker.run caught exception

2014-12-17 Thread A.K.M. Ashrafuzzaman
Hi guys, Getting the following errors, 2014-12-17 09:05:02,391 [SocialInteractionDAL.scala:Executor task launch worker-110:20] - --- Inserting into mongo - 2014-12-17 09:05:06,768 [ Logging.scala:Executor task launch worker-110:96] - Exception in task 1.0 in stage

Re: Getting OutOfMemoryError and Worker.run caught exception

2014-12-17 Thread Akhil Das
You can go through this doc for tuning: http://spark.apache.org/docs/latest/tuning.html Looks like you are creating a lot of objects and the JVM is spending more time clearing them. If you can paste the code snippet, it will be easier to understand what's happening. Thanks Best Regards On

broadcast: OutOfMemoryError

2014-12-11 Thread ll
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/broadcast-OutOfMemoryError-tp20633.html

Re: broadcast: OutOfMemoryError

2014-12-11 Thread Sameer Farooqui
What is the best way to handle this? Should I split the array into smaller arrays before broadcasting, and then combine them locally at each node? Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/broadcast-OutOfMemoryError-tp20633.html
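
A minimal sketch of the chunking idea raised in the question, with made-up array and chunk sizes; whether it actually avoids the OOM depends on where the memory pressure is:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("chunked-broadcast-sketch"))
    val big = Array.tabulate(10000000)(_.toDouble)   // stand-in for the real array
    // Broadcast the array as several smaller pieces...
    val pieces = big.grouped(1000000).map(sc.broadcast(_)).toArray
    // ...and reassemble them locally on each executor where needed.
    val sums = sc.parallelize(1 to 4)
      .map(_ => pieces.iterator.flatMap(_.value).sum)
      .collect()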

SparkSql OutOfMemoryError

2014-10-28 Thread Zhanfeng Huo
Hi, friends: I use Spark (1.1) SQL to operate on data in Hive 0.12, and the job fails when the data is large. How should I tune it? spark-defaults.conf: spark.shuffle.consolidateFiles true spark.shuffle.manager SORT spark.akka.threads 4 spark.sql.inMemoryColumnarStorage.compressed

Re: SparkSql OutOfMemoryError

2014-10-28 Thread Yanbo Liang
Try increasing the driver memory. 2014-10-28 17:33 GMT+08:00 Zhanfeng Huo huozhanf...@gmail.com: Hi, friends: I use Spark (1.1) SQL to operate on data in Hive 0.12, and the job fails when the data is large. How should I tune it? spark-defaults.conf: spark.shuffle.consolidateFiles true

Re: OutOfMemoryError with basic kmeans

2014-09-17 Thread st553
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-basic-kmeans-tp1651p14472.html

Re: [GraphX] how to set memory configurations to avoid OutOfMemoryError GC overhead limit exceeded

2014-09-09 Thread Ankur Dave
At 2014-09-05 12:13:18 +0200, Yifan LI iamyifa...@gmail.com wrote: But how to assign the storage level to a new vertices RDD mapped from an existing vertices RDD, e.g. *val newVertexRDD = graph.collectNeighborIds(EdgeDirection.Out).map{ case (id: VertexId, a: Array[VertexId]) => (id,

Re: [GraphX] how to set memory configurations to avoid OutOfMemoryError GC overhead limit exceeded

2014-09-05 Thread Yifan LI
Thank you, Ankur! :) But how to assign the storage level to a new vertices RDD mapped from an existing vertices RDD, e.g. *val newVertexRDD = graph.collectNeighborIds(EdgeDirection.Out).map{ case (id: VertexId, a: Array[VertexId]) => (id, initialHashMap(a)) }* The new one will be combined with

Re: [GraphX] how to set memory configurations to avoid OutOfMemoryError GC overhead limit exceeded

2014-09-03 Thread Yifan LI
Hi Ankur, Thanks so much for your advice. But it failed when I tried to set the storage level while constructing the graph: val graph = GraphLoader.edgeListFile(sc, edgesFile, minEdgePartitions = numPartitions).partitionBy(PartitionStrategy.EdgePartition2D).persist(StorageLevel.MEMORY_AND_DISK)

Re: [GraphX] how to set memory configurations to avoid OutOfMemoryError GC overhead limit exceeded

2014-09-03 Thread Ankur Dave
At 2014-09-03 17:58:09 +0200, Yifan LI iamyifa...@gmail.com wrote: val graph = GraphLoader.edgeListFile(sc, edgesFile, minEdgePartitions = numPartitions).partitionBy(PartitionStrategy.EdgePartition2D).persist(StorageLevel.MEMORY_AND_DISK) Error: java.lang.UnsupportedOperationException: Cannot
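
For anyone hitting this: the exception arises because GraphLoader already persists the loaded edge and vertex RDDs, so a later persist() cannot change their level. On Spark 1.1+, one workaround is to hand the desired levels to the loader itself; a sketch reusing sc, edgesFile and numPartitions from the quoted code:

    import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}
    import org.apache.spark.storage.StorageLevel

    val graph = GraphLoader.edgeListFile(sc, edgesFile,
        minEdgePartitions = numPartitions,
        edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
        vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)
      .partitionBy(PartitionStrategy.EdgePartition2D)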

Re: OutOfMemoryError when generating output

2014-08-28 Thread SK
that I can output to console and to a file? thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutofMemoryError-when-generating-output-tp12847p13056.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: OutOfMemoryError when generating output

2014-08-28 Thread Burak Yavuz
skrishna...@gmail.com To: u...@spark.incubator.apache.org Sent: Thursday, August 28, 2014 12:45:22 PM Subject: Re: OutofMemoryError when generating output Hi, Thanks for the response. I tried to use countByKey. But I am not able to write the output to console or to a file. Neither collect() nor

OutOfMemoryError when generating output

2014-08-26 Thread SK
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OutofMemoryError-when-generating-output-tp12847.html

Re: OutOfMemoryError when generating output

2014-08-26 Thread Burak Yavuz
) (fields(11), fields(6)) // extract (month, user_id) }.distinct().countByKey() instead Best, Burak - Original Message - From: SK skrishna...@gmail.com To: u...@spark.incubator.apache.org Sent: Tuesday, August 26, 2014 12:38:00 PM Subject: OutofMemoryError when generating output Hi, I have
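
A sketch of the suggestion quoted above: build (month, user_id) pairs, deduplicate, then countByKey(), which returns a small per-key Map on the driver instead of a huge collected RDD. The field indices follow the quoted code; the input path and comma delimiter are assumptions:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._   // pair-RDD functions on Spark < 1.3

    val sc = new SparkContext(new SparkConf().setAppName("countbykey-sketch"))
    val monthCounts: scala.collection.Map[String, Long] =
      sc.textFile("hdfs:///data/events.csv")      // hypothetical input
        .map(_.split(","))
        .map(fields => (fields(11), fields(6)))   // (month, user_id), per the quote
        .distinct()
        .countByKey()                             // small map; safe to print
    monthCounts.foreach(println)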

Re: [GraphX] how to set memory configurations to avoid OutOfMemoryError GC overhead limit exceeded

2014-08-18 Thread Ankur Dave
On Mon, Aug 18, 2014 at 6:29 AM, Yifan LI iamyifa...@gmail.com wrote: I am testing our application (similar to personalised PageRank using Pregel; note that each vertex property needs considerably more space to store after each new iteration) [...] But when we ran it on a larger graph (e.g.

Re: Re: Spark program throws OutOfMemoryError

2014-04-17 Thread Qin Wei
Thanks for your help! qinwei From: Andre Bois-Crettez [via Apache Spark User List] Date: 2014-04-16 17:50 To: Qin Wei Subject: Re: Spark program throws OutOfMemoryError It seems you do not have enough memory on the Spark driver. Hints below: On 2014-04-15 12:10, Qin Wei wrote: val

Re: Spark program throws OutOfMemoryError

2014-04-17 Thread yypvsxf19870706
email] Subject: Re: Spark program throws OutOfMemoryError It seems you do not have enough memory on the Spark driver. Hints below: On 2014-04-15 12:10, Qin Wei wrote: val resourcesRDD = jsonRDD.map(arg => arg.get("rid").toString.toLong).distinct // the program crashes at this line

Re: Spark program throws OutOfMemoryError

2014-04-16 Thread Andre Bois-Crettez
It seems you do not have enough memory on the Spark driver. Hints below: On 2014-04-15 12:10, Qin Wei wrote: val resourcesRDD = jsonRDD.map(arg => arg.get("rid").toString.toLong).distinct // the program crashes at this line of code val bcResources =

Spark program throws OutOfMemoryError

2014-04-15 Thread Qin Wei
-Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://192.168.2.184:7077 Is there anybody who can help me? Thanks very much!! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-program-thows-OutOfMemoryError-tp4268.html Sent from

Streaming job having Cassandra query: OutOfMemoryError

2014-04-15 Thread sonyjv
Hi All, I am desperately looking for some help. My cluster has 6 nodes, each with dual cores and 8GB RAM. The Spark version running on the cluster is spark-0.9.0-incubating-bin-cdh4. I am getting OutOfMemoryError when running a Spark Streaming job (the non-streaming version works fine) which queries
